I’ve recently posted online the research papers that my EAP (English
for Academic Purposes) students wrote on misleading statistics.
The Tuesday-Thursday Class is here, and the Monday-Wednesday-Friday Class is here.
(Update: I've since added 3 more classes to the blog--here, here, and here).
The
motivation for doing this was an idea I got from I.S.P. Nation’s book Teaching ESL/EFL Reading and Writing. Nation mentions that one way to increase
students’ motivation to write is to give them opportunities to publish their
writing.
I was also
thinking about the examples Lourdes Ortega gives of
students finding their second-language identity through writing online. (I’m stretching this somewhat, because Ortega
cites examples of students writing about things they are actually interested
in, like creating websites about Anime. A paper on misleading statistics that the
students are forced to write is probably not the same, but nevertheless I hope
it will give them some sense of using their English abilities to engage with
the wider world.)
My students,
mostly teenagers and digital natives, were not overly amazed at the novelty of
something being published online. But I
think they were moderately pleased, and that’s the most you could realistically
hope for.
[There was
also an ulterior motive for this, which I didn’t tell my students about. We’re having a big problem with plagiarism at
my school. It’s possible to use sites
like paperrater.com and duplichecker.com to check student papers against the Internet for
plagiarism, but since the essay topic remains the same from term to term, a
bigger problem is that essays often get passed down from one class to
the next. And since the school does not keep a database of past essays, this is
harder to check for. So, I figured, why
not start posting the essays online?]
A couple of other notes:
In addition
to publishing these papers online, I also tried out a couple more ideas from
Nation’s book last term.
One idea
was to have the students write their papers in groups instead of
individually. The hope is that
in collaborative writing the students share their knowledge and learn from
each other. (The danger is that one
student just writes the whole thing, but I tried to guard against this by giving
the students time to collaborate in class.)
The other
idea, also from Nation, is to put the student papers into a larger book. So instead of having each group write
generally about “misleading statistics” (as the curriculum states, and as I’ve
done in the past), I had each group write about an aspect of misleading
statistics, and then at the end of term we arranged the various papers together
into one booklet. I took this booklet
down to the printer’s shop and had bound copies made, and on the last day of
class we had a book-signing party.
(Nation
suggests this, but it was also an idea one of my Calvin professors used. This History 101 paper I wrote was
actually part of a much larger book on religion in the ancient world.)
The
introduction and conclusion were composed by the class as a whole—one student
was nominated to write down the class’s ideas, and then the other students
suggested sentences. At the end I
corrected it. (Another idea from
Nation.)
Another note:
I’ve been
teaching this EAP course on misleading statistics for a couple years now, and a
few years back, to my surprise, I saw a student actually cite Calvin College, my alma mater, as a source in her research paper.
It turns
out a Calvin professor had set up a site on misleading statistics that had been
attracting the attention of my students all the way over here in Cambodia.
(This is
one of my favorite “randomly-running-into-Calvin-College-in-distant-areas-of-the-world”
stories. My other favorite is
finding Howard Van Till’s book The Fourth
Day (A) at the check-out counter at Melbourne
University library—which meant that not only did Melbourne University library
have this book, but that someone was actually reading it!)
Anyway,
after I discovered this website, I started assigning it to my students to read
to help them prepare for the paper. (The Website is located HERE.)
After a few
terms, I ended up rewriting the first couple of chapters to make them more
accessible to Cambodian ESL students.
(Or I tried, anyway. I’m not sure I’m the best writer, but my goal was to
make them simpler and easier to understand.)
I only rewrote the first couple of chapters; after that, the
students read the authentic text for the rest of the book.
Below are
my re-writes that I’ve been using in my classes.
Chapter 1: Our Treacherous Tendency
(Source: http://www.calvin.edu/academic/economics/faculty/bios/HaneyDocs/page-58952330.html)
Shady Statistics
1. Discreet Deceit
Human beings have a strong tendency
to introduce bias, and the strangest thing about this tendency is that it sometimes happens
without us even realizing it. One major contributor to this bias is
our pride. We all want to look good to other people, to look like we've got
it together, and we will even twist the truth to preserve our reputation
and our appearance of success. The results of some surveys and statistics simply
cannot be trusted because of the kind of information they are trying to gather.
Sometimes people lie
consciously (knowingly) and sometimes people lie unconsciously (unknowingly),
but the fact is that people will often adjust the truth to make themselves look
better, even when they are talking to a complete stranger.
For example, imagine you were doing a study on
whether people washed their hands after using the toilet. Should you try and do this study by
survey? What would be the problem with
this survey? Are you likely to get
honest results?
"Studies show that people wash their hands 4.67
times a day." In a scenario like this, we should ask ourselves, "How
in the world did they get that figure?" Will a person, no matter how
randomly selected they are, ever admit to occasionally not washing their hands
to a complete stranger? These kinds of statistics are only useful in
determining what people say about washing their hands. We can
hardly draw any other conclusions.
2. Sample Problems
It is also important to note that the way
the people in a survey are chosen (the sample) can
produce misleading statistics. For example, imagine I wanted to find out
what percentage of Cambodian people enjoyed shopping at Soriya shopping center.
I went to Soriya shopping center and did
my survey there. What would be the
problem with this survey?
If I wanted to gather information about whether
customers enjoyed shopping at a particular mall, I would not gather my sample
from the people who are already inside that mall. Chances are, if they are
shopping there, they like it.
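To see how much a survey taken inside the mall can mislead, here is a minimal sketch with made-up numbers. Every figure in it (how many people enjoy the mall, and how likely each group is to be inside it on survey day) is a hypothetical assumption chosen only to illustrate selection bias, not real data about Soriya.

```python
# Hypothetical numbers, chosen only to illustrate selection bias.
# Suppose 1,000 Cambodians: 300 enjoy shopping at the mall, 700 do not.
ENJOY, NOT_ENJOY = 300, 700

# Assumption: people who enjoy the mall are much more likely to be
# inside it on the day of the survey (1 in 3 versus 1 in 35).
inside_enjoy = ENJOY // 3             # 100 people
inside_not_enjoy = NOT_ENJOY // 35    # 20 people


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


true_rate = percent(ENJOY, ENJOY + NOT_ENJOY)
mall_rate = percent(inside_enjoy, inside_enjoy + inside_not_enjoy)

print(f"Whole population: {true_rate:.1f}% enjoy the mall")
print(f"Survey inside the mall: {mall_rate:.1f}% enjoy the mall")
```

Under these assumptions, only 30% of the population enjoys the mall, but about 83% of the people surveyed inside it do, even though nobody lied.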
Surveyors must be extremely cautious when it comes
to how a survey is set up and how the results are gathered. For example, what would be the problems of
conducting a survey on the Internet? Or
using text messaging from cell phones?
A survey of teenagers' opinions that gathers its
information by text message leaves out the teens who a) don't
have a cell phone or b) don't use text messaging. Furthermore, the survey
only gets results from those who choose to participate! This is sometimes only
a small fraction of those who were asked, and results like these can badly
distort any statistics calculated from them. (This is also why open Internet
surveys and polls are unreliable:
they collect information only from the people who choose to
participate.)
In order to truly get an accurate statistic, I
would need either a random sample or a stratified sample.
A random sample was once described as a sample selected
by pure chance from the population.
(When statisticians use the word “population” they mean the whole of
whatever they are studying, and the “sample” is just a small part of it.)
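A random sample is easy to draw on a computer. This minimal sketch uses Python's standard random module on a hypothetical population of 10,000 people, each identified only by an ID number (the population size, sample size, and seed are all assumptions for illustration). Every person has the same chance of being chosen, and nobody is chosen twice.

```python
import random

# Hypothetical population: 10,000 people identified by ID number.
population = list(range(1, 10_001))

rng = random.Random(42)                  # fixed seed so the example is repeatable
sample = rng.sample(population, k=500)   # each person equally likely to be picked

print(len(sample))        # size of the sample
print(len(set(sample)))   # same number: nobody is chosen twice
```

In real surveys the hard part is not the random draw itself but having a complete list of the population to draw from.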
However, stratified sampling is the best kind of
sampling. It makes the sample contain the same proportions of things as
exist in reality. For example, imagine we were doing a survey on whether
Cambodians like Angry Birds. I would try
to get a sample that reflects (or mirrors) the general population. In Cambodia now, 31.9% of the people are
14 years old or younger, 64.3% are between 15 and 64 years old, and 3.8%
are 65 or older. To get a truly accurate
statistic, I would want to make sure these same percentages are reflected in my
sample. I would also want to take
into account the percentage of males and females. Instead of simply selecting males and females
at random, I would need to find out what percentage of Cambodia’s population is
female, and then work that into my sample.
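The age percentages above can be turned into a concrete sample plan. This is a minimal sketch: the function name allocate and the sample sizes are hypothetical, and it only handles the age groups, not gender or any other factor. It uses the largest-remainder method (one common way to round proportional shares) so that the group sizes always add up to the total.

```python
# Age shares for Cambodia as given in the text, in tenths of a percent
# (31.9%, 64.3%, 3.8%), so the arithmetic stays in whole numbers.
AGE_SHARES = {"14 or younger": 319, "15 to 64": 643, "65 or older": 38}


def allocate(sample_size: int, shares: dict[str, int]) -> dict[str, int]:
    """Split sample_size across strata in proportion to shares.

    Uses the largest-remainder method so the counts always add up
    to exactly sample_size.
    """
    total = sum(shares.values())
    counts, remainders = {}, {}
    for stratum, share in shares.items():
        quota = sample_size * share
        counts[stratum] = quota // total       # whole places first
        remainders[stratum] = quota % total    # what was rounded away
    leftover = sample_size - sum(counts.values())
    # Give any leftover places to the strata with the largest remainders.
    for stratum in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[stratum] += 1
    return counts


print(allocate(1000, AGE_SHARES))
print(allocate(200, AGE_SHARES))
```

For a sample of 1,000 people, this gives 319 people aged 14 or younger, 643 aged 15 to 64, and 38 aged 65 or older. To include gender as well, each age group could be split again by the share of females and males, with each combination treated as its own stratum; the people inside each stratum would then be chosen at random.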
What other factors would I need to consider to get
an accurate stratified sample?
You can see perhaps that it actually takes a lot of
hard work and preparation to get good stratified samples. Not surprisingly then, many surveys don’t
bother with this. Therefore, many
surveys are unreliable.
3. Biased Questions
Often with polls, the questions lead people to
answer one way or another. Occasionally,
the questions are intentionally designed to get people to answer a certain way.
For example, imagine a survey question that said:
What do you think of labor unions?
a) a terrible idea
b) inefficient
c) okay, but not great
d) evil
Imagine as a result of this
survey, we found that 70% of Americans thought labor unions were only okay, and
not great. What would be the problem
with this statistic?
How
could we design a survey that would be more accurate?
In fact, the best, most reliable type of survey
question is the open-ended question. This strategy allows any kind of answer and
thus does not leave out any opinion that someone may have. Each person is free
to answer however he or she likes.
For example:
What words or
phrases come to mind when you think of labor unions?
_____________________________________________
Afterwards, the answers can be collected and counted
to find the most frequent ones. Asking an open-ended
question without giving answer options is an unbiased way to
find out people's opinions about labor unions. If the results are put
in a table that organizes the national, Republican, and
Democratic results separately, it is easy to see the opinions of each group. There is no
confusion in how to read the answers, and no answer is missing because it was
not listed as an option. Clearly, this is the best approach to finding out the
nation's opinions on labor unions.
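Counting the answers to an open-ended question can be done with Python's collections.Counter. This is a minimal sketch: the answers below are invented for illustration, not real survey data. One practical detail is that people write the same idea in slightly different ways, so the sketch lowercases answers and trims extra spaces before counting them.

```python
from collections import Counter

# Hypothetical free-text answers to "What words or phrases come to
# mind when you think of labor unions?"
answers = [
    "workers' rights", "strikes", "Workers' rights", "fair pay",
    "strikes", "bureaucracy", "workers' rights ", "fair pay",
]

# Normalize case and extra spaces before counting, so the same
# answer is not split into several categories.
tally = Counter(answer.strip().lower() for answer in answers)

for phrase, count in tally.most_common(3):
    print(f"{phrase}: {count}")
```

Grouping answers that mean the same thing but use different words (for example "strikes" and "walkouts") still needs a human judgment, and that step can introduce its own bias.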
Let’s look at some examples of real life misleading statistics. This is a true story. A lady was listening to the radio and heard about a poll taken that said that "11% of Americans don’t believe that the Obama Administration cut taxes last year."
This surprised her.
It had been big news last year that Obama had cut taxes. It had made him very popular in America. So she decided to research the
statistic. Where did this statistic come
from? How did they know this?
She did some research on it, and it led her to find
out that the statistic was found under a link to "Poll: Who are the tea
partiers?" The Tea Party is a
political group in America that is opposed to Barack Obama. So the poll only surveyed a selected group
(the Tea Party), which means it was not a random sample. All the people in the survey opposed Barack
Obama, so it did not accurately represent the American population as a whole.
Look at the categories we
studied above. Which category does this
statistic fit under?
Chapter 2: Deceptively Mean
Have you ever heard the word “average”
before? What does this word mean? Does it have the same meaning in all
situations?
In normal everyday conversation,
average often means normal, or usual.
However, in statistics, average has a much more mathematical meaning.
Here’s a question to think about: how
could a dishonest person use a mathematical average to create a misleading statistic,
that is, a statistic which is technically true but creates a misleading impression?
Part of the problem is that in statistics average has 3
different meanings: mean, median, and mode.
Each type of average is easy to use appropriately, but first you
need to understand the differences between the three. The mean average takes the
sum of all the collected data and then divides that total by the number of
values (for example, the number of participants in the study). This type of
average can be used for things such as the average grade on a quiz for the
students in a math class. The next type of average is the median average. The
median is found by arranging a set of values in order and finding the value that
falls directly in the middle. The final type of average is the mode average. The
mode is the value that occurs most often in a data set, so it shows which value
is most common across the group being studied. Once you know about each type of
average, it is very important that anyone presenting statistical data uses the
information ethically, to inform the audience rather than to deceive the audience
and shape its perception.
Confused? Let’s go
over the same thing again a little bit slower.
A mean average is created by adding all the numbers
together and then dividing by the size of the sample. For example, imagine Sue has $1, Tom has
$1, Sam has $20, and Jason has $30. On
average, how much money do they have?
Well, we add up all the numbers (1 + 1 + 20 + 30) and then divide by the
size of our sample (4 people). So the
average is (1 + 1 + 20 + 30) / 4 = 52 / 4 = $13.
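The median is the middle number. Put all the numbers in a row with the smallest
number on the left and the greatest number on the right, and the number in the
middle is the median. (If there is an even number of values, there are two middle
numbers, and the median is halfway between them.)
Let’s use our example from above. What is the middle number?
$1, $1, $20, $30
Here there are four numbers, so the two middle numbers are $1 and $20. The median is halfway between them: ($1 + $20) / 2 = $10.50.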
And the mode is the number that occurs most frequently. Again, using the same example:
$1 (2), $20 (1), $30 (1)
The mode here is $1, because that number is the most
frequent.
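The three kinds of average can be checked with Python's standard statistics module. This is a small sketch using the money from the Sue, Tom, Sam, and Jason example. Note that with four values the median is the average of the two middle values ($1 and $20).

```python
import statistics

# The money held by Sue, Tom, Sam and Jason in the example above.
money = [1, 1, 20, 30]

mean = statistics.mean(money)      # (1 + 1 + 20 + 30) / 4
median = statistics.median(money)  # average of the two middle values
mode = statistics.mode(money)      # the most frequent value

print(mean, median, mode)
```

The results are a mean of 13, a median of 10.5, and a mode of 1: three different "averages" from the same four numbers.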
The important thing to remember is when you see the word
“average” written down in a statistic, you often have no idea if the researcher
is referring to the mean average, the median average, or the mode average. How could dishonest people abuse this to mislead
someone?
We often think of the word average as being the same as
normal, but this is not always the case.
For example, if 9 people in a company earn $10,000 a year, and one
person earns a billion dollars ($1,000,000,000) a year, what is the mean average? What is the mode average?
In this case, both
are correct, but one can be slightly misleading. For example, if a company wanted to recruit
new employees by advertising a high wage, how could it be dishonest about
its average employee wage?
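The company example can be worked out the same way. This sketch uses only the numbers given in the question (nine salaries of $10,000 and one of $1,000,000,000) and Python's standard statistics module.

```python
import statistics

# Nine employees earn $10,000 a year; one earns $1,000,000,000.
salaries = [10_000] * 9 + [1_000_000_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

print(f"Mean:   ${mean:,.0f}")
print(f"Median: ${median:,.0f}")
print(f"Mode:   ${mode:,.0f}")
```

The mean is $100,009,000, while the median and the mode are both $10,000. So the company could truthfully advertise an "average" wage of over $100 million, even though nine out of ten employees earn $10,000.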