In the modern world, people are drowning in large volumes of data. Thousands of papers and studies are published every day to support or refute some claim. Is smoking associated with lung cancer? Did an educational reform increase the efficiency of the education system? Is a new medicine a better treatment for an illness? Statistical analysis is used to answer all of these questions, yet it often contains flaws and fallacies. Some institutions manipulate statistics intentionally, while others fail to provide society with reliable results because of poor knowledge of statistical techniques.
According to Andrew Gelman (2002), the simplest misuse of statistics occurs when an interested party relies on meaningless or fabricated numbers. For example, a magazine published an article titled “Survey: U.S. Kids Reading Well”, which asserted that American children read better than their counterparts in other countries. However, the article does not explain how this conclusion was reached. It is not clear whether the reading skills of American children can be adequately compared with those of Canadian, Brazilian or Chinese children, given the language differences.
Another form of statistical manipulation is reporting absolute numbers instead of relative ones. Gelman (2002) cites a news article that criticizes the Arizona state authorities for poor governance in the educational sector. The article claims that California has more teachers than Arizona and is therefore better at delivering education to children. This claim is unjustified: California has a much larger population than Arizona and consequently employs more teachers. The flaw could have been avoided if the article had used a more adequate measure, such as the number of teachers per student or per 1,000 inhabitants. In addition, researchers often manipulate averages. For example, government officials may claim that average income per capita increased by 20%, from $50,000 to $60,000, which suggests that the average individual in the country is better off. However, these officials do not mention that averages are very sensitive to outliers, so a change in average income does not necessarily mean that society as a whole now earns more. This is easily illustrated by an example: the average of 1, 2, 3, 4 and 5 is 3, while the average of 1, 2, 3, 4 and 40 is 10. Compared to the first case, the average has gone up more than three times, yet only one number in the whole distribution has changed. In other words, an increase in the average income of the population may be caused by the few richest people in the region (say, 2% of the population) increasing their wealth significantly, while the wealth of the other 98% remains the same.
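The outlier example above can be checked with a short Python sketch. It uses the two lists of numbers from the text; the median is added here only as a point of comparison (it is not discussed in the sources), and the helper name summarize is hypothetical.

```python
from statistics import mean, median

# The two distributions from the text: only the last value differs.
baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 40]


def summarize(values):
    """Return the mean and the median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


base_summary = summarize(baseline)
outlier_summary = summarize(with_outlier)

print("baseline:    ", base_summary)
print("with outlier:", outlier_summary)
print("mean ratio:  ", outlier_summary["mean"] / base_summary["mean"])
```

The mean rises from 3 to 10 while the median stays at 3, which is one way to see that a single extreme value can move an average without anything changing for the typical member of the distribution.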
Hooke (1983) describes several ways in which interested parties tend to manipulate statistical data. One easy trick is simply to discard unfavorable data from the research process. For example, a researcher who wants to show that there is no relationship between smoking and the probability of developing lung cancer could conduct a study many times, keep only the favorable runs, and then claim that 95% of the studies detected no causal effect. Tobacco companies use such techniques quite often: they conduct a series of statistical studies and publish only those that do not detect a relationship between smoking and cancer.
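The effect of publishing only favorable studies can be sketched with a small simulation. Everything in it is an assumption made for illustration: the disease rates (0.30 for the exposed group, 0.15 for the control group), the study size, the number of studies, and the one-sided two-proportion z-test used as the detection rule. None of these figures come from Hooke or from real studies.

```python
import math
import random


def study_detects_effect(rng, n_per_group, p_exposed, p_control):
    """Simulate one small study and report whether it detects a higher
    disease rate in the exposed group (one-sided z-test, 5% level)."""
    exposed_cases = sum(rng.random() < p_exposed for _ in range(n_per_group))
    control_cases = sum(rng.random() < p_control for _ in range(n_per_group))
    pooled = (exposed_cases + control_cases) / (2 * n_per_group)
    if pooled in (0.0, 1.0):
        return False  # no variation in outcomes, nothing can be detected
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (exposed_cases - control_cases) / n_per_group / std_err
    return z > 1.645


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical setup: the true effect is real (0.30 vs 0.15),
# but each individual study is small.
results = [study_detects_effect(rng, 50, 0.30, 0.15) for _ in range(200)]

# The "interested party" publishes only the studies that found nothing.
published = [detected for detected in results if not detected]

print("studies run:               ", len(results))
print("studies detecting effect:  ", sum(results))
print("studies published:         ", len(published))
print("published detecting effect:", sum(published))
```

Under these assumptions the effect is real in every simulated study, yet a reader who sees only the published subset finds no study that detected it.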
Manipulation may also be introduced when conducting opinion surveys on important social issues. Hooke (1983) states that the answers to such surveys may be manipulated through the wording of the questions, which can push the participant toward the “right” answer. For instance, an opinion poll in the US set out to assess public opinion on the war in Iraq by asking two questions: (1) Do you support the attempt by the USA to bring freedom and democracy to other places in the world? (2) Do you support the unprovoked military action by the USA?
Clearly, questions of this kind will not reflect the actual opinion of the respondents, since the wording itself steers the answer: the first question invites support for the military intervention, while the second invites opposition to it.
Some researchers introduce selection bias into their research, and the findings of such research may therefore be misleading. One of the key requirements of a statistical experiment is that the data be collected as a random and independent sample, so that the results of the study can be applied to the whole population. According to Gelman (2002), selection bias can generally be described as a sample being unrepresentative of the population because some units are much more likely than others to be represented, with the more likely units differing from the unlikely units in some important way. Gelman gives the example of a study that assessed how many people used the internet as their main source of information (as opposed to television, newspapers and radio). The opinion poll was posted on the website of one of the UK’s largest news agencies. According to that study, 77% of people relied on the internet as their primary source of information. Clearly, this study suffers from selection bias: most of the respondents are frequent users of the internet, while those who prefer television, radio or newspapers never had a chance to participate in the survey.
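The mechanism behind the website poll can be sketched as a simulation. The numbers are hypothetical and chosen only for illustration: a population in which about 30% of people prefer the internet, and response probabilities of 0.20 for them and 0.02 for everyone else. They are not taken from the study Gelman describes.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: True means "prefers the internet as main source".
ASSUMED_INTERNET_SHARE = 0.30
population = [rng.random() < ASSUMED_INTERNET_SHARE for _ in range(10_000)]


def share(sample):
    """Fraction of a sample that prefers the internet."""
    return sum(sample) / len(sample)


# Website poll: people who prefer the internet are assumed to be ten times
# more likely to see the poll and answer it.
RESPONSE_PROB = {True: 0.20, False: 0.02}
website_poll = [person for person in population
                if rng.random() < RESPONSE_PROB[person]]

# Random sample: every person is equally likely to be selected.
random_sample = rng.sample(population, 500)

true_share = share(population)
website_share = share(website_poll)
random_share = share(random_sample)

print(f"true share in population: {true_share:.2f}")
print(f"website poll estimate:    {website_share:.2f}")
print(f"random sample estimate:   {random_share:.2f}")
```

With these assumed probabilities the website poll substantially overstates the share of internet users, while the random sample stays close to the population value, even though it is the smaller of the two samples.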
One of the most striking recent examples of statistical manipulation is described by Jeff Leek (2012). In his analysis, he describes how Fox News uses data presentation to deceive its audience and make one political party look better than the other. The agency’s graphics department uses a range of techniques to manipulate the graphical presentation of data. These include truncating the axes of graphs and charts, using pie charts that do not add up to 100%, changing the units of comparison so that the comparison no longer makes sense, changing the magnitude of units at different values on the x-axis, manipulating trend lines by sub-sampling values, and using misleading chart titles. As a result, a less informed and less attentive viewer is deceived by the statistical data.
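The effect of a truncated axis, the first technique in the list above, can be shown with simple arithmetic. The sketch below compares how tall two bars appear relative to each other when the axis starts at zero and when it starts just below the smaller value. The values 50 and 55, the baseline of 45 and the function name apparent_ratio are hypothetical and are not taken from Leek's analysis.

```python
def apparent_ratio(smaller, larger, axis_start):
    """Ratio of the drawn bar heights when the axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values: the second quantity is 10% larger than the first.
value_a, value_b = 50, 55

honest = apparent_ratio(value_a, value_b, axis_start=0)
truncated = apparent_ratio(value_a, value_b, axis_start=45)

print(f"axis from 0:  second bar looks {honest:.2f} times as tall")
print(f"axis from 45: second bar looks {truncated:.2f} times as tall")
```

In this example a 10% difference is drawn as a bar twice as tall once the axis is cut off at 45, although the underlying numbers have not changed.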
Clearly, there are numerous ways in which statistics may be used for manipulation and deception. That is why it is important to be aware of such techniques and to develop professional knowledge and critical thinking, so as not to be misled by the large volumes of data that surround us on a daily basis.
References
Gelman, A & Nolan, D, 2002, “Teaching Statistics: A Bag of Tricks”, Oxford University Press, London. http://www.stat.columbia.edu/~gelman/bag-of-tricks/chap10.pdf
Hooke, R, 1983, “How to Tell the Liars from the Statisticians”, M. Dekker, New York.
Leek, J, 2012, “The Statisticians at Fox News use Classic and Novel Graphical Techniques to Lead with Data”, Simply Statistics. http://simplystatistics.org/2012/11/26/the-statisticians-at-fox-news-use-classic-and-novel-graphical-techniques-to-lead-with-data/