“If it has ‘science’ in its name, it ain’t” (Minnick). Those words were said to this writer by the smartest person he ever knew personally, a man with PhDs in Mathematics and Physics and a true scientist. He went on to explain that science, if it is truly science, can be reduced to mathematics and can then predict. “Science can generate electricity or flatten cities,” he said. The problem, of course, is that whenever one deals with the “soft sciences,” political science or economics (the dismal science) or sociology or history or any of the disciplines housed in the “Social Science” building of any university campus, the subject is people. Unlike the subjects of “real” science such as physics or chemistry, where things can be manipulated without concern for anything beyond basic safety, human beings are notoriously unpredictable. They are, moreover, rarely subject to experiments to confirm hypotheses. Only in extreme cases are experiments possible: prisoners may be allowed to volunteer in free societies or may be coerced in totalitarian ones. In either case, the vast complexity of the human experience is reduced to a very small sample.
It is good to recall the scientific method as taught in junior high school. The method starts with seeing something (observation) and thinking about why it happened (hypothesis). Isaac Newton saw that famous apple fall (observation) and wondered why (hypothesis). Experiments are then formulated to test the hypothesis, and the results are compared to the expected results. The hypothesis is modified based on those results until, in the end, enough samples have been collected, the results of each experiment fall well within expected parameters, and “laws” are identified. The perceived brightness of a light source, for example, is inversely related to the square of the distance from that source (Hyper Physics), and we know that with sufficient certainty to call it the “Inverse Square Law.” Or we know that in a vacuum, where air resistance is not a factor, both a cannonball and a feather will accelerate at 9.8 meters per second per second. Once again, it was experiments refined over thousands of iterations that yielded the “law.” This is such a well-documented fact that it is given a special character, “g,” which any scientist anywhere understands (Physics Classroom).
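That kind of law is precise enough to state in a few lines of code. The sketch below is purely illustrative (the distances are arbitrary, not a cited experiment); it shows the Inverse Square Law doing exactly what a law is supposed to do, predicting:

```python
# Inverse Square Law: perceived brightness falls with the square of distance.
def relative_brightness(distance_m, reference_m=1.0):
    """Brightness at distance_m relative to brightness at reference_m."""
    return (reference_m / distance_m) ** 2

for d in (1, 2, 3, 10):
    print(f"{d:>2} m: {relative_brightness(d):.4f} of reference brightness")
# Double the distance, quarter the brightness; ten times the distance,
# one one-hundredth the brightness. Every time, everywhere.
```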
No such precision is possible in any of the social sciences. Political scientists and economists certainly prepare long, compound, complex formulae to describe their various theories. But these fail the most basic test of a “science”: they are not predictive. For confirmation, simply look at how completely economists missed the Great Recession of 2008 and political scientists missed the election of 2016. While the economist may speak confidently of the “law” of supply and demand, it remains only a statement of observed principle and cannot give a precise prediction of what will happen. If, for example, 5,000 acres of apple orchards are added to the total acreage growing apples, we know that the supply curve will shift marginally and, “all other things being equal,” the price will be driven down slightly, as shown in Exhibit No. 1. The economist can say, with high confidence, that if the supply moves from S1 to S2 the price, in a free market, will move from P1 to P2. What we do not know with any precision at all is what the value of either P would be.
The converse is true as well. If 5,000 acres of apple orchards are suddenly taken out of production, the supply moves from S2 to S1 and the price increases from P2 to P1. The general “law” holds, as we see whenever there is a freeze in Florida: prices of citrus will rise. The exact amount, though, is subject to so many factors that no actual values can be assigned.
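A toy model makes the direction-but-not-magnitude point explicit. The linear demand and supply curves below are pure inventions, chosen only so the arithmetic works; nothing about real apple markets is encoded in them:

```python
# Hypothetical linear curves. Demand: Q = 100 - 2P. Supply: Q = 10 + 1.5P + shift.
def equilibrium(supply_shift=0.0):
    # Setting demand equal to supply: 100 - 2P = 10 + 1.5P + shift.
    p = (90 - supply_shift) / 3.5
    q = 100 - 2 * p
    return p, q

p1, q1 = equilibrium(0)    # original supply curve S1
p2, q2 = equilibrium(10)   # S2: new acreage shifts supply outward
print(f"P1 = {p1:.2f}, P2 = {p2:.2f}")  # price falls, as the "law" predicts;
                                        # by how much depends entirely on the
                                        # made-up coefficients above
```

The model predicts the direction of the price change perfectly and the size of it not at all, because the size comes from coefficients no one actually knows.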
Alex Tabarrok, a professor of economics at George Mason University, a prestigious institution in the field that boasts, among others, Walter Williams as a Professor Emeritus, presents a detailed formula for calculating the elasticity of demand. In his formulation:
ED = %ΔQdemanded / %ΔPrice
In plain English, the elasticity of demand equals the percent change in quantity demanded divided by the percent change in price. Once sufficient data are gathered and entered into a spreadsheet, an elasticity of demand can be calculated to as many decimal places as the economist cares to take it. Unfortunately, this is a case where the economist can be absolutely wrong, with great confidence, to several decimal places.
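The calculation itself is trivial, which is rather the point. A minimal sketch (the prices and quantities here are hypothetical, not drawn from any cited dataset):

```python
def elasticity_of_demand(q_old, q_new, p_old, p_new):
    """Elasticity of demand: percent change in quantity demanded
    divided by percent change in price."""
    pct_dq = (q_new - q_old) / q_old
    pct_dp = (p_new - p_old) / p_old
    return pct_dq / pct_dp

# Hypothetical example: price rises from $2.00 to $2.20 (+10%) and
# quantity demanded falls from 1,000 to 900 units (-10%).
print(elasticity_of_demand(1000, 900, 2.00, 2.20))  # ≈ -1.0: "unit elastic"
```

The spreadsheet will happily report that figure to a dozen decimal places; whether the real world cooperates is another matter entirely.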
The problem, of course, is that an economy is a wildly complex and complicated thing, and simply looking at the change in two factors ignores the vast majority of factors that influence decisions. The researcher is trying to simplify enough to allow the “experiment” to work (Roche). One of the first steps in any economic model is to assume away everything: the world, when the model builder starts, is a flat, featureless plain. In that two-dimensional environment, Tabarrok’s formula works perfectly. In the real world, it falls apart except as a statement that changes in price do, indeed, affect demand at some level or other.
Thomas Sowell makes this point over and over in Basic Economics. If his definition is accepted, that “economics is the study of the allocation of scarce resources that have alternative uses” (Sowell, 2011, p. 17), then all of the decision factors involved need to be considered before there is any chance of prediction at all. This is impossible. “Nobody knows how to make a pencil” is a line that Sowell (2006), Milton Friedman, and Josh Harness (in relation to computer program development), to name a few, cite with regularity. Such a simple thing as a wooden number 2 pencil, the kind every grade school kid has had for at least a century, is beyond the capacity of any individual to produce. If that is the case, how can we possibly expect to predict how an economy, or even a segment of an economy, will function? The people who crafted the Patient Protection and Affordable Care Act (Obamacare), for example, were neither evil nor stupid. That they were unsuccessful, though, is manifest as the system collapses. They probably thought they could make a pencil too.
Faced with such overwhelming complexity, the economist, along with other social scientists, resorts to statistics and probability. Before proceeding further, definitions are in order. Statistics involves gathering and analyzing masses of numerical data. Probability, on the other hand, addresses how likely it is that a specific event will occur. Both are important to social scientists.
Human beings are notoriously unpredictable as individuals. In groups, some level of prediction is possible, and as groups get larger the level of confidence in predictions gets higher. Actuaries, basically statisticians who work for insurance companies, can predict the number of deaths in any cohort of the population with near certainty. This, in turn, allows insurance companies to establish premium levels that ensure sufficient reserves to pay claims, cover overhead, and allow a normal profit.
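Why size confers confidence can be sketched with elementary probability. The 1% annual mortality rate below is an assumption chosen for illustration, not a figure from any actuarial table:

```python
import math

def deaths_expected(cohort_size, annual_mortality=0.01):
    """Deaths among independent lives follow a binomial distribution:
    expected deaths = n*p, standard deviation = sqrt(n*p*(1-p)).
    The relative spread shrinks as the cohort grows."""
    expected = cohort_size * annual_mortality
    sd = math.sqrt(cohort_size * annual_mortality * (1 - annual_mortality))
    return expected, sd / expected

for n in (100, 10_000, 1_000_000):
    expected, rel = deaths_expected(n)
    print(f"cohort of {n:>9,}: expect {expected:>8,.0f} deaths, "
          f"give or take {rel:.1%}")
```

For a cohort of 100 the expected count is uncertain by nearly 100%; for a cohort of one million it is uncertain by about 1%. The individual remains a mystery; the million are nearly a certainty.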
Probability, on the other hand, addresses specific events with a numerical prediction of the likelihood of a specific outcome. This is expressed as a number between 0 and 1 and often presented as a percentage. For example, the probability that the result of a coin flip will be either heads or tails is 1, or 100%. It is a certainty that it will be one or the other, presuming the coin does not land on an edge (Rouse). The probability that a coin flip will be heads is not, however, exactly .5 (50%). As it turns out, the empirical evidence is that there is a slight but measurable bias in favor of the coin landing on the side that was up when it was flipped (Aldous).
Backgammon players are very interested in the probabilities of various dice rolls (Bray). The fact that an opponent needs to roll three consecutive double sixes, each at odds of 35 to 1 against, makes a player very confident when he doubles the bet, something the player holding the dice may do in backgammon. Those long odds, however, do not make such a string an impossibility, as the author has heard from a tournament-quality backgammon player on more than one occasion.
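The arithmetic behind that confidence takes only a few lines:

```python
from fractions import Fraction

# One roll of two dice has 36 equally likely outcomes, so a double
# six comes up with probability 1/36: odds of 35 to 1 against.
p_double_six = Fraction(1, 36)

# Three consecutive double sixes, the rolls being independent:
p_three_in_a_row = p_double_six ** 3
print(p_three_in_a_row)         # 1/46656
print(float(p_three_in_a_row))  # ≈ 0.0000214: very long odds, but not zero
```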
Social scientists are more in tune with actuaries than with backgammon players. They gather numbers and attempt to derive conclusions, formulate hypotheses, and predict future actions. On the macro scale they are largely successful. When the United States Census Bureau publishes its annual population estimate for the nation, 321,418,320 in 2015, it is probably accurate within a fraction of a percent, although those final three digits are false precision derived from the ability of computer models, as mentioned above, to perform extremely exact calculations. A better presentation would probably have been 321.4 million persons. However, when the Bureau estimates the population of, for example, Baca County, Colorado at 3,615 for 2015, it could be off by several percent, since the statistical universe involved is much smaller.
The issue is that in statistics, size matters. Whether addressing the results of medical research (Biau, Kernéis and Porcher), political science (Filho et al., p. 33), or general questions of statistics (What researchers mean by sample size and power), the bigger the sample, the more confidence in the result. No data set covers every member of a population, although the decennial U.S. census approaches that ideal. But larger samples tend to smooth out the “outliers.” That is why the 321 million figure for the United States given above can be assumed to be quite accurate while the 3,600 or so cited for Baca County, Colorado is suspect.
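The standard margin-of-error formula for an estimated proportion makes the point concrete; the sample sizes below are arbitrary:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 100_000):
    print(f"sample of {n:>7,}: plus or minus {margin_of_error(n):.1%}")
```

A sample of 100 leaves a margin of nearly ten points; a sample of 100,000 narrows it to three tenths of a point. Size matters.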
The mathematics of probability found its origin in attempts to find an edge in games of chance. The concept of dice games far predated Blaise Pascal, who was the first to formalize probability theory as he sought an answer to the age-old question, “what is the number of turns required to ensure obtaining a six in the roll of two dice” (Leung)? This was hardly a new question. Greeks were celebrating when they threw “an Aphrodite” (double sixes) and won the dice game when Socrates was teaching Plato who, in turn, was teaching Aristotle (Black). Minoans were gambling before that. And of course, Zeus, Hades and Poseidon threw dice to split the universe amongst themselves (Black). Pascal and his contemporary Christiaan Huygens attempted to codify the answers to these important questions in the middle and late 1600s.
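Pascal’s question has a clean modern answer, if “ensure” is read as “have a better-than-even chance,” as in the classic wager the Chevalier de Méré brought to him:

```python
# How many throws of two dice give a better-than-even chance of at
# least one double six? P(no double six in n throws) = (35/36)**n.
n = 1
while (35 / 36) ** n > 0.5:
    n += 1
print(n)  # 25: at 24 throws the chance is still just under one half
```

That razor-thin gap between 24 throws and 25 was enough to turn a profitable bet into a losing one, which is precisely why the gamblers of the 1600s wanted the mathematics worked out.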
The first true actuary was John Graunt. His 1662 Observations Made Upon the Bills of Mortality was the first formal attempt at gathering a significant body of data together for the analysis and prediction of demographics; the work was designed to predict outbreaks of plague. He compiled tables from decades of data, including such then-sophisticated “sorts” as mortality by sex. More importantly, he used his data to draw inferences and make predictions. Huygens was influenced by Graunt, although Huygens’s own work turned back to games of chance (Leung).
The related fields of statistics and probability came of age in the second decade of the 18th century. Anders Hald, quoted by Aldrich, calls the decade from 1708 to 1718 “the great leap forward.” Several publications expanded on both statistics and probability and identified the close relationship between the two. In the 18th century two Bernoullis, Daniel and Jakob, published works on the subject, and new terms and approaches were developed. Permutation, for example, the concept of ordering data in various ways, dates from this era. Similarly, the Cartesian coordinate system for graphing multi-variable problems, invented by René Descartes a century before, came into wide use. Whether or not that philosopher-mathematician, most famous for his formulation of cogito ergo sum, developed the system while bedridden and trying to figure out how to identify the specific location of a fly on the ceiling (Whitcher), this system of X and Y coordinates has bedeviled junior high school algebra and high school calculus students for decades.
It remained for Ronald Aylmer Fisher, the accidental statistician, to complete the transition to truly modern statistics. Fisher was a geneticist first, and his mathematical approach to analyzing genetics included wholly new tools for testing results. (Fisher’s analysis, incidentally, led to the conclusion that Gregor Mendel had “fudged” his results.) With his theory of optimal estimation, which remains a standard, the basic toolbox of the statistician was complete (Spanos). The concept of the p-value then provided the basis for another famous modernizer, Jerzy Neyman, as the Neyman-Pearson theory of optimal testing was developed (Spanos; also Chiang). This fresh approach, actually a way to test whether the test being used is the “best,” or most powerful, test, gave statisticians a whole new level of confidence (Neyman-Pearson Lemma). The final step to the modern level of confidence was the development of appropriate sampling techniques. For this, students of statistics turn to William Cochran who, with his Sampling Techniques, literally wrote the book.
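The p-value idea itself fits in a few lines. The sketch below uses a hypothetical coin-flip example, not Fisher’s data: it asks how surprising 60 heads in 100 tosses would be if the coin were actually fair.

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """Two-sided p-value: the probability, assuming the coin is fair,
    of an outcome at least as far from the expected count as the one
    actually observed."""
    expected = flips * p
    observed_gap = abs(heads - expected)
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k)
               for k in range(flips + 1)
               if abs(k - expected) >= observed_gap)

print(round(binomial_p_value(60, 100), 4))  # ≈ 0.0569
```

A fair coin would produce a result that lopsided almost six times in a hundred, which is why 60 heads in 100, surprising as it feels, falls just short of the conventional 5% threshold.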
With those basic concepts in place, it has been a matter of refining methods and techniques to improve confidence levels. The election cycle of 2015-16 brought high levels of interest to politics and political scientists. The famous polling organizations, Gallup and Pew and Rasmussen, published polls on a variety of topics; Rasmussen publishes results daily (Daily Presidential Tracking Polls), as does Gallup (Gallup Daily: Obama Job Approval), among others. One website, Real Clear Politics, summarizes and averages these polls (Live Reports) and was watched closely in the run-up to the November 8, 2016 election.
That these polls, whether individually or as averaged by the Real Clear Politics system, were so completely wrong should give pause. The consensus was that Hillary Diane Rodham Clinton would win the election; indeed, the expectation was that she would win in a landslide, leaving the Republican Party a minority party with only a tenuous hold on the House of Representatives. When the only poll that matters, the election, was tallied, Donald John Trump had won and will be inaugurated as the 45th President of the United States, and the Republican Party held both houses of Congress as well as a near-record number of state governorships and legislatures. The next year will be devoted to analyzing “what went wrong” and to attempts by the polling organizations to correct their methodologies so that such embarrassment is not repeated.
Statistics remain the primary tool in the toolbox of city planners and economic development professionals. Planners rely on census data to project needs for a variety of components in the process of estimating future requirements. Questions of annexation, transportation, housing, commercial development, schools and the like all require reliable demographic statistics.
With computerized mapping capabilities, a whole new discipline, Geographic Information Systems (GIS), has developed over the past two decades (GIS: geographic information systems). These systems allow for very sophisticated “what if” analyses. Databases overlaid onto maps can demonstrate population densities to whatever level of detail is desired. For communities seeking young citizens and actively promoting economic development projects, an overlay that shows population broken down by sex and age cohort in conjunction with schools can be the basis of a capital improvements program. For other areas, the southwest United States for example, where retirement has become a prime driver of growth, these same cohorts can identify health care and transportation needs.
Statistics are at the core of national economic policy as well. The Federal Reserve System (the Fed) has set a 2% inflation rate as its target (Why does the Federal Reserve aim for 2 percent inflation over time?), and it requires statistics to determine whether or not that target is being achieved. The Bureau of Labor Statistics tracks this information in great detail. Categories for which data are available include, for example, food. Subcategories under food include “Food at home” and “Food away from home.” Under “Food at home” are: Cereals and bakery products; Meats, poultry, fish and eggs; Dairy and related products; Fruit and vegetables; Nonalcoholic beverages and beverage materials; and Other food at home. Energy, Commodities (less food and energy) and Services (less energy services) are also included in the data sets (Databases, Tables & Calculators by Subject). This level of data allows the Fed to analyze the economy in great detail to check progress toward its target. The overall average of 2% inflation may be met even while individual components of the economy over- or under-perform.
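The underlying arithmetic is a simple twelve-month percent change in a price index. The index values below are hypothetical, not actual BLS figures:

```python
# Year-over-year inflation from two price-index readings
# (hypothetical values, not actual BLS data).
index_a_year_ago = 237.0
index_now = 241.7

inflation = (index_now - index_a_year_ago) / index_a_year_ago
print(f"{inflation:.1%}")  # 2.0%: on target
```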
The data available has exploded in the 21st century. Online retail operations such as Amazon have vast and growing databases on virtually every customer. When combined with the databases and marketing capacity of Google or Yahoo or the Microsoft network, retailers have statistics that allow them to purchase products from their suppliers and operate distribution systems with a precision and efficiency never before possible. Purchase guitar strings from Amazon, for example, and the next time you use one of those search engines you will see advertisements for metronomes, guitars, keyboards, mixing boards, effects pedals and guitar books, and those targeted advertisements will keep popping up until you order something else. Order a pair of shoes online and you will see advertisements for jackets and slacks and belts and socks. With this database available, the guitar string seller can predict how many sets of strings of which gauge will be needed and place orders affecting the guitar string manufacturer, the suppliers of raw materials to that manufacturer, and so on down the chain. Predictive capacity at this level imbues the economy with an efficiency impossible to obtain in the past. For all this sophistication, though, almost every Christmas season some product or other turns out to have been completely missed, and prices are bid up on Cabbage Patch dolls or Furbies or whatever the fad of the year is. Conversely, December 26th is one of the busiest shopping days of the year because stores, in spite of their databases and probability-estimating algorithms, have large quantities of products that have gone unsold.
These same databases are at the core of many economic development efforts. In the state of Missouri, for example, the Missouri Economic Research and Information Center has prepared a series of statistical analyses of industries to assist local economic developers in their programs; these analyses allow local developers to target growth “clusters” (Looking To The Future: Missouri Targeted Industry Clusters). In Texas, the Texas Workforce Commission provides similar analyses to developers (Texas Industry Cluster Initiative). Across the nation, developers are taking advantage of the capacity for analysis and projection that massive databases, supported by computer analysis, provide. By knowing which industries are stable and growing, which pay the best wages and which offer the best opportunities for citizens, a community can select the industries to which it markets and address them with a “scope-sighted rifle” rather than the “shotgun” approach of the past.
Statistics, along with their handmaiden probability, are important in the study of human beings in groups. Whether it is the sociologist reviewing the data on marriage and divorce to determine trends, the criminologist studying arrest records to project where police need to be deployed, the economist poring over production data or the government administrator reviewing race patterns in housing to identify discrimination, the key tool is statistics. Statistics take the place of the controlled experiments of chemistry, physics or engineering.
Works Cited
Aldous, D. 40,000 coin tosses yield ambiguous evidence for dynamical bias. University of California – Berkeley. (n.d.). https://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html Accessed 12 January 2017.
Aldrich, J. Figures from the History of Probability and Statistics. University of Southampton, Southampton, UK. (2012). http://www.economics.soton.ac.uk/staff/aldrich/Figures.htm Accessed 12 January 2017.
American Fact Finder. United States Census Bureau. (2017). https://factfinder.census.gov/faces/nav/jsf/pages/community_facts.xhtml Accessed 12 January 2017.
Black, J. Gambling in Ancient Civilizations. Ancient Origins. (14 October 2013). http://www.ancient-origins.net/ancient-places-europe/gambling-ancient-civilizations-00931 Accessed 12 January 2017.
Biau, D., Kernéis, S. and Porcher, R. Statistics in Brief: The Importance of Sample Size in the Planning and Interpretation of Medical Research. National Center for Biotechnology Information. (20 June 2008). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2493004/ Accessed 12 January 2017.
Bray, C. Remembering the dice number-rolling odds in backgammon. Backgammon for Dummies. (2011). http://www.dummies.com/games/remembering-the-dice-number-rolling-odds-in-backgammon/ Accessed 12 January 2017.
Chiang, C. Jerzy Neyman: 1894 - 1981. Statisticians in History. (n.d.). https://ww2.amstat.org/about/statisticiansinhistory/index.cfm?fuseaction=biosinfo&BioID=11 Accessed 25 January 2017.
Cochran, W. Sampling Techniques (third edition). (1977). John Wiley & Sons, New York. http://hbanaszak.mjr.uw.edu.pl/StatRozw/Books/Cochran_1977_Sampling%20Techniques.pdf Accessed 25 January 2017.
County Business Patterns. U.S. Census Bureau. (n.d.). https://censtats.census.gov/cbpnaic/cbpnaic.shtml Accessed 12 January 2017.
Daily Presidential Tracking Polls. Rasmussen Reports. (n.d.). http://www.rasmussenreports.com/public_content/politics/obama_administration/prez_track_jan12 Accessed 12 January 2017.
Databases, Tables & Calculators by Subject. United States Department of Labor, Bureau of Labor Statistics. (2016). https://data.bls.gov/timeseries/CUUR0000SA0L1E?output_view=pct_12mths Accessed 12 January 2017.
Filho, D., Paranhos, R., da Rocha, E., Batista, M., da Silva, J., Santos, M. and Marino, J. When is statistical significance not significant? Brazilian Political Science Review, 7 (1). (2013). http://www.scielo.br/pdf/bpsr/v7n1/02.pdf Accessed 12 January 2017.
Friedman, M. Nobody can make a pencil except by spontaneous order. YouTube. (27 September 2012). https://www.youtube.com/watch?v=DNp_qGZ7c68 Accessed 12 January 2017.
Gallup Daily: Obama Job Approval. Gallup. (n.d.). http://www.gallup.com/poll/113980/gallup-daily-obama-job-approval.aspx Accessed 12 January 2017.
GIS (geographic information system). National Geographic Society. (n.d.). http://nationalgeographic.org/encyclopedia/geographic-information-system-gis/ Accessed 12 January 2017.
Hald, A. A History of Probability & Statistics. Wiley Series in Probability and Statistics. (1990). Quoted by Aldrich.
Harness, J. Nobody knows how to make a pencil. Joshharness.com. (20 October 2012). http://www.joshharness.com/2012/10/nobody-knows-how-to-make-pencil.html Accessed 12 January 2017.
Hyper Physics. Inverse Square Law, Light. (n.d.). http://hyperphysics.phy-astr.gsu.edu/hbase/vision/isql.html Accessed 12 January 2017.
Leung, M. The Beginning. University of Texas, El Paso. (n.d.). http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html Accessed 12 January 2017.
Live Reports. Real Clear Politics. (n.d.). http://www.realclearpolitics.com/elections/live_results/2016_general/president/ Accessed 12 January 2017.
Looking To The Future: Missouri Targeted Industry Clusters. Missouri Economic Research and Information Center. (2017). https://www.missourieconomy.org/industry/cluster/targetclusters.stm Accessed 13 January 2017.
Minnick, R. Private conversation. Approximately July 2014.
Neyman-Pearson Lemma. Penn State University: Stat 414/415. (n.d.). https://onlinecourses.science.psu.edu/stat414/node/307 Accessed 25 January 2017.
Physics Classroom. The acceleration of gravity. (2016). http://www.physicsclassroom.com/class/1DKin/Lesson-5/Acceleration-of-Gravity Accessed 12 January 2017.
Roche, C. Why do economists assume away reality? Pragmatic Capitalism. (n.d.). http://www.pragcap.com/why-do-economists-assume-away-reality/ Accessed 12 January 2017.
Rouse, M. Probability. Whatis.com. (n.d.). http://whatis.techtarget.com/definition/probability Accessed 12 January 2017.
Sowell, T. A dangerous obsession. Townhall. (26 December 2006). http://townhall.com/columnists/thomassowell/2006/12/26/a_dangerous_obsession Accessed 12 January 2017.
Sowell, T. Basic Economics: A Common Sense Guide to the Economy (6th Edition). (2011). Basic Books, New York, NY. Kindle edition.
Spanos, A. R.A. Fisher: how an outsider revolutionized statistics. Error Statistics Philosophy. (17 February 2014). https://errorstatistics.com/2014/02/17/r-a-fisher-how-an-outsider-revolutionized-statistics-2/ Accessed 25 January 2017.
Tabarrok, A. Calculating the Elasticity of Demand. MRUniversity. (n.d.). http://www.mruniversity.com/courses/principles-economics-microeconomics/calculate-elasticity-demand-formula Accessed 12 January 2017.
Texas Industry Cluster Initiative. Texas Workforce Commission. (2017). Accessed 13 January 2017.
What researchers mean by sample size and power. Institute for Work and Health. At Work, Issue 53, Summer 2008. Accessed 12 January 2017.
Whitcher, U. Descartes and the Fly. Mathforum.org. (n.d.). http://mathforum.org/cgraph/history/fly.html Accessed 12 January 2017.
Why does the Federal Reserve aim for 2 percent inflation over time? Board of Governors of the Federal Reserve System. (n.d.). https://www.federalreserve.gov/faqs/economy_14400.htm Accessed 12 January 2017.