As the name suggests, Analysis of Variance (ANOVA) is a statistical technique that analyses the variation in data to test whether the means of different groups are equal. It does this by comparing the variation between the groups with the variation within each group. It is one of the most widely used statistical techniques in the statistical and research community across the globe.
Before using ANOVA, a researcher or statistician first needs to understand and define what he or she wants to find out. For example, a researcher may want to find out whether there is any relationship between rainfall and GDP growth rate. Once that is identified, the researcher finalizes what is known as the “Null Hypothesis”. As the name suggests, the “Null Hypothesis” assumes that there is no difference: the means of all the sample groups are equal for the test variable concerned. A one-way ANOVA compares one measured variable at a time, so for the above example the “Null Hypothesis” would be “the average GDP growth rate (or, in a separate test, the average rainfall) is equal across all the groups for which data is collected”.
Once the subject of the research is finalized, the data is collected. ANOVA is typically used when there are three or more groups to compare. With only two groups a t-test is sufficient (a one-way ANOVA gives an equivalent result), whereas running many separate two-group tests on three or more groups inflates the chance of a statistical error known as Type I error (false positives leading to false scientific claims). ANOVA also assumes that the data within each sample group is approximately normally distributed, that the groups have similar variances, and that the sample groups are independent of each other. Keeping those assumptions in mind for our example, the researcher can now start collecting data. In this case the researcher can get the last 30 years of GDP growth rate and rainfall data for 10 different countries. This gives 10 different groups of data, one per country, to test. The researcher then tries to find out whether the pattern is similar for all the countries, that is, whether the group averages stay the same when the group (the country, in our case) changes.
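To give a rough sense of why many separate two-group tests are risky when there are several groups, the Python sketch below estimates the chance of at least one false positive if every pair of the 10 country groups were compared on its own. The function name `familywise_error_rate` and the 5% level per test are assumptions made for this illustration. The formula also assumes the pairwise tests are independent, which is only an approximation when the tests share groups.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairwise tests.

    Assumes (for illustration only) that the pairwise tests are independent.
    """
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    rate = 1 - (1 - alpha) ** num_tests
    return num_tests, rate


# 10 groups, as in the country example; alpha = 0.05 is an assumed per-test level.
tests, rate = familywise_error_rate(10)
print(tests, round(rate, 2))
```

Under these assumptions, 10 groups give 45 pairwise tests and a chance of roughly 90% of at least one false positive, which is the kind of inflation a single ANOVA test is meant to avoid.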
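Additionally, the researcher needs to define a significance level to test the data against. In our case the researcher chooses a 95% confidence level, which corresponds to a significance level of 5% (0.05). That means the null hypothesis is rejected only if a result at least as extreme as the one observed would occur less than 5% of the time when the null hypothesis is true. Equivalently, the test statistic computed from the data must exceed the critical value of the F distribution for that significance level.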
Finally, once the data is collected, ANOVA techniques are used to analyze and compare the variation “between groups” and “within groups”. “Within groups” refers to the variation of the data around the average for a particular country. “Between groups”, on the other hand, refers to the variation of the group averages between different countries. The ratio of the two, each scaled by its degrees of freedom, is the F statistic. If the “between groups” variability is small relative to the “within groups” variability, the researcher fails to reject the null hypothesis: the data is consistent with equal group means, although this does not prove that the means are identical and says nothing by itself about a correlation between the two variables (rainfall and GDP growth rate). If the “between groups” variation is large relative to the “within groups” variation, the Null Hypothesis is rejected in favor of the alternative hypothesis that at least one group mean differs from the others, and the researcher can follow up with further tests to find out which groups differ.
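The comparison of “between groups” and “within groups” variation can be sketched in a few lines of Python. This is a minimal illustration of a one-way ANOVA F statistic, not a full analysis. The function name `one_way_anova_f` and the three small groups of growth figures are invented for the example and are not real country data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each value lies from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # F is the ratio of the two mean squares (sum of squares / degrees of freedom).
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical growth rates (%) for three made-up groups, for illustration only.
growth = [
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [7.0, 8.0, 9.0],
]
f_stat, df_b, df_w = one_way_anova_f(growth)
print(f_stat, df_b, df_w)
```

For this made-up data the F statistic is 21 with 2 and 6 degrees of freedom. Standard F tables give a critical value of about 5.14 for those degrees of freedom at the 5% level, so the null hypothesis of equal means would be rejected here.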