This paper applies probability and statistics tools to a real-world problem. My goal is to investigate whether the number of hours students studied and the grade they earned on a test are related. The sample contains data for 30 students; their hours of study and exam scores are recorded in the spreadsheet. To determine the strength of the association, I will use Pearson’s correlation coefficient. Correlation reflects the statistical association between random variables: if two variables are correlated, a change in one tends to be accompanied by a corresponding change in the other. The mathematical measure of linear correlation between two variables is the correlation coefficient (r), which ranges from -1 to 1. The correlation coefficient was calculated in Excel and the result is given below:
r=0.886
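For readers working outside Excel, the calculation behind r can be sketched in a few lines of Python. The function name pearson_r and the five data points are illustrative assumptions of mine, not the 30-student sample, so the printed value will not match the r reported above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only; NOT the 30-student sample.
hours = [2, 4, 6, 8, 10]
scores = [58, 63, 64, 70, 71]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Applied to the two spreadsheet columns, the same function should reproduce the value Excel reports, up to rounding.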
The value of Pearson’s r indicates a very strong linear association between the variables. The positive sign of r indicates that the relationship is direct ("Pearson Product-Moment Correlation - When You Should Run This Test, The Range Of Values The Coefficient Can Take And How To Measure Strength Of Association."). That is, an increase (decrease) in one variable is associated with a corresponding increase (decrease) in the other. I feel that this is consistent with a causal relationship, although correlation alone cannot establish causation: if a student spends more hours studying, he or she likely understands the material better and, as a consequence, performs better on the exam. The next step is to find the linear regression equation for the relationship between hours of study and exam score. I generate the scatter plot in Excel and add a linear trend line with the equation and R-squared value displayed on the chart:
The R-squared value is reported as 0.785. We can recover the magnitude of the linear correlation coefficient by taking the square root of this value:
r = √0.785 ≈ 0.886
The positive square root is taken because the direction of the association is positive. This implies the following association: as students spend more hours studying, their exam scores tend to increase. A student who spends fewer hours studying will most likely receive a lower grade on the exam.
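The step from R-squared back to r can be checked with a short sketch. It assumes a simple linear regression with a single predictor, where the magnitude of r is the square root of R-squared and its sign follows the slope of the trend line. The variable names are my own, and the slope value is the one shown in the trend line equation on the chart.

```python
import math

r_squared = 0.785   # coefficient of determination reported on the chart
slope = 1.5608      # slope of the fitted trend line

# In simple linear regression, |r| = sqrt(R^2) and r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # 0.886
```

With a negative slope the same code would return -0.886, which is why the sign has to be read from the trend line and not from R-squared itself.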
The regression equation has the following form:
y=1.5608x+55.767
Here, y is the exam score and x is the number of hours of study. This equation can be used to forecast a student’s exam score from a given number of hours spent studying. The value of the coefficient of determination (R-squared) indicates that approximately 78.5% of the variance in exam scores is explained by hours of study (Roberts). This indicates a good fit, as the value is quite close to 100%. However, roughly 21.5% of the variance remains unexplained, and there are likely factors that were not included in the model. For example, the subject being examined can play an important role in predicting exam scores. Other factors that may influence exam scores are the students’ faculty, the professor (since marks reflect each professor’s subjective judgment), and the students’ gender, age, etc.
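As a small illustration of how the regression equation produces forecasts, the sketch below plugs a few values of x into y = 1.5608x + 55.767. The function name predict_score and the chosen hours (0, 5 and 10) are my own assumptions for the example; the range of hours actually observed in the sample is not restated here, and forecasts outside that range should be treated with caution.

```python
SLOPE = 1.5608      # change in exam score per additional hour of study
INTERCEPT = 55.767  # predicted score at zero hours of study

def predict_score(hours):
    """Predicted exam score from the fitted line y = 1.5608x + 55.767."""
    return SLOPE * hours + INTERCEPT

# Hypothetical study times chosen only to illustrate the calculation.
for h in (0, 5, 10):
    print(h, round(predict_score(h), 3))
```

For example, 5 hours of study gives 1.5608 × 5 + 55.767 = 63.571, so each additional hour raises the predicted score by about 1.56 points.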
The statistical concepts of correlation and regression analysis are important to learn because they have wide practical application.
Works Cited
"Pearson Product-Moment Correlation - When You Should Run This Test, The Range Of Values The Coefficient Can Take And How To Measure Strength Of Association." Statistics.laerd.com. N.p., 2016. Web. 25 June 2016.
Roberts, Donna. "Statistics 2 - Correlation Coefficient And Coefficient Of Determination." Mathbits.com. N.p., 2016. Web. 25 June 2016.