Part 1. Primary Data Analysis.
In this paper we describe and discuss the application of statistics and probability theory to a real-world problem in psychology. It is believed that printed text is harder to read on white paper than on colored paper. The researchers decided to examine this claim. To do so, they conducted a study in which 21 undergraduate students were randomly chosen. Since the students were selected regardless of their age, skills, majors, status and abilities, the sampling method is simple random sampling. Each student was timed while reading a passage printed on white paper and the same passage printed on light blue-colored paper. One numeric (quantitative) variable and one categorical (nominal) variable were defined for the research:
Time (in seconds) – time taken to read a passage printed on light blue-colored or white paper (measured on a ratio scale)
Type of paper – a variable that indicates whether the paper was light blue-colored or white (nominal variable)
We treat the “time” variable as a ratio variable, because time in seconds has a true zero. Although no student can read a passage in 0 seconds, a reading time of 80 seconds is meaningfully twice as long as a reading time of 40 seconds.
For the purposes of the inferential statistics discussed in Part 3, we have to determine the independent and dependent variables. “Time” is the dependent variable and “Type of paper” is the independent (grouping) variable.
This is an experimental study, because the researchers manipulated the conditions under which measurements were collected from the same sample. The manipulation consists of changing the color of the paper on which the passage was printed (Roberts). The purpose of the research study is to examine the correlation between the variables, perform regression analysis and determine whether the change of paper color results in a significant improvement in reading time. In this regard, the research is quantitative.
Note that some lurking and confounding variables may have been missed in this research design. For example, the order of reading may affect the students’ times. The researchers performed the experiment with colored paper on one day and the experiment with white paper afterwards. However, the results could have been different if the first experiment had been conducted with white paper and the second with colored paper. Hence, the order of the experiments might be a confounding variable. An example of a lurking variable could be the day of the week or the weather on the two days of the experiment – an external factor that affects both the dependent and independent variables. Missing variables may include the students’ reading ability, their majors, age and any other factor that differs from student to student and may significantly affect their reading skills.
The graphs and charts will be provided in the next part of the paper, as they are part of descriptive statistics.
Part 2. Examination of Descriptive Statistics.
Before conducting statistical tests, we have to state and check assumptions of the chosen tests. There are three research questions that will be examined in this research work:
Is there a significant improvement in reading time after changing from white paper to light blue-colored paper?
Is there a significant relationship between the two given variables?
Is the regression model for the two given variables significant?
To avoid misunderstanding, note that for the 2nd and 3rd questions we split the “time” variable into two groups (white paper and light blue-colored paper) and treat them in the further analysis as two quantitative variables.
We start with an examination of the distributions. The “1.5*IQR” rule is used to determine possible outliers in the sample data. The measures of central tendency (mean, median and mode), the measures of variation (range, variance and standard deviation) and the quartile measures (Q1, Q3 and IQR) are given in the table below:
According to the outlier rule, all values greater than Q3+1.5*IQR or smaller than Q1-1.5*IQR are outliers (extremely high or extremely low values, respectively). The following table presents the lower and upper bounds for “usual” values:
The value of 40 in the white paper group is an extremely low value, because it is less than 42.5. The value of 120 in the white paper group is an extremely high value, because it is greater than 110.5. These two pairs of observations will be excluded from the sample data.
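The outlier rule can be illustrated with a short Python sketch. The quartiles Q1 = 68 and Q3 = 85 used here are not quoted in the text; they are back-solved from the reported bounds of 42.5 and 110.5 and should be read as an assumption. The function names and the short list of times are hypothetical.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return the (lower, upper) bounds for 'usual' values under the k*IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, lower, upper):
    """Return the values that fall outside the closed interval [lower, upper]."""
    return [v for v in values if v < lower or v > upper]


# Assumed quartiles, back-solved from the bounds reported in the text.
lower, upper = iqr_bounds(68.0, 85.0)
print(lower, upper)  # 42.5 110.5

# Hypothetical white paper times that include the two flagged values.
print(flag_outliers([40, 72, 85, 120], lower, upper))  # [40, 120]
```

Because the data are paired, dropping a flagged white paper time also means dropping the same student's time on light blue-colored paper.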
The remaining steps are performed on the corrected data. Two methods are used to check the normality of the distribution of both variables. The first is visual, by constructing frequency histograms. Two frequency histograms are given below:
At first glance, both distributions look approximately symmetric and the shape of the histograms is quite close to a bell-shaped curve. However, it is better to verify our observations statistically. We use the Kolmogorov-Smirnov test to examine the distributions.
The Kolmogorov-Smirnov test indicates that the distributions of the variables are not significantly different from a normal distribution (p=0.200). This allows us to use parametric tests. To examine the first research question, a paired t-test will be used. This test is used when researchers observe the same subjects under different conditions, for example at different moments in time or before and after a treatment. Since the same students were tested on two different days, the groups are dependent and we cannot use a t-test for independent samples.
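To make the normality check more concrete, the sketch below computes the Kolmogorov-Smirnov distance D between the empirical distribution of a sample and a normal distribution fitted to it. The reading times are hypothetical, since the raw data are not reproduced in this paper, and the function name is illustrative. The sketch stops at D. Because the mean and standard deviation are estimated from the same sample, turning D into a p-value requires corrected tables or a statistics package.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical reading times in seconds (not the study data).
times = [62, 68, 70, 74, 75, 77, 80, 83, 85, 91]
d = ks_statistic_normal(times)
print(round(d, 3))
```

A small D means the sample follows the fitted normal curve closely, which is what a non-significant test result reflects.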
The researchers want to test the relationship between these two variables. This relationship is visualized on the following scatterplot:
The data appear to indicate a strong positive linear association between the variables. The relationship will be examined by correlation analysis. We will compute Pearson’s correlation coefficient and interpret it accordingly. The association will be expressed mathematically by means of regression analysis. A simple linear regression model will be developed to estimate the linear formula that connects the reading times for text printed on the two different backgrounds.
Part 3. Examination of Inferential Statistics
The first procedure is paired t-test for dependent samples.
Assumptions (Statistics.laerd.com):
Dependent variable is either ratio or interval (True)
Independent variable divides the dependent variable into two related groups (True)
There are no extreme values (outliers) in either group (True)
The distribution of variables in both groups is approximately normal (True)
Hypotheses:
Null hypothesis: there is no significant difference in time score of students between reading on a white paper and reading on a light blue-colored paper.
Alternative hypothesis: the time of reading text printed on a light blue-colored paper is significantly less than the time of reading text printed on a white paper.
Significance level: alpha = 0.05. One-tailed test.
Paired t-test is used. The results are presented in the table below:
The paired t-test indicates a significant difference in reading time between white paper and light blue-colored paper (t=5.164, p<0.001). However, the difference is in the direction opposite to the alternative hypothesis: the time spent reading text printed on white paper is significantly less than the time spent reading text printed on light blue-colored paper. The one-tailed null hypothesis is therefore not rejected. There is insufficient evidence that reading text on light blue-colored paper is more effective (at the 5% level of significance).
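The logic of the one-tailed decision can be sketched in Python. The sketch assumes that differences are taken as time on light blue-colored paper minus time on white paper, so the alternative hypothesis corresponds to a negative t. It also assumes 19 pairs remain after the two exclusions (18 degrees of freedom), for which the one-tailed 5% critical value from a standard t table is 1.734. The sample times and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for the paired differences first[i] - second[i]."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def reject_lower_tail(t, t_crit):
    """One-tailed decision for an alternative of the form 'mean difference < 0'."""
    return t < -t_crit


# Hypothetical times in seconds (not the study data).
blue = [75, 71, 84, 80, 70]
white = [70, 65, 80, 74, 68]
t, df = paired_t(blue, white)
print(round(t, 3), df)

# Reported statistic with the assumed critical value for 18 degrees of freedom.
print(reject_lower_tail(5.164, 1.734))  # False
```

A large positive t under this sign convention points away from the alternative hypothesis, which is why the null hypothesis is retained even though the two-sided difference is significant.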
The second procedure is correlation analysis.
Assumptions (Statistics.laerd.com):
Two variables are either ratio or interval (True)
A linear relationship is expected between the two variables (True)
There are no extreme values (outliers) in either variable (True)
Both variables are approximately normal (True)
Hypotheses:
Null hypothesis: there is no significant linear association between time spent on reading on white paper and on light blue-colored paper.
Alternative hypothesis: there is a significant linear association between time spent on reading on white paper and on light blue-colored paper.
Significance level: alpha = 0.05. Two-tailed test.
The results are given in the table below:
Pearson’s correlation analysis indicates a very strong positive linear relationship between the variables (r=0.964, p<0.001). The null hypothesis is rejected. At the 5% level of significance, there is a significant linear relationship between the variables.
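For illustration, Pearson's coefficient and the t statistic used to test it can be computed as in the sketch below. The function names and the small data set are hypothetical. The sample size of 19 is an assumption based on the 21 students minus the two excluded pairs.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing that the correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Hypothetical data, small enough to check by hand.
print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5

# Reported r with the assumed sample size of 19 pairs.
print(round(r_to_t(0.964, 19), 2))
```

With 17 degrees of freedom, a t statistic of this size lies far beyond any conventional critical value, which agrees with the reported p<0.001.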
The final step is to perform regression analysis.
Assumptions (Statistics.laerd.com):
Two variables are either ratio or interval (True)
A linear relationship is expected between the two variables (True)
There are no extreme values (outliers) in either variable (True)
Independence of observations (Not checked)
Homoscedasticity (Not checked)
Residuals are approximately normally distributed (Not checked).
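We proceed with this analysis, assuming that the last three assumptions (independence of observations, homoscedasticity and normality of residuals) hold. We perform regression analysis with the time on white paper as the response variable and the time on colored paper as the predictor. The form of the regression equation is: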
y=β0+β1x
where y is the reading time on white paper and x is the reading time on colored paper.
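The coefficients of a model of this form are typically estimated by ordinary least squares. The paper does not state the estimation method, so treating it as least squares is an assumption. A minimal sketch with hypothetical function names and toy data:

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Share of the variance in y that is explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data lying exactly on y = 1 + 2x (not the study data).
xs, ys = [1, 2, 3], [3, 5, 7]
b0, b1 = fit_line(xs, ys)
print(b0, b1)                      # 1.0 2.0
print(r_squared(xs, ys, b0, b1))   # 1.0
```

Here x plays the role of the time on colored paper and y the time on white paper.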
Hypotheses:
Null hypothesis:
β1=0
Alternative hypothesis: β1 is not equal to 0.
Significance level: alpha = 0.05.
The following regression equation is obtained:
y=15.434+0.72x
ANOVA shows that the model is significant (F=221.33, p<0.001). Approximately 92.87% of variance in time score of reading on white paper is explained by this model (R^2=0.9287). The null hypothesis is rejected.
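As a consistency check, the F statistic of a simple linear regression can be recovered from R^2 and the sample size. The sketch below assumes n = 19, that is, 21 students minus the two excluded pairs, and uses the reported equation for an illustrative prediction. The function names and the input time of 80 seconds are hypothetical.

```python
def f_from_r2(r2, n, predictors=1):
    """Overall F statistic of a linear regression computed from R^2."""
    df_model = predictors
    df_resid = n - predictors - 1
    return (r2 / df_model) / ((1 - r2) / df_resid)


def predict_white(blue_time):
    """Predicted time on white paper from the reported equation y = 15.434 + 0.72x."""
    return 15.434 + 0.72 * blue_time


# Reported R^2 with the assumed sample size of 19 pairs.
print(round(f_from_r2(0.9287, 19), 2))

# Hypothetical student who needs 80 seconds on light blue-colored paper.
print(round(predict_white(80.0), 3))  # 73.034
```

Under the assumed sample size the recovered value is about 221.4, which matches the reported F=221.33 up to rounding of R^2.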
Part 4. Conclusions and Recommendations
In this research paper we tested the claim that printed text is harder to read on white paper than on colored paper. The claim is not supported by our research. Based on the collected sample, we found that the time spent reading text printed on white paper is significantly less than the time spent reading text printed on light blue-colored paper. Thus, reading on white paper appeared to be more effective.
To draw a more certain conclusion, it would be better to increase the number of students involved in the test. Larger samples usually give more accurate results. Further research may study the impact of other factors (for example, the confounding, missing or lurking variables presented above) on the examined variables. The impact of such factors may lead to a different conclusion.
References
Roberts, Donna. "Statistical Studies". Regentsprep.org. N.p., 2016. Web. 11 Feb. 2016.
Statistics.laerd.com. "Dependent T-Test In SPSS Statistics - The Procedure For Running The Test, Generating The Output And Understanding The Output Using A Relevant Example | Laerd Statistics". N.p., 2016. Web. 11 Feb. 2016.
Statistics.laerd.com. "Linear Regression Analysis In SPSS Statistics - Procedure, Assumptions And Reporting The Output". N.p., 2016. Web. 11 Feb. 2016.
Statistics.laerd.com. "Pearson's Product-Moment Correlation In SPSS Statistics - Procedure, Assumptions, And Output Using A Relevant Example". N.p., 2016. Web. 11 Feb. 2016.