(a) Produce the Pearson correlation matrix for all variables except the name of the printer. What are the two strongest relationships shown in the Pearson correlation matrix?
The correlation matrix is given in a table below ("Annotated SPSS Output: Correlation", 2016):
The strongest correlation is between Text Cost and Color Photo Cost (r = 0.606, p = 0.017), a positive linear relationship. The next strongest is between Price and Color Photo Time (r = -0.518, p = 0.048), a negative linear relationship. Both correlations are moderately strong and significant at the 5% level. These relationships are the ones we would expect; in particular, a printer that takes less time to produce a color photo usually costs more.
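The link between a correlation coefficient and its significance can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the function names and the small demo data set are hypothetical, and the sample size n = 15 is an assumption (it is not stated in the text, although it is consistent with the reported p-values).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical demo data, NOT the printer data set.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 1, 4, 3, 5]
r_demo = pearson_r(x_demo, y_demo)

# Reported r for Text Cost vs. Color Photo Cost; n = 15 is an assumption.
t_reported = t_from_r(0.606, 15)
```

The t statistic is then compared with a t distribution on n - 2 degrees of freedom to obtain the two-tailed p-value that SPSS prints beneath each coefficient.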
(b) If we are trying to model the relationship between Price and the other variables, from these correlations, which variables should be considered for inclusion in the model?
We should consider only the variables whose correlation with Price is significant (taking p < 0.10 as the criterion). Only two variables meet this criterion: Text Cost (r = -0.501, p = 0.057) and Color Photo Time (r = -0.518, p = 0.048). Only these two variables should be considered for inclusion in the model.
(c) SPSS offers a variety of methods (e.g. ‘Forwards’ ‘Backwards’ and ‘Stepwise’) for fitting regression models involving more than one predictor variable. Explain briefly the purpose of these methods.
These methods select automatically which predictors to keep in the regression model. Forward selection starts with an empty equation. Predictors are then added one at a time, starting with the predictor that has the highest correlation with the dependent variable. Backward deletion is the opposite procedure: it starts with all independent variables included, and a predictor is deleted if it is not significant in the regression equation. Stepwise selection is a combination of the two. At each step it assesses the significance of each predictor's contribution to the model; then either a new variable is added or one of the insignificant variables is deleted ("Selection Process for Multiple Regression - Statistics Solutions", 2016).
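The backward procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not SPSS's implementation: it assumes NumPy is available, the function names are hypothetical, the removal rule uses a partial F value (the squared t statistic) with an assumed cutoff of 2.71 instead of a p-value, and the demo data are constructed by hand rather than taken from the printer data set.

```python
import numpy as np

def partial_f_values(X, y):
    """Squared t statistics for each column of X in an OLS fit with intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k - 1)          # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    t = beta[1:] / np.sqrt(np.diag(cov)[1:])      # skip the intercept
    return t ** 2

def backward_eliminate(X, y, names, f_remove=2.71):
    """Drop the weakest predictor until all remaining ones pass the cutoff.

    f_remove is an assumed F-to-remove threshold, not a value from the text.
    """
    cols = list(range(X.shape[1]))
    while cols:
        f = partial_f_values(X[:, cols], y)
        worst = int(np.argmin(f))
        if f[worst] >= f_remove:
            break                                 # every predictor is retained
        cols.pop(worst)                           # remove the weakest predictor
    return [names[c] for c in cols]

# Hypothetical demo data: y depends on x1 only; x2 is unrelated by construction.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
noise = 0.1 * np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
y = 2 * x1 + noise

selected = backward_eliminate(np.column_stack([x1, x2]), y, ["x1", "x2"])
```

Removing one predictor at a time and refitting matters because the significance of the remaining predictors can change once a correlated variable leaves the model.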
Perform a Regression analysis to predict the “price” variable from all the other numeric variables, using the “Enter” method:
(d) What percentage of the variation in Price is accounted for by the four variables?
The coefficient of determination (R-square = 0.71) indicates that approximately 71% of the variation in Price is accounted for by the four variables.
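For readers who want to see where this figure comes from, the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical function name and hypothetical demo values, not the printer data.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation
    return 1 - ss_res / ss_tot

# Hypothetical observed and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
r2_demo = r_squared(observed, fitted)
```

Multiplying the result by 100 gives the percentage of variation explained, which is how R-square = 0.71 is read as 71%.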
(e) What are the null hypotheses tested by each of the sig values in your output? What conclusions should be drawn from these results?
We test the following pair of hypotheses for each regression coefficient $b_i$:
$H_0: b_i = 0$ versus $H_a: b_i \neq 0$
The conclusion is that Text Cost (t = -1.282, p = 0.229), Color Photo Time (t = -1.379, p = 0.200), Text Speed (t = -1.644, p = 0.131) and Color Photo Cost (t = 0.327, p = 0.750) are all insignificant predictors of Price when the other three variables are in the model. None of the null hypotheses is rejected at the 10% level of significance.
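The decision rule applied above can be written out explicitly. The p-values are the ones reported in the text; the variable names in the code and the way the rule is expressed are an illustrative sketch only.

```python
ALPHA = 0.10  # significance level used throughout the answers

# p-values reported for the "Enter" model coefficients
p_values = {
    "Text Cost": 0.229,
    "Color Photo Time": 0.200,
    "Text Speed": 0.131,
    "Color Photo Cost": 0.750,
}

# Reject H0: b_i = 0 only when the p-value falls below ALPHA.
rejected = [name for name, p in p_values.items() if p < ALPHA]

# The predictor with the largest p-value is the weakest one in this model.
least_significant = max(p_values, key=p_values.get)
```

Here `rejected` is empty, so no coefficient is significant at the 10% level, and the weakest predictor is Color Photo Cost.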
Perform another Regression analysis to predict the “price” variable from all the other numeric variables, using the “Backward” method ("SPSS Textbook Examples: Regression Analysis by Example, Third Edition, Chapter 11", 2016):
(f) Explain what has happened in the sequence of model fitting.
At the beginning, all four variables were included in the model. Then the Color Photo Cost variable was excluded, because its p-value was higher than 0.1. After this, Text Cost and Text Speed were also removed for the same reason. The final model consists of only one predictor: Color Photo Time.
(g) Using the final model in your output, on average how much does it cost to reduce the time taken to print a text page by one second?
The last regression model that contained the Text Speed variable indicated that it costs 22.702 on average to reduce the time taken to print a text page by one second. However, this is not the final model: Text Speed was removed from it, so the final model gives no estimate of this cost.
(h) Using the final model, on average how much does it cost to reduce the time taken for a color photograph by one minute?
According to the final model, it costs 9.7 pounds to reduce the time taken for a color photograph by one minute (the coefficient of the variable “Color Photo Time”).
(i) Looking at the excluded variables in the final model, are any of them “almost” included in the model? Explain your reasoning.
Yes, the Text Speed variable was “almost” included in the final model, because its p-value was 0.128 – quite close to the criterion value of 0.100.
References
Annotated SPSS Output: Correlation. (2016). Ats.ucla.edu. Retrieved 24 April 2016, from http://www.ats.ucla.edu/stat/spss/output/corr.htm
Selection Process for Multiple Regression - Statistics Solutions. (2016). Statistics Solutions. Retrieved 24 April 2016, from http://www.statisticssolutions.com/selection-process-for-multiple-regression/
SPSS Textbook Examples: Regression Analysis by Example, Third Edition, Chapter 11. (2016). Ats.ucla.edu. Retrieved 24 April 2016, from http://www.ats.ucla.edu/stat/spss/examples/chp/chp11.htm