The first table presented a matrix of correlation coefficients for exam performance, exam anxiety and time spent revising. Exam performance was significantly negatively correlated with exam anxiety, r = −.441, p < .05. Exam performance was also significantly positively correlated with time spent revising, r = .397, p < .001. Lastly, exam anxiety was significantly negatively correlated with time spent revising, r = −.709, p < .001.
The Descriptives table presented the mean and standard deviation of all four variables; for instance, the mean of record sales was 193.20 (in thousands). This table provided a useful overview of the data.
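For illustration, a Pearson correlation and its p-value like those reported above for the exam data can be reproduced outside SPSS; the short Python sketch below uses simulated anxiety and performance scores (not the actual data file), so only the mechanics, not the numbers, carry over.

# Minimal sketch: Pearson r and its p-value on simulated (not the real) exam data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
exam_anxiety = rng.normal(70, 15, 100)                                # hypothetical anxiety scores
exam_performance = 100 - 0.5 * exam_anxiety + rng.normal(0, 10, 100)  # hypothetical performance

r, p = stats.pearsonr(exam_anxiety, exam_performance)
print(f"r = {r:.3f}, p = {p:.3f}")                                    # a negative r, as in the report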
Along with the descriptive statistics, a correlation matrix was also generated. This matrix presented Pearson's correlation coefficient for each pair of variables; for instance, advertising budget was strongly positively correlated with record sales, r = .578. The table also provided the one-tailed significance of each correlation and the number of cases (N) on which it was based.
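The matrix itself is straightforward to rebuild; the sketch below does so in Python with pandas, using assumed variable names (adverts, airplay, sales) and simulated values rather than the record-sales file.

# Sketch of a Pearson correlation matrix for every pair of variables.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
adverts = rng.normal(600, 480, 200)
airplay = rng.normal(27, 12, 200)
sales = 0.08 * adverts + 3.0 * airplay + rng.normal(0, 60, 200)

df = pd.DataFrame({"adverts": adverts, "airplay": airplay, "sales": sales})
print(df.corr(method="pearson"))   # r for each variable pair, as in the SPSS matrix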
The above table presented the values of R and R square for the model (the model used the Enter method of regression). Here R = .815, which is the multiple correlation between the set of predictors and record sales. R square is .665, implying that advertising expense, attractiveness of the band and number of plays on the radio together accounted for around 66.5% of the variation in sales.
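A regression of this kind can be sketched in Python with statsmodels; the variable names and the simulated data below are assumptions for illustration only, so the printed R and R square will not match the SPSS values.

# Sketch of a three-predictor linear regression and its R / R-square.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "adverts": rng.normal(600, 480, n),
    "airplay": rng.normal(27, 12, n),
    "attract": rng.integers(1, 11, n).astype(float),
})
df["sales"] = 0.08 * df["adverts"] + 3.0 * df["airplay"] + 11.0 * df["attract"] + rng.normal(0, 47, n)

model = smf.ols("sales ~ adverts + airplay + attract", data=df).fit()
print("R-square:", round(model.rsquared, 3))               # proportion of variance explained
print("multiple R:", round(model.rsquared ** 0.5, 3))      # square root of R-square
print("adjusted R-square:", round(model.rsquared_adj, 3))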
The next column, adjusted R square, indicated how well the model generalizes; here it is .660. The small difference between R square and adjusted R square means that if the model had been derived from the population rather than from this sample, it would account for roughly 0.5% less of the variance in the outcome.
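As a check, the adjusted value can be reproduced from the usual shrinkage formula, assuming n = 200 cases and k = 3 predictors (consistent with the 196 residual degrees of freedom in the ANOVA table):

R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1} = 1 - (1 - .665)\frac{199}{196} \approx .660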
The ANOVA table tests the significance of the model as a whole, i.e. whether it is good at forecasting the outcome. With 3 and 196 degrees of freedom, the F-ratio is 129.498 and is significant (p < .05).
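Again as a rough check under the same assumptions (n = 200, k = 3), the F-ratio can be recovered from R square; the small gap from the reported 129.498 comes only from rounding R square to three decimals:

F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{.665 / 3}{.335 / 196} \approx 129.7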
Lastly, the scatter plot revealed that the data points were randomly and evenly dispersed throughout the plot, which indicated that the assumptions of linearity and homoscedasticity had been met. Similarly, the histogram showed an approximately normal distribution of the residuals, so the normality assumption was also met.
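The same two diagnostic plots (standardised residuals against standardised predicted values, and a histogram of the residuals) can be produced in Python; the sketch below fits a model to simulated data purely to show the mechanics.

# Sketch of SPSS-style residual checks: ZRESID vs ZPRED scatter plus a histogram.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([5.0, 2.0, 1.0, 0.5]) + rng.normal(size=200)

fit = sm.OLS(y, X).fit()
zpred = (fit.fittedvalues - fit.fittedvalues.mean()) / fit.fittedvalues.std()
zresid = fit.resid / fit.resid.std()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(zpred, zresid)                  # should look like a random, even cloud
axes[0].set(xlabel="standardised predicted", ylabel="standardised residual")
axes[1].hist(zresid, bins=20)                   # should look roughly normal
plt.show()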
Interpretation: (Second Set of Data)
The above Descriptives table presented the mean and standard deviation of all four variables; for instance, the mean of record sales was 193.20 (in thousands).
Along with the descriptive statistics, a correlation matrix was also generated. This matrix presented Pearson's correlation coefficient for each pair of variables; for instance, advertising budget was strongly positively correlated with record sales, r = .578. The table also provided the one-tailed significance of each correlation and the number of cases (N) on which it was based.
The above table presented the values of R and R square for the model (the model used the Enter method of regression). Initially only one predictor (advertising budget) was entered. Here R = .578, the simple correlation between advertising and record sales, and R square is .335, which means that advertising budget alone accounts for 33.5% of the variation in record sales.
When the remaining predictors were added, R rose to .815, the multiple correlation between all three predictors and record sales. R square is now .665, implying that advertising expense, attractiveness of the band and number of plays on the radio together accounted for around 66.5% of the variation in sales.
The next column, adjusted R square, indicated how well the model generalizes; here it is .660. The small difference between R square and adjusted R square means that if the model had been derived from the population rather than from this sample, it would account for roughly 0.5% less of the variance in the outcome.
The ANOVA table tests the significance of each model, i.e. whether it is good at forecasting the outcome. Initially, with 1 and 198 degrees of freedom, the F-ratio is 99.587 and is significant (p < .05); hence the initial one-predictor model is significantly good at forecasting the outcome. The second part of the table was already explained in the previous part's interpretation. Altogether, it can be stated that the initial model significantly improved the ability to forecast the outcome variable, and that the new model (involving more predictors) was better still.
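How much better the larger model is can also be quantified with the usual R square change test; this is a sketch assuming n = 200 (from the degrees of freedom above) and two predictors added in the second block, and the resulting figure is a check rather than a value taken from the SPSS output:

F_{change} = \frac{(R^2_{2} - R^2_{1})/2}{(1 - R^2_{2})/(n - k_{2} - 1)} = \frac{(.665 - .335)/2}{.335/196} \approx 96.5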
Lastly, the scatter plot revealed that the data points were randomly and evenly dispersed throughout the plot, which indicated that the assumptions of linearity and homoscedasticity had been met. Similarly, the histogram showed an approximately normal distribution of the residuals, so the normality assumption was also met.
eel.sav – Interpretation:
Logistic Regression: Initial Model:
The above tables show that all cases (100%) were included in the analysis; no case was excluded. They also describe the coding of the outcome variable: 0 = not cured and 1 = cured. The third table presented the parameter coding for the categorical predictor, using indicator coding with 0 = no treatment and 1 = treatment/intervention.
A forward stepwise method was used for this analysis, so at the first step only the constant is included in the regression equation and all the predictor variables are left out of the model. The "Iteration History" table gives the −2 log-likelihood of this constant-only model, 154.084, which summarizes how well the model fits the data.
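The sketch below shows what this baseline step looks like outside SPSS, fitting an intercept-only logistic model in Python on a simulated 0/1 outcome (assuming 113 patients, from the 65 + 48 counts reported below; the data themselves are not from eel.sav).

# Sketch of the constant-only (baseline) logistic regression model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
cured = (rng.random(113) < 0.575).astype(int)           # simulated 0/1 outcome, roughly the observed split

base = sm.Logit(cured, np.ones((113, 1))).fit(disp=0)   # intercept only, no predictors
print(-2 * base.llf)                                    # the "-2 log-likelihood" SPSS reports
print(base.params)                                      # the constant: log odds of being cured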
The classification table indicated that 65 patients were cured and the remaining 48 were not.
Because only the constant was included, the model predicted that every patient was cured: accuracy was 0% for the patients who were not cured and 100% for those who were. Taken as a whole, this model classified 57.5% of patients correctly (as per the classification table). The next table, "Variables in the Equation", showed the value of the constant, 0.303, whereas the following table, "Variables not in the Equation", showed that the residual chi-square statistic is 9.773, which is significant (p < .05).
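Both of these figures follow directly from the frequencies above (a quick arithmetic check, assuming the 65 cured and 48 not cured reported earlier): the overall accuracy is simply the proportion of cured patients, and the constant is the log odds of being cured.

\frac{65}{65 + 48} = \frac{65}{113} \approx .575, \qquad b_0 = \ln\!\left(\frac{65}{48}\right) \approx 0.303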
These statistics indicate that adding one or more of these variables to the model would significantly improve its predictive power. Conversely, had the residual chi-square not been significant, forcing these variables into the model would not have contributed significantly to its predictive power.
New Model:
There is a second statistic called the step statistic that shows the improvement in the predictive power of the model since the last stage.
Here the new model contains the intervention, and its overall fit is assessed from the log-likelihood (refer to the Iteration History, second column). The log-likelihood is multiplied by −2 (the "−2LL"), which gives it an approximately chi-square distribution and lets us compare the observed value against what we would expect to get by chance. This value should be lower than the one obtained when only the constant was included; a lower value signifies that the model predicts the outcome variable more precisely. In the initial, constant-only model the value was 154.084, whereas in the new model it is 144.156. This indicates that the current model is better at forecasting whether a patient was cured than the model without the intervention. The next table presents the chi-square statistic, which assesses how much better this model predicts the outcome than the model containing only the constant; it is simply the difference between the two −2LL values (154.084 − 144.156 = 9.928). As the p-value is less than alpha, we can state that the model including the intervention predicts whether a patient was cured significantly better than the constant-only model.
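The same comparison can be sketched in Python: fit the constant-only model and the model containing the intervention, take the drop in −2LL, and refer it to a chi-square distribution with one degree of freedom. The data and the treatment effect below are simulated assumptions, not eel.sav, so only the procedure carries over.

# Sketch of the likelihood-ratio (chi-square) improvement when Intervention is added.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
intervention = rng.integers(0, 2, 113)                           # 0 = no treatment, 1 = treatment
p_cure = 1 / (1 + np.exp(-(-0.3 + 1.2 * intervention)))          # assumed treatment effect
cured = (rng.random(113) < p_cure).astype(int)

m0 = sm.Logit(cured, np.ones((113, 1))).fit(disp=0)              # constant only
m1 = sm.Logit(cured, sm.add_constant(intervention)).fit(disp=0)  # constant + intervention

chi2 = (-2 * m0.llf) - (-2 * m1.llf)                             # drop in -2LL
print(chi2, stats.chi2.sf(chi2, 1))                              # improvement and its p-value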
The classification table showed how well the new model predicted group membership; the model now uses the intervention to forecast the outcome variable. The overall accuracy of the model is 64.6% (refer to the previous classification table: when only the constant was included, accuracy was 57.5%, so it has improved). The next table is very important, as it presents the Wald statistic (which has a chi-square distribution) and tests whether the beta coefficient of the predictor is different from zero. If the coefficient is significantly different from zero, we can conclude that the predictor is making a significant contribution to the prediction of the outcome. For the above output, it can be stated that having the intervention (or not) is a significant predictor of whether the patient was cured.
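For reference (the coefficient and its standard error are not quoted above, so the expression is given in general form only), the Wald statistic reported by SPSS is

W = \left(\frac{b}{SE_b}\right)^{2}

which, under the null hypothesis that b = 0, follows a chi-square distribution with one degree of freedom.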
Lastly, the plot of observed groups and predicted probabilities indicated that the model predicted the cured cases relatively well, but it was less good for the not-cured cases (their predicted probability was only slightly below .5, i.e. close to chance).
An ROC curve was also produced for this data set; it displays all possible cut-off points and helps in finding the sensitivity obtained at any fixed value of specificity. The area under the curve is .658, with a 95% confidence interval of (.555, .760). The area under the curve is also significantly different from 0.5 (p = .004), meaning that the logistic regression classifies the groups significantly better than chance.
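An equivalent curve and area can be sketched with scikit-learn in Python; the fitted probabilities below come from a logistic model on simulated data (an assumed data set, not eel.sav), so the printed AUC will differ from .658.

# Sketch of an ROC curve and its area for a fitted logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(7)
intervention = rng.integers(0, 2, 113).reshape(-1, 1)
cured = (rng.random(113) < 0.45 + 0.25 * intervention.ravel()).astype(int)

clf = LogisticRegression().fit(intervention, cured)
probs = clf.predict_proba(intervention)[:, 1]      # predicted probability of being cured

fpr, tpr, thresholds = roc_curve(cured, probs)     # every possible cut-off point
print("AUC:", roc_auc_score(cured, probs))         # area under the ROC curve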