Introduction
In this assignment I will apply the basics of statistics and probability theory to a real-world problem. Suppose I would like to study the effectiveness of teaching at a particular university in the United States. The goal of the research is to identify the significant factors that affect teaching evaluation scores among professors at the University of Texas at Austin in 2000-2002 (Source: http://wps.aw.com/aw_stock_ie_3/178/45691/11696965.cw/index.html). According to this source, the data set contains the following variables: age, minority, female, onecredit, beauty, course_eval, intro and nnenglish.
I begin with descriptive statistics and a summary table for all variables.
According to the descriptive statistics output, there are 463 observations in the data set. As a measure of central tendency we use the sample mean, and as a measure of variability we use the sample standard deviation.
For the numeric variables we conclude the following: the typical (average) value of age is 48.36501 with a standard deviation of 9.802742. The typical value of course_eval is 3.998272 with a standard deviation of 0.5548656. The average value of beauty is 4.75e-08, which is effectively zero, with a standard deviation of 0.7886477. The youngest teacher in the sample is 29 years old and the oldest is 73 years old. The lowest course evaluation score in the sample is 2.1 and the highest is the maximum possible score of 5. The beauty variable ranges from -1.450494 to 1.970023.
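To put the minimum and maximum values in context, they can be expressed as distances from the mean in standard deviation units. The sketch below uses only the summary figures quoted above; the function and variable names are illustrative and not part of the original analysis.

```python
# Summary statistics quoted in the text: (mean, standard deviation, min, max).
SUMMARY = {
    "age": (48.36501, 9.802742, 29.0, 73.0),
    "course_eval": (3.998272, 0.5548656, 2.1, 5.0),
    "beauty": (4.75e-08, 0.7886477, -1.450494, 1.970023),
}


def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd


def extremes_in_sd(summary):
    """Map each variable name to the z-scores of its minimum and maximum."""
    return {
        name: (z_score(lo, mean, sd), z_score(hi, mean, sd))
        for name, (mean, sd, lo, hi) in summary.items()
    }


extremes = extremes_in_sd(SUMMARY)
for name, (z_min, z_max) in extremes.items():
    print(f"{name}: min at {z_min:.2f} SD, max at {z_max:.2f} SD from the mean")
```

The lowest course_eval score lies much further from the mean, in standard deviation units, than the highest one, which reflects the ceiling of 5 on the evaluation scale.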
I start the regression analysis with a simple linear regression of course_eval on beauty. Course_eval is the dependent variable and beauty is the independent variable.
In the STATA output we obtain a 95% confidence interval for the effect of beauty: (0.0697687; 0.1962342). This means we are 95% confident that a one-unit increase in the beauty variable is associated with an increase in the course_eval variable of between 0.0697687 and 0.1962342 units.
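The interval also lets us recover the point estimate and an approximate standard error. The sketch below assumes the interval is symmetric around the estimate, and it uses the standard normal critical value as an approximation to the t critical value that STATA would use with this sample size; both are assumptions, not figures taken from the output.

```python
from statistics import NormalDist

# 95% confidence interval for the beauty coefficient, as quoted in the text.
ci_low, ci_high = 0.0697687, 0.1962342

# For a symmetric interval the midpoint is the estimated coefficient.
point_estimate = (ci_low + ci_high) / 2

# Half the interval width equals (critical value) * (standard error).
half_width = (ci_high - ci_low) / 2

# Assumption: normal approximation to the t critical value (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
approx_se = half_width / z_crit

print(f"estimated coefficient: {point_estimate:.7f}")
print(f"approximate standard error: {approx_se:.4f}")
```

The whole interval lies above zero, which is consistent with the coefficient being significant at the 5% level.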
The overall regression is significant (F = 17.08, p < 0.001), and the coefficients of the regression are also significant (p < 0.001). However, the coefficient of determination R-squared is only 0.0357. Hence, this regression explains only 3.57% of the variation in course_eval. This is an extremely low value. Other factors that may have a significant impact on course_eval were not included in the regression equation.
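The reported F statistic and R-squared can be checked against each other. The sketch below assumes the classical (homoskedasticity-only) F statistic and the 463 observations reported earlier; the function name is illustrative.

```python
def f_from_r2(r_squared, n_obs, n_regressors):
    """Classical overall F statistic implied by R-squared.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of
    regressors (excluding the constant) and n is the number of observations.
    """
    numerator = r_squared / n_regressors
    denominator = (1 - r_squared) / (n_obs - n_regressors - 1)
    return numerator / denominator


# Simple regression of course_eval on beauty: one regressor, n = 463.
f_simple = f_from_r2(0.0357, 463, 1)
print(f"F implied by R-squared: {f_simple:.2f}")
```

The implied value agrees with the reported F = 17.08 up to the rounding of R-squared to four decimal places.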
Next, I perform a multiple regression analysis with course_eval as the dependent variable and all the other variables as independent variables.
This time the 95% confidence interval for the effect of beauty on the dependent variable has changed. We are 95% confident that, holding the other variables constant, a one-unit increase in the beauty variable is associated with an increase in the course_eval variable of between 0.1058569 and 0.2262299 units.
According to the ANOVA table, the overall model is significant (F = 11.98, p < 0.001). However, the age and intro variables are not significant at the 5% level of significance. We therefore exclude these two variables, keep the others and repeat the analysis.
Now all coefficients are significant and the resulting regression may be used to make predictions. However, the coefficient of determination R-squared is still very low (0.1546). This means that the model explains only 15.46% of the variance of the teaching evaluation score. We conclude that there may be other factors that have a significant impact on the course_eval variable but are not included in this research. Another possible reason is that the association between the variables is not linear. This model is better than the simple regression of course_eval on beauty, but it is still weak and not reliable. The regression analysis requires further exploration to be improved.
As an example of how the regression equation works, we would like to predict the course evaluation score for Professor Smith. According to the instructions, “Professor Smith is a black male with average beauty and is a native English speaker. He teaches a three-credit upper-division course.” We make the prediction using the last model, so the values of the variables are the following:
Minority: 1
Female: 0
Onecredit: 0
Beauty: 0 (the sample mean of beauty is effectively zero)
Nnenglish: 0
Substituting these values into the equation gives:
Course_eval = 4.072006 - 0.1647853 * 1 + 0 + 0 + 0.1660434 * 0 - 0 = 3.9072207
Hence, the expected course evaluation score for Professor Smith is 3.9072207. This value is slightly below the sample mean of course_eval (3.998272) and well within one standard deviation of it. That is why we can conclude that the model predicts a typical evaluation score for Professor Smith: his expected performance is close to the average at the University of Texas at Austin.
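The substitution can be checked with a few lines of Python. This is a sketch that uses only the three coefficients quoted in the equation above; the function name is hypothetical, and the female, onecredit and nnenglish terms are omitted because those variables are zero for Professor Smith. It evaluates the prediction at beauty = 0, the sample mean reported in the descriptive statistics, and at beauty = 0.5 for comparison.

```python
# Coefficients quoted in the regression equation in the text.
INTERCEPT = 4.072006
COEF_MINORITY = -0.1647853
COEF_BEAUTY = 0.1660434


def predict_course_eval(minority, beauty):
    """Predicted course_eval when female, onecredit and nnenglish are all 0."""
    return INTERCEPT + COEF_MINORITY * minority + COEF_BEAUTY * beauty


# Minority professor with beauty at the sample mean (effectively zero).
at_mean_beauty = predict_course_eval(minority=1, beauty=0.0)

# Same professor with beauty half a unit above the sample mean.
at_half_unit = predict_course_eval(minority=1, beauty=0.5)

print(f"beauty = 0.0: {at_mean_beauty:.7f}")
print(f"beauty = 0.5: {at_half_unit:.7f}")
```

Moving beauty by half a unit shifts the prediction by 0.5 * 0.1660434, or about 0.083 points, so the choice of value for "average beauty" matters at the second decimal place.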
Do File:
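* Descriptive statistics for all variables
summarize age minority female onecredit beauty course_eval intro nnenglish
* Simple regression of course_eval on beauty
regress course_eval beauty
* Multiple regression with all explanatory variables
regress course_eval age minority female onecredit beauty intro nnenglish
* Reduced model without age and intro
regress course_eval minority female onecredit beauty nnenglish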