Purpose Statement
In this paper we describe how the basics of probability theory and statistical analysis can be applied to real-world problems. We consider data on house prices in a particular region of the United States of America. Our goal is to use regression analysis to forecast a house's selling price from the factors that appear to have a significant impact on it.
Definition of the Variables and Data Description
The data were gathered in 2007 and consist of 105 observations. Each observation represents a house sold in 2007 and is characterized by 8 variables:
Price – the selling price of a house
Bedrooms – the number of bedrooms in a house
Size – the area of a house in sq. ft.
Pool – whether a house has a pool or not (1 – yes, 0 – no)
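Distance – the distance from a house to a reference point; the data description does not confirm what that point is (a subway station is one possibility)
Twnship – presumably a code for the township in which a house is located; the coding is not documented
Garage – whether a house has a garage or not (1 – yes, 0 – no)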
Baths – the number of bathrooms in a house
We begin with descriptive statistics showing the distribution of each variable (the output for this paper is from SPSS 22):
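For readers who do not use SPSS, the same kind of summary can be sketched in Python with the standard library. This is only an illustration: the `describe` helper and the `placeholder_rows` values are assumptions made up for the example and are not observations from the data set analysed here.

```python
import statistics

def describe(rows, columns):
    """Return basic descriptive statistics for each named numeric column."""
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
            "min": min(values),
            "max": max(values),
        }
    return summary

# Placeholder rows for illustration only; NOT taken from the paper's data.
placeholder_rows = [
    {"Price": 200.0, "Bedrooms": 3, "Size": 2000, "Pool": 0, "Garage": 1, "Baths": 2.0},
    {"Price": 250.0, "Bedrooms": 4, "Size": 2400, "Pool": 1, "Garage": 1, "Baths": 2.5},
    {"Price": 300.0, "Bedrooms": 5, "Size": 2800, "Pool": 0, "Garage": 0, "Baths": 3.0},
]

summary = describe(placeholder_rows, ["Price", "Bedrooms", "Size", "Pool", "Garage", "Baths"])
for name, stats in summary.items():
    print(name, stats)
```

For the 0/1 variables (Pool, Garage) the mean is simply the proportion of houses coded 1, which is often the most useful single number for a binary variable.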
We now perform a regression analysis with Price as the dependent variable and the remaining variables as independent variables:
At the most common significance level of 5%, the following variables are not significant predictors of price:
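To make concrete what a multiple regression fit computes, here is a minimal sketch of ordinary least squares solved through the normal equations in plain Python. It is not the SPSS procedure used for this paper, and it returns only the coefficients, without standard errors or p-values. The function name `fit_ols` and the small exact data set are assumptions chosen for the example.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values (without an intercept column),
    y is the list of responses. Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Illustrative data generated exactly from y = 10 + 2*x1 + 3*x2 (not the paper's data).
X_demo = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y_demo = [12, 13, 15, 17, 31]
coefficients = fit_ols(X_demo, y_demo)
print(coefficients)
```

Because the demonstration data lie exactly on a plane, the fit recovers the intercept and both slopes up to floating-point error. With real data the residuals are non-zero, and a statistics package is needed to judge which coefficients are significant.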
Distance, Twnship
These variables may be excluded from the analysis. We then repeat the regression analysis with the remaining variables:
Now every remaining variable is significant, because each corresponding p-value is less than 0.05. The overall model is also significant: ANOVA reports F = 21.78 with p < 0.001. However, the coefficient of determination R-squared is only 0.524, which means that the model explains approximately 52.4% of the variance in Price.
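The reported F statistic and R-squared are linked by a standard identity, F = (R²/k) / ((1 − R²)/(n − k − 1)), which gives a quick consistency check. The sketch below assumes that all n = 105 observations entered the reduced model and that it has k = 5 predictors; the function names are chosen for the example.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic implied by R-squared for n observations and k predictors."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

def adjusted_r_squared(r_squared, n, k):
    """R-squared penalised for the number of predictors in the model."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Values taken from the text: 105 houses, 5 predictors, R-squared = 0.524.
n_obs, n_predictors, r2 = 105, 5, 0.524

f_value = f_statistic(r2, n_obs, n_predictors)
adj_r2 = adjusted_r_squared(r2, n_obs, n_predictors)
print(round(f_value, 2), round(adj_r2, 3))
```

Under these assumptions the implied F is about 21.8, which agrees with the reported 21.78 up to the rounding of R-squared, and the adjusted R-squared is about 0.50.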
Garage appears to be the factor most strongly related to price (r = 0.526). A correlation of this size indicates a moderate, rather than strong, positive linear relationship between price and having a garage.
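To put r = 0.526 in perspective, the sketch below shows how a Pearson correlation is computed and how much variance such a correlation accounts for on its own. The `pearson_r` helper and the four demonstration pairs are assumptions for illustration and are not taken from the house data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative 0/1 indicator against a numeric outcome (made-up values).
demo_indicator = [0, 0, 1, 1]
demo_outcome = [1.0, 2.0, 3.0, 4.0]
demo_r = pearson_r(demo_indicator, demo_outcome)

# Share of variance in Price associated with Garage alone, using r from the text.
garage_r = 0.526
garage_shared_variance = garage_r ** 2
print(round(demo_r, 3), round(garage_shared_variance, 3))
```

Squaring the reported correlation shows that Garage on its own is associated with roughly 28% of the variance in Price, compared with 52.4% for the full five-variable model.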
Thus, our regression equation is:
Price = 36.123 + 7.169*Bedrooms + 0.039*Size - 19.11*Pool + 38.847*Garage + 24.624*Baths
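The fitted equation can be turned directly into a small forecasting function. The coefficients below are copied from the equation above; the function name `predict_price` and the example house are hypothetical, and the result is in whatever units the Price variable uses.

```python
# Coefficients copied from the fitted regression equation in the text.
COEFFICIENTS = {
    "Intercept": 36.123,
    "Bedrooms": 7.169,
    "Size": 0.039,
    "Pool": -19.11,
    "Garage": 38.847,
    "Baths": 24.624,
}

def predict_price(bedrooms, size, pool, garage, baths):
    """Forecast Price from the fitted equation; pool and garage are coded 0 or 1."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Bedrooms"] * bedrooms
        + COEFFICIENTS["Size"] * size
        + COEFFICIENTS["Pool"] * pool
        + COEFFICIENTS["Garage"] * garage
        + COEFFICIENTS["Baths"] * baths
    )

# Hypothetical house: 3 bedrooms, 2000 sq. ft., no pool, garage, 2 bathrooms.
example_forecast = predict_price(bedrooms=3, size=2000, pool=0, garage=1, baths=2)
print(round(example_forecast, 3))
```

Since the model explains only about half of the variance in Price, a forecast of this kind should be read as a rough central estimate rather than a precise prediction.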