Introduction
This term paper demonstrates my understanding of statistics and applies probability theory to a real-world problem. The aim of this research project is to select a data set closely related to my chosen field and to apply statistical procedures, such as descriptive statistics and statistical inference, to it. Since my field lies in sales, real estate and business administration, I selected the Goodyear, Arizona Real Estate Sales Data.
The main research question investigated in this paper is: which factors have a significant effect on the sale prices of houses? House price determination matters to almost everyone, because buying a property is one of the most important decisions we make in our lives. Consequently, it is very useful to know which characteristics of a house affect its price most significantly.
One of the most important characteristics of a house is its location (Taylor). The most expensive properties are located at the edges of suburban housing developments. Such properties can have private entrances and driveways, and the lot is usually bordered by woods. The sale price rises even further if the house is on the waterfront or in the woods; these are features for which buyers are willing to pay a premium. Convenient road access to the house is a major advantage that increases its value, whereas proximity to busy highways is a serious disadvantage. Any noisy facility nearby, such as a racecourse or a stadium, lowers the value of the property. Buyers weigh all of these factors carefully when trying to find the best home for the best price.
Other characteristics are no less important in determining the price: the size of the house (a bigger house is usually more expensive), the building materials used and the utilities connected to it all affect its cost.
Special attention should be paid to the interior finish. Without it, the value of a house is significantly reduced. Both the presence of interior finishing and its quality, especially the quality of the materials used, affect the final price of the house. Natural materials have always been more expensive than artificial ones, and designer work is valued more highly than mass-produced finishes.
In summary, taking these factors into account makes it considerably easier to find the best country house at the right price.
Body
The most significant factors will be determined by means of multiple regression analysis. To build a multiple regression equation, I need to select a dependent variable and a set of independent variables. Since the research question concerns price determination, it is natural to treat the sale price of the property as the dependent (response) variable. The following variables in the data sample are used as the independent variables of the regression equation:
Bedrooms – the number of bedrooms in the house.
Size – the total square footage of the house.
Pool – a dichotomous variable, which is equal to 0 if there is no pool in the property and 1 if there is a pool.
Distance – how far the house is from the center of the city, in miles.
Twnship – the township in which the house is located, coded from 1 to 5.
Garage – a dichotomous variable, which is equal to 0 if there is no garage in the house and 1 otherwise.
Baths – the number of bathrooms in the house.
The original regression equation will have the following form:
Price = b0 + b1*Bedrooms + b2*Size + b3*Pool + b4*Distance + b5*Twnship + b6*Garage + b7*Baths
First, descriptive statistics will be provided for each variable in the regression equation. Next, the multiple linear regression will be estimated and evaluated. Any variables whose coefficients are not statistically significant at the 5% level will be removed from the equation, and the model will be re-estimated until all remaining coefficients are significant.
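The elimination procedure described above can be sketched in Python as a backward-elimination loop. This is a minimal, standard-library-only illustration, not the software or exact procedure used for this paper: `ols` solves the normal equations directly, and the two-sided p-values use a normal approximation to the t distribution (regression software normally uses the t distribution with n - k - 1 degrees of freedom, which gives nearly identical values at a sample size of about 100). The function names and the small data set are hypothetical: `x1` drives `y`, while `x2` is constructed to be uncorrelated with it.

```python
import math
from statistics import NormalDist


def ols(columns, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations (X'X) b = X'y.

    Returns (coefficients, standard_errors); index 0 is the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(p)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(p):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    b = [sum(inv[a][k] * xty[k] for k in range(p)) for a in range(p)]
    resid = [y[r] - sum(X[r][a] * b[a] for a in range(p)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance, n - k - 1 df
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(p)]
    return b, se


def backward_eliminate(data, y, alpha=0.05):
    """Drop the least significant predictor and refit until every p-value <= alpha."""
    names = list(data)
    while names:
        b, se = ols([data[nm] for nm in names], y)
        # Two-sided p-values; normal approximation to the t distribution (assumption).
        pvals = {nm: 2 * (1 - NormalDist().cdf(abs(b[i + 1] / se[i + 1])))
                 for i, nm in enumerate(names)}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            break
        names.remove(worst)
    return names


# Hypothetical data: y depends on x1; x2 is built to be uncorrelated with y.
pattern = [1, 1, -1, -1, -1, -1, 1, 1]
x1 = [float(i) for i in range(24)]
x2 = [float(pattern[i % 8]) for i in range(24)]
y = [5 + 10 * i + (-1) ** i for i in range(24)]
print(backward_eliminate({"x1": x1, "x2": x2}, y))  # ['x1']
```

The loop removes only the least significant variable on each pass and then refits, which is the usual form of backward elimination; a variable that looks insignificant in one fit can become significant once a correlated variable is removed, which is why refitting after each removal is the safer default.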
Results
The data set consists of 105 observations that represent 105 houses sold in Arizona. Based on this data sample, the following descriptive statistics were calculated for the variables included in the regression model:
The average sale price of a house in this sample was $221,103, with a standard deviation of $47,105.40. The cheapest house sold for $125,000 and the most expensive for $345,300; half of the houses sold for less than $213,600. The average number of bedrooms was 3.8, with a standard deviation of 1.50; the number of bedrooms ranged from 2 to 8, and half of the houses had fewer than 4 bedrooms. The average total area was 2,223.81 sq. ft., with a standard deviation of 248.66 sq. ft.; the smallest house had 1,600 sq. ft. and the largest 2,900 sq. ft., and half of the houses had more than 2,200 sq. ft. The average distance from the house to the center of the city was 14.63 miles, with a standard deviation of 4.87 miles; distances ranged from 6 to 28 miles, and half of the houses were located more than 15 miles from the center. The township code averaged 3.10, with a standard deviation of 1.29; it ranged from 1 to 5, with a median of 3. The average number of bathrooms was 2.08, with a standard deviation of 0.39; it ranged from 1.5 to 3.
Of the 105 houses, 38 (36.19%) had no pool and 67 (63.81%) had a pool; 34 (32.38%) had no garage and 71 (67.62%) had a garage.
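The summary measures reported above (mean, standard deviation, minimum, maximum, median) can be reproduced with Python's standard `statistics` module. The sketch below runs on five hypothetical prices in thousands of dollars, not the actual 105 observations, and assumes the sample standard deviation (n - 1 divisor), which is what most statistics packages report; the paper does not state which version was used. The helper name `describe` is illustrative.

```python
import statistics


def describe(values):
    """Summary statistics of the kind reported in the Results section."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Hypothetical prices in thousands of dollars, NOT the actual 105 observations.
prices = [125.0, 198.5, 213.6, 240.0, 345.3]
summary = describe(prices)
print(summary["median"])  # 213.6
```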
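The percentages for the two dichotomous variables follow directly from the counts. As a quick check, the counts reported above (38/67 for Pool, 34/71 for Garage) can be turned into percentages rounded to two decimals; `frequency_table` is a hypothetical helper name.

```python
def frequency_table(counts):
    """Percentage of observations in each category, rounded to two decimals."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


pool = frequency_table({"no pool": 38, "pool": 67})
garage = frequency_table({"no garage": 34, "garage": 71})
print(pool)    # {'no pool': 36.19, 'pool': 63.81}
print(garage)  # {'no garage': 32.38, 'garage': 67.62}
```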
Fitting the full model gave the following equation, with Price measured in thousands of dollars:
Price = 43.14 + 7.38*Bedrooms + 0.04*Size + 19.11*Pool - 1.01*Distance - 1.74*Twnship + 35.50*Garage + 23.09*Baths
The analysis of variance showed that the model as a whole is significant (F=15.853, p<0.001), so the equation explains a significant part of the variance in house prices. The adjusted R-squared was 0.4999, meaning that the model accounts for roughly half of the variance in the outcome variable. However, not all coefficients are individually significant. In particular, the p-values of Distance and Twnship are above 0.05 (0.175 and 0.521, respectively). At the 5% level these factors do not have a significant effect on house prices, so they were excluded from the model.
The model was then re-estimated without Distance and Twnship, giving the following equation:
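The reported F statistic and the 49.99% figure can be checked against each other. For a model with k predictors fitted to n observations, F = (R²/k) / ((1 - R²)/(n - k - 1)), so R² can be recovered from F and then converted to adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below assumes n = 105 with k = 7 for the original model and k = 5 for the final model reported below; the function names are illustrative.

```python
def r2_from_f(f_stat, n, k):
    """Recover R-squared from the overall F statistic of a regression with k predictors.

    Solves F = (R2 / k) / ((1 - R2) / (n - k - 1)) for R2.
    """
    df_resid = n - k - 1
    return f_stat * k / (f_stat * k + df_resid)


def adjusted_r2(r2, n, k):
    """Adjusted R-squared penalizes R-squared for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 105
for label, f_stat, k in [("original", 15.853, 7), ("final", 21.7796, 5)]:
    r2 = r2_from_f(f_stat, n, k)
    print(f"{label}: R2 = {r2:.4f}, adjusted R2 = {adjusted_r2(r2, n, k):.4f}")
# original: R2 = 0.5336, adjusted R2 = 0.4999
# final: R2 = 0.5238, adjusted R2 = 0.4998
```

The recovered values indicate that the 49.99% and 49.98% figures correspond to adjusted R-squared; the plain R-squared implied by each F statistic is a few percentage points higher.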
Price = 17.0125 + 7.16888*Bedrooms + 0.03919*Size + 19.1105*Pool + 38.8472*Garage + 24.6236*Baths
The ANOVA shows that the reduced model is also significant overall, with a larger F statistic (F=21.7796, p<0.001). The adjusted R-squared remained almost unchanged at 0.4998, so dropping Distance and Twnship cost essentially no explanatory power. Finally, all remaining coefficients are individually significant (all p-values are below 0.05). I therefore adopt this regression as the final model, which can be used to predict house prices.
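To illustrate how the final equation can be used for prediction, the sketch below codes it as a Python function. It assumes Price is in thousands of dollars, which is consistent with the reported means: a least-squares fit with an intercept passes through the point of sample means, so plugging in the mean values from the Results section should return roughly the mean price of $221,103, and it gives about 221.08. The function name and the example house are hypothetical.

```python
# Coefficients of the final model; Price assumed to be in thousands of dollars.
FINAL_MODEL = {
    "intercept": 17.0125,
    "Bedrooms": 7.16888,
    "Size": 0.03919,
    "Pool": 19.1105,
    "Garage": 38.8472,
    "Baths": 24.6236,
}


def predict_price(bedrooms, size, pool, garage, baths, coef=FINAL_MODEL):
    """Point prediction of the sale price, in thousands of dollars."""
    return (coef["intercept"]
            + coef["Bedrooms"] * bedrooms
            + coef["Size"] * size
            + coef["Pool"] * pool
            + coef["Garage"] * garage
            + coef["Baths"] * baths)


# A hypothetical house: 4 bedrooms, 2,200 sq. ft., pool, garage, 2 bathrooms.
print(round(predict_price(4, 2200, 1, 1, 2), 2))  # 239.11

# Sanity check: plug in the sample means (Pool and Garage as proportions 67/105, 71/105).
at_means = predict_price(3.8, 2223.81, 67 / 105, 71 / 105, 2.08)
print(round(at_means, 2))  # 221.08
```

This gives a point estimate only; a prediction interval would also need the model's residual standard error, which is not reported here.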
Discussion and Conclusion
The purpose of this research was to determine which characteristics of a house significantly affect its final sale price. A sample of 105 houses sold in Goodyear, Arizona, was used to examine this question. The following explanatory factors were considered: the numbers of bathrooms and bedrooms, the total area, the township, an attached garage, a swimming pool and the distance from the center of the city in miles.
The research question was examined by means of multiple regression analysis. The results indicated that township and distance from the center of the city are not significant predictors of house prices. These factors were excluded from the model and the analysis was repeated using only the remaining variables. In the final regression equation, all remaining variables were significant for price determination. The equation can be used to estimate the sale price of a house from its numbers of bedrooms and bathrooms, whether it has a pool and an attached garage, and its total area in square feet. It should also be noted that all coefficients of the final regression equation are positive. Holding the other variables constant, this can be interpreted as follows:
Houses with a larger total area are more expensive (about $39 more per additional square foot)
A pool increases the selling price (by about $19,100)
An attached garage increases the selling price (by about $38,800)
Houses with more bathrooms and bedrooms are more expensive (about $24,600 per additional bathroom and $7,200 per additional bedroom)
Works Cited
Taylor, Susan. "5 Factors That Influence Your Home's Resale Value." US News, 2017, http://money.usnews.com/money/personal-finance/articles/2014/07/02/5-factors-that-influence-your-homes-resale-value.
Appendix
Descriptive Statistics
Frequency Distribution
Original Regression
Final Regression