Abstract
In this case, we illustrate the application of multiple regression in economics. We gather data on the unemployment rate (dependent variable) and three independent variables: GDP per capita, the maximum weekly unemployment benefit, and the 2015 population estimate. The data are collected for 39 states. The regression results show a positive association between the unemployment rate and both GDP per capita and the population estimate, and an inverse relationship between the unemployment rate and maximum weekly benefits. The first-order model is statistically significant, as shown by the p-value of the F statistic, but its coefficient of determination is small. Two of the three coefficients are statistically significant, since their p-values are less than 0.05. In addition, the standard error of estimate is high relative to the unemployment rates used to fit the equation. The residual analysis indicates that the residuals are consistent with a normal distribution, which supports the use of a linear regression model. However, the model is not adequate for predicting the unemployment rate. Its coefficient of determination could be increased by adding variables such as inflation rates and government spending, and data on more states should be added to improve the precision of the estimates.
Key words: coefficient of determination, standard error of estimate, p-value, statistically significant
The problem
This case highlights the application of regression analysis in economics. Unemployment is a critical indicator of the overall state of the economy. Economic policy makers monitor the unemployment rate in a state or country and design measures to control it. The unemployment rate is influenced by several macroeconomic variables, such as GDP per capita, population, and unemployment benefits. In this problem, we use multiple regression to build a model for estimating the average unemployment rate.
The data
Unemployment rates differ across the states of the United States of America, depending on the macroeconomic conditions that affect employment in each state. In this case, secondary data on the 2015 unemployment rates of 39 states are collected, together with corresponding data on state GDP per capita, the 2015 population estimate, and the maximum weekly unemployment benefit for each of the 39 states. The data are obtained from the US Bureau of Labor Statistics and the US Bureau of Economic Analysis.
Data on the variables (Bls.gov, 2016)
A model for average unemployment rate
In determining the model for predicting the unemployment rate, we assume that the state unemployment rate, state GDP per capita, state population, and maximum unemployment benefits have a linear relationship. Thus, the unemployment rate can be expressed by the first-order model as follows:
$$E(y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$
Here y is the average annual unemployment rate and β0 is the autonomous unemployment rate (the expected unemployment rate when the values of the three independent variables are zero). X1 is GDP per capita, X2 is the population estimate for 2015, and X3 is the maximum weekly unemployment benefit. β1, β2, and β3 are the partial regression coefficients: each measures the change in E(y) for a one-unit increase in its variable, holding the other two variables constant.
The model also assumes that the random errors are independent and normally distributed with mean zero, and that there are no outliers that distort the relationship between the variables. It further assumes homoscedasticity, meaning that the variance of the errors is constant across all levels of the independent variables.
Regression output
$$\hat{y} = 5.2614429200657 + 0.0000126180611\,X_1 + 0.0000000243003\,X_2 - 0.0023473995008\,X_3$$
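To make the fitted equation concrete, the sketch below evaluates it for a single state. It assumes GDP per capita and the weekly benefit are recorded in dollars and the population estimate as a head count; the input values are hypothetical and do not correspond to any of the 39 states in the sample.

```python
# Fitted coefficients from the regression output above.
B0 = 5.2614429200657    # intercept
B1 = 0.0000126180611    # GDP per capita (assumed: dollars)
B2 = 0.0000000243003    # population estimate, 2015 (assumed: persons)
B3 = -0.0023473995008   # maximum weekly unemployment benefit (dollars)


def predict_unemployment(gdp_per_capita: float, population: float,
                         max_weekly_benefit: float) -> float:
    """Return the fitted unemployment rate, in percent, from the first-order model."""
    return (B0 + B1 * gdp_per_capita + B2 * population
            + B3 * max_weekly_benefit)


# Hypothetical inputs for illustration only.
rate = predict_unemployment(gdp_per_capita=50_000, population=5_000_000,
                            max_weekly_benefit=400)
print(f"Predicted unemployment rate: {rate:.2f}%")  # prints 5.07%
```

Because the model is linear, each term contributes independently; the prediction is only as trustworthy as the model's fit, which the following sections assess.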
Analysis of results
This section evaluates the first-order model obtained above to assess whether it is adequate for predicting the average unemployment rate from state GDP per capita, the population estimate, and maximum weekly unemployment benefits.
The model indicates that the unemployment rate and GDP per capita have a positive relationship. The unemployment rate is also positively related to the population estimate but negatively related to maximum weekly unemployment benefits.
F significance test
This test assesses whether there is a statistically significant relationship between the dependent variable and the independent variables taken as a group. If this association is not statistically significant, the whole model is not adequate. The null hypothesis is H0: β1 = β2 = β3 = 0. In this case, the F statistic (global F) is 3.026558, with 3 and 35 degrees of freedom. This exceeds the critical value of F at the 5% significance level (approximately 2.87), indicating that the relationship between the unemployment rate and the three independent variables is unlikely to be due to chance. Equivalently, the p-value for the F statistic is 0.0423, which is less than 0.05, so we reject the null hypothesis that all three coefficients are zero. Therefore, we can infer that the model is statistically significant.
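The global F statistic can be recovered from the coefficient of determination alone, which gives a useful cross-check on the reported output. This is a minimal sketch of the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)); the critical value of about 2.87 comes from F tables and is not computed here.

```python
def global_f(r_squared: float, n: int, k: int) -> float:
    """Global F statistic for H0: beta_1 = ... = beta_k = 0, computed from R^2."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


n, k = 39, 3          # 39 states, 3 independent variables
r_squared = 0.205983  # coefficient of determination from the output
f_stat = global_f(r_squared, n, k)
df_num, df_den = k, n - k - 1
print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) degrees of freedom")
# prints: F = 3.0266 with (3, 35) degrees of freedom
```

The small gap from the reported 3.026558 comes from R² being quoted to six decimal places.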
R-Square/Coefficient of Determination
The coefficient of determination for this first-order model is 0.205983, showing that only about 20.6% of the variation in the unemployment rate is explained by the three independent variables included in the model (Mendenhall and Sincich 121). The remaining 79.4% of the variation in state unemployment rates is associated with factors other than GDP per capita, the population estimate, and maximum unemployment benefits. Because the coefficient of determination is below 50%, most of the variation in the unemployment rate is left unexplained, so the model is not reliable for prediction.
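The output quoted here does not report the adjusted coefficient of determination, which corrects R² for the number of independent variables relative to the sample size. The short calculation below derives it from the reported R² (n = 39, k = 3) using the standard formula; it is a derived figure, not one taken from the original output.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2: penalises R^2 for the number of independent variables k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


r2_adj = adjusted_r_squared(0.205983, n=39, k=3)
print(f"R^2 = 0.206, adjusted R^2 = {r2_adj:.3f}")  # adjusted R^2 = 0.138
```

An adjusted value of roughly 0.14 reinforces the conclusion that the three variables explain little of the cross-state variation.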
Standard error of estimate (s)
This measures the typical deviation of the actual unemployment rate from the rate predicted by the model. The standard error of estimate for this model is 0.844902, implying that for about 95% of states the actual unemployment rate should fall within roughly ±2s = ±1.69 percentage points of the rate predicted by the first-order model. To assess the size of this error, we compare it with the unemployment rates in the sample. The unemployment rate for Iowa used in the model is 3.7%, while the highest unemployment rate used in deriving the model is 6.9%. Relative to these values, a standard error of estimate of 0.844902 percentage points is high, so the model is not reliable: the deviation of the actual unemployment rate from the predicted rate is likely to be large.
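The comparison above can be made explicit. This sketch computes the approximate ±2s band and expresses s as a share of the two unemployment rates quoted in the text. The ±2s rule is an approximation under normal errors; an exact prediction interval is somewhat wider and varies with the values of X1, X2, and X3.

```python
s = 0.844902  # standard error of estimate, in percentage points

# Rough 95% band under normal errors: fitted value +/- 2s.
margin = 2 * s
print(f"Approximate band: fitted rate +/- {margin:.2f} percentage points")

# Size of s relative to the unemployment rates quoted in the text.
relative = {rate: s / rate for rate in (3.7, 6.9)}  # Iowa, and the highest rate
for rate, share in relative.items():
    print(f"s is {share:.1%} of a {rate}% unemployment rate")
```

At Iowa's 3.7%, s is about 23% of the rate itself, which is why the error is described as high.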
Coefficients
β1 = 0.0000126180611: Holding the population estimate and maximum weekly unemployment benefits constant, each one-dollar increase in GDP per capita is associated with an increase of about 0.0000126 percentage points in the unemployment rate, or roughly 0.13 percentage points per $10,000. Regarding economic plausibility, this sign defies economic theory: it implies that states with higher GDP per capita have higher unemployment rates. According to Okun's Law, output and employment move together, so an increase in GDP should lead to a decline in unemployment. The coefficient of GDP per capita should therefore be negative to reflect the theoretical relationship. The t statistic for β1 is 2.055 and its p-value is 0.0474. Because the p-value is below 0.05, we reject the null hypothesis that β1 = 0 at the 5% significance level, so β1 is statistically significant, although its unexpected sign calls for caution in interpreting it. The coefficient's standard error is 0.0000061395002.
β2 = 0.0000000243003: The coefficient is positive, indicating that the unemployment rate and the population estimate are positively related. Holding GDP per capita and maximum weekly unemployment benefits constant, each additional resident is associated with an increase of 0.0000000243 percentage points in the unemployment rate, or about 0.024 percentage points per million residents (assuming the estimate is recorded in persons). An increase in population increases the labour force in an economy. If this increase is not matched by economic growth, the unemployment rate rises; where the population grows faster than the economy, unemployment tends to increase. Therefore, there is a positive theoretical relationship between the unemployment rate and population. The t statistic for β2 is 1.385 and the p-value is 0.1747. Because the p-value exceeds 0.05, we fail to reject the null hypothesis that β2 = 0. Therefore, β2 is not a statistically significant measure of the variation in the unemployment rate associated with population. Its standard error (0.0000000175398) is small in absolute terms but large relative to the coefficient itself, which is why the t statistic is low.
β3 = -0.0023473995008: Holding GDP per capita and the population estimate constant, each one-dollar increase in the maximum weekly unemployment benefit is associated with a fall of about 0.0023 percentage points in the unemployment rate, or about 0.23 percentage points per $100. This runs counter to the argument of opponents of increasing unemployment benefits, who contend that unreasonably high benefits discourage the unemployed from seeking work and so raise unemployment. The t statistic for β3 is -2.034 and the p-value is 0.0496. Because the p-value is less than 0.05, the coefficient is statistically significant at the 5% level, although only marginally. It is, therefore, a statistically reliable measure of the change in the unemployment rate per dollar change in maximum weekly unemployment benefits.
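Because each coefficient is expressed per unit of its variable, the effects are easier to compare when scaled to a realistic change. This sketch holds the other two variables fixed and multiplies each slope by a step size; the step sizes are arbitrary illustrative choices, and the units (dollars and persons) are assumptions about how the data were recorded.

```python
# Coefficients from the fitted model and an illustrative step size for each variable.
# Units are assumptions: dollars for GDP per capita and benefits, persons for population.
effects_inputs = {
    "GDP per capita (+$10,000)": (0.0000126180611, 10_000),
    "Population (+1,000,000 residents)": (0.0000000243003, 1_000_000),
    "Max weekly benefit (+$100)": (-0.0023473995008, 100),
}

# In a linear model the change in the fitted rate is simply beta * step.
effects = {label: beta * step for label, (beta, step) in effects_inputs.items()}
for label, change in effects.items():
    print(f"{label}: {change:+.3f} percentage points")
```

On these scales, a $100 rise in the weekly benefit moves the fitted rate more than a $10,000 rise in GDP per capita.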
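β0 = 5.2614429200657: The p-value for the intercept is 0.0000000001989, so it is statistically significant at the 5% level. However, no state has zero GDP per capita, zero population, and zero unemployment benefits, so this point lies far outside the range of the sample data. β0 is best read as the intercept that positions the fitted equation, not as a meaningful prediction of the unemployment rate when all three independent variables are zero.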
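Each t statistic in this section is the estimate divided by its standard error, so the reported figures can be cross-checked. The sketch below recomputes t for β1 and β2 from the quoted estimates and standard errors, and backs out the standard error of β3 (not given in the text) from its reported t statistic. The critical value of 2.03 is the rounded two-sided 5% value of t with 35 degrees of freedom, taken from t tables.

```python
# Estimates, standard errors and t statistics as quoted in the text.
b1, se1 = 0.0000126180611, 0.0000061395002
b2, se2 = 0.0000000243003, 0.0000000175398
b3, t3 = -0.0023473995008, -2.0341607466435

# t = estimate / standard error tests H0: beta_j = 0.
t1 = b1 / se1
t2 = b2 / se2
# The standard error of beta_3 is not quoted, but it is implied by its t statistic.
se3 = b3 / t3

T_CRIT = 2.03  # two-sided 5% critical value of t with 35 df (rounded)
for name, t in (("beta_1", t1), ("beta_2", t2), ("beta_3", t3)):
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.3f} -> {verdict} at the 5% level")
print(f"implied SE of beta_3: {se3:.6f}")
```

The recomputed values match the reported 2.055 and 1.385, and β3 clears the critical value by only a small margin, consistent with its p-value of 0.0496.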
Residual analysis
The analysis of residuals indicates that they are consistent with a normal distribution, which supports the normality assumption of the linear regression model. The p-value above is more than 0.05, so we do not reject the null hypothesis that the skewness and excess kurtosis of the residuals are both zero.
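The text does not name the normality test, but a joint test that skewness and excess kurtosis are both zero is commonly carried out with the Jarque-Bera statistic, JB = n/6 · (S² + K²/4). The sketch below implements it with the standard library, assuming Jarque-Bera is the kind of test intended; the residuals are hypothetical placeholders, not the model's actual residuals. With only 39 observations the chi-square approximation is rough, so the p-value should be read as indicative.

```python
import math


def jarque_bera(residuals):
    """Return (skewness, excess kurtosis, JB statistic, approximate p-value).

    Uses moment-based (population) estimates of skewness and kurtosis. Under the
    null of normality JB is approximately chi-square with 2 degrees of freedom,
    whose upper-tail probability is exp(-JB / 2).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    return skewness, excess_kurtosis, jb, math.exp(-jb / 2)


# Hypothetical residuals (percentage points); replace with the 39 fitted residuals.
residuals = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.1, 0.4, 0.7, 1.2]
skew, kurt, jb, p = jarque_bera(residuals)
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}, JB={jb:.3f}, p={p:.3f}")
```

A p-value above 0.05 means the data give no evidence against normality; it does not prove the residuals are normal.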
The pattern exhibited by the residuals in the graphs above suggests that the residuals are normally distributed. This indicates that the normality condition for linear regression is met, so it is reasonable to use a linear regression model to estimate the unemployment rate. In addition, an initial check for multicollinearity indicates that there is no significant multicollinearity among the independent variables. The table below shows that the correlations between the independent variables are weak; the highest correlation coefficient is 0.26.
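Pairwise correlations are only a first check; variance inflation factors (VIFs) also account for a predictor being explained by the other two jointly. With three predictors, each VIF can be computed from the pairwise correlations alone. The individual correlations are in the table and are not reproduced here, so the sketch plugs in a hypothetical case in which all three correlations have magnitude 0.26, the reported maximum, with signs arranged to inflate the VIFs.

```python
def vifs_from_correlations(r12: float, r13: float, r23: float) -> tuple:
    """Variance inflation factors for three predictors from pairwise correlations.

    For predictor j regressed on the other two predictors k and l:
        R_j^2 = (r_jk^2 + r_jl^2 - 2 * r_jk * r_jl * r_kl) / (1 - r_kl^2)
        VIF_j = 1 / (1 - R_j^2)
    """
    def r_squared(r_jk: float, r_jl: float, r_kl: float) -> float:
        return (r_jk ** 2 + r_jl ** 2 - 2 * r_jk * r_jl * r_kl) / (1 - r_kl ** 2)

    return (
        1 / (1 - r_squared(r12, r13, r23)),  # X1 on X2 and X3
        1 / (1 - r_squared(r12, r23, r13)),  # X2 on X1 and X3
        1 / (1 - r_squared(r13, r23, r12)),  # X3 on X1 and X2
    )


# Hypothetical unfavourable case: every |r| equals the reported maximum of 0.26,
# with signs chosen so that the correlations inflate each R_j^2.
vifs = vifs_from_correlations(0.26, 0.26, -0.26)
print([round(v, 3) for v in vifs])
```

Even in this case the VIFs stay around 1.22, far below the common warning threshold of 10.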
Adjusting the model
As the tests above show, the model is statistically significant, and it satisfies the normality assumption with no evidence of serious multicollinearity. Its main weaknesses are the small coefficient of determination, the relatively large standard error of estimate, and one coefficient (β2) that is not statistically significant. The model should therefore be adjusted by incorporating more states into the analysis. Adding states will not by itself raise the coefficient of determination, since R² can rise or fall as observations are added; however, a larger sample will shrink the standard errors of the coefficients, which may make the population estimate statistically significant. If its significance does not improve, the model should be adjusted by eliminating the population estimate variable. The coefficient of determination is more directly improved by adding other variables that affect the unemployment rate, such as inflation rates, interest rates, and tax rates; because R² never falls when a variable is added, such additions should be judged by the adjusted R² and by the significance of the new coefficients.
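A back-of-the-envelope calculation shows how much extra data the population coefficient might need. This sketch assumes, strongly, that the estimate of β2 and the residual spread stay the same as states are added, so that the coefficient's standard error falls roughly in proportion to 1/√n; real additional data would also change the estimate itself. The critical value of 2.03 is the rounded 5% value for 35 degrees of freedom and shrinks slightly as n grows.

```python
import math

N_NOW = 39          # states in the current sample
T_BETA2 = 1.385     # current t statistic for the population estimate
T_CRIT = 2.03       # approx. two-sided 5% critical value of t (35 df)


def projected_t(t_now: float, n_now: int, n_new: int) -> float:
    """Rough projection: if the estimate and residual spread stay the same, the
    coefficient's standard error shrinks roughly like 1 / sqrt(n), so t grows like sqrt(n)."""
    return t_now * math.sqrt(n_new / n_now)


t_all_states = projected_t(T_BETA2, N_NOW, 50)
n_needed = N_NOW * (T_CRIT / T_BETA2) ** 2  # sample size at which t would reach T_CRIT
print(f"t for beta_2 with all 50 states: about {t_all_states:.2f}")
print(f"states needed for t to reach {T_CRIT}: about {math.ceil(n_needed)}")
```

Under these assumptions even all 50 states would leave t for β2 near 1.57, which is consistent with the fallback of eliminating the population estimate if its significance does not improve.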
Conclusion
The first-order model has a low coefficient of determination, indicating that most of the variation in the unemployment rate is not explained by GDP per capita, the population estimate, and maximum weekly unemployment benefits; most of it is associated with variables not included in the analysis. Two of the three slope coefficients (β1 and β3) are statistically significant, since their p-values are less than 0.05. β2 is not statistically significant and is therefore not a good predictor. As explained above, the standard error of estimate is high, so the actual unemployment rate is likely to deviate considerably from the fitted rate. The correlations among the independent variables are weak (at most 0.26), so their VIFs are small, indicating little multicollinearity. Finally, the model as a whole is statistically significant, since the p-value of its F statistic is less than 0.05.
For these reasons, the first-order model is not adequate for predicting the average unemployment rate. Although the model as a whole is statistically significant, its coefficient of determination is so small that relying on it would be unreasonable. Moreover, not all the coefficients are statistically significant, and the coefficient that is not (β2) may reduce the accuracy of the estimated unemployment rate. The standard error of estimate is also high: it exceeds 20% of some of the unemployment rates used in the analysis, so the actual unemployment rate could deviate substantially from the fitted rate if the model is used without modification. The first-order model should therefore be improved. To raise its coefficient of determination, other variables that affect unemployment, such as inflation rates and government spending, should be included in the model. In addition, more data points should be added by including figures for more states; a larger sample will reduce the standard errors of the coefficients, improving the precision and reliability of the estimates.
Works cited
Mendenhall, William, and Terry Sincich. A Second Course in Statistics. 7th ed. Upper Saddle River, N.J.: Prentice Hall, 2012. Print.
"Unemployment Rates for States." Bls.gov. N.p., 2016. Web. 1 May 2016.