Why did you choose this topic? What are you trying to find out?
Any training or coaching program that prepares good sportsmen has certain criteria for selecting trainees. Of course, performance in games already played is one criterion that can demonstrate a player's ability. But if the selection is made among younger people who have had little chance to play in tournaments, it has to be based on other characteristics. In the present study we try to find out whether the points a person scores in a game depend on that person's physical characteristics, such as height and weight. If we find a significant relationship between height and weight on the one hand and points scored on the other, then height and weight could serve as criteria for selecting players for school or college teams, for inter-school or inter-college tournaments, or even for bigger tournaments.
We have collected data on the average points scored per game and the height and weight of 54 players. We intend to conduct a regression analysis to understand whether height and weight affect the points a player scores. Our model is presented below:
X3 = b0 + b1X1 + b2X2
Where,
X1 = height in feet
X2 = weight in pounds
X3 = average points scored per game
We are going to estimate b0, b1 and b2 from the regression.
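The coefficients b0, b1 and b2 would normally be estimated by ordinary least squares (OLS). The sketch below assumes OLS and uses only the Python standard library. It solves the normal equations (X'X)b = X'y for the model X3 = b0 + b1X1 + b2X2. The helper names `fit_ols` and `solve_linear_system` are hypothetical and do not come from any library. The data would be passed in as three equal-length lists.

```python
# Minimal OLS sketch for X3 = b0 + b1*X1 + b2*X2 via the normal equations.
# Pure standard-library Python; hypothetical helper names, not a library API.
from typing import Sequence


def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> list[float]:
    """Return [b0, b1, b2] minimizing the sum of squared residuals."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with an intercept column
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]  # X'X
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]            # X'y
    return solve_linear_system(xtx, xty)
```

For example, `fit_ols(height, weight, points)` would return the three estimates when called on the collected data. The variable names `height`, `weight` and `points` are illustrative placeholders.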
What do you think the model will predict? (What do you expect the “b” value for each variable to be: positive or negative? Why?)
Casual observation suggests that both height and weight should positively affect game scores, so both b1 and b2 should be positive. A positive coefficient means that when the independent variable increases, the dependent variable also increases. Players in better physical condition, with more physical strength and a bigger build, are believed to perform better. Since height and weight are indicators of physical condition, we have chosen these two variables to study how game scores are affected by physical condition.
What are the coefficient parameters (b’s) and what do they mean?
We have already presented the regression model. Let us now present the estimated model:
X3 = 22.29 - 2.57X1 + 0.03X2
(1.59) (-0.79) (0.62)
The figures in parentheses are the t-statistics of the coefficients.
We can see that X1 (height) has a negative coefficient. Holding weight constant, each additional foot of height is associated with about 2.57 fewer points per game. So taller players would have lower average scores than shorter players of the same weight. The coefficient of X2 is positive but small. Holding height constant, each additional pound is associated with only about 0.03 more points per game. Thus weight influences game scores positively, but only to a small extent.
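The partial effects described above can be checked with simple arithmetic on the estimated equation. The sketch below plugs the reported coefficients into a prediction function. The height and weight inputs (6 to 7 feet, 220 to 230 pounds) are arbitrary illustrative values, not observations from the data set.

```python
# Reported estimates from the fitted equation: intercept, height (ft), weight (lb).
B0, B1, B2 = 22.29, -2.57, 0.03


def predicted_points(height_ft: float, weight_lb: float) -> float:
    """Predicted average points per game from the estimated equation."""
    return B0 + B1 * height_ft + B2 * weight_lb


# One extra foot of height with weight held fixed changes the prediction by B1.
height_effect = predicted_points(7.0, 220.0) - predicted_points(6.0, 220.0)
# Ten extra pounds with height held fixed changes the prediction by 10 * B2.
weight_effect = predicted_points(6.5, 230.0) - predicted_points(6.5, 220.0)
print(round(height_effect, 2), round(weight_effect, 2))  # -2.57 0.3
```

Because the model is linear, the starting values cancel out. Any pair of inputs that differ by one foot (or ten pounds) gives the same difference.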
Which of these coefficient parameters are individually significant? Are they economically significant?
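The t-statistic of each coefficient is shown in the estimated equation. None of the coefficients is statistically significant at the 5% level, because every t-statistic is smaller than 2 in absolute value. (The two-tailed 5% critical value with 54 - 3 = 51 degrees of freedom is about 2.01.)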
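Because t = b / se, the standard errors and 95% confidence intervals can be backed out from the reported figures. The sketch below does this. Two assumptions apply: the critical value 2.01 is an approximate table value for 51 degrees of freedom, and the results are only approximate because the reported coefficients and t-statistics are rounded. A coefficient is significant exactly when its interval excludes zero.

```python
# Reported (coefficient, t-statistic) pairs from the estimated equation.
estimates = {
    "b0 (intercept)": (22.29, 1.59),
    "b1 (height)": (-2.57, -0.79),
    "b2 (weight)": (0.03, 0.62),
}
T_CRIT = 2.01  # approx. two-tailed 5% critical t value with 54 - 3 = 51 df

results = {}
for name, (b, t) in estimates.items():
    se = b / t  # from t = b / se; positive because b and t share a sign
    low, high = b - T_CRIT * se, b + T_CRIT * se
    significant = abs(t) > T_CRIT  # same as "the interval excludes 0"
    results[name] = significant
    print(f"{name}: se={se:.3f}, 95% CI=({low:.2f}, {high:.2f}), significant={significant}")
```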
Since the coefficients are not statistically significant, we cannot give them an economic interpretation: we cannot rule out that their true values are zero. For selection policy, the main implication is that players should not be chosen by looking at their height and weight. In this sample, these two variables do not indicate player performance.
How well do the predictor variables explain the dependent variable?
The value of R square is 0.01, which is very low. Height and weight together explain only 1% of the variation in average points scored per game. Thus the model is not suitable as a predictor.
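R square is the share of the total variation in the dependent variable that the fitted values account for. The sketch below computes it from the actual and fitted values. The function name `r_squared` is hypothetical. The sketch also shows that an R square of 0.01 corresponds to 1%, not 0.01%.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y)             # total variation around the mean
    return 1 - ss_res / ss_tot


r2 = 0.01
print(f"{r2:.0%} of the variation explained")  # 1% of the variation explained
```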
Is the model significant as a whole? How do you know?
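The model is not significant as a whole. The p-value of the F-statistic is greater than the 5% significance level. Equivalently, the F-statistic is smaller than its 5% critical value, which is about 3.18 with 2 and 51 degrees of freedom.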
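The overall F-statistic can be recovered from R square, the number of observations n and the number of slope coefficients k, using F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below assumes n = 54 and k = 2 from the study. It treats 3.18 as an approximate table value for F with (2, 51) degrees of freedom. Since the reported R square is rounded, the result is only approximate.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F-statistic for a regression with k slope coefficients and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


F_CRIT = 3.18  # approx. 5% critical value of F with (2, 51) degrees of freedom
f_stat = f_from_r2(0.01, n=54, k=2)
print(round(f_stat, 2), f_stat > F_CRIT)  # 0.26 False
```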
Did the model and the signs and values of the parameter agree with your prediction? If no, why do you think it didn’t? (Was there perhaps a violation of the 5 assumptions?)
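The sign of the height coefficient did not match our expectation. The weight coefficient had the expected positive sign, but neither coefficient is statistically significant. It seems that the choice of independent variables was wrong, since height and weight do not appear to determine game scores. Another problem is that height and weight are likely correlated with each other, so the model may suffer from multicollinearity. Multicollinearity inflates the standard errors of the coefficients and lowers their t-statistics.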
How do you think you might improve your model? Be specific. If you were to add another variable in there, which one would it be and why?
Since the model may suffer from multicollinearity, the dependent variable could be regressed on each independent variable separately. Other variables should also be used as independent variables, such as indicators of stamina, running speed, and other measures of health and physical strength.
How do you think you might improve your research next time?
We should include data from different groups of people, in different areas and age groups. Moreover, a number of other indices and variables should be used. We should also test the model for autocorrelation and heteroscedasticity.
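One common test for first-order autocorrelation of the residuals is the Durbin-Watson statistic, which is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no autocorrelation. Values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below is a minimal version that assumes the residuals are given in a meaningful order. For cross-sectional data like a list of players, the result depends on how the players are ordered. The function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```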
Submit the actual data set as well (and tell me the link where you got the data)