Introduction
In this paper, we apply statistics and probability theory to a real-world situation: we select a data set and analyze it with one of the statistical procedures covered in this course.
The data set was taken from R. Weintraub (1962), "The Birth Rate and Economic Development: An Empirical Study", Econometrica, Vol. 30, No. 4, pp. 812-817. It records the birth rate, per capita income, proportion of the population in farming, and infant mortality for 30 nations during the early 1950s.
The purpose of this project is to apply correlation and linear regression techniques to this data set and to determine which factors have a significant impact on the birth rate.
Methods
The data set consists of 30 observations, one per nation. Each observation records the birth rate, per capita income, proportion of the population in farming, and infant mortality during the early 1950s. The birth rate is the dependent variable; the other three variables are the independent variables. The first step of the analysis is to calculate the correlation coefficient between the dependent variable and each independent variable and to test whether it differs significantly from zero.
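The correlation coefficient in this step is Pearson's r: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The paper does not say which software computed it, so the following is only a minimal from-scratch sketch; the function name `pearson_r` and the toy inputs are hypothetical, not the Weintraub data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Toy example (hypothetical numbers): a perfect positive linear relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
```

In the paper, this computation would be repeated three times: birth rate against per capita income, against the proportion in farming, and against infant mortality.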
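For each of the three independent variables, the null hypothesis is:
H0: ρ = 0
and the alternative hypothesis is:
Ha: ρ ≠ 0
where ρ is the population correlation coefficient between the birth rate and that variable.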
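A standard way to test H0: ρ = 0 uses the statistic t = r·√(n - 2) / √(1 - r²). When the null hypothesis holds, this statistic follows a t distribution with n - 2 degrees of freedom. The paper relies on a table of critical values for r, which is equivalent. The Python sketch below only computes the t statistic, and the helper name `correlation_t_stat` is hypothetical.

```python
from math import sqrt


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Illustrative value (not from the paper): r = 0.5 with n = 30, df = 28.
print(round(correlation_t_stat(0.5, 30), 3))  # prints 3.055
```

The result would then be compared with the two-tailed critical value of t for 28 degrees of freedom, which is about 2.048 at α = 0.05.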
The next step is a multiple linear regression analysis. We fit a model of the form:
y=b0+b1x1+b2x2+b3x3
where y is the birth rate, x1 is per capita income, x2 is the proportion of the population in farming and x3 is infant mortality.
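The coefficients b0 to b3 are ordinarily estimated by ordinary least squares, that is, by solving the normal equations (XᵀX)b = Xᵀy. Here X holds a column of ones followed by the columns x1, x2, x3. The paper does not name the software it used. The pure-Python sketch below uses a hypothetical function `ols_fit` that applies Gauss-Jordan elimination to the normal equations, and is meant only to show the computation. Statistical packages usually rely on a more stable matrix factorization such as QR.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # Prepend a column of ones so that b0 is the intercept.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y] of the normal equations.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    # After elimination the last column holds the solution b.
    return [aug[i][p] for i in range(p)]


# Synthetic (hypothetical) data generated exactly from y = 1 + 2*x1 - 0.5*x2 + 3*x3.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0], [0, 2, 3]]
y = [1 + 2 * x1 - 0.5 * x2 + 3 * x3 for x1, x2, x3 in X]
print([round(b, 6) for b in ols_fit(X, y)])  # prints [1.0, 2.0, -0.5, 3.0]
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients exactly. With the real 30 observations, the residuals would be non-zero.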
For the overall (ANOVA) F-test of the regression, the null hypothesis is:
H0: b1 = b2 = b3 = 0
and the alternative hypothesis is:
Ha: not all of b1, b2, b3 are equal to 0
The significance level for each test is set at α = 0.05.
Results
The correlation matrix is given in the table below:
There is a moderate negative linear relationship between the birth rate and per capita income (r = -0.419). There is a moderate positive linear relationship between the birth rate and the proportion of the population in farming (r = 0.480), and a strong positive linear relationship between the birth rate and infant mortality (r = 0.664). The table of critical values for Pearson's correlation coefficient gives a two-tailed critical value of about 0.361 for n = 30 at α = 0.05. All three coefficients exceed this value in absolute terms and are therefore significant (p < 0.05). Hence, the null hypothesis of zero correlation is rejected for each of the three variables.
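The critical-value table can be reproduced from the t distribution. With n - 2 degrees of freedom, the critical correlation is r_crit = t_crit / √(t_crit² + n - 2). The sketch below plugs in n = 30 and the two-tailed value t_crit ≈ 2.048 for 28 degrees of freedom at α = 0.05, taken from a standard t table rather than from the paper. It then checks the three reported coefficients against the result. The name `critical_r` is hypothetical.

```python
from math import sqrt

N = 30          # number of nations in the data set
T_CRIT = 2.048  # two-tailed t critical value, alpha = 0.05, df = 28 (standard t table)


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that is significant for the given t critical value and df."""
    return t_crit / sqrt(t_crit ** 2 + df)


# Correlations with the birth rate as reported in the Results section.
reported = {
    "per capita income": -0.419,
    "proportion in farming": 0.480,
    "infant mortality": 0.664,
}

r_crit = critical_r(T_CRIT, N - 2)
print(f"critical |r| = {r_crit:.3f}")
for name, r in reported.items():
    verdict = "significant" if abs(r) > r_crit else "not significant"
    print(f"{name}: r = {r:+.3f} -> {verdict}")
```

Each reported |r| exceeds roughly 0.361, which agrees with the conclusion that all three correlations are significant at the 0.05 level.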
The regression analysis output is given in the table below:
The regression equation is:
y=5.554+0.007x1+9.105x2+0.243x3
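To make the fitted equation concrete, the sketch below evaluates it for one set of predictor values. The inputs are purely hypothetical. The sketch also assumes that x1, x2 and x3 are measured on the same scales as in the original data set, since the units are not stated here. The function name `predicted_birth_rate` is likewise illustrative.

```python
# Coefficients of the fitted equation reported in the Results section.
B0, B1, B2, B3 = 5.554, 0.007, 9.105, 0.243


def predicted_birth_rate(income: float, farming: float, infant_mortality: float) -> float:
    """Evaluate y = b0 + b1*x1 + b2*x2 + b3*x3 for one nation."""
    return B0 + B1 * income + B2 * farming + B3 * infant_mortality


# Hypothetical inputs, not an observation from the data set.
print(round(predicted_birth_rate(500, 0.5, 50), 4))  # prints 25.7565
```

The coefficient on x1 is positive even though per capita income is negatively correlated with the birth rate on its own. In a multiple regression, each coefficient measures the association with one predictor while the others are held fixed, so its sign need not match the simple correlation. This particular coefficient is also not significant.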
The ANOVA results indicate that the coefficients are jointly significant (F = 7.525, p < 0.001), so we reject the null hypothesis that b1 = b2 = b3 = 0. The coefficient of determination (R-squared) shows that the model explains approximately 46.47% of the variability in the birth rate. Individually, only the infant mortality coefficient is significantly different from 0 (t = 3.332, p < 0.05). The coefficients of per capita income and the proportion of the population in farming are not significantly different from 0.
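The reported F statistic and R-squared are linked by F = (R²/k) / ((1 - R²)/(n - k - 1)), where k = 3 is the number of predictors and n = 30. The short check below recomputes F from the rounded value R² = 0.4647. It is a consistency check on the reported figures, not part of the original analysis. The function name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r2: float, n: int, k: int) -> float:
    """Overall regression F statistic with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


F = f_from_r_squared(0.4647, n=30, k=3)
print(round(F, 2))  # prints 7.52
```

The result matches the reported F = 7.525 up to the rounding of R².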
Discussion
In this paper, we applied correlation and regression analysis to examine which factors have a significant impact on the birth rate. Each of the three factors is associated with the birth rate (moderately or strongly correlated). However, not all of them are significant in the multiple regression model. Only infant mortality is a significant predictor of the birth rate in these countries once the other two variables are controlled for. Neither per capita income nor the proportion of the population in farming is significant in the model.
Appendix