Executive Summary
This paper assesses the determinants of earnings among workers in the Middle Atlantic region. The study uses a sample of 3,000 workers whose data were collected between 2003 and 2009. The average age of the participants was 42.41 years, and all of them were male. The participants had diverse demographic profiles in terms of race, educational background, and marital status. A preliminary analysis of the variables in the dataset suggested that wages are related to age, educational attainment, and race. A linear regression of wages on these three variables was then estimated. The model is significant overall at the 1% level, so age, educational attainment, and race are jointly significant determinants of wages in the Middle Atlantic region. Individually, the coefficients on age, on every education level, and on the Black race dummy are significant at the 1% level, whereas the coefficients for Asian and other races are not statistically significant. The model has an adjusted R-squared of 0.2606 and explains about 26 percent of the variation in wages (multiple R-squared 0.2626).
Description of the Data
The dataset comprises wages and demographic information for 3,000 workers from the Middle Atlantic region. It has 12 variables: wage, log of wage, year of recording, age, sex, marital status, race, education, region, job class, health, and health insurance.
Year indicates the year in which the data were collected; the observations span 2003 to 2009 (see Table 1). Age records each worker's age: the youngest worker was 18 years old, the oldest was 80, and the average was 42.41 years. Race is a categorical variable with four levels representing ethnic groups: White, Black, Asian, and Other. Whites were the majority at 2,480 (82.67%), while the Other category was the smallest at 37 (see Table 1). Sex is a categorical variable for gender; there were no women in the dataset, so all 3,000 observations are male. Marital status has five levels: never married, married, widowed, divorced, and separated. Married workers were the largest group at 2,074, and widowed workers the smallest at 19. Education records the level of educational attainment. Most workers were high school graduates (971), while those with less than a high school education were the fewest (268). Health is a two-level variable: 858 workers rated their health as good or below, and 2,142 rated it as very good or better. Health insurance indicates whether a worker has health insurance; 2,083 workers had insurance and 917 did not. Job class has two levels, industrial and information, with 1,544 and 1,456 workers respectively. Region indicates the workers' location; all workers were from the Middle Atlantic. Lastly, wage records the workers' earnings. The lowest wage was 20.09, the highest was 318.34, and the mean was 111.70 (see Table 1).
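The counts and percentages quoted above are simple proportions of the 3,000 observations, so they can be cross-checked against the summary output in the appendix. The Python sketch below does this; the `counts` dictionary and the `shares` helper are illustrative names, not part of the original analysis. It reproduces, for example, the 82.67% White share and reports the smallest level of each categorical variable.

```python
# Level counts copied from the summary output in Table 1 (n = 3,000 workers).
N = 3000

counts = {
    "race": {"White": 2480, "Black": 293, "Asian": 190, "Other": 37},
    "maritl": {"Never Married": 648, "Married": 2074, "Widowed": 19,
               "Divorced": 204, "Separated": 55},
    "education": {"< HS Grad": 268, "HS Grad": 971, "Some College": 650,
                  "College Grad": 685, "Advanced Degree": 426},
    "health": {"<=Good": 858, ">=Very Good": 2142},
    "health_ins": {"Yes": 2083, "No": 917},
    "jobclass": {"Industrial": 1544, "Information": 1456},
}


def shares(levels: dict[str, int], total: int = N) -> dict[str, float]:
    """Return each level's share of the sample as a percentage (2 d.p.)."""
    # Each worker falls in exactly one level, so the counts must sum to the total.
    if sum(levels.values()) != total:
        raise ValueError("level counts do not sum to the sample size")
    return {name: round(100 * n / total, 2) for name, n in levels.items()}


for variable, levels in counts.items():
    smallest = min(levels, key=levels.get)
    print(f"{variable}: {shares(levels)} (smallest: {smallest})")
```

Because `shares` refuses counts that do not add up to 3,000, it also catches transcription errors such as a mistyped category count.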
Earnings vs Age
A scatter plot of wages against age does not reveal a clear relationship; the data points are scattered without any obvious pattern (see Figure 1).
Figure 1
Earnings vs Race
A scatter plot of earnings against race shows that earnings differ across ethnic groups: the four race levels have different mean wages and different spreads of wages (see Fig. 2).
Figure 2
Earnings vs Schooling
A scatter plot of earnings against education shows that earnings differ between groups with different levels of educational attainment. Both the average wage and the spread of wages increase as the education level rises (see Fig. 3).
Figure 3
Earnings vs Job Class
A scatter plot of earnings against job class shows that earnings differ between the two job classes. Information workers have a higher average wage and a wider spread of wages than industrial workers (see Fig. 4).
Figure 4
Linear Regression
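The following regression model was estimated:
wage = β0 + β1·age + β2·race + β3·education + ε
Here β0 is the intercept, and race and education enter as sets of dummy variables (with White and < HS Grad as the reference categories), so β2 and β3 each stand for a vector of dummy coefficients.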
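Because race and education enter through dummy variables, evaluating the fitted equation for a particular worker amounts to adding the intercept, the age term, and one coefficient for each factor. The sketch below plugs in the estimates from the `lm(formula = wage ~ age + race + education)` output in the appendix; the 42-year-old Black high school graduate is a hypothetical profile chosen only for illustration.

```python
# Coefficient estimates copied from the lm(wage ~ age + race + education) output
# in the appendix. White and "< HS Grad" are the reference levels, so they get 0.
INTERCEPT = 61.62468
AGE = 0.57108
RACE = {"White": 0.0, "Black": -7.25262, "Asian": -3.27335, "Other": -8.92064}
EDUCATION = {
    "< HS Grad": 0.0,
    "HS Grad": 11.05757,
    "Some College": 24.00785,
    "College Grad": 39.15413,
    "Advanced Degree": 64.51917,
}


def predicted_wage(age: float, race: str, education: str) -> float:
    """Fitted wage: intercept + slope * age + one dummy coefficient per factor."""
    return INTERCEPT + AGE * age + RACE[race] + EDUCATION[education]


# Hypothetical profile: a 42-year-old Black high school graduate.
print(round(predicted_wage(42, "Black", "HS Grad"), 2))  # → 89.41
```

Comparing two profiles that differ in one factor only isolates that factor's coefficient; for instance, a college graduate is predicted to earn 39.15 − 11.06 ≈ 28.10 more than an otherwise identical high school graduate.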
The model has an adjusted R-squared of 0.2606, and its multiple R-squared of 0.2626 means that it explains about 26 percent of the variation in wages. The F-statistic (133.1 on 8 and 2991 degrees of freedom) is significant at the 1% level, so the independent variables are jointly significant.
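Both fit statistics follow from the multiple R-squared, the sample size, and the number of slope coefficients (8: age, three race dummies, and four education dummies). The sketch below recomputes them with the standard formulas; because the printed R-squared is rounded to four decimals, the recomputed F statistic agrees with the output only to about one decimal place.

```python
# Overall-fit statistics for wage ~ age + race + education (appendix output).
R2 = 0.2626  # multiple R-squared as printed (rounded to 4 d.p.)
N = 3000     # observations
P = 8        # slope coefficients: age + 3 race dummies + 4 education dummies


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """R-squared penalised for the number of regressors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2: float, n: int, p: int) -> float:
    """F statistic for H0: all p slope coefficients are zero."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


print(round(adjusted_r2(R2, N, P), 4))  # → 0.2606
print(round(overall_f(R2, N, P), 1))    # → 133.1 (8 and 2991 DF)
```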
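The coefficient on age is statistically significant: holding race and education constant, one additional year of age is associated with wages about 0.57 higher. Among the race dummies, only the coefficient for Black workers (race 2) is significant at the 1% level; Black workers earn about 7.25 less than otherwise comparable White workers. The coefficients for Asian (race 3) and other races (race 4) are negative but not statistically significant (p = 0.232 and p = 0.135). Every education level from high school graduate upward has a significant positive coefficient relative to the < HS Grad baseline, and the premium rises with attainment, from 11.06 for high school graduates to 64.52 for an advanced degree.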
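The significance statements above can be read off the t ratios (estimate divided by standard error). The sketch below recomputes them for the three race dummies using the estimates and standard errors from the appendix output. It substitutes a normal approximation for the t distribution, an assumption that is harmless with 2,991 residual degrees of freedom, so its p-values match the R output to about two significant figures.

```python
import math

# (estimate, standard error) pairs for the race dummies, from the appendix output.
RACE_COEFS = {
    "Black": (-7.25262, 2.22944),
    "Asian": (-3.27335, 2.73797),
    "Other": (-8.92064, 5.97167),
}


def two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), where Phi is the normal CDF.
    """
    return math.erfc(abs(t) / math.sqrt(2))


for level, (est, se) in RACE_COEFS.items():
    t = est / se
    p = two_sided_p(t)
    verdict = "significant" if p < 0.01 else "not significant"
    print(f"{level}: t = {t:.3f}, p ~ {p:.4f} -> {verdict} at 1%")
```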
Contrast
LDA Results
The results show that a linear model specification is appropriate for the dataset.
QDA Results
The results show that a quadratic specification of the model is inappropriate.
Validation
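The validation checks how well the model fits the data by re-estimating it with job class added as a regressor. The output is presented below. Adding job class raises the adjusted R-squared slightly, from 0.2606 to 0.2637, and the coefficient for information workers (5.09) is significant at the 1% level.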
Call:
lm(formula = wage ~ age + race + education + jobclass, data = Wage)
Residuals:
Min 1Q Median 3Q Max
-107.186 -19.815 -3.531 14.759 224.210
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.84551 3.27214 18.595 < 2e-16 ***
age 0.55703 0.05723 9.734 < 2e-16 ***
race2. Black -8.20782 2.23996 -3.664 0.000252 ***
race3. Asian -3.23645 2.73229 -1.185 0.236302
race4. Other -9.11680 5.95949 -1.530 0.126173
education2. HS Grad 10.77690 2.47857 4.348 1.42e-05 ***
education3. Some College 23.08406 2.61947 8.812 < 2e-16 ***
education4. College Grad 37.53070 2.63241 14.257 < 2e-16 ***
education5. Advanced Degree 62.10791 2.89343 21.465 < 2e-16 ***
jobclass2. Information 5.09389 1.38739 3.672 0.000245 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 35.81 on 2990 degrees of freedom
Multiple R-squared: 0.2659, Adjusted R-squared: 0.2637
F-statistic: 120.3 on 9 and 2990 DF, p-value: < 2.2e-16
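Because this model only adds job class to the original specification, the two outputs can be compared with a nested-model (partial) F test. The sketch below computes it from the printed R-squared values; it is an illustration and not part of the reported output. With a single added dummy, the partial F statistic should equal the square of that dummy's t value, and it does up to the rounding of R-squared.

```python
# R-squared of the two nested models, as printed in the regression outputs.
R2_RESTRICTED = 0.2626  # wage ~ age + race + education
R2_FULL = 0.2659        # wage ~ age + race + education + jobclass
DF_FULL = 2990          # residual degrees of freedom of the larger model
Q = 1                   # number of added coefficients (one jobclass dummy)


def partial_f(r2_full: float, r2_restricted: float, q: int, df_full: int) -> float:
    """F statistic for H0: the q added coefficients are all zero."""
    return ((r2_full - r2_restricted) / q) / ((1 - r2_full) / df_full)


f = partial_f(R2_FULL, R2_RESTRICTED, Q, DF_FULL)
t_jobclass = 3.672  # t value of "jobclass2. Information" in the output
print(round(f, 2), round(t_jobclass ** 2, 2))  # → 13.44 13.48
```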
Appendix
List of Tables
year            age             sex             maritl
Min.   :2003    Min.   :18.00   1. Male  :3000  1. Never Married: 648
1st Qu.:2004    1st Qu.:33.75   2. Female:   0  2. Married      :2074
Median :2006    Median :42.00                   3. Widowed      :  19
Mean   :2006    Mean   :42.41                   4. Divorced     : 204
3rd Qu.:2008    3rd Qu.:51.00                   5. Separated    :  55
Max.   :2009    Max.   :80.00

race            education                 region
1. White:2480   1. < HS Grad      : 268   2. Middle Atlantic   :3000
2. Black: 293   2. HS Grad        : 971   1. New England       :   0
3. Asian: 190   3. Some College   : 650   3. East North Central:   0
4. Other:  37   4. College Grad   : 685   4. West North Central:   0
                5. Advanced Degree: 426   5. South Atlantic    :   0
                                          6. East South Central:   0
                                          (Other)              :   0

jobclass              health                health_ins   logwage
1. Industrial :1544   1. <=Good     : 858   1. Yes:2083  Min.   :3.000
2. Information:1456   2. >=Very Good:2142   2. No : 917  1st Qu.:4.447
                                                         Median :4.653
                                                         Mean   :4.654
                                                         3rd Qu.:4.857
                                                         Max.   :5.763

wage
Min.   : 20.09
1st Qu.: 85.38
Median :104.92
Mean   :111.70
3rd Qu.:128.68
Max.   :318.34
Call:
lm(formula = wage ~ age + race + education)
Residuals:
Min 1Q Median 3Q Max
-110.952 -19.550 -3.633 14.793 226.643
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 61.62468 3.27206 18.834 < 2e-16 ***
age 0.57108 0.05722 9.981 < 2e-16 ***
race2. Black -7.25262 2.22944 -3.253 0.00115 **
race3. Asian -3.27335 2.73797 -1.196 0.23197
race4. Other -8.92064 5.97167 -1.494 0.13533
education2. HS Grad 11.05757 2.48255 4.454 8.73e-06 ***
education3. Some College 24.00785 2.61280 9.189 < 2e-16 ***
education4. College Grad 39.15413 2.60041 15.057 < 2e-16 ***
education5. Advanced Degree 64.51917 2.82378 22.848 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 35.88 on 2991 degrees of freedom
Multiple R-squared: 0.2626, Adjusted R-squared: 0.2606
F-statistic: 133.1 on 8 and 2991 DF, p-value: < 2.2e-16
                     2. HS Grad  3. Some College  4. College Grad  5. Advanced Degree
1. < HS Grad                  0                0                0                   0
2. HS Grad                    1                0                0                   0
3. Some College               0                1                0                   0
4. College Grad               0                0                1                   0
5. Advanced Degree            0                0                0                   1

               2. Information
1. Industrial               0
2. Information              1
Table 6
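The coding in Table 6 is treatment (reference-category) coding: the first level of each factor is the baseline, and every other level gets its own 0/1 column. This is why the regression output reports no coefficient for White, < HS Grad, or Industrial. The Python sketch below reproduces the layout; it is written to mirror what R's default `contr.treatment` contrasts produce, but it is an independent illustration, and the helper name `treatment_contrasts` is hypothetical.

```python
EDUCATION_LEVELS = ["1. < HS Grad", "2. HS Grad", "3. Some College",
                    "4. College Grad", "5. Advanced Degree"]


def treatment_contrasts(levels: list[str]) -> dict[str, list[int]]:
    """Dummy (treatment) coding with the first level as the reference.

    Each non-reference level gets its own 0/1 column, so the reference row is
    all zeros and every other row has a single 1 in its own column.
    """
    columns = levels[1:]
    return {lvl: [int(lvl == col) for col in columns] for lvl in levels}


for level, row in treatment_contrasts(EDUCATION_LEVELS).items():
    print(f"{level:<20}", *row)
```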