Abstract
Annual wages of employees are often studied in relation to several independent variables in order to identify the factors that most affect their variation. This paper attempts to analyze and establish the relationship between annual wages and the number of years of experience. Annual wages are considered to be a function of the number of years of experience, and a positive correlation is assumed. The relationship between the two variables is tested using linear regression analysis, with annual wages as the dependent variable and the number of years of experience as the independent variable.
1. Introduction
As the number of years of experience increases, the salary earned by professionals usually shows an upward trend. Some notions in the professional world exist without support from data, and the aim of statistical research is to either accept or reject such notions. Several studies have tested the association of wages, hourly or annual, with variables such as education, number of years of education, gender, occupational status and race (Earle, 2010; Zhang, 2004). The direct relationship between annual wages and the number of years of experience has not been analyzed, although there are sound reasons to expect the two to be intrinsically related.
This relationship has not been tested for positive correlation, nor is there evidence of a positive relationship between these two factors. Therefore, this research focuses on two variables: annual wage in dollars and number of years of experience. The natural expectation is a positive relationship between the variables, i.e., as the years of experience increase, the wage also increases. However, the results may show no relationship at all. To assess the relationship, the variables have to be tested statistically.
Linear regression can be used to model annual wages as a function of the number of years of experience. This paper uses linear regression analysis to examine statistically the nature and degree of the relationship between the two variables, annual wage and years of experience. Dedicated statistical packages are available for linear regression; however, this paper produces the graph and descriptive tables in MS Excel, which provides the required tools.
2. Purpose Statement
The research question addressed in this paper is whether the given data support the claim that there exists a positive linear relationship between annual wage in dollars and the number of years of experience, so that variations in annual wages can be explained by variations in the number of years of experience.
3. Model Development
Dependent Variable – annual wage is the dependent variable. It is treated as the dependent variable because an individual's salary results from several other factors, such as job performance, role and duties.
Independent Variable – the number of years of employment (experience), i.e., the number of years for which an individual has worked, is the independent variable. It is treated as independent in this paper because the number of years of employment is not determined by the annual wage.
4. Data Collection
The data used to develop the regression model are taken from a dataset for statistical studies. They comprise cross-sectional observations on two variables, annual wage and number of years of experience, for 100 professionals. The data used for this study are presented in Appendix 1.
5. Methodology
To address the research question, a simple linear regression model of the form Y = α + β X is estimated, where Y denotes the annual wage in dollars and X denotes the number of years of experience. Simple linear regression describes the linear relation between two variables, one independent and one dependent; practical forecasting problems very often require more than one predictor variable. Essentially, simple linear regression summarizes the relationship between the dependent and independent variables, shown graphically in the scatter plot, with a single straight line. The regression procedure chooses the line that produces the least error (the smallest sum of squared errors) in predicting the dependent variable from observations of the independent variable (Weiers, 2010).
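The least-squares line described above has a closed-form solution: the slope is the ratio of the centred cross-products of X and Y to the centred squares of X, and the intercept makes the line pass through the point of means. The following is a minimal Python sketch under that assumption; the function name fit_simple_ols and the toy data are hypothetical and do not reproduce the Excel procedure or the study's data.

```python
def fit_simple_ols(x, y):
    """Return least-squares estimates (alpha, beta) for Y = alpha + beta * X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred cross-products of X and Y, and centred squares of X
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    beta = s_xy / s_xx            # slope minimising the sum of squared errors
    alpha = y_bar - beta * x_bar  # the fitted line passes through (x_bar, y_bar)
    return alpha, beta


# Toy data lying exactly on Y = 1 + 2X (illustrative only)
alpha, beta = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta)  # 1.0 2.0
```

Applied to the 100 observations in Appendix 1, the same formulas give the intercept and slope that Excel reports.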
The explanatory power and the statistical significance of the estimated model are then determined. The explanatory power of the model is examined using the coefficient of determination (R²), and the statistical significance using either the F test or the t test. The difference between more experienced and less experienced workers is examined using a t-test for independent samples.
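For a simple regression, both measures named above come from the same sums of squares: R² is the share of the total sum of squares explained by the regression, F compares the regression and residual mean squares, and the slope's t statistic is its estimate divided by its standard error. The following is a minimal Python sketch, assuming plain lists of observations; regression_significance is a hypothetical helper, and the data in the example are made up rather than taken from the study.

```python
import math


def regression_significance(x, y):
    """Return (R^2, F, t) for the simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar

    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual

    r_squared = ss_regression / ss_total
    ms_residual = ss_residual / (n - 2)         # residual df = n - 2
    f_stat = (ss_regression / 1) / ms_residual  # regression df = 1
    se_beta = math.sqrt(ms_residual / s_xx)     # standard error of the slope
    t_stat = beta / se_beta
    return r_squared, f_stat, t_stat


r2, f_stat, t_stat = regression_significance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(f_stat, 3), round(t_stat, 3))  # 0.6 4.5 2.121
```

In simple regression t² equals F (here 2.121² ≈ 4.5), which is why the F test and the t test on β are equivalent in this setting when the alternative for β is two-sided.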
5.1 Hypothesis
The null hypothesis to be tested is that there is no statistically significant linear relationship between annual wages in dollars and the number of years of experience, against the alternative hypothesis that there exists a statistically significant linear relationship between annual wage in dollars and the number of years of experience.
Since this is a simple linear regression problem, the hypothesis can be tested either by testing the statistical significance of R² or by testing the statistical significance of β. Thus, two equivalent statistical tests can be performed.
Test 1: H0: R² = 0 against H1: R² > 0
This hypothesis can be tested by using the F statistic.
Test 2: H0: β = 0 against H1: β > 0
This hypothesis can be tested by using the t statistic.
For the t-test for independent samples, the hypotheses are:
H0: More experienced workers do not get higher wages.
H1: More experienced workers get higher wages.
6. Results
The data on the independent and dependent variables are presented graphically in a scatter plot.
Descriptive Statistics of the Variables
Statistic             Annual Wage in dollars (Y)    Years of Experience (X)
Mean                  30833.46                      20.38
Standard Error        1694.71                       1.35
Standard Deviation    16947.10                      13.55
Sample Variance       287204105.89                  183.59
Kurtosis              1.66                          -0.82
Skewness              1.34                          0.48
Range                 73722.00                      54.00
Minimum               9879.00                       0.00
Maximum               83601.00                      54.00
Sum                   3083346.00                    2038.00
Count                 100                           100
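Several entries in the table are linked: the standard error is the standard deviation divided by √n, and the sample variance is the square of the standard deviation. The following is a minimal Python sketch of Excel-style descriptive statistics, assuming the bias-corrected skewness and kurtosis formulas documented for Excel's SKEW and KURT functions; the function describe is hypothetical and the small example list is not the study's data.

```python
import math
import statistics


def describe(values):
    """Excel-style descriptive statistics for a sample (illustrative sketch)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, divisor n - 1
    z = [(v - mean) / sd for v in values]  # standardised values
    # Bias-corrected skewness and excess kurtosis (Excel SKEW / KURT formulas)
    skew = n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(zi ** 4 for zi in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),  # SE of the mean = SD / sqrt(n)
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }


stats = describe([1, 2, 3, 4, 5])

# Consistency check on the reported wage figures: SE = SD / sqrt(n), n = 100
reported_se = 16947.09727 / math.sqrt(100)  # close to the reported 1694.709727
```

Running describe on the wage and experience columns of Appendix 1 should reproduce the table above up to rounding.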
Regression Results:
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.0705
R Square              0.0050
Adjusted R Square     -0.0052
Standard Error        16990.91
Observations          100

ANOVA
              df    SS                MS              F       Significance F
Regression    1     141474986.11      141474986.11    0.49    0.49
Residual      98    28291731496.73    288691137.72
Total         99    28433206482.84

                           Coefficients    Standard Error    t Stat    P-value
Intercept                  29035.42        3079.61           9.43      0.00
Years of Experience (X)    88.23           126.03            0.70      0.49

t-Test: Two-Sample Assuming Equal Variances
                                  Variable 1      Variable 2
Mean                              27.14705882     5.870967742
Variance                          122.1273047     8.182795699
Observations                      68              31
Pooled Variance                   86.88673487
Hypothesized Mean Difference      0
df                                97
t Stat                            10.53253251
P(T<=t) one-tail                  4.76537E-18
t Critical one-tail               1.660714611
P(T<=t) two-tail                  9.53073E-18
t Critical two-tail               1.984723136
7. Discussion
Explanatory Power: The explanatory power of a regression model is examined using the estimated R-square value. For the present problem, R-square = 0.005, which means that only 0.5 percent of the variation in annual wage can be explained by variation in the number of years of experience. The explanatory power of the model is therefore negligibly small; in practical terms, the model is not suitable for explaining variations in annual wages.
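The R-square value can be checked directly from the ANOVA table, since R² is the regression sum of squares divided by the total sum of squares; the adjusted R-square and Multiple R follow from it. The following is a minimal Python sketch using only the sums of squares reported in Section 6; the variable names are illustrative.

```python
import math

# Sums of squares copied from the ANOVA table in Section 6
ss_regression = 141474986.11
ss_total = 28433206482.84
n, k = 100, 1  # observations, number of predictors

r_squared = ss_regression / ss_total
# Adjusted R-square penalises the fit for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
multiple_r = math.sqrt(r_squared)  # absolute correlation between X and Y

print(round(r_squared, 4), round(adj_r_squared, 4), round(multiple_r, 4))
# 0.005 -0.0052 0.0705
```

The negative adjusted R-square indicates that, once the single predictor is accounted for, the model explains essentially none of the variation in annual wages.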
Statistical Significance: The statistical significance of the model can be examined either by testing the statistical significance of R-square (Test 1) or by testing the statistical significance of β (Test 2).
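Test 1: H0: R² = 0 against H1: R² > 0
The hypothesis can be tested by using the F statistic, given by
$$F = \frac{MS_{\text{Regression}}}{MS_{\text{Residual}}} = \frac{141474986.11}{288691137.72} \approx 0.49$$
This statistic follows an F-distribution with 1 and 98 degrees of freedom. The reported p-value (Significance F) is 0.49, which exceeds 0.05, so H0 is not rejected.
Inference: The data do not support the claim that there exists a statistically significant linear relationship between annual wages in dollars and the number of years of experience.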
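The F statistic can be recomputed from the reported mean squares, and, for a simple regression, equivalently from R² as F = R² / ((1 − R²) / (n − 2)). The following is a minimal Python sketch using only figures from the ANOVA table; it computes the statistic, not its p-value, which would need the F distribution's CDF.

```python
# Mean squares copied from the ANOVA table in Section 6
ms_regression = 141474986.11  # SS_regression / 1 df
ms_residual = 288691137.72    # SS_residual / 98 df

f_stat = ms_regression / ms_residual

# Equivalent form in terms of R-square for simple regression (n = 100)
r_squared = 141474986.11 / 28433206482.84
f_from_r2 = r_squared / ((1 - r_squared) / (100 - 2))

print(round(f_stat, 2), round(f_from_r2, 2))  # 0.49 0.49
```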
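Test 2: H0: β = 0 against H1: β > 0
The hypothesis can be tested by using the t statistic, given by
$$t = \frac{\hat{\beta}}{SE(\hat{\beta})} = \frac{88.23}{126.03} \approx 0.70$$
This statistic follows Student's t distribution with 98 degrees of freedom. The reported p-value of 0.49 is two-tailed; for the one-sided alternative β > 0 it is about 0.24. Both exceed 0.05, so H0 is not rejected.
Inference: The data do not support the claim that there exists a statistically significant positive linear relationship between annual wages and the number of years of experience.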
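The slope's t statistic, its link to F, and the 95% confidence interval reported in Appendix 2 can all be reproduced from the coefficient and its standard error. The following is a minimal Python sketch; the critical value 1.9845 is an assumption taken from standard t tables (two-tailed 0.05, 98 degrees of freedom) and is not part of the paper's output.

```python
slope = 88.23     # estimated beta for Years of Experience (X)
se_slope = 126.03  # its standard error

t_stat = slope / se_slope

# In simple regression, the squared slope t statistic equals the F statistic
f_equivalent = t_stat ** 2

# 95% confidence interval for beta; t_crit is ASSUMED from standard t tables
t_crit = 1.9845
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)

print(round(t_stat, 2), round(f_equivalent, 2), [round(c, 2) for c in ci])
```

The interval contains zero, matching the reported bounds (-161.88, 338.33) up to rounding of the inputs, which is another way of seeing that β is not significantly different from zero.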
At the 0.05 significance level, the t-test for independent samples, used to determine whether more experienced workers earn higher wages than less experienced workers, rejects the null hypothesis: the two-tailed p-value is 9.53 × 10⁻¹⁸. Because the p-value is lower than 0.05, the null hypothesis is rejected and the alternative hypothesis, that more experienced workers get higher annual wages, is accepted.
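The equal-variance (pooled) t statistic in the Excel output can be reproduced from the two samples' summary statistics alone: the pooled variance weights each sample variance by its degrees of freedom. The following is a minimal Python sketch; the function name pooled_t is hypothetical, and it computes only the statistic, not the p-value, which would require the t distribution's CDF.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, pooled_var, df


# Summary statistics copied from the Excel t-test output in Section 6
t, sp2, df = pooled_t(27.14705882, 122.1273047, 68,
                      5.870967742, 8.182795699, 31)
print(round(t, 4), round(sp2, 4), df)
```

The result agrees with the reported t Stat (10.5325), pooled variance (86.8867) and df (97).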
8. Conclusion
It is natural to expect a positive relationship between annual wages and the number of years of experience. To examine this claim, a simple linear regression analysis was performed on a sample of 100 observations of annual wages and years of experience. The model was examined in terms of its explanatory power and statistical significance and was found unsatisfactory. The R-square value of the model is 0.005, which means that its explanatory power is negligibly small (only 0.5%). Based on both the F and t tests, we conclude that the model is not statistically significant. The overall conclusion is therefore that the present data do not support the claim of a positive linear relationship between annual wages in dollars and the number of years of experience. The t-test for independent samples indicated that the annual wages of employees depend on the number of years of experience; however, the regression analysis and the associated F and t tests failed to establish a linear relationship between annual wages and the number of years of experience.
References
Earle, D. (2010). Skills, qualifications, experience and the distribution of wages. ALL.
Weiers, R. M. (2010). Introduction to Business Statistics. Mason, OH: Cengage Learning.
Zhang, Y. (2004). Wage Data Analysis.
Appendices
Appendix 1
Annual Wage in dollars (Y)    Years of Experience (X)
$83,601.00                    18
$83,569.00                    29
$83,443.00                    5
$75,165.00                    12
$68,573.00                    14
$66,738.00                    29
$60,626.00                    7
$60,152.00                    38
$57,623.00                    31
$55,777.00                    21
$52,762.00                    7
$50,235.00                    12
$50,187.00                    24
$50,171.00                    39
$49,974.00                    26
$49,898.00                    33
$46,646.00                    44
$45,976.00                    43
$44,543.00                    10
$41,780.00                    9
$39,888.00                    5
$37,771.00                    5
$37,664.00                    19
$36,178.00                    40
$35,185.00                    12
$34,746.00                    15
$34,484.00                    28
$33,959.00                    26
$33,498.00                    20
$33,461.00                    7
$33,411.00                    20
$33,389.00                    22
$33,351.00                    4
$32,786.00                    37
$32,235.00                    38
$32,138.00                    22
$32,094.00                    14
$31,799.00                    25
$31,702.00                    39
$31,691.00                    13
$31,304.00                    26
$30,308.00                    10
$30,133.00                    27
$30,006.00                    27
$29,977.00                    6
$29,809.00                    29
$29,736.00                    47
$29,407.00                    19
$29,390.00                    18
$29,191.00                    9
$28,440.00                    24
$28,219.00                    12
$28,168.00                    17
$26,820.00                    33
$26,795.00                    44
$26,614.00                    19
$25,670.00                    8
$25,166.00                    10
$24,509.00                    15
$23,027.00                    34
$22,485.00                    22
$22,133.00                    10
$21,994.00                    24
$21,716.00                    11
$20,852.00                    38
$20,852.00                    1
$20,793.00                    6
$19,981.00                    54
$19,452.00                    3
$19,388.00                    45
$19,306.00                    34
$19,284.00                    3
$19,227.00                    15
$18,752.00                    45
$18,121.00                    18
$17,694.00                    38
$17,690.00                    14
$17,626.00                    45
$16,817.00                    26
$16,796.00                    14
$16,789.00                    6
$16,667.00                    4
$15,957.00                    10
$15,234.00                    4
$15,193.00                    15
$15,160.00                    45
$15,013.00                    21
$14,476.00                    3
$13,787.00                    4
$13,481.00                    7
$13,318.00                    25
$13,312.00                    9
$13,162.00                    6
$12,285.00                    42
$11,780.00                    33
$11,702.00                    6
$11,451.00                    8
$11,186.00                    0
$10,997.00                    0
$9,879.00                     28
More Experienced (X)    Less Experienced
18                      5
29                      7
12                      7
14                      10
29                      9
38                      5
31                      5
21                      7
12                      4
24                      10
39                      6
26                      9
33                      8
44                      10
43                      1
19                      6
40                      3
12                      3
15                      6
28                      4
26                      10
20                      4
20                      3
22                      4
37                      7
38                      9
22                      6
14                      6
25                      8
39                      0
13                      0
26
27
27
29
47
19
18
24
12
17
33
44
19
15
34
22
24
11
38
54
45
34
15
45
18
38
14
45
26
14
15
45
21
25
42
33
28
Appendix 2
Summary Statistics
Statistic             Annual Wage in dollars (Y)    Years of Experience (X)
Mean                  30833.46                      20.38
Standard Error        1694.709727                   1.354959
Median                28815.5                       18.5
Mode                  20852                         10
Standard Deviation    16947.09727                   13.54959
Sample Variance       287204105.9                   183.5915
Kurtosis              1.659685446                   -0.81761
Skewness              1.340025731                   0.481551
Range                 73722                         54
Minimum               9879                          0
Maximum               83601                         54
Sum                   3083346                       2038
Count                 100                           100
Regression Results
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.0705
R Square              0.0050
Adjusted R Square     -0.0052
Standard Error        16990.9134
Observations          100

ANOVA
              df    SS                MS              F       Significance F
Regression    1     141474986.11      141474986.11    0.49    0.49
Residual      98    28291731496.73    288691137.72
Total         99    28433206482.84

                           Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept                  29035.42        3079.61           9.43      0.00       22924.02     35146.81
Years of Experience (X)    88.23           126.03            0.70      0.49       -161.88      338.33
t-Test: Two-Sample Assuming Equal Variances
                                  Variable 1      Variable 2
Mean                              27.14705882     5.870967742
Variance                          122.1273047     8.182795699
Observations                      68              31
Pooled Variance                   86.88673487
Hypothesized Mean Difference      0
df                                97
t Stat                            10.53253251
P(T<=t) one-tail                  4.76537E-18
t Critical one-tail               1.660714611
P(T<=t) two-tail                  9.53073E-18
t Critical two-tail               1.984723136