Final Grade: 72/100
This paper is being submitted on January 18, 2017 for Tramaine Ingram’s STA3215 Inferential Statistics and Analytics course.
At the request of the company’s client, we have performed research pertaining to the salary distributions of jobs in the state of Minnesota. The sampling was haphazard (non-statistical). The client is only interested in jobs that pay salaries between $40,000 and $120,000, so every job that falls within this range was included in the analysis. The data set was obtained from the Bureau of Labor Statistics and consists of a total of 364 records. In the following report, we describe the job titles and salaries in the data set and summarize them with descriptive and inferential statistics.
I’ll start by classifying the variables in our data set. Quantitative data was collected in the form of the yearly pay for each job listed; as stated above, only jobs paying $40,000 to $120,000 per year were included. Qualitative data was collected in the form of job titles. In total, 364 different job titles in Minnesota had a yearly pay that fit the criteria of our sample.
Salary is a continuous variable since it can take any value within a range, including fractions of a dollar. Discrete variables, by contrast, can take only separate, countable values such as whole numbers. Both discrete and continuous variables are quantitative. Job title is therefore neither discrete nor continuous, since it is a qualitative variable.
Job title is measured at the nominal level, since its values are names that only distinguish one category from another and have no natural ordering (Graham, 2008). For instance, Accountants and Auditors appear first in the list, but that does not make the job superior to the others. Salary, on the other hand, is measured at the ratio level. The salaries for the different jobs can be ranked from lowest to highest, the differences between them are meaningful, and there is a true zero (no pay), so a salary of $80,000 is twice a salary of $40,000.
Measures of central tendency are summary measures that describe a data set with a single value representing the middle of the distribution (Graham, 2008). They help in understanding and analyzing data, as well as in making inferences about the population. For instance, it is not practical to describe the salaries of all 364 job titles one by one, but a measure of center such as the mean or median summarizes them in a single figure.
Measures of dispersion (variability) show the extent to which values in a data set deviate from the mean or another measure of central tendency (Graham, 2008). They help in understanding how spread out the values in a data set are.
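To make these definitions concrete, the sketch below computes the measures of central tendency and dispersion used in this paper with Python's standard `statistics` module. The seven salaries are hypothetical values invented for illustration, not records from the Bureau of Labor Statistics data set, and `describe` is a helper name chosen only for this example.

```python
import statistics

# Hypothetical salaries (USD) for illustration only; these are NOT the
# Bureau of Labor Statistics records analyzed in this paper.
salaries = [46100, 46100, 52300, 56520, 61800, 74250, 98900]


def describe(values):
    """Return common measures of center and dispersion for a sample."""
    low, high = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "range": high - low,
        "midrange": (high + low) / 2,
    }


for name, value in describe(salaries).items():
    print(f"{name:>9}: {value:,.2f}")
```

`statistics.variance` and `statistics.stdev` divide by n − 1, which is the convention for sample (rather than population) statistics.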
Descriptive statistics
As shown above, the mean is $62,306.13, implying that the average salary for the 364 jobs in Minnesota is $62,306.13. The mode is $46,100, indicating that more jobs pay $46,100 than any other salary. The median is $56,520, showing that the middle salary for the 364 jobs is $56,520. The sample variance is 3.67E+08 (in squared dollars) and the standard deviation is $19,149.21. This means that the salaries of the 364 jobs typically deviate from the mean salary by about $19,149.21. The range is $79,680, implying that the difference between the highest and lowest salaries is $79,680. The midrange is $80,010, implying that the average of the highest and lowest salaries is $80,010.
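The reported figures can be cross-checked with simple arithmetic. Since the range is the highest minus the lowest salary and the midrange is their average, the two extremes can be recovered from them, and squaring the standard deviation should reproduce the reported variance. A minimal sketch, using only the values stated in the paragraph above:

```python
# Figures reported in the descriptive statistics.
salary_range = 79_680.00
midrange = 80_010.00
std_dev = 19_149.21

# midrange = (max + min) / 2 and range = max - min, so:
highest = midrange + salary_range / 2
lowest = midrange - salary_range / 2

# The sample variance is the square of the sample standard deviation.
variance = std_dev ** 2

print(f"Highest salary: ${highest:,.2f}")  # $119,850.00
print(f"Lowest salary:  ${lowest:,.2f}")   # $40,170.00
print(f"Variance: {variance:.3e}")         # 3.667e+08
```

The implied extremes ($40,170 and $119,850) fall inside the $40,000 to $120,000 window the client specified, which is consistent with how the data set was filtered.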
In conclusion, the mean salary ($62,306.13) is greater than the median salary ($56,520), implying that salaries are positively (right) skewed. In other words, most jobs had salaries below the mean, while a few jobs with much higher salaries pulled the mean above the median. There is also a large gap between the lowest and highest salaries, as shown by the range.
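The direction of skew can also be expressed as a single number. The sketch below applies Pearson's second (median) skewness coefficient, 3 × (mean − median) / s, to the reported statistics. This measure is not part of the original analysis; it is shown only as a rough, commonly used rule of thumb.

```python
# Reported sample statistics.
mean = 62_306.13
median = 56_520.00
mode = 46_100.00
std_dev = 19_149.21

# Pearson's second (median) skewness coefficient: 3 * (mean - median) / s.
# A positive value indicates right (positive) skew.
pearson_skew = 3 * (mean - median) / std_dev

print(f"Pearson's median skewness: {pearson_skew:.3f}")
# For a right-skewed distribution the typical ordering is mode < median < mean.
print("mode < median < mean:", mode < median < mean)
```

A coefficient of roughly 0.9 points to a clearly right-skewed distribution, and the ordering of mode, median, and mean agrees with that reading.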
Importance of confidence intervals
A confidence interval is a range of values within which a population parameter is likely to fall (Graham, 2008). It is calculated from the sample statistics and a chosen confidence level. For instance, a confidence interval for the mean, calculated from the sample statistics, gives a lower and an upper bound for the plausible values of the population mean. At a 95% confidence level, about 95% of the intervals constructed this way from repeated samples would contain the true population mean.
A point estimate is a single value that approximates a population parameter (Graham, 2008). For instance, the sample mean is a point estimate of the population mean, while the sample proportion is the point estimate of the population proportion. The point estimate is the center of the confidence interval: the interval is given by the point estimate plus or minus the margin of error.
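The sample mean is the most suitable point estimate of the population mean because it is unbiased: averaged over repeated samples, it equals the population mean. In addition, the sample mean has little sampling variability around the population mean; for normally distributed data, no other unbiased estimator has a smaller variance. These properties make the sample mean the most suitable point estimate of the population average.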
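The two properties claimed for the sample mean can be illustrated by simulation. The sketch below assumes a hypothetical normal population of salaries (mean $62,000, standard deviation $19,000; both invented, not estimated from the data), draws many samples of 30, and compares the sample means with the sample medians. It is an illustration under that normality assumption, not a proof.

```python
import random
import statistics

# Hypothetical normal population of salaries. These parameters are assumptions
# chosen for illustration; they were not estimated from the BLS data.
POP_MEAN = 62_000
POP_SD = 19_000
SAMPLE_SIZE = 30
REPETITIONS = 2_000

rng = random.Random(2017)  # fixed seed so the simulation is reproducible

sample_means = []
sample_medians = []
for _ in range(REPETITIONS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))
    sample_medians.append(statistics.median(sample))

# Unbiasedness: the sample means should average out close to POP_MEAN.
avg_of_means = statistics.fmean(sample_means)

# Variability: spread of each estimator across the repeated samples.
var_of_means = statistics.pvariance(sample_means)
var_of_medians = statistics.pvariance(sample_medians)

print(f"Average of sample means:    {avg_of_means:,.2f} (population mean {POP_MEAN:,})")
print(f"Variance of sample means:   {var_of_means:,.0f}")
print(f"Variance of sample medians: {var_of_medians:,.0f}")
```

The average of the sample means should land close to 62,000, and the variance of the sample means should come out noticeably smaller than that of the sample medians (for large normal samples, theory puts the ratio near π/2).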
Confidence intervals are important in statistics. In most cases, statistical analysis is based on samples, since it would be expensive and time-consuming to analyze an entire population. Because the exact value of a population parameter cannot be determined from sample statistics, the confidence interval estimates it with a range of values together with a stated level of confidence. A single-value estimate is unlikely to equal the population parameter exactly, so using a range of values reduces the probability of error in estimating the parameter.
95% confidence interval
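The confidence interval is given by the point estimate plus or minus the margin of error. Determining the margin of error requires the choice of a test statistic and a confidence level. The z-statistic is suitable when the population standard deviation is known and either the data are normally distributed or the sample is large; by the central limit theorem, the sampling distribution of the mean tends to be approximately normal when the sample size is greater than 30. The t-statistic is used when the population standard deviation is unknown and must be estimated by the sample standard deviation. In this case, the salaries themselves are right-skewed, but the sample of 364 jobs is large enough for the sample mean to be approximately normally distributed. Because the population standard deviation is not known, we will use the t-statistic.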
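With a sample this large, the choice between z and t makes little practical difference. The sketch below computes the two-tailed 95% z critical value with `statistics.NormalDist` and compares it with the t critical value for df = 363 reported later in this section; the t value is copied from the paper rather than recomputed, since Python's standard library has no t distribution.

```python
from statistics import NormalDist

alpha = 0.05  # two-tailed, 95% confidence level

# z critical value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# t critical value for df = 363 as reported in this paper; the Python standard
# library has no t distribution, so it is not recomputed here.
t_critical = 1.96652062

relative_gap = (t_critical - z_critical) / z_critical
print(f"z critical value:            {z_critical:.5f}")
print(f"t critical value (df = 363): {t_critical:.5f}")
print(f"t exceeds z by:              {relative_gap:.2%}")
```

The t value is only about 0.3% larger than the z value, so the t-based interval is only slightly wider. For finite degrees of freedom the t critical value is always larger than z, and the gap shrinks as n grows.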
Confidence interval = \(\bar{X} \pm t_{\alpha/2,\,n-1} \times \dfrac{s}{\sqrt{n}}\), where \(\alpha = 0.05\)
\(\bar{X}\) = sample mean
\(s\) = sample standard deviation
\(n\) = sample size
Sample mean, \(\bar{X}\) = $62,306.13
Sample standard deviation, s = 19149.21
Sample size, n = 364
Degrees of freedom = n – 1 = 364 – 1 = 363
t-score for df = 363 and alpha of 0.05 (two-tailed) = 1.96652062
Confidence interval = \(62{,}306.13 \pm 1.96652062 \times \dfrac{19{,}149.21}{\sqrt{364}}\)
= 62,306.13 ± 1973.78
Upper limit = 62,306.13 + 1973.78 = $64,279.91
Lower limit = 62,306.13 - 1973.78 = $60,332.35
Confidence interval: ($60,332.35 < µ < $64,279.91)
The above confidence interval indicates that we can be 95% confident that the mean salary for jobs in Minnesota is between $60,332.35 and $64,279.91.
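The 95% interval above can be reproduced in a few lines. This is a minimal sketch: `t_confidence_interval` is a hypothetical helper, and the t critical value is taken from the paper (with SciPy installed, it could instead be computed as `scipy.stats.t.ppf(0.975, 363)`).

```python
import math


def t_confidence_interval(mean, sd, n, t_critical):
    """Return (lower, upper, margin) for mean ± t_critical * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin, margin


# Sample statistics and the t critical value (df = 363, alpha = 0.05) from the paper.
lower, upper, margin = t_confidence_interval(62_306.13, 19_149.21, 364, 1.96652062)

print(f"Margin of error: {margin:,.2f}")
print(f"95% CI: (${lower:,.2f}, ${upper:,.2f})")
```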
99% confidence interval
Confidence interval = \(\bar{X} \pm t_{\alpha/2,\,n-1} \times \dfrac{s}{\sqrt{n}}\), where \(\alpha = 0.01\)
Sample mean, \(\bar{X}\) = $62,306.13
Sample standard deviation, s = 19149.21
Sample size, n = 364
Degrees of freedom = 364 – 1 = 363
t-score for df 363 and alpha 0.01 (two-tailed) = 2.58944082
Confidence interval = \(62{,}306.13 \pm 2.58944082 \times \dfrac{19{,}149.21}{\sqrt{364}}\)
= 62,306.13 ± 2,599.00
Upper limit = 62,306.13 + 2,599.00 = $64,905.13
Lower limit = 62,306.13 - 2,599.00 = $59,707.13
Confidence interval: ($59,707.13 < µ < $64,905.13)
The above confidence interval indicates that we can be 99% confident that the mean salary for jobs in Minnesota is between $59,707.13 and $64,905.13.
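The same computation with the 99% critical value reproduces this interval. As before, this is a sketch that copies the t value for df = 363 from the paper rather than recomputing it.

```python
import math

# Sample statistics reported in the paper.
mean, sd, n = 62_306.13, 19_149.21, 364
t_critical_99 = 2.58944082  # df = 363, alpha = 0.01 (two-tailed), from the paper

margin = t_critical_99 * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"Margin of error: {margin:,.2f}")
print(f"99% CI: (${lower:,.2f}, ${upper:,.2f})")
```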
Comparing 95% and 99% confidence intervals
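As shown above, the 95% confidence interval ($60,332.35 to $64,279.91, a width of $3,947.56) is narrower than the 99% confidence interval ($59,707.13 to $64,905.13, a width of $5,198.00). The higher confidence level uses a larger critical value (2.589 rather than 1.967), which increases the margin of error. Thus, the confidence interval is wider for a higher confidence level than for a lower one: if a higher confidence level is required, the confidence interval must be wider.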
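The widening can be quantified directly. Because both intervals share the same standard error, their widths differ only by the ratio of the critical values; the sketch below uses the two t values reported in the paper.

```python
import math

sd, n = 19_149.21, 364
standard_error = sd / math.sqrt(n)

# t critical values for df = 363 as reported in the paper.
t_critical = {0.95: 1.96652062, 0.99: 2.58944082}

widths = {}
for level, t in t_critical.items():
    widths[level] = 2 * t * standard_error  # full width = 2 x margin of error
    print(f"{level:.0%} CI width: ${widths[level]:,.2f}")

ratio = widths[0.99] / widths[0.95]
print(f"The 99% interval is {ratio:.2f} times as wide as the 95% interval.")
```

The 99% interval comes out roughly 1.32 times as wide as the 95% interval, which is exactly the ratio of the two critical values, 2.58944082 / 1.96652062.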
References
Graham, A. (2008). Statistics (1st ed.). Blacklick, OH: McGraw-Hill.