We need to describe the shape of the distribution of compensation for the 25 highest-paid women.
We start with descriptive statistics and a frequency distribution table, computed with Minitab 16 Statistical Software.
Descriptive Statistics: Compensation ($ million)
Variable                   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Compensation ($ million)  25   0  13.85     1.44   7.19     8.90  10.05   11.60  14.85    38.60
(N* is the number of missing values.)
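As a cross-check, the Minitab summary can be reproduced with Python's standard `statistics` module. This is a minimal sketch, assuming the 25 values reconstructed from the tally table below; `COMPENSATION` and `describe` are hypothetical names introduced only for this example. Quartiles use the `"exclusive"` method, which places them at positions (n + 1)p and happens to match the Q1 and Q3 shown above for this data set.

```python
import math
import statistics

# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]


def describe(data):
    """Return the summary statistics shown in the Minitab table."""
    n = len(data)
    stdev = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    # "exclusive" places the quartiles at positions (n + 1)p in the sorted data
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return {
        "N": n,
        "Mean": statistics.mean(data),
        "SE Mean": stdev / math.sqrt(n),  # standard error of the mean
        "StDev": stdev,
        "Minimum": min(data),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(data),
    }


for name, value in describe(COMPENSATION).items():
    shown = value if isinstance(value, int) else f"{value:.2f}"
    print(f"{name:<8} {shown}")
```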
Tally for Discrete Variables: Compensation ($ million)
Compensation ($ million)  Count
 8.9  2
 9.1  1
 9.4  1
 9.6  1
10.0  1
10.1  2
10.5  1
11.0  1
11.1  1
11.5  1
11.6  1
11.8  1
11.9  1
12.4  1
12.8  1
14.7  1
14.8  1
14.9  1
15.7  1
16.3  1
16.4  1
34.1  1
38.6  1
N= 25
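The tally is simply a count of how often each distinct value occurs. A minimal sketch with `collections.Counter`, assuming the same reconstructed values (`COMPENSATION` and `tally` are illustrative names):

```python
from collections import Counter

# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]

# Count how many times each distinct value occurs, like Minitab's Tally
tally = Counter(COMPENSATION)
for value in sorted(tally):
    print(f"{value:5.1f} {tally[value]}")
print(f"N= {sum(tally.values())}")
```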
The data do not appear to be normally distributed: most values lie between $8.9 and $16.4 million, but two values exceed $30 million ($34.1 and $38.6 million) and look like outliers. Next we build a stem-and-leaf plot:
Stem-and-Leaf Display: Compensation ($ million)
Stem-and-leaf of Compensation ($ million) N = 25
Leaf Unit = 0.10
2 8 99
5 9 146
9 10 0115
(6) 11 015689
10 12 48
8 13
8 14 789
5 15 7
4 16 34
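HI 341, 386
The first column is the depth: the cumulative count of values from the nearer end of the data, with the count of the stem that contains the median shown in parentheses. With a leaf unit of 0.10, the row "8 99" stands for 8.9 and 8.9, and the HI line lists the two values too large for the display, 34.1 and 38.6.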
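The display can be sketched in a few lines by working in integer tenths and splitting each value into a stem (whole millions) and a leaf (tenths). This is a minimal sketch under stated assumptions: `stem_and_leaf` and its `hi_cutoff` parameter are hypothetical, the cutoff of 20 is chosen by hand rather than by Minitab's own rule, and Minitab's depth column is omitted.

```python
from collections import defaultdict

# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]


def stem_and_leaf(data, hi_cutoff):
    """Return the lines of a stem-and-leaf display with leaf unit 0.1.

    Values above hi_cutoff go on a separate HI line instead of
    stretching the display with many empty stems.
    """
    tenths = sorted(round(x * 10) for x in data)  # e.g. 11.6 -> 116
    body = [t for t in tenths if t <= hi_cutoff * 10]
    high = [t for t in tenths if t > hi_cutoff * 10]
    leaves = defaultdict(str)
    for t in body:
        stem, leaf = divmod(t, 10)  # e.g. 116 -> stem 11, leaf 6
        leaves[stem] += str(leaf)
    lines = []
    # include empty stems (such as 13) so gaps in the data stay visible
    for stem in range(body[0] // 10, body[-1] // 10 + 1):
        lines.append(f"{stem:3d} {leaves[stem]}".rstrip())
    if high:
        lines.append("HI " + ", ".join(str(t) for t in high))
    return lines


for line in stem_and_leaf(COMPENSATION, hi_cutoff=20):
    print(line)
```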
The last step is to compute the mean, median and standard deviation of the sample by hand.
The mean is the arithmetic average of all values in the set:
$$M_x = \frac{1}{25}\,(38.6 + 34.1 + \dots + 8.9) = \frac{346.2}{25} \approx 13.85$$
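The same hand calculation, written out as an explicit loop. A minimal sketch assuming the 25 values from the tally table (`COMPENSATION` is an illustrative name):

```python
# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]

# Add the values one by one, then divide by the number of values
total = 0.0
for x in COMPENSATION:
    total += x
mean = total / len(COMPENSATION)
print(f"sum = {total:.1f}, mean = {mean:.2f}")  # sum = 346.2, mean = 13.85
```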
The median is the middle value of the ordered data set. With n = 25 values, it is the (25 + 1)/2 = 13th value in ascending order, which is $11.6 million.
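Finding the median by position can be sketched as follows, again assuming the reconstructed values. The even-n branch is included only for completeness; it does not apply to this sample of 25.

```python
# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]

ordered = sorted(COMPENSATION)
n = len(ordered)
if n % 2 == 1:
    # odd n: the single middle value at position (n + 1) / 2, i.e. index n // 2
    median = ordered[n // 2]
else:
    # even n: average of the two middle values
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
print(f"position {(n + 1) / 2:g}, median = {median}")  # position 13, median = 11.6
```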
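$$s = \sqrt{\frac{1}{25 - 1}\left[(38.6 - 13.85)^2 + (34.1 - 13.85)^2 + \dots + (8.9 - 13.85)^2\right]} \approx 7.19$$
Here we divide by n - 1 = 24, which gives the sample standard deviation that Minitab reports as StDev; dividing by n = 25 (the population formula) would give about 7.05 instead.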
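To see the effect of the divisor, the sketch below computes both versions from the same sum of squared deviations. It assumes the reconstructed values; `ss`, `sample_sd` and `population_sd` are illustrative names.

```python
import math

# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]

n = len(COMPENSATION)
mean = sum(COMPENSATION) / n
# sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in COMPENSATION)
sample_sd = math.sqrt(ss / (n - 1))  # divisor n - 1, as in Minitab's StDev
population_sd = math.sqrt(ss / n)    # divisor n, for comparison
print(f"s = {sample_sd:.2f}, sigma = {population_sd:.2f}")  # s = 7.19, sigma = 7.05
```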
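In conclusion, the data set is not normally distributed. It is strongly right-skewed: the mean ($13.85 million) lies well above the median ($11.6 million), and the standard deviation ($7.19 million) is large relative to the mean. There are 2 outliers, the two highest-paid women in the sample (first and second positions, with $38.6 and $34.1 million); both lie far above the upper fence Q3 + 1.5 × IQR = 14.85 + 1.5 × 4.8 = 22.05. If we remove these outliers, the data look somewhat "better" (n = 23, mean ≈ $11.89 million, median $11.5 million, standard deviation ≈ $2.44 million) but are still quite dispersed.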
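Removing the outliers and recomputing the statistics can be sketched as below. The source does not say how the outliers were identified; this sketch assumes Tukey's common 1.5 × IQR fence rule, with quartiles computed by the same (n + 1)p method as the Minitab output, and the variable names are illustrative.

```python
import statistics

# Compensation ($ million) of the 25 women, reconstructed from the tally table
COMPENSATION = [
    8.9, 8.9, 9.1, 9.4, 9.6, 10.0, 10.1, 10.1, 10.5, 11.0, 11.1, 11.5, 11.6,
    11.8, 11.9, 12.4, 12.8, 14.7, 14.8, 14.9, 15.7, 16.3, 16.4, 34.1, 38.6,
]

# Quartiles with (n + 1)p positioning, matching Q1 = 10.05 and Q3 = 14.85
q1, _, q3 = statistics.quantiles(COMPENSATION, n=4, method="exclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # assumed Tukey fence rule
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in COMPENSATION if x < lower_fence or x > upper_fence]
trimmed = [x for x in COMPENSATION if lower_fence <= x <= upper_fence]

print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}], outliers: {outliers}")
print(
    f"without outliers: n = {len(trimmed)}, "
    f"mean = {statistics.mean(trimmed):.2f}, "
    f"median = {statistics.median(trimmed)}, "
    f"stdev = {statistics.stdev(trimmed):.2f}"
)
```

Trimming leaves the median almost unchanged but shrinks the standard deviation sharply, which is why the median is the more robust summary for this kind of skewed data.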