In this assignment we describe and discuss the application of statistics and probability theory to a real-world problem. Our goal is to explore the given data on XRT test results and to provide readers with appropriate descriptive statistics and statistical inference. According to the XRT test manual, the XRT© test of intelligence has been designed to measure “fluid intelligence in a way that allows the test user to make excellent predictions about the future job performance of the test taker”. In this paper we examine the publisher’s claim that the test is free from adverse impact and has superb test-retest reliability. The test has 50 multiple-choice questions, each with only one correct answer (maximum score = 50).
We are given a data set of 72 observations representing 72 job applicants. The following variables describe these applicants:
REF: a unique identifier for each participant
GENDER: indicates whether the participant was male (coded as 1) or female (coded as 2).
XRT1: the score obtained (number of correct answers) on the first administration of the XRT test.
XRT2: the score obtained (number of correct answers) on the second administration of the XRT test. Note: this second test was taken only by a small number of volunteers from the participant group, and hence data for this variable are not available for everyone. The second administration was carried out at the end of the selection process and the results were not used in the selection decision-making.
JOBPERF: an overall rating of job performance by the participant’s manager (0 = poor to 10 = excellent). These data are provided only for those participants who were offered the job and remain in the company.
We begin with descriptive statistics. XRT scores are available for all 72 participants, all of whom took the test on its first administration. Only 44 of them took the test on the second administration. Of the 72 applicants, only 25 were offered the job and remain in the company. The mean XRT score on the first administration is 28.3333 with a standard deviation of 12.46855.
The overall characteristics of the distributions of the variables are shown in histograms (see appendix). The “GENDER” variable is dichotomous: 32 males and 40 females participated in the study. The distributions of the “XRT1” and “XRT2” variables are negatively skewed and visually similar. The distribution of the “JOBPERF” variable is symmetric and quite close to a normal distribution.
The first problem that requires statistical inference in this assignment is to examine the test-retest reliability of the XRT test. In order to do this, we formulate a pair of hypotheses.
Null hypothesis: there is no difference in the mean XRT scores between the first and the second administrations.
Alternative hypothesis: there is a difference in the mean XRT scores between the first and the second administrations.
The level of significance is set at 0.05.
Because the same applicants took the test twice, we use a paired t-test to examine the difference. Student’s paired t-test indicated no significant difference between the mean scores of the two administrations (t=1.661, p=0.104). The null hypothesis is not rejected. At the 5% level of significance there is no evidence of a systematic change in mean scores between the administrations, which is consistent with the publisher’s claim of test-retest reliability.
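The paired t statistic used above can be sketched in a few lines of Python. The scores below are hypothetical placeholders, not the study data (which are not reproduced in this report), and the helper name paired_t is our own; the sketch only illustrates how t is formed from the within-person differences.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work on the within-person differences (first minus second).
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five volunteers (NOT the XRT data).
xrt1_demo = [30, 25, 40, 35, 28]
xrt2_demo = [32, 24, 41, 37, 30]
t_demo, df_demo = paired_t(xrt1_demo, xrt2_demo)
print(f"t = {t_demo:.3f}, df = {df_demo}")
```

With the real data, only the 44 applicants who have both an XRT1 and an XRT2 score would enter the calculation, giving 43 degrees of freedom.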
The next problem concerns the comparison of the scores that males and females obtained on the first administration of the XRT test. For males, the mean XRT score on the first administration is 30.3125 with a standard deviation of 10.80453; for females, it is 26.75 with a standard deviation of 13.58119. The pair of hypotheses used to test the claim that the test is free from adverse impact is as follows:
Null hypothesis: there is no difference in the mean XRT scores on the first administration between males and females.
Alternative hypothesis: there is a difference in the mean XRT scores on the first administration between males and females.
The level of significance is set at 0.05.
Since the observations for male and female applicants are independent of each other, we use the two-sample Student’s t-test for independent samples. The test indicated no significant difference in the mean XRT scores between males and females (t=1.209, p=0.231). The null hypothesis is not rejected. At the 5% level of significance there is no evidence that XRT scores differ by gender (Statistics.laerd.com, 2016).
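As a rough check, the t statistic for the gender comparison can be recomputed from the group summaries reported above. The sketch below assumes the equal-variance (pooled) form of the independent-samples t-test; the report does not state which form was used, and the helper name pooled_t is our own.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for an equal-variance two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Group summaries as reported in the text (males first, then females).
t_gender, df_gender = pooled_t(32, 30.3125, 10.80453, 40, 26.75, 13.58119)
print(f"t = {t_gender:.3f}, df = {df_gender}")

# Consistency check: the weighted group means should give the overall XRT1 mean.
overall_mean = (32 * 30.3125 + 40 * 26.75) / 72
print(f"overall mean = {overall_mean:.4f}")
```

Under the pooled-variance assumption this gives t of about 1.209 on 70 degrees of freedom, in line with the reported value, and the weighted group means recover the overall mean of 28.3333.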
The last problem covered in this report is the significance and strength of the link between scores on the first administration of the XRT test and subsequent job performance. We plot a scatter diagram of XRT1 against JOBPERF in order to inspect the location of the points. It seems that there might be a linear relationship between the variables. Assuming that the data are normally distributed, we examine the issue using Pearson’s correlation coefficient.
Pearson’s r indicated a weak positive relationship between the variables that is not statistically significant (r=0.328, p=0.11). At the 5% level of significance we cannot conclude that there is a relationship between scores on the first administration of the XRT test and subsequent job performance (Statistics.laerd.com, 2016).
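The significance test for Pearson’s r rests on converting r to a t statistic with n - 2 degrees of freedom. A minimal sketch follows; it assumes n = 25, i.e. that all 25 retained employees have both an XRT1 score and a job performance rating, and the helper name r_to_t is our own.

```python
import math

def r_to_t(r, n):
    """Return (t, df) for testing H0: rho = 0 given Pearson's r and sample size n."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# r as reported in the text; n = 25 is an assumption (see the note above).
t_corr, df_corr = r_to_t(0.328, 25)

# Two-tailed 5% critical value of Student's t with 23 df, taken from standard tables.
T_CRITICAL_23 = 2.069
print(f"t = {t_corr:.3f}, df = {df_corr}, significant: {abs(t_corr) > T_CRITICAL_23}")
```

The resulting t falls below the tabulated critical value, which agrees with the conclusion that the correlation is not significant at the 5% level.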
In this paper we have considered the problem of the test-retest reliability of the XRT test of intelligence. There were 72 participants, 32 males and 40 females. The test-retest reliability was examined using Student’s paired t-test. The two samples were the two sets of XRT results: scores on the first administration and scores on the second administration. The impact of gender was examined only on the results of the first administration of the XRT test. In order to provide a more rigorous examination of the test-retest reliability, we would like to suggest some additional analysis. For example, it is possible to compare the results of the first and the second administrations separately for males and for females. This would show whether the test is stable for men and women separately. Next, we have evaluated the association between the XRT results on the first administration and the overall rating of job performance by the participant’s manager. This association appeared to be statistically insignificant. However, no analysis was provided of the association between the “JOBPERF” variable and the XRT results on the second administration. Moreover, not all applicants were offered the job and remain in the company. It is possible that XRT1 and XRT2 scores differ significantly between those who were employed and the others. Finally, we have tested only a linear association between JOBPERF and XRT1, but the form of the relationship may be different. It may be useful to test the significance of a polynomial, exponential or other type of relationship between these variables.
Another suggestion concerns the procedure of statistical inference. In this paper we used parametric procedures (Student’s t-tests and Pearson’s correlation). However, these are appropriate only if the data are approximately normally distributed. This was assumed, but the assumption was not checked. The issue with the distribution can be resolved in two possible ways. The practice of studying random events shows that although the results of individual observations (even those made under the same conditions) may be very different, the average results for a sufficiently large number of observations are stable and depend only slightly on the outcome of individual cases. The theoretical justification for this remarkable property of random phenomena is the law of large numbers. The general sense of the law of large numbers is that the combined effect of a large number of random factors leads to a result that is almost independent of individual cases. A related result, the central limit theorem, states that when a random variable is formed by summing a large number of independent random variables, each with a variance that is small compared with the variance of the sum, the distribution of that random variable is approximately normal. Hence, as the first way to resolve the issue with the distribution, we would suggest increasing the sample size.
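The stabilising effect of sample size described above can be illustrated with a small simulation. This is only a sketch: the population is an exponential distribution with mean 1, chosen as a convenient skewed stand-in rather than as a model of XRT scores, and the helper name sample_means and the seed are our own choices.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(0)  # fixed seed so that the run is repeatable
spread = {}
for n in (5, 30, 100):
    means = sample_means(n, 2000, rng)
    # The spread of the sample means should shrink roughly like 1 / sqrt(n).
    spread[n] = statistics.stdev(means)
    print(f"n = {n:3d}: SD of sample means = {spread[n]:.3f}")
```

The standard deviation of the sample means falls as n grows, which is the sense in which averages over many observations depend only slightly on individual cases.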
If the sample size cannot be increased, there is a second way to overcome this problem: it is possible to use a non-parametric analogue of Student’s test that does not require normality. For the independent-samples comparison this may be the Mann-Whitney U test; for the paired comparison the corresponding test is the Wilcoxon signed-rank test.
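The Mann-Whitney U statistic is based only on the ordering of the observations, which is why it does not require normality. A minimal sketch of the statistic itself follows; the scores are hypothetical placeholders (not the XRT data) and the helper name mann_whitney_u is our own.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b). U_a counts the pairs in which the value from
    sample_a exceeds the value from sample_b; ties count as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical scores for two small groups (NOT the XRT data).
group_a = [30, 35, 40]
group_b = [20, 25, 32, 38]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}, U = {min(u_a, u_b)}")
```

The smaller of the two U values is then compared with tabulated critical values, or with a normal approximation when the samples are larger.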
Appendix
References
Statistics.laerd.com. (2016). Independent T-Test in SPSS Statistics - Procedure, output and interpretation of the output using a relevant example | Laerd Statistics. Retrieved 22 January 2016, from https://statistics.laerd.com/spss-tutorials/independent-t-test-using-spss-statistics.php
Statistics.laerd.com. (2016). Pearson's Product-Moment Correlation in SPSS Statistics - Procedure, assumptions, and output using a relevant example. Retrieved 22 January 2016, from https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-statistics.php