The data show positive skewness in all three categories. This indicates that most of the observations lie below the mean, with a longer tail toward the higher values. It implies that most of the people interviewed indicated that married people are happier than unmarried couples.
With regard to happiness in the relationship with a partner, the largest share, 5.2% of respondents, indicated that they were very happy in their marriages, 3.1% indicated that they were pretty happy with their marriages, and the remaining 0.2% indicated that they were not happy with their marriages.
The survey involved 1974 respondents, of whom 1806 (91.5%) were missing from the final data analysis. This implies that more than 90% of the information collected from the surveyed individuals was not included in the final analysis. The missing data would reduce the overall representativeness of the sample and could distort inferences about the population, leading to wrong conclusions about the number of people who are very happy, pretty happy, and not happy with their marriages. To improve the reliability of the analysis, the researcher should employ statistical methods that are robust to missing data. An analysis is robust in this sense if there is enough confidence that moderate violations of the technique's assumptions would produce little or no bias and therefore no distortion of the final results.
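The effect of the missing cases on the reported percentages can be checked with a short sketch. It assumes that the percentages reported earlier (5.2%, 3.1% and 0.2%) are shares of the full sample of 1974, which is an inference from the fact that they sum to 8.5%, the share of cases that were not missing. The variable names are illustrative only.

```python
# Counts quoted in the text: 1974 surveyed, 1806 missing from the analysis.
total = 1974
missing = 1806
valid = total - missing

missing_share = missing / total

# ASSUMPTION: these are percentages of the full sample, not of valid cases.
of_total = {"very happy": 5.2, "pretty happy": 3.1, "not happy": 0.2}

# Rescale so that the categories sum to 100% among valid responses only.
answered = sum(of_total.values())
valid_percent = {k: 100 * v / answered for k, v in of_total.items()}

print(f"valid cases: {valid}, missing share: {missing_share:.1%}")
for category, pct in valid_percent.items():
    print(f"{category}: {pct:.1f}% of valid responses")
```

Under that assumption, the very small percentages in the text reflect the missing cases rather than a low level of happiness among those who answered.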
However, the fact that 91.5% of the data was missing implies that the results obtained in the above analysis would not be truly representative of the population. The risk of distortion is high, and the results may therefore be inconsistent. In such a situation, the researcher will have to review the data analysis to ensure consistency, so that the values analyzed are accurate, the inferences are sound, and correct conclusions are drawn.
In the above data analysis, 55.4% of the respondents strongly agreed that they were very happy in their marriages, 30.0% did not strongly agree that they were very happy in their marriages, and 14% agreed that they were happy in their marriages.
However, there is still a high level of inconsistency in these results: a large amount of data was missing, so the results may be unreliable to a large extent.
The confidence interval
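A confidence interval gives the range of values within which the population mean is expected to lie at a chosen level of confidence, and so indicates how far the values obtained from the data analysis can be relied upon.
The distance of each bound from the mean, in standard deviation units, is given by (X - L)/s.d. and (U - X)/s.d., where X is the sample mean, L and U are the lower and upper bounds, and s.d. is the standard deviation:
(40.274 - 39.38)/15.54 and (41.17 - 40.274)/15.54
(0.0575, 0.0577)
The interval therefore runs from 39.38 to 41.17 around a sample mean of 40.274, and each bound lies about 0.058 standard deviations from the mean. These figures measure the width of the interval, not the percentage of results that are correct. Because the data had very large omissions, the interval describes only the cases that remained in the analysis, and its reliability as a fair reflection of the actual population is low.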
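The figures quoted above (mean 40.274, bounds 39.38 and 41.17, standard deviation 15.54) can be checked with a short sketch. The report does not state the confidence level or the number of valid cases, so the 95% level (z = 1.96) and the sample size backed out from it are assumptions made here for illustration, and `mean_ci` is a hypothetical helper.

```python
import math

# Values quoted in the text.
mean, sd = 40.274, 15.54
lower, upper = 39.38, 41.17

# Distance of each bound from the mean, in standard deviation units.
std_low = (mean - lower) / sd
std_high = (upper - mean) / sd
print(f"standardized distances: {std_low:.4f}, {std_high:.4f}")

# ASSUMPTION: a 95% normal-based interval, so margin = z * sd / sqrt(n).
z = 1.96
margin = (upper - lower) / 2
implied_se = margin / z
implied_n = (sd / implied_se) ** 2
print(f"implied standard error: {implied_se:.3f}, implied n: {implied_n:.0f}")


def mean_ci(mean, sd, n, z=1.96):
    """Normal-based confidence interval for a mean (hypothetical helper)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


print(mean_ci(mean, sd, implied_n))
```

Under these assumptions the interval is consistent with roughly 1,150 to 1,160 valid cases. That figure is only a back-calculation and should be checked against the actual output.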
Age and earnings
Frequency distribution table of age and earnings
Race and earnings
Frequency distribution table of race and earnings
Strength of affiliation and age
Frequency distribution table of strength of affiliation and age
Part B:
1. Describe the relationship between respondent’s personal (earnings) income and sex. Be thorough by discussing the test of significance, measure of association, and comparison of frequency distributions.
Among respondents at the high school level, females earned more than males, at 30.3% for females against 29.5% for males. In the some college category, males earned more than females, at 25.8% and 21.7% respectively. Among college graduates, males again led in earnings, at 9.8% against 4.6% for females. Overall, males earned more than females at each level of education except the high school level. The relationship was tested at the 0.05 level of significance, and the test showed no significant difference in the level of earnings by sex. Using the level of education as a measure of association (Himmelfarb, 1975), however, there is a notable difference linking earnings and gender. The frequency distributions are unequal, which shows that income varies with sex in this relationship.
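A test of significance and a measure of association for a sex-by-income crosstab can be sketched as follows. The counts in `table` are hypothetical and are not taken from the survey output. The functions `chi_square` and `cramers_v` are illustrative helpers, and Cramér's V is used here only as one common measure of association for nominal data, not necessarily the one in the original printout.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom and n for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n


def cramers_v(stat, n, rows, cols):
    """Cramér's V: strength of association on a 0 to 1 scale."""
    return math.sqrt(stat / (n * (min(rows, cols) - 1)))


# HYPOTHETICAL counts: rows are male and female, columns are
# low, middle and high income bands.
table = [[40, 60, 50],
         [55, 60, 35]]

stat, df, n = chi_square(table)
critical = 5.991  # chi-square critical value for df = 2 at the 0.05 level
v = cramers_v(stat, n, len(table), len(table[0]))

print(f"chi-square = {stat:.3f}, df = {df}, V = {v:.3f}")
print("significant at 0.05" if stat > critical else "not significant at 0.05")
```

With these made-up counts the row percentages differ, yet the statistic falls below the critical value. This is the same kind of pattern the answer describes, where the frequency distributions look unequal but the test is not significant.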
2. From the crosstabs printout, thoroughly discuss what happens to the relationship between respondent’s income and sex when you control for educational attainment. Describe the relationships using the test of statistical significance, appropriate measure of association and comparative frequency distributions. Based on the elaboration model, what is this an example of and tell why? Support your answer.
When educational attainment is controlled for, there is a significant change in the relationship between income and sex. As observed, males earn more than females at most levels of education, and the change is significant at the given level of significance. Further, using eta as the measure of association with education as the control variable, the values vary from .137 to .181 at the high school level, from .120 to .307 in the some college category, and from .021 to .151 among college graduates. This suggests that there are more job openings at the high school level than at the college graduate level, and that this mostly favors females. In an explanation pattern, the relationships in all partial cross-tabulations or correlations would be weaker than the original relationship. The elaboration model here instead points to a suppressor variable: the original relationship between the variables is weak because a third variable (educational attainment) may be suppressing it. In essence, the educational variable is the key to the zero-order relationship, and the partial tables are stronger than the original relationship (Lubin, 1961).
3. Run Pearson’s correlations for TV hours, age, educ, realrinc, race2, and sex1. Your dependent variable is TV hours. Describe the relationship or non-relationship for each bivariate correlation. Which correlation exhibits the strongest relationship? Speculate why this is the case. For the correlation with the strongest relationship, compute the coefficient of determination and describe what it means.
There are significant correlations at both the 0.01 and the 0.05 levels of significance, using a two-tailed test. The strongest relationship is between the hours spent watching TV and the highest year of school completed. The correlation is negative, at -.197, which indicates that respondents with fewer years of schooling tend to spend more time watching TV. The coefficient of determination is the square of the correlation, (-.197)^2 ≈ 0.039, which means that education accounts for about 3.9% of the variation in TV hours, so the relationship is weak even though it is the strongest in the set (Davis, 1969).
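As a check, the coefficient of determination is the square of the Pearson correlation. A minimal sketch, using only the correlation of -.197 quoted above:

```python
# Pearson correlation between TV hours and highest year of school completed,
# as quoted in the text.
r = -0.197

# Coefficient of determination: share of variance in one variable
# that is linearly accounted for by the other.
r_squared = r ** 2

print(f"r^2 = {r_squared:.4f} ({r_squared:.1%} of the variance)")
```

Because the value is squared, the sign of the correlation does not affect the coefficient of determination. The sign only indicates the direction of the relationship.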
4. Run Spearman’s correlations for socbar (spend evening at a bar), degree (not degree1), age1, sex1, rincom06, and race2. Your dependent variable is socbar. Describe the relationship or non-relationship for each bivariate correlation. Which correlation exhibits the strongest relationship? Speculate why this is the case.
A relationship exists at the 0.01 level of significance on a two-tailed test. This means that the highest degree, sex, race, age, and income of the respondent are factors that help determine the overall time spent at a bar in the evening (Davidson, 1972). Those with more income tend to spend more time at the bar, more males spend time at the bar as well, and race is the weakest determinant of the time spent at the bar. The strongest relationship is between sex and the time spent at the bar, because the correlation value is reported as 1, which indicates an equal distribution. As the table below shows, some of the independent variables vary significantly with the dependent variable.
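Spearman's coefficient is the Pearson correlation computed on ranks, with tied values given the average of their ranks. The sketch below is a plain-Python illustration with hypothetical ordinal codes. It is not the procedure or the data behind the printout, and the function names are illustrative. One property worth noting when reading a correlation matrix is that a variable ranked against itself always gives a coefficient of exactly 1.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL ordinal codes, for illustration only.
bar_evenings = [1, 3, 2, 5, 4, 2, 1]
income_band = [2, 3, 2, 6, 5, 1, 1]

print(f"rho = {spearman(bar_evenings, income_band):.3f}")
print(f"self-correlation = {spearman(bar_evenings, bar_evenings):.3f}")
```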
References
Davidson, M. L. 1972. Univariate versus multivariate tests in repeated measures experiments. Psychological Bulletin, 77, 446-452.
Davis, D. J. 1969. Flexibility and power in comparisons among means. Psychological Bulletin, 71, 441-444.
Himmelfarb, S. 1975. What do you do when the control group doesn't fit into the factorial design? Psychological Bulletin, 82, 363-368.
Lubin, A. 1961. The interpretation of significant interaction. Educational and Psychological Measurement, 21, 807-817.