Impact of Education, Employment and Income (Socio-Economic Conditions) on Health: Applied Multivariate Statistics in Public Affairs
Introduction
The present study concerns itself with the identification and development of a set of predictive factors for the overall state of health in a given country or region. With a growing population and increasing strain on natural as well as human resources, various stress-related and genealogical health issues are emerging as threats to public health. In this context, the work done in this study can be pivotal internationally, as it attempts to pave the way for the establishment of an empirical base for the study of the causal structure of health conditions.
The present study examines the various factors and their interactions to better understand the relation between health and socio-economic conditions. It attempts to fill the gap left by descriptive research on the topic. The paper also addresses the need to conduct a comparative analysis of the impact of socio economic factors on the state of health, and for this analysis to capture the interplay of factors over a large range of geographical locations. The study endeavors to provide empirical research that can support a favorable change in health outcomes. There is also a need for future research to identify causes other than immediate risk factors. The paper draws on a large sample obtained through stratified random sampling and uses correlation, regression and hypothesis tests to assess the hypotheses designed to study the relation between health and various socio economic factors. The data was collected from the World Values Survey, an international project that tracks changing attitudes and values around the world.
Theoretical arguments
As a preliminary review of the literature suggests, there is a positive relationship between health and socio economic indicators, with income being the most important or focal variable. Income is understood to be directly related to opportunities for better health, while other factors such as employment and education share a causal relationship with income and, in turn, with health.
The focal relationship between the state of health and income levels is expected to be linear in nature, as income is related to almost all other factors and serves as an interface between health and other socio economic conditions. A good education level is expected to result in a good employment condition, a good employment status is likely related to a good income level, and income levels, as mentioned above, facilitate better health opportunities. Apart from this, each of the above factors seems to have some direct bearing on health conditions. For instance, women are more susceptible to adverse health conditions owing to social as well as physiological conditions. Education level is likely to have an impact on awareness of the factors affecting one’s health and on the propensity to take care of it. A better employment status might mean, for example, better opportunities for health care provided at the workplace.
However, none of the relationships is absolute, and there is an interplay of one or more of the above factors in predicting a better health outcome. Certain biases might also creep into the data owing to errors in data collection, such as missing data due to a failure to administer the questionnaire properly or an incomplete understanding of the questions by respondents. Finally, the estimation of the variables' effects on health and on each other is complicated by redundancies arising from effects such as multicollinearity.
Hypotheses
H1: There is a positive linear relationship between health and Income levels as compared with other socio economic variables versus income as predictor of health
H1o: There is no evidence of a positive linear relationship between health and Income levels as compared with other socio economic variables versus income as predictor of health
H1a: There is a positive linear relationship between health and Income levels as compared with other socio economic variables versus income as predictor of health
H2: There is a positive linear relationship between health and socio economic factors (Income, education and employment)
Given by the function H=B0+B1X1+B2X2+B3X3+U
Where H is Health, B0 is the constant (the value of H when X1, X2 and X3 are zero), B1 to B3 are the regression coefficients, X1 to X3 are the predictor variables and U is the error term
H2o: There is no evidence of a positive linear relationship between health and socio economic factors (Income, education and employment)
H2a: There is a positive linear relationship between health and socio economic factors (Income, education and employment)
H3i: There is a significant correlation between Income and Education
H3io: There is no significant correlation between Income and Education
H3ia: There is a significant correlation between Income and Education
H3ii: There is a significant correlation between Income and Employment
H3iio: There is no significant correlation between Income and Employment
H3iia: There is a significant correlation between Income and Employment
H3iii: There is a significant correlation between Education and Employment
H3iiio: There is no significant correlation between Education and Employment
H3iiia: There is a significant correlation between Education and Employment
Review of Literature
As per a health impact assessment conducted by the WHO, many factors combine to form an effective framework for health. As per the assessment, the determinants of health include the social and economic environment as one of three important pillars, alongside the physical environment and the individual characteristics of people.
Impact of Overall Socio-Economic Conditions on Health
Scholars have researched the importance of socio economic status for self-perceived health across racial groups. As per this research, the observed differences in health between racial groups are markedly reduced when they are adjusted for education. This is especially important with respect to the current study, since racial backgrounds, though not explicitly accounted for in the study, are arguably related to the nationalities or national backgrounds of people. The present paper, which studies the impact of various socio economic factors based on a sample drawn from different national backgrounds, learns from the observations made in the above-mentioned study. Existing research on perceived discrimination as a stress factor, and as a contributor to the stress underlying modern lifestyle-related health issues, has underlined the need for more empirical and focused research on the subject.
Socioeconomic status, also denoted as SES, involves three major determinants of health, namely healthcare, environmental exposure and health behavior. Additionally, the stress associated with a low SES also tends to contribute to health deterioration. As suggested by scholars, inequality in education, income and occupation widens the gap between the health “haves” and “have-nots”. Further, scholars have suggested that though socio economic status is quite apparently linked to morbidity and mortality, both of which are acute and severe health conditions in a given area, the factors involved and the way they interact to form a mechanism are not very well understood. Their research was based on an exploratory and descriptive review of the relevant research on the topic.
In this context, another study is notable: it examined the influence of social status, seen as social capital, and socio economic status on self-rated health in an economically and socially backward community in South Africa. As per the study, not much research has gone into the interaction and interplay of the multiple factors that seemingly affect self-rated health, especially in developing countries. Thus a need for studying the interaction effects among the various variables has been presented and supported by the relevant literature. Further, the study found no significant correlations between self-rated health and factors such as income and gender. Though these findings are contrary to the assumptions of the present paper, they do point out the need for caution in the choice of variables and in the interpretation of the tests conducted on them. Besides, the difference between a macro-level and a micro-level study in terms of the scope and size of the sample also needs to be taken into account. Socio economic conditions are likely composed of a range of implicit factors which exercise influence on the overall health perceptions of the population in a given socio-economic and geographic realm.
The impact of socio-economic status on medical conditions as such has been studied, especially with respect to age, and it has been found that there is ground for further research on the interplay between socio economic and medical conditions, which directly affect the health of the people concerned. This is supported by a study conducted on older adults in Ghana. The study drew on a sample of 4770 respondents from the WHO SAGE survey conducted in 2007-2008. Although this is country-specific research, its implications for the socio-economic determinants of health and their interaction effects need to be studied further. The need is for a comparative analysis of the above impact, along with the interplay of underlying factors on health, over a range of geographic locations, the unit of geographic analysis being countries with apparently severe health issues.
Impact of Economic status
Income and employment, as indicators of economic status, are seen to exert influence on the health indicators in a given region. In this context it is worthwhile to mention a path-breaking study that evaluates variations in health at a global level. The study evaluates Wilkinson’s income inequality hypothesis and uses the World Values Survey to fit random-coefficient, multilevel models that test the hypothesis directly. A logistic regression analysis of the data revealed that, once age and gender are accounted for, countries differ substantially in self-rated health, and that income matters in the sense that poorer people report worse health. However, macroeconomic measures of income inequality in general do not have a significant relationship with health conditions per se. In fact, as per the study, the former communist countries, which have less income inequality, perform worse on health indicators than countries where income inequality is more apparent.
Impact of Education
Education has been identified as one of the most basic components of SES. Education is seen as a source of health inequality: not just years of completed education but also early education plays an important role in access to, and awareness of, the health care opportunities available in an area. It has been suggested through exploratory research on the subject that empirical research is needed to show the need for, and the extent of, the change required to bring about a favorable change in health outcomes in a given region.
As mentioned above, the study on the influence of social capital and socio economic status on self-rated health does not provide much evidence for an overall socio-economic impact on health. However, the study does suggest a positive correlation between self-perceived health and education level, which is one of the key predictors in our study. Thus, insofar as the effect of education on health is concerned, this study provides support from the literature.
Impact of Social conditions
Apart from factors such as education, income and occupation, and the resultant affordability and accessibility of healthcare, behavior and lifestyle also find their way into the causal paradigm of health conditions and hence are also seen as important predictors of health conditions. Behavioral patterns, though important contributors to health conditions, still depend on other socio economic factors to produce any major change in health conditions. Thus there is a need for further research insofar as the analytical understanding of the relationship between the underlying factors is concerned.
Over the last few decades, various epidemiological studies have contributed to the identification of risk factors associated with the major diseases. As suggested by major studies, social factors such as socio economic status and social support are among the major identified causes of disease. These social factors characterize access to important resources, including the requisite medical care. Evidence that establishes a clear and strong association between diseases and socio economic conditions has been cited by scholars. However, the evidence from the same range of studies has also brought to the surface various controversies. For instance, does lower socio economic status cause poor health, or does poor health lower socio economic status? Although the above research has focused on the risk factors and basic causes of diseases, it also emphasizes the need to examine causes other than the immediate risk factors through future research. The data for the analysis in the research came from the Detroit Area Study (DAS) in 1995, and the scope of the findings, though significant, is likely to be localized at the city level no matter how big the sample size, which in this case was 1139. First, based on an exploratory study, the researchers came up with various levels of six independent variables. Upon further analysis, many of the levels were found to be ambiguous, and they were reduced to an unambiguous composition.
Data and Methods
Data Collection
The data for the project was collected from an online database, the World Values Survey, which is a long-running survey of social and political values and beliefs conducted in more than 100 countries around the world. The survey is deemed to be a high-quality source of information on changes in attitudes, and the present paper has drawn a cross-national sample from the Wave 6 (2010-2014) survey data. The size of the dataset is advantageous in terms of its spread and reach, and thus its applicability. However, the dataset consists of 90,350 observations from around the world, and its size necessitates drawing a random sample, which presents the problems of proper stratification and representation.
Sample
The study area or population for this study is the whole world, consisting of over 183 countries. A 1% random sample, stratified proportionally according to the number of observations per country, was drawn from the full set of 90,350 observations, giving a sample size of n=902.
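The proportional stratified draw described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the study: the function name `stratified_sample`, the toy records and the rounding rule (nearest whole respondent per stratum) are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_key, fraction, seed=0):
    """Draw the same fraction of rows, at random, from every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[stratum_key]].append(row)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        # Assumed rounding rule: nearest whole respondent per stratum.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical toy data: three "countries" with different respondent counts.
rows = ([{"country": "A", "id": i} for i in range(1000)]
        + [{"country": "B", "id": i} for i in range(500)]
        + [{"country": "C", "id": i} for i in range(300)])

sample = stratified_sample(rows, "country", 0.01)
counts = {c: sum(1 for r in sample if r["country"] == c) for c in "ABC"}
print(counts)
```

Because each stratum is rounded separately, the total need not equal exactly 1% of the full dataset, which may be why a 1% draw from 90,350 observations gives n=902 rather than 903 or 904.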
Variables
The variables chosen to test the hypotheses are as shown below. There are six variables in total. HEALTH is the dependent variable. INCOME is the focal variable, while EMPSTAT, EDULEVEL, EDUAGE and GENDER are control variables.
Data Processing For Analysis
The data was reduced by drawing a 1 percent random sample from the full set of 90,350 observations from 183 countries around the world. Proper representation was maintained through stratification, obtained by drawing a 1 percent sample from each of the countries represented in the data. In order to maintain ease of use, the variable labels were replaced with shorter, more identifiable names, as shown in the above table. The new dataset was checked for missing data entries and none were found.
Statistical Tests
The paper employed a number of tests, such as the t-test and the F-test, to assess the hypotheses. Both correlation and regression analyses were conducted, and strong linear relationships were expected from the tests.
Models
The models used in the study assess both the predictive ability of the independent variables and the relationships among the variables themselves, and are used to test the hypotheses defined above. A pairwise correlation model was used to assess the relationships between the independent and intervening variables.
Both bivariate and multivariate regression models were used to arrive at the results needed to test the hypotheses. Three iterations of regression models were run, incrementally adding control variables to the focal variable in order to improve the ratio of explained to total variation, thus improving the model by minimizing the residuals.
The basic model of the equations being estimated is as follows
H=B0+B1X1+B2X2+B3X3+U
Where H is Health, B0 is the constant (the value of H when X1, X2 and X3 are zero), B1 to B3 are the regression coefficients, X1 to X3 are the predictor variables and U is the error term.
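To make the estimation step concrete, the following is a minimal sketch of how the coefficients B0 to B3 of such a linear model can be obtained by ordinary least squares. It assumes the model is fitted by solving the normal equations. The function name `ols`, the toy predictors and the "true" coefficients are hypothetical and are not the study's data or software.

```python
def ols(X, y):
    """Least-squares coefficients [B0, B1, ...] for y = B0 + B1*x1 + ... + U."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical noise-free toy data generated from known coefficients.
true_b = [2.0, -0.5, 0.25, 1.0]
X = [(i % 10 + 1, (3 * i) % 8, (7 * i) % 9 + 1) for i in range(60)]
y = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], row)) for row in X]

coefs = ols(X, y)
print([round(c, 6) for c in coefs])
```

With noise-free toy data the fit recovers the generating coefficients. With survey data the error term U is non-zero and the estimates carry sampling uncertainty, which is what the t-tests and F-test assess.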
Analysis & Results
Description of Variables
The descriptive statistics for the variables used in the analysis are as shown below.
As shown above, the mean HEALTH score is 2 on a scale of 1 to 4, with 1 as very good and 4 as poor, while the no-response and not-applicable conditions are coded from -2 to 0; the standard deviation of the score is 0.91. EMPSTAT has a mean score of 3.2 and SD=2.31 on a scale of -4 to 8. EMPSTAT has the greatest relative SD, while GENDER has the lowest SD since it has only two categories, MALE and FEMALE. In other words, among the independent variables, EMPSTAT has the greatest dispersion while EDULEVEL has the least dispersion after GENDER. INCOME has mid-range dispersion, which contributes to its suitability as a focal variable.
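The "relative SD" used to compare dispersion across variables is taken here to be the coefficient of variation (SD divided by the mean). That reading is an assumption, since the text does not define the term. A short sketch, with a hypothetical helper name and hypothetical toy responses:

```python
import statistics

def relative_sd(values):
    """Coefficient of variation: sample SD divided by the absolute mean."""
    return statistics.stdev(values) / abs(statistics.mean(values))

# From the reported summary figures for EMPSTAT (mean 3.2, SD 2.31).
empstat_cv = 2.31 / 3.2

# From raw responses (hypothetical toy values, not survey data).
toy_cv = relative_sd([1, 2, 3, 4])

print(round(empstat_cv, 3), round(toy_cv, 3))
```

Scaling by the mean makes variables measured on different scales comparable. The ratio becomes hard to interpret when negative non-response codes pull the mean toward zero.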
Correlation
In order to gauge the relationships between the different variables, especially between the dependent variable and the focal independent variable, pairwise correlations were computed together with their significance levels. The results are tabulated as shown below.
Bivariate Regression
Since the focal independent variable is INCOME, we first regress HEALTH on INCOME. The regression results are tabulated as shown below.
The relationship can be expressed as
HEALTH = 2.40 - 0.078 INCOME
The coefficient is significant at the 5% level. Since HEALTH is inversely defined (that is, 1=very good and 4=poor), a one-unit increase in INCOME is associated with a change of about 0.078 points in the HEALTH score in the direction of better health, provided the overall model is significant. However, since the value of adjusted R2 is 0.042, a mere 4.2 percent of the variance is explained by the bivariate model. In order to improve upon the model, we add more variables as shown below.
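In a bivariate regression, the share of variance explained equals the square of the correlation between the two variables, so the small explained variance follows directly from the reported HEALTH-INCOME correlation (r=-0.21). The sketch below works through that arithmetic. The helper name `adjusted_r2` is hypothetical, and because r is reported to two decimals the result is only approximate.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r = -0.21            # reported HEALTH-INCOME correlation (rounded)
n, k = 902, 1        # sample size, one predictor (INCOME)

r2 = r ** 2
adj = adjusted_r2(r2, n, k)
print(round(r2, 4), round(adj, 4))
```

Both values come out at a little over 4 percent, in line with the explained variance reported for the bivariate model. The sign of r drops out on squaring, so the calculation does not depend on how HEALTH is coded.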
First Multiple Regression (Model 2)
A regression analysis using the variables HEALTH, INCOME, EMPSTAT, and EDULEVEL is performed next and results are as shown below.
Let us call the above model Model 2. We examine the relationship of the three predictors INCOME, EMPSTAT and EDULEVEL with the dependent variable, HEALTH, and whether each is statistically significant.
Model 2 can be expressed as
HEALTH = 2.50 - 0.070 INCOME + 0.00066 EMPSTAT - 0.026 EDULEVEL
INCOME or V239 (b1=-0.070) is statistically significant at the 0.05 level (p=0.000). EMPSTAT, i.e. V229 or employment status (b2=0.00066, p=0.959), is not statistically significant, while EDULEVEL (educational level) or V248 (b3=-0.026, p=0.030) is statistically significant. However, the negative values of the coefficients are owing to the way the variables are coded, wherein a numerically lower score can denote a better outcome. For example, the dependent variable HEALTH or V11 is defined as shown below.
The other variables in question that is INCOME, EMPSTAT and EDULEVEL are defined as shown below
Thus, the negative relationship between HEALTH and INCOME is in fact to be seen as a positive sign, due to the way HEALTH is defined. The negative coefficient (b1=-0.070) means that a one-unit improvement in the income level is associated with a 0.070-point decrease in the HEALTH score, that is, an improvement in the state of health, with the other predictors held constant. Besides, the relationship is significant at the 5% level, which means that if the population coefficient were in fact zero, a sample of size n=902 would produce an estimate this far from zero less than 5 percent of the time; b1 can therefore be treated as a usable estimate of the population relationship between HEALTH and INCOME. Similarly, b3=-0.026 means that a one-unit improvement in EDULEVEL, or level of education, is associated with a 0.026-point improvement in HEALTH. This relationship is similarly significant at the 5 percent level. Further, since b2=0.00066 and p=0.959, EMPSTAT or employment status is a very weak predictor of HEALTH: the coefficient is close to zero and, being statistically insignificant, cannot be distinguished from zero.
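One way to read the Model 2 coefficients is to compare predicted HEALTH scores for two respondents who differ by one unit of INCOME only. The sketch below uses the coefficient values reported for Model 2 (intercept 2.50, b1=-0.070, b2=0.00066, b3=-0.026). The function name and the example respondent values are hypothetical.

```python
# Coefficients as reported for Model 2.
B0, B_INCOME, B_EMPSTAT, B_EDULEVEL = 2.50, -0.070, 0.00066, -0.026

def predict_health(income, empstat, edulevel):
    """Predicted HEALTH score (1 = very good ... 4 = poor) under Model 2."""
    return B0 + B_INCOME * income + B_EMPSTAT * empstat + B_EDULEVEL * edulevel

# Two hypothetical respondents, identical except for one step of INCOME.
base = predict_health(income=5, empstat=1, edulevel=5)
richer = predict_health(income=6, empstat=1, edulevel=5)
delta = richer - base

print(round(base, 3), round(richer, 3), round(delta, 3))
```

The difference equals the INCOME coefficient: a shift of 0.070 points on the 1-to-4 HEALTH scale toward "very good", which is a change in score points rather than a percentage change.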
Also, in this case INCOME is the focal independent variable as is also evidenced by the greatest ‘t’ value, while the other variables can best be treated as control variables.
Also, as is evident from above tables, there are no missing values in the data.
Model2: Variance
The model 2 is able to explain 4.5 percent variance as opposed to 4.3 percent variance explained in the model 1 (an increase of 0.2 percent). Thus the additional variables EDULEVEL and EMPSTAT did not contribute much to the model.
Test of Model2 Significance
The model 2 can be tested for overall model significance as follows:
Ho: R2pop = 0
Ha: R2pop ≠ 0
Where R2pop is the coefficient of multiple determination in the population.
Since F=15.33, which represents a high ratio of explained to unexplained variance and is statistically significant (p=0.000), the null hypothesis is rejected, indicating that the independent variables in the model (INCOME, EDULEVEL and EMPSTAT) do have a systematic association with HEALTH.
Since Model 2, with the addition of the two variables EDULEVEL and EMPSTAT, did not improve much upon Model 1, a third model was constructed by adding EDUAGE and GENDER, and HEALTH was regressed on the new set of 5 variables as shown below.
The coefficient of EDUAGE or V249 (b4=-0.007, p=0.009) is significant at the 0.05 level, while that of GENDER or V240 (b5=0.09) is not. This shows that although the unit contribution of EDUAGE is much smaller than that of the other variables, especially GENDER, it is statistically more reliable as an estimator. GENDER has the highest unit contribution to HEALTH: as coded, a lower score on GENDER (male) corresponds to a HEALTH score about 0.09 points lower, that is, better reported health. Since this coefficient is not significant, however, the difference cannot be distinguished from zero.
Besides adding to the overall predictability of HEALTH, Model 3 has improved in terms of explained variance, as the adjusted R2 value of Model 3 is 0.052 as opposed to 0.045 for Model 2. This is an improvement of 0.7 percentage points. Thus, the addition of EDUAGE and GENDER has contributed more to the model than EDULEVEL and EMPSTAT did, and the former two can be seen as better predictors of HEALTH than the latter two.
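The overall F statistic is a function of R2, the sample size n and the number of predictors k, so the reported value can be roughly cross-checked. The sketch below assumes the standard formula F = (R2/k) / ((1-R2)/(n-k-1)) and recovers R2 from the adjusted R2 of 0.045 reported for Model 2. Because that figure is rounded, only approximate agreement with F=15.33 should be expected. Both helper names are hypothetical.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert the adjusted R-squared formula to recover plain R-squared."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)

def f_statistic(r2, n, k):
    """Overall F: explained variance per predictor over residual variance per df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n, k = 902, 3                         # sample size, predictors in Model 2
r2 = r2_from_adjusted(0.045, n, k)    # 0.045 is the rounded reported value
f_value = f_statistic(r2, n, k)

print(round(r2, 4), round(f_value, 2))
```

The result is close to 15. An F of this size with 3 and 898 degrees of freedom can be significant even though R2 is below 5 percent, because the sample is large.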
Based on the above analysis, the hypotheses H1 through H3 can be validated as follows
Hypothesis H1
The results of the descriptive analysis show INCOME to be having a moderate dispersion
The results of correlation analysis show that INCOME has a significant, though modest, correlation with HEALTH (r=-0.21, p=0.00)
As per regression Model 1, with HEALTH and INCOME as dependent and independent variables respectively the coefficient of INCOME is significant at 5% level.
As per model 2, INCOME is the focal independent variable as is also evidenced by the greatest ‘t’ value, while the other variables can best be treated as control variables.
The regression model 2 is able to explain just 4.5 percent variance as opposed to 4.3 percent of model 1, thus EDULEVEL and EMPSTAT did not contribute much to the relationship.
Based on the above evidence, H1o is rejected and thus it is inferred that there is a positive linear relationship between health and Income levels as compared with other socio economic variables versus income as predictor of health.
Hypothesis H2
As per the test of significance of regression Model 2, the independent variables in the model (INCOME, EDULEVEL and EMPSTAT) do have a systematic association with HEALTH. Thus, H2o is rejected and it is inferred that there is a positive linear relationship between health and socio economic factors (Income, education and employment).
Given by the function H=B0+B1X1+B2X2+B3X3+U
Where H is Health, B0 is the constant (the value of H when X1, X2 and X3 are zero), B1 to B3 are the regression coefficients, X1 to X3 are the predictor variables and U is the error term.
Hypothesis H3i
Based on the pairwise correlation test, there is a significant correlation between INCOME and EDUCATION for both EDULEVEL (r=0.27, p=0.000) and EDUAGE (r=0.13, p=0.0001) at the 5% level, thus H3io is rejected.
Hypothesis H3ii
Based on the pairwise correlation test, there is no significant correlation between INCOME and EMPSTAT (r=-0.0555, p=0.0958) at the 5% level, thus H3iio cannot be rejected.
Hypothesis H3iii
Based on the pairwise correlation test, there is a significant positive correlation between employment, characterized by EMPSTAT, and EDUCATION for both EDULEVEL (r=0.1590, p=0.000) and EDUAGE (r=0.0868, p=0.0001), thus H3iiio is rejected.
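The significance verdicts for the pairwise correlations can be checked from r and n alone, using the usual t statistic for a correlation coefficient, t = r·sqrt(n-2)/sqrt(1-r²). The sketch below assumes every pair uses the full sample of n=902 and compares |t| with 1.96, the approximate two-sided 5% critical value for a sample this large. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic for testing that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 902
t_income_health = t_from_r(-0.21, n)        # reported r for INCOME and HEALTH
t_income_empstat = t_from_r(-0.0555, n)     # reported r for INCOME and EMPSTAT
t_empstat_edulevel = t_from_r(0.1590, n)    # reported r for EMPSTAT and EDULEVEL

for name, t in [("INCOME-HEALTH", t_income_health),
                ("INCOME-EMPSTAT", t_income_empstat),
                ("EMPSTAT-EDULEVEL", t_empstat_edulevel)]:
    print(name, round(t, 2), "significant" if abs(t) > 1.96 else "not significant")
```

Only the INCOME-EMPSTAT pair falls short of the critical value, which matches the decision not to reject H3iio.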
Discussion/Conclusion
Based on the analysis of the data and the testing of the hypotheses, it is learnt that irrespective of the composition or interplay of the socio economic factors, it is the income level that plays the most important role as a predictor of health levels in a country or region. The central hypotheses are supported by the analysis, except for the relationship between INCOME and EMPSTAT or employment status. This may be attributable to students, housewives and people not fully employed now being able to earn substantial incomes through technology-enabled freelancing services. The final regression Model 3 suggests that HEALTH as a function of INCOME is supported more by the age at which education was completed (EDUAGE) and GENDER than by education level (EDULEVEL) itself and employment status (EMPSTAT). This is probably because the point at which people graduate is highly likely to be correlated with taking up a job or profession as a source of earning, which improves income levels and thereby health status. However, the models suggested herein still have a high level of unexplained variance, which is a concern that can be addressed in future studies.
References
Adler, N. E., & Newman, K. (2001, October 4). Socioeconomic Disparities in Health: Pathways and Policies. Retrieved from Health Affairs: http://content.healthaffairs.org/content/21/2/60.full
Cramm, J. M., & Nieboer, A. P. (2011). The influence of social capital and socio-economic conditions on self rated health among residents of an economically and health deprived South African Township. International Journal for Equity in Health, 10-51.
Health Impact Assessment: The Determinants of Health. (2016). Retrieved from www.who.int: http://www.who.int/hia/evidence/doh/en/
Jen, M., Jones, K., & Johnston, R. (2010). Global variations in health: evaluating Wilkinson's income inequality hypothesis using the World Values Survey. Social Science & Medicine, 643-653.
Link, B. G., & Phelan, J. (1995). Social Conditions as Fundamental Causes of Disease. Journal of Health and Social Behavior, 80-94.
Saeed, B. I., Xicang, Z., Yawson, A. E., Nguah, S. B., & Nuamah. (2015). Impact of socioeconomic status and medical conditions on health and healthcare utilization among aging Ghanaians. BMC Public Health, 276.
Williams, D. R., Yu, Y., & Jackson, J. S. (1997). Racial Differences in Physical and Mental Health. Journal of Health Psychology, 335-351.