Statistics in medicine is one of the tools for analyzing experimental data and clinical observations, as well as the language in which the mathematical results are communicated. This, however, is not its only task: the mathematical apparatus is widely used for diagnostic purposes, for solving classification problems, and for finding new patterns that lead to new scientific hypotheses. Using statistical software requires knowledge of the basic methods and phases of statistical analysis: their sequence, necessity, and sufficiency. The emphasis in this exposition is therefore not on a detailed understanding of the formulas that make up the statistical methods, but on their essence and the rules of their application.
Statistical processing of medical research data rests on the principle that what is true for a random sample is true for the general population from which that sample was obtained. In practice, however, drawing a truly random sample from a population is very difficult. One should therefore strive to make the sample representative of the target population, i.e., to reflect adequately all relevant aspects of the condition or disease under study in the population; this is aided by clearly articulated goals and by strict adherence to the inclusion and exclusion criteria both in the study and in the statistical analysis.
Statistical data can be represented as quantitative (continuous or discrete numerical) and qualitative (nominal or ordinal categorical) variables. The type of each variable should be clearly indicated when the database is created, and the selected data type should then be adhered to, since further processing in many currently used statistical programs depends on it. For example, a single column cannot mix numeric and text values, even when they encode similar data: if a "yes/no" field is filled in as 1 or 0, letter abbreviations should not also be used, and vice versa.
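The distinction between variable types can be sketched with pandas, where each column is given one declared type. The column names and values below are illustrative, not taken from the text:

```python
import pandas as pd

# Each column keeps a single, explicitly chosen type.
df = pd.DataFrame({
    "cholesterol": [5.2, 6.1, 4.8],           # quantitative, continuous
    "n_relapses": [0, 2, 1],                  # quantitative, discrete
    "smoker": pd.Categorical(["yes", "no", "yes"]),  # qualitative, nominal
    "severity": pd.Categorical(                      # qualitative, ordinal
        ["mild", "severe", "moderate"],
        categories=["mild", "moderate", "severe"],
        ordered=True,
    ),
})
print(df.dtypes)
```

Declaring the "yes/no" field as a categorical column, rather than typing 1, 0, or letter abbreviations interchangeably, is exactly the kind of consistency the paragraph above calls for.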
An important task of statistical processing of medical research data is to identify relationships between samples: for example, to examine the relationship between the plasma cholesterol concentration and erythrocyte deformability, or between the blood cholesterol level and age. The degree of correlation is assessed using the correlation coefficient.
The correlation coefficient (r) is a parameter that characterizes the degree of linear relationship between two samples. It shows to what extent a change in the value of one feature is accompanied by a change in the value of the other feature in the sample.
The correlation coefficient takes values in the range −1 ≤ r ≤ 1.
If r = −1, there is a perfect inverse linear dependence between the samples.
If r = 1, there is a perfect direct linear dependence between the samples.
If r = 0, there is no linear relationship between the two samples.
In practice, the correlation coefficient takes some intermediate value. The strength of the correlation can be judged by the following conventional criteria:
0.0 ≤ |r| < 0.3 — no linear relationship between the parameters can be established;
0.3 ≤ |r| < 0.6 — the relationship between the parameters is moderate;
0.6 ≤ |r| < 0.8 — there is a noticeable linear relationship between the parameters;
0.8 ≤ |r| < 0.95 — the relationship between the parameters is strong;
0.95 ≤ |r| ≤ 1.0 — the relationship between the parameters is very strong, an almost functional linear dependence.
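As a sketch, Pearson's r can be computed with scipy and |r| mapped onto the strength scale above. The data values are made up for illustration:

```python
from scipy import stats

x = [62, 70, 75, 81, 88, 94]        # e.g. weight, kg (illustrative values)
y = [118, 125, 128, 135, 142, 150]  # e.g. systolic BP, mm Hg

# Pearson correlation coefficient and its p-value.
r, p_value = stats.pearsonr(x, y)

def strength(r):
    """Map |r| onto the conventional strength scale."""
    a = abs(r)
    if a < 0.3:
        return "no linear relationship established"
    if a < 0.6:
        return "moderate"
    if a < 0.8:
        return "noticeable linear relationship"
    if a < 0.95:
        return "strong"
    return "very strong, almost functional"

print(f"r = {r:.3f}: {strength(r)}")
```

For this nearly linear toy data the coefficient comes out close to 1, i.e. in the "very strong" band.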
The sign (plus or minus) of the correlation coefficient indicates the direction of the relationship. A negative coefficient indicates an inverse correlation (the higher the value of one feature, the lower the value of the other); a positive coefficient indicates a direct correlation (the higher the value of one feature, the higher the value of the other).
Correlation analysis does not allow one to determine which of the possible explanations actually holds; it only establishes the existence of a statistical relationship.
The absence of a linear correlation does not imply that the analyzed features are independent, since their relationship may be nonlinear.
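This caveat is easy to demonstrate: below, y depends on x exactly (y = x²), yet the linear correlation coefficient is zero because the dependence is nonlinear. The data are a contrived illustration:

```python
from scipy import stats

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]   # strict functional, but nonlinear, dependence

# Pearson's r measures only the linear component of the relationship.
r, _ = stats.pearsonr(x, y)
print(f"r = {r:.3f}")    # near zero despite the exact dependence
```

A scatterplot of the data, as recommended later in the text, would immediately reveal the parabolic pattern that r misses.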
The most commonly used methods of correlation analysis are:
parametric Pearson correlation analysis, used to investigate relationships between normally distributed features;
nonparametric methods (Spearman, Kendall, gamma correlation analysis), used for:
quantitative features regardless of their distribution;
a quantitative feature and a qualitative ordinal feature;
two ordinal features.
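The parametric and nonparametric coefficients named above can be compared on the same data with scipy; the values here are illustrative:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]   # mostly increasing, with local swaps

# Pearson assumes a linear relation between (ideally normal) features;
# Spearman and Kendall use only the ranks, so they suit ordinal data
# and quantitative data of arbitrary distribution.
print("Pearson :", stats.pearsonr(x, y)[0])
print("Spearman:", stats.spearmanr(x, y)[0])
print("Kendall :", stats.kendalltau(x, y)[0])
```

All three report a clear positive association here, but only the rank-based coefficients would remain valid if x or y were merely ordinal.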
Regression analysis is one of the methods of statistical modeling. Here the model is a regression equation whose parameters (coefficients) are calculated in the course of the regression analysis.
Regression analysis is closely related to other statistical methods, namely correlation analysis and analysis of variance. In contrast to correlation analysis, which examines the direction and strength of a statistical relationship between features, regression analysis examines the dependence of one feature on others. For example, it can be used to study whether the age at which a hereditary disease will begin to develop in a patient can be predicted from the enzyme activity in the patient's blood plasma and the age at which the disease began in the patient's parents.
In linear regression analysis, the equation to be constructed takes the form y = b·x + a, where y is the dependent variable, x is the independent variable, b is the slope, and a is the intercept.
Let us consider all stages of correlation analysis using a specific example. Let the biosystem be a patient, for whom two features are measured: weight (P) and blood pressure (BP).
We calculate the correlation coefficient using MS Excel tools. The obtained value, r = 0.828, indicates a strong positive linear relationship between the variables.
Now consider the scatterplot. We construct the linear regression line to obtain an equation and see how weight affects blood pressure. This is also done in Excel:
The obtained regression equation gives us the ability to predict the blood pressure value (y) from a given weight (x):
Blood Pressure = 0.9159·Weight + 96.416
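The same least-squares line can be obtained outside Excel, e.g. with numpy.polyfit. The weight/BP values below are made up (the article's own data are not reproduced here), while the prediction helper uses the coefficients reported in the text:

```python
import numpy as np

# Illustrative data only; not the patient data from the example above.
weight = np.array([60.0, 68.0, 75.0, 82.0, 90.0, 97.0])
bp = np.array([150.0, 158.0, 166.0, 170.0, 180.0, 184.0])

# Fit bp ≈ b*weight + a by least squares (degree-1 polynomial).
b, a = np.polyfit(weight, bp, 1)
print(f"BP = {b:.4f}*Weight + {a:.3f}")

def predict_bp(w, slope=0.9159, intercept=96.416):
    """Predict blood pressure from weight using the article's coefficients."""
    return slope * w + intercept

print(round(predict_bp(80), 3))   # predicted BP for an 80 kg patient
```

As in the Excel walkthrough, the fitted slope tells how much blood pressure is expected to rise per unit of weight, and the intercept anchors the line.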