Final Project
Introduction
Statistical analysis is an integral part of clinical trials and nursing research. The purpose of this paper is to demonstrate an understanding of the essential methods for the statistical processing of medical data, without going into the details of the mathematical calculations. We consider the most useful and popular types of analysis in nursing, clinical, and experimental medicine.
A number of indicators fall under the term "descriptive statistics". They characterize the sample as a whole. Every researcher knows the concept of the average (mean) value, which is calculated by dividing the sum of the values of a variable by the number of observations. It characterizes the "central position" of a quantitative variable. Because the values are summed and divided by the number of observations, very high or very low values (outliers) can substantially shift the mean in small samples. As the sample grows, the influence of any single extreme value on the mean decreases.
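The shrinking influence of an outlier can be illustrated with a minimal sketch. The function name and the values (a typical reading of 120 and a single outlier of 220) are hypothetical and chosen only for illustration.

```python
from statistics import mean

def mean_shift_from_outlier(n, typical=120.0, outlier=220.0):
    """Return how far one outlier moves the mean of n otherwise identical values."""
    clean = [typical] * n
    contaminated = [typical] * (n - 1) + [outlier]
    return mean(contaminated) - mean(clean)

# Hypothetical sample sizes: one small sample, one large sample.
small_shift = mean_shift_from_outlier(5)
large_shift = mean_shift_from_outlier(500)
print(small_shift, large_shift)
```

With five observations the single outlier moves the mean by about 20 units; with five hundred it moves the mean by about 0.2 units.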
The median is the value that occupies the middle position among the ordered data points, splitting the sample into two equal parts: half of the observations lie below the median and half lie above it. It is also called the 50th percentile of the sample. Outliers, i.e. extreme values of the variable, affect the median much less than the mean. For this reason, the median is often used to describe skewed data, for example when the distribution of height or body weight in a group is asymmetric.
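A short sketch of this robustness, using hypothetical body weights (the numbers are invented for illustration and do not come from any study):

```python
from statistics import mean, median

weights_kg = [58, 61, 64, 66, 70, 72, 75]   # hypothetical body weights
with_outlier = weights_kg + [160]            # same group plus one extreme value

mean_before, median_before = mean(weights_kg), median(weights_kg)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# Compare how far each summary moves when the outlier is added.
mean_shift = mean_after - mean_before
median_shift = median_after - median_before
print(mean_shift, median_shift)
```

In this example the median moves from 66 to 68 kg, while the mean moves by more than 11 kg.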
Standard deviation (SD) reflects the variability (spread) of a variable and measures how far the values typically lie from the mean. It is calculated as the square root of the variance, an index of data scatter. The standard deviation does not change systematically with sample size: it may increase or decrease as observations are added, but usually not by much.
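The relationship between variance and standard deviation can be sketched as follows. The readings are hypothetical, and the sample form of the variance (division by n - 1) is an assumption, since the text does not say which form is meant.

```python
from math import sqrt
from statistics import stdev

values = [118, 122, 125, 127, 133]  # hypothetical systolic readings, mm Hg

n = len(values)
m = sum(values) / n
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - m) ** 2 for x in values) / (n - 1)
# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(variance, sd, stdev(values))  # stdev() is the library equivalent
```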
A confidence interval (CI) is a range of values that is expected to contain the true population value (e.g., the mean) at a stated level of confidence. A 90% confidence interval means that, if the study were repeated many times, about 90% of the intervals calculated in this way would contain the true value. In biomedical research and nursing, the confidence interval for the mean is usually reported at the 95% level. It is defined as the mean ± 1.96 standard errors (the coefficient 1.96 derives from the normal distribution and assumes that the sample is sufficiently large). For example, if the mean systolic blood pressure in the studied group was 125 mm Hg and the standard error was 5 mm Hg, the 95% confidence interval for the mean runs from 115.2 to 134.8 mm Hg (that is, ± 9.8 (5 × 1.96) mm Hg in both directions from the mean). Combining the mean and the confidence interval, we can say that the sample mean of systolic blood pressure in the group is 125 mm Hg, and that we are 95% confident that the true mean lies between 115.2 and 134.8 mm Hg.
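The blood pressure example can be reproduced with a few lines of Python. This is a minimal sketch under the normal approximation; the function name is hypothetical, and the coefficient is taken from the standard normal distribution rather than typed in as 1.96.

```python
from statistics import NormalDist

def mean_confidence_interval(sample_mean, standard_error, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Values from the example: mean 125 mm Hg, standard error 5 mm Hg.
low, high = mean_confidence_interval(125.0, 5.0)
print(round(low, 1), round(high, 1))  # 115.2 134.8
```

For small samples a coefficient from the t distribution would usually be larger than 1.96, so this sketch applies only when the sample is sufficiently large.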
Probability Distributions
The normal probability distribution is the most widely used distribution in nursing and clinical research. The normal (or Gaussian) distribution has a bell shape that is perfectly symmetrical about the axis passing through the mean, and it is described mathematically by a formula with two parameters: the mean and the standard deviation.
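As an illustration, the density formula with its two parameters can be written out directly. The parameter values below (mean 125, standard deviation 10) are illustrative assumptions, not values from the text.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 125.0, 10.0  # illustrative parameters
left = normal_pdf(mu - 10, mu, sigma)    # one SD below the mean
right = normal_pdf(mu + 10, mu, sigma)   # one SD above the mean
peak = normal_pdf(mu, mu, sigma)         # density at the mean
reference = NormalDist(mu, sigma).pdf(mu - 10)  # library value for comparison

print(left == right, peak > left)
```

The density is equal at the same distance on either side of the mean and is highest at the mean, which is the symmetry and bell shape described above.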
Whether data conform to a Gaussian distribution is assessed in statistical programs with normality tests (such as the Kolmogorov-Smirnov or Shapiro-Wilk tests). Visual inspection of a histogram is also informative. When data are not normally distributed but follow a different distribution (which can be determined with statistical programs), they can often be transformed toward normality by mathematical operations such as taking the logarithm or the square root.
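The effect of a logarithmic transformation can be sketched with a simple skewness calculation. This is only a rough moment-based check, not the Kolmogorov-Smirnov or Shapiro-Wilk test named above, which would normally be run in a statistics package. The data and the helper function are hypothetical.

```python
from math import log10

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements spanning several orders of magnitude.
raw = [10, 100, 100, 1000, 1000, 1000, 10000, 10000, 100000]
logged = [log10(x) for x in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(skew_raw, skew_log)
```

Here the raw values are strongly right-skewed, while their logarithms are symmetric about 3, so the skewness of the transformed data is essentially zero.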
Hypothesis Testing
Moving from the general formulation of the problem and the study design to calculations, it is first necessary to formulate a statistical hypothesis. It serves as a link between the data and the statistical methods that can be used to analyze them, because it specifies a probability distribution for the data. The formulated statistical hypothesis describes the expected results of the study, which are then compared with the observed results. If the hypothesis is correct, the observed data differ from the expected data only by chance, in accordance with the probability distribution implied by the hypothesis. The null hypothesis (denoted by H0) states that there is no difference (or no correlation) between the compared samples. A common standard (the usual method or approach) serves as the control. If the null hypothesis is rejected, the alternative hypothesis (Ha) is accepted; it states that the groups differ significantly. Assume that we want to evaluate the efficacy of HIV treatment in a group of patients. In this case, the null hypothesis is that there is no significant difference in the mean viral load before and after treatment. The alternative hypothesis is that the viral load after treatment is significantly lower than before treatment.
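The viral load example can be sketched as a one-sided paired t-test. The choice of test is an assumption, since the text does not name one. The log10 viral loads for eight patients are hypothetical, and the critical value 1.895 is the tabulated one-sided 5% value of the t distribution with 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical log10 viral loads for the same eight patients.
before = [5.2, 4.8, 5.5, 4.9, 5.1, 5.3, 4.7, 5.0]
after = [3.9, 4.1, 4.6, 4.0, 4.4, 4.2, 4.0, 4.3]

# Paired design: analyze the within-patient differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
standard_error = stdev(diffs) / sqrt(n)
t_stat = mean(diffs) / standard_error

# H0: mean difference is 0. Ha: mean difference is below 0 (load decreased).
T_CRITICAL = 1.895  # tabulated value: one-sided alpha = 0.05, df = n - 1 = 7
reject_null = t_stat < -T_CRITICAL
print(round(t_stat, 2), reject_null)
```

With these invented data the t statistic is far below the critical value, so the null hypothesis of no change would be rejected in favor of the alternative that the viral load decreased.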