1. Identify three uses for a frequency distribution.
Epidemiologists use a variety of methods to summarize data. One fundamental method is the frequency distribution, a table that displays how many people fall into each category of a variable such as age, income level, or disease status. Epidemiologists examine factors (variables) that predict health outcomes; they also observe patterns in the distribution of illness, disability, and other health-related outcomes (Woodward 10).
Values in a data set often repeat; the number of times a given value occurs is its frequency. The set of frequencies for all values of a variable is the “frequency distribution”, and it is often listed in a “frequency distribution table” (Woodward 27). For example, such a table presents the number of people who fall into each category of a variable such as socioeconomic status, diagnosed disease, religious status, age, mortality, or length of patient stay before discharge (hours or days). Epidemiologists can examine a population with a certain disease and learn more about it by using the frequencies of certain variables, most commonly age, sex, race, socioeconomic status, and family history (Woodward 30). For example, among people with breast cancer, women have a much higher frequency than men.
The primary uses of a frequency distribution are: (1) to summarize and analyze the data collected, (2) to estimate frequencies within the population based on the sample, and (3) to calculate statistical measures such as the mean, median, mode, range, and standard deviation, all of which can be derived from the frequency distribution (Woodward 27). A health example would be determining the average age at death among men and women, which is based on the frequency count of deaths at each age by sex.
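As a rough illustration of the third use, the sketch below builds a frequency distribution in Python and derives the mean, median, and mode from it. The ages are hypothetical values invented for the example, not data from Woodward, and the helper name `frequency_table` is an assumption made here.

```python
from collections import Counter
from statistics import median

# Hypothetical ages at death for a small sample (illustrative values only).
ages = [67, 72, 72, 75, 80, 80, 80, 85, 91]

# Frequency distribution: each distinct age mapped to how often it occurs.
freq = Counter(ages)


def frequency_table(freq):
    """Return rows of (value, count, percent of total), sorted by value."""
    total = sum(freq.values())
    return [(value, count, 100 * count / total)
            for value, count in sorted(freq.items())]


for value, count, percent in frequency_table(freq):
    print(f"age {value}: {count} deaths ({percent:.1f}%)")

# Summary measures derived from the frequency distribution alone.
total = sum(freq.values())
mean_age = sum(value * count for value, count in freq.items()) / total
median_age = median(freq.elements())   # elements() expands counts back to values
modal_age = max(freq, key=freq.get)    # the value with the highest frequency

print(mean_age, median_age, modal_age)
```

In practice the same table would be built separately for men and women before comparing the average age at death.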
2. Briefly, identify the differences between a normal distribution, a positive skew, and a negative skew.
In a normal distribution, the total area beneath the curve represents 100% of the values in the sample's frequency distribution. See diagram below for a visual representation of a normal distribution/curve (Woodward 58).
“The normal distribution” (Woodward 58) is bell-shaped and symmetric, with the mean at the center. About 68% of the values fall within +/- 1 standard deviation (SD, a measure of how far values typically lie from the mean) of the mean, about 95% fall within +/- 2 SD, and about 99.7% fall within +/- 3 SD.
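The share of a normal curve lying within a given number of standard deviations can be checked directly. The following is a minimal sketch using `NormalDist` from Python's standard library; the function name `coverage_within` is made up for this example.

```python
from statistics import NormalDist


def coverage_within(k_sd):
    """Share of a normal distribution within +/- k_sd standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


# Any multiple of the SD can be passed in; 1, 2 and 3 are the usual reference points.
for k in (1, 2, 3):
    print(f"within +/- {k} SD: {coverage_within(k):.1%}")
```

Because the curve is symmetric, the values outside any such band are split equally between the two tails.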
Height is often given as an example of normally distributed data. Most individuals in the population cluster around the average height, and values become less common the further you move to the left of the curve (shorter) or to the right (taller), so very short and very tall individuals are rare.
Skewed data occur when the values are not normally distributed and the curve is asymmetric (Woodward 51). When the tail of the curve extends to the right, as you look at the distribution, the data are called right or positively skewed. When the tail extends to the left, the data are called left or negatively skewed.
The mean (average) is the most common measure of central tendency for a normal distribution, but it is not appropriate when the distribution is skewed because the mean is pulled toward outliers (extreme values). For example, if you compare age at death in developing countries with that in Western developed countries, the distribution in developing countries is skewed because poor conditions produce a high frequency of infant deaths, which pulls the mean age at death downward. In this case, the median is a more accurate description of the frequency distribution.
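The effect of a long tail on the mean can be seen with a small sketch. The ages below are hypothetical values chosen only to mimic a sample with several infant deaths, and comparing the mean with the median is a rough rule of thumb for the direction of skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical ages at death (illustrative values only): most deaths occur in
# old age, but a few infant deaths form a long tail toward the low end.
ages_at_death = [0, 0, 1, 58, 64, 67, 70, 71, 73, 75, 78]


def skew_direction(values):
    """Rough heuristic: the mean is pulled toward the tail, away from the median."""
    m, md = mean(values), median(values)
    if m < md:
        return "negative"   # tail on the left (low values)
    if m > md:
        return "positive"   # tail on the right (high values)
    return "none"


mean_age = mean(ages_at_death)
median_age = median(ages_at_death)

print(f"mean = {mean_age:.1f}, median = {median_age}")
print("skew:", skew_direction(ages_at_death))
```

Here the three infant deaths drag the mean well below the median, while the median stays close to the age at which most deaths in the sample occur.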
3. How does this skew or distribution curve relate to standard error?
Smaller samples drawn from a larger population will typically vary from one another; this variation is due to random “sampling error”, and its size is measured by the “standard error” (Woodward 90). Insulin testing is an example of how this relates to skewness: if the sample is made up of fasting individuals, more of the insulin values will fall at the lower end, and vice versa. If a person’s insulin level is measured repeatedly, it will probably change throughout the day; this variation within one subject is intrasubject variability, while variation between individuals is intersubject variability. The more variability there is in the sample, the higher the chance of error. The standard error is calculated as the standard deviation divided by the square root of the sample size (Woodward 90).
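The calculation can be written out as a short sketch: the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The insulin readings are hypothetical values, and `standard_error` is a name chosen for this example.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(sample size)."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical fasting insulin readings (illustrative values only).
small_sample = [4, 6, 8, 10, 12]
# The same readings observed four times over: similar spread, larger sample.
large_sample = small_sample * 4

print(f"n = {len(small_sample)}: SE = {standard_error(small_sample):.3f}")
print(f"n = {len(large_sample)}: SE = {standard_error(large_sample):.3f}")
```

The larger sample gives a smaller standard error, while a sample with more variability (a larger standard deviation) gives a larger one.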
4. What does the standard error mean for the results? Feel free to use an example to explain your answer.
Standard error falls under the category of inferential statistics; it allows the research analyst to construct a confidence interval around a sample estimate (Woodward 69). A confidence interval gives a range of values within which the population value is expected to lie. Confidence intervals based on the standard error of the mean are usually calculated at the 95% level, which corresponds to a significance level of p < .05. In other words, you are 95% confident in the estimate, with a 5% chance of error (Woodward 69).
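As an illustration of how the standard error leads to a confidence interval, the sketch below computes a 95% interval for a mean using the normal approximation (mean plus or minus about 1.96 standard errors). The blood pressure readings are hypothetical, the function name `confidence_interval` is an assumption, and for a sample this small a t-based interval would normally be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    # Critical value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = stdev(sample) / sqrt(len(sample))   # standard error of the mean
    centre = mean(sample)
    return centre - z * se, centre + z * se


# Hypothetical systolic blood pressure readings in mmHg (illustrative values only).
readings = [118, 122, 126, 130, 134]

low, high = confidence_interval(readings)
print(f"mean = {mean(readings)}, 95% CI = ({low:.1f}, {high:.1f})")
```

A smaller standard error produces a narrower interval, which is why the standard error matters for how precisely the sample describes the population.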
The standard error of the estimate is also used by researchers; it indicates how well the sample reflects the larger population. For example, if the average blood pressure of a sample is much higher than the national average, this could be due to sampling error, measurement error, or something about the sample that significantly raises blood pressure (for instance, sampling only from overweight or obese persons, who are reported to have higher blood pressure). For this reason, the standard error is critical in judging the accuracy of statistics calculated from random samples.
Works Cited:
Woodward, Mark. Epidemiology: Study Design and Data Analysis. CRC Press, 2013. Print.