When data is collected, some data points fall far below or far above the rest. Such points are usually referred to as outliers, and they influence both the distribution of the data and the results obtained from interpreting it (Mann 90). In statistical collection and analysis, much data approximately follows a normal distribution, and measures of central tendency such as the mean describe it well. In the presence of outliers, however, these measures become skewed and no longer reflect the real state of the sample. When outliers distort the calculated measures or the analysis in this way, statisticians sometimes remove them to make the data easier to analyze.
An example of outliers arises when a coach is training long-jump athletes. Most of the athletes will record distances that are grouped together, but some will record distances far above or far below the team average. If one or two athletes record distances far lower than the rest, the calculated team mean drops and suggests that the whole team is performing worse than it really is. When the outlying values are removed, the mean returns to a value that is representative of most of the team. Outliers therefore distort the reading and interpretation of data, and the most direct way to deal with this is to clean the data by removing them (Mann 93). In cases where the outliers have to be kept, measures that are less affected by them, for example the median and the mode, are used instead.
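The long-jump example can be made concrete with a small calculation. The sketch below uses hypothetical distances, chosen only for illustration and not taken from Mann, and relies on Python's standard `statistics` module. It compares the mean and the median with and without a single low value.

```python
from statistics import mean, median

# Hypothetical long-jump distances in metres (illustrative values only).
# Five athletes are grouped together; one result is far below the rest.
distances = [6.1, 6.2, 6.3, 6.4, 6.5, 2.0]

# Remove the low value by hand to mimic cleaning the data.
cleaned = [d for d in distances if d != 2.0]

mean_all = mean(distances)
mean_cleaned = mean(cleaned)
median_all = median(distances)
median_cleaned = median(cleaned)

print("mean with outlier:     ", round(mean_all, 2))
print("mean without outlier:  ", round(mean_cleaned, 2))
print("median with outlier:   ", round(median_all, 2))
print("median without outlier:", round(median_cleaned, 2))
```

With these assumed values, the single low result pulls the mean down by roughly 0.7 m, while the median moves by only 0.05 m, which is why the median is preferred when outliers have to stay in the data.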
Work Cited
Mann, Prem. Introductory Statistics. Boston: Wiley and Sons, 2006. Print.