Question One
Statistics, as many may not be aware, is an important part of our everyday life. We are confronted by statistical issues in magazines and newspapers, on television, and in general conversations and encounters. We mostly meet them when evaluating the cost of living, where people claim that it is high on the basis of a percentage increase; in unemployment rates; in weather predictions, which rely on past data; in sports, where performance is recorded and winners are determined; in politics, where results are analyzed and predicted; and in gambling and lotteries. Surprising as it may seem, each individual is an informal statistician in one way or another.
Statistics is a branch of mathematics that provides the basic concepts, procedures and rules for organizing numerical data in the form of graphs, tables and charts. Using various statistical techniques, a person can study and understand the organized data and make an informed decision about its effects and future implications.
Descriptive statistics deals with the methods used in collecting, organizing and summarizing numerical data. Collection methods include conducting interviews, using questionnaires, observation and tallying. Data collection has many challenges: it can be time-consuming and sometimes costly. Those collecting data also encounter many incidences of bias. That is why random sampling, in which every member of the population has an equal chance of being selected, is regarded as the best way of choosing whom to collect data from.
To manage time and reduce cost, statisticians prefer to collect data from samples rather than from the whole population. A population refers to the entire group of individuals or items under study. A sample is just a small portion of the population that is chosen to represent the whole. Generally, it is assumed that the characteristics of the sample are approximately the same as those of the population.
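The relationship between a population and a sample can be sketched in a few lines of Python. The population of ages below is invented purely for illustration, and the sample size of 50 is an arbitrary assumption; the point is only that a randomly drawn sample tends to have a mean close to, but not identical to, that of the population.

```python
import random
import statistics

# Hypothetical population: the ages of 1,000 people (invented for illustration).
rng = random.Random(42)  # fixed seed so that the run is repeatable
population = [rng.randint(18, 80) for _ in range(1000)]

# Simple random sample: every member has the same chance of being selected.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

Drawing a larger sample would generally bring the two means closer together, at the price of more time and cost.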
The data collected can be either discrete or continuous. Discrete data is obtained by counting and is usually in whole numbers: we say 20 people rather than 20.1 people. Continuous data, on the other hand, is obtained by measuring and can take any value within a range, including fractions. For example, when recording heights or distance covered, we can say 3.6 meters.
Once collected, the data is recorded in charts, graphs and tables. This is meant to simplify the data for easier analysis. Data recording entails data organization, where related data are grouped together so that each group can be analyzed separately.
Data analysis is all about finding out the characteristics of the data. It involves the computation of summary measures such as the mean, median, mode, quartiles, variance and standard deviation.
The median is the middle value when the data are arranged in order. The mode is the value that appears most often. The mean is the arithmetic average of the values, whereas the standard deviation is a measure of dispersion: it measures how far the values are spread from the mean.
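As a minimal sketch, the measures named above can be computed with Python's built-in statistics module. The list of scores is hypothetical, and the sample (n - 1) versions of the variance and standard deviation are used on the assumption that the data are a sample rather than a whole population.

```python
import statistics

# Hypothetical test scores, invented for illustration.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value of the sorted data
mode = statistics.mode(scores)          # most frequent value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation
quartiles = statistics.quantiles(scores, n=4)  # three cut points: Q1, Q2, Q3

print(f"Mean: {mean:.2f}, median: {median}, mode: {mode}")
print(f"Variance: {variance:.2f}, standard deviation: {std_dev:.2f}")
print(f"Quartiles: {quartiles}")
```

Note that the second quartile is the same value as the median, and the standard deviation is the square root of the variance.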
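While interpreting the data, we have to allow for the possibility of error in the final outcome, because a sample rarely matches the population exactly. This is done by stating a margin of error or a significance level. The degrees of freedom, which depend on the sample size, determine which distribution is used to measure that error.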
The second type of statistics is inferential statistics. This entails drawing a conclusion, or inference, about the population from the sample. One chooses a confidence level, such as 95%, and, for normally distributed data, uses the t distribution to work out the range within which the actual population value is likely to lie. This range is called a confidence interval.
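The idea of a confidence interval can be sketched as follows. Python's standard library has no t distribution, so this example swaps in the critical value from the standard normal (z) distribution instead, which is a reasonable approximation only for larger samples; with a small sample, the t critical value would be somewhat larger and the interval wider. The heights are randomly generated, hypothetical data, and the function name confidence_interval is an illustrative choice.

```python
import math
import random
import statistics
from statistics import NormalDist


def confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the mean, using a z critical value."""
    n = len(data)
    mean = statistics.mean(data)
    std_error = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # about 1.96 for 95%
    margin = z * std_error
    return mean - margin, mean + margin


# Hypothetical sample of 40 heights in centimetres, invented for illustration.
rng = random.Random(7)
heights = [rng.gauss(170, 8) for _ in range(40)]

low, high = confidence_interval(heights, confidence=0.95)
print(f"Sample mean: {statistics.mean(heights):.1f} cm")
print(f"95% confidence interval: {low:.1f} cm to {high:.1f} cm")
```

Raising the confidence level, say to 99%, widens the interval: being more certain of capturing the actual value requires a larger range.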
Hypothesis testing is also useful in statistical inference. It is used when one wants to establish whether the figure calculated from the sample is consistent with a claimed actual value. A null hypothesis and an alternative hypothesis are formulated, and the sample data are used to decide whether the null hypothesis should be rejected. While testing a hypothesis, a type I error or a type II error may occur. A type I error occurs when we reject a true null hypothesis, whereas a type II error occurs when we fail to reject a false one.
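A hypothesis test can be illustrated with a one-sample z test, sketched below. It assumes that the population standard deviation is known, which is seldom true in practice; the packet weights, the claimed mean of 500 grams, the standard deviation of 4 grams and the function name one_sample_z_test are all hypothetical. The null hypothesis is that the true mean equals the claimed value, and the significance level of 0.05 is the probability of a type I error that we are willing to accept.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z_test(data, hypothesized_mean, population_sd):
    """Return the z statistic and the two-sided p-value."""
    n = len(data)
    std_error = population_sd / math.sqrt(n)
    z = (statistics.mean(data) - hypothesized_mean) / std_error
    # Two-sided p-value: probability of a result at least this extreme
    # in either direction, if the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical packet weights in grams, invented for illustration.
weights = [502, 498, 505, 510, 497, 503, 508, 501, 506, 504]
alpha = 0.05  # significance level, i.e. the accepted risk of a type I error

z, p_value = one_sample_z_test(weights, hypothesized_mean=500, population_sd=4)
reject_null = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Choosing a smaller significance level lowers the risk of a type I error but raises the risk of a type II error, since more evidence is then required before the null hypothesis is rejected.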
Question Two
The normal distribution (whose standard form is known as the z distribution), the t distribution, the chi-square distribution and the F distribution are the distributions used in statistical inference. They possess the following basic properties.
Properties of normal distribution
In its standard form, it is also known as the z distribution.
The mean of the standard form is zero.
The standard deviation of the standard form is 1.
At the center, mean = mode = median.
It is bell shaped.
The area under the curve represents probability, and the total area is 1.
It is symmetrical.
It ranges from negative infinity to positive infinity.
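These properties can be checked numerically with the NormalDist class from Python's standard library. This is only a sketch: the particular points at which the curve is evaluated are arbitrary choices, and the range from -8 to 8 stands in for negative and positive infinity.

```python
from statistics import NormalDist

# The standard normal (z) distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Mean, median and mode coincide at the center.
center_values_equal = z.mean == z.median == z.mode

# Bell shape: the curve is highest at the mean and falls away on either side.
peak_at_mean = z.pdf(0) > z.pdf(0.5) > z.pdf(1)

# Symmetry: the curve has the same height at -1.5 and at +1.5.
symmetric = abs(z.pdf(-1.5) - z.pdf(1.5)) < 1e-12

# Areas under the curve are probabilities.
half_below_mean = z.cdf(0)              # half of the area lies below the mean
within_one_sd = z.cdf(1) - z.cdf(-1)    # roughly 68% within one standard deviation
almost_all = z.cdf(8) - z.cdf(-8)       # practically the whole area

print(center_values_equal, peak_at_mean, symmetric)
print(f"Area below the mean: {half_below_mean}")
print(f"Area within one standard deviation: {within_one_sd:.4f}")
print(f"Area between -8 and 8: {almost_all:.10f}")
```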
Properties of t distribution
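It is mound-shaped.
It is perfectly symmetric about t = 0.
It has more variability than the standard normal distribution.
Its shape is affected by the size of the sample through the degrees of freedom, and it approaches the normal distribution as the sample grows.
It takes both negative and positive values.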
Properties of F distribution and Chi square
They are commonly used when more than one sample or category is compared.
The total area under the curve is 1.
Their shape depends on the degrees of freedom.
They take only non-negative values.
These distributions are used in hypothesis testing since they offer a clear guideline on how to formulate and test a hypothesis and are simple to evaluate.