Confidence Interval for the Population Mean
When we take a sample from a population, we obtain a point estimate of the parameter of interest and calculate its standard error to indicate how precise the estimate is. In most cases, however, the standard error alone is not a convenient measure of precision. It is much more useful to combine it with the point estimate into an interval estimate of the population parameter.
This is done by using the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI) for the parameter. The sample mean is approximately normally distributed when the sample size is large, so the properties of the normal distribution can be applied to it. In particular, 95% of the distribution of sample means lies within 1.96 standard deviations (SD) of the population mean. When we have only one sample, we estimate this standard deviation from the sample and call it the standard error of the mean (SEM): the sample standard deviation s divided by the square root of the sample size n. The 95% confidence interval for the mean is then calculated as follows:
$(\bar{x} - 1.96 \cdot \mathrm{SEM},\ \bar{x} + 1.96 \cdot \mathrm{SEM})$
If we repeated the experiment many times and built such an interval from each sample, about 95% of those intervals would contain the true population mean.
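The repeated-sampling reading of the interval can be checked with a small simulation. The sketch below is only an illustration: the function name coverage, the population values (mean 20, SD 5), the sample size, and the number of trials are all assumptions chosen for the demonstration, and the SD is treated as known.

```python
import random
from math import sqrt
from statistics import mean

def coverage(trials=2000, n=100, mu=20.0, sigma=5.0, seed=0):
    """Return the fraction of 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    sem = sigma / sqrt(n)          # SD treated as known (an assumption)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # Count the trial if the interval around x_bar covers mu.
        if x_bar - 1.96 * sem <= mu <= x_bar + 1.96 * sem:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, though it will not equal it exactly because the number of trials is finite.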
Now consider an example of constructing a 95% CI for the mean. Find the 95% confidence interval for the unknown population mean of a normally distributed random variable, given that the standard deviation is 5, the sample mean is 20, and the sample size is 100.
The sample size is quite large (n > 30), so we can apply the properties of the normal distribution. First, calculate the SEM:
$\mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{100}} = \frac{5}{10} = 0.5$
Then the confidence interval has the following form:
$(20 - 1.96 \cdot 0.5,\ 20 + 1.96 \cdot 0.5) = (19.02,\ 20.98)$
We are 95% confident that the true population mean value is between 19.02 and 20.98.
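The same calculation can be reproduced in a few lines of Python. This is a minimal sketch: the function name mean_confidence_interval is hypothetical, and it takes the multiplier from the standard normal distribution (about 1.96 for a 95% level) rather than hard-coding it.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided multiplier: about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sem = sd / sqrt(n)             # standard error of the mean
    return x_bar - z * sem, x_bar + z * sem

# Values from the worked example: sample mean 20, SD 5, n = 100.
lower, upper = mean_confidence_interval(20, 5, 100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # 95% CI: (19.02, 20.98)
```

Passing a different level, for example 0.99, widens the interval because the multiplier grows.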
P-value and Critical Region for Hypothesis Testing
Conclusions based on statistical data may contain errors. Two types of error can occur when testing statistical hypotheses. A type 1 error is the rejection of the null hypothesis when it is in fact true. A type 2 error is the failure to reject the null hypothesis when the alternative hypothesis is in fact true. When testing a statistical hypothesis, the probability of a type 1 error is limited by a number chosen in advance, called the level of significance (alpha). The p-value is a different quantity: it is the probability, computed under the null hypothesis, of obtaining a test value at least as extreme as the one observed, and the null hypothesis is rejected when the p-value does not exceed alpha. Historically, the most common levels of significance are 0.005, 0.01, and 0.05.
A critical region is the region of test values for which the null hypothesis is rejected. The region of acceptance of the hypothesis is the region of test values for which the null hypothesis is accepted (that is, not rejected).
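To make the relation between the p-value and the significance level concrete, here is a hedged sketch. It assumes a test statistic that follows the standard normal distribution under the null hypothesis; the observed value 2.0, the level 0.05, and the function name two_sided_p_value are hypothetical choices for illustration only.

```python
from statistics import NormalDist

def two_sided_p_value(z_observed):
    """Probability, under the null hypothesis, of a standard normal
    statistic at least as far from zero as z_observed."""
    tail = 1 - NormalDist().cdf(abs(z_observed))
    return 2 * tail

alpha = 0.05                 # significance level fixed before the test
z_observed = 2.0             # hypothetical observed statistic
p_value = two_sided_p_value(z_observed)

# Reject the null hypothesis only when the p-value does not exceed alpha.
reject_null = p_value <= alpha
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

Here alpha is fixed before looking at the data, while the p-value is computed from the data, which is why the two should not be used interchangeably.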
So, the hypothesis testing process consists of the following steps:
1) Select a statistical criterion C;
2) Calculate the observed value of C (C-observed) based on the given sample;
3) Since the distribution of C is known, determine the critical value C-critical from the significance level alpha chosen in advance. This value separates the critical region from the region of acceptance of the hypothesis;
4) If the calculated value C-observed falls into the region of acceptance, the null hypothesis is accepted. If it falls into the critical region, the null hypothesis is rejected.
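The four steps above can be sketched for one concrete criterion. The example assumes a two-sided z-test for a population mean; the hypothesized mean 19 and the function name z_test_two_sided are hypothetical, while the sample values (mean 20, SD 5, n = 100) are reused from the confidence interval example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(x_bar, mu0, sd, n, alpha=0.05):
    """Two-sided z-test of the null hypothesis 'population mean = mu0'."""
    # Step 1: the criterion C is the z statistic, which is approximately
    # standard normal under the null hypothesis for large n.
    # Step 2: observed value of the criterion for the given sample.
    c_observed = (x_bar - mu0) / (sd / sqrt(n))
    # Step 3: critical value determined by the chosen alpha.
    c_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: reject if the observed value lies in the critical region.
    reject = abs(c_observed) > c_critical
    return c_observed, c_critical, reject

c_obs, c_crit, reject = z_test_two_sided(x_bar=20, mu0=19, sd=5, n=100)
print(f"C-observed = {c_obs:.2f}, C-critical = {c_crit:.2f}, reject: {reject}")
```

With these numbers the observed value is 2.0 and the critical value is about 1.96, so the null hypothesis is rejected at the 0.05 level.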
There are different types of critical regions:
- Right-sided critical region is defined by the inequality C>C-critical (C-critical >0);
- Left-sided critical region is defined by the inequality C<C-critical (C-critical <0);
- Two-sided critical region is defined by the pair of inequalities C<C1 or C>C2 (C2>C1).
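The three kinds of critical region can be written as one small check. This is a sketch under an assumption not stated in the text, namely that the criterion C follows the standard normal distribution under the null hypothesis; the function name in_critical_region and the kind labels are hypothetical.

```python
from statistics import NormalDist

def in_critical_region(c, alpha=0.05, kind="two-sided"):
    """Return True if the observed criterion value c falls in the critical region."""
    z = NormalDist()                       # assumed null distribution of C
    if kind == "right":
        return c > z.inv_cdf(1 - alpha)    # C > C-critical, C-critical > 0
    if kind == "left":
        return c < z.inv_cdf(alpha)        # C < C-critical, C-critical < 0
    if kind == "two-sided":
        c1 = z.inv_cdf(alpha / 2)          # lower boundary C1
        c2 = z.inv_cdf(1 - alpha / 2)      # upper boundary C2
        return c < c1 or c > c2
    raise ValueError(f"unknown kind: {kind!r}")

# The same observed value can be significant in a one-sided test
# but not in a two-sided test at the same alpha.
print(in_critical_region(1.8, kind="right"))      # True
print(in_critical_region(1.8, kind="two-sided"))  # False
```

For alpha = 0.05 the one-sided critical values are about 1.645 in absolute value and the two-sided boundaries are about -1.96 and 1.96, which explains the difference between the two printed results.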