Probability:
It is a measure of how likely an event is to happen, given the factors that control the chance of the event occurring.
Consider the case given below:
Let S be the sample space and E be the event of interest. n(S) will be the number of elements in the sample space and n(E) will be the number of elements in the event.
If a fair six-sided die is rolled, find the probability that an odd number is obtained.
Solution:
We formulate the sample space S = {1, 2, 3, 4, 5, 6}, which lists all the outcomes that are possible and equally likely to happen.
Let E be the event “An odd number is obtained”
E = {1, 3, 5}, which lists all the outcomes that are odd numbers. We now use the classical probability formula P(E) = n(E)/n(S) = 3/6 = 1/2
We conclude that the probability of obtaining an odd number is one half.
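The same calculation can be sketched in a few lines of Python. This is only an illustration of the classical formula P(E) = n(E)/n(S); the variable names are chosen here for the example and are not part of the original solution.

```python
from fractions import Fraction

# Sample space S for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event E: the outcome is an odd number.
event = {outcome for outcome in sample_space if outcome % 2 == 1}

# Classical probability, valid because every outcome is equally likely.
p_odd = Fraction(len(event), len(sample_space))
print(p_odd)  # 1/2
```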
Frequency Distribution:
It is a representation of observations, grouped into classes that are mutually exclusive and exhaustive, in the form of a graph, a chart or a table. Consider the case below:
A survey was carried out in Roy settlement to determine the number of motorbikes per house. In each of the 20 homes visited, people were asked to state how many motorbikes were registered under their household. The results were 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0. Represent the data in a frequency table.
Steps:
1. Group the results into classes (here, each distinct number of motorbikes from 0 to 4) and count the number of results in each class. This shows how many homes have each number of motorbikes.
2. Draw a table with separate columns for the classes (the number of motorbikes per house), the tally and the frequency of results in each class. Name these columns Number of motorbikes, Tally and Frequency.
3. Go through the results one by one and place a tally mark in the row for the matching number of motorbikes.
4. Add up the tally marks in each row and record the total in the Frequency column. The frequencies should sum to 20, the number of homes visited.
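The steps above can be sketched in Python as follows. This is a minimal illustration that uses the standard library's Counter to do the tallying; the column widths and variable names are choices made for this example.

```python
from collections import Counter

# Number of motorbikes reported by each of the 20 homes.
motorbikes = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# Steps 1, 3 and 4: count how many homes fall in each class.
frequency = Counter(motorbikes)

# Step 2: print the table with the three named columns.
print(f"{'Number of motorbikes':<22}{'Tally':<10}{'Frequency'}")
for value in sorted(frequency):
    count = frequency[value]
    tally = "|" * count  # one mark per home
    print(f"{value:<22}{tally:<10}{count}")
print(f"{'Total':<32}{sum(frequency.values())}")
```

For this data the frequencies are 4, 6, 5, 3 and 2 for 0, 1, 2, 3 and 4 motorbikes respectively, which add up to the 20 homes visited.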
Confidence intervals:
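A confidence interval gives a range of values that is expected to contain an unknown population parameter at a stated level of confidence. It is a form of interval estimate, in contrast to a point estimate, which gives a single value.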
Consider the following case:
A sample of chicks is tested to assess the amount of drugs needed in stage I of growth. It has been hypothesized that chicks at stage I of growth may spend less time per day absorbing a certain quantity of a growth drug once it is administered. The number of hours spent in stage I drug intake is recorded for 61 chicks. The sample produced a mean of 48 hours (S = 14 hours) of stage I drug intake over a 72 hour period of time. Calculate a 95% confidence interval for this data.
Facts: The sample size (n = 61) is greater than 30, hence it is considered large. The population standard deviation is unknown, so we will use t-distribution table values to estimate the CI for the mean.
Solution:
We will use the formula below to calculate the confidence interval.
CI = X ± t × (S / Square root n)
X (sample mean) = 48 hours, S (sample standard deviation) = 14 hours, n = 61, confidence level 95%.
T-tabulated from the t-distribution table (60 degrees of freedom) = 2.000. Note that this value takes the place of Z in the formula.
Margin of error = 2.000 × (14 / Square root 61) = 3.6 hours
Confidence Interval at 95 percent: 44.4 < population mean < 51.6
We are 95 percent confident that the population mean number of hours a chick spends absorbing the drug in stage I of growth, over a 72 hour period of time, lies between 44.4 hours and 51.6 hours.
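As a check on the arithmetic, the interval can be recomputed from the stated sample values. The sketch below assumes the usual large-sample formula, mean ± t × (S / square root of n), and takes the tabulated t value of 2.000 from the text rather than computing it; the variable names are illustrative.

```python
import math

sample_mean = 48.0   # hours, from the example
sample_sd = 14.0     # hours, the sample standard deviation S
n = 61               # number of chicks
t_critical = 2.000   # tabulated t value quoted in the text (assumed, not computed here)

# Standard error of the mean and the resulting margin of error.
standard_error = sample_sd / math.sqrt(n)
margin = t_critical * standard_error

lower = sample_mean - margin
upper = sample_mean + margin
print(f"95% CI: {lower:.1f} to {upper:.1f} hours")
```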
Hypothesis Testing
It is used to test a null hypothesis against an alternative hypothesis when drawing statistical inferences from data.
Consider a test of the average number of minutes that smokers spend smoking per day. It is assumed that, on average, smokers spend 190 minutes per day smoking. If we sample 100 smokers and find that their mean smoking time is 198 minutes per day with a standard deviation of 15 minutes, test at the 5% level of significance whether there is evidence that the mean smoking time exceeds 190 minutes per day.
Facts: Null hypothesis: Mean = 190, H0: m = 190
Alternative hypothesis: Mean > 190, H1: m > 190
Mean of 100 smokers: X = 198 and standard deviation: s = 15.
The tabulated z-statistic for a one-tailed test at the 5% level is 1.6449.
Z = (198 - 190) / (15 / Square root 100)
= 8 / 1.5
= 5.333
We compare this value with the z-tabulated from the z-tables.
Since z = 5.333 is greater than the tabulated z-value of 1.6449, we reject H0 and accept H1. Therefore, there is significant evidence to conclude that smokers’ average smoking time is greater than 190 minutes per day.
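The test can also be sketched in Python. This is an illustrative version of the one-tailed z-test described above; it obtains the critical value from the standard library's NormalDist (available in Python 3.8 and later) instead of reading it from a z-table, and the variable names are assumptions made for the example.

```python
import math
from statistics import NormalDist

mu_0 = 190.0          # mean under the null hypothesis (minutes per day)
sample_mean = 198.0   # mean of the 100 sampled smokers
s = 15.0              # sample standard deviation
n = 100
alpha = 0.05          # level of significance

# Test statistic: how many standard errors the sample mean lies above mu_0.
z = (sample_mean - mu_0) / (s / math.sqrt(n))

# One-tailed critical value, the point with 5% of the standard normal above it.
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > z_critical
print(f"z = {z:.3f}, critical value = {z_critical:.4f}, reject H0: {reject_h0}")
```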
Linear correlation
Correlation refers to a statistical relationship between two or more random variables. Linear correlation therefore refers to a relationship in which the variables tend to change together along a straight line. An example is the correlation between the demand for a product such as maize flour and its price.
Consider the case below
What is the correlation between these two variables?
Level of mother's IQ (x)    Level of child's intelligence (y)
4     2
10    8
12    11
5     3
7     8
6     7
2     3
14    13
The level of a child’s intelligence is thought to be correlated with the mother’s IQ, and we wish to measure the strength of that correlation. Number of pairs n = 8; sum of X = 60; sum of Y = 55; sum of X^2 = 570; sum of Y^2 = 489; sum of XY = 521.
Mean of X = 7.5, Mean of Y = 6.875. SSx = 570 - 8(7.5)^2 = 120; SSy = 489 - 8(6.875)^2 = 110.875; SSxy = 521 - 8(7.5)(6.875) = 108.5
R (Correlation) = SSxy / Square root (SSx*SSy) = 108.5 / Square root (120*110.875) = 0.9406
The two variables can also be related by the regression line y = A + Bx, and we wish to find the values of A and B.
B = SSxy/SSx = 108.5/120 = 0.904, A = Mean of Y - B(Mean of X) = 6.875 - (108.5/120)(7.5) = 0.09375
Y = 0.09375 + 0.904X. The two variables are related by the given equation and their correlation is r = 0.9406.
There is a high level of positive correlation between a mother’s IQ and her child’s intelligence in this sample.
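The correlation and the fitted line can be recomputed from the raw data with a short Python sketch. It assumes the eight (x, y) pairs are read from the table as (4, 2), (10, 8), (12, 11), (5, 3), (7, 8), (6, 7), (2, 3) and (14, 13), a reading that reproduces the sums quoted in the text, and it uses the sums-of-squares form r = SSxy / square root of (SSx × SSy). The variable names are illustrative.

```python
import math

# Assumed reading of the data table: mother's IQ level (x), child's level (y).
x = [4, 10, 12, 5, 7, 6, 2, 14]
y = [2, 8, 11, 3, 8, 7, 3, 13]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Corrected sums of squares and cross-products.
ss_x = sum(v * v for v in x) - n * mean_x ** 2
ss_y = sum(v * v for v in y) - n * mean_y ** 2
ss_xy = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y

# Pearson correlation coefficient.
r = ss_xy / math.sqrt(ss_x * ss_y)

# Least-squares line y = a + b*x.
b = ss_xy / ss_x
a = mean_y - b * mean_x

print(f"r = {r:.4f}, a = {a:.5f}, b = {b:.3f}")
```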