Problem #1
Plot a histogram of the number of accidents for each street speed group (km/h).
These histograms describe the distribution of the number of accidents that occurred in the 40, 60, 80 and 100 km/h speed groups. The distributions for the 40 km/h and 60 km/h groups are right-skewed. The distribution for the 80 km/h group is almost symmetric, and the distribution for the 100 km/h group is uniform.
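The shape descriptions above can also be checked numerically. The following is a minimal sketch, assuming the raw accident counts of one speed group are available as a Python list. The sample values and the function names are hypothetical and are not the data analysed in this report. A positive skewness value corresponds to a right-skewed histogram, and a value near zero to a roughly symmetric one.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive: longer right tail)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def binned_counts(values, bin_width):
    """Count observations per bin [start, start + bin_width), keyed by bin start."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical accident counts for one speed group (assumption, not report data).
example = [1, 2, 2, 3, 3, 3, 4, 5, 7, 12]

# Crude text histogram: one '#' per observation in each bin.
for start, count in binned_counts(example, 2).items():
    print(f"{start:>3} {'#' * count}")

print("skewness:", round(sample_skewness(example), 2))
```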
Problem #2
The average number of accidents at the speed of 40 km/h is 3.1 with a standard deviation of 1.91887 accidents. The minimum number of accidents is 1 and the maximum is 8. The interquartile range is 2. The 50th percentile (median) is 2.5.
The average number of accidents at the speed of 60 km/h is 6.2933 with a standard deviation of 4.41978 accidents. The minimum number of accidents is 1 and the maximum is 23. The interquartile range is 6. The 50th percentile (median) is 5.
The average number of accidents at the speed of 80 km/h is 9.2917 with a standard deviation of 6.08261 accidents. The minimum number of accidents is 1 and the maximum is 33. The interquartile range is 4.7. The 50th percentile (median) is 8.
The average number of accidents at the speed of 100 km/h is 11.25 with a standard deviation of 3.30404 accidents. The minimum number of accidents is 8 and the maximum is 15. The interquartile range is 6.25. The 50th percentile (median) is 11.
Summing up, at first glance a higher speed on the road is associated with a higher number of accidents.
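The summary statistics reported in this problem can be reproduced with a few lines of code. This is a sketch only: the function name `describe` and the sample values are hypothetical, and the quartile rule used here (Python's "inclusive" method) is an assumption, so the interquartile range may differ slightly from the value produced by other statistical packages.

```python
from statistics import fmean, median, quantiles, stdev


def describe(values):
    """Return the descriptive statistics used in this problem for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": fmean(values),
        "sd": stdev(values),        # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "median": median(values),   # 50th percentile
        "iqr": q3 - q1,             # interquartile range
    }


# Hypothetical accident counts for one speed group (assumption, not report data).
counts = [1, 2, 2, 3, 3, 3, 4, 5, 7, 12]
for name, value in describe(counts).items():
    print(f"{name}: {round(value, 3)}")
```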
Problem #3
In this problem, we generate probability distribution plots for each table of data.
The probability distribution plots indicate that none of the four groups is normally distributed.
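One common form of probability plot is the normal Q-Q plot, which pairs the sorted observations with the quantiles expected under a normal distribution. It is an assumption that this is the kind of plot used here. The sketch below computes the plot coordinates and the correlation between the two axes: values close to 1 are consistent with normality, while clearly lower values indicate a departure from it. The plotting positions (i - 0.5)/n, the function names and the sample are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_qq_points(values):
    """Pair each sorted observation with its expected standard-normal quantile."""
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


def qq_correlation(values):
    """Pearson correlation between theoretical and observed quantiles."""
    points = normal_qq_points(values)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical, strongly right-skewed sample (assumption, not report data).
skewed = [1, 1, 1, 1, 2, 2, 3, 23]
print(round(qq_correlation(skewed), 3))
```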
Problem #4
Create the 95% confidence intervals for the population mean value of accidents in each of the speed groups. These confidence intervals were reported in the tables in Problem #2. We gather them in the table below:
We can interpret these confidence intervals as follows: for each group, we are 95% confident that the true population mean number of accidents lies between the lower and upper limits of that group's confidence interval. Based on the given results, the confidence interval for the 40 km/h group does not overlap any of the other three intervals. This suggests that there is a significant difference in the average number of accidents between the 40 km/h group and all other groups. The other confidence intervals overlap each other, so they give no evidence of a significant difference in the mean number of accidents among the 60, 80 and 100 km/h groups.
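For reference, a t-based confidence interval for a mean is mean ± t × sd / √n. The sketch below applies it to the mean and standard deviation reported for the 40 km/h group in Problem #2. The sample size n = 10 and the matching critical value t = 2.262 (9 degrees of freedom) are assumptions made for illustration only, since the group sizes are not stated in this report, so the printed limits are not the report's interval. The helper names are hypothetical.

```python
from math import sqrt


def mean_confidence_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Mean and sd are taken from Problem #2 (40 km/h group).
# n = 10 and t_crit = 2.262 (97.5th percentile of t with 9 df) are ASSUMPTIONS.
ci_40 = mean_confidence_interval(3.1, 1.91887, n=10, t_crit=2.262)
print(tuple(round(limit, 3) for limit in ci_40))
```

The overlap check mirrors the comparison made above; non-overlapping 95% intervals point to a difference in means, while overlapping intervals alone do not prove that the means are equal.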
Problem #5
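In order to compare the groups, we perform Levene’s test for equality of variances together with the t-test for independent groups.
Levene’s test indicated that the variances are approximately equal for the 60, 80 and 100 km/h groups (p > 0.05), so these groups can be compared using the equal-variance form of the t-test. The variance of the 40 km/h group is significantly different from that of the other groups, so comparisons involving this group should rely on the unequal-variance form of the t-test. Note that Levene’s test assesses the equality of variances; it does not by itself show whether the groups are independent.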
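Levene's statistic compares the absolute deviations of the observations from their group centres, much like a one-way ANOVA on those deviations. The following is a minimal sketch using deviations from the group means; whether the software used for this report centres on the mean or the median is an assumption. The function name and the sample data are hypothetical. The p-value comes from an F distribution with (k - 1, N - k) degrees of freedom, which is not computed here; a statistics library such as SciPy (`scipy.stats.levene` with `center="mean"`) can provide it.

```python
from statistics import fmean


def levene_statistic(*groups):
    """Levene's W based on absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Z_ij = |Y_ij - mean of group i|
    deviations = []
    for g in groups:
        centre = fmean(g)
        deviations.append([abs(y - centre) for y in g])

    group_means = [fmean(zs) for zs in deviations]
    grand_mean = fmean([z for zs in deviations for z in zs])

    between = sum(len(zs) * (m - grand_mean) ** 2
                  for zs, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for zs, m in zip(deviations, group_means) for z in zs)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical samples with clearly different spread (assumption, not report data).
low_spread = [1, 2, 3]
high_spread = [0, 5, 10]
print(round(levene_statistic(low_spread, high_spread), 4))
```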
Problem #6
We plot the average number of accidents in each group against the speed:
This plot indicates a strong positive association between the speed and the number of accidents: groups with a higher speed have a higher average number of accidents. The plot alone does not show that higher speed causes the additional accidents.
Problem #7
Test the hypothesis that there is a linear relationship between the average number of accidents and the speed value. We do this using linear regression analysis.
H0: β1 = 0
Ha: β1 ≠ 0
The level of significance is 0.05
Develop linear regression:
The ANOVA table indicates that the regression model is significant (F=180.664, p=0.005). The SpeedGroup coefficient is also significant (t=13.441, p=0.005). The R-square indicates that the speed value explains approximately 99% of the variance in the average number of accidents. We reject the null hypothesis and conclude that there is a significant linear relationship between the variables. The regression equation is the following:
AverageAccidents = -2.123 + 0.137*SpeedGroup.
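As a cross-check, the regression can be recomputed from the four group means reported in Problem #2. This sketch assumes, as the F and t values suggest, that the model was fitted to those four (speed, average) pairs; the function name is hypothetical. It uses ordinary least squares with plain arithmetic and does not compute the p-value.

```python
from math import sqrt

# Speed groups and the group means reported in Problem #2.
speeds = [40, 60, 80, 100]
mean_accidents = [3.1, 6.2933, 9.2917, 11.25]


def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = syy - slope * sxy          # residual sum of squares
    ss_reg = syy - ss_res               # regression sum of squares
    ms_res = ss_res / (n - 2)           # residual mean square, n - 2 df
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": ss_reg / syy,
        "f": ss_reg / ms_res,
        "t_slope": slope / sqrt(ms_res / sxx),
    }


fit = simple_linear_regression(speeds, mean_accidents)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the fit reproduces the reported intercept, slope, F and t values to the printed precision, and gives an R-square of about 0.989.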