Question two
Question three
Variable     Mean  SE Mean  StDev  Variance  CoefVar  Minimum      Q1  Median
Smoking    102.88     3.44  17.20    295.78    16.72    66.00   91.00  104.00
Mortality  109.00     5.22  26.11    681.92    23.96    51.00   87.00  113.00

Variable       Q3  Maximum   Range  Mode           N for Mode  Skewness  Kurtosis
Smoking    114.00   137.00   71.00  91, 102                 2     -0.10     -0.04
Mortality  128.00   155.00  104.00  104, 113, 128           2     -0.38     -0.16
The mean and median in these descriptive statistics describe the center of each data set. Smoking has a mean of 102.88 and a median of 104.00, while mortality has a mean of 109.00 and a median of 113.00. Neither histogram shows clustering of the data.
Smoking has a standard deviation of 17.20 and a range of 71.00. Mortality has a standard deviation of 26.11 and a range of 104.00. The interquartile range (Q3 - Q1) is 23.00 for smoking and 41.00 for mortality, so mortality is the more widely spread variable on every measure. Smoking has a kurtosis of -0.04 and mortality has a kurtosis of -0.16. Smoking has a skewness of -0.10 and mortality has a skewness of -0.38.
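The spread measures quoted above can be cross-checked from the table itself. The following is a minimal sketch, not Minitab's own procedure: it assumes a sample size of 25 (inferred from the Total DF of 24 in the regression output), and the dictionary and function names are made up for illustration. Because the inputs are already rounded to two decimals, the results may differ from the table in the last digit.

```python
import math

N = 25  # assumption: inferred from Total DF = 24 in the ANOVA table

# Values copied from the descriptive statistics table
stats = {
    "Smoking": {"mean": 102.88, "stdev": 17.20, "min": 66.00,
                "q1": 91.00, "q3": 114.00, "max": 137.00},
    "Mortality": {"mean": 109.00, "stdev": 26.11, "min": 51.00,
                  "q1": 87.00, "q3": 128.00, "max": 155.00},
}

def derived(s, n=N):
    """Recompute the measures that follow from the other table entries."""
    return {
        "se_mean": s["stdev"] / math.sqrt(n),       # standard error of the mean
        "coef_var": 100 * s["stdev"] / s["mean"],   # coefficient of variation, in %
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],                   # interquartile range
    }

summary = {name: derived(s) for name, s in stats.items()}
for name, d in summary.items():
    print(name, {k: round(v, 2) for k, v in d.items()})
```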
Question four
Symmetric data are data that, when shown in a graph such as a bar chart, histogram, or boxplot, appear symmetrical about a vertical axis passing through the mean. In this case, the data are approximately symmetric: both the histogram and the boxplot are roughly symmetric about the vertical axis through the mean. The symmetry is not exact, however, since the mean, median, and mode are not equal. Both variables have small negative skewness (-0.10 for smoking and -0.38 for mortality), so both are close to symmetric, with the smoking data being the more symmetric of the two.
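One rough way to put a number on the gap between mean and median is Pearson's median skewness coefficient, 3 * (mean - median) / StDev. This is a different measure from the moment-based skewness that Minitab reports, so the values will not match the table. It is used here only as an illustrative sketch, and the function name is made up for the example.

```python
def pearson_median_skewness(mean, median, stdev):
    """Pearson's second (median) skewness coefficient."""
    return 3 * (mean - median) / stdev

# Mean, median, and StDev copied from the descriptive statistics table
smoking_skew = pearson_median_skewness(102.88, 104.00, 17.20)
mortality_skew = pearson_median_skewness(109.00, 113.00, 26.11)

print(f"Smoking:   {smoking_skew:.2f}")
print(f"Mortality: {mortality_skew:.2f}")
```

Both values come out negative, and mortality is further from zero than smoking, which agrees with the ordering of the reported skewness values.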
Question five
Question six
The two variables are correlated. Mortality is treated as the dependent variable and smoking as the independent variable, and in the fitted line plot the points follow an upward-sloping straight line. This shows that there is a positive correlation between smoking and mortality.
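The strength of this correlation can be estimated from numbers already in the report. In simple linear regression the correlation coefficient equals the slope multiplied by the ratio of the standard deviations, r = b * s_x / s_y. The sketch below takes the slope from the regression output under Question seven and the standard deviations from the table under Question three. Since those inputs are rounded, the result is approximate.

```python
slope = 1.08753      # Smoking coefficient from the regression output
sd_smoking = 17.20   # StDev of the predictor (Smoking)
sd_mortality = 26.11 # StDev of the response (Mortality)

# Correlation coefficient implied by the fitted slope
r = slope * sd_smoking / sd_mortality

print(f"r = {r:.3f}, r^2 = {100 * r**2:.1f}%")
```

The squared value should land close to the reported R-Sq of 51.3%.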
Question seven
The simple regression equation is Mortality = -2.89 + 1.088 Smoking. The p-value for the regression is less than 0.05 and the slope is positive, meaning that a statistically significant positive linear relationship exists between smoking and mortality. The equation can therefore be used for prediction, although with R-Sq = 51.3% it explains only about half of the variation in mortality. The Minitab output is shown below.
Regression Analysis: Mortality versus Smoking
The regression equation is
Mortality = - 2.89 + 1.088 Smoking
S = 18.6154   R-Sq = 51.3%   R-Sq(adj) = 49.2%
Analysis of Variance
Source      DF       SS       MS      F      P
Regression   1   8395.7  8395.75  24.23  0.000
Error       23   7970.3   346.53
Total       24  16366.0
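The F statistic and R-Sq follow directly from the sums of squares in this table. As a small check (a sketch using the tabulated values, with variable names chosen for the example), the calculation should reproduce the reported F = 24.23 and R-Sq = 51.3% up to rounding.

```python
# Values copied from the Analysis of Variance table
ss_regression = 8395.75
ms_regression = 8395.75   # equals SS because the regression has 1 DF
ms_error = 346.53
ss_total = 16366.0

f_stat = ms_regression / ms_error   # F = MS(Regression) / MS(Error)
r_sq = ss_regression / ss_total     # R-Sq = SS(Regression) / SS(Total)

print(f"F = {f_stat:.2f}, R-Sq = {100 * r_sq:.1f}%")
```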
Fitted Line: Mortality versus Smoking
General Regression Analysis: Mortality versus Smoking
Regression Equation
Mortality = -2.88532 + 1.08753 Smoking
Coefficients
Term          Coef  SE Coef         T      P
Constant  -2.88532  23.0337  -0.12526  0.901
Smoking    1.08753   0.2209   4.92218  0.000
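Each T value in the coefficients table is the coefficient divided by its standard error, and with a single predictor the square of the slope's T value equals the regression F statistic. The sketch below recomputes both from the printed figures. Because SE Coef is rounded to four decimals, the results agree with the output only to about two decimal places.

```python
# (Coef, SE Coef) pairs copied from the Coefficients table
coefficients = {
    "Constant": (-2.88532, 23.0337),
    "Smoking": (1.08753, 0.2209),
}

# T = Coef / SE Coef for each term
t_values = {term: coef / se for term, (coef, se) in coefficients.items()}

for term, t in t_values.items():
    print(f"{term}: T = {t:.3f}")

# With one predictor, T^2 for the slope should be close to F = 24.2279
print(f"T^2 (Smoking) = {t_values['Smoking'] ** 2:.2f}")
```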
Summary of Model
S = 18.6154   R-Sq = 51.30%   R-Sq(adj) = 49.18%
PRESS = 9640.73   R-Sq(pred) = 41.09%
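The adjusted and predicted R-Sq values can be recovered from the error sum of squares, the total sum of squares, and PRESS. The sketch below uses the standard formulas with n = 25 observations (an inference from the Total DF of 24) and one predictor. The variable names are chosen for the example.

```python
n = 25           # assumption: Total DF (24) + 1
p = 1            # number of predictors (Smoking)

ss_error = 7970.25
ss_total = 16366.0
press = 9640.73  # prediction sum of squares from the model summary

s = (ss_error / (n - p - 1)) ** 0.5                                # standard error of the regression
r_sq_adj = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))     # penalizes extra predictors
r_sq_pred = 1 - press / ss_total                                   # based on leave-one-out residuals

print(f"S = {s:.4f}")
print(f"R-Sq(adj)  = {100 * r_sq_adj:.2f}%")
print(f"R-Sq(pred) = {100 * r_sq_pred:.2f}%")
```

R-Sq(pred) is noticeably lower than R-Sq, which suggests the model predicts new observations somewhat less well than it fits the existing ones.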
Analysis of Variance
Source         DF   Seq SS   Adj SS   Adj MS        F         P
Regression      1   8395.7  8395.75  8395.75  24.2279  0.000057
  Smoking       1   8395.7  8395.75  8395.75  24.2279  0.000057
Error          23   7970.3  7970.25   346.53
  Lack-of-Fit  21   7705.3  7705.25   366.92   2.7692  0.298859
  Pure Error    2    265.0   265.00   132.50
Total          24  16366.0
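The lack-of-fit test splits the error sum of squares into a lack-of-fit part and a pure-error part, then compares their mean squares. The sketch below redoes that arithmetic from the table. Since the reported p-value of 0.298859 is above 0.05, the test gives no evidence that a straight line is an inadequate model.

```python
# Adj SS and DF copied from the Analysis of Variance table
ss_lack_of_fit, df_lack_of_fit = 7705.25, 21
ss_pure_error, df_pure_error = 265.00, 2
ss_error = 7970.25

# The two components should add up to the error sum of squares
ss_sum = ss_lack_of_fit + ss_pure_error

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error
f_lack_of_fit = ms_lack_of_fit / ms_pure_error   # F = MS(Lack-of-Fit) / MS(Pure Error)

print(f"SS check: {ss_sum:.2f} vs {ss_error:.2f}")
print(f"F (lack of fit) = {f_lack_of_fit:.4f}")
```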
Fits and Diagnostics for Unusual Observations
No unusual observations.
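To show how the fitted equation would be used for prediction, the sketch below evaluates it for a smoking index of 100. That input is a hypothetical value chosen for illustration, not an observation from the data set. The function name and the range check are also assumptions of this example: the check simply refuses to extrapolate beyond the observed smoking range of 66 to 137.

```python
INTERCEPT = -2.88532
SLOPE = 1.08753
SMOKING_MIN, SMOKING_MAX = 66.0, 137.0  # observed range from the descriptive statistics

def predict_mortality(smoking):
    """Predicted mortality index from the fitted line; rejects extrapolation."""
    if not SMOKING_MIN <= smoking <= SMOKING_MAX:
        raise ValueError("smoking value is outside the observed range")
    return INTERCEPT + SLOPE * smoking

predicted = predict_mortality(100)  # hypothetical input
print(f"Predicted mortality at smoking = 100: {predicted:.2f}")
```

With S = 18.6154, individual predictions from this line carry considerable uncertainty.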
Reference
http://lib.stat.cmu.edu/DASL/Datafiles/SmokingandCancer.html