Assignment 8
1. U = runif(100)
The largest values of the uniform sample, around 1.0, are not as far out to the right as the corresponding quantiles of a normal distribution would be. In the normal qqplot of U, which only ranges from 0 to 1, this shows up as the points flattening out at the right end. Likewise, the smallest values, around 0, are not as far out to the left as the normal quantiles, and the points flatten out at the left end of the same qqplot. If the uniform sample were normally distributed, the qqplot would display a straight line. It clearly does not, hence the uniform distribution has shorter tails than the normal distribution and is not well approximated by it.
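The flattening at both ends can also be checked numerically. The sketch below is only an illustration: the seed is an assumption (the original run did not fix one), and qqnorm(..., plot.it = FALSE) is used to get the plotted coordinates without drawing. It compares the sample extremes with the reference line through the quartiles, which is the line qqline() draws by default.

```r
set.seed(1)  # assumption: arbitrary seed, only for reproducibility
U <- runif(100)

# x = theoretical normal quantiles, y = sample values (nothing is drawn)
qq <- qqnorm(U, plot.it = FALSE)

# Reference line through the first and third quartiles
q_sample <- quantile(U, c(0.25, 0.75), names = FALSE)
q_normal <- qnorm(c(0.25, 0.75))
slope <- diff(q_sample) / diff(q_normal)
intercept <- q_sample[1] - slope * q_normal[1]

# Height of the reference line at the leftmost and rightmost plotted points
line_at_ends <- intercept + slope * range(qq$x)

# The sample extremes stay inside [0, 1], while the line extends beyond them
print(rbind(sample = range(U), line = line_at_ends))
```

If the sample minimum lies above the line and the sample maximum lies below it, both tails are shorter than a normal distribution with the same quartiles would have.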
2. M = Uniform(-15,-5), so that the uniform distribution is centered on -10.
N = Uniform(5,15), so that the uniform distribution is centered on 10.
P <- c(M,N)
qqnorm(P):
The largest values of P (around 15) are not as far out to the right as the corresponding quantiles of a normal distribution would be. This is shown by the points flattening out at the right end of the normal qqplot for P. Likewise, the smallest values of P (around -15) are not as far out to the left as the normal quantiles, and the points flatten out at the left end of the qqplot. Also, there is a jump from -5 to 5 in the qqplot, corresponding to the gap between M and N when they are concatenated to get P. A normal distribution has no such gap.
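The jump can be located in the coordinates that qqnorm() plots. This is a rough sketch: the sample size of 100 per component and the seed are assumptions, since the write-up does not state how many values M and N contain.

```r
set.seed(2)                  # assumption: arbitrary seed
M <- runif(100, -15, -5)     # assumption: 100 draws per component
N <- runif(100, 5, 15)
P <- c(M, N)

qq <- qqnorm(P, plot.it = FALSE)

# Vertical steps between neighbouring points, ordered by theoretical quantile
ord <- order(qq$x)
step <- diff(qq$y[ord])

# The largest step is the jump across the empty interval (-5, 5)
print(max(step))
print(which.max(step))       # position of the jump among the sorted points
```

With 100 values on each side, the jump sits between the 100th and 101st sorted points, i.e. in the middle of the plot.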
3. Q <- rep(0,100) repeats the value 0 one hundred times.
P <- c(M,N,Q)
qqnorm(P):
This qqplot is similar to that in Question (2), except that we now see a horizontal line at 0. This corresponds to the 100 entries with value 0 from Q. Because these 100 entries are tied, many consecutive quantiles of P have the same value, namely 0. For a normal distribution, the corresponding quantiles have different, increasing values. Hence, when we plot the quantiles of P against those of the normal distribution, we get a horizontal line.
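The horizontal segment can be measured directly. As before, this is a sketch under assumptions: 100 draws each for M and N and an arbitrary seed, neither of which is given in the write-up.

```r
set.seed(3)                  # assumption: arbitrary seed
M <- runif(100, -15, -5)     # assumption: 100 draws per component
N <- runif(100, 5, 15)
Q <- rep(0, 100)
P <- c(M, N, Q)

qq <- qqnorm(P, plot.it = FALSE)

# Points belonging to the tied zeros
flat <- qq$y == 0
print(sum(flat))

# Range of theoretical normal quantiles covered by the horizontal segment
print(range(qq$x[flat]))
```

The tied zeros occupy the middle third of the ordered sample, so the segment is symmetric around the theoretical quantile 0.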
4.
t2 <- rt(500,2)
t10 <- rt(500,10)
t25 <- rt(500,25)
hist(t2,main="2 degrees of freedom",xlim = c(-10,10), ylim=c(0,250), breaks=20)
hist(t10,main="10 degrees of freedom",xlim = c(-10,10), ylim=c(0,250))
hist(t25,main="25 degrees of freedom",xlim = c(-10,10), ylim=c(0,250))
We used a shorter title, as the full title would not fit into its subframe. We used ylim=c(0,250) instead of c(0,275) so that the distributions with more degrees of freedom can be seen more clearly. The Figures are shown below:
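We notice that as the number of degrees of freedom N gets higher, the peak of the histogram gets lower and the tails get shorter. The lower peak is most likely a binning effect: with fewer extreme values, hist() picks narrower bins, so fewer observations fall into each bin, whereas the peak of the t density itself rises slightly with N. Also, as N gets larger, the t-distribution looks more like a normal distribution centered at zero.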
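The approach to the normal distribution can also be checked against the exact t density, independent of any particular sample or choice of bins. The sketch below tabulates the density at zero and the 97.5% quantile for the three degrees of freedom used above; the variable names are illustrative only. Histogram counts depend on bin width, so the bar heights in the Figures are not directly comparable to these density values.

```r
dof <- c(2, 10, 25)

comparison <- data.frame(
  dof  = dof,
  peak = dt(0, dof),        # exact density at zero
  q975 = qt(0.975, dof)     # 97.5% quantile, a measure of tail length
)
print(comparison)

# Standard normal values that the t-distribution approaches
print(c(peak = dnorm(0), q975 = qnorm(0.975)))
```

The density at zero rises toward the normal value and the upper quantile shrinks toward the normal quantile as the degrees of freedom increase.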
5. We used: plot(density(rchisq(n,ndf)), main="ndf = ")
The Figures for the distributions are shown below:
As the number of degrees of freedom ndf gets higher, the bandwidth reported by density() gets larger, because the spread of the distribution grows. Also, the distribution seems to tend to a normal distribution as ndf gets higher. As a result, the skewness decreases, and the right tail of the distribution becomes less pronounced as ndf increases.
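Both observations can be put into numbers. The skewness of a chi-squared distribution with ndf degrees of freedom is sqrt(8/ndf), which goes to zero as ndf grows, and the bandwidth chosen by density() is stored in its $bw component. The values of ndf, the sample size n, and the seed below are assumptions for illustration, since the write-up does not list the ones actually used.

```r
set.seed(5)              # assumption: arbitrary seed
ndf <- c(2, 10, 50)      # assumption: illustrative degrees of freedom
n <- 500                 # assumption: illustrative sample size

# Bandwidth chosen by density() for one sample at each ndf
bw <- sapply(ndf, function(k) density(rchisq(n, k))$bw)

# Theoretical skewness of the chi-squared distribution
skew <- sqrt(8 / ndf)

print(data.frame(ndf = ndf, bandwidth = bw, skewness = skew))
```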
6. First, df2 is varied and df1 = 3. The script is:
# Draw the first density, which sets up the axes
plot(density(rf(100,3,3)),col="red",main="F distribution",xlab="n = 100, df1 = 3",xlim=c(0,20),ylim=c(0,0.7))
# Overlay the remaining densities on the same axes with lines() instead of redrawing the plot with par(new=T)
lines(density(rf(100,3,6)),col="green")
lines(density(rf(100,3,12)),col="blue")
lines(density(rf(100,3,25)),col="black")
labels <- c("df2 = 3","df2 = 6", "df2 = 12", "df2 = 25")
colors <- c("red","green","blue","black")
legend("topright",title="Legend",legend=labels, lwd = 2, col=colors)
and the Figure below is obtained:
As df2 increases, the F distribution has a higher peak and less density in the tail. The peaks of the F distributions (df2 = 3, 6, 12, and 25) seem to stay at about the same position.
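Since each curve above comes from only 100 random draws, it may help to compare against the exact F density, which R provides as the function df(x, df1, df2). The sketch below finds the peak of each exact density numerically and sets it next to the known formula for the mode, (df1 - 2)/df1 * df2/(df2 + 2), valid for df1 > 2. The search interval c(0, 5) and the variable names are choices made for this illustration.

```r
df1 <- 3
df2 <- c(3, 6, 12, 25)

# For each df2, locate the maximum of the exact F density
peak <- t(sapply(df2, function(d2) {
  opt <- optimize(function(x) df(x, df1, d2), interval = c(0, 5), maximum = TRUE)
  c(df2 = d2,
    mode = opt$maximum,                          # numerical location of the peak
    height = opt$objective,                      # density at the peak
    mode_formula = (df1 - 2) / df1 * d2 / (d2 + 2))
}))

print(round(peak, 3))
```

The exact peak height grows with df2, while the peak position moves only slightly, from 0.2 toward 1/3.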
Second, df2 = 3, but df1 is varied. The script is:
# Draw the first density, which sets up the axes
plot(density(rf(100,3,3)),col="red",main="F distribution",xlab="n = 100, df2 = 3",xlim=c(0,20),ylim=c(0,0.5))
# Overlay the remaining densities on the same axes with lines() instead of redrawing the plot with par(new=T)
lines(density(rf(100,6,3)),col="green")
lines(density(rf(100,12,3)),col="blue")
lines(density(rf(100,25,3)),col="black")
labels <- c("df1 = 3","df1 = 6", "df1 = 12", "df1 = 25")
colors <- c("red","green","blue","black")
legend("topright",title="Legend",legend=labels, lwd = 2, col=colors)
and the Figure below is obtained:
The peaks of the F distributions (df1 = 3, 6, 12, and 25) seem to stay around the same position, varying only slightly from one another. The peaks of the df1 = 3 and df1 = 12 distributions have about the same height. The peaks of the df1 = 6 and df1 = 25 distributions also have about the same height, and they are lower than those of df1 = 3 and df1 = 12. The tails of the distributions with df1 = 6 and df1 = 25 are longer than those of df1 = 3 and df1 = 12. With only 100 draws per curve, some of these differences may reflect sampling variability rather than the distributions themselves.
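One way to judge how much of the difference in peak heights is due to chance is to repeat the simulation many times and look at the spread of the estimated peak. The sketch below is an illustration only: the helper kde_peak, the seed, the 200 repetitions, and the restriction of the density estimate to the plotted range 0 to 20 are all assumptions, not part of the original script.

```r
set.seed(6)  # assumption: arbitrary seed

# Hypothetical helper: peak height of the kernel density estimate of n F-distributed draws
kde_peak <- function(df1, df2, n = 100) {
  max(density(rf(n, df1, df2), from = 0, to = 20)$y)
}

df1_values <- c(3, 6, 12, 25)

# 200 repeated simulations for each df1, with df2 = 3 as in the script above
reps <- sapply(df1_values, function(d1) replicate(200, kde_peak(d1, 3)))
colnames(reps) <- paste0("df1=", df1_values)

# 5%, 50% and 95% quantiles of the estimated peak height for each df1
print(round(apply(reps, 2, quantile, probs = c(0.05, 0.5, 0.95)), 3))
```

If the ranges for different df1 overlap heavily, a single figure cannot reliably rank the peak heights.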
Third, fix df2 = 6, and vary df1. The Figure below is obtained:
The peaks of the df2=6 distributions with varying df1 are higher than those of the df2=3 distributions. All the peaks of the df2=6 distributions are about the same height. This differs from the df2=3 distributions, where the heights vary.
Fourth, vary df2, and fix df1 = 6. The Figure below is obtained:
The characteristics of the df1=6 distributions with varying df2 are the same as those of the df1=3 distributions. However, the peaks are higher in the df1=6 distributions than in the ones with df1 = 3.
7. We used hist(trees$Girth) to obtain the histogram on girth:
The histogram of girths shows a skewed distribution. Using summary(trees$Girth), we find that the mean girth is 13.25 and the median girth is 12.90. Since mean > median and median > mode (the mode is in bin 10 - 12), the distribution is positively skewed. This can also be seen directly from the histogram.
We used hist(trees$Height) to obtain the histogram on height:
The histogram of heights shows a short-tailed distribution. Using summary(trees$Height), we find that the mean height is 76 and the median height is 76. Also, we find that the mode lies in bin 75-80, which is the same bin in which the mean and median lie. This shows that the distribution is neither skewed nor long-tailed.
We used hist(trees$Volume) to obtain the histogram on volume:
The histogram of volumes shows a long-tailed distribution. Using summary(trees$Volume), we find that the mean volume is 30.17 and the median volume is 24.20. Also, the mode is in the first bin, i.e. bin 10-20. Since mean > median > mode, the distribution is positively skewed. More precisely, since the mode lies at the far left, this distribution should be characterized as a long-tailed distribution.
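The comparison of mean and median for the three variables can be done in one step. The sketch below also adds a simple moment-based skewness estimate as a numerical counterpart to reading skewness off the histograms. The helper skewness is defined here for illustration and is not a base R function.

```r
data(trees)

# Hypothetical helper: simple moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Mean, median and skewness for Girth, Height and Volume
stats <- sapply(trees, function(x) {
  c(mean = mean(x), median = median(x), skew = skewness(x))
})

print(round(stats, 2))
```

A positive skewness value goes together with mean > median, as found for girth and volume.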
8. prop.test(42,100,.65, conf.level=.95) gives the result of a 1-sample proportions test with continuity correction. R finds the proportion of the sample to be 0.42, and the p-value is 2.39e-06. Since the p-value is far below 0.05, the null probability of 0.65 is rejected in favor of the alternative hypothesis at the 5% significance level. The argument conf.level=.95 sets the level of the confidence interval that R reports.
When the third number is left out, i.e. 0.65 is removed and replaced with NULL, as in the command prop.test(42,100,NULL, conf.level=.95), the p-value changes to 0.1336. When the third number 0.65 is replaced with different values, going from small to large, the p-value increases until the value reaches the sample proportion 0.42 and then decreases. When we put prop.test(42,100,0.5, conf.level=.95), the p-value is 0.1336, which is the same as when the value is NULL, because for a single sample prop.test uses 0.5 as the null probability when none is given.
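The dependence of the p-value on the null probability can be tabulated directly. The sketch below uses a few illustrative null values (the particular choice of 0.30, 0.42, 0.50 and 0.65 is an assumption) and compares the NULL case with p = 0.5.

```r
# Illustrative null probabilities, including the sample proportion 0.42
p0 <- c(0.30, 0.42, 0.50, 0.65)

# p-value of the 1-sample proportions test for each null probability
pvals <- sapply(p0, function(p) {
  prop.test(42, 100, p = p, conf.level = 0.95)$p.value
})
print(data.frame(null = p0, p.value = signif(pvals, 4)))

# Leaving the null probability as NULL gives the same test as p = 0.5
p_null <- prop.test(42, 100, NULL, conf.level = 0.95)$p.value
print(p_null)
```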
9.
(1) Null hypothesis, N: Proportion of male = ½
(2) sample(75:99,1)
R outputs 90. So C = 90.
(3) sample(40:160,1)
R outputs 89. So B = 89.
(4) S = 200, B = 89, sample proportion = 89/200 = 0.445
(5) p = 0.445, q = 1 - 0.445 = 0.555, S = 200, standard error = sqrt(p*q/S) = sqrt(0.445*0.555/200) ≈ 0.0351
Margin of error = qnorm(0.95)*0.0351
Using R, margin of error ≈0.0577
Left value of confidence interval = 0.445 – 0.0577 ≈ 0.387
Right value of confidence interval = 0.445 + 0.0577 ≈ 0.503
Confidence interval is (0.387, 0.503).
(6) The confidence level was C = 90%, which leaves 10% outside the interval, i.e. 5% in each tail. So we should use (90 + 10/2)/100 = 0.95 for the probability from -infinity to the upper critical value.
So the function to use in R is qnorm(0.95)
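Steps (4) to (6) can be collected into a few lines of R. This is a sketch using the values drawn above (S = 200, B = 89, C = 90); the variable names p_hat, se, z, moe and ci are chosen for this illustration. It keeps full precision in the standard error, so intermediate digits can differ slightly from the hand calculation with rounded values.

```r
S <- 200   # sample size
B <- 89    # number of males in the sample
C <- 90    # confidence level in percent

p_hat <- B / S                          # sample proportion
se <- sqrt(p_hat * (1 - p_hat) / S)     # standard error of the proportion

# Probability from -infinity up to the upper critical value
z <- qnorm((C + (100 - C) / 2) / 100)

moe <- z * se                           # margin of error
ci <- p_hat + c(-1, 1) * moe            # confidence interval

print(round(c(se = se, z = z, moe = moe), 4))
print(round(ci, 3))
```

Rounded to three decimals, the interval agrees with (0.387, 0.503).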