Part 1: Setting up and R code.
Lines beginning with // are commentary for your convenience. They are not part of the R code, so do not enter or submit them.
Input the following R code:
> d <- file.choose();
// then choose LDS.csv file
> LDC <- read.csv(d, header = TRUE)
> N<-301049
> Deletions <- (c(N, N+35, N+70, N+105, N+140)%%72)
> LDC[Deletions,];
> myLDC <- LDC[-Deletions,]
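// As a quick check of the arithmetic above, the sketch below recomputes the deletion indices and applies the same negative indexing to a stand-in data frame. The names `toy` and `kept` and the 72 placeholder rows are assumptions for illustration only; they are not the LDC data.

```r
# Recompute the deletion indices from N, as in the session above
N <- 301049
Deletions <- c(N, N + 35, N + 70, N + 105, N + 140) %% 72
print(sort(Deletions))

# Negative indexing only drops the intended rows if every index is a
# distinct value between 1 and the number of rows
stopifnot(all(Deletions >= 1), all(Deletions <= 72), !anyDuplicated(Deletions))

# Hypothetical stand-in for the 72-row data set (placeholder values only)
toy <- data.frame(id = 1:72)
kept <- toy[-Deletions, , drop = FALSE]
print(nrow(kept))
```

// The row labels 13, 15, 17, 50 and 52 are the ones missing from the listing of myLDC, which leaves 67 rows.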
// now type the following to confirm that the selected rows have been removed from the data set:
> myLDC
// It will display the following:
X Customers OMandA Dist.Cost Total.Bill Area Size
1 Algoma 11581 839 56.15 132.47 N S
2 Atikokan 1661 564 55.61 135.34 N S
3 Brant 9741 490 36.34 114.13 S S
4 CW 6496 299 33.35 110.82 S S
5 Chapleau 1293 416 34.16 113.20 N S
6 Coop Embrun 1954 274 33.90 112.26 S S
7 ELK 11276 214 36.00 115.98 S S
8 Espanola 3299 326 40.88 119.99 N S
9 Fort Frances 3775 345 22.98 98.32 N S
10 Grimsby 10307 202 36.82 114.82 S S
11 Hearst 2817 308 25.52 103.07 N S
12 Hydro 2000 1208 264 35.16 114.85 S S
14 Kenora 5572 359 36.94 114.45 N S
16 Lakeland 9598 293 41.16 119.56 N S
18 NOTL 8000 238 35.13 112.70 S S
19 NOW 6059 353 34.23 111.70 N S
20 Orangeville 11248 263 34.43 112.03 S S
21 Ottawa River 10555 253 12.73 87.30 N S
22 Parry Sound 3441 383 41.23 120.77 N S
23 Renfrew 4183 269 27.07 106.29 S S
24 Rideau SL 4185 275 37.60 117.46 S S
25 Sioux Lookout 2755 425 45.00 123.79 N S
26 Tillsonburg 6745 330 32.07 109.34 S S
27 Wasaga 12324 180 30.96 110.59 S S
28 Wellington North 3626 432 38.07 117.37 S S
29 Blue Water 35772 309 40.98 117.81 S M
30 Brantford 37964 176 28.79 106.07 S M
31 Burlington 64329 225 34.33 111.50 S M
32 Cambridge 51584 209 33.94 110.30 S M
33 Chatham kent 32132 209 26.73 106.26 S M
34 Collus 15723 259 33.48 110.80 S M
35 CNP 15708 279 39.68 111.15 S M
36 Enwin 85083 268 41.14 118.12 S M
37 ErieThames 18090 315 40.56 118.16 S M
38 Essex 28094 197 36.91 115.43 S M
39 Festival 19885 200 38.25 114.75 S M
40 Greater Sudbury 46748 280 33.90 111.91 N M
41 Guelph 50859 251 34.71 110.54 S M
42 Haldimand 21070 346 46.73 125.78 S M
43 Halton 21232 227 32.40 110.92 S M
44 Innisfil 14826 281 43.59 123.09 S M
45 Kingston 26844 224 34.40 111.16 S M
46 KitchenerWilmot 87964 155 29.60 106.19 S M
47 Milton 30485 210 35.76 112.63 S M
48 Newmarket 33338 198 37.98 115.17 S M
49 NPEI 51162 275 36.75 114.98 S M
51 North Bay 23850 224 36.29 113.97 N M
53 Orillia 13035 345 31.57 108.13 S M
54 Oshawa 53083 191 26.63 103.97 S M
55 Peterborough 35270 199 30.75 108.24 S M
56 Sault Ste Marie 32998 260 26.65 100.16 N M
57 St Thomas 16436 225 28.85 105.64 S M
58 Thunder Bay 49765 238 26.29 103.75 N M
59 Waterloo North 53611 182 35.30 112.47 S M
60 Welland 21768 242 37.77 115.81 S M
61 Westario 22257 207 28.91 108.70 S M
62 Whitby 40337 214 38.19 115.69 S M
63 Woodstock 15181 251 41.05 118.40 S M
64 Enersource 195381 238 30.88 107.75 S L
65 Horizon 235327 175 36.72 113.90 S L
66 Hydro One Brampton 137856 148 30.67 107.46 S L
67 Ottawa 305266 191 34.67 111.44 S L
68 London 148331 209 34.83 112.03 S L
69 Powerstream 332993 184 32.21 108.65 S L
70 Veridian 113709 181 31.38 108.81 S L
71 Hydro One Networks 1210695 454 51.42 131.34 N XL
72 Toronto 709323 328 39.59 116.73 S XL
> attach(myLDC)
// N mod 72 is equal to 17 (odd), hence we regress Total.Bill on Customers, the square root of Customers, and OMandA.
> lm( Total.Bill ~Customers + I(sqrt(Customers)) + OMandA)
Call:
lm(formula = Total.Bill ~ Customers + I(sqrt(Customers)) + OMandA)
Coefficients:
(Intercept) Customers I(sqrt(Customers))
1.020e+02 9.669e-06 -1.635e-03
OMandA
3.786e-02
> summary(lm( Total.Bill ~Customers + I(sqrt(Customers)) + OMandA))
Call:
lm(formula = Total.Bill ~ Customers + I(sqrt(Customers)) + OMandA)
Residuals:
Min 1Q Median 3Q Max
-24.255 -2.659 0.118 4.263 11.995
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.020e+02 3.664e+00 27.852 < 2e-16 ***
Customers 9.669e-06 1.435e-05 0.674 0.503
I(sqrt(Customers)) -1.635e-03 1.394e-02 -0.117 0.907
OMandA 3.786e-02 8.391e-03 4.512 2.86e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.174 on 63 degrees of freedom
Multiple R-squared: 0.3558, Adjusted R-squared: 0.3251
F-statistic: 11.6 on 3 and 63 DF, p-value: 3.789e-06
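// The t values and p-values in the coefficient table can be reproduced from the printed estimates and standard errors. The sketch below uses the rounded figures from the output above, so it agrees with the table only up to rounding; the variable names are chosen for illustration.

```r
# Estimates and standard errors copied from the summary output (rounded)
est <- c(Customers = 9.669e-06, sqrtCustomers = -1.635e-03, OMandA = 3.786e-02)
se  <- c(Customers = 1.435e-05, sqrtCustomers = 1.394e-02, OMandA = 8.391e-03)
df_resid <- 63  # residual degrees of freedom reported by summary()

# t statistic = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

print(round(t_val, 3))
print(signif(p_val, 3))
```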
Part 2: Writing a short report.
The economic purpose of this assignment is to determine which factors significantly affect the total bill. The data contain the total bill, the number of customers, and other characteristics for a sample of electricity distribution companies. We fit a multiple regression to predict Total.Bill from the number of customers, the square root of that number, and OMandA. The regression model has the following form:
Total.Bill = β0 + β1·Customers + β2·sqrt(Customers) + β3·OMandA
The null hypothesis is:
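H0: β1=β2=β3=0
The alternative hypothesis is that at least one of β1, β2, β3 is not equal to 0. (The overall F test concerns the three slope coefficients only, not the intercept β0.)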
The estimated regression equation is:
Total.Bill = 102.0 + 0.000009669·Customers - 0.001635·sqrt(Customers) + 0.03786·OMandA
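To illustrate how the estimated equation is used, the sketch below evaluates it for one company from the listing in Part 1 (Brant: 9741 customers, OMandA of 490, observed Total.Bill of 114.13). The function name `predict_bill` is hypothetical, and the coefficients are the rounded values printed by R, so the result is approximate.

```r
# Fitted equation with the rounded coefficients from the lm() output
predict_bill <- function(customers, omanda) {
  102.0 + 9.669e-06 * customers - 1.635e-03 * sqrt(customers) + 3.786e-02 * omanda
}

# Brant: values taken from the myLDC listing
fitted_brant <- predict_bill(customers = 9741, omanda = 490)

# Residual = observed minus fitted
residual_brant <- 114.13 - fitted_brant

print(c(fitted = fitted_brant, residual = residual_brant))
```

With the fitted model object itself, predict() applied to new data performs the same calculation using the unrounded coefficients.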
The coefficients are jointly significant (F = 11.6 on 3 and 63 degrees of freedom, p < 0.001). However, not all coefficients are individually significant: Customers and the square root of Customers are not significant predictors of Total.Bill (p = 0.503 and p = 0.907, respectively), whereas OMandA is significant (p < 0.001). The model explains approximately 35.58% of the variation in Total.Bill, so a large part of the variation is left unexplained.
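The F statistic reported above and the adjusted R-squared from the summary output both follow from R-squared and the degrees of freedom. A minimal sketch, using the rounded R-squared printed by R and variable names chosen for illustration:

```r
r2 <- 0.3558             # Multiple R-squared from the summary output
k <- 3                   # number of predictors, excluding the intercept
df_resid <- 63           # residual degrees of freedom
n <- df_resid + k + 1    # number of observations used in the fit

# Overall F statistic and its p-value
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
f_pval <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(c(n = n, F = f_stat, p = f_pval, adj_r2 = adj_r2))
```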
In conclusion, I suggest excluding Customers and the square root of Customers from the model and looking for other factors that may affect Total.Bill.