The Problem
For this case study, we focus on an application of regression analysis in engineering. We use data from a Combined Cycle Power Plant (CCPP) to build a model for predicting average electrical output, y. We then examine the residuals, the deviations between the predicted and the actual electrical output levels, to detect an independent variable omitted from the regression model.
The Data – Source: UCI Machine Learning Repository
A combined cycle power plant (CCPP) is composed of steam turbines (ST), gas turbines (GT), and heat recovery steam generators (HRSG). In a CCPP, electricity is generated by gas and steam turbines combined in one cycle, and energy is transferred from one turbine to the other. The exhaust vacuum is collected from, and affects, the steam turbine, while the other three ambient variables affect gas turbine performance.
Attributes
1. Relative humidity (x1, percentage)
2. Ambient Pressure (x2, millibar)
3. Exhaust vacuum (x3, cm Hg)
4. Ambient Temperature (x4, degree centigrade)
A Model for Average Electrical Output
As an initial attempt at explaining the average electrical output of the CCPP, I considered the following first-order model:
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4
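A minimal sketch of how the least-squares estimates for this first-order model could be computed outside Excel, assuming NumPy and a predictor array whose columns are x1 to x4 in the order listed above. The function name `fit_first_order` and the synthetic data are illustrative only; the estimates used in this case study are the ones in the Excel printout.

```python
import numpy as np

def fit_first_order(X, y):
    """Least-squares estimates (b0, b1, ..., bk) for E(y) = b0 + b1*x1 + ... + bk*xk.

    X is an (n, k) array whose columns are the predictors x1..xk
    (assumed here: relative humidity, ambient pressure, exhaust vacuum,
    ambient temperature); y is the length-n response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Usage with synthetic (made-up) data, not the CCPP measurements
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 4))
true_beta = np.array([5.0, -0.3, -0.2, 0.1, -2.0])
y_demo = true_beta[0] + X_demo @ true_beta[1:]  # noise-free, so OLS recovers true_beta
print(np.round(fit_first_order(X_demo, y_demo), 3))
```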
Figure 1: Excel regression printout for Table 1 (Elec_Output vs Exhst_vacc, Amb_Press, Rel_Humdty)
The fitted regression equation is ŷ = 690.98 - 0.26x1 - 0.17x2 + 0.04x3 - 2.47x4, where x1 is relative humidity (%), x2 is ambient pressure, x3 is exhaust vacuum, and x4 is ambient temperature (°C).
Global F = 112.52 (reported p-value = .000, i.e., p < .0005): At any conventional significance level (e.g., α = .05 or α = .01), we reject the null hypothesis H0: β1 = β2 = β3 = β4 = 0. Thus, there is sufficient evidence to indicate that the model is "statistically" useful for predicting average electrical output, y.
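The global F statistic can be recovered from R², the sample size n, and the number of predictors k (k = 4 for Model 1). A small sketch, assuming these three inputs are known; the function name is illustrative and the numbers in the example are made up, not taken from Figure 1.

```python
def global_f(r2: float, n: int, k: int) -> float:
    """Global F statistic for H0: beta_1 = ... = beta_k = 0.

    F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))
    r2: unadjusted R^2, n: number of observations, k: number of predictors.
    """
    if n <= k + 1:
        raise ValueError("need n > k + 1 observations")
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

# Illustrative inputs only: (0.5 / 2) / (0.5 / 20) = 10
print(global_f(0.5, 23, 2))
```

The resulting F is compared with an F distribution with k numerator and n - (k + 1) denominator degrees of freedom.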
Ra² = .93: After accounting for sample size and the number of β parameters in the model, approximately 93% of the sample variation in average electrical output is explained by the first-order model in relative humidity (x1), ambient pressure (x2), exhaust vacuum (x3), and ambient temperature (x4).
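Adjusted R² penalizes R² for the number of β parameters, which is why it is the quantity quoted here. A sketch of the standard formula; the function name and the example inputs are illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Ra^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1)), for k predictors and n observations."""
    if n <= k + 1:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1))

# Illustrative inputs only: 1 - 0.5 * 22 / 20 = 0.45
print(round(adjusted_r2(0.5, 23, 2), 4))
```

Because the penalty grows with k, Ra² can fall when a term that adds little is included, whereas R² can only stay the same or rise.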
β1 = -0.26: Holding ambient pressure (x2), exhaust vacuum (x3), and ambient temperature (x4) constant, we estimate the average electrical output (y) of a CCPP to decrease by 0.26 MW for every 1-percentage-point increase in relative humidity (x1).
β2 = -0.17: Holding relative humidity (x1), exhaust vacuum (x3), and ambient temperature (x4) constant, we estimate the average electrical output (y) of a CCPP to decrease by 0.17 MW for every one-unit increase in ambient pressure (x2).
β3 = +0.04: Holding relative humidity (x1), ambient pressure (x2), and ambient temperature (x4) constant, we estimate the average electrical output (y) of a CCPP to increase by 0.04 MW for every one-unit increase in exhaust vacuum (x3).
β4 = -2.47: Holding relative humidity (x1), ambient pressure (x2), and exhaust vacuum (x3) constant, we estimate the average electrical output (y) of a CCPP to decrease by 2.47 MW for every 1 °C increase in ambient temperature (x4).
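The "holding the other variables constant" interpretations can be checked numerically with the fitted equation: raising one predictor by one unit while the others stay fixed changes ŷ by exactly that predictor's coefficient. A sketch using the coefficients reported above; the operating point is hypothetical and the function name is illustrative.

```python
# Coefficients as reported in the Model 1 regression equation (Figure 1)
B0, B1, B2, B3, B4 = 690.98, -0.26, -0.17, 0.04, -2.47

def predict_model1(x1, x2, x3, x4):
    """Predicted electrical output y-hat from the fitted first-order model."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x3 + B4 * x4

# Hypothetical operating point (illustrative values, not from the data set)
base = dict(x1=70.0, x2=1010.0, x3=50.0, x4=20.0)
bumped = dict(base, x4=base["x4"] + 1.0)  # +1 degree C, other predictors held fixed
change = predict_model1(**bumped) - predict_model1(**base)
print(round(change, 2))  # equals B4 up to floating-point rounding
```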
A Residual Analysis of the Model
The residuals of Model 1 are analyzed using the following graphs. Both a histogram and a normal probability plot of the standardized residuals are displayed. From both graphs, the regression assumption of normally distributed errors appears to be reasonable.
Histogram and normal P-P plot of regression standardized residuals (dependent variable: electrical output)
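A sketch of the quantities behind the histogram and the normal P-P plot, assuming NumPy and Python's `statistics.NormalDist`. Standardized residuals are taken here as residuals divided by the regression standard error s, and the plotting positions (i - 0.5)/n are one common convention that may differ from what SPSS uses; the helper names and toy numbers are illustrative.

```python
import numpy as np
from statistics import NormalDist

def standardized_residuals(y, y_hat, k):
    """Residuals divided by s = sqrt(SSE / (n - (k + 1))), the regression standard error."""
    resid = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    n = resid.size
    s = np.sqrt(np.sum(resid**2) / (n - (k + 1)))
    return resid / s

def normal_pp_points(z):
    """(expected, observed) cumulative probabilities for a normal P-P plot.

    expected: standard normal CDF of each sorted standardized residual;
    observed: empirical plotting positions (i - 0.5) / n.
    """
    z_sorted = np.sort(np.asarray(z, dtype=float))
    n = z_sorted.size
    expected = np.array([NormalDist().cdf(v) for v in z_sorted])
    observed = (np.arange(1, n + 1) - 0.5) / n
    return expected, observed

# Toy example (made-up numbers): residuals of +/-0.5 with k = 1 predictor
z_demo = standardized_residuals([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], k=1)
expected, observed = normal_pp_points(z_demo)
```

Plotting `observed` against `expected` should give points close to the 45-degree line when the errors are approximately normal.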
The SPSS printouts shown in Figure 2 are plots of the residuals against electrical output, y, and against each of the independent variables. Because the plots show almost no outliers, it appears that the independent variables need not be transformed to improve the fit of the model or to stabilize the error variance. The residual plots seem to imply that no adjustments to the model are required.
However, I noticed that when ambient temperature was below 20 °C, the residual was positive in the majority of cases.
Figure 2: SPSS residual plots for Model 1
Thus it appeared to me that ambient temperature affects electrical output in a way that the straight-line term in x4 does not fully capture, and that I could improve the fit of the model by adding a variable that represents this temperature effect.
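The pattern noticed below 20 °C can be put into a number by tallying the signs of the residuals in that temperature range. A minimal sketch, assuming arrays of ambient temperatures and Model 1 residuals; `share_positive_below` is a hypothetical helper and the example values are made up.

```python
import numpy as np

def share_positive_below(temp, resid, cutoff=20.0):
    """Fraction of residuals that are positive among observations with temp < cutoff."""
    temp = np.asarray(temp, dtype=float)
    resid = np.asarray(resid, dtype=float)
    mask = temp < cutoff
    if not mask.any():
        return float("nan")  # no observations below the cutoff
    return float(np.mean(resid[mask] > 0))

# Toy values, not the CCPP residuals: 2 of the 3 residuals below 20 degrees C are positive
print(share_positive_below([12, 15, 18, 25, 30], [1.2, -0.4, 0.8, -1.0, 0.3]))
```

A share well above one half indicates that the model tends to underpredict output (actual above predicted) at low temperatures.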
Adjustments to the Model
Let x5 be a dummy variable for the temperature effect: x5 = 1 if temperature is high and x5 = 0 if temperature is low.
The model with the temperature effect takes the following form:
Model 2
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
Model 2, like Model 1, allows for straight-line relationships between electrical output and relative humidity (x1), ambient pressure (x2), exhaust vacuum (x3), and ambient temperature (x4). The y-intercepts of these lines, however, depend on the temperature effect (i.e., whether temperature is high or low): the intercept is β0 when x5 = 0 and β0 + β5 when x5 = 1.
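The dummy variable and the resulting intercept shift can be sketched as follows, assuming NumPy. The text does not state where "high" temperature begins, so the 20 °C cutoff below is a hypothetical choice borrowed from the residual observation in the previous section, and all function names are illustrative.

```python
import numpy as np

HIGH_TEMP_CUTOFF = 20.0  # hypothetical cutoff in degrees C; the text does not define "high"

def temperature_dummy(temp, cutoff=HIGH_TEMP_CUTOFF):
    """x5 = 1 if temperature is high (>= cutoff), else 0."""
    return (np.asarray(temp, dtype=float) >= cutoff).astype(float)

def model2_design(x1, x2, x3, x4):
    """Columns 1, x1, x2, x3, x4, x5 for E(y) = b0 + b1 x1 + ... + b4 x4 + b5 x5."""
    x5 = temperature_dummy(x4)
    return np.column_stack([np.ones(len(x4)), x1, x2, x3, x4, x5])

def model2_intercepts(beta):
    """Intercept for low temperature (x5 = 0) and for high temperature (x5 = 1)."""
    b0, b5 = beta[0], beta[5]
    return b0, b0 + b5
```

The slopes on x1 to x4 are shared by both temperature groups; only the intercept moves, by β5.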
Figure 4: Excel printout for Model 2
The Excel printout for Model 2 is shown in Figure 4. Note that the adjusted R² for Model 2 is .93, the same as for Model 1. This suggests that the high/low temperature indicator adds little explanatory power beyond ambient temperature (x4), which is already in the model, and that Model 1 predicts the output adequately. To check this further, we fit Model 3.
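Equal adjusted R² values are an informal signal. A formal alternative is the nested-model (partial) F-test, which compares the SSE of the reduced model (Model 1) with that of the complete model (Model 2 or Model 3). A sketch, assuming the two SSE values are read from the ANOVA sections of the printouts; the function name and the numbers in the example are illustrative.

```python
def nested_f(sse_reduced, sse_complete, n, k_complete, g_reduced):
    """Partial F statistic for dropping (k - g) terms from the complete model.

    F = ((SSE_R - SSE_C) / (k - g)) / (SSE_C / (n - (k + 1)))
    k_complete: number of beta parameters (excluding b0) in the complete model,
    g_reduced:  number of beta parameters (excluding b0) in the reduced model.
    """
    df_num = k_complete - g_reduced
    df_den = n - (k_complete + 1)
    if df_num <= 0 or df_den <= 0:
        raise ValueError("invalid degrees of freedom")
    return ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)

# Illustrative SSE values only: Model 1 (g = 4) nested in Model 3 (k = 9), n = 40
print(round(nested_f(120.0, 100.0, 40, 9, 4), 4))  # (20 / 5) / (100 / 30) = 1.2
```

When only one term is added (Model 2 versus Model 1), this F statistic equals the square of the t statistic for β5.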
Model 3
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x1x5 + β7x2x5 + β8x3x5 + β9x4x5
Note that Model 3 includes interactions between the temperature effect (x5) and each of the quantitative independent variables. This model allows the slopes of the lines relating y to x1, y to x2, y to x3, and y to x4 to differ between high and low temperatures. If x5 and these interaction terms add little to the fit, the simpler Model 1 is adequate. The Excel printout is given in Figure 5.
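A sketch of a Model 3 design matrix, assuming NumPy: each interaction column is the product of a quantitative predictor with the dummy x5. In the column ordering used here, the last column is x4·x5, so the slope on x4 is β4 when x5 = 0 and β4 + β9 when x5 = 1. The helper names and the synthetic data are illustrative.

```python
import numpy as np

def model3_design(x1, x2, x3, x4, x5):
    """Columns 1, x1..x5, then x1*x5, x2*x5, x3*x5, x4*x5 (interactions with the dummy)."""
    x1, x2, x3, x4, x5 = (np.asarray(v, dtype=float) for v in (x1, x2, x3, x4, x5))
    return np.column_stack([np.ones(x1.size), x1, x2, x3, x4, x5,
                            x1 * x5, x2 * x5, x3 * x5, x4 * x5])

def slope_on_x4(beta, x5):
    """Slope of E(y) against x4 in this ordering: b4 when x5 = 0, b4 + b9 when x5 = 1."""
    return beta[4] + beta[9] * x5

def r_squared(design, y):
    """R^2 of an OLS fit of y on the given design matrix."""
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Usage on synthetic data (made up for illustration, not the CCPP data)
rng = np.random.default_rng(2)
x1, x2, x3, x4 = rng.uniform(0, 10, size=(4, 60))
x5 = (x4 >= 5).astype(float)  # hypothetical cutoff for the demo
y = 3 + x1 - x4 + rng.normal(size=60)
D3 = model3_design(x1, x2, x3, x4, x5)
print(r_squared(D3[:, :5], y), r_squared(D3, y))  # Model 1 columns vs. all Model 3 columns
```

Because Model 1 is nested in Model 3, the R² of Model 3 can never be smaller than that of Model 1; an unchanged R² is therefore consistent with the added terms contributing little, and a nested F-test quantifies this.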
Figure 5: Excel output for Model 3
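The R² value remains unchanged at 0.93, so adding the temperature dummy and its four interaction terms does not noticeably improve the fit. This indicates that Model 1 can successfully predict the electrical output. Thus I have demonstrated my understanding of regression modeling.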
Works Cited
Tüfekci, Pinar. "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods." International Journal of Electrical Power & Energy Systems (2014): 126-140.
UCI. Combined Cycle Power Plant Data Set. 2014. 30 April 2016 <http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant>.