Energy and Buildings 43 (2011) 446–453
Energy and Buildings journal homepage: www.elsevier.com/locate/enbuild
Principal component analysis of the electricity consumption in residential dwellings
Demba Ndiaye a,∗, Kamiel Gabriel b
a Setty & Associates, Fairfax, Virginia, U.S.A.
b Faculty of Engineering and Applied Science, University of Ontario Institute of Technology, Oshawa, Canada
Article info
Article history: Received 15 April 2009; received in revised form 13 August 2010; accepted 3 October 2010
Keywords: Principal component analysis; Latent root regression; Subset selection; Electricity consumption; Residential; Socio-economic factors
Abstract
Data gathered from energy audits, phone surveys and smart meter readings are used to derive regression models of the electricity consumption of housing units in Oshawa (Ontario, Canada). The database used comprises 59 predictors, for 62 observations. To address the problem of multi-collinearities among the predictors and at the same time reduce the number of needed predictors, a methodology is developed based on the latent root regression technique of Hawkins [5]. Contrary to other variable selection techniques such as the stepwise method, the technique used in this paper allows an easy identification of alternative subsets. Using this technique, a reduction of 85% in the number of predictors is obtained, leaving only nine of them in the final subset. These nine variables are the number of occupants, the house status (owned or rented), the number of weeks of vacation per year, the type of fuel used in the pool heater, the type of fuel used in the heating system, the type of fuel used in the domestic hot water heater, the existence or not of an air conditioning system, the type of air conditioning system, and the number of air changes per hour at 50 Pa. A regression with these nine predictors leads to an R² of 0.79, with an adjusted R² of 0.75 and all regression coefficients statistically significant at the 95% confidence level. © 2010 Elsevier B.V. All rights reserved.
Abbreviations: ACSyst, presence or not of an air conditioning system; ACType, type of air conditioning system; AgChild1, age of the eldest among children still living in the house; AgeRange, age range of the head of household; CDM, Conservation and Demand Management; CeilArea, ceiling area; CompSoft, interest in using eventual computer software that could help save energy; DHWFuel, type of fuel for the domestic hot water heater; EastOVH, presence of window overhangs at the east side of the house; ElecKFt2, annual electricity consumption per square foot of floor area; HeatType, type of fuel for the heating system; HomState, house status (owned or rented); HSysAge, age of the heating system; HSysEffi, heating system efficiency; HSysType, heating system type; Incand, number of incandescent light bulbs used outside; LearnMor, interest in learning more about ways to save energy in the house; LIMIT, parameter; NbACH, number of air changes per hour at 50 Pa; NbNewApp, number of new major energy efficient appliances purchased recently; NbOccup, number of household occupants; NbWkVaca, number of weeks of vacation taken away from the house each year; NorthOVH, presence of window overhangs at the north side of the house; OCE, Ontario Centre of Excellence for Energy, Canada; OPUC, Oshawa Power and Utilities Corporation, Oshawa, Canada; ParTime, number of occupants working part-time; PC, principal component; PCA, principal component analysis; PHeatrFl, type of fuel for the pool heater; RecUpgd, upgrades or renovations in the house over the last ten years; SouthOVH, presence of window overhangs at the south side of the house; TWdArea, total window area; UOIT, University of Ontario Institute of Technology, Canada; WestOVH, presence of window overhangs at the west side of the house; WlgSpend, amount willing to spend on an energy device that would help save energy; WlUvalue, effective U-value of the walls.
∗ Corresponding author. Tel.: +1 703 691 2115; fax: +1 703 691 8084.
E-mail address: [email protected] (D. Ndiaye).
0378-7788/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.enbuild.2010.10.008

1. Introduction
Regression models of the electricity consumption of residential dwellings are useful tools that allow local distribution companies to better forecast energy trends and develop Conservation and Demand Management (CDM) projects. These simplified models can be based on building characteristics (such as envelope, appliances, type of heating and cooling systems, and lighting load), climatic conditions, occupants’ behaviour, etc. Data used in these regression models often come from field collections and phone surveys. The nature of the collected data can make direct regression on the data impossible or undesirable. That happens, for instance, when multi-collinearities are present in the data set or when there is a very large number of predictors (independent variables). Multi-collinearity means that one or more predictors are essentially linear combinations of other predictors [1]. Multi-collinearities among the predictors degrade the quality and the stability of the regression model. The data used in the present study have both issues: multi-collinearity among the predictors, and a very large number of predictors (59). Principal component analysis (PCA) was investigated to help address these two issues. One of the strengths of PCA is its inherent ability to identify multi-collinearities [2]. PCA is a data mining technique that seeks to reduce the
dimensionality of a data set by linearly transforming the original set of variables into a smaller set of uncorrelated variables (called principal components) which keeps most of the variance in the original set [1]. The principal components (PC) are linear combinations of the original variables:

$$\mathrm{PC}_i = \sum_j a_{ij} x_j \tag{1}$$
where PCi is the ith principal component, and aij is the coefficient in the principal component PCi of the original variable xj. Principal components can be used in the regression in place of the original variables, leading to what is called principal component regression [3]. However, if all the PCs are used in the regression (the total number of PCs equals the total number of original variables), the multi-collinearity problem is still present. Various strategies for deleting the PCs that contribute least to the regression have been proposed [3]. One such strategy is the technique called latent root regression [4]. The basic idea of latent root regression is to include the dependent variable in the principal component analysis. This technique is an extension of principal component analysis, since PCA “as is” treats only the predictors.

Since the principal components are linear combinations of the original variables, using the PCs directly keeps all the variables in the regression. Future studies would therefore need to collect all of them. If the number of variables is large, it may be desirable to reduce them to the subset with the most explanatory power. Various subset selection methods exist [2,3,5], for example: direct methods using all possible (2^p − 1) subsets (where p is the total number of predictors), forward selection, backward elimination, and the popular stepwise method. The principal components may be used to select a subset using a variation of latent root regression proposed by Hawkins [5]. This simple and convenient method, advocated by Jeffers [6], has the advantage over the other methods of allowing easy identification of alternative subsets. Variables that are, for example, difficult to measure may be replaced by other variables.
With the method of Hawkins, the principal components reveal aspects of the data that are not immediately apparent from the correlation matrix, both in the interrelationships between the dependent variable and the predictors and in those among the predictors (multi-collinearities).

In this paper, a methodology based on the latent root regression technique of Hawkins is developed to find alternative subsets suitable for use in developing regression models of the electricity consumption of residential dwellings in Oshawa. It is worth mentioning that other forms of energy, like natural gas, are not considered in this study.

The second section of this paper presents the nature of the data collected and used in the analysis, along with the residential sector profile of Oshawa. The way the data were organized to arrive at the final database is also described. The third section presents the methodology developed in conjunction with the latent root regression technique of Hawkins. Finally, the fourth section shows the results of the analysis, along with discussions of their implications and of the methodology.
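The ideas of Eq. (1) and of including the dependent variable in the analysis can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data (not the data of this study); the variable names, sample size and noise levels are illustrative choices only. The predictor x3 is built as a near-exact linear combination of x1 and x2, so a very small latent root (eigenvalue) of the correlation matrix is expected to flag the multi-collinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (illustrative only): x3 is almost x1 + x2, i.e. a multi-collinearity.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
# Synthetic dependent variable that depends on x1 and x2.
y = 2.0 * x1 - x2 + 0.1 * rng.normal(size=n)


def principal_components(data):
    """Return standardized data, latent roots (descending) and the coefficient matrix.

    Column i of the coefficient matrix holds the a_ij of Eq. (1) for PC_i.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    roots, vectors = np.linalg.eigh(corr)  # eigh returns eigenvalues in ascending order
    order = np.argsort(roots)[::-1]
    return z, roots[order], vectors[:, order]


# Ordinary PCA: predictors only.
z, roots, coeffs = principal_components(np.column_stack([x1, x2, x3]))
scores = z @ coeffs                       # PC_i = sum_j a_ij * x_j (standardized x_j)
score_cov = np.cov(scores, rowvar=False)  # off-diagonal terms should vanish

# Latent-root-style PCA: the dependent variable is included as the first column.
_, lr_roots, lr_coeffs = principal_components(np.column_stack([y, x1, x2, x3]))
y_coeff_smallest = abs(lr_coeffs[0, -1])  # y coefficient in the smallest-root component
y_coeff_second = abs(lr_coeffs[0, -2])    # y coefficient in the second-smallest one

print("latent roots (predictors only):", np.round(roots, 5))
print("latent roots (with y):", np.round(lr_roots, 5))
print("|y coefficient| in the two smallest components:", y_coeff_smallest, y_coeff_second)
```

In this sketch, the component with the smallest latent root should carry a near-zero coefficient for y, which suggests that the near-singularity x3 ≈ x1 + x2 is unrelated to the dependent variable, while the next component, which reflects the relation between y and the predictors, should carry a large y coefficient.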
2. Data collection and organization

Data are from Phase II of the project “Investigating Energy Consumption Trends in Oshawa’s Residential Dwellings”. This project is conducted by the University of Ontario Institute of Technology (UOIT, Oshawa, Ontario, Canada), and co-sponsored by Oshawa PUC Networks Inc. (OPUC), the local electricity distribution company, and the Ontario Centre of Excellence for Energy. Phase II of the project involves the monitoring of the electricity consumption of about 270 houses (selected by OPUC) in the Oshawa area using smart meters. Phase I involved the monitoring of 50 homes and was conducted by OPUC as a pilot project [7]. Available data for this project are from three sources: (i) smart meter measurements, (ii) survey data, and (iii) energy audit data.

2.1. Oshawa residential sector profile

Oshawa (Ontario, Canada) is the largest city in the Durham region (east side of the Greater Toronto Area). The city (latitude: 43°54′ N, longitude: 78°50′ W, altitude: 84 m) has the following climatic characteristics [8]:

• number of heating degree days (base 18 °C): 3918;
• number of cooling degree days (base 18 °C): 196;
• daily minimum temperature in January: −9.2 °C;
• daily maximum temperature in January: −1.4 °C;
• daily maximum temperature in July: 25.0 °C;
• daily minimum temperature in July: 15.5 °C.

Oshawa has a population of over 150,000 people with about 50,000 residential dwellings. Among the dwellings, 70% are single detached homes, and 16% are semi-detached homes [9]. Typical envelope characteristics of residential buildings in this area are [10]: flat ceiling with attic above and gable roof, or sloped ceiling with cathedral roof, wood frame wall at RSI-2, solid wood doors, wood framed double glazed windows with overhang, and 5 air changes per hour at 50 Pa.

In 2004, the electrical energy consumption of the residential sector totalled about 500 GWh [9]. Space heating and cooling accounted for one-third of this total (see Fig. 1). In 2004, residential electrical energy use represented 40% of the total electricity use in Oshawa [9]. Major electric appliances used in the residential sector are: washing machine, clothes dryer, dishwasher, refrigerators (up to two in some cases), freezer, and range. Other electric equipment used in houses includes: televisions (two in many cases), microwave oven, computer, vacuum cleaner, toaster oven, coffee maker, clothes iron, hair dryer, VCR or DVD player, and radio/stereo [9].

Fig. 1. Distribution of the total residential energy use in Oshawa among various end-uses.

2.2. Smart meters

One smart meter is installed in each of the homes being monitored. Smart meters provide, for each location, the average hourly electric consumption on a continuous basis. The installation of the meters was completed over several months. The first readings began in February 2007. At the end of May 2007, nearly all smart meters were sending consumption data. For various reasons, collection of data from the smart meters ended at different dates. The last readings were received at the end of July 2008. Data were transmitted by radio signals to a central communications tower and stored in aggregate files. To ensure data validity, readings from newly installed meters were compared with manual meter readings over a given period. The uncertainty associated with smart meter readings is ±1%.

Fig. 2. Results of survey of consumer responses to energy saving measures (vertical axis: proportion, %).

Fig. 3. Distribution of missing data percentage for the 221 smart meters.
2.3. Survey

Phone surveys were conducted on the houses equipped with the smart meters. Survey data are available for 269 houses. Survey questions referred to the type of lighting used outside, number of occupants, construction year, number of stories, type of house (single detached, semi-detached, townhouse), members of the household working part-time or at home, ownership of the house, number of weeks of vacation, behaviour regarding exterior lighting, safety of the neighbourhood, age range of the owner, marital status of the owner, presence of children in the house, recent purchase of major energy efficient appliances, type of fuel used for ovens/ranges and clothes dryers, recent upgrade or renovation of the house, highest level of education of the owner, and total household income.

A series of 12 energy saving behaviours were also presented to the homeowners, who were asked to select which ones they were willing to adopt. These types of behaviours are being promoted in different energy efficiency campaigns; the aim was to learn which of them capture the interest of consumers the most. The most popular one is the willingness to use the washer and dryer only for full loads (see Fig. 2). Of special interest are the results for the behaviour “Unplug electronic equipment when not in use”: only 41% are willing to adopt it. Thus, considering the impact that standby power may have on the total energy consumption (up to 5% of total electricity consumption; see, for example, [11]), it is important that manufacturers of electronic equipment reduce the standby power of these products.

2.4. Energy audits

Home energy audits were done for a random subset of about 80 houses. The audits had two major objectives: (1) help the owner identify major energy saving opportunities as a tangible outcome for his/her participation in the study, and (2) gather data on the structural features and the internal energy consuming systems of the building. The information collected as part of the audits included building footprint, building floor area, construction year, type of fuel used for space heating (natural gas, oil, or electricity), space heating system type, space heating system efficiency, space heating system age, fuel type for domestic hot water (DHW) system and its type, air conditioning (AC) system type, number of AC units, AC capacity, AC age, size of overhangs, R-values of windows, doors, walls, ceiling, foundation walls, and basement headers, and window and wall areas. Blower door tests provided, for each audited house, the number of air changes per hour at 50 Pa.

2.5. Smart meter data organization

Collection of data started and ended at different dates for the different houses equipped with smart meters. An attempt was made to identify a full-year period between the time of the first readings (February 27, 2007) and the time of the last readings (July 29, 2008) that would cover the maximum number of smart meters. That one-year period, the reference year of smart meter readings, was identified as the period from May 29, 2007 to May 28, 2008. This period covers readings from 221 smart meters (among the 269 in total). For some smart meters, there are missing data due mainly to data transmission problems. For the majority of houses, the percentage of missing data within the reference year is less than 15% (see Fig. 3). The gaps were filled using data from the same hour of the same day of the preceding week, except for some rare cases where gaps in the first few days of the reference year were filled using values from the same day of the succeeding week. The assumption is made that electricity consumption is influenced by the day of the week, and that this way of filling the gaps is better than, for instance, taking the preceding day or the average of the preceding and next days.

Fig. 4. Annual electricity consumption for the 221 houses (vertical axis: electricity consumption, kWh).

Fig. 5. Annual electricity consumption per square foot of floor area for the 62 houses in the final database (vertical axis: electricity consumption, kWh/m²; series: fuel-heated houses, electricity-heated houses). Fuel-heated houses are houses heated by natural gas or by oil.

Fig. 4 gives the distribution of the yearly electric consumption for the 221 dwellings. The consumption of most houses is between 5000 and 15,000 kWh of electricity per year. Since other forms of
energy are not considered in this study, the total energy consumption of some houses may be greater than the value shown in Fig. 4. Some possible sources of error in the smart meter data are: a defect of the smart meter, or an error in data transmission or in data processing.

2.6. Final database

Among the 221 houses for which a complete year of smart meter readings is available, there are 62 houses for which both survey and audit data exist. These 62 houses and their associated data form the final database. There are 60 variables in this database: 59 predictors and 1 dependent variable. They are listed in Table 1. The annual electricity consumption is divided for each house by its floor area to form the dependent variable: annual electricity consumption per square foot. Fig. 5 shows the values of the dependent variable for the 62 houses. Most houses have a specific consumption between 20 and 120 kWh/m². As shown in the figure, electricity-heated houses tend to consume more electricity than houses heated with other forms of energy. In the final database, discrete variables having qualitative values, such as the type of fuel used and the type of house, are coded. The last column of Table 1 shows the codes in use for the applicable modalities.

3. Methodology

The analysis is done using a commercially available statistical software package (SAS, Version 9.1). The following steps are involved (see also [5]):

Step 1: The discrete variables in the database are transformed into continuous ones to better suit the principal component analysis. The first 41 predictors, as presented in Table 1, are discrete variables. The transformation is done using the secondary least squares monotonic transformation of Kruskal [12].
Step 2: A principal component analysis that includes the dependent variable (latent root regression) is done. A set of 60 principal components (corresponding to the total number of variables) is then obtained.
Step 3: The principal components are rotated.
With unrotated components, where the coefficients of the different variables are of similar magnitude, it would not be possible to discriminate between the different components relative to the importance of the variables in each. Large coefficients are associated with small residual variances, and small coefficients allow for identification of alternative subsets [5]. A matrix of principal components where the absolute values of all the coefficients are close to either unity or zero (the coefficients vary from −1 to +1) is thus desirable. This property may be obtained with a rotation of the principal components [3]. The varimax rotation [13] is probably the most popular rotation method [3] and is used here. Being an orthogonal rotation method, it keeps the principal components uncorrelated.
Step 4: Rotated principal components where the dependent variable has a large coefficient are identified. What is meant by “large” in this step is somewhat arbitrary; in the present paper, the limit between “large” and “not large” is generally set at 0.10 for the absolute value of the coefficient.
Step 5: For each of the principal components identified in Step 4, one selects the variables (predictors) having the largest coefficients, usually so that:
$$\text{Coefficient}_{\text{dependent variable}} \times \text{Coefficient}_{\text{chosen predictor}} \geq \text{LIMIT} \tag{2}$$
Again, the choice of LIMIT is somewhat arbitrary; it is set at 0.08 in this paper.
Step 6: Beginning with the predictors chosen at Step 5 and associated with the principal component chosen at Step 4 that has the greatest coefficient for the dependent variable, a multiple linear regression analysis is conducted using the original non-transformed data. Any predictor that does not significantly contribute to the regression (test based on the adjusted R² and the F-value of the analysis of variance at a 95% confidence level) is eliminated.
Step 7: After all the principal components chosen at Step 4 are considered (in decreasing order of dependent variable coefficient), a regression is done with the resulting subset. The individual parameter estimates (coefficients of the regression) are then examined to ensure their statistical significance. The statistical significance of a parameter estimate is measured by the t-value (t-statistic) and its associated p-value. If some of the parameter estimates are found not statistically significant at the 95% confidence level, the parameter estimate with the highest p-value is removed from the subset and a regression is done with the new subset. Step 7 leads to the final subset (and to the final regression) once all the parameter estimates are found statistically significant at the 95% confidence level.

4. Results and discussions

After monotonically transforming the discrete variables in the final database, a principal component analysis is done using all 60 variables. Table 2 presents coefficients obtained after varimax rotation of the principal components. The principal components (PC) identified at Step 4 are, in decreasing order of dependent variable coefficient: 1, 17, 52, 6, 8, 32, 33, 18, 16, 5, 11, 31, 9, 29, and 19. They are the ones listed in Table 2. In Table 2, the predictors selected with each identified principal component are bolded and italicized.
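The selection rules of Steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the study: the function names are hypothetical, the dependent-variable coefficients are those of twelve of the fifteen components of Table 2, the PC1 predictor coefficients are a small subset of the PC1 column, and one extra component labelled as hypothetical is added to show the Step 4 filter rejecting a small coefficient.

```python
LIMIT_STEP4 = 0.10  # threshold on the dependent-variable coefficient (Step 4)
LIMIT_STEP5 = 0.08  # LIMIT of Eq. (2) (Step 5)

# Absolute coefficients of the dependent variable (ElecKFt2) for twelve of the
# rotated components of Table 2.
dep_coeff = {
    "PC1": 0.691, "PC5": 0.112, "PC6": 0.221, "PC8": 0.207,
    "PC9": 0.107, "PC11": 0.109, "PC16": 0.131, "PC17": 0.343,
    "PC18": 0.149, "PC19": 0.101, "PC29": 0.106, "PC31": 0.108,
}
# Hypothetical component (NOT in the paper), added only to exercise the filter.
dep_coeff["PC_hypothetical"] = 0.050


def identify_components(coefficients, threshold):
    """Step 4: keep components with a large dependent-variable coefficient,
    ordered from the largest coefficient to the smallest."""
    kept = {pc: c for pc, c in coefficients.items() if abs(c) >= threshold}
    return sorted(kept, key=lambda pc: abs(kept[pc]), reverse=True)


def select_predictors(dep_c, predictor_coeff, limit):
    """Step 5: keep predictors satisfying Eq. (2) for one component.
    abs() is harmless here because Table 2 lists absolute values."""
    return [name for name, c in predictor_coeff.items() if abs(dep_c * c) >= limit]


# A few PC1 coefficients copied from Table 2 (subset chosen for illustration).
pc1_predictors = {
    "NbOccup": 0.230, "HeatType": 0.944, "DHWFuel": 0.550,
    "TWdArea": 0.196, "Halogen": 0.081, "FullTime": 0.015,
}

ordered_components = identify_components(dep_coeff, LIMIT_STEP4)
chosen_for_pc1 = select_predictors(dep_coeff["PC1"], pc1_predictors, LIMIT_STEP5)

print(ordered_components)
print(chosen_for_pc1)
```

With these inputs, the ordering of the retained components follows the order quoted above (with PC52, PC32 and PC33 left out of the sketch), and only the PC1 predictors whose product with 0.691 reaches 0.08 are kept.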
The first principal component PC1 leads to the following predictors: LearnMor, CompSoft, NbOccup, ParTime, HomState, AgeRange, NbNewApp, RecUpgd, WlgSpend, HeatType, HSysType, DHWFuel, ACSyst, ACType, WestOVH, NorthOVH, SouthOVH, HSysEffi, HSysAge, WlUvalue, CeilArea, and TWdArea (see Table 1). After a multiple linear regression analysis with these predictors, it is seen that the variables that contribute to the regression (based on the adjusted R2 as explained above) are: NbOccup, HomState, HeatType, DHWFuel, ACSyst, ACType, and TWdArea. The other 15 may be eliminated. HeatType is strongly correlated with HSysType (collinearity). HSysType may be substituted for HeatType. However, with HeatType, the adjusted R2 is 0.618, while it is slightly less (0.611) with HsysType. This nevertheless provides a way of replacing, for example, variables difficult to measure with other ones easier to measure. Further phases leading to the final subset of predictors are presented in Table 3. The variables associated with the identified prin-
Table 1
Description and sources of data for the 60 variables in the final database.

| Variable name | Description | Source of data | Modalities (code) or range |
|---|---|---|---|
| ElecKFt2 | Annual electricity consumption per square foot of floor area | Smart meters, audits | 1.6–33.0 [median: 5.7] |
| Halogen | Number of halogen type light bulbs used outside | Survey | 0–5 |
| CFL | Number of compact fluorescent light bulbs used outside | Survey | 0–4 |
| Fluor | Number of fluorescent light bulbs used outside | Survey | 0–4 |
| Incand | Number of incandescent light bulbs used outside | Survey | 0–4 |
| RedEnerg | Importance to reduce the amount of energy used in the house | Survey | Not important at all (1)–extremely important (5) |
| SpenLess | Importance to spend less money on energy bills | Survey | Not important at all (1)–extremely important (5) |
| GvInvolv | Feeling of the level of involvement of the government with energy conservation | Survey | Too involved (1), 2, Fine as it is (3), 4, Not involved enough (5) |
| LearnMor | Interest in learning more about ways to save energy in the house | Survey | Not interested at all (1)–extremely interested (5) |
| CompSoft | Interest in using an eventual computer software that could serve to program and control the amount of energy used in the house | Survey | Not interested at all (1)–extremely interested (5) |
| NbOccup | Number of occupants | Survey | 1–6 |
| FullTime | Number of occupants working full-time | Survey | 0–5 |
| ParTime | Number of occupants working part-time | Survey | 0 [76%]–1 [24%] |
| SiftWork | Number of occupants that do shift work | Survey | 0 [81%]–1 [19%] |
| FromHome | Number of occupants that work at home or stay at home | Survey | 0–3 |
| HomState | House status (owned or rented) | Survey | Owned (1) [95%], rented (2) [5%] |
| LOnEmpty | Lights left on if the house is empty for a short duration so that it looks like there is someone inside | Survey | Never (1), programmed to come on and off (2), always (3) |
| TOnOutLt | Moment at which outdoor light in front of the house is turned on | Survey | Never (1), motion sensor (2), visitors expected (3), programmed (4), as soon as it is dark (5) |
| Safety | Feeling of the safety of the neighbourhood | Survey | Extremely unsafe (1)–extremely safe (5) |
| Crime | Worry about crime in general | Survey | Not worried at all (1)–very worried (5) |
| AgeRange | Age range of the head of household | Survey | 18–24 (1), 25–35 (2), 36–45 (3), 46–55 (4), 56–65 (5), over 65 (6) |
| NbNewApp | Number of new major energy efficient appliances purchased over the last 5 years | Survey | 0–7 |
| OvenFuel | Type of fuel for the oven/range | Survey | Natural gas (1), electricity (2) |
| DryerFl | Type of fuel for the clothes dryer | Survey | Natural gas (1), electricity (2) |
| PHeatrFl | Type of fuel for the pool heater | Survey | Not applicable (0), solar energy (1), natural gas (2), electricity (3) |
| RecUpgd | Upgrades or renovations in the house over the last five to ten years | Survey | Yes (1) [65%], No (2) [35%] |
| WlgSpend | Amount willing to spend on a new energy device if it would result in long term reductions in energy costs | Survey | Under $100 (1), $250 (2), $500 (3), over $1000 (4) |
| LevelEdu | Highest level of education obtained | Survey | Elementary (1), secondary (2), college (3), university (4) |
| HsIncome | Total household income before taxes | Survey | Less than $20,000 (1); $20,000–$39,999 (2); $40,000–$59,999 (3); $60,000–$79,999 (4); $80,000–$99,999 (5); over $100,000 (6) |
| BornCan | Born in Canada | Survey | Yes (1) [82%], no (2) [18%] |
| HeatType | Type of fuel for the heating system | Audits | Natural gas (1), oil (2), electricity (3) |
| HsType | Type of house | Audits | Row-end (1) [39%], single detached (2) [61%] |
| NbStoris | Number of stories | Audits | 1 [50%]–2 [50%] |
| HSysType | Heating system type | Audits | High efficiency boiler/furnace (1), mid-efficiency boiler/furnace (2), continuous pilot boiler/furnace (3), heat pump (4), forced-air electric furnace (5), electric baseboards (6) |
| DHWFuel | Type of fuel for the domestic hot water heater | Audits | Natural gas (1) [84%], Electricity (2) [16%] |
| DHWType | Type of domestic hot water system | Audits | Condensing unit (1), Induced draft fan boiler (2), conventional tank heater (3) |
| ACSyst | Presence or not of an air conditioning system | Audits | No (0) [23%], Yes (1) [77%] |
| ACType | Type of air conditioning system | Audits | Not applicable (0), heat pump (1), central system (2), window unit (3) |
| EastOVH | Presence of window overhangs at the east side of the house | Audits | Yes (1), no window (2), no overhang (3) |
| WestOVH | Presence of window overhangs at the west side of the house | Audits | Yes (1), no window (2), no overhang (3) |
| NorthOVH | Presence of window overhangs at the north side of the house | Audits | Yes (1), no window (2), no overhang (3) |
| SouthOVH | Presence of window overhangs at the south side of the house | Audits | Yes (1), no window (2), no overhang (3) |
| NbWkVaca | Average number of weeks of vacation taken away from the house each year | Survey | 0–12 |
| AgChild1 | Age of the eldest among children still living in the house | Survey | 0–25.0 [median: 2.0] |
| AgChild2 | Age of the second eldest among children still living in the house | Survey | 0–20.0 [median: 0.0] |
| AgChild3 | Age of the third eldest among children still living in the house | Survey | 0–10.0 [median: 0.0] |
| ConstYr | Construction year | Audits | 1945–1996 |
| HSysEffi | Heating system efficiency | Audits | 76–220% [median: 90%] |
| HSysAge | Age of the heating system (years) | Audits | 1.0–35.0 [median: 14.5] |
| ACAge | Age of the air conditioning system (years) | Audits | 0–33.0 [median: 9.5] |
| WdUvalue | Effective U-value of windows (Btu/h·ft²·°F) | Audits | 0.37–0.79 [median: 0.49] |
| DrUvalue | Effective U-value of doors (Btu/h·ft²·°F) | Audits | 0.15–0.61 [median: 0.26] |
| WlUvalue | Effective U-value of walls (Btu/h·ft²·°F) | Audits | 0.06–0.27 [median: 0.10] |
| ClUvalue | Effective U-value of ceiling (Btu/h·ft²·°F) | Audits | 0.02–0.17 [median: 0.04] |
| CeilArea | Ceiling area (ft²) | Audits | 486–1523 [median: 835] |
| TWlArea | Total net exterior wall area (doors and windows excluded) (ft²) | Audits | 678–3444 [median: 1545] |
| TWdArea | Total window area (ft²) | Audits | 72–530 [median: 184] |
| FwUvalue | Foundation wall U-value (Btu/h·ft²·°F) | Audits | 0.06–10.00 (no insulation) [median: 0.12] |
| BhUvalue | Basement header U-value (Btu/h·ft²·°F) | Audits | 0.05–0.37 [median: 0.07] |
| NbACH | Number of air changes per hour at 50 Pa | Audits | 1.5–13.3 [median: 5.6] |
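Table 1 reports the dependent variable per square foot of floor area, whereas Fig. 5 and the accompanying text quote values in kWh/m². The short sketch below converts the Table 1 range under the assumption that ElecKFt2 is expressed in kWh/ft² (the unit is not stated in the table), and decodes the HeatType codes into the two groups used in Fig. 5. The function and constant names are illustrative only.

```python
FT2_PER_M2 = 1.0 / 0.3048**2  # square feet per square metre (1 ft = 0.3048 m exactly)

# HeatType codes as listed in Table 1.
HEAT_TYPE = {1: "natural gas", 2: "oil", 3: "electricity"}


def kwh_per_ft2_to_kwh_per_m2(value):
    """Convert a specific consumption from kWh/ft² (assumed unit) to kWh/m²."""
    return value * FT2_PER_M2


def is_fuel_heated(heat_type_code):
    """Fig. 5 groups houses heated by natural gas or by oil as 'fuel-heated'."""
    return HEAT_TYPE[heat_type_code] in ("natural gas", "oil")


# Range and median of ElecKFt2 as given in Table 1.
elec_kft2 = {"min": 1.6, "median": 5.7, "max": 33.0}
elec_per_m2 = {k: round(kwh_per_ft2_to_kwh_per_m2(v), 1) for k, v in elec_kft2.items()}

print(elec_per_m2)
print([is_fuel_heated(code) for code in (1, 2, 3)])
```

Under that assumption, the converted range (roughly 17 to 355 kWh/m², with a median near 61 kWh/m²) appears consistent with the vertical axis of Fig. 5 and with the statement that most houses lie between 20 and 120 kWh/m².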
Table 2 Coefficients of the variables for the rotated principal components (PC) where the coefficient of the dependent variable is greater than 0.10 (absolute values are given). Variable
PC1
PC5
PC6
PC8
PC9
PC11
PC16
PC17
PC18
PC19
PC29
PC31
PC32
PC33
PC52
ElecKFt2 Halogen CFL Fluor Incand RedEnerg SpenLess GvInvolv LearnMor CompSoft NbOccup FullTime ParTime SiftWork FromHome HomState LOnEmpty TOnOutLt Safety Crime AgeRange NbNewApp OvenFuel DryerFl PHeatrFl RecUpgd WlgSpend LevelEdu HsIncome BornCan HeatType HsType NbStoris HSysType DHWFuel DHWType ACSyst ACType EastOVH WestOVH NorthOVH SouthOVH NbWkVaca AgChild1 AgChild2 AgChild3 ConstYr HSysEffi HSysAge ACAge WdUvalue DrUvalue WlUvalue ClUvalue CeilArea TWlArea TWdArea FwUvalue BhUvalue NbACH
.691 .081 .072 .036 .074 .020 .101 .096 .121 .124 .230 .015 .130 .066 .041 .150 .109 .023 .029 .019 .113 .127 .067 .006 .038 .149 .246 .031 .026 .011 .944 .074 .003 .942 .550 .046 .144 .163 .023 .116 .144 .242 .045 .092 .010 .026 .068 .473 .654 .028 .024 .080 .133 .081 .181 .095 .196 .056 .034 .064
.112 .022 .111 .025 .071 .021 .026 .122 .046 .051 .225 .093 .095 .088 .021 .075 .062 .029 .030 .030 .065 .213 .048 .128 .044 .025 .005 .001 .159 .178 .045 .463 .714 .019 .013 .004 .056 .053 .016 .016 .017 .026 .002 .029 .096 .051 .122 .001 .075 .141 .047 .124 .096 .029 .061 .906 .545 .035 .010 .131
.221 .034 .082 .030 .055 .025 .029 .045 .034 .092 .426 .262 .084 .038 .060 .097 .061 .121 .028 .033 .076 .047 .019 .092 .016 .023 .074 .049 .166 .054 .053 .153 .198 .088 .046 .009 .023 .033 .024 .106 .011 .082 .012 .907 .793 .281 .000 .076 .092 .020 .006 .017 .113 .020 .140 .055 .067 .113 .036 .156
.207 .012 .006 .084 .018 .036 .048 .153 .085 .219 .091 .069 .044 .022 .021 .072 .005 .019 .036 .013 .055 .019 .045 .126 .038 .019 .040 .053 .006 .027 .018 .262 .348 .053 .050 .099 .037 .024 .054 .189 .028 .080 .024 .050 .141 .035 .092 .008 .053 .004 .065 .023 .009 .036 .891 .127 .300 .100 .012 .163
.107 .085 .038 .032 .058 .042 .037 .062 .048 .134 .060 .095 .073 .103 .007 .040 .086 .125 .116 .026 .010 .041 .042 .004 .065 .030 .021 .023 .122 .084 .009 .301 .066 .014 .073 .046 .020 .014 .935 .044 .088 .034 .143 .105 .093 .034 .052 .152 .142 .074 .025 .051 .070 .034 .062 .050 .281 .079 .094 .049
.109 .115 .023 .026 .043 .099 .066 .036 .050 .347 .034 .049 .020 .025 .088 .014 .054 .076 .136 .006 .036 .054 .044 .082 .070 .108 .068 .103 .064 .039 .026 .015 .032 .013 .128 .024 .022 .012 .021 .083 .040 .056 .003 .044 .031 .032 .193 .009 .047 .024 .023 .024 .010 .919 .029 .028 .102 .227 .251 .012
.131 .022 .020 .032 .168 .023 .034 .077 .127 .118 .057 .135 .035 .155 .152 .067 .037 .888 .054 .015 .041 .020 .017 .174 .098 .014 .140 .186 .004 .015 .028 .025 .124 .039 .161 .036 .014 .019 .104 .030 .076 .067 .098 .105 .007 .030 .012 .079 .115 .067 .014 .036 .066 .064 .011 .042 .233 .015 .010 .027
.343 .037 .029 .051 .071 .048 .040 .041 .009 .096 .086 .062 .030 .016 .083 .072 .008 .114 .182 .128 .106 .056 .026 .061 .918 .078 .034 .111 .032 .019 .105 .091 .030 .039 .062 .070 .033 .039 .063 .038 .018 .040 .025 .026 .064 .062 .075 .016 .002 .052 .021 .028 .016 .058 .032 .002 .076 .054 .068 .181
.149 .020 .035 .051 .055 .045 .036 .101 .160 .008 .106 .173 .109 .058 .128 .046 .002 .109 .034 .063 .072 .082 .072 .039 .020 .115 .064 .010 .046 .041 .036 .010 .022 .059 .019 .100 .021 .016 .144 .016 .092 .056 .923 .021 .068 .043 .004 .266 .049 .074 .121 .038 .003 .000 .031 .001 .039 .023 .077 .013
.101 .091 .039 .059 .886 .004 .240 .091 .018 .028 .086 .166 .022 .052 .222 .031 .012 .171 .057 .013 .103 .010 .008 .026 .070 .173 .109 .012 .110 .093 .048 .001 .047 .043 .064 .082 .072 .073 .047 .019 .113 .111 .051 .030 .055 .039 .008 .034 .100 .037 .040 .017 .050 .036 .016 .012 .192 .047 .038 .010
.106 .008 .056 .072 .010 .072 .143 .065 .047 .082 .009 .051 .062 .023 .022 .028 .052 .024 .009 .074 .045 .059 .114 .018 .136 .027 .106 .022 .185 .024 .046 .180 .053 .009 .086 .011 .042 .033 .028 .138 .014 .014 .008 .151 .037 .009 .117 .009 .107 .147 .024 .015 .058 .003 .124 .086 .133 .080 .229 .825
.108 .029 .022 .014 .034 .003 .016 .072 .050 .067 .063 .083 .062 .029 .164 .906 .124 .071 .104 .035 .157 .085 .020 .044 .063 .119 .108 .049 .123 .083 .033 .076 .008 .049 .013 .052 .000 .011 .039 .016 .065 .034 .041 .056 .077 .004 .020 .052 .094 .019 .115 .019 .010 .010 .073 .076 .006 .014 .026 .037
.162 .121 .051 .054 .040 .043 .060 .020 .025 .079 .081 .044 .110 .070 .074 .099 .052 .011 .066 .042 .018 .058 .044 .010 .010 .015 .152 .014 .054 .115 .030 .109 .087 .023 .068 .007 .013 .007 .025 .006 .066 .087 .101 .003 .012 .005 .078 .055 .032 .025 .865 .248 .101 .021 .050 .031 .072 .182 .034 .027
.150 .038 .028 .099 .016 .067 .061 .119 .046 .127 .100 .014 .092 .005 .011 .017 .099 .031 .035 .084 .076 .037 .110 .035 .028 .068 .025 .034 .037 .107 .002 .139 .059 .022 .002 .028 .070 .075 .038 .082 .020 .083 .028 .049 .041 .019 .015 .089 .068 .011 .226 .842 .098 .011 .019 .155 .208 .024 .047 .014
0.287 .019 .003 .002 .004 .006 .028 .001 .042 .032 .026 .010 .003 .001 .002 .004 .001 .005 .004 .001 .007 .002 .000 .049 .002 .003 .003 .002 .000 .002 .051 .016 .017 .038 .002 .001 .009 .012 .004 .001 .004 .000 .004 .011 .023 .003 .001 .001 .014 .004 .005 .005 .001 .002 .003 .003 .060 .002 .004 .005
cipal components (Table 2) are added one by one. All regressions are statistically significant. The adjusted R2 measures the contribution of the added predictor. If it increases, the predictor is kept in the subset; otherwise, the predictor is deleted from the subset. The subset generated from the process summarized in Table 3 is composed of the following 13 predictors: NbOccup, HomState, HeatType, DHWFuel, ACSyst, ACType, TWdArea, PHeatrFl, AgChild1, NbWkVaca, EastOVH, NbACH, and Incand. A regression with this subset yields the results shown in Table 4. It can be seen from Table 4 that the parameter estimates associated with NbOccup, TWdArea, and EastOVH are not statistically
significant (the p-value of each is greater than 0.05). NbOccup being the predictor with the highest p-value is then removed from the subset. The subsequent regression identifies the parameter estimates associated with TWdArea and EastOVH as being those that are not statistically significant. After EastOVH is removed, it appears that the parameter estimate associated with TWdArea remains the only one not statistically significant. After this predictor is removed, the results seen in Table 5 are obtained. All the parameter estimates are statistically significant at the 95% confidence level. The subset shown in Table 5 could be considered as the final subset. However, it is seen from the coefficients of the variables in
Table 3
Subsequent phases in the variable selection process after initial selection with the first identified principal component.

Identified PC   Variable to add   R2      Adjusted R2   F-value   p-Value    Added?
PC1             –                 0.662   0.618         15.1      <0.0001    –
PC17            PHeatrFl          0.738   0.698         18.7      <0.0001    Yes
PC52            –                 –       –             –         –          –
PC6             AgChild1          0.754   0.711         17.7      <0.0001    Yes
                AgChild2          0.756   0.708         15.8      <0.0001    No
PC8             CeilArea          0.756   0.708         15.8      <0.0001    No
PC32            WdUvalue          0.754   0.706         15.6      <0.0001    No
PC33            DrUvalue          0.754   0.706         15.7      <0.0001    No
PC18            NbWkVaca          0.777   0.733         17.7      <0.0001    Yes
PC16            TOnOutLt          0.777   0.728         15.8      <0.0001    No
PC5             TWlArea           0.777   0.728         15.8      <0.0001    No
                NbStoris          0.777   0.728         15.8      <0.0001    No
PC11            ClUvalue          0.777   0.728         15.8      <0.0001    No
PC31            –                 –       –             –         –          –
PC9             EastOVH           0.791   0.745         17.2      <0.0001    Yes
PC29            NbACH             0.812   0.766         17.6      <0.0001    Yes
PC19            Incand            0.838   0.794         19.1      <0.0001    Yes
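The add-one-at-a-time rule summarized in Table 3 (keep a candidate predictor only if the adjusted R2 increases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the pure-Python least-squares solver, and the synthetic data are all hypothetical, and the candidate order is simply given as a list, whereas in the paper it comes from the identified principal components.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_adjusted_r2(columns, y):
    """Fit ordinary least squares with an intercept and return the adjusted R2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    xty = [sum(r[u] * yi for r, yi in zip(rows, y)) for u in range(k)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def add_one_by_one(data, y, initial, candidates):
    """Keep each candidate column only if it raises the adjusted R2."""
    kept = list(initial)
    best = fit_adjusted_r2([data[j] for j in kept], y)
    for j in candidates:
        trial = fit_adjusted_r2([data[k] for k in kept + [j]], y)
        if trial > best:
            kept.append(j)
            best = trial
    return kept


# Synthetic data (hypothetical): y depends on columns 0 and 1; column 2 is unrelated.
idx = range(12)
x0 = [float(i) for i in idx]
x1 = [float((i * i) % 7) for i in idx]
x2 = [float((i * 3) % 5) for i in idx]
y = [1 + 2 * a + 3 * b + ((i % 3) - 1) * 0.1 for i, a, b in zip(idx, x0, x1)]

kept = add_one_by_one([x0, x1, x2], y, initial=[0], candidates=[1, 2])
```

Because each candidate is tested against the current subset, the outcome depends on the order in which candidates are presented, which is why Table 3 lists the principal components in the order they were identified.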
Table 4
Results of regression with 13 predictors.
F-value: 19.06; p-value: <0.0001; R2: 0.838; adjusted R2: 0.794

Predictor    Parameter estimate   t-Value   p-Value
Intercept    −9.4559              −3.09     0.0034
NbOccup      0.2724               0.94      0.3517
HomState     3.6423               2.51      0.0155
HeatType     2.5542               5.95      <0.0001
DHWFuel      5.4914               5.42      <0.0001
ACSyst       −9.9164              −5.02     <0.0001
ACType       5.7077               6.20      <0.0001
TWdArea      −0.0078              −2.00     0.0515
PHeatrFl     2.4425               5.32      <0.0001
AgChild1     0.1085               2.33      0.0242
NbWkVaca     −0.3782              −2.48     0.0166
EastOVH      −1.4186              −1.72     0.0918
NbACH        0.3674               2.31      0.0252
Incand       0.6377               2.75      0.0084

Table 6
Results of regression with 9 predictors.
F-value: 21.81; p-value: <0.0001; R2: 0.791; adjusted R2: 0.754

Predictor    Parameter estimate   t-Value   p-Value
Intercept    −13.0310             −4.99     <0.0001
NbOccup      0.5655               2.13      0.0376
HomState     3.4202               2.19      0.0327
NbWkVaca     −0.3866              −2.42     0.0190
PHeatrFl     2.4905               5.02      <0.0001
HeatType     2.5007               5.48      <0.0001
DHWFuel      5.8918               5.41      <0.0001
ACSyst       −11.5618             −5.66     <0.0001
ACType       6.4233               6.74      <0.0001
NbACH        0.5576               3.61      0.0007
PC6 (see Table 2) that there is multi-collinearity between AgChild1, AgChild2, and NbOccup. NbOccup is a more reliable datum than AgChild1 and may be preferred to it. Replacing AgChild1 with NbOccup produces a regression in which the parameter estimate associated with Incand (p-value of 0.0641) is not statistically significant at the 95% confidence level. Removing Incand produces the regression whose results are shown in Table 6. A stepwise method would identify the subset shown in Table 5 as the “best” subset; however, it would not allow the identification of alternative subsets as is done here. The subset seen in Table 6 is considered as the final subset. Data for the variables in the final subset are all easy to obtain, except for NbACH, the number of air changes per hour at 50 Pa, which requires
a blower door test. The examination of the matrix of rotated principal components does not reveal any collinearity of NbACH with another variable that could be used to replace it in the regression. Simply dropping NbACH from the regression would lower the R2 to 0.738, and NbOccup would no longer be statistically significant. Another possibility is to take the subset of Table 5 and drop NbACH, which would lead to a subset of nine variables with R2 = 0.784 and all parameter estimates statistically significant. The annual electricity consumption per square foot of floor area may be estimated using the following equation (see Tables 6 and 1):

$$
\begin{aligned}
\mathrm{ElecKFt2} = {} & -13.0310 + (0.5655 \times \mathrm{NbOccup}) + (3.4202 \times \mathrm{HomState}) - (0.3866 \times \mathrm{NbWkVaca}) \\
& + (2.4905 \times \mathrm{PHeatrFl}) + (2.5007 \times \mathrm{HeatType}) + (5.8918 \times \mathrm{DHWFuel}) \\
& - (11.5618 \times \mathrm{ACSyst}) + (6.4233 \times \mathrm{ACType}) + (0.5576 \times \mathrm{NbACH})
\end{aligned}
\tag{3}
$$

Table 5
Results of regression with 10 predictors.
F-value: 22.77; p-value: <0.0001; R2: 0.817; adjusted R2: 0.781

Predictor    Parameter estimate   t-Value   p-Value
Intercept    −12.7018             −5.57     <0.0001
HomState     4.0072               2.70      0.0095
HeatType     2.6665               6.12      <0.0001
DHWFuel      5.5500               5.44      <0.0001
ACSyst       −11.1506             −5.78     <0.0001
ACType       6.2867               6.98      <0.0001
PHeatrFl     2.5561               5.45      <0.0001
AgChild1     0.1184               2.93      0.0051
NbWkVaca     −0.4700              −3.13     0.0029
NbACH        0.4565               3.01      0.0040
Incand       0.5891               2.54      0.0142
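As a consistency check on the summary statistics reported for Tables 4 to 6, the adjusted R2 and the F-value can be recomputed from R2, the number of predictors, and the 62 observations stated in the abstract. The sketch below assumes the standard formulas for a linear regression with an intercept; the function names are illustrative, and small differences from the reported figures are expected because the inputs are rounded to three decimals.

```python
N_OBS = 62  # number of observations in the database, as stated in the abstract


def adjusted_r2(r2, n_predictors, n_obs=N_OBS):
    """Adjusted R2 for a regression with an intercept and n_predictors predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_value(r2, n_predictors, n_obs=N_OBS):
    """Overall F statistic of the regression, computed from R2."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_obs - n_predictors - 1))


# (number of predictors, reported R2, reported adjusted R2, reported F-value)
reported = {
    "Table 4": (13, 0.838, 0.794, 19.06),
    "Table 5": (10, 0.817, 0.781, 22.77),
    "Table 6": (9, 0.791, 0.754, 21.81),
}

recomputed = {
    name: (adjusted_r2(r2, p), f_value(r2, p))
    for name, (p, r2, _adj, _f) in reported.items()
}
```

Under these assumptions the recomputed values agree with the reported ones to within rounding, which supports reading the three tables as fits on the same 62 observations.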
Factors that differentiate the houses in the database relative to their electricity consumption are identified. It is easy to see how the number of occupants (NbOccup) could impact the energy consumption of the building. Observation of Eq. (3) reveals a positive correlation between the dependent variable and the number of occupants, meaning that electricity consumption tends to increase with the number of occupants. HomState is also positively correlated with the dependent variable. Rented homes tend to consume more than homes owned by their occupants. What might explain this positive correlation is that homes are often rented with all utilities included in the rent, so renters do not necessarily pay the extra cost associated with excessive electricity consumption and have less incentive to save energy.
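To make Eq. (3) concrete, the sketch below evaluates it for one house, using the coefficients of Table 6. The house itself is hypothetical. The codes for DHWFuel (1 = natural gas), ACSyst (1 = air conditioning present) and ACType (2 = central system) follow the coding described in the text, but the values chosen for HomState, PHeatrFl, HeatType and NbACH are assumptions made only for illustration; Table 1 gives the actual coding.

```python
# Coefficients of Eq. (3), copied from Table 6.
COEFFS = {
    "Intercept": -13.0310,
    "NbOccup": 0.5655,
    "HomState": 3.4202,
    "NbWkVaca": -0.3866,
    "PHeatrFl": 2.4905,
    "HeatType": 2.5007,
    "DHWFuel": 5.8918,
    "ACSyst": -11.5618,
    "ACType": 6.4233,
    "NbACH": 0.5576,
}


def predict_elec_kft2(house):
    """Evaluate Eq. (3) for a dict mapping each predictor name to its value."""
    total = COEFFS["Intercept"]
    for name, coefficient in COEFFS.items():
        if name != "Intercept":
            total += coefficient * house[name]  # KeyError if a predictor is missing
    return total


# Hypothetical house; the codes marked "assumed" are NOT taken from Table 1.
example_house = {
    "NbOccup": 3,
    "HomState": 1,   # assumed code
    "NbWkVaca": 2,
    "PHeatrFl": 0,   # assumed code
    "HeatType": 1,   # assumed code
    "DHWFuel": 1,    # natural gas
    "ACSyst": 1,     # air conditioning present
    "ACType": 2,     # central system
    "NbACH": 5,      # assumed blower door result
}

estimate = predict_elec_kft2(example_house)
```

Since the model is linear, changing a single predictor by one unit shifts the estimate by exactly that predictor's coefficient, all other inputs held fixed.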
NbWkVaca is, as would be expected, negatively correlated with the electricity consumption. NbWkVaca is the average number of weeks of vacation taken away from the house by the family. The more vacation taken, the less electricity is consumed.
The type of fuel used in the pool heater is positively correlated with the dependent variable. This follows from the coding adopted for this predictor (see Table 1), electricity being the modality with the larger numerical value. The type of fuel used in the heating system is also positively correlated with the electricity consumption, for the same reason. Oshawa being in a heating-dominated climate (see Section 2), electricity-heated homes naturally consume more electricity than natural-gas-heated homes, all other factors being equal. In DHWFuel (type of fuel used in the domestic hot water heater), natural gas is coded “1” and electricity “2” (see Table 1), so this predictor is also positively correlated with the dependent variable.
The negative correlation of ACSyst with the electricity consumption may seem strange, since the existence of an air conditioning system would be expected to cause greater electricity consumption, all other factors being equal. However, ACSyst must be analyzed together with ACType, the type of air conditioning system. These two variables are linked: when ACSyst equals 0, ACType also equals 0. When ACSyst equals 1 (existence of an air conditioning system), ACType equals 1 (heat pump), 2 (central system), or 3 (window unit) (see Table 1). Modalities in ACType take higher values as the efficiency of the air conditioning system drops: heat pumps (coding: 1) are more efficient than central systems (coding: 2), which are more efficient than window units (coding: 3).
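Because ACSyst and ACType are linked in this way, their joint contribution to the estimate can be tabulated for each type of system. The sketch below does this arithmetic with the Table 6 coefficients; the function and variable names are illustrative, and the result is only the combined regression term, not a measured consumption.

```python
AC_SYST_COEF = -11.5618  # coefficient of ACSyst in Table 6
AC_TYPE_COEF = 6.4233    # coefficient of ACType in Table 6

# Coding of ACType as described in the text (0 means no air conditioning).
AC_TYPES = {1: "heat pump", 2: "central system", 3: "window unit"}


def ac_net_effect(ac_type):
    """Combined ACSyst + ACType term of the regression for a given ACType code."""
    if ac_type == 0:
        return 0.0  # ACSyst = 0 and ACType = 0: no contribution
    return AC_SYST_COEF + ac_type * AC_TYPE_COEF


net_effect = {label: ac_net_effect(code) for code, label in AC_TYPES.items()}
```

With these coefficients the combined term is positive for codes 2 and 3 and negative for code 1, so the sign of the air conditioning contribution depends on the type of system.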
Many houses have central systems, so the overall coefficient for ACSyst and ACType taken together is, for most air-conditioned houses, −11.5618 + (2 × 6.4233) = +1.2848 > 0. It can be concluded that the presence of an air conditioning system leads to increased electricity consumption.
Finally, as the number of air changes per hour increases, the electricity consumption increases for houses heated with electricity and for houses equipped with air conditioning systems, all other factors being equal. Only 11% of the buildings are in neither category, so NbACH is positively correlated with the dependent variable. Investing in air tightness is rewarding. In fact, every drop of 1 unit in the number of air changes per hour at 50 Pa leads to savings of 0.56 kWh/ft2/yr in electricity consumption per house. Extrapolated to the 50,000 houses in Oshawa, with an average house floor area of 1900 ft2 (estimated from the present final database of 62 houses), a 1 unit drop in NbACH in each house could lead to savings of 53 GWh of electricity per year (50,000 × 1900 ft2 × 0.56 kWh/ft2/yr ≈ 53 GWh/yr). This is 10% of the total electricity consumed by Oshawa’s residential sector in 2004 (see Section 2).
The main difficulty with this technique lies in deciding how large a coefficient must be to be considered large. Choosing a value of 0.20 for LIMIT (see Eq. (2)) would lead to a subset formed by only five variables: HeatType, DHWFuel, HSysAge, AgChild1, and PHeatrFl, with R2 = 0.564. This value of LIMIT would involve only 4 PCs (out of the 60) in the selection process (Step 4 of the methodology) and is obviously too large. The selection of the threshold is based on observation of the PCs, keeping in mind that the number of PCs selected at Step 4 is impacted by the value of LIMIT (because the value taken by a coefficient cannot exceed 1, any PC in which the coefficient of the dependent variable is less than LIMIT cannot be selected at Step 4; see Eq. (2)).
Judgment and experience fortunately ease the process of selecting the threshold. Finally, different coding schemes for the discrete variables having qualitative values were tested in order to ascertain the impact
of the coding on the final results. The schemes tested did not change the results.

5. Conclusion

Data collected in the city of Oshawa (Ontario, Canada) are used in a principal component analysis to generate regression models of the electricity consumption in the city’s residential dwellings. Using a methodology developed in conjunction with the latent root regression technique of Hawkins [5], the number of predictors may be reduced from 59 to only nine, i.e. a reduction of 85% in the number of predictors. These nine variables are: (1) number of occupants of the house, (2) house status (owned by the occupant or rented), (3) average annual number of weeks of vacation taken away from the house by the family, (4) type of fuel used in the pool heater, (5) type of fuel used in the space heating system, (6) type of fuel used in the domestic hot water system, (7) presence or not of an air conditioning system, (8) type of air conditioning system, and (9) number of air changes per hour at 50 Pa. The regression model produced with these nine predictors is statistically significant, with a p-value less than 0.0001, and all parameter estimates, including the intercept, are statistically significant at the 95% confidence level. The R2 for this regression is 0.79 with an adjusted R2 of 0.75. The main consequence is that for future studies, only these nine variables need to be collected in order to obtain good estimates of the electricity consumption. With the exception of the ninth predictor, which requires a blower door test, all the needed data are easy to collect. Contrary to other popular subset selection methods such as forward selection, backward elimination or the stepwise method, the methodology developed and used in this paper allows easy identification of alternative subsets.

Acknowledgements

This research was sponsored by Oshawa PUC Networks Inc. (OPUC), the Ontario Centre of Excellence for Energy (OCE), and the University of Ontario Institute of Technology (UOIT).
The authors are grateful for their contributions.

References

[1] G.H. Dunteman, Principal Components Analysis, Sage University Paper Series on Quantitative Applications in the Social Sciences, No. 07-069, Sage, Newbury Park, CA, USA, 1989.
[2] R.R. Hocking, The analysis and selection of variables in linear regression, Biometrics 32 (1976) 1–49.
[3] I.T. Jolliffe, Principal Component Analysis, 2nd edition, Springer-Verlag, New York, USA, 2002.
[4] J.T. Webster, R.F. Gunst, R.L. Mason, Latent root regression analysis, Technometrics 16 (4) (1974) 513–522.
[5] D.G. Hawkins, On the investigation of alternative regressions by principal component analysis, Applied Statistics 22 (3) (1973) 275–286.
[6] J.N.R. Jeffers, Investigation of alternative regressions: some practical examples, The Statistician 30 (2) (1981) 79–88.
[7] K.S. Gabriel, D. Willis, A study of energy consumption trends in residential dwellings in Oshawa – report focusing on data analysis for 50 home data set, Report, University of Ontario Institute of Technology, Oshawa, Canada, 2006.
[8] NCDIA, Canadian Climate Normals 1971–2000, National Climate Data and Information Archive, Environment Canada. http://www.climate.weatheroffice.ec.gc.ca/climate_normals/results_e.html?StnID=4996&autofwd=1 (consulted 18.02.09).
[9] Kinectrics, CDM programs for Oshawa PUC Networks Inc. – benchmarking of residential electrical energy consumption, Report, Oshawa, Canada, 2005.
[10] CanMET, HOT2XP, Software, Version 2.74, Canada Centre for Mineral and Energy Technology, Natural Resources Canada, 2005.
[11] K. Roth, K. McKenney, J. Brodrick, Small devices – big loads, ASHRAE Journal 50 (6) (2008) 64–65.
[12] J.B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 29 (1) (1964) 1–27.
[13] H.F. Kaiser, The varimax criterion for analytic rotation in factor analysis, Psychometrika 23 (3) (1958) 187–200.