Accepted Manuscript

Random Forest based Hourly Building Energy Prediction
Zeyu Wang, Yueren Wang, Ruochen Zeng, Ravi S. Srinivasan, Sherry Ahrentzen

PII: S0378-7788(18)31129-0
DOI: 10.1016/j.enbuild.2018.04.008
Reference: ENB 8481

To appear in: Energy & Buildings

Received date: 26 January 2017
Revised date: 23 October 2017
Accepted date: 10 April 2018

Please cite this article as: Zeyu Wang, Yueren Wang, Ruochen Zeng, Ravi S. Srinivasan, Sherry Ahrentzen, Random Forest based Hourly Building Energy Prediction, Energy & Buildings (2018), doi: 10.1016/j.enbuild.2018.04.008
Random Forest based Hourly Building Energy Prediction

Zeyu Wang a, Yueren Wang b,*, Ruochen Zeng c, Ravi S. Srinivasan c, Sherry Ahrentzen c

a School of Management, Guangzhou University, Guangzhou, Guangdong, China
b Microsoft Corporation, One Microsoft Way, Redmond, Washington, USA
c M.E. Rinker, Sr. School of Construction Management, University of Florida, Gainesville, Florida, USA

* Corresponding author: Tel: +1 352 214 2809; Email address: [email protected]
Abstract
Accurate building energy prediction plays an important role in improving the energy efficiency of buildings. This paper proposes a homogeneous ensemble approach, i.e., the use of Random Forest (RF), for hourly building energy prediction. The approach was adopted to predict the hourly electricity usage of two educational buildings in North Central Florida. RF models trained with different parameter settings were compared to investigate the impact of parameter setting on the prediction performance of the model. The results indicated that RF is not very sensitive to the number of variables (mtry) and that using the empirical mtry is preferable because it saves time while remaining accurate. RF was compared with the regression tree (RT) and Support Vector Regression (SVR) to validate the superiority of RF in building energy prediction. The prediction performances of RF measured by the performance index (PI) were 14-25% and 5-5.5% better than those of RT and SVR, respectively, indicating that RF was the best prediction model in the comparison. Moreover, an analysis based on the variable importance of RF was performed to identify the most influential features during different semesters. The results showed that the most influential features vary depending on the semester, indicating the existence of different operational conditions for the tested buildings. A further comparison between RF models trained with yearly and monthly data indicated that energy usage prediction for educational buildings can be improved by taking into consideration their energy behavior changes during different semesters.

Keywords: Random forest, regression tree, ensemble model, machine learning, building energy use prediction
1. Introduction
Global energy consumption has increased by 100% in the past 40 years [1]. The continuous increase in energy demand and the emergence of an energy crisis have forced people to change their lifestyle and use energy in a more efficient way. Most people spend over 90% of their daily lives indoors; consequently, buildings have become the largest energy consumer worldwide [2]. For example, in the United States, the building sector contributes over 41% of the primary energy usage, compared with industry (30%) and transportation-related (29%) energy usage [3]. Needless to say, improving building energy efficiency has become a major concern around the world for the purpose of achieving energy conservation and carbon emission reduction. Building energy prediction is becoming tremendously significant in improving efficiency due to the important role it plays in implementing building energy efficiency measures, e.g., demand response control [4], system fault detection and diagnosis [5], building energy benchmarking [6], and building system measurement and verification [7].

A popular approach to building energy prediction is engineering-based building energy modeling, which uses physical principles to calculate the thermal dynamics and energy behaviors of buildings [8]. This approach helps users design energy efficient buildings [9] and has enabled the use of hundreds of building energy modeling tools around the world owing to its simplicity and efficiency [10]. However, engineering-based building energy modeling is mostly used in the building design phase because it requires a full description of the building. It needs detailed building information, e.g., building geometry, material and component specifications, Heating, Ventilation, and Air Conditioning (HVAC) system settings, and lighting system specifications, which is hard to acquire from existing buildings [8]. Moreover, recent research has shown that this approach often results in a large difference between the predicted and the observed building energy usage [9] [10].

A more applicable approach to energy prediction is empirical modeling, which has been widely used in predicting building energy usage during the past twenty years owing to its superiority in model implementation and prediction accuracy [11, 12]. Empirical modeling uses machine learning algorithms, such as the Decision Tree (DT) [13], Artificial Neural Network (ANN) [14], the Gaussian process [15], and Support Vector Machine (SVM) [16], to generalize mapping relationships between input and output data. This approach is more practical than the engineering-based approach for energy prediction of existing buildings because its required data, such as building energy data, environmental data, and occupancy data, is relatively easy to obtain from existing buildings.
Previous research has proven that empirical modeling can provide prominent prediction results and outperform engineering-based building energy modeling when the learning algorithm is wisely chosen and well utilized [17] [14]. However, some algorithms used in empirical modeling, e.g., DT and ANN, are suspected to be unreliable due to their instability issues [18]. The instability of these algorithms may introduce significant variations in the output due to small changes made in the input data [18], thereby making the prediction results dramatically different from the observed values and causing the failure of the prediction model. This limitation could impede these algorithms from real-world application because many energy efficiency measures, such as building system fault detection and building energy benchmarking, rely on the reliability of the energy prediction results. Moreover, with the development of building energy management technology, buildings nowadays have abundant energy-related data. Short-term building energy prediction has become increasingly important in recent years because of the invention and implementation of high sampling frequency sensors. The prediction accuracy requirements for building energy prediction are becoming more stringent as the building industry pays more attention to the details of building operation and the data sampling interval becomes smaller.
To overcome the instability of the learning algorithm as well as to improve prediction accuracy, a more advanced data mining technique called ensemble learning was introduced in the early 1990s [19]. Ensemble learning creates a set of prediction models, a.k.a. base models, and provides a prediction of high accuracy by taking advantage of the irrelevance between the base models. The technique has an open structure in which different base models and integration strategies can be employed, thereby deriving different versions of ensemble models. Based on the generation of their base models, ensemble models can be classified into two types, namely, heterogeneous and homogeneous ensemble models. Heterogeneous ensemble models generate their base models by training different learning algorithms, or the same algorithm with different parameter settings, on the same training dataset [20]. On the contrary, homogeneous ensemble models generate their base models by using the same learning algorithm on different training datasets [21].
In this paper, a homogeneous ensemble model called Random Forest (RF) is introduced for building energy prediction. An hourly building energy prediction experiment was performed to validate the feasibility of RF in short-term building energy prediction. A comparative analysis was performed to compare the prediction performance of RF with CART and Support Vector Regression (SVR). In addition, this paper provides an insight into the analysis of the importance of each variable used in generating RF, which could assist researchers in locating key impact variables and gaining a thorough understanding of the energy behavior of the predicted building. Finally, the RF model was trained using yearly and monthly data to investigate the impact of time-wise data partition on model training quality.

The remainder of this paper is organized as follows: Section 2 reviews the related work; Section 3 discusses the research methodology, followed by an introduction of the experiment design in Section 4; in Section 5, the model development for different algorithms is presented; Section 6 presents the results of the experiment; Section 7 discusses the major findings; and Section 8 draws the conclusion.
2. Related works

RF can be considered an ensemble of Classification and Regression Trees (CART) because multiple CART models are generated and used as base models. Since it was first proposed by Breiman in 2001 [22], RF has received great attention and has become a sought-after procedure in many fields. For example, Díaz-Uriarte and Andrés [23] proposed a new method of gene selection in classification problems by using RF; Cutler et al. [24] introduced RF to the ecology field to classify different plant species; Rodriguez-Galiano et al. [25] used RF in the land remote sensing field to classify different land covers; and Sun et al. [26] used RF to predict the solar radiation of different areas. The results of these research programs demonstrated the appealing performance of RF in solving both classification and regression problems. Moreover, previous research programs have compared RF with other popular techniques such as linear discriminant analysis [24], decision tree [25], and SVM [27]. The comparison results showed that RF outperformed those competitors in solving the research problems, indicating its potential to be a promising tool for solving the building energy prediction problem.

However, this novel technique is not yet prevalent in building energy prediction, with only a few related studies performed in the past five years. Tsanas and Xifara [28] first used RF to predict the heating and cooling loads of residential buildings. The authors proved that RF can provide more accurate heating and cooling load predictions when compared against a classical linear regression approach. The research also found that RF can be used to find an accurate functional relationship between input and output variables.
Lahouar and Slama [29] proposed a short-term load prediction method by combining RF with expert input selection. The proposed method was used to predict the next 24-hour electrical demand. The results showed that RF coupled with expert selection can capture complex load behaviors.

Some researchers have also compared RF with other machine learning models for building energy prediction. Jurado et al. [30] used RF to predict the hourly electricity consumption of three educational buildings. The authors compared the prediction performance of RF with neural networks, Fuzzy Inductive Reasoning (FIR), and the Auto Regressive Integrated Moving Average (ARIMA). The results showed that FIR and RF perform better than the other two methods for building energy prediction and that voting strategies can be used to combine different methods and provide more accurate predictions. Candanedo et al. [31] used RF to predict the short-term electricity consumption of a house in Belgium. The authors compared the prediction performance of RF with three other prediction models, including Multiple Linear Regression (MLR), SVR, and Gradient Boosting Machines (GBM). The results showed that RF and GBM performed better than MLR and SVR. More recently, Ahmad et al. [32] used RF to predict the hourly HVAC electricity consumption of a hotel in Madrid, Spain. The authors compared the prediction performance of RF with the widely-used ANN. The results showed that both models are effective for predicting hourly HVAC electricity consumption, while the RF model can additionally be used as a variable selection tool.
Moreover, some researchers have investigated how to use RF to identify important variables for building energy prediction. Ma and Cheng [33] used RF to identify influential features of the regional energy use intensity (EUI) of residential buildings. The authors used 171 influential features describing the buildings, economy, education, environment, households, surroundings, and transportation to model the average site EUI of residential buildings in block groups. The top 20 influential features were identified based on the out-of-bag estimation in RF. Similarly, Yu et al. [34] used RF to predict the coefficient of performance (COP) of a chiller system and measure the variable importance. Thirteen inputs from measured and derived operating variables were used to develop the RF models in the four operating modes. The authors identified the top five important variables for the prediction of COP based on the variable importance analysis of RF.
In addition, RF has also been used in the domains of occupancy detection, automated measurement and verification, and anomaly detection of building energy consumption. Candanedo and Feldheim [35] used RF to detect the occupancy of an office room using light, temperature, humidity, and CO2 data. Their study showed that RF is able to accurately detect the presence of occupants by observing the indoor environmental data. Granderson et al. [36] used RF to predict the energy savings of retrofit buildings. Their research showed that RF can provide accurate predictions of building energy savings. Araya et al. [37] used RF to identify abnormal building energy behavior. The authors used RF as an anomaly detection classifier to build the proposed ensemble anomaly detection (EAD) framework, which combines multiple classifiers by using majority voting. Their results showed that the proposed EAD outperforms the individual anomaly detection classifiers.
Although previous research works have successfully applied RF to building energy related predictions, the application details of RF in building energy prediction, such as the selection of key parameters, the analysis of variable importance, and the advantages and limitations of the technique, have not been thoroughly studied or well addressed. This paper fills the gap by exploring the implementation of RF in building energy prediction and the utilization of its unique characteristic, i.e., variable importance, in locating key factors that impact building energy usage.
3. Methodology

3.1. Random Forest
RF is an ensemble prediction model consisting of a collection of different regression trees (CART) which are trained through bagging and random variable selection. The development rationale of the trees in RF is the same as that of CART, which is recursive partitioning. In recursive partitioning, the exact position of the cut-point and the selection of the splitting variable strongly depend on the distribution of observations in the learning sample [38]. Thus, CART is considered an unstable learner because a small change in the learning data could change the selection of the first cut-point or the first splitting variable and, subsequently, change the entire tree structure. RF overcomes the instability issue of CART by predicting with a set of trees rather than a single tree. The logic behind using a set of trees for prediction is to mitigate the instability of each tree by combining the predictions of multiple trees. Combining trees of high diversity complements the instability of each tree significantly because CART is an unbiased predictor which is unstable but provides the right prediction on average [38] [39]. In contrast, combining similar trees would not complement the instability of each tree because similar trees could theoretically be unstable at the same time.

Accordingly, creating a diverse set of trees is essential for achieving model complementarity and improving the overall prediction performance of the ensemble model. RF enhances the diversity of its trees by using training data set randomization and input variable set randomization. Figure 1 illustrates the model development procedure of RF. RF first generates multiple new training data sets by randomly sampling from the initial training data set with replacement. The size of each new training data set is the same as that of the initial one, but some observations may be repeated as a result of sampling with replacement. The new training data sets are expected to contain, on average, 63.2% of the initial training data, with the rest as duplicates [40]. After the new training data sets are generated, RF injects variable set randomization before the tree splitting process to enhance the diversity of its trees. For each new training dataset, variable set randomization creates a random variable set by randomly sampling from the set of all variables. At each splitting point, rather than considering all variables in the training data set, each tree in RF grows by searching for the best split among the variables within the random variable set. Since both the training data sets and the variable sets are randomly generated, the growth of the trees in RF is expected to be independent and different from each other. Once all trees are developed, RF combines them by averaging their individual predictions, which equalizes the influence of the training data and makes RF stable [41]. Such a joint prediction approach reduces the risk of large errors and makes RF more accurate than any of its constituent trees.
Figure 1. Model development procedure of RF
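To make the procedure in Figure 1 concrete, the following minimal sketch builds a regression forest with bootstrap sampling and per-split random variable selection and then averages the tree predictions. It is an illustration only, not the implementation used in this study; it assumes scikit-learn, whose max_features and min_samples_leaf arguments play roles analogous to mtry and nodesize.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_random_forest(X, y, ntree=300, mtry=4, nodesize=5, seed=0):
    """Sketch of the RF procedure in Figure 1: bootstrap the training data,
    grow one tree per bootstrap sample with a random variable subset
    considered at each split, and keep the trees for joint prediction."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(ntree):
        # Bagging: draw n observations with replacement (~63.2% unique on average).
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor(
            max_features=mtry,          # random variable subset per split (mtry)
            min_samples_leaf=nodesize,  # analogous to the nodesize parameter
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_random_forest(trees, X):
    # Joint prediction: average the individual tree predictions.
    return np.mean([t.predict(X) for t in trees], axis=0)
```

In practice an equivalent ensemble is available directly as sklearn.ensemble.RandomForestRegressor; the sketch above only makes the two randomization steps explicit.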
3.2. Variable importance
The variable importance of RF uses data permutation to measure the impact of each variable on the overall prediction performance of the model. The variable importance is calculated by computing the decrease in prediction accuracy resulting from randomly permuting the values of a variable. The higher the drop in prediction accuracy, the more important the variable is, and vice versa. The rationale is that the original association between a variable and the output is broken by randomly permuting its values and, accordingly, the prediction accuracy decreases if the original variable is replaced by the permuted one [38]. If the variable is associated with the output, permuting its values reduces the prediction accuracy of RF significantly. On the contrary, if the variable is not associated with the output, permuting its values does not change the prediction accuracy because the splitting decisions of the trees are not impacted. Thus, comparing the importance of different variables reveals their association with the output. Notably, RF measures the importance of each variable jointly, that is, considering the impact of each variable individually as well as in multivariate interactions with other variables [38] [28] [43]. Redundant variables, which are highly correlated with other variables, are penalized and not assigned large importance even though they may be highly correlated with the output [22].

Variable importance can be used to select the most important variables in an RF, which helps users target the most influential factors and understand the relationships between input and output variables. This aspect is particularly useful for high-dimensional data, where the identification of the most relevant variables is of great importance [44]. In this paper, the variable importance is used to investigate the most influential factors of RF models trained with different datasets in order to understand the changes in the energy behavior of institutional buildings during different semesters. A detailed explanation of the application of variable importance is given in Section 6.2.
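A simplified version of this permutation measure, computed on a held-out dataset rather than on the out-of-bag samples of each tree as in Breiman's original formulation, can be sketched as follows; scikit-learn's sklearn.inspection.permutation_importance offers an equivalent off-the-shelf routine.

```python
import numpy as np

def permutation_importance(model, X_valid, y_valid, seed=0):
    """Importance of variable j = increase in prediction error after randomly
    permuting column j (a larger increase means a more important variable)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((model.predict(X_valid) - y_valid) ** 2)
    importance = np.zeros(X_valid.shape[1])
    for j in range(X_valid.shape[1]):
        X_perm = X_valid.copy()
        # Break the association between variable j and the output.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        permuted = np.mean((model.predict(X_perm) - y_valid) ** 2)
        importance[j] = permuted - baseline
    return importance
```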
3.3. Evaluation Indices

In this paper, the prediction performance is measured by three frequently used prediction accuracy evaluation indices: the coefficient of determination (R2), the Root Mean Square Error (RMSE), and the Mean Absolute Percentage Error (MAPE). To jointly evaluate the prediction performance, a composite evaluation index called the performance index (PI), which combines R2, RMSE, and MAPE into one single measure, was created and used for comparing the prediction performances of different models.

R2 measures the goodness of fit of a model. A high R2 value indicates that the predicted values closely fit the observed values. The R2 is defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

where $n$ is the sample size, $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the observed values.

RMSE stands for the sample standard deviation of the residuals between predicted and observed values. This measure is used to identify large errors and evaluate the fluctuation of the model response regarding variance. RMSE punishes large errors severely because it geometrically amplifies the error. The mathematical formula for RMSE is:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

MAPE is a statistical indicator that describes the accuracy of the prediction by comparing the residuals with the observed values. It usually expresses accuracy as a percentage and is effective for evaluating the performance of the prediction model by introducing the concept of absolute values. The MAPE is defined by the formula:

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

PI measures the prediction performance of a model comprehensively by considering multiple evaluation indices. Rather than evaluating the difference between the actual and predicted values, PI is used to compare the prediction performances of different prediction models. The PI used in this paper is defined by the formula:

$$PI_a = \frac{1}{3}\frac{R^2_{min}}{R^2_a} + \frac{1}{3}\frac{RMSE_a}{RMSE_{max}} + \frac{1}{3}\frac{MAPE_a}{MAPE_{max}}$$

where $PI_a$, $R^2_a$, $RMSE_a$, and $MAPE_a$ are the PI, R2, RMSE, and MAPE of prediction model $a$; $R^2_{min}$ is the minimum R2 in the comparison; and $RMSE_{max}$ and $MAPE_{max}$ are the maximum RMSE and MAPE in the comparison, respectively. It can be seen from the equation that PI eliminates the magnitude differences between the three evaluation indices by standardization, which compares the index values of each prediction model with the worst index performance in the comparison. Unlike each individual index, PI considers all available indices and measures the performance of a model more comprehensively. In this research, R2, RMSE, and MAPE were considered equally important in measuring the prediction performance; hence, they were assigned the same weight in the PI equation. A smaller PI indicates a better prediction performance, and vice versa.
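The four indices translate directly into code. The sketch below follows the definitions above, with the equal one-third weights used for PI; function and variable names are illustrative.

```python
import numpy as np

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def performance_index(metrics):
    """metrics maps model name -> (R2, RMSE, MAPE). Each model is standardized
    against the worst index value in the comparison; a smaller PI is better."""
    r2_min = min(m[0] for m in metrics.values())
    rmse_max = max(m[1] for m in metrics.values())
    mape_max = max(m[2] for m in metrics.values())
    return {name: (r2_min / r + e / rmse_max + p / mape_max) / 3.0
            for name, (r, e, p) in metrics.items()}

# Example with the RH testing values reported later in Table 4:
print(performance_index({"RT": (0.63, 6.00, 8.90),
                         "SVR": (0.69, 5.47, 8.04),
                         "RF": (0.73, 5.12, 7.75)}))
# -> PIs of approximately 1.00, 0.91, and 0.86, matching Table 4
```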
4. Experiment Design

4.1. Tested Building
Two institutional buildings, Rinker Hall (RH) and Fine Arts Building C (FAC), located on the University of Florida main campus, were used as tested buildings to validate and compare the prediction performance of the proposed models. RH is a three-story building with a floor area of 47,270 ft2, and FAC is a four-story building with a floor area of 72,520 ft2. Figures 2 and 3 depict the floor plans of RH and FAC, respectively. Notably, as shown in Figure 3, there is a connection bridge located on the third floor of FAC, which connects FAC with another institutional building, i.e., Fine Arts Building A (FAA). Both buildings are comprised of classrooms, offices, laboratories, and student facilities. RH is mainly used for course instruction and office services, while FAC is also used for experimental activities in addition to these two functions. RH was built in 2003 and is the first building in the State of Florida to receive the Leadership in Energy and Environmental Design (LEED) Gold certification. FAC is a relatively old building which was built in 1964.
Figure 2. Ground floor plan of Rinker Hall
Figure 3. Third floor plan of Fine Arts Building C and Fine Arts Building A
4.2. Data
The data consists of 11 input variables and one output variable. The input variables include meteorological data (i.e., outdoor temperature, dew point, relative humidity, barometric pressure, precipitation, wind speed, and solar radiation), occupancy data (i.e., number of occupants), and time related data, i.e., time of day, workday type (a weekday, weekend, or holiday) and day type (the specific day of the week: Sunday, Monday, Tuesday, etc.), as listed in Table 1.

The meteorological data was collected from an on-campus weather station operated by the University's Physics Department. The straight-line distance between the weather station and the tested buildings is about 500 meters. The weather station consists of a complete set of weather sensors, including temperature, humidity, wind speed, wind direction, rainfall, solar intensity, and lightning detection. The authors downloaded the meteorological data on an hourly average basis from the department web server.

The hourly number of occupants was estimated based on the daily operation and class schedules of the tested buildings, which were obtained from the UF Space Inventory & Allocation System (SPIN). SPIN provides information on how the tested buildings are being utilized, including the start time and duration of each class, the number of students registered for each class, the number of faculty and staff members working in the tested buildings, and the timetable for daily operation. Based on the information provided by SPIN, a data transformation process was performed to calculate the hourly number of occupants in each tested building and convert this information into the format required by the experiment.
Moreover, three features which describe the time of each observation, i.e., time of day, workday type, and day type, were introduced because of their strong correlations with occupants' hourly, daily, and weekly working patterns. The workday type, which indicates the working status of the university, was derived from the UF calendar of the 2014-2015 academic year.

The output variable is the hourly building-level electricity usage, which covers the electricity usage of the lighting system, computer labs, mechanical system, and miscellaneous appliances. The initial electricity data were extracted from the building energy management system (BEMS) of each tested building with a sampling rate of 15 minutes and were subsequently scaled to hourly data for the purpose of data resolution uniformity. The hourly electricity consumption is the sum of four consecutive 15-minute sampling points. Figures 4 and 5 depict the hourly electricity consumption of RH and FAC for the UF 2014-2015 calendar year, respectively.

All variables used in this paper were monitored for one calendar year, i.e., the UF 2014-2015 calendar year. A total of 8,760 data points were collected for each building. The raw data for each building is an 8,760 × 12 matrix which contains 105,120 measurements. To ensure the integrity of the data, a data screening process was performed to remove data points with missing values. After the screening, the numbers of usable data points for RH and FAC were 8,647 and 8,321, respectively. The screened data was therefore considered representative because it covered 99% and 95% of the data in the monitoring period of RH and FAC, respectively.
Table 1 Summary of input variables

Variable | Abbreviation | Type | Measurement
Outdoor Temperature | Temp | Continuous | Deg. F
Dew Point | Dew | Continuous | Deg. F
Relative Humidity | Hum | Continuous | %
Barometric Pressure | Press | Continuous | In Hg
Precipitation | Prec | Continuous | Inch
Wind Speed | Wind | Continuous | Mph
Solar Radiation | Solar | Continuous | W/m2
Number of Occupants | Occ | Continuous | person
Time of Day | Time | Categorical | 1, 2, 3, …, 23, 24
Workday Type | Wday | Categorical | weekday and weekend
Day Type | Dtype | Categorical | Sunday, Monday, …, Saturday
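The aggregation and screening steps described above can be sketched with pandas as follows; the meter readings here are synthetic placeholders standing in for the 15-minute BEMS data, and the feature frame is an empty placeholder for the Table 1 inputs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 15-minute BEMS meter readings (values are random
# placeholders; the real data came from the buildings' BEMS).
idx_15min = pd.date_range("2014-05-01 00:00", "2015-04-30 23:45", freq="15min")
meter = pd.Series(np.random.default_rng(0).uniform(0, 30, len(idx_15min)),
                  index=idx_15min, name="kwh")

# Hourly electricity use = sum of the four consecutive 15-minute readings;
# min_count=4 turns hours with missing readings into NaN instead of partial sums.
hourly_kwh = meter.resample("h").sum(min_count=4)

# Join with hourly weather/occupancy features (here an empty placeholder frame)
# and screen out any data point that has missing values, as described above.
hourly_features = pd.DataFrame(index=hourly_kwh.index)   # placeholder for Table 1 inputs
data = hourly_features.join(hourly_kwh).dropna()
print(data.shape)   # 8,760 hourly points for a full year before screening
```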
Figure 4. Hourly electricity consumption of Rinker Hall. Note: This data is from 01/05/2014 00:00 until 30/04/2015 23:00.

Figure 5. Hourly electricity consumption of Fine Arts Building C. Note: This data is from 01/05/2014 00:00 until 30/04/2015 23:00.
4.3. Research outline

Figure 6 shows the schematic outline of this research. The research contains two modules, Module 1 and Module 2, which were developed independently by using yearly and monthly data, respectively. The detailed information for these two modules is listed in Table 2. Module 1 aims to compare the prediction performance of different learning algorithms, i.e., RT, SVR, and RF. Three prediction models representing RT, SVR, and RF were created for each of the tested buildings and were subsequently used to compare the prediction performance of the employed learning algorithms. Module 2 aims to simulate the energy pattern of RH and FAC for each semester and to investigate the impact of time-wise data partition on the prediction performance of RF. The yearly data of the tested buildings covered three semesters which were independent of each other and different in energy patterns [8]. Accordingly, the prediction models in Module 2 were trained and tested with data selected from a typical month of each semester. After the monthly-based RF models were developed, a variable importance analysis was conducted to compare the most influential factors between the different models. Finally, the monthly-based RF models were compared with the yearly-based RF models to explore the impact of time-wise data partition on the prediction performance of RF.
Figure 6. Schematic outline of the research.
4.4. Experimental procedure

In Module 1, the following procedures were used to train and test the three prediction models for each tested building. First, the training and testing datasets were generated by randomly splitting the yearly data into two parts: 80% for training and 20% for testing. Then, the generated training dataset was used to train the learning algorithms for each tested building. Finally, the prediction performances of the models for each tested building were evaluated using the testing dataset. In Module 2, three monthly datasets, representing the summer, fall, and spring semesters, were first distributed to their corresponding prediction models. The monthly dataset within each prediction model was then randomly split into two parts with a ratio of 8:2 for training and testing purposes. Each prediction model was trained with the generated training dataset and was finally tested with the generated testing dataset. The procedures used in Modules 1 and 2 were repeated 100 times, and the prediction performance of each model was calculated by averaging the prediction performances over the 100 repetitions.
Table 2 Summary of Module 1 and 2.

Module | Model | Building | Data | Algorithm | Period | Total data points
Module 1 | Model 1 | RH | Yearly data | RT | 05.01.2014 - 04.30.2015 | 8647
Module 1 | Model 2 | RH | Yearly data | SVR | 05.01.2014 - 04.30.2015 | 8647
Module 1 | Model 3 | RH | Yearly data | RF | 05.01.2014 - 04.30.2015 | 8647
Module 1 | Model 4 | FAC | Yearly data | RT | 05.01.2014 - 04.30.2015 | 8321
Module 1 | Model 5 | FAC | Yearly data | SVR | 05.01.2014 - 04.30.2015 | 8321
Module 1 | Model 6 | FAC | Yearly data | RF | 05.01.2014 - 04.30.2015 | 8321
Module 2 | Model 7 | RH | Monthly data | RF | 07.01.2014 - 07.31.2014 | 744
Module 2 | Model 8 | RH | Monthly data | RF | 10.01.2014 - 10.31.2014 | 744
Module 2 | Model 9 | RH | Monthly data | RF | 02.01.2015 - 02.28.2015 | 672
Module 2 | Model 10 | FAC | Monthly data | RF | 07.01.2014 - 07.31.2014 | 744
Module 2 | Model 11 | FAC | Monthly data | RF | 10.01.2014 - 10.31.2014 | 744
Module 2 | Model 12 | FAC | Monthly data | RF | 02.01.2015 - 02.28.2015 | 672
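The Module 1 procedure described in Section 4.4 can be sketched as follows; X and y are assumed to hold the screened hourly inputs and electricity use, and RandomForestRegressor stands in for the RF implementation actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def average_test_mape(X, y, make_model, repetitions=100, seed=0):
    """Repeat the random 80/20 split, retrain the model each time, and
    average the testing MAPE; the other indices follow the same pattern."""
    mapes = []
    for r in range(repetitions):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed + r)
        y_hat = make_model().fit(X_tr, y_tr).predict(X_te)
        mapes.append(100 * np.mean(np.abs((y_te - y_hat) / y_te)))
    return float(np.mean(mapes))

# Example: Model 3 (RF, RH, yearly data) with the parameters from Section 5
# (X_rh and y_rh are hypothetical arrays holding the screened RH dataset):
# mape_rh = average_test_mape(X_rh, y_rh,
#     lambda: RandomForestRegressor(n_estimators=300, max_features=6,
#                                   min_samples_leaf=5, n_jobs=-1))
```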
5. Model Development

5.1. RF model development
The RF model development requires three user-defined parameters to be determined. These are: the minimum size of the terminal nodes of each tree (nodesize), the number of trees in the forest (ntree), and the number of randomly selected variables used to grow each tree (mtry) [26].

The nodesize parameter controls the size of each tree within the forest. Essentially, the selection of this parameter determines when to stop the tree splitting process. A large nodesize will construct shallow trees because it limits the tree splitting process. The computation time would be reduced, but some patterns in the data would not be learned because the number of nodes is limited. Consequently, the prediction accuracy of each tree could not be guaranteed. On the contrary, a small nodesize produces a deep tree structure which learns comprehensively from the data. However, a deep tree costs more computation time and may encounter overfitting. In this research, the authors used 5 as the value of nodesize because this is a commonly suggested value for solving regression problems and has been proven effective in previous studies [22] [26].

The ntree parameter determines the number of trees generated in an RF model. A large ntree will improve the prediction performance of RF because more trees are considered and the complementarity between trees is enhanced. However, this results in a significant increase in computation time. Alternatively, a small ntree saves computation time, while the prediction performance of RF is sacrificed if the number of trees is insufficient. In this research, after numerous computation tests, it was found that the prediction accuracy of RF did not increase significantly when ntree reached 300 or more. The optimal ntree was therefore set to 300 because this value was neither so large as to cost significant computation time, nor so small as to limit the potential of RF.

The mtry parameter impacts the prediction accuracy of RF by introducing randomness into the tree construction process. Specifically, randomness is introduced by randomly selecting n variables from the input and choosing the best split from the selected n variables. Because mtry determines the size of the randomly selected variable set, it impacts both the prediction performance of the individual trees in the forest and the correlation between them, which jointly determine the prediction accuracy of RF [45]. An effective RF contains trees which have prominent prediction performance while the correlations between them are weak. In general, the prediction performance of individual trees will be better if more variables are selected; however, the correlation between the trees increases at the same time. There is thus a trade-off between reducing correlation and maintaining prediction performance when selecting the mtry parameter.

Previous studies used empirical equations, such as one-third of the total number of variables [26] or the first integer less than Log2M+1, where M is the total number of variables [22], to determine mtry for RF model development. These equations make RF model development easier by omitting the search for the optimal mtry value. However, without searching for the optimal mtry, the developed RF may not be the most accurate available. Because the total number of variables is relatively small in this study (11 input variables in total), it is feasible to test all possible numbers and select the optimal mtry by comparing their corresponding generalization performance.

Specifically, the authors adopted a commonly used model validation method, called k-fold cross-validation, to select the optimal mtry for each model. All possible mtry settings, from 1 to 11, were validated by applying k-fold cross-validation to the training data. By randomly partitioning the training data into k subsets, k-fold cross-validation can validate the RF k times, with each of the k subsets used exactly once as the validation data. The results of the k-fold cross-validation can be used to evaluate the generalization performance of the model. Notably, 10 was used as the k value for the k-fold cross-validation. For each possible mtry, its corresponding generalization performance was calculated by repetitively validating the RF 100 times and averaging the prediction performances. Figure 7 shows the comparison results of RF models trained with different mtry settings for the RH yearly-based model (Model 3). The curve shows the trend of the PI of RF trained with different mtry settings. As seen in the curve, mtry = 6 has the lowest PI, which indicates the best prediction performance of RF. Therefore, 6 was selected as the optimal mtry for RF in Model 3. The optimal mtry values for the other models were selected in the same manner. Table 3 shows the selection results for all models.
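In scikit-learn terms, the mtry search described above can be written as a cross-validated grid search, where max_features plays the role of mtry; the sketch below scores candidates with RMSE rather than the composite PI, which is a simplification of the procedure actually used.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# 10-fold cross-validation over all candidate mtry values (1 to 11),
# repeated to average out the randomness of the folds and of RF itself.
mtry_search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=300, min_samples_leaf=5, n_jobs=-1),
    param_grid={"max_features": list(range(1, 12))},
    cv=RepeatedKFold(n_splits=10, n_repeats=10, random_state=0),
    scoring="neg_root_mean_squared_error",
)
# mtry_search.fit(X_train, y_train)   # X_train, y_train: hypothetical training data
# best_mtry = mtry_search.best_params_["max_features"]
```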
Figure 7. PI results for RF trained with different mtry selections in Model 3.
Table 3 Results of optimal mtry selection. Best results are shown in bold font.

Model | PI, mtry=1 | mtry=2 | mtry=3 | mtry=4 | mtry=5 | mtry=6 | mtry=7 | mtry=8 | mtry=9 | mtry=10 | mtry=11 | Selected mtry
Model 3 | 1.00 | 0.88 | 0.86 | 0.86 | 0.85 | 0.83 | 0.85 | 0.85 | 0.86 | 0.85 | 0.84 | 6
Model 6 | 1.00 | 0.93 | 0.91 | 0.89 | 0.91 | 0.91 | 0.91 | 0.92 | 0.92 | 0.92 | 0.93 | 4
Model 7 | 1.00 | 0.80 | 0.76 | 0.75 | 0.74 | 0.74 | 0.72 | 0.71 | 0.72 | 0.73 | 0.73 | 8
Model 8 | 1.00 | 0.84 | 0.83 | 0.78 | 0.77 | 0.77 | 0.76 | 0.77 | 0.77 | 0.76 | 0.77 | 10
Model 9 | 1.00 | 0.79 | 0.78 | 0.77 | 0.75 | 0.76 | 0.74 | 0.76 | 0.75 | 0.75 | 0.75 | 7
Model 10 | 1.00 | 0.86 | 0.85 | 0.84 | 0.83 | 0.83 | 0.83 | 0.82 | 0.82 | 0.81 | 0.82 | 10
Model 11 | 1.00 | 0.91 | 0.88 | 0.86 | 0.86 | 0.86 | 0.86 | 0.85 | 0.84 | 0.85 | 0.86 | 9
Model 12 | 1.00 | 0.80 | 0.78 | 0.76 | 0.74 | 0.69 | 0.72 | 0.71 | 0.71 | 0.72 | 0.70 | 6

5.2. RT and SVR model development
RT is a binary recursive partitioning algorithm, in which each parent node is split into two child nodes and the splitting process is then repeated at each child node until the tree is complete. In general, at each stage of partitioning, RT searches all possible binary splits and locates the best one by comparing their reductions in the mean square error (MSE) between the predicted and actual values. The split which leads to the least MSE is selected as the optimal split. The splitting process runs recursively until the pre-defined stopping rule is met. In this research, the nodesize parameter, which determines the size of each terminal node, was used as the stopping rule for RT. For the purpose of comparability, the nodesize of RT was set to 5, which is the same as that used in RF.

Two parameters, i.e., the regularization constant (C) and the Gaussian radial basis function parameter (g), were optimized for SVR. Parameter C determines the level of penalty when the SVR makes an inaccurate prediction. A large C indicates a severe penalty on mistakes and vice versa. A severe penalty makes the SVR hard to converge and results in instability, while a mild penalty requires a great amount of training time for the model to converge. Parameter g is the reciprocal of the variance of the Gaussian function. A large g leads to stable predictions with high bias; on the contrary, a small g mitigates the bias but makes the predictions unstable. Similar to the mtry selection for RF, the authors used 10-fold cross-validation to select the optimal C and g values. The training data were used for the parameter selection. The search for the optimal C and g ranged over [2^-8, 2^8] with 2^1 as the exponential step size. Consequently, a total of 256 different combinations of {C, g} were generated and tested. For each combination of {C, g}, ten SVR models were trained and tested individually through 10-fold cross-validation, and their average prediction performance was considered the prediction performance of that particular combination of {C, g}. The combination of {C, g} which resulted in the minimum MAPE among the 256 combinations was defined as the optimal C and g. In this research, the optimal C and g were selected as 16 and 0.0625, respectively.
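A sketch of this parameter search using scikit-learn's SVR and GridSearchCV is shown below; the exact grid boundaries are assumed so as to give the 256 combinations mentioned above, and MAPE is used as the selection criterion.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Exponential grid over the regularization constant C and RBF parameter g (gamma),
# evaluated by 10-fold cross-validation with MAPE as the selection criterion.
powers = 2.0 ** np.arange(-8, 8)      # assumed to span the 2^-8 ... 2^7 grid (256 pairs)
svr_search = GridSearchCV(
    estimator=SVR(kernel="rbf"),
    param_grid={"C": powers, "gamma": powers},
    cv=10,
    scoring="neg_mean_absolute_percentage_error",
)
# svr_search.fit(X_train, y_train)   # X_train, y_train: hypothetical training data
# The paper reports C = 16 and g = 0.0625 as the selected optimum.
```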
6. Results

6.1. Comparison between RF, RT, and SVR
RF, RT, and SVR were trained and tested individually with the yearly data from RH and FAC to compare their prediction performances. The experiment for each model was repeated 100 times, and the average performances of the 100 trials were used for comparison. Tables 4 and 5 compare the prediction performances of RF, RT, and SVR for RH and FAC, respectively. The results illustrate which model performed most efficiently in hourly electricity use prediction for the tested buildings.

As shown in Tables 4 and 5, RF has the best performance in all evaluation indices, indicating that it was the best prediction algorithm in the comparison. The MAPE of RF in the testing process is 7.75% for RH and 11.93% for FAC, which is acceptable for hourly prediction. RT has the worst performance, with MAPEs of 8.90% and 14.50% for RH and FAC, which is not surprising considering that the long span of data used in the training process could cover abnormal observations that impact the stability and accuracy of RT. Moreover, the RMSE results indicate that RF is more stable than RT in both training and testing.

According to the comparison of PI in the testing process, RF is 14.0% and 25.0% better than RT for energy prediction of RH and FAC, respectively, which indicates that the ensemble strategy applied in RF is capable of improving the prediction performance of RT. Moreover, RF is 5.5% and 5.0% better than the other outstanding prediction algorithm, i.e., SVR, for energy prediction of RH and FAC, respectively, indicating that RF is a promising energy prediction tool. It should be noted that RT encountered an instability issue in the experiment, particularly in the energy usage prediction of FAC, where the testing performance drops greatly compared with the training performance. For both RH and FAC, RT performs better than SVR in the training process but worse than SVR in the testing process. However, the prediction performances of RF in the training and testing processes are relatively stable, indicating that the algorithm, as an ensemble version of RT, did not suffer from the instability issue that RT encountered in the experiment. In summary, the comparison results indicate that RF has the best fitting capability for the prediction problem.

Furthermore, comparing the RH prediction results with the FAC prediction results indicates that all models performed better in the energy prediction of RH than in that of FAC. The difference is attributed to the uncertainty about human behaviors in FAC. As more than 40% of the area of FAC is used for laboratory activities, occupancy information about FAC is difficult to accurately interpret based on the means of occupancy measurement adopted in this research, which is based on the daily class and work schedules. On the contrary, less than 20% of the space in RH is used for laboratory activities. Hence, its estimated hourly number of occupants is closer to the actual condition than that of FAC. The authors believe that the prediction performance of the models developed in this research would be better if more accurate and sufficient data were introduced.
Table 4 Summary of prediction performances for all algorithms applied to RH. Best results are shown in bold font.

 | Model 1 (RT) | Model 2 (SVR) | Model 3 (RF)
Testing R2 | 0.63 | 0.69 | 0.73
Testing RMSE | 6 | 5.47 | 5.12
Testing MAPE | 8.90% | 8.04% | 7.75%
Testing PI | 1.00 | 0.91 | 0.86
Training R2 | 0.87 | 0.79 | 0.89
Training RMSE | 3.55 | 4.54 | 3.22
Training MAPE | 5.21% | 6.10% | 4.79%
Training PI | 0.85 | 1.00 | 0.79
Table 5 Summary of prediction performances for all algorithms applied to FAC. Best results are shown in bold font.

 | Model 4 (RT) | Model 5 (SVR) | Model 6 (RF)
Testing R2 | 0.29 | 0.45 | 0.5
Testing RMSE | 18.59 | 16.3 | 15.58
Testing MAPE | 14.50% | 12.21% | 11.93%
Testing PI | 1.00 | 0.79 | 0.75
Training R2 | 0.76 | 0.63 | 0.78
Training RMSE | 10.76 | 13.49 | 10.38
Training MAPE | 8.11% | 9.00% | 8.01%
Training PI | 0.84 | 1.00 | 0.82

6.2. Variable importance analysis
The permutation-based variable importance of RF enables analysis of the role of each input variable within the model. In this research, a variable importance analysis was performed for Models 7 to 12, aiming at investigating the most influential variables for RF models developed with data from different semesters. Figures 8 to 13 depict the variable importance results for RF trained with the monthly data of RH and FAC (Models 7 to 12). The results provide an indication of the relative dependence of the RF models on each input variable.

For RH, it can be seen from Figures 9 and 10 that Model 8 and Model 9 have a similar variable importance pattern, for which the top 5 most influential variables are the same (i.e., occupancy, dew point, barometric pressure, time of day, and temperature), indicating that the electricity use of RH in the fall and spring semesters was highly correlated with the same factors. However, the results in Figure 8 show that Model 7 has a distinct variable importance pattern compared with Model 8 and Model 9, with day type as the most influential variable. The difference in the variable importance pattern between Model 7 and the others is as expected because the operation conditions of RH in the summer, fall, and spring semesters were different. As fewer students were registered for the summer semester, RH is considered a less occupied building during this period. RH is mainly used for office services during the summer semester, and during this time, the work schedule is constant (i.e., Monday through Friday, 9:00 a.m. to 5:00 p.m.) and the variation in the occupancy condition is unapparent. Therefore, the energy use of RH in the summer semester depends on whether it is a weekday and on the time of day, rather than on the variation of the hourly number of occupants. In contrast, RH is considered a fully occupied building in both the fall and spring semesters, when this building provides more services for course instruction. The occupancy condition varies strongly with the daily course arrangement. Hence, the electricity use during these two semesters is highly correlated with the variation of the hourly number of occupants. This argument is supported by the results shown in Figure 9 and Figure 10, where the two RF models share a similar variable importance pattern and are impacted the most by the occupancy variable.

For FAC, the condition is more complicated. It can be seen from Figures 11 to 13 that each model for FAC has its own variable importance pattern, which indicates that the electricity use of FAC is correlated with different factors in these three models. As shown in Figure 11, the prediction performance is highly impacted by day type and time of day. In Figure 12, occupancy and dew point are the most influential variables. In Figure 13, occupancy, workday type, dew point, and pressure have the same level of impact on the prediction performance of RF. Such differences are not surprising because FAC contains more uncertainties (i.e., unexpected occupancy activities) than RH, which are not fully controlled for in this research. As mentioned in the previous subsection, a large proportion of the space in the FAC building is used for laboratory activities, which are unstable and difficult to accurately interpret based on the method used in this research. In other words, the variation of the occupancy condition would not be captured if overtime happened in the laboratory. Moreover, since FAC is connected with another building which is mainly used as a library, as shown in Figure 3, its occupancy could be impacted by activities in the connected building. Therefore, the occupancy data used in this research may not contain all the occupancy information for FAC, and the importance of the occupancy variable could be incorrectly measured and consequently may not represent the actual occupancy condition of the building. The variable importance results of Model 11 and Model 12, as shown in Figure 12 and Figure 13, support this argument, as the superiority of occupancy over the other variables in Model 12 was not as significant as that in Model 11.

Moreover, it should be noted that the variable importance pattern shown in Figure 11 is similar to that shown in Figure 8, even though their corresponding tested buildings are different. It can be speculated that the similarity is caused by the same operation condition of RH and FAC during the summer semester. In other words, during the summer semester, the electricity use of RH and FAC is correlated with the same factors, namely day type and time of day.

In summary, the results demonstrated the capability of the permutation-based variable importance in providing an understanding of the relative importance of each variable to the overall prediction performance of the RF model. Comparisons of the variable importance between different RF models indicated that the RF models were impacted by different variables during the summer, fall, and spring semesters for the tested buildings.
Figure 8. Variable importance for Model 7.

Figure 9. Variable importance for Model 8.

Figure 10. Variable importance for Model 9.

Figure 11. Variable importance of Model 10.

Figure 12. Variable importance of Model 11.

Figure 13. Variable importance of Model 12.
6.3. Comparison between RF trained with yearly and monthly data

The variable importance analysis demonstrated that the RF models were impacted by different variables in the summer, fall, and spring semesters. This subsection further explores the impact of time-wise data partition on the prediction performance of RF through comparisons of RF models trained with yearly and monthly data. Table 6 and Table 7 compare the prediction performances of all RF models developed for RH and FAC, respectively. As mentioned before, Models 3 and 6 used yearly data for model training and testing, while Models 7 to 12 used monthly data which was extracted from the data of different semesters.

It can be observed from Table 6 and Table 7 that the RF models trained with yearly data performed worse than those trained with monthly data for both RH and FAC. Overall, the PI for the different models indicates that the RF models trained with monthly data were 43-50% and 25-46% better than those trained with yearly data for RH and FAC, respectively. The comparisons showed that the prediction performances of RF were improved by partitioning the yearly data into different semesters. Accordingly, the authors suggest that a data partition process would be necessary for the improvement of building energy prediction if distinct operation conditions exist during different time periods.
Table 6 Summary of prediction performances for all RF models developed for RH. Worst results are shown in bold font.

Model | Prediction Type | R2 | RMSE | MAPE | PI
Model 7 | Monthly | 0.93 | 1.99 | 2.84% | 0.51
Model 8 | Monthly | 0.89 | 1.7 | 2.81% | 0.50
Model 9 | Monthly | 0.92 | 2.32 | 3.50% | 0.57
Model 3 | Yearly | 0.73 | 5.12 | 7.75% | 1.00
Table 7 Summary of prediction performances for all RF models developed for FAC. Worst results are shown in bold font.

Model | Prediction Type | R2 | RMSE | MAPE | PI
Model 10 | Monthly | 0.7 | 6.78 | 5.70% | 0.54
Model 11 | Monthly | 0.75 | 10.08 | 6.89% | 0.63
Model 12 | Monthly | 0.71 | 12.62 | 8.92% | 0.75
Model 6 | Yearly | 0.5 | 15.58 | 11.93% | 1.00
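In practice, the time-wise partition amounts to filtering the hourly dataset to the month representing each semester before training, as in the sketch below; the month choices follow Table 2, and the DataFrame layout and column names are hypothetical.

```python
from sklearn.ensemble import RandomForestRegressor

# Months representing the summer, fall, and spring semesters (see Table 2).
SEMESTER_MONTHS = {"summer": (2014, 7), "fall": (2014, 10), "spring": (2015, 2)}

def train_monthly_models(data, features, target="kwh"):
    """Train one RF per semester month instead of a single yearly model.
    'data' is assumed to be an hourly pandas DataFrame with a DatetimeIndex
    and the Table 1 inputs as columns (names are hypothetical)."""
    models = {}
    for semester, (year, month) in SEMESTER_MONTHS.items():
        subset = data[(data.index.year == year) & (data.index.month == month)]
        rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, n_jobs=-1)
        models[semester] = rf.fit(subset[features], subset[target])
    return models
```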
7. Discussion

7.1. Impact of number of variables on prediction performance of RF
As previously mentioned, the selection of the number of variables (mtry) is essential for RF because it impacts the prediction performance of each tree as well as the correlation between the trees within the model. In this research, the authors selected the optimal mtry by testing and comparing all possible mtry values, which is time-consuming and may not be practical for high-dimensional data. As some empirical rules have been offered by previous research studies [22] [26], a comparison between RF models developed with the empirical mtry and the selected mtry was performed to investigate the differences in prediction performance. Table 8 shows the comparison results for all RF models developed in this research. It can be observed from the table that most mtry values selected in this research were greater than the empirical mtry, with the two being equal only for Model 6. Correspondingly, most RF models developed with the selected mtry were better than those developed with the empirical mtry. However, the levels of improvement were very limited (i.e., 0.00% to 7.58%) for all models, which indicates that the prediction performance of RF is not very sensitive to the selection of mtry. Therefore, searching for the optimal mtry is of limited value in solving the research problem. Based on this comparison, the authors argue that using the empirical mtry is preferable for RF-based building energy prediction because it is accurate and time-saving.
Table 8 Comparison of RF developed with empirical mtry and optimal mtry.

Model | Empirical mtry | Selected mtry | PI (empirical mtry) | PI (selected mtry) | Prediction Improvement
Model 3 | 4 | 6 | 0.85 | 0.84 | 0.88%
Model 6 | 4 | 4 | 0.90 | 0.90 | 0.00%
Model 7 | 4 | 8 | 0.75 | 0.72 | 3.30%
Model 8 | 4 | 10 | 0.78 | 0.76 | 2.60%
Model 9 | 4 | 7 | 0.77 | 0.74 | 3.90%
Model 10 | 4 | 10 | 0.84 | 0.81 | 2.97%
Model 11 | 4 | 9 | 0.86 | 0.85 | 1.63%
Model 12 | 4 | 6 | 0.75 | 0.69 | 7.58%
7.2.
Prediction performance improvement of RF over RT
In this research, RF outperformed RT in the energy prediction of the tested buildings, demonstrating its ability to improve the prediction performance of RT. The authors completed 100 repetitions, and RF was consistently better than RT in every repetition. This finding demonstrates that RF is not simply a method that averages the results of individual models, but one that raises the training quality to a level no single model can achieve. The authors believe that combining multiple models compensates for the errors made by each individual model and thus makes the prediction more stable and accurate.
However, the level of improvement depends highly on the instability of RT, which is associated with the quality of the data. Since the merit of RF is to compensate for the instability of RT, improvement occurs when RT has difficulty making reliable predictions. In other words, if RT is insensitive to changes in the input data and provides stable and accurate predictions, there is little or no room for improvement from the strategies employed in RF. In this research, it can be observed from Tables 4 and 5 that the percentage improvement of RF over RT in testing is 14% for RH and 25% for FAC. The authors believe that this difference is caused by the level of instability of the RT models developed for the tested buildings. As previously mentioned, the RT model for FAC encountered an instability issue that made it less predictable in testing than during training. The RT model for RH, in contrast, performed in a relatively stable manner because the variation in prediction performance between training and testing was not as severe as that of the RT model for FAC. Therefore, RF for FAC had more room than RF for RH to improve the prediction performance of RT. The results indicate that RF is more competent than RT in dealing with complex and sparse data.
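The stabilizing effect described above can be examined with a simple repeated-split experiment, sketched below. The synthetic data, the number of trees, and the scikit-learn estimators are stand-ins for illustration and are not the authors' actual experimental code.

```python
# Minimal sketch (assumed data, not the authors' experiment): repeated random
# splits to compare the stability of a single regression tree (RT) against RF.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(1)
X = rng.rand(1500, 12)
y = 30 * np.sin(6 * X[:, 0]) + 10 * X[:, 1] + rng.normal(0, 3, 1500)  # placeholder target

rt_rmse, rf_rmse = [], []
for rep in range(100):                                   # 100 repetitions, as in the paper
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=rep)
    rt = DecisionTreeRegressor(random_state=rep).fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=300, random_state=rep, n_jobs=-1).fit(X_tr, y_tr)
    rt_rmse.append(mean_squared_error(y_te, rt.predict(X_te)) ** 0.5)
    rf_rmse.append(mean_squared_error(y_te, rf.predict(X_te)) ** 0.5)

wins = sum(f < t for f, t in zip(rf_rmse, rt_rmse))
print(f"RF better than RT in {wins}/100 repetitions")
print(f"RT RMSE: mean={np.mean(rt_rmse):.2f}, std={np.std(rt_rmse):.2f}")
print(f"RF RMSE: mean={np.mean(rf_rmse):.2f}, std={np.std(rf_rmse):.2f}")
```

The spread (standard deviation) of the RT errors across repetitions gives a rough indication of how much room RF has to improve over RT on a given dataset.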
7.3. Computation time
Computation time is one of the major limitations of RF because its model structure requires the generation and combination of multiple base models. Table 9 summarizes the average computation time for RT, SVR, and RF as developed in Module 1. It can be observed that the computation time of RF was considerably greater than those of RT and SVR for the tested buildings. However, the authors argue that the computation time of RF is not a major concern because the base models can be generated in parallel and its base learner (i.e., RT) is a fast algorithm that requires little training time. Moreover, the authors consider prediction accuracy the first priority of the research problem. Therefore, it is worthwhile to trade computation time for better prediction accuracy as long as the computation time remains within an acceptable range.
Table 9 Summary of computation time (in seconds) for all algorithms used in Module 1.

Tested Building   RT     SVR    RF
RH                0.01   0.05   1.48
FAC               0.01   0.05   1.36
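As an illustration of this trade-off, the sketch below times the three algorithms on a synthetic dataset of roughly yearly size using scikit-learn. The choice of library is an assumption, and absolute times will differ from Table 9 depending on hardware and implementation, although the relative pattern (RF slowest, RT fastest) is typically preserved. The n_jobs option reflects the parallel construction of base models mentioned above.

```python
# Minimal sketch (assumed data and tooling): timing RT, SVR, and RF on a
# dataset of roughly the same size as one year of hourly records (~8,700 rows).
import time
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(2)
X = rng.rand(8700, 12)                                   # ~one year of hourly samples
y = 25 * X[:, 0] + 15 * X[:, 1] ** 2 + rng.normal(0, 2, 8700)

models = {
    "RT": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(kernel="rbf"),
    "RF": RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0),  # trees fit in parallel
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```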
The relationship between computation time and data size was also investigated. The authors compared the computation time of RF models trained with yearly and monthly data to determine whether adding more data points to the dataset results in a significant increase in computation time. Table 10 lists the computation time for all RF models developed in this paper. Notably, Models 3 and 6 used the yearly datasets, while Models 7 to 12 used monthly datasets. The results show that RF models trained on the yearly datasets required substantially more computation time than those trained on monthly datasets, indicating that an increase in data size leads to an increase in computation time. The results also suggest that the computation time of RF grows roughly linearly with data size. This analysis shows that the computation time of RF is sensitive to data size and, consequently, that training on an excessively large dataset can add significant computational overhead to the RF model development process.

Table 10 Summary of computation time (in seconds) for all RF models.

Model      Data Size   Computation Time
Model 3    8,647       1.48
Model 6    8,321       1.36
Model 7    744         0.13
Model 8    744         0.13
Model 9    672         0.12
Model 10   744         0.13
Model 11   744         0.13
Model 12   672         0.12
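The roughly linear growth of training time with data size can be checked with a short scaling test such as the sketch below, which times the same RF configuration on nested subsets ranging from a monthly-sized to a yearly-sized sample. The data and library choice are again assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed data): RF training time as a function of data size,
# e.g. monthly-sized (~744 rows) up to yearly-sized (~8,700 rows) subsets.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(3)
X = rng.rand(8700, 12)
y = 25 * X[:, 0] + 15 * X[:, 1] + rng.normal(0, 2, 8700)

for n in (744, 2000, 4000, 8700):                        # monthly scale up to yearly scale
    rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
    start = time.perf_counter()
    rf.fit(X[:n], y[:n])
    print(f"n={n}: {time.perf_counter() - start:.2f} s")
```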
7.4. Inclusion of Occupant Behavior Variables
Occupant behavior is a key source of uncertainty in building energy modeling [46, 47]. In this study, occupancy variables were few in number and relatively static, whereas occupant behavior is actually dynamic, stochastic, and influenced by contextual factors. However, adding occupant variables to the models developed here would have further increased computation time, thereby curtailing the heuristic value and major aim (i.e., prediction performance) of this study.
Nonetheless, future research should incorporate germane occupant behavior variables affecting energy consumption. Since occupant behaviors in classroom buildings are relatively limited in scope (primarily reflecting office work and instructional activities) and the equipment is fairly standardized, this building type provides fertile testing ground for monitoring energy-consuming and discomfort-driven occupant behaviors in order to identify those that may be the best predictors to incorporate in the energy modeling used here. Technologies increasingly used for behavior measurement, such as floor sensor pads at entry points, video cameras with computer vision, wearable sensors, and security systems, are proving applicable to documenting and measuring occupancy actions [47].

8. Conclusion
This paper contributes to the existing literature on building energy prediction in several ways. First, it contributes to the research on empirical modeling based building energy prediction. RF belongs to the family of ensemble learning, an advanced empirical modeling approach for improving the prediction performance of conventional learning algorithms. By adopting RF for building energy prediction and comparing its prediction performance with that of conventional approaches, i.e., RT and SVR, this paper demonstrates the superiority of RF and the feasibility of homogeneous ensemble learning in building energy prediction. These findings provide a new option for researchers to predict building energy usage and enrich the algorithm library of empirical modeling based building energy prediction.
Second, this paper contributes to the research on building energy management for educational buildings. By using variable importance to investigate the most influential factors of RF for different semesters, this research provides a new approach for locating key building energy impact factors and a better understanding of building energy behaviors. The variable importance analysis shows that the most influential factors of the tested buildings vary between semesters, indicating a change in the energy behaviors of educational buildings across semesters. In other words, the energy usage of educational buildings follows a semester-based pattern rather than an annual pattern. Additionally, a comparison between RF models trained with yearly and monthly data showed that the energy prediction of educational buildings can be improved by taking into consideration the energy behavior changes among different semesters. These findings assist researchers and building owners in understanding and managing the energy usage of educational buildings.
Finally, this paper contributes to the research on the implementation of RF for building energy prediction. It sets an example of how to use variable importance, a distinct characteristic of RF, to locate key impact variables and to understand building energy behaviors. This characteristic provides property owners with supporting details for establishing their energy conservation policies and rules. Moreover, by comparing the prediction performances of RF trained with optimal and empirical parameter settings, i.e., the mtry setting, this paper showed that using the empirical mtry during RF model development is practical and effective for building energy prediction, and that an exhaustive search for the optimal mtry is unnecessary.
Future work will focus on incorporating more accurate occupant variables to enhance prediction performance and on extending the proposed RF model to the energy prediction of other building types (such as commercial buildings and hospitals).
References
[1] International Energy Agency, "2015 Key World Energy Statistics," International Energy Agency, 2015.
[2] X. Cao, X. Dai and J. Liu, "Building energy-consumption status worldwide and the state-of-the-art technologies for zero-energy buildings during the past decade," Energy and Buildings, vol. 128, pp. 198-213, 2016.
[3] U.S. Energy Information Administration, "How much energy is consumed in the world by each sector?," 7 January 2015. [Online]. Available: http://www.eia.gov/tools/faqs/faq.cfm?id=447&t=1.
[4] T. H. Pedersen, R. E. Hedegaard and S. Petersen, "Space heating demand response potential of retrofitted residential apartment blocks," Energy and Buildings, vol. 141, pp. 158-166, 2017.
[5] D. Li, G. Hu and C. J. Spanos, "A data-driven strategy for detection and diagnosis of building chiller faults using linear discriminant analysis," Energy and Buildings, vol. 128, pp. 519-529, 2016.
[6] H.-x. Zhao and F. Magoulès, "A review on the prediction of building energy consumption," Renewable and Sustainable Energy Reviews, vol. 16, pp. 3586-3592, 2012.
[7] Y. Heo and V. M. Zavala, "Gaussian process modeling for measurement and verification of building energy savings," Energy and Buildings, vol. 53, pp. 7-18, 2012.
[8] Z. Wang, Y. Wang and R. Srinivasan, "A Bagging Tree based Ensemble Model for Building Energy Prediction," 2016.
[9] T. Reeves, S. Olbina and R. Issa, "Validation of building energy modeling tools: Ecotect™, Green Building Studio™ and IES™," in Proceedings of the 2012 Winter Simulation Conference (WSC), Berlin, 2012.
[10] E. M. Ryan and T. F. Sanquist, "Validation of building energy modeling tools under idealized and realistic conditions," Energy and Buildings, vol. 47, pp. 375-382, 2012.
[11] M. Aydinalp, V. I. Ugursal and A. S. Fung, "Modeling of the appliance, lighting, and space-cooling energy consumptions in the residential sector using neural networks," Applied Energy, vol. 71, no. 2, pp. 87-110, 2002.
[12] M. Yalcintas and U. A. Ozturk, "An energy benchmarking model based on artificial neural network method utilizing US Commercial Buildings Energy Consumption Survey (CBECS) database," International Journal of Energy Research, vol. 31, no. 4, pp. 412-421, 2007.
[13] Z. Yu, F. Haghighat, B. C. Fung and H. Yoshino, "A decision tree method for building energy demand modeling," Energy and Buildings, vol. 42, no. 10, pp. 1637-1646, 2010.
[14] B. B. Ekici and T. U. Aksoy, "Prediction of building energy consumption by using artificial neural networks," Advances in Engineering Software, vol. 40, pp. 356-362, 2009.
[15] M. C. Burkhart, Y. Heo and V. M. Zavala, "Measurement and verification of building systems under uncertain data: A Gaussian process modeling approach," Energy and Buildings, vol. 75, pp. 189-198, 2014.
[16] B. Dong, C. Cao and S. E. Lee, "Applying support vector machines to predict building energy consumption in tropical region," Energy and Buildings, pp. 545-553, 2005.
[17] A. H. Neto and F. A. S. Fiorelli, "Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption," Energy and Buildings, vol. 40, no. 12, pp. 2169-2176, 2008.
[18] L. Breiman, "Heuristics of instability in model selection," The Annals of Statistics, vol. 24, pp. 2350-2383, 1996.
[19] L. K. Hansen and P. Salamon, "Neural Network Ensembles," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 993-1001, 1990.
[20] S. Reid, "A Review of Heterogeneous Ensemble Methods," 2007.
[21] T. Dietterich, "Machine learning research: Four current directions," AI Magazine, vol. 18, no. 4, pp. 97-136, 1997.
[22] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[23] R. Díaz-Uriarte and S. A. de Andrés, "Gene selection and classification of microarray data using random forest," BMC Bioinformatics, vol. 7, no. 3, pp. 1-13, 2006.
[24] D. R. Cutler, T. C. Edwards Jr., K. H. Beard, A. Cutler, K. T. Hess, J. Gibson and J. J. Lawler, "Random forests for classification in ecology," Ecology, vol. 88, no. 11, pp. 2783-2792, 2007.
[25] V. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo and J. Rigol-Sanchez, "An assessment of the effectiveness of a random forest classifier for land-cover classification," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 67, pp. 93-104, 2012.
[26] H. Sun, D. Gui, B. Yan, Y. Liu, W. Liao, Y. Zhu, C. Lu and N. Zhao, "Assessing the potential of random forest method for estimating solar radiation using air pollution index," Energy Conversion and Management, vol. 119, pp. 121-129, 2016.
[27] M. Khalilia, S. Chakraborty and M. Popescu, "Predicting disease risks from highly imbalanced data using random forest," BMC Medical Informatics and Decision Making, vol. 11, no. 51, pp. 1-13, 2011.
[28] A. Tsanas and A. Xifara, "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools," Energy and Buildings, vol. 49, pp. 560-567, 2012.
[29] A. Lahouar and J. B. H. Slama, "Day-ahead load forecast using random forest and expert input selection," Energy Conversion and Management, vol. 103, pp. 1040-1051, 2015.
[30] S. Jurado, À. Nebot, F. Mugica and N. Avellana, "Hybrid methodologies for electricity load forecasting: Entropy-based feature selection with machine learning and soft computing techniques," Energy, vol. 86, pp. 276-291, 2015.
[31] L. M. Candanedo, V. Feldheim and D. Deramaix, "Data driven prediction models of energy use of appliances in a low-energy house," Energy and Buildings, vol. 140, pp. 81-97, 2017.
[32] M. W. Ahmad, M. Mourshed and Y. Rezgui, "Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption," Energy and Buildings, vol. 147, pp. 77-89, 2017.
[33] J. Ma and J. C. Cheng, "Identifying the influential features on the regional energy use intensity of residential buildings based on Random Forests," Applied Energy, vol. 183, pp. 193-201, 2016.
[34] F. Yu, W. Ho, K. Chan and R. Sit, "Critique of operating variables importance on chiller energy performance using random forest," Energy and Buildings, vol. 139, pp. 653-664, 2017.
[35] L. M. Candanedo and V. Feldheim, "Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models," Energy and Buildings, vol. 112, pp. 28-39, 2016.
[36] J. Granderson, S. Touzani, C. Custodio, M. D. Sohn and D. Jump, "Accuracy of automated measurement and verification (M&V) techniques," Applied Energy, vol. 173, pp. 296-308, 2016.
[37] D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz and G. Bitsuamlak, "An ensemble learning framework for anomaly detection in building energy consumption," Energy and Buildings, vol. 144, pp. 191-206, 2017.
[38] C. Strobl, J. Malley and G. Tutz, "An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests," Psychological Methods, vol. 14, no. 4, pp. 323-348, 2009.
[39] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization," Machine Learning, vol. 40, pp. 139-157, 2000.
[40] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[41] Y. Grandvalet, "Bagging Equalizes Influence," Machine Learning, vol. 55, no. 3, pp. 251-270, 2004.
[42] P. Buhlmann and B. Yu, "Analyzing bagging," Annals of Statistics, vol. 30, pp. 927-961, 2002.
[43] B. Gregorutti, B. Michel and P. Saint-Pierre, "Grouped variable importance with random forests and application to multiple functional data analysis," Computational Statistics & Data Analysis, vol. 90, pp. 15-35, 2015.
[44] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," The Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[45] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Computation, vol. 9, pp. 1545-1588, 1997.
[46] P. Hoes, J. Hensen, M. Loomans, B. de Vries and D. Bourgeois, "User behavior in whole building simulation," Energy and Buildings, vol. 41, pp. 295-302, 2009.
[47] D. Yan, W. O'Brien, T. Hong, X. Feng, H. B. Gunay, F. Tahmasebi and A. Mahdavi, "Occupant behavior modeling for building performance simulation: Current state and future challenges," Energy and Buildings, vol. 107, pp. 264-278, 2015.