Random Forest based hourly building energy prediction


Zeyu Wang, Yueren Wang, Ruochen Zeng, Ravi S. Srinivasan, Sherry Ahrentzen

PII: S0378-7788(18)31129-0
DOI: 10.1016/j.enbuild.2018.04.008
Reference: ENB 8481

To appear in: Energy & Buildings

Received date: 26 January 2017
Revised date: 23 October 2017
Accepted date: 10 April 2018

Please cite this article as: Zeyu Wang, Yueren Wang, Ruochen Zeng, Ravi S. Srinivasan, Sherry Ahrentzen, Random Forest based Hourly Building Energy Prediction, Energy & Buildings (2018), doi: 10.1016/j.enbuild.2018.04.008

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Random Forest based Hourly Building Energy Prediction

Zeyu Wang a, Yueren Wang b,*, Ruochen Zeng c, Ravi S. Srinivasan c, Sherry Ahrentzen c

a School of Management, Guangzhou University, Guangzhou, Guangdong, China
b Microsoft Corporation, One Microsoft Way, Redmond, Washington, USA
c M.E. Rinker, Sr. School of Construction Management, University of Florida, Gainesville, Florida, USA

* Corresponding author. Tel: +1 352 214 2809; Email address: [email protected]

Abstract

Accurate building energy prediction plays an important role in improving the energy efficiency of buildings. This paper proposes a homogeneous ensemble approach, i.e., the use of Random Forest (RF), for hourly building energy prediction. The approach was adopted to predict the hourly electricity usage of two educational buildings in North Central Florida. RF models trained with different parameter settings were compared to investigate the impact of parameter settings on the prediction performance of the model. The results indicated that RF is not very sensitive to the number of variables (mtry) and that using an empirical mtry is preferable because it saves time while remaining accurate. RF was compared with the regression tree (RT) and Support Vector Regression (SVR) to validate the superiority of RF in building energy prediction. The prediction performances of RF, measured by the performance index (PI), were 14-25% and 5-5.5% better than RT and SVR, respectively, indicating that RF was the best prediction model in the comparison. Moreover, an analysis based on the variable importance of RF was performed to identify the most influential features during different semesters. The results showed that the most influential features vary by semester, indicating the existence of different operational conditions for the tested buildings. A further comparison between RF trained with yearly and monthly data indicated that energy usage prediction for educational buildings could be improved by taking their energy behavior changes during different semesters into consideration.

Keywords: Random forest, regression tree, ensemble model, machine learning, building energy use prediction

1. Introduction

Global energy consumption has increased by 100% in the past 40 years [1]. The continuous increase in energy demand and the emergence of an energy crisis have forced people to change their lifestyles and use energy more efficiently. Most people spend over 90% of their daily lives indoors; consequently, buildings have become the largest energy consumer worldwide [2]. In the United States, for example, the building sector contributes over 41% of primary energy usage, compared with industrial (30%) and transportation-related (29%) energy usage [3]. Improving building energy efficiency has therefore become a major concern around the world for achieving energy conservation and carbon emission reduction. Building energy prediction is becoming increasingly significant in improving efficiency because of the important role it plays in implementing building energy efficiency measures, e.g., demand response control [4], system fault detection and diagnosis [5], building energy benchmarking [6], and building system measurement and verification [7].

A popular approach to building energy prediction is engineering-based building energy modeling, which uses physical principles to calculate the thermal dynamics and energy behaviors of buildings [8]. This approach helps users design energy-efficient buildings [9] and has given rise to hundreds of building energy modeling tools around the world owing to its simplicity and efficiency [10]. However, engineering-based building energy modeling is mostly confined to the building design phase because it requires a review of all details of the building: detailed building information, e.g., building geometry, material and component specifications, Heating, Ventilation, and Air Conditioning (HVAC) system settings, and lighting system specifications, is hard to acquire from existing buildings [8]. Moreover, recent research has shown that this approach often results in a large difference between the predicted and the observed building energy usage [9] [10].

A more applicable approach to energy prediction is empirical modeling, which has been widely used in predicting building energy usage during the past twenty years owing to its superiority in model implementation and prediction accuracy [11, 12]. Empirical modeling uses machine learning algorithms, such as the Decision Tree (DT) [13], Artificial Neural Network (ANN) [14], the Gaussian process [15], and Support Vector Machine (SVM) [16], to generalize mapping relationships between input and output data. This approach is more practical than the engineering-based approach in energy prediction of existing buildings because its required data, such as building energy data, environmental data, and occupancy data, is relatively easy to obtain from existing buildings.

Previous research has proven that empirical modeling can provide prominent prediction results and outperform engineering-based building energy modeling when the learning algorithm is wisely chosen and well utilized [17] [14]. However, some algorithms used in empirical modeling, e.g., DT and ANN, are suspected to be unreliable due to instability issues [18]. The instability of these algorithms may introduce significant variations in the output due to small changes made in the input data [18], thereby making the prediction results dramatically different from the observed values and potentially causing the prediction model to fail. This limitation could impede the real-world application of these algorithms because many energy efficiency measures, such as building system fault detection and building energy benchmarking, rely on the reliability of the energy prediction results. Moreover, with the development of building energy management technology, buildings nowadays produce abundant energy-related data. Short-term building energy prediction has become increasingly important in recent years because of the invention and implementation of high sampling frequency sensors. The accuracy requirements for building energy prediction are becoming more rigid as the building industry pays more attention to the details of building operation and the data sampling interval becomes smaller.


To overcome the instability of the learning algorithm as well as to improve prediction accuracy, a more advanced data mining technique called ensemble learning was introduced in the early 1990s [19]. Ensemble learning creates a set of prediction models, a.k.a. base models, and provides a prediction of high accuracy by taking advantage of the irrelevance between the base models. The technique has an open structure in which different base models and integration strategies can be employed, thereby deriving different versions of ensemble models. Based on how the base models are generated, ensemble models can be classified into two types, namely, heterogeneous and homogeneous ensemble models. Heterogeneous ensemble models generate their base models by training different learning algorithms, or the same algorithm with different parameter settings, on the same training dataset [20]. On the contrary, homogeneous ensemble models generate their base models by using the same learning algorithm on different training datasets [21].

In this paper, a homogeneous ensemble model called Random Forest (RF) is introduced for building energy prediction. An hourly building energy prediction experiment was performed to validate the feasibility of RF for short-term building energy prediction. A comparative analysis was performed to compare the prediction performance of RF with the Classification and Regression Tree (CART) and Support Vector Regression (SVR). In addition, this paper provides an insight into the analysis of the importance of each variable used in generating RF, which could assist researchers in locating key impact variables and gaining a thorough understanding of the energy behavior of the predicted building. Finally, the RF model was trained using yearly and monthly data to investigate the impact of time-wise data partitioning on model training quality.

The remainder of this paper is organized as follows: Section 2 reviews the related work; Section 3 discusses the research methodology, followed by an introduction of the experiment design in Section 4; in Section 5, the model development for different algorithms is presented; Section 6 presents the results of the experiment; Section 7 discusses the major findings; and Section 8 draws the conclusion.


2. Related works

RF can be considered an ensemble of Classification and Regression Trees (CART) because multiple CART models are generated and used as base models. Since it was first proposed by Breiman in 2001 [22], RF has received great attention and has become a sought-after procedure in many fields. For example, Díaz-Uriarte and Andrés [23] proposed a new method of gene selection in classification problems by using RF; Cutler et al. [24] introduced RF in the ecology field to classify different plant species; Rodriguez-Galiano et al. [25] used RF in the land remote sensing field to classify different land covers; and Sun et al. [26] used RF to predict the solar radiation of different areas. The results of these research programs demonstrated the appealing performance of RF in solving both classification and regression problems. Moreover, previous research programs have compared RF with other popular techniques such as linear discriminant analysis [24], decision trees [25], and SVM [27]. The comparison results showed that RF outperformed those competitors in solving the research problems, indicating its potential to be a promising tool for solving the building energy prediction problem.

However, this novel technique is not yet prevalent in building energy prediction, with only a few related studies performed in the past five years. Tsanas and Xifara [28] first used RF to predict the heating and cooling loads of residential buildings. The authors proved that RF can provide more accurate heating and cooling load predictions than a classical linear regression approach. The research also found that RF can be used to find an accurate functional relationship between input and output variables. Lahouar and Slama [29] proposed a short-term load prediction method by combining RF with expert input selection. The proposed method was used to predict the next 24-hour electrical demand. The results showed that RF coupled with expert selection can capture complex load behaviors.

Some researchers have also compared RF with other machine learning models for building energy prediction. Jurado et al. [30] used RF to predict the hourly electricity consumption of three educational buildings. The authors compared the prediction performance of RF with neural networks, Fuzzy Inductive Reasoning (FIR), and the Auto Regressive Integrated Moving Average (ARIMA). The results showed that FIR and RF perform better than the other two methods for building energy prediction and that voting strategies can be used to combine different methods and provide more accurate predictions. Candanedo et al. [31] used RF to predict the short-term electricity consumption of a house in Belgium. The authors compared the prediction performance of RF with three other prediction models, including Multiple Linear Regression (MLR), SVR, and Gradient Boosting Machines (GBM). The results showed that RF and GBM performed better than MLR and SVR. More recently, Ahmad et al. [32] used RF to predict the hourly HVAC electricity consumption of a hotel in Madrid, Spain. The authors compared the prediction performance of RF with the widely-used ANN. The results showed that both models are effective for predicting hourly HVAC electricity consumption, while the RF model can also be used as a variable selection tool.

Moreover, some researchers have investigated how to use RF to identify important variables for building energy prediction. Ma and Cheng [33] used RF to identify influential features on the regional energy use intensity (EUI) of residential buildings. The authors used 171 influential features describing the buildings, economy, education, environment, households, surroundings, and transportation to model the average site EUI of residential buildings in block groups. The top 20 influential features were identified based on the out-of-bag estimation in RF. Similarly, Yu et al. [34] used RF to predict the coefficient of performance (COP) of a chiller system and measure the variable importance. Thirteen inputs from measured and derived operating variables were used to develop RF models for four operating modes. The authors identified the top five important variables for the prediction of COP based on the variable importance analysis of RF.


In addition, RF was also used in the domain of occupancy detection, automated measurement and verification, and anomaly detection of building energy consumption. Candanedo and Feldheim [35] used RF to detect the occupancy in an office room using light, temperature, humidity, and CO2 data. Their study showed that RF is able to accurately detect the presence of occupants by observing the indoor environmental data. Granderson et al. [36] used RF to predict the energy savings of retrofit buildings. Their research showed that RF can provide accurate predictions on building energy savings.

Araya et al. [37] used RF to identify abnormal building energy behavior. The authors used RF as an anomaly detection classifier within their proposed ensemble anomaly detection (EAD) framework, which combines multiple classifiers by majority voting. Their results showed that the proposed EAD outperforms the individual anomaly detection classifiers.

Although previous research works have successfully applied RF to building energy related predictions, the application details of RF in building energy prediction, such as the selection of key parameters, the analysis of variable importance, and the advantages and limitations of the technique, have not been thoroughly studied or well addressed. This paper fills the gap by exploring the implementation of RF in building energy prediction and the utilization of its unique characteristic, i.e., variable importance, in locating key factors which impact building energy usage.

3. Methodology

3.1. Random Forest

RF is an ensemble prediction model consisting of a collection of different regression trees (CART) which are trained through bagging and random variable selection. The development rationale of the trees in RF is the same as that of CART, i.e., recursive partitioning. In recursive partitioning, the exact position of the cut-point and the selection of the splitting variable strongly depend on the distribution of observations in the learning sample [38]. Thus, CART is considered an unstable learner because a small change in the learning data could change the selection of the first cut-point or the first splitting variable and, subsequently, the entire tree structure. RF overcomes the instability issue of CART by predicting with a set of trees rather than a single tree. The logic behind using a set of trees for prediction is to mitigate the instability of each tree by combining the predictions of multiple trees. Combining trees of high diversity compensates for the instability of each tree significantly, because CART is an unbiased predictor which is unstable but provides the right prediction on average [38] [39]. In contrast, combining similar trees would not compensate for the instability of each tree, because similar trees could theoretically be unstable at the same time.

Accordingly, creating a diverse set of trees is essential for achieving model complementarity and improving the overall prediction performance of the ensemble model. RF enhances the diversity of its trees through training data set randomization and input variable set randomization. Figure 1 illustrates the model development procedure of RF. RF first generates multiple new training data sets by randomly sampling from the initial training data set with replacement. The size of each new training data set is the same as that of the initial one, but some observations may be repeated as a result of sampling with replacement. The new training data sets are expected to contain, on average, 63.2% of the initial training data, with the rest being duplicates [40]. After the new training data sets are generated, RF injects variable set randomization into the tree splitting process to further enhance the diversity of its trees. For each new training dataset, variable set randomization creates a random variable set by randomly sampling from the set of all variables. At each splitting point, rather than considering all variables in the training data set, each tree in RF grows by searching for the best split among the variables within the random variable set. Since both the training data sets and the variable sets are randomly generated, the trees in RF are expected to grow independently and differ from each other. Once all trees are developed, RF combines them by averaging their individual predictions, which equalizes the influence of the training data and makes RF stable [41]. Such a joint prediction approach reduces the risk of large errors and makes RF more accurate than any of its constituent trees.

Figure 1. Model development procedure of RF
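The procedure in Figure 1 corresponds closely to off-the-shelf random forest implementations. The following is a minimal sketch, not the authors' code, assuming a scikit-learn-style API and a synthetic placeholder dataset; n_estimators, min_samples_leaf, and max_features play the roles of the ntree, nodesize, and mtry parameters discussed in Section 5.1.

```python
# Minimal sketch of the RF procedure in Figure 1 (synthetic placeholder data; scikit-learn assumed).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 11))                            # 11 input variables, as in Table 1
y = 50.0 + 30.0 * X[:, 0] + 10.0 * rng.random(1000)   # placeholder hourly electricity use (kWh)

rf = RandomForestRegressor(
    n_estimators=300,      # ntree: number of trees grown on bootstrapped training sets
    min_samples_leaf=5,    # nodesize: minimum observations in a terminal node
    max_features=6,        # mtry: variables considered at each split
    bootstrap=True,        # sample the training data with replacement (bagging)
    random_state=0,
)
rf.fit(X, y)
print(rf.predict(X[:5]))   # the forest prediction is the average of the individual trees
```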

3.2. Variable importance

The variable importance of RF uses data permutation to measure the impact of each variable on the overall prediction performance of the model. The variable importance is calculated by computing the decrease in prediction accuracy resulting from randomly permuting the values of a variable. The higher the drop in prediction accuracy, the more important a variable is, and vice versa. The rationale is that the original association between a variable and the output is broken by randomly permuting its values; accordingly, the prediction accuracy decreases if the original variable is replaced by the permuted one [38]. If the variable is associated with the output, permuting its values reduces the prediction accuracy of RF significantly. On the contrary, if the variable is not associated with the output, permuting its values does not change the prediction accuracy because the splitting decisions of all trees are not impacted. Thus, comparing the importance of different variables reveals their association with the output. Notably, RF measures the importance of each variable jointly, that is, considering the impact of each variable on its own as well as in multivariate interactions with other variables [38] [28] [43]. Redundant variables, which are highly correlated with other variables, are penalized and not assigned large importance even though they may be highly correlated with the output [22].

Variable importance can be used to select the most important variables in an RF, which helps users target the most influential factors and understand the relationships between input and output variables. This aspect is particularly useful for high-dimensional data, where the identification of the most relevant variables is of great importance [44]. In this paper, the variable importance is used to investigate the most influential factors of RF models trained on different datasets in order to understand the changes in energy behavior of institutional buildings during different semesters. A detailed explanation of the application of variable importance is given in Section 6.2.
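As a rough, self-contained illustration of the permutation idea (not the authors' implementation), the snippet below permutes one input column at a time and records the resulting increase in error on held-out data; the model, data, and column count are hypothetical.

```python
# Sketch of permutation-based variable importance: error increase after shuffling each column.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((1000, 11))
y = 50.0 + 30.0 * X[:, 0] + 10.0 * rng.random(1000)   # placeholder data; column 0 matters most
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0).fit(X, y)

def permutation_importance(model, X_val, y_val, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base_error = mean_squared_error(y_val, model.predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the variable/output association
            increases.append(mean_squared_error(y_val, model.predict(X_perm)) - base_error)
        importances[j] = np.mean(increases)                # large increase = important variable
    return importances

print(permutation_importance(rf, X, y))
```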

3.3. Evaluation Indices

In this paper, the prediction performance is measured by considering three frequently used prediction accuracy evaluation indices: the coefficient of determination (R2), the Root Mean Square Error (RMSE), and the Mean Absolute Percentage Error (MAPE). To evaluate the prediction performance jointly, a composite evaluation index called the performance index (PI), which combines R2, RMSE, and MAPE into one single measure, was created and used for comparing the prediction performances of different models.

R2 measures the goodness of fit of a model. A high R2 value indicates that the predicted values fit the observed values well. R2 is defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

where $n$ is the sample size; $y_i$ is the observed value; $\hat{y}_i$ is the predicted value; and $\bar{y}$ is the mean of the observed values.

RMSE is the sample standard deviation of the residuals between the predicted and observed values. This measure is used to identify large errors and evaluate the fluctuation of the model response regarding variance. RMSE punishes large errors severely because it geometrically amplifies the error. The mathematical formula for RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

MAPE is a statistical indicator that describes the accuracy of the prediction by comparing the residuals with the observed values. It usually expresses accuracy as a percentage and is effective for evaluating the performance of the prediction model by introducing the concept of absolute values. MAPE is defined by the formula:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

PI measures the prediction performance of a model comprehensively by considering multiple evaluation indices. Rather than evaluating the difference between the actual and predicted values, PI is used to compare the prediction performances of different prediction models. The PI used in this paper is defined by the formula:

$$PI_a = \frac{1}{3}\left(\frac{R^2_{\min}}{R^2_a} + \frac{\mathrm{RMSE}_a}{\mathrm{RMSE}_{\max}} + \frac{\mathrm{MAPE}_a}{\mathrm{MAPE}_{\max}}\right)$$

where $PI_a$, $R^2_a$, $\mathrm{RMSE}_a$, and $\mathrm{MAPE}_a$ are the PI, R2, RMSE, and MAPE of prediction model $a$; $R^2_{\min}$ is the minimum R2 in the comparison; and $\mathrm{RMSE}_{\max}$ and $\mathrm{MAPE}_{\max}$ are the maximum RMSE and MAPE in the comparison, respectively. It can be seen from the equation that PI eliminates the magnitude differences between the three evaluation indices by standardization, which compares the index values of each prediction with the worst index performance in the comparison. Unlike each individual index, PI considers all available indices and measures the performance of the model more comprehensively. In this research, R2, RMSE, and MAPE were considered equally important in measuring the prediction performance; hence, they were assigned the same weight in the PI equation. A smaller PI indicates a better prediction performance, and vice versa.
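For concreteness, the four indices can be computed as in the sketch below, written directly from the definitions above (hypothetical observation and prediction arrays; not the authors' code):

```python
# Sketch: computing R2, RMSE, MAPE, and the composite PI for a set of competing models.
import numpy as np

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100

def performance_index(scores):
    """scores: {model: (R2, RMSE, MAPE)}; each index is standardized by the worst value."""
    r2_min = min(s[0] for s in scores.values())
    rmse_max = max(s[1] for s in scores.values())
    mape_max = max(s[2] for s in scores.values())
    return {m: (r2_min / s[0] + s[1] / rmse_max + s[2] / mape_max) / 3
            for m, s in scores.items()}

# Hypothetical hourly observations and predictions from two models
y_true = np.array([40.0, 55.0, 62.0, 48.0])
preds = {"RT": np.array([44.0, 50.0, 70.0, 45.0]),
         "RF": np.array([41.0, 54.0, 64.0, 47.0])}
scores = {m: (r2(y_true, p), rmse(y_true, p), mape(y_true, p)) for m, p in preds.items()}
print(performance_index(scores))   # a smaller PI indicates better overall performance
```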

4. Experiment Design

4.1. Tested Buildings

Two institutional buildings, Rinker Hall (RH) and Fine Arts Building C (FAC), located on the University of Florida main campus, were used as tested buildings to validate and compare the prediction performance of the proposed models. RH is a three-story building with a floor area of 47,270 ft2, and FAC is a four-story building with a floor area of 72,520 ft2. Figures 2 and 3 depict the floor plans of RH and FAC, respectively. Notably, as shown in Figure 3, a connection bridge located on the third floor of FAC connects FAC with another institutional building, i.e., Fine Arts Building A (FAA). Both buildings comprise classrooms, offices, laboratories, and student facilities. RH is mainly used for course instruction and office services, while FAC is also used for experimental activities besides these two functions. RH was built in 2003 and is the first building in the State of Florida to receive the Leadership in Energy and Environmental Design (LEED) Gold certification. FAC is a relatively old building which was built in 1964.


Figure 2. Ground floor plan of Rinker Hall

Figure 3. Third floor plan of Fine Arts Building C and Fine Arts Building A

4.2. Data

The data consist of 11 input variables and one output variable. The input variables include meteorological data (i.e., outdoor temperature, dew point, relative humidity, barometric pressure, precipitation, wind speed, and solar radiation), occupancy data (i.e., number of occupants), and time-related data (i.e., time of day, workday type – for instance, weekday, weekend, or holiday – and day type – a specific day of the week, Sunday, Monday, Tuesday, etc.), as listed in Table 1.

The meteorological data was collected from an on-campus weather station operated by the University's Physics Department. The straight-line distance between the weather station and the tested buildings is about 500 meters. The weather station consists of a complete set of weather sensors, including temperature, humidity, wind speed, wind direction, rainfall, solar intensity, and lightning detection. The authors downloaded the meteorological data on an hourly average basis from the department web server.

The hourly number of occupants was estimated based on the daily operation and class schedules of the tested buildings, which were obtained from the UF Space Inventory & Allocation System (SPIN). SPIN provides information on how the tested buildings are being utilized, including the start time and duration of each class, the number of students registered for each class, the number of faculty and staff members working in the tested buildings, and the timetable for daily operation. Based on the information provided by SPIN, a data transformation process was performed to calculate the hourly number of occupants in each tested building and convert this information into the format required by the experiment.
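A highly simplified sketch of this schedule-to-occupancy transformation is shown below; the schedule fields (start_hour, duration_hours, enrollment) and the constant staff count are hypothetical stand-ins for the SPIN records, not the authors' exact procedure.

```python
# Rough sketch: estimating hourly occupancy for one day from class schedules plus a staff count.
import pandas as pd

classes = pd.DataFrame({              # hypothetical SPIN-style records
    "start_hour": [9, 10, 14],
    "duration_hours": [2, 1, 3],
    "enrollment": [45, 30, 60],
})
staff_count = 25                       # assumed constant office presence, 9:00-17:00

occupancy = pd.Series(0, index=range(24), name="occupants")
for _, c in classes.iterrows():
    class_hours = list(range(c["start_hour"], c["start_hour"] + c["duration_hours"]))
    occupancy.loc[class_hours] += c["enrollment"]       # registered students during class hours
occupancy.loc[list(range(9, 17))] += staff_count        # faculty and staff during working hours

print(occupancy[occupancy > 0])
```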

Moreover, three features describing the time of observation, i.e., time of day, workday type, and day type, were introduced because of their strong correlations with occupants' hourly, daily, and weekly working patterns. The workday type, which indicates the working status of the university, was derived from the UF calendar of the 2014-2015 academic year.

The output variable is the hourly building-level electricity usage, which covers the electricity usage of the lighting system, computer labs, mechanical systems, and miscellaneous appliances. The initial electricity data were extracted from the building energy management systems (BEMS) of the tested buildings at a sampling rate of 15 minutes and were later aggregated into hourly data for the purpose of data resolution uniformity; each hourly electricity consumption value is the sum of four consecutive 15-minute sampling points. Figures 4 and 5 depict the hourly electricity consumption of RH and FAC for the UF 2014-2015 calendar year, respectively.

All variables used in this paper were monitored for one calendar year, i.e., the UF 2014-2015 calendar year. A total of 8,760 data points were collected for each building. The raw data for each building is an 8,760 × 12 matrix which contains 105,120 measurements. To ensure the integrity of the data, a data screening process was performed to remove data points with missing values. After the screening, the number of usable data points for RH and FAC was 8,647 and 8,321, respectively. The screened data were therefore considered representative because they covered 99% and 95% of the data in the monitoring period of RH and FAC, respectively.
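A minimal sketch of the 15-minute-to-hourly aggregation and missing-value screening described above, assuming the BEMS export is available as a timestamped series (the variable names are hypothetical):

```python
# Sketch: summing four consecutive 15-minute BEMS readings into hourly consumption,
# then screening out hours with missing readings.
import numpy as np
import pandas as pd

idx = pd.date_range("2014-05-01 00:00", "2015-04-30 23:45", freq="15min")
kwh_15min = pd.Series(np.random.rand(len(idx)) * 25, index=idx, name="kWh")
kwh_15min.iloc[100:104] = np.nan                      # simulate a gap in the meter data

hourly = kwh_15min.resample("h").sum(min_count=4)     # hourly kWh; NaN if any reading is missing
hourly_clean = hourly.dropna()                        # data screening step
print(len(hourly), len(hourly_clean))                 # 8,760 hours before screening
```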

Table 1 Summary of input variables

Variable               Abbreviation   Type          Measurement
Outdoor Temperature    Temp           Continuous    Deg. F
Dew Point              Dew            Continuous    Deg. F
Relative Humidity      Hum            Continuous    %
Barometric Pressure    Press          Continuous    In Hg
Precipitation          Prec           Continuous    Inch
Wind Speed             Wind           Continuous    Mph
Solar Radiation        Solar          Continuous    W/m2
Number of Occupants    Occ            Continuous    person
Time of Day            Time           Categorical   1, 2, 3, …, 23, 24
Workday Type           Wday           Categorical   weekday and weekend
Day Type               Dtype          Categorical   Sunday, Monday, …, Saturday



Figure 4. Hourly electricity consumption of Rinker Hall. Note: This data is from 01/05/2014 00:00 until 30/04/2015 23:00.


Figure 5. Hourly electricity consumption of Fine Arts Building C. Note: This data is from 01/05/2014 00:00 until 30/04/2015 23:00.


4.3. Research outline

Figure 6 shows the schematic outline of this research. The research contains two modules, Module 1 and Module 2, which were developed independently using yearly and monthly data, respectively. Detailed information on these two modules is listed in Table 2. Module 1 aims to compare the prediction performance of different learning algorithms, i.e., RT, SVR, and RF. Three prediction models representing RT, SVR, and RF, respectively, were created for each of the tested buildings and were later used to compare the prediction performance of the employed learning algorithms. Module 2 aims to simulate the energy pattern of RH and FAC for each semester and to investigate the impact of time-wise data partitioning on the prediction performance of RF. The yearly data of the tested buildings covered three semesters which were independent of each other and differed in energy patterns [8]. Accordingly, the prediction models in Module 2 were trained and tested with data selected from a typical month of each semester. After the monthly-based RF models were developed, a variable importance analysis was conducted to compare the most influential factors between the different models. Finally, the monthly-based RF models were compared with the yearly-based RF models to explore the impact of time-wise data partitioning on the prediction performance of RF.

Figure 6. Schematic outline of the research.

4.4. Experimental procedure

In Module 1, the following procedure was used to train and test the three prediction models for each tested building. First, the training and testing datasets were generated by randomly splitting the yearly data into two parts: 80% for training and 20% for testing. Then the generated training dataset was used to train the learning algorithms for each tested building. Finally, the prediction performances of the models for each tested building were evaluated using the testing dataset. In Module 2, three monthly datasets, representing the summer, fall, and spring semesters, were first distributed to their corresponding prediction models. The monthly dataset within each prediction model was then randomly split into two parts with a ratio of 8:2 for training and testing purposes. Each prediction model was trained with the generated training dataset and was finally tested with the generated testing dataset. The procedures used in Modules 1 and 2 were repeated 100 times, and the prediction performance of each model was calculated by averaging the prediction performances of the 100 repetitions.
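The repeated random-split protocol of Module 1 can be sketched as follows (synthetic placeholder data; scikit-learn assumed); the reported score is the mean over the 100 repetitions:

```python
# Sketch: 100 repetitions of an 80/20 random split, averaging the test MAPE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((8647, 11))                          # placeholder for the screened RH yearly data
y = 40.0 + 30.0 * X[:, 0] + 5.0 * rng.random(8647)  # placeholder hourly electricity use

mapes = []
for rep in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=rep)
    model = RandomForestRegressor(n_estimators=300, min_samples_leaf=5,
                                  max_features=6, random_state=rep).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    mapes.append(np.mean(np.abs((y_te - y_hat) / y_te)) * 100)

print(f"average test MAPE over 100 repetitions: {np.mean(mapes):.2f}%")
```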

Table 2 Summary of Modules 1 and 2.

Module     Model      Building   Data           Algorithm   Period                    Total data points
Module 1   Model 1    RH         Yearly data    RT          05.01.2014 - 04.30.2015   8647
           Model 2    RH         Yearly data    SVR         05.01.2014 - 04.30.2015   8647
           Model 3    RH         Yearly data    RF          05.01.2014 - 04.30.2015   8647
           Model 4    FAC        Yearly data    RT          05.01.2014 - 04.30.2015   8321
           Model 5    FAC        Yearly data    SVR         05.01.2014 - 04.30.2015   8321
           Model 6    FAC        Yearly data    RF          05.01.2014 - 04.30.2015   8321
Module 2   Model 7    RH         Monthly data   RF          07.01.2014 - 07.31.2014   744
           Model 8    RH         Monthly data   RF          10.01.2014 - 10.31.2014   744
           Model 9    RH         Monthly data   RF          02.01.2015 - 02.28.2015   672
           Model 10   FAC        Monthly data   RF          07.01.2014 - 07.31.2014   744
           Model 11   FAC        Monthly data   RF          10.01.2014 - 10.31.2014   744
           Model 12   FAC        Monthly data   RF          02.01.2015 - 02.28.2015   672

5. Model Development

5.1. RF model development

RF model development requires three user-defined parameters to be determined: the minimum size of the terminal nodes of each tree (nodesize), the number of trees in the forest (ntree), and the number of randomly selected variables considered when growing each tree (mtry) [26].

The nodesize parameter controls the size of each tree within the forest. Essentially, the selection of this parameter determines when to stop the tree splitting process. A large nodesize constructs shallow trees because it limits the tree splitting process. The computation time is reduced, but some patterns in the data are not learned because the number of nodes is limited; consequently, the prediction accuracy of each tree cannot be guaranteed. On the contrary, a small nodesize produces a deep tree structure which learns comprehensively from the data. However, a deep tree costs more computation time and may encounter overfitting. In this research, the authors used 5 as the value of nodesize because this is a commonly suggested value for regression problems and has been proven effective in previous studies [22] [26].

The ntree parameter determines the number of trees generated in an RF model. A large ntree improves the prediction performance of RF because more trees are considered and the complementarity between trees is enhanced. However, this results in a significant increase in computation time. Alternatively, a small ntree saves computation time, but the prediction performance of RF is sacrificed if the number of trees is insufficient. In this research, after numerous computation tests, it was found that the prediction accuracy of RF did not increase significantly when ntree was 300 or more. The optimal ntree was therefore set to 300, because this value was neither so large as to cost significant computation time nor so small as to limit the potential of RF.

The mtry parameter impacts the prediction accuracy of RF by introducing randomness into the tree construction process. Specifically, randomness is introduced by randomly selecting n variables from the input and choosing the best split from the selected n variables. Because mtry determines the number of randomly selected variables, it impacts both the prediction performance of the individual trees in the forest and the correlation between them, which jointly determine the prediction accuracy of RF [45]. An effective RF contains trees which have prominent prediction performance while the correlations between them are weak. In general, the prediction performance of individual trees will be better if more variables are selected; however, the correlation between the trees increases at the same time. There is thus a trade-off between reducing correlation and maintaining prediction performance when selecting the mtry parameter. Previous studies used empirical equations, such as one-third of the total number of variables [26] or the first integer less than log2(M)+1, where M is the total number of variables [22], to determine mtry for RF model development. These equations make RF model development easier by omitting the search for the optimal mtry value. However, without searching for the optimal mtry, the developed RF may not be the most accurate available. Because the total number of variables is relatively small in this study (11 input variables in total), it is feasible to test all possible values and select the optimal mtry by comparing their corresponding generalization performance.

Specifically, the authors adopted a commonly used model validation method, k-fold cross-validation, to select the optimal mtry for each model. All possible mtry settings, from 1 to 11, were validated by applying k-fold cross-validation to the training data. By randomly partitioning the training data into k subsets, k-fold cross-validation can validate the RF k times, with each of the k subsets used exactly once as the validation data. The results of the k-fold cross-validation can be used to evaluate the generalization performance of the model. Notably, 10 was used as the k value for the k-fold cross-validation. For each possible mtry, the corresponding generalization performance was calculated by repetitively validating the RF 100 times and averaging the prediction performances. Figure 7 shows the comparison results of RF models trained with different mtry settings for the RH yearly-based model (Model 3). The curve shows the trend of the PI of RF trained with different mtry settings. As seen in the curve, point 6 has the lowest PI, which indicates the best prediction performance of RF. Therefore, 6 was selected as the optimal mtry for RF in Model 3. The optimal mtry values for the other models were selected in the same manner. Table 3 shows the selection results for all models.
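The mtry search could be sketched as below (same assumptions as the earlier snippets; the paper selects by PI, whereas this illustration uses cross-validated RMSE as a single proxy score). The empirical heuristics mentioned above are included as a comment.

```python
# Sketch: choosing mtry by 10-fold cross-validation over all candidate values 1..11.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.random((2000, 11))
y = 40.0 + 30.0 * X[:, 0] + 5.0 * rng.random(2000)   # placeholder training data

M = X.shape[1]
# Empirical heuristics from the literature: M/3, or the first integer below log2(M) + 1.
print("empirical mtry candidates:", round(M / 3), int(np.log2(M)) + 1)

cv_rmse = {}
for mtry in range(1, M + 1):
    fold_errors = []
    for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5,
                                   max_features=mtry, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        resid = y[val_idx] - rf.predict(X[val_idx])
        fold_errors.append(np.sqrt(np.mean(resid ** 2)))
    cv_rmse[mtry] = np.mean(fold_errors)

print("selected mtry:", min(cv_rmse, key=cv_rmse.get))
```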



Figure 7. PI results for RF trained with different mtry selections in Model 3.

Table 3 Results of optimal mtry selection. Best results are shown in bold font.

Model      PI by mtry setting                                                          Selected
           1      2      3      4      5      6      7      8      9      10     11    mtry
Model 3    1.00   0.88   0.86   0.86   0.85   0.83   0.85   0.85   0.86   0.85   0.84   6
Model 6    1.00   0.93   0.91   0.89   0.91   0.91   0.91   0.92   0.92   0.92   0.93   4
Model 7    1.00   0.80   0.76   0.75   0.74   0.74   0.72   0.71   0.72   0.73   0.73   8
Model 8    1.00   0.84   0.83   0.78   0.77   0.77   0.76   0.77   0.77   0.76   0.77   10
Model 9    1.00   0.79   0.78   0.77   0.75   0.76   0.74   0.76   0.75   0.75   0.75   7
Model 10   1.00   0.86   0.85   0.84   0.83   0.83   0.83   0.82   0.82   0.81   0.82   10
Model 11   1.00   0.91   0.88   0.86   0.86   0.86   0.86   0.85   0.84   0.85   0.86   9
Model 12   1.00   0.80   0.78   0.76   0.74   0.69   0.72   0.71   0.71   0.72   0.70   6

5.2. RT and SVR model development

RT uses binary recursive partitioning, in which each parent node is split into two child nodes and the splitting process is then repeated at each child node until the tree is complete. In general, at each stage of partitioning, RT searches all possible binary splits and locates the best one by comparing their reductions in the mean square error (MSE) between the predicted and actual values. The split which leads to the smallest MSE is selected as the optimal split. The splitting process runs recursively until the pre-defined stopping rule is met. In this research, the nodesize parameter, which determines the size of each terminal node, was used as the stopping rule for RT. For the purpose of comparability, the nodesize of RT was set to 5, the same value as used in RF.

Two parameters, i.e., the regularization constant (C) and the Gaussian radial basis function parameter (g), were optimized for SVR. Parameter C determines the level of penalty when the SVR makes an inaccurate prediction. A large C indicates a severe penalty on mistakes and vice versa. A severe penalty makes the SVR hard to converge and results in instability, while a mild penalty requires a great amount of training time for the model to converge. Parameter g is the reciprocal of the variance of the Gaussian function. A large g leads to stable predictions with high bias; on the contrary, a small g mitigates the bias but makes the predictions unstable. Similar to the mtry selection for RF, the authors used 10-fold cross-validation to select the optimal C and g values. The training data were used for the parameter selection. The search for the optimal C and g ranged over [2^-8, 2^8] with an exponential step size of 2^1. Consequently, a total of 256 different combinations of {C, g} were generated and tested. For each combination of {C, g}, ten SVR models were trained and tested individually by 10-fold cross-validation, and their average prediction performance was considered the prediction performance of that particular combination of {C, g}. The combination of {C, g} which resulted in the minimum MAPE among the 256 combinations was defined as the optimal C and g. In this research, the optimal C and g were selected as 16 and 0.0625, respectively.
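A sketch of the {C, g} grid search (scikit-learn's SVR assumed; the 16 × 16 grid below is one way to arrive at 256 combinations and is an assumption on our part, as is the synthetic data):

```python
# Sketch: selecting SVR's C and gamma (g) by 10-fold cross-validation, scored by MAPE.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

rng = np.random.default_rng(3)
X = rng.random((1000, 11))
y = 40.0 + 30.0 * X[:, 0] + 5.0 * rng.random(1000)   # placeholder training data

def neg_mape(y_true, y_pred):
    return -np.mean(np.abs((y_true - y_pred) / y_true)) * 100

param_grid = {"C": 2.0 ** np.arange(-8, 8),           # 16 exponential steps
              "gamma": 2.0 ** np.arange(-8, 8)}       # 16 steps -> 256 {C, g} combinations
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring=make_scorer(neg_mape), cv=10)
search.fit(X, y)
print(search.best_params_)   # best combination by CV MAPE (the paper reports C = 16, g = 0.0625)
```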

6. Results

6.1. Comparison between RF, RT, and SVR


RF, RT, and SVR were trained and tested individually with yearly data from RH and FAC to compare their prediction performances. The experiment for each model was repeated 100 times, and the average performances of the 100 trials were used for comparison. Tables 4 and 5 compare the prediction performances of RF, RT, and SVR for RH and FAC, respectively. The results illustrate which model performed most efficiently for hourly electricity use prediction of the tested buildings.


As shown in Tables 4 and 5, RF has the best performance in all evaluation indices, indicating that it was the best prediction algorithm in the comparison. The MAPE of RF in the testing process is 7.75% for RH and 11.93% for FAC, which is acceptable for hourly prediction. RT has the worst performance, with MAPEs of 8.90% and 14.50% for RH and FAC, which is not surprising considering the long span of data used in the training process, which could cover abnormal observations that impact the stability and accuracy of RT. Moreover, the RMSE results indicate that RF is more stable than RT in both training and testing.

According to the comparison of PI in the testing process, RF is 14.0% and 25.0% better than RT for the energy prediction of RH and FAC, respectively, which indicates that the ensemble strategy applied in RF is capable of improving the prediction performance of RT. Moreover, RF is 5.5% and 5.0% better than the other outstanding prediction algorithm, i.e., SVR, for the energy prediction of RH and FAC, respectively, indicating that RF is a promising energy prediction tool. It should be noted that RT may have encountered an instability issue in the experiment, particularly in the energy usage prediction of FAC, where the testing performance drops greatly compared with the training performance. For both RH and FAC, RT performs better than SVR in the training process but worse than SVR in the testing process. However, the prediction performances of RF in the training and testing processes are relatively stable, indicating that the algorithm, as an ensemble version of RT, did not suffer from the instability issue that RT encountered in the experiment. In summary, the comparison results indicate that RF has the best fitting capability for the prediction problem.

Furthermore, comparing the RH prediction results with the FAC prediction results indicates that all models performed better in the energy prediction of RH than in that of FAC. The difference is attributed to the uncertainty about human behavior in FAC. As more than 40% of the area of FAC is used for laboratory activities, occupancy information for FAC is difficult to interpret accurately based on the means of occupancy measurement adopted in this research, which relies on daily class and work schedules. On the contrary, less than 20% of the space in RH is used for laboratory activities. Hence, its estimated hourly number of occupants is closer to the actual condition than that of FAC. The authors believe that the prediction performance of the models developed in this research would be better if more accurate and sufficient data were introduced.

Table 4 Summary of prediction performances for all algorithms applied to RH. Best results are shown in bold font.

                Model 1 (RT)   Model 2 (SVR)   Model 3 (RF)
Testing
  R2            0.63           0.69            0.73
  RMSE          6              5.47            5.12
  MAPE          8.90%          8.04%           7.75%
  PI            1.00           0.91            0.86
Training
  R2            0.87           0.79            0.89
  RMSE          3.55           4.54            3.22
  MAPE          5.21%          6.10%           4.79%
  PI            0.85           1.00            0.79

Table 5 Summary of prediction performances for all algorithms applied to FAC. Best results are shown in bold font.

                Model 4 (RT)   Model 5 (SVR)   Model 6 (RF)
Testing
  R2            0.29           0.45            0.5
  RMSE          18.59          16.3            15.58
  MAPE          14.50%         12.21%          11.93%
  PI            1.00           0.79            0.75
Training
  R2            0.76           0.63            0.78
  RMSE          10.76          13.49           10.38
  MAPE          8.11%          9.00%           8.01%
  PI            0.84           1.00            0.82

6.2. Variable importance analysis

The permutation-based variable importance of RF enables analysis of the role of the input variables within the model. In this research, variable importance analysis was performed for Models 7 to 12, aiming to investigate the most influential variables for RF models developed with data from different semesters. Figures 8 to 13 depict the variable importance results for RF trained with monthly data of RH and FAC (Models 7 to 12). The results provide an indication of the relative dependence of the RF models on each input variable.

For RH, it can be seen from Figures 9 and 10 that Model 8 and Model 9 have a similar variable importance pattern, for which the top 5 most influential variables are the same (i.e., occupancy, dew point, barometric pressure, time of day, and temperature), indicating that the electricity use of RH in the fall and spring semesters was highly correlated with the same factors. However, the results in Figure 8 show that Model 7 has a distinct variable importance pattern compared with Model 8 and Model 9, with day type as the most influential variable. The difference in the variable importance pattern between Model 7 and the others is expected because the operating conditions of RH in the summer, fall, and spring semesters were different. As fewer students were registered for the summer semester, RH is considered a less occupied building during this period. RH is mainly used for office services during the summer semester, and during this time the work schedule is constant (i.e., Monday through Friday, 9:00 a.m. to 5:00 p.m.) and the variation in the occupancy condition is unapparent. Therefore, the energy use of RH in the summer semester depends on whether it is a weekday and on the time of day rather than on the variation of the hourly number of occupants. In contrast, RH is considered a fully occupied building in both the fall and spring semesters, when the building provides more services for course instruction. The occupancy condition varies strongly with the daily course arrangement. Hence, the electricity use during these two semesters is highly correlated with the variation of the hourly number of occupants. This argument is supported by the results shown in Figures 9 and 10, where the two RF models share a similar variable importance pattern and are impacted the most by the occupancy variable.

For FAC, the condition is more complicated. It can be seen from Figures 11 to 13 that each model for FAC has its own variable importance pattern, which indicates that the electricity use of FAC is correlated with different factors in these three models. As shown in Figure 11, the prediction performance is highly impacted by day type and time of day. In Figure 12, occupancy and dew point are the most influential variables. In Figure 13, occupancy, workday type, dew point, and pressure have the same level of impact on the prediction performance of RF. Such differences are not surprising because FAC contains more uncertainties (i.e., unexpected occupancy activities) than RH, which are not fully controlled for in this research. As mentioned in the previous subsection, a large proportion of the space in the FAC building is used for laboratory activities, which are irregular and difficult to interpret accurately based on the method used in this research. In other words, the variation of the occupancy condition would not be captured if overtime work happened in the laboratory. Moreover, since FAC is connected with another building which is mainly used as a library, as shown in Figure 3, its occupancy could be impacted by activities in the connected building. Therefore, the occupancy data used in this research may not contain all the occupancy information for FAC, and the importance of the occupancy variable could be incorrectly measured and consequently may not represent the actual occupancy condition of the building. The variable importance results of Model 11 and Model 12, as shown in Figure 12 and Figure 13, support this argument, as the superiority of occupancy over the other variables in Model 12 was not as significant as that in Model 11.

Moreover, it should be noted that the variable importance pattern shown in Figure 11 is similar to that shown in Figure 8, even though their corresponding tested buildings are different. It can be speculated that the similarity is caused by the same operating condition of RH and FAC during the summer semester. In other words, during the summer semester, the electricity use of RH and FAC is correlated with the same factors – the day type and time of day.

In summary, the results demonstrated the capability of the permutation-based variable importance in providing an understanding of the relative importance of each variable to the overall prediction performance of the RF model. Comparisons of variable importance between different RF models indicated that the RF models were impacted by different variables during the summer, fall, and spring semesters for the tested buildings.


Figure 8. Variable importance for Model 7.

Figure 9. Variable importance for Model 8.


Figure 10. Variable importance for Model 9.

Figure 11. Variable importance of Model 10.


Figure 12. Variable importance of Model 11.

Figure 13. Variable importance of Model 12.

6.3. Comparison between RF trained with yearly and monthly data

Since the variable importance analysis has demonstrated that the RF models were impacted by different variables in the summer, fall, and spring semesters, this subsection further explores the impact of time-wise data partitioning on the prediction performance of RF through comparisons of RF models trained with yearly and monthly data. Table 6 and Table 7 compare the prediction performances of all RF models developed for RH and FAC, respectively. As mentioned before, Models 3 and 6 used yearly data for model training and testing, while Models 7 to 12 used monthly data extracted from different semesters.

It can be observed from Table 6 and Table 7 that the RF models trained with yearly data performed worse than those trained with monthly data for both RH and FAC. Overall, the PI for the different models indicates that the RF models trained with monthly data were 43-50% and 25-46% better than those trained with yearly data for RH and FAC, respectively. The comparisons showed that the prediction performances of RF were improved by partitioning the yearly data into different semesters. Accordingly, the authors suggest that a data partition process would be necessary for the improvement of building energy prediction if distinct operating conditions exist during different time periods.
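For instance, the time-wise partition used in Module 2 amounts to selecting one typical month per semester from the yearly hourly series; a minimal sketch with a synthetic placeholder series is shown below.

```python
# Sketch: partitioning yearly hourly data into semester-typical months (July, October, February).
import numpy as np
import pandas as pd

idx = pd.date_range("2014-05-01 00:00", "2015-04-30 23:00", freq="h")
hourly = pd.DataFrame({"kWh": np.random.rand(len(idx)) * 100}, index=idx)

semester_months = {"summer": 7, "fall": 10, "spring": 2}
monthly_sets = {name: hourly[hourly.index.month == m] for name, m in semester_months.items()}
print({name: len(df) for name, df in monthly_sets.items()})   # 744, 744, and 672 hours
```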

Table 6 Summary of prediction performances for all RF models developed for RH. Worst results are shown in bold font.

Model     Prediction Type   R2     RMSE   MAPE    PI
Model 7   Monthly           0.93   1.99   2.84%   0.51
Model 8   Monthly           0.89   1.7    2.81%   0.50
Model 9   Monthly           0.92   2.32   3.50%   0.57
Model 3   Yearly            0.73   5.12   7.75%   1.00

Table 7 Summary of prediction performances for all RF models developed for FAC. Worst results are shown in bold font.

Model      Prediction Type   R2     RMSE    MAPE     PI
Model 10   Monthly           0.7    6.78    5.70%    0.54
Model 11   Monthly           0.75   10.08   6.89%    0.63
Model 12   Monthly           0.71   12.62   8.92%    0.75
Model 6    Yearly            0.5    15.58   11.93%   1.00


7. Discussion

7.1. Impact of the number of variables on the prediction performance of RF

As previously mentioned, the selection of the number of variables (mtry) is essential for RF because it impacts the prediction performance of each tree as well as the correlation between the trees within the model. In this research, the authors selected the optimal mtry by testing and comparing all possible mtry values, which is time-consuming and may not be practical for high-dimensional data. As some empirical rules have been offered by previous research studies [22] [26], a comparison between RF models developed with the empirical mtry and the selected mtry was performed to investigate the differences in prediction performance. Table 8 shows the comparison results for all RF models developed in this research. It can be observed from the table that most mtry values selected in this research were greater than the empirical mtry, with the two being equal only for Model 6. Correspondingly, most RF models developed with the selected mtry were better than those developed with the empirical mtry. However, the levels of improvement were very limited (i.e., 0.00% to 7.58%) for all models, which indicates that the prediction performance of RF is not very sensitive to the selection of mtry. Therefore, searching for the optimal mtry is of limited value in solving the research problem. Based on the comparison, the authors argue that using the empirical mtry is preferable for RF-based building energy prediction because it is accurate and time-saving.

Table 8 Comparison of RF developed with empirical mtry and optimal mtry.

Model      Empirical mtry   Selected mtry   PI (empirical mtry)   PI (selected mtry)   Prediction Improvement
Model 3    4                6               0.85                  0.84                 0.88%
Model 6    4                4               0.90                  0.90                 0.00%
Model 7    4                8               0.75                  0.72                 3.30%
Model 8    4                10              0.78                  0.76                 2.60%
Model 9    4                7               0.77                  0.74                 3.90%
Model 10   4                10              0.84                  0.81                 2.97%
Model 11   4                9               0.86                  0.85                 1.63%
Model 12   4                6               0.75                  0.69                 7.58%


7.2.

Prediction performance improvement of RF over RT

In this research, RF outperformed RT in energy prediction of the tested buildings, demonstrating its ability to improve the prediction performance of RT. The authors completed 100 repetitions, and the results indicated that RF was consistently better than RT in every repetition. This finding demonstrates that RF is not a trivial method that merely averages the predictions of individual models, but a method that can bring the training quality to a level that no single model could achieve. The authors believe that combining multiple models compensates for the errors made by each individual model and thus makes the prediction more stable and accurate.

However, the level of improvement depends highly on the instability of RT, which is associated with the quality of the data. Since the merit of RF is to compensate for the instability of RT, the improvement occurs when RT has difficulty making reliable predictions. In other words, if RT is not sensitive to changes in the input data and provides stable and accurate predictions, there would be little or no room for improvement from the strategies employed in RF. In this research, it can be observed from Tables 4 and 5 that the improvements of RF over RT in testing are 14% and 25% for RH and FAC, respectively. The authors believe that this difference is caused by the level of instability of the RT developed for each tested building. As previously mentioned, the RT for FAC encountered an instability issue that made the model less reliable in testing than in training. In contrast, the RT for RH performed in a relatively stable manner because the variation in prediction performance between training and testing was not as severe as that of the RT for FAC. Therefore, the RF for FAC had more room than the RF for RH to improve on the prediction performance of RT. The results indicate that RF is more competent than RT in dealing with complex and sparse data.
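The stability argument can be illustrated with a short experiment of the same shape as the 100-repetition test described above: repeat random train/test splits and compare the spread of test errors of a single regression tree against that of a random forest. This is an illustrative sketch only; X and y stand in for the hourly building data and are not the datasets used in the paper.

```python
# Illustrative sketch of the stability comparison: over repeated random
# train/test splits, a single regression tree typically shows a larger spread
# of test errors than a random forest trained on the same data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def stability_check(X, y, repetitions=100):
    rt_rmse, rf_rmse = [], []
    for seed in range(repetitions):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=seed)
        rt = DecisionTreeRegressor(random_state=seed).fit(X_tr, y_tr)
        rf = RandomForestRegressor(n_estimators=500,
                                   random_state=seed).fit(X_tr, y_tr)
        rt_rmse.append(mean_squared_error(y_te, rt.predict(X_te)) ** 0.5)
        rf_rmse.append(mean_squared_error(y_te, rf.predict(X_te)) ** 0.5)
    # A larger standard deviation for RT indicates the instability that RF exploits.
    print(f"RT: mean RMSE {np.mean(rt_rmse):.2f}, std {np.std(rt_rmse):.2f}")
    print(f"RF: mean RMSE {np.mean(rf_rmse):.2f}, std {np.std(rf_rmse):.2f}")
```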

7.3. Computation time

Computation time is one of the major limitations of RF, as its model structure requires the generation and combination of multiple base models. Table 9 summarizes the average computation time for RT, SVR, and RF as developed in Module 1. It can be observed that the computation time of RF was considerably greater than those of RT and SVR for the tested buildings. However, the authors argue that the computation time of RF is not a major concern because the generation of base models can be performed in parallel and its learning algorithm (i.e., RT) is a fast learner that requires minimal training time. Moreover, the authors believe that prediction accuracy is the first priority of the research problem. Therefore, it is worthwhile to trade computation time for better prediction accuracy as long as the computation time remains within an acceptable range.

Table 9 Summary of computation time (in seconds) for all algorithms used in Module 1.

Tested Building   RT     SVR    RF
RH                0.01   0.05   1.48
FAC               0.01   0.05   1.36
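The timing comparison and the parallel construction of base models can be sketched as follows. The model settings are placeholders rather than the exact configurations used in the paper, and scikit-learn's n_jobs parameter is used here to stand in for the parallel tree construction mentioned above.

```python
# Sketch of timing RT, SVR, and RF on the same training set; n_jobs=-1 lets
# the RF build its trees in parallel. Model settings are placeholders.
import time
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

def time_models(X, y):
    models = {
        "RT": DecisionTreeRegressor(random_state=42),
        "SVR": SVR(kernel="rbf"),
        "RF": RandomForestRegressor(n_estimators=500, n_jobs=-1,
                                    random_state=42),
    }
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X, y)
        print(f"{name}: {time.perf_counter() - start:.2f} s")
```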

The relationship between computation time and data size was also investigated in this paper. The authors compared the computation time of RF models trained with yearly and monthly data to determine whether adding more data points to the dataset results in a significant increase in computation time. Table 10 lists the computation time for all RF models developed in the paper. Notably, Models 3 and 6 used the yearly datasets and Models 7 to 12 used monthly datasets. The results show that the RF models trained with the yearly datasets required significantly more computation time than those trained with the monthly datasets, which indicates that increasing the data size leads to an increase in computation time. The results also suggest that the computation time of RF increases approximately linearly with data size. The analysis and comparison show that the computation time of RF is sensitive to data size and, consequently, using an excessively large dataset in training can add significant computational overhead to the RF model development process.

Table 10 Summary of computation time (in seconds) for all RF models.

Model      Data Size   Computation Time
Model 3    8,647       1.48
Model 6    8,321       1.36
Model 7    744         0.13
Model 8    744         0.13
Model 9    672         0.12
Model 10   744         0.13
Model 11   744         0.13
Model 12   672         0.12
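The near-linear growth reported in Table 10 can be checked with a simple scaling experiment: time RF training on nested subsets of increasing size. This is a sketch under the assumption that X and y are NumPy arrays holding the hourly records; it is not part of the authors' procedure.

```python
# Sketch for checking how RF training time grows with the number of training
# records, mirroring the yearly-vs-monthly comparison in Table 10.
# X and y are assumed to be NumPy arrays (placeholders for the hourly data).
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def time_vs_data_size(X, y, fractions=(0.1, 0.25, 0.5, 1.0)):
    rng = np.random.default_rng(42)
    order = rng.permutation(len(y))
    for frac in fractions:
        n = int(len(y) * frac)
        idx = order[:n]
        rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=42)
        start = time.perf_counter()
        rf.fit(X[idx], y[idx])
        print(f"n={n}: {time.perf_counter() - start:.2f} s")
```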


7.4. Inclusion of Occupant Behavior Variables

Occupant behavior is a key source of uncertainty in building energy modeling [46, 47]. In this study, occupancy variables were few in number and relatively static, whereas occupant behavior is actually dynamic, stochastic, and influenced by contextual factors. However, adding occupant behavior variables to the models developed here would have further increased computation time, thereby curtailing the heuristic value and major aim (i.e., prediction performance) of this study.

Nonetheless, future research should incorporate germane occupant behavior variables affecting energy consumption. Since occupant behaviors in classroom buildings are relatively limited in scope (primarily reflecting office work and instructional activities) and the equipment is fairly standardized, this building type provides fertile testing ground for monitoring energy-consuming and discomfort-driven occupant behaviors in order to identify those that may be the best predictors to incorporate in the energy modeling used here. Technologies increasingly used for behavior measurement -- such as floor sensor pads at entry points, video cameras with computer vision, wearable sensors, and security systems -- are proving applicable for documenting and measuring occupancy actions [47].

8. Conclusion

This paper contributes to the existing literature on building energy prediction in several ways. First, it contributes to the research on empirical modeling based building energy prediction. RF belongs to the scope of ensemble learning, which is an advanced empirical modeling approach for improving the prediction performance of conventional learning algorithms. By adopting RF in building energy prediction and comparing its prediction performance with that of conventional approaches, i.e., RT and SVR, this paper demonstrates the superiority of RF and the feasibility of homogeneous ensemble learning in building energy prediction. These findings provide a new option for researchers to predict building energy usage and enrich the algorithm library of empirical modeling based building energy prediction.

Second, this paper contributes to the research on building energy management for educational buildings. By using variable importance to investigate the most influential factors of RF for different semesters, this research provides a new approach for locating key building energy impact factors and a better understanding of building energy behaviors. The variable importance analysis shows that the most influential factors of the tested buildings vary between semesters, indicating a change of energy behavior for educational buildings in different semesters. In other words, the energy usage of educational buildings follows a semester-based pattern rather than an annual pattern. Additionally, a comparison between RF models trained with yearly and monthly data showed that the energy prediction of educational buildings can be improved by taking into consideration the energy behavior changes among different semesters. These findings assist researchers and building owners in understanding and managing the energy usage of educational buildings.

Finally, this paper contributes to the research on the implementation of RF for building energy prediction. It sets an example of how to use variable importance, a distinct characteristic of RF, to locate key impact variables and to understand building energy behaviors. This characteristic provides property owners with supporting details for establishing their energy conservation policies and rules. Moreover, by comparing the prediction performances of RF trained with optimal and empirical parameter settings, i.e., the mtry setting, this paper showed that using the empirical mtry during RF model development is practical and effective for building energy prediction, and that the search for the optimal mtry is unnecessary.

Future work will focus on incorporating more accurate occupant variables to enhance the prediction performance and expanding the proposed RF model to the energy prediction of other types of buildings (such as commercial buildings and hospitals).


References

[1] International Energy Agency, "2015 Key World Energy Statistics," International Energy Agency, 2015.


[2] X. Cao, X. Dai and J. Liu, "Building energy-consumption status worldwide and the state-of-the-art technologies for zero-energy buildings during the past decade," Energy and Buildings, vol. 128, pp. 198-213, 2016.

[3] U.S. Energy Information Administration, "How much energy is consumed in the world by each sector?," 7 January 2015. [Online]. Available: http://www.eia.gov/tools/faqs/faq.cfm?id=447&t=1.

[4] T. H. Pedersen, R. E. Hedegaard and S. Petersen, "Space heating demand response potential of retrofitted residential apartment blocks," Energy and Buildings, vol. 141, pp. 158-166, 2017.

[5] D. Li, G. Hu and C. J. Spanos, "A data-driven strategy for detection and diagnosis of building chiller faults using linear discriminant analysis," Energy and Buildings, vol. 128, pp. 519-529, 2016.

[6] H.-x. Zhao and F. Magoulès, "A review on the prediction of building energy consumption," Renewable and Sustainable Energy Reviews, vol. 16, pp. 3586-3592, 2012.

[7] Y. Heo and V. M. Zavala, "Gaussian process modeling for measurement and verification of building energy savings," Energy and Buildings, vol. 53, pp. 7-18, 2012.

[8] Z. Wang, Y. Wang and R. Srinivasan, "A Bagging Tree based Ensemble Model for Building Energy Prediction," 2016.

[9] T. Reeves, S. Olbina and R. Issa, "Validation of building energy modeling tools: Ecotect™, Green Building Studio™ and IES™," in Proceedings of the 2012 Winter Simulation Conference (WSC), Berlin, 2012.

[10] E. M. Ryan and T. F. Sanquist, "Validation of building energy modeling tools under idealized and realistic conditions," Energy and Buildings, vol. 47, pp. 375-382, 2012.


[11] M. Aydinalp, V. I. Ugursal and A. S. Fung, "Modeling of the appliance, lighting, and space-cooling energy consumptions in the residential sector using neural networks," Applied Energy, vol. 71, no. 2, pp. 87-110, 2002.

[12] M. Yalcintas and U. A. Ozturk, "An energy benchmarking model based on artificial neural network method utilizing US Commercial Buildings Energy Consumption Survey (CBECS) database," International Journal of Energy Research, vol. 31, no. 4, pp. 412-421, 2007.

[13] Z. Yu, F. Haghighat, B. C. Fung and H. Yoshino, "A decision tree method for building energy demand modeling," Energy and Buildings, vol. 42, no. 10, pp. 1637-1646, 2010.

[14] B. B. Ekici and T. U. Aksoy, "Prediction of building energy consumption by using artificial neural networks," Advances in Engineering Software, vol. 40, pp. 356-362, 2009.

[15] M. C. Burkhart, Y. Heo and V. M. Zavala, "Measurement and verification of building systems under uncertain data: A Gaussian process modeling approach," Energy and Buildings, vol. 75, pp. 189-198, 2014.


[16] B. Dong, C. Cao and S. E. Lee, "Applying support vector machines to predict building energy consumption in tropical region," Energy and Buildings, pp. 545-553, 2005.

[17] A. H. Neto and F. A. S. Fiorelli, "Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption," Energy and Buildings, vol. 40, no. 12, pp. 2169-2176, 2008.


[18] L. Breiman, "Heuristics of instability in model selection," The Annals of Statistics, vol. 24, pp. 2350-2383, 1996.

[19] L. K. Hansen and P. Salamon, "Neural Network Ensembles," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 993-1001, 1990.

[20] S. Reid, "A Review of Heterogeneous Ensemble Methods," 2007.

[21] T. Dietterich, "Machine learning research: Four current directions," AI Magazine, vol. 18, no. 4, pp. 97-136, 1997.


[22] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

[23] R. Díaz-Uriarte and S. A. d. Andrés, "Gene selection and classification of microarray data using random forest," BMC Bioinformatics, vol. 7, no. 3, pp. 1-13, 2006.


[24] D. R. Cutler, T. C. Edwards Jr., K. H. Beard, A. Cutler, K. T. Hess, J. Gibson and J. J. Lawler, "Random forests for classification in ecology," Ecology, vol. 88, no. 11, pp. 2783-2792, 2007.

[25] V. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo and J. Rigol-Sanchez, "An assessment of the effectiveness of a random forest classifier for land-cover classification," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 67, pp. 93-104, 2012.

[26] H. Sun, D. Gui, B. Yan, Y. Liu, W. Liao, Y. Zhu, C. Lu and N. Zhao, "Assessing the potential of random forest method for estimating solar radiation using air pollution index," Energy Conversion and Management, vol. 119, pp. 121-129, 2016.

[27] M. Khalilia, S. Chakraborty and M. Popescu, "Predicting disease risks from highly imbalanced data using random forest," BMC Medical Informatics and Decision Making, vol. 11, no. 51, pp. 1-13, 2011.


[28] A. Tsanas and A. Xifara, "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools," Energy and Buildings, vol. 49, pp. 560-567, 2012.

[29] A. Lahouar and J. B. H. Slama, "Day-ahead load forecast using random forest and expert input selection," Energy Conversion and Management, vol. 103, pp. 1040-1051, 2015.

[30] S. Jurado, À. Nebot, F. Mugica and N. Avellana, "Hybrid methodologies for electricity load forecasting: Entropy-based feature selection with machine learning and soft computing techniques," Energy, vol. 86, pp. 276-291, 2015.


[31] L. M. Candanedo, V. Feldheim and D. Deramaix, "Data driven prediction models of energy use of appliances in a low-energy house," Energy and Buildings, vol. 140, pp. 81-97, 2017.

[32] M. W. Ahmad, M. Mourshed and Y. Rezgui, "Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption," Energy and Buildings, vol. 147, pp. 77-89, 2017.

[33] J. Ma and J. C. Cheng, "Identifying the influential features on the regional energy use intensity of residential buildings based on Random Forests," Applied Energy, vol. 183, pp. 193-201, 2016.

[34] F. Yu, W. Ho, K. Chan and R. Sit, "Critique of operating variables importance on chiller energy performance using random forest," Energy and Buildings, vol. 139, pp. 653-664, 2017.

[35] L. M. Candanedo and V. Feldheim, "Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models," Energy and Buildings, vol. 112, pp. 28-39, 2016.

[36] J. Granderson, S. Touzani, C. Custodio, M. D. Sohn and D. Jump, "Accuracy of automated measurement and verification (M&V) techniques," Applied Energy, vol. 173, pp. 296-308, 2016.

[37] D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz and G. Bitsuamlak, "An ensemble learning framework for anomaly detection in building energy consumption," Energy and Buildings, vol. 144, pp. 191-206, 2017.


[38] C. Strobl, J. Malley and G. Tutz, "An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests," Psychological Methods, vol. 14, no. 4, pp. 323-348, 2009.

[39] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization," Machine Learning, vol. 40, pp. 139-157, 2000.

[40] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.


[41] Y. Grandvalet, "Bagging Equalizes Influence," Machine Learning, vol. 55, no. 3, pp. 251-270, 2004.

[42] P. Buhlmann and B. Yu, "Analyzing bagging," Annals of Statistics, vol. 30, pp. 927-961, 2002.

[43] B. Gregorutti, B. Michel and P. Saint-Pierre, "Grouped variable importance with random forests and application to multiple functional data analysis," Computational Statistics & Data Analysis, vol. 90, pp. 15-35, 2015.


[44] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," The Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

[45] Y. Amit and D. Geman, "Shape quantization and recognition with randomized trees," Neural Computation, vol. 9, pp. 1545-1588, 1997.

[46] P. Hoes, J. Hensen, M. Loomans, B. d. Vries and D. Bourgeois, "User behavior in whole building simulation," Energy and Buildings, vol. 41, pp. 295-302, 2009.

[47] D. Yan, W. O'Brien, T. Hong, X. Feng, H. B. Gunay, F. Tahmasebi and A. Mahdavi, "Occupant behavior modeling for building performance simulation: Current state and future challenges," Energy and Buildings, vol. 107, pp. 264-278, 2015.