A hybrid model for building energy consumption forecasting using long short term memory networks


Applied Energy 261 (2020) 114131

Contents lists available at ScienceDirect

Applied Energy journal homepage: www.elsevier.com/locate/apenergy


Nivethitha Somu (a), Gauthama Raman M R (b), Krithi Ramamritham (a)

(a) Smart Energy Informatics Lab (SEIL), Department of Computer Science and Engineering, Indian Institute of Technology-Bombay, Maharashtra 400076, India
(b) iTrust-Centre for Research in Cyber Security, Singapore University of Technology and Design, Singapore

HIGHLIGHTS

• eDemand, a data driven energy consumption forecasting model, is developed.
• The proposed model is designed for accurate short, mid, and long term forecasting.
• Improved version of SCOA is used to identify the optimal hyperparameters of LSTM.
• A case study using KReSIT power consumption data is presented.
• Impact of hidden layers, hidden units and dropout on model accuracy was analysed.

ARTICLE INFO

Keywords: Buildings; Energy consumption; Forecasting models; Artificial intelligence; Neural networks; Optimization

ABSTRACT

Data driven building energy consumption forecasting models play a significant role in enhancing the energy efficiency of buildings through building energy management, energy operations, and control strategies. The multi-source and heterogeneous energy consumption data necessitates the integration of evolutionary algorithms and data-driven models for better forecast accuracy and robustness. We present eDemand, an energy consumption forecasting model which employs long short term memory networks and an improved sine cosine optimization algorithm for accurate and robust building energy consumption forecasting. A novel Haar wavelet based mutation operator was introduced to enhance the divergence nature of the sine cosine optimization algorithm towards the global optimal solution. Further, the hyperparameters (learning rate, weight decay, momentum, and number of hidden layers) of the LSTM were optimized using the improved sine cosine optimization algorithm. A case study was carried out on the real-time energy consumption data obtained from Kanwal Rekhi building, an academic building at Indian Institute of Technology, Bombay, for short, mid, and long-term forecasting. Experiments reveal that the proposed model outperforms the state-of-the-art energy consumption forecast models in terms of mean absolute error, mean absolute percentage error, mean square error, root mean square error, and Theil statistics. It is shown that stable and accurate forecast results are produced by ISCOA-LSTM and hence it can be used as an efficient tool for solving energy consumption forecast problems.

1. Introduction

1.1. Background and motivation

This section discusses the need for an energy consumption forecasting model and its impact on building energy management and the environment. Further, the significance of a hybrid data-driven based energy consumption forecasting model over the engineering-based methods and statistical methods is presented along with the novelty and contributions of this research.

The unprecedented growth in the global population accompanied by industrialization, economic development, comfort demands, and social progress has a significant impact on global energy consumption and environmental concerns [1]. People spend 90% of their day-to-day lives in buildings, thereby increasing the energy-intensive building operations needed to satisfy their occupational activities and thermal comfort, which account for 80–90% of the total energy consumption of the entire building life cycle [2]. As a result, the building sector has become



Corresponding author. E-mail addresses: [email protected] (N. Somu), [email protected] (K. Ramamritham).

https://doi.org/10.1016/j.apenergy.2019.114131 Received 16 July 2019; Received in revised form 20 October 2019; Accepted 11 November 2019 0306-2619/ © 2019 Published by Elsevier Ltd.


the largest energy consumer owing to its respective contributions of 39% and 38% towards global energy consumption and greenhouse gas emissions [3]. According to International Energy Outlook 2017, “Electricity, the main source of energy for lighting, cooling, and appliances, is the fastest-growing source of energy used in buildings between 2015 and 2040; China and India account for one-fourth of the world’s buildings electricity consumption in 2040” [4]. Recent research reveals that energy demand management has become an important research area due to the shortage of energy resources, ever-increasing global energy demand, pollutant gas emissions, and research gaps in renewables and green energy systems. Energy consumption forecasting forms an essential part of energy management systems, which aim to provide day-to-day management of electric utilities, support power grid planning, and make optimal decisions in power grid energy management for the safe and secure operation of the power system [5,6]. In this way, improving the energy efficiency of buildings through the design of an accurate and robust energy consumption forecasting model proves to be an efficient solution for energy management, demand response programs, fault detection, and energy benchmarking [2,7]. Further, an accurate building energy consumption forecasting model helps decision-makers to develop and implement energy efficiency policies that reduce building energy consumption, alleviate environmental pollution and achieve sustainable development [1]. However, the non-linear, non-stationary and multi-seasonal nature of the energy consumption data and its dependence on several influential factors such as weather conditions (indoor and outdoor), building context and dynamics, time, occupancy, etc. make accurate energy consumption forecasting a challenging task.

1.2. Need for data-driven approaches for building energy consumption forecasting

Recent advances in the design of high precision and high robustness building energy consumption forecasting models can be categorized into three approaches: (i) Engineering or white box approaches (EnergyPlus, eQuest, Ecotect, etc.), (ii) Statistical or grey box approaches (temperature frequency method, degree-day method, residential load factor method, etc.) and (iii) Data driven or black box approaches (neural networks, support vector machines, decision trees, regression models, k-nearest neighbour, etc.) [8,9]. Among these, data driven approaches have gained immense popularity in building energy consumption forecasting due to their ease of use, practicability, adaptability and high forecasting accuracy. Further, data driven approaches are more practical than the engineering approaches since they provide an accurate forecast based on the available data (energy consumption, climatic, temporal, and occupancy) that are easy to obtain from the buildings through new sensing and communication technologies [2]. Table 1 provides detailed insight into the recent research developments in data driven building energy consumption forecasting models. A review of the literature makes it evident that Artificial Neural Networks (ANN) and their variants (Feed Forward Neural Network (FFNN), Recurrent Neural Network (RNN) [22], Probabilistic Neural Network (PNN) [30,31], etc.) are the most commonly employed data driven approaches for building energy consumption forecasting (short term, midterm, and long term) and fault detection and diagnosis. Irrespective of the nature of the neural network model used for energy consumption forecasting, the selection of the model parameters, i.e., hyperparameters, has a significant impact on the forecasting accuracy of the model on the given data [28].

According to T Y Kim and S B Cho, “The potential limitation of the learning model lies in relatively large efforts by trial and error to determine the optimal hyperparameters. To work out this problem, we need to automate searching for the best hyperparameters that can automatically search for the hyperparameter space of the learning model” [29]. Therefore, the identification of suitable hyperparameters (weights, learning rate, etc.) using optimization algorithms or statistical techniques forms a well-known and standard approach to achieve better forecasting accuracy and consistency for building energy consumption forecasting.

1.3. Why long short term neural networks?

In general, regression and time series forecasting approaches are the most commonly used data-driven approaches for building energy consumption forecasting. The former constructs a model based on the correlation between multiple attributes and the energy consumption data and forecasts the building energy consumption. The time series forecasting approaches identify the interdependence and correlation between the variables with respect to time and forecast the changes in the building energy consumption over a period [6]. Beyond the traditional time series forecasting models such as the autoregressive moving average model (ARMA), the autoregressive integrated moving average model (ARIMA) and gray systems, Recurrent Neural Networks (RNN) have been widely used in non-linear time series forecasting problems and have demonstrated outstanding performance in building energy consumption forecasting [32]. RNNs are the most powerful and robust variant of ANN for modeling time series problems, i.e., a series of related observations listed in the order of time, since they have a fully connected structure of neurons with internal memory and cycles that loop back the information of the previous time step into the network, i.e., information sharing between the time steps. Despite its benefits, RNN suffers from the “vanishing and exploding gradient problem,” and therefore it has difficulty in learning long-term dependencies [33]. To overcome these shortcomings of RNN, Hochreiter and Schmidhuber designed a new RNN architecture called Long Short Term Memory (LSTM), based on the concept of “memory blocks (gate units)” [34]. The self-loop memory block and the three gate units (input, forget and output) help the flow of the gradient along long sequences and control the flow of information, respectively.

The unique characteristics of the gates in the hidden units help in holding the relevant data and forgetting the irrelevant data, thereby providing a constant error [34]. LSTMs have been widely employed in machine translation, language modeling, handwriting recognition, image captioning, speech recognition, process forecasting, and medical diagnosis due to their inherent capability in modeling temporal aspects of data [35]. However, the hyperparameters of LSTM have a significant impact on its forecasting accuracy, and therefore the selection of an intelligent algorithm to improve the forecasting accuracy of LSTM has been an open research challenge [36].

1.4. Why sine cosine optimization algorithm?

The application of a single data driven model to an energy consumption forecasting problem that deals with multi-source or heterogeneous data leads to convergence problems and poor model accuracy. Nevertheless, research in the hybridization of advanced Evolutionary Algorithms (EA) and data driven models provides an intelligent way to improve the forecasting accuracy and robustness of the energy consumption forecasting model [16]. Overall, we present eDemand, an energy consumption forecasting model, and an Improved Sine Cosine Optimization Algorithm based LSTM (ISCOA-LSTM), which employs ISCOA for the identification of the optimal hyperparameters of LSTM to improve its forecasting accuracy. In 2016, Mirjalili proposed the Sine Cosine Optimization Algorithm (SCOA), a novel population-based metaheuristic algorithm which employs simple sine and cosine mathematical functions to find the global optimal solution [37,38]. Specific features like minimal tuning parameters, high optimization accuracy, fast convergence speed, and strong global search ability add up to the beneficial aspects of SCOA for its application in various real-world problems [39,40]. SCOA has proved to be more efficient than many population based metaheuristic algorithms for obtaining the global optimal solution due to the following reasons: (i) inherent benefits from high exploration and avoidance of local optima traps


Table 1 Related works. Author

Technique

Dataset

Metric

Nelson Fumo and M.A. Rafe Biswas, 2015 [10]

Simple and multiple regression analysis

Case study: TxAIRE Research and Demonstration House#1 (Energy consumption, time, outdoor dry bulb temperature and global horizon radiation) University of Granada (Energy consumption and temperature)

Coefficient of determination, adjusted coefficient of determination and RMSE

Case study: French residential low energy building standards (Energy consumption, climatic data, occupancy profile, and operating conditions) Energy consumption data from Rinker hall (Climatic, occupancy and temporal data) Real time dataset (Energy consumption, climatic and temporal data)

Coefficient of determination and RMSE

Pecan street (energy consumption data)

MSE

Two real time datasets (Energy consumption, climatic, occupancy and temporal data) Rinker hall and Fine arts building C (energy consumption, weather, occupancy and temporal data) University of Granada (energy consumption and temperature)

MAPE and coefficient of variation

Ruiz L.G.B, Cuéllar M.P, Flores M.D.C, and Jiménez M.D.C.P, 2016 [11] Paudel S, Elmitri M, Couturier S, Nguyen P.H, Kamphuis R, Lacarrière B, and Corre O.L, 2017 [12] Wang Z, Wang Y, and Srinivasan R.S, 2018 [13] Ahmad T, Chen H, Huang R, Yabin G, Wang J, Shair J, Akram, H.M.A, Mohsan S.A.H, and Kazim M, 2018 [14] Muralitharan K, Sakthivel R, and Vishnuvarthan R, 2018 [15] Li K, Xie X, Xue W, Dai X, Chen X, and Yang X, 2018 [16] Wang Z, Wang Y, Zeng R, Srnivasan R.S, and Ahrentzen S, 2018 [2] Ruiz L.B.G, Rueda R, Cuéllar M.P, and Pegalajar M.C, 2018 [17]

Xiao J, Li Y, Xie L, Liu D, and Huang J, 2018 [18]

Mohan N, Soman K.P, and Kumar S.S, 2018 [19] Barman M, Choudhury N.B.D, and Sutradhar S, 2018 [20] Zhang J, Wei Y.M, Li D, Tan Z, and Zhou J, 2018 [21] Fan C, Wang J, Gang W, and Li S, 2019 [22] He F, Zhou J, Feng Z.K, Liu G, and Yang Y, 2019 [5] Wu Z, Zhao X, Ma Y, and Zhao X, 2019 [23] Lang Y, Niu D, and Hong W.C, 2019 [24]

Yang A, Li W, and Yang X, 2019 [25]

Yang Y, Che J, Deng C, and Li L, 2019 [26] Fan C, Sun Y, Zhao Y, Song M, and Wang J, 2019 [27] Zhong H, Wang J, Jia H, Mu Y, and Lv S, 2019 [6]

• Non-linear auto regressive model • Non-linear auto regressive model with exogenous inputs
• Support Vector Machine • Relevant data Vs. All data
Ensemble bagging trees
• Binary Decision Tree • Compact Regression Gaussian Process • Stepwise Gaussian Processes Regression • Generalized Linear Regression Model
• Conventional neural network • Neural network based genetic algorithm • Neural network based particle swarm optimization algorithm

Improved teaching learning based optimization algorithm based artificial neural network (feedback phase, accuracy factors, elite strategy) Random forest Learning model (Non-linear auto regressive model, Non-linear auto regressive model with exogenous inputs, Elman neural network and Elman neural network with exogenous input) weights optimized with CHC genetic algorithm Hybrid forecasting model - Group Method of Data Handling (GMDH) selective ensemble GMDH-based autoregressive model AdaBoost ensemble - Backpropagation NN, SVR, Genetic programming and RBFNN Dynamic mode decomposition

Grasshopper optimization algorithm based SVM
• Improved empirical mode decomposition • Autoregressive integrated moving average • Wavelet neural network • Fruit fly optimization
Advanced recurrent neural network based strategies
• Variational mode decomposition • Long short term memory • Bayesian optimization algorithm
• Modified general regression neural network • Non-dominated sorting-based multi-objective cuckoo search algorithm
• Empirical mode decomposition • Minimal redundancy maximal relevance • General regression neural network with fruit fly optimization algorithm
• Auto correlation function • Grey wolf optimization algorithm • Cross validation • Least squares SVM
Sequential grid approach based support vector regression
Deep learning techniques for feature engineering
Vector-field based support vector regression

MSE

Mean and standard deviation of coefficient of determination, RMSE, and MAPE MAPE and coefficient of variation

Coefficient of determination, RMSE, MAPE, and performance index MSE

Chinese total energy consumption and total oil consumption from 1978 to 2014 - China Statistical Yearbook

RMSE and MAPE

Australian energy market operator and North American electric utility (power consumption data) Case study: Assam Energy consumption dispatch centre (Energy consumption data) Case study: Australia and New York City (Energy consumption and temperature data)

RMSE, MAE, MAPE, and run time MAPE MAE, MAPE, MPE, and RMSE

Educational building in Hong Kong (Energy consumption, climatic, temporal and operation data) Hubei province, China (Energy consumption, temporal and climatic data) Energy consumption data from five states of Australia

RMSE, MAE, and Coefficient of Variation of the RMSE

Case study: Energy consumption data in Langfang, China

Relative Error (RE), MAE, RMSE, MAPE, and Theil's inequality coefficient (TIC)

Energy consumption data from New South Wales (NSW), Victoria (VIC) and Queensland (QLD) in Australia

MAE, MAPE, and R²

Energy consumption data from Jiangxi province and California electric utility Case study: Building operation data from an educational building in Hong Kong Case study: Office building in a coastal town of Tianjin, China

MAE

RMSE, MAE, MAPE, and R² Average Error (AE), MAE, MSE, and MAPE

RMSE, MAE, and CV-RMSE Correlation coefficient (R), MAE, RMSE, Relative Absolute Error (RAE), and Root Relative Squared Error (RRSE)


Table 1 (continued)

Author | Technique | Dataset | Metric
Seyedzadeh S, Rahimian F.P, Rastogi P, and Glesk I, 2019 [28] | Tuning machine learning models (artificial neural networks, support vector machine, Gaussian process, random forest and gradient boosted regression trees) | Ecotect and EnergyPlus dataset | RMSE, MAE, and R²
Tian C, Li C, Zhang G, and Lv Y, 2019 [7] | Parallel learning scheme and generative adversarial nets | Building energy consumption data from retail building at Fremont, CA and Easton centre, Beijing | MAE, MAPE, and Pearson correlation coefficient
Tae Young Kim and Sung Bae Cho, 2019 [29] | Convolutional neural network and long short term memory | Individual household electric power consumption dataset from UCI machine learning repository | MSE, RMSE, MAE, and MAPE

based on a set of random candidate solutions and intensive search space with simple sine and cosine functions; (ii) the adaptive range of SCOA lets it switch from exploration [< −1, > 1] to exploitation [−1, 1] using simple sine and cosine functions; (iii) tendency towards the best region of the search space, since the solutions update their positions around the best solution obtained so far. Further, the performance of SCOA on different benchmark functions (unimodal, multi-modal, composite, etc.) shows that it outperforms traditional optimization techniques such as the genetic algorithm, particle swarm optimization algorithm, bat algorithm, and gravitational search algorithm [39,41,42]. Despite these advantages, the accuracy and the convergence of SCOA are affected by the tuning and randomness of a few internal parameters. To address these challenges, several research works have hybridized SCOA with other meta-heuristic algorithms (particle swarm optimization, whale optimization, grey wolf optimization, etc.) and incorporated various strategies like elitism, opposition based learning, chaos theory, etc. [38].

1.5. Research gaps

Based on an extensive literature study on building energy forecasting (academic, commercial, and residential buildings), the identified research gaps are highlighted as follows:

• Recent literature employs pure LSTM or a hybrid model (Time series decomposition + Optimization + LSTM) that focuses on improving the forecast accuracy of the model for energy consumption prediction. In the hybrid model, each technique is employed to carry out a specific task and not to tune the hyperparameters of LSTM for improving its performance to achieve minimal forecast error. Hence, research works on improving the performance of LSTM for its application to specific real-world problems remain nil.
• According to T Y Kim and S B Cho, “The potential limitation of the learning model lies in relatively large efforts by trial and error to determine the optimal hyperparameters. To work out this problem, we need to automate searching for the best hyperparameters that can automatically search for the hyperparameter space of the learning model” [29].
• Further, most of the research works have demonstrated the performance of LSTM for energy consumption prediction using static data (benchmark datasets) rather than a working model on real time building operational data.

1.6. Novelty and contributions

Given the existence of similar methods and algorithms in the recent literature, a general query comes up: “How is ISCOA-LSTM different towards its application to energy consumption forecasting?”. This question can be answered by highlighting the unique features of ISCOA-LSTM as follows:

1. Multi-variate or Univariate: ISCOA-LSTM works for multivariate and univariate time series energy consumption data.
2. Pure or hybrid LSTM to achieve forecast accuracy: ISCOA-LSTM improves the performance of the LSTM through the identification of the optimal hyperparameters (learning rate, momentum, weight decay, and the number of hidden units) using an improved version of sine cosine optimization.
3. ISCOA + LSTM? The application of the Haar wavelet operator to improve the divergence and convergence nature of a simple and efficient sine cosine optimization algorithm proves to be an efficient solution for the identification of optimal hyperparameter values of LSTM in real time.
4. Live or Static Data? ISCOA-LSTM has been implemented on the building energy management system designed for KReSIT, an academic building in IIT Bombay.

With the insights on the unique features of ISCOA-LSTM for energy consumption prediction, the major contributions of this research are highlighted as follows:

1. We present eDemand, an energy consumption forecasting model, and ISCOA-LSTM, an improved version of LSTM for accurate energy consumption forecast with respect to Short Term Forecast (STF), Midterm Forecast (MTF), and Long Term Forecast (LTF).
2. ISCOA, an improved version of SCOA, was used for the identification of optimal hyperparameter values for learning rate, weight decay, momentum, and the number of hidden units to enhance the forecasting accuracy of LSTM for building energy consumption forecasting.
3. The performance of the traditional SCOA was improved with the introduction of a novel mutation operator based on Haar wavelet to minimize the trade-off between the exploration and exploitation of populations, thereby avoiding premature convergence.
4. The effectiveness of ISCOA-LSTM on the building energy consumption forecasting problem was demonstrated using a case study on real-time energy consumption data obtained from Kanwal Rekhi building, an academic building at Indian Institute of Technology, Bombay (IIT-B), in terms of various quality metrics.
5. The experimental validations were carried out for different forecasting scenarios, namely LTF, MTF, and STF, and compared with the state-of-the-art building energy consumption forecasting models.
6. Further, we have also studied the impact of the number of hidden layers, hidden units, and dropout factor on the forecasting accuracy of ISCOA-LSTM for LTF, MTF, and STF.
7. ISCOA-LSTM can be applied to the design of demand-side management programs, pricing strategies, transmission expansion plans, energy anomaly detection, electricity theft, and future load and power generation prediction.

The rest of the paper is organized as follows. Section 2 provides an insight into basic terminologies of the energy consumption forecasting problem, RNN, LSTM, and the sine cosine optimization algorithm. Section 3 introduces eDemand, an energy consumption forecasting model, and ISCOA-LSTM, an improved sine cosine optimization algorithm based LSTM for building energy consumption forecasting. Section 4 discusses the performance evaluation of ISCOA-LSTM over the existing building energy consumption forecasting models in terms of various quality metrics. Section 5 concludes the paper.

2. Materials and methods

This section provides detailed insight into the mathematical background of energy consumption forecasting, long short term memory neural networks, and the sine cosine optimization algorithm.

2.1. Mathematical formulation of energy consumption forecasting problem

Let us consider a multivariate building energy forecasting problem with $n$ distinct variables, i.e., sensors deployed at various parts of the building. For each timestamp, the electrical energy consumed by various components, i.e., electrical appliances in the building, is recorded by the sensors and represented as in Eq. (1).

$$X_t = \{x_t^1, x_t^2, \ldots, x_t^i, \ldots, x_t^n\} \tag{1}$$

where $x_t^i$ represents the energy consumption recorded by the $i$th sensor at the $t$th timestamp. As most data driven forecasting models utilize a window-based approach for forecasting, let $\{I_l, O_l\} \in \mathbb{N}$ be the input window and forecast window size, respectively. Therefore, the total number of input and forecast windows is $k = (S_n - I_l - a) - O_l$, where $S_n$ is the total number of samples, and $a$ is the forecast interval. The input window ($S_I$) of size $I_l$ is represented as in Eq. (2) (see Fig. 1).

$$S_I = \{X_t, X_{t+1}, \ldots, X_{t+k}\} \tag{2}$$

Similarly, the forecast window ($S_o$) of size $O_l$ is denoted as in Eq. (3).

$$S_o = \{X_t, X_{t+1}, \ldots, X_{t+k}\} \tag{3}$$

Therefore, using the LSTM neural network, a non-linear approximation function ($f$) that relates Eqs. (2) and (3) can be defined as in Eq. (4).

$$S_o = f(S_I), \quad f: \mathbb{R}^{I_l \times n} \rightarrow \mathbb{R}^{O_l \times n} \tag{4}$$

From Eq. (4), for a given time window ($I_l$), i.e., the current state of the sensors, the model ($f$) learns to forecast their values for the time window ($O_l$) with minimal forecast error ($FE$) (Eq. (5)).

$$FE = \frac{1}{n} \sum_{i=1}^{n} \left| x_t^i - \hat{x}_t^i \right| \tag{5}$$

where $x_t^i$ and $\hat{x}_t^i$ are the actual and forecasted energy consumption values of the $i$th sensor at the $t$th timestamp. The input window size and forecast window size can be adjusted based on the nature of the forecast, i.e., short term, midterm, and long term.

Fig. 1. Forecasting energy consumption value with input window size and forecast window size.

2.2. Long short term memory

In general, neural network models can be categorized into Feed Forward Neural Network (FFNN) and Recurrent Neural Network (RNN). FFNNs are extensively used to process data in the spatial domain, leaving out the data occurrence with respect to time, i.e., temporal information. On the other hand, RNN architectures can be viewed as loopback architectures with interconnected neurons for modeling both sequential and time dependencies among the data on a larger scale [43]. The standard representation of the RNN architecture is given in Fig. 2. Each node in the network receives the input from the current state ($x_t$) and the hidden state values of the hidden layers from the previous state ($h_{t-1}$). In simpler terms, a single neuron and its feedback loop act as an information processing unit and memory respectively, such that the input at time $t$ will have an impact on the future output of the network through recurrent connections. Eqs. (6) and (7) provide the fundamental computations of RNN.

$$y_t = \sigma(W_{yh} h_t + b_y) \tag{6}$$

$$h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \tag{7}$$

where $\sigma$ is the activation function; $W_{xh}$, $W_{hh}$ and $W_{yh}$ are the weight matrices of the input-hidden layer, hidden-hidden layer and hidden-output layer, respectively; $b_h$ and $b_y$ are the hidden and output bias, respectively. In general, back propagation through time is used to learn the weights of the RNN network connections; however, this approach is vulnerable when handling long term dependencies. Since these values get back-propagated into the activation functions, RNN suffers from the “vanishing or exploding gradient problem” due to the propagation of local errors while handling sequences of long intervals. LSTM, a significant advancement over RNN, uses ‘self-connected’ memory cells and gate units in the hidden layer to address the ‘vanishing gradient problem’ in RNN [33]. The self-connected memory cells enable the model to learn the long term dependencies while handling sequential data. Further, the four gate units, namely input gate ($i_t$), update gate ($g_t$), forget gate ($f_t$) and output gate ($o_t$), enable the model to write/update, forget and read the information from the memory cells, respectively. Altogether, LSTM with self-connected memory cells, four gate units, input node, and internal state node provides an intelligent approach to sustain constant error through retaining relevant information and ‘forgetting’ irrelevant information. Fig. 3 provides detailed insight into the internal architecture of LSTM.

Fig. 2. Simple recurrent neural network – Unrolling in time.

Unlike traditional RNN, LSTM consists of memory blocks with one or more memory cells acting as neurons with multiplicative gates (input ($i_t$), update ($g_t$), forget ($f_t$) and output ($o_t$)). The input gate ($i_t$) and update gate ($g_t$) perform the write function (input gate: which values to write; update gate: create a vector of new cell values) in the memory cells of LSTM, while the forget gate ($f_t$) scales the internal state of the cell to incorporate gradual forgetting in the memory cell. The output gate performs the read function, which is then combined with the memory cell to compute the output of the memory cell ($h_t$). Each gate is guided by its activation function (sigmoid or hyperbolic tangent function) that controls the flow of information in and out of the memory units. At time step $t$, the gates receive two inputs, i.e., the input data at $t$ ($x_t$) and the output of the same memory unit obtained from the previous time step ($h_{t-1}$). Eqs. (8)–(12) provide a set of equations that govern the working of each gate in LSTM.

Fig. 3. Long short term memory – Memory block (memory cell and gate units).

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \tag{8}$$

$$g_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \tag{9}$$

The forget gate ($f_t$) regulates the amount of information to be deleted from the memory unit (range: (0,1); 0-forget all and 1-remember all).

$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \tag{10}$$

Each memory unit recursively updates its values through the interaction of the previous state values ($c_{t-1}$) with the write and forget gate values.

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \tag{11}$$

The output gate ($o_t$) controls the flow of information out of the memory unit based on the activation function, which binds the output values and determines which value will be provided as an output by the output gate ($o_t$).

$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \tag{12}$$

where $W_{ix}$, $W_{cx}$, $W_{fx}$ and $W_{ox}$ are the input weight matrices; $W_{fh}$, $W_{ch}$, $W_{ih}$ and $W_{oh}$ are the recurrent weight matrices. Finally, the hidden state of the output unit ($h_t$) and the overall output of the LSTM unit ($y_t$) are computed using Eqs. (13) and (14) respectively.

$$h_t = o_t \odot \tanh(c_t) \tag{13}$$

$$y_t = \sigma(W_{hy} h_t + b_y) \tag{14}$$

where $W_{hy}$ is the hidden-output weight matrix; $b_n$ is the bias ($n \in \{i, c, f, h\}$). As in standard practice, the initial values of the weights and bias of the LSTMs are randomly generated during the training process. In general, the backpropagation algorithm that employs the Standard Gradient Descent (SGD) method was used to update weights and bias. However, the performance of SGD relies on hyperparameters like learning rate, weight decay, momentum, number of hidden units, etc. Hence, in this work, we attempt to identify the optimal values of these hyperparameters to improve the forecasting accuracy of LSTM for time series problems.

2.3. Sine cosine optimization algorithm

A group of optimization approaches known as metaheuristic algorithms has attracted the research community since they mimic natural behavior to find optimal solutions for real world problems. In this context, metaheuristic algorithms (individual or population-based) fall into four categories with respect to their underlying metaphors, namely algorithms based on evolutionary theory (genetic algorithm, differential evolution, etc.), physical or mathematical concepts (sine cosine optimization, gravitational search algorithm, etc.), swarms (particle swarm optimization, whale optimization, etc.) and human concepts (teacher learning, mine blast, etc.). Individual or population-based metaheuristic algorithms have shown outstanding performance in solving several combinatorial optimization problems including parameter optimization and parameter setting [38,44–47]. Sine Cosine Optimization Algorithm (SCOA) is a population based metaheuristic algorithm proposed by Seyedali Mirjalili that uses simple sine and cosine mathematical operators for solving optimization problems [37]. In SCOA, the search for the global optimal solution starts with a set of random candidate solutions (positions) and uses sine and cosine functions to update their positions either towards or away from the best solution (Eqs. (15)–(17)). Different regions of the search space are explored when the sine and cosine terms return values greater than 1 or less than −1. Similarly, the promising regions of the search space are exploited when the values returned by the sine and cosine terms lie in the range [−1, 1].

$$X_i^{t+1} = X_i^t + r_1 \times \sin(r_2) \times \left| r_3 P_i^t - X_i^t \right| \tag{15}$$

$$X_i^{t+1} = X_i^t + r_1 \times \cos(r_2) \times \left| r_3 P_i^t - X_i^t \right| \tag{16}$$

Combining Eqs. (15) and (16),

$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \sin(r_2) \left| r_3 P_i^t - X_i^t \right|, & r_4 < 0.5 \\ X_i^t + r_1 \cos(r_2) \left| r_3 P_i^t - X_i^t \right|, & r_4 \ge 0.5 \end{cases} \tag{17}$$

where $X_i^t$ is the current candidate’s position in the $i$th dimension at the $t$th iteration; $P_i^t$ is the best candidate’s position in the $i$th dimension at the $t$th iteration; $r_1$, $r_2$, $r_3$ and $r_4$ are random variables. The parameter $r_1$ determines the next search region to be explored (Eq. (18)); $r_2$ defines the direction of movement towards or away from the best solution and lies in the range $[0, 2\pi]$; $r_3$ is a random weight that stochastically emphasizes or de-emphasizes the effect of the destination on the current movement; and $r_4$ is a random number in the range [0,1] that balances exploration and exploitation of the search space by switching between the sine and cosine functions.

$$r_1 = \alpha \left( 1 - \frac{t}{T} \right) \tag{18}$$

where $\alpha$ is a constant; $t$ is the current iteration; $T$ is the total number of iterations. Algorithm 1 provides the pseudocode of SCOA. The time complexity of SCOA is $O(N \times T \times C)$, where $N$ is the total number of candidates and $C$ is the time cost of updating the position of each candidate.

Algorithm 1: Sine Cosine Algorithm (SCOA)

Input: N (total number of candidates), T (maximum number of iterations)
Output: X* (best candidate)
SCOA()
1. Randomly generate the position for N candidates
2. While (!Termination) do
3.   Evaluate each candidate using the fitness function
4.   Identify the best candidate (X*)
5.   Update r1, r2, r3, r4
6.   Update the position of each candidate using Eq. (17)
7. end
8. return X*

Due to its inherent benefits such as fast convergence speed, high optimization accuracy, global search ability, and fewer tuning parameters, SCOA has been successfully adopted for various research problems in handwritten text recognition, wind speed forecasting, object tracking, etc. Besides, the performance of SCOA has also been verified on multi-objective optimization problems with several unimodal, multi-modal, and complex benchmark functions.

3. Proposed energy consumption forecasting model

This section provides detailed explanations of eDemand, the proposed energy consumption forecasting model, and of the Improved Sine Cosine Optimization Algorithm based LSTM (ISCOA-LSTM), a novel data-driven approach for accurate energy consumption forecasting.

3.1. eDemand – Architecture

Fig. 4 presents the generic architecture of eDemand, the proposed energy consumption forecasting model. eDemand consists of four layers, namely (i) Data acquisition and storage layer, (ii) Data pre-processing layer, (iii) Data analytics layer, and (iv) Application layer. Each layer in eDemand is composed of several modules to perform their intended functions, as detailed below.

(i) Data acquisition and storage layer

Smart buildings employ centralized software, namely a building energy management system, characterized by several digital controllers (sensors, actuators, etc.) that provide an asynchronous communication architecture to interact with distributed automation devices. The automated software collects, aggregates, and stores the building energy consumption data and other relevant factors such as occupancy, climatic data (temperature, humidity, etc.), and device operation status in the database for further processing. We have employed correlation coefficient analysis to study the correlation between energy consumption and the relevant factors. More details on the nature of the data are given in Section 4.1.

(ii) Data pre-processing layer

The building automation software collects and stores the raw data, which usually contains noisy, unreliable, incomplete, and missing values due to faulty or broken devices, transmission errors, etc. In general, irregular patterns and missing data are handled by moving average filter, sliding window, and linear interpolation techniques [11,17]. In addition, min-max normalization is employed to normalize the data to the range [0,1] and to ease the stable convergence of the weights and biases of the learning model. Further, the pre-processed energy consumption dataset is divided at random in the ratio 60:20:20 for training, evaluation, and testing, respectively.

(iii) Data analytics layer

The data analytics layer employs LSTM networks to forecast energy consumption at the user-specified time. Further, an improved sine cosine optimization algorithm is used iteratively to optimize the hyperparameters of the LSTM (learning rate, number of hidden layers, momentum, and decay factor) to enhance its forecasting accuracy. The validation of ISCOA-LSTM is assessed using the test dataset (20% of the energy consumption dataset). The learning process of ISCOA-LSTM completes when the fitness value (mean square error) is minimum, i.e., when the difference between the forecasted value and the actual value is minimum.

(iv) Application layer

After the completion of the validation process, the energy consumption for the user-specified time interval is forecasted using ISCOA-LSTM.

3.2. ISCOA-LSTM: Proposed data driven approach for energy consumption forecasting

The main objective of ISCOA-LSTM is to minimize the trade-off between the computational complexity and the forecasting error of LSTM through the identification of the optimal combination of the hyperparameters, i.e., learning rate, weight decay, momentum, and number of hidden units. The overall working of ISCOA-LSTM can be elaborated through four distinct phases (Algorithm 2), namely (i) Encoding strategy, i.e., generation of the population, (ii) Hyperparameter optimization, (iii) Population updation, i.e., updating the position of each population using the Haar wavelet based mutation operator, and (iv) Performance evaluation of ISCOA-LSTM (Fig. 5). A step by step procedure for each phase of ISCOA-LSTM is detailed below.

Step 1: Data Pre-processing:- Normalize the historical power consumption dataset ($D_{PC} = x(t);\ t = \{1, 2, \ldots, n\}$) to the range [0,1]. Generate the training ($D_{Train}$), evaluation ($D_{Eval}$), and testing ($D_{Test}$) samples in the ratio 60:20:20, respectively, using random sampling without replacement.
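The pre-processing described above (min-max normalization to [0,1] followed by a random 60:20:20 split without replacement) can be sketched as follows. This is a minimal illustration assuming the samples are held in a plain Python list; the function names `min_max_normalize` and `split_60_20_20` and the fixed seed are hypothetical and are not taken from the original implementation.

```python
import random


def min_max_normalize(series):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:                          # constant series: avoid division by zero
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def split_60_20_20(samples, seed=42):
    """Random 60:20:20 split into training, evaluation, and testing sets."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)                      # shuffling indices = sampling without replacement
    n_train = int(0.6 * len(samples))
    n_eval = int(0.2 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    evaluation = [samples[i] for i in idx[n_train:n_train + n_eval]]
    test = [samples[i] for i in idx[n_train + n_eval:]]
    return train, evaluation, test


power = [float(v) for v in range(1000)]   # placeholder readings, not real data
normalized = min_max_normalize(power)
d_train, d_eval, d_test = split_60_20_20(normalized)
```

For a sequence model, each element of `samples` would typically be an input window with its target value rather than a single reading, so that shuffling does not break the temporal order inside a sample; that windowing step is omitted here.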

Fig. 4. eDemand: Architecture of the proposed building energy consumption forecasting model.

Algorithm 2: ISCOA-LSTM based energy consumption forecasting model

Input: DEC ← x(t); {t = 1, 2, ..., n} // Energy consumption dataset
Output: Optimal learning rate, optimal weight decay, optimal momentum rate, optimal number of hidden units
ISCOA-LSTM()
*** Parameter Initialization ***
1. Initialize the number of populations (NPoP), the maximum number of generations (TMax), t ← 0, Fit ← 0, BestFit ← 0, GBestFit ← 0
*** Generation of Training, Evaluation and Testing dataset ***
2. DTrain ← x(t); {t = 1, 2, ..., n × 0.6}
3. DEval ← x(t); {t = (n × 0.6) + 1, ..., n × 0.8}
4. DTest ← x(t); {t = (n × 0.8) + 1, ..., n}
*** Identification of Optimal Hyperparameters ***
5. for each i ← 1 to NPoP begin
6.   Randomly generate the position of the particle PoPi and compute the hyperparameters using Eqs. (19) and (20), respectively
7.   Train the LSTM using the computed parameters and DTrain
8.   Calculate the mean square error (MSE) for PoPi with DEval
9.   Fit[i] ← MSE
10. end for
11. BestFit ← Min(Fit)
12. GBestFit ← BestFit
13. GBestPoP ← PoP(Bestindex)
14. while (t ≤ TMax) begin
15.   Update the position of each particle with GBestPoP using Eq. (17)
16.   for each i ← 1 to NPoP begin
17.     Compute the hyperparameters from PoPi using Eq. (20)
18.     Train the LSTM using the computed parameters and DTrain
19.     Calculate the mean square error (MSE) for PoPi with DEval
20.     Fit[i] ← MSE
21.   end for
22.   BestFit ← Min(Fit)
23.   if (BestFit < GBestFit) begin
24.     GBestFit ← BestFit
25.     GBestPoP ← PoP(Bestindex)
26.   end if
27.   Update the position of each particle with GBestPoP using Eq. (17)
28.   if (rand ≤ Mt) begin
29.     Update the position of each particle using Eq. (23)
30.   end if
31.   t ← t + 1
32. end while
33. Compute MAE, MAPE, MSE, RMSE, Theil U1 and Theil U2 with the optimal parameters and DTest

Fig. 5. ISCOA-LSTM: Workflow of the proposed building energy consumption forecasting model.

Step 2: Encoding strategy:- In traditional SCOA, the populations are generated randomly within a specified range ([Lowerlimit, Upperlimit]), and the best population for the position update is obtained based on the fitness value. ISCOA-LSTM employs a vector encoding strategy for the generation of the initial population, as it must optimize multiple parameters (learning rate, weight decay, momentum, and number of hidden units), each with a unique range. In this encoding strategy, the position of each population is represented as a vector whose length corresponds to the number of parameters to be optimized. For example, in ISCOA-LSTM, we optimize four hyperparameters, namely learning rate ($\eta_i$), weight decay ($\lambda_i$), momentum factor ($\mu_i$), and number of hidden units ($\nu_i$); therefore, each population vector is represented as in Eq. (19).

$$PoP_i = [\eta_i, \lambda_i, \mu_i, \nu_i]; \quad i = (1, 2, \ldots, N_{PoP}) \tag{19}$$

where $N_{PoP}$ is the total number of populations. A major drawback of the vector encoding strategy is that the population vector can be generated only in a single specified range. Therefore, we generate each population vector at random in the range [0,1] and convert its elements into their corresponding parameter-specific ranges using Eq. (20).

$$f_v = PoP_{Min} + [PoP_{Max} - PoP_{Min}] \times PoP_v \tag{20}$$

where $PoP_{Min}$ and $PoP_{Max}$ are the minimum and maximum values of the hyperparameter; $PoP_v$ is the randomly generated population value.

Step 3: Train LSTM:- During the training process, the hyperparameters obtained from each population (Step 2) along with the training dataset ($D_{Train}$) are used to train the LSTM.

Step 4: Evaluate LSTM:- During the evaluation process, the performance of the LSTM trained with the hyperparameters from each population is validated using the evaluation dataset ($D_{Eval}$), with the Mean Square Error (MSE) as the fitness function (Eq. (21)).

$$MSE = \frac{1}{n} \sum [y_{Eval} - y_{Forecast}]^2 \tag{21}$$

where $n$ is the number of samples in the evaluation dataset; $y_{Eval}$ and $y_{Forecast}$ are the actual and forecasted values, respectively.

Step 5: Termination condition:- The evaluation process of the LSTM returns the fitness value of each population, and the population with the least MSE (fitness value) is identified as the potential solution. When the termination condition (maximum number of iterations) is reached, the performance of the LSTM is evaluated in terms of MAE, MAPE, MSE, RMSE, Theil U1 and Theil U2 using the test dataset ($D_{Test}$). Otherwise, the position of each population is updated using Step 6.

Step 6: Position update:- The position of each population is updated using Eq. (17). According to Elaziz, M. A, Oliva D, and Xiong S, “One of the significant drawbacks of traditional SCOA is the convergence at the local optima due to the randomness of its internal parameters” [38]. To overcome this challenge in conventional SCOA, we have introduced a Haar wavelet based mutation operator for the position update in SCOA. During this procedure, a mutation probability ($M_t$) is computed using Eq. (22); if the condition ($rand \le M_t$) is satisfied, random points (hyperparameters) are selected from the respective population vector and their corresponding mutated values are obtained using Eq. (23).

$$M_t = 1 - \frac{t}{T} \tag{22}$$

where $t$ is the current iteration; $T$ is the maximum number of iterations.

$$PoP_i^{t+1} = \begin{cases} PoP_i^t + \sigma \, (PoP_{Max} - PoP_i^t), & \text{if } \sigma < 0 \\ PoP_i^t + \sigma \, (PoP_i^t - PoP_{Min}), & \text{otherwise} \end{cases} \tag{23}$$

where $\sigma$ is the Haar wavelet function (Eq. (24)).

$$\sigma = \frac{1}{\sqrt{a}} \, e^{-\frac{(x/a)^2}{2}} \cos\left( 5 \, \frac{x}{a} \right) \tag{24}$$

where $a$ is a random number in the range [−2.5,2.5].

3.2.1. Example

To improve the understanding of the working of ISCOA-LSTM, let us consider a sample dataset with 1000 observations obtained from the considered energy consumption dataset from KReSIT, IIT Bombay. As mentioned earlier, the major objective of ISCOA-LSTM is to achieve minimum forecast error through the identification of the optimal values for learning rate, weight decay, momentum factor, and number of hidden units. The working of ISCOA-LSTM begins with the initialization of NPoP = 5, TMax = 10, t = 0, Fit = 0, BestFit = 0, GBestFit = 0. Subsequently, the input dataset is divided into training ($D_{Train}$), evaluation ($D_{Eval}$), and testing ($D_{Test}$) datasets with 600, 200, and 200 samples, respectively (Line). According to Step 2 (Encoding strategy), the initial population for ISCOA, represented as vectors of length 4, is generated randomly in the range [0,1], as reported in Table 2. Further, the values of the hyperparameters are computed from each population vector using Eq. (20) (Table 3), and the LSTM is trained with the samples from $D_{Train}$. The fitness value of each population is obtained by evaluating the LSTM with $D_{Eval}$ in terms of MSE (Eq. (21)), and the population with the minimum fitness value (MSE) is considered the best solution for the current iteration. Table 4 presents the fitness value of each population during iteration 0.

If the termination condition is not met, the position update procedure for the subsequent iteration is carried out based on the position of the best population and the random variables ($r_1$, $r_2$, $r_3$ and $r_4$) of ISCOA (Table 5). Further, the mutation probability ($M_t$) is computed. If the condition ($rand \le M_t$) is satisfied, the Haar wavelet based mutation operator (position update) is applied to the updated population at random points and the mutated population is fed as input to the subsequent iteration (Table 6); otherwise, the updated population is passed on to the next iteration. Finally, the fitness of each population with the updated position is computed and the best population is identified (Table 7). This process repeats until the termination condition (maximum number of generations) is satisfied. Table 8 provides the best fitness value (MSE) for ten iterations. Further, the performance of the LSTM with the optimal hyperparameters is verified in terms of MAE, MAPE, MSE, RMSE, Theil U1 and Theil U2 using $D_{Test}$.

Table 2 Iteration 0: Initial population representation.

| Population | Learning rate | Weight decay | Momentum | Hidden units |
|---|---|---|---|---|
| PoP1 | 0.34 | 0.25 | 0.72 | 0.62 |
| PoP2 | 0.21 | 0.42 | 0.41 | 0.11 |
| PoP3 | 0.86 | 0.54 | 0.72 | 0.57 |
| PoP4 | 0.24 | 0.42 | 0.52 | 0.72 |
| PoP5 | 0.11 | 0.72 | 0.52 | 0.51 |
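Steps 2 and 6 above can be sketched in a few lines of plain Python. This is a hedged illustration rather than the authors' implementation: the parameter ranges are taken from the parameter settings reported for ISCOA-LSTM, while the constant `a_const = 2.0`, the sampling range of `r3`, the fixed dilation `a = 1.0` with a random wavelet argument `x` in [−2.5, 2.5], the clipping of mutated values to [0,1], and all function names are assumptions introduced only for this example.

```python
import math
import random

# (min, max) per hyperparameter: learning rate, weight decay, momentum, hidden units
RANGES = [(0.001, 0.1), (1e-6, 1e-4), (0.1, 0.9), (10, 300)]


def decode(pop):
    """Eq. (20): map a [0,1]-encoded vector to parameter-specific ranges."""
    values = [lo + (hi - lo) * v for (lo, hi), v in zip(RANGES, pop)]
    values[3] = int(round(values[3]))     # hidden units must be an integer
    return values


def scoa_update(pop, best, t, T, a_const=2.0, rng=random):
    """Eqs. (17)-(18): move each element towards or away from the best vector."""
    r1 = a_const * (1 - t / T)            # shrinks to 0 as t approaches T
    updated = []
    for x, p in zip(pop, best):
        r2 = rng.uniform(0, 2 * math.pi)
        r3 = rng.uniform(0, 2)            # assumed range of the random weight
        r4 = rng.random()
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        updated.append(x + r1 * trig * abs(r3 * p - x))
    return updated


def wavelet(x, a=1.0):
    """Eq. (24) with an assumed positive dilation a."""
    return (1 / math.sqrt(a)) * math.exp(-((x / a) ** 2) / 2) * math.cos(5 * x / a)


def mutate(pop, t, T, rng=random, lo=0.0, hi=1.0):
    """Eqs. (22)-(23): mutate one random element with probability M_t."""
    m_t = 1 - t / T
    if rng.random() > m_t:                # mutation applies only when rand <= M_t
        return list(pop)
    out = list(pop)
    k = rng.randrange(len(out))           # random point in the vector
    sigma = wavelet(rng.uniform(-2.5, 2.5))
    if sigma < 0:
        out[k] += sigma * (hi - out[k])
    else:
        out[k] += sigma * (out[k] - lo)
    out[k] = min(hi, max(lo, out[k]))     # assumed: keep the encoding inside [0,1]
    return out
```

A full optimization loop would call `decode` on each vector, train and evaluate an LSTM to obtain the MSE fitness, and then apply `scoa_update` followed by `mutate`; the training step is omitted because it depends on the deep learning framework in use.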

Table 3 Iteration 0: Parameter values.

| Population | Learning rate | Weight decay | Momentum | Hidden units |
|---|---|---|---|---|
| PoP1 | $9 \times 10^{-3}$ | $8 \times 10^{-8}$ | 0.84 | 20 |
| PoP2 | $2 \times 10^{-3}$ | $2 \times 10^{-6}$ | 0.73 | 38 |
| PoP3 | $3 \times 10^{-3}$ | $5 \times 10^{-6}$ | 0.52 | 14 |
| PoP4 | $6 \times 10^{-3}$ | $1 \times 10^{-4}$ | 0.62 | 42 |
| PoP5 | $4 \times 10^{-3}$ | $2 \times 10^{-6}$ | 0.41 | 32 |

Table 4 Iteration 0: Fitness value.

| Population | Fitness value |
|---|---|
| PoP1 | 0.605 |
| PoP2 | 0.730 |
| PoP3 | 0.705 |
| PoP4 | 0.886 |
| PoP5 | 0.645 |

Table 5 Population updation: Before mutation.

| Population | Learning rate | Weight decay | Momentum | Hidden units |
|---|---|---|---|---|
| PoP1 | $2.3 \times 10^{-3}$ | $4.2 \times 10^{-8}$ | 0.72 | 42 |
| PoP2 | $0.4 \times 10^{-4}$ | $5.4 \times 10^{-6}$ | 0.41 | 15 |
| PoP3 | $5.2 \times 10^{-3}$ | $8.7 \times 10^{-6}$ | 0.71 | 16 |
| PoP4 | $4.6 \times 10^{-3}$ | $8.4 \times 10^{-4}$ | 0.73 | 28 |
| PoP5 | $6.2 \times 10^{-3}$ | $2.1 \times 10^{-6}$ | 0.72 | 18 |

Table 6 Population updation: After mutation.

| Population | Learning rate | Weight decay | Momentum | Hidden units |
|---|---|---|---|---|
| PoP1 | $2.3 \times 10^{-3}$ | $3.1 \times 10^{-8}$ | 0.72 | 42 |
| PoP2 | $6.1 \times 10^{-4}$ | $5.4 \times 10^{-6}$ | 0.41 | 42 |
| PoP3 | $5.2 \times 10^{-3}$ | $1.9 \times 10^{-3}$ | 0.40 | 16 |
| PoP4 | $5.3 \times 10^{-3}$ | $8.4 \times 10^{-4}$ | 0.18 | 27 |
| PoP5 | $6.2 \times 10^{-3}$ | $4.8 \times 10^{-6}$ | 0.31 | 18 |

Table 7 Iteration 1: Fitness value.

| Population | Fitness value |
|---|---|
| PoP1 | 0.576 |
| PoP2 | 0.427 |
| PoP3 | 0.286 |
| PoP4 | 0.237 |
| PoP5 | 0.414 |

Table 8 Best fitness value summary.

| Iteration | Fitness value |
|---|---|
| Iteration 1 | 0.237 |
| Iteration 2 | 0.228 |
| Iteration 3 | 0.211 |
| Iteration 4 | 0.174 |
| Iteration 5 | 0.166 |
| Iteration 6 | 0.153 |
| Iteration 7 | 0.026 |
| Iteration 8 | 0.017 |
| Iteration 9 | 0.009 |
| Iteration 10 | 0.003 |

4. Case study: ISCOA-LSTM on KReSIT building energy consumption data

This section demonstrates the performance of ISCOA-LSTM over the state-of-the-art energy consumption forecasting models through a case study on the Kanwal Rekhi building energy consumption dataset in terms of various quality metrics.

4.1. Dataset description

The performance of ISCOA-LSTM, the proposed building energy consumption forecasting model, was assessed using the building energy consumption data obtained from Kanwal Rekhi School of Information Technology (KReSIT), an academic building in the Indian Institute of Technology (IIT), Mumbai, India. The annual average temperature of Mumbai is 17–34 °C, with three seasons, namely summer (March-May), monsoon (June-September), and winter (October-February). KReSIT is a four-storeyed building which comprises classrooms, offices, lecture halls, laboratories, research centers, and a server room. Each floor in KReSIT comprises three wings, namely Wing A, Wing B, and Wing C. KReSIT owns a Building Energy Management System (BEMS) (Objective: “Zero consumption during zero occupancy” and “Minimal power consumption in an occupied room”), which is used to monitor and control the power consumption and temperature profiles in its distributed facilities. Fig. 6 and Fig. 7 provide the architecture and overview of the BEMS at the KReSIT building, respectively. The smart meters, i.e., Advanced Metering Infrastructure (AMI), deployed at the facilities record the power consumption data of the air conditioners, lights, fans, and power sockets at per-second granularity. The energy consumption data from the smart meters are passed on to the Raspberry Pi through the MODBUS communication protocol, and the data from the sensor nodes (temperature, humidity, etc.) are passed on to the gateway, which then sends them to the server for further processing. In the server, the Message Queuing Telemetry Transport (MQTT) broker receives the data from the Raspberry Pi and the gateway module over Ethernet or wireless links. The data logger maps the publish and subscribe topics for efficient live streaming and storage of the power consumption data at per-second granularity. Further, Apache Spark was employed to aggregate the live power consumption data to the required granularity level.

Table 9 provides the attributes of the KReSIT power consumption dataset. The overall power consumption data (W) recorded at the smart meter installed at MAINS was used for experimentation purposes. The complete energy consumption dataset of KReSIT is available in [48]. For experimental analysis, we have considered the energy consumption data for the period from January 2017 to October 2018 at 30 min granularity. Fig. 8(a) and (b) provide the descriptive statistics of the energy consumption data for the years 2017 and 2018, respectively. The climatic data were obtained from the nearest weather station (Ankur Puranik Wadala East Antophill Warehousing Complex, Mumbai) [49]. Figs. 9–11 present the temperature, dew point, and humidity of Mumbai for 2017.

“Of course, making use of multiple inputs attains good contact with the actual of forecasting, but objective this association is to demonstrate that including fewer input parameters, satiating forecasting performance and accuracy can be attained.” – Ahmad T, Chen H, Huang R, Yabin G, Wang J, Shair J, Akram, H.M.A, Mohsan S.A.H, and Kazim M, 2018 [14].

“But on the other hand, incorporating too many variables, makes the model much more complicated and inserts uncertainty into the system because it would depend on known variables.” – Ruiz L.B.G, Rueda R, Cuéllar M.P, and Pegalajar M.C, 2018 [17].

4.2. Experimental setup

ISCOA-LSTM, the proposed energy consumption forecasting model, was implemented using Python 3.6 on an Intel Core i3 processor @ 2.40 GHz system with 12 GB RAM running the Windows 10 operating system. The entire set of experiments was split into three phases, namely (i) Data pre-processing, (ii) Generation of training, evaluation and testing datasets, and (iii) Performance validation using various quality metrics.

Fig. 6. Building energy management system at KReSIT – Architecture.

Fig. 7. Building energy management system at KReSIT, IIT-B – Overview of smart meters deployed.

4.2.1. Data pre-processing

Data pre-processing, an initial and inevitable step in data analytic applications, was used to transform the considered building energy consumption dataset into a format compatible with the learning model. In this work, the min-max normalization technique was used to normalize the input and the output variables to the range [0,1] to avoid inaccurate forecasting due to the presence of high magnitude energy consumption data (Eq. (25)).

$$f_i' = \frac{f_i - (f_i)_{Min}}{(f_i)_{Max} - (f_i)_{Min}}; \quad i = (1, 2, \ldots, S) \tag{25}$$

where $S$ is the total number of samples in the dataset; $f_{Min}$ and $f_{Max}$ are the minimum and the maximum values of the samples in the dataset. An important point to note is that the prediction errors reported in terms of the various quality metrics defined in Section 4.3 are in their original scale.

Table 9 KReSIT power consumption dataset – attributes.

| S. no. | Attributes | Description |
|---|---|---|
| 1. | Timestamp | Timestamp of the data (DD-MM-YYYY-H:M:S) |
| 2. | W | Power consumption |
| 3. | Temp | Temperature (F) |
| 4. | Prec | Precipitation (%) |
| 5. | Dew | Dew Point (F) |
| 6. | Hum | Humidity (%) |
| 7. | Press | Pressure (in) |
| 8. | WSpeed | Wind Speed (km/h) |
| 9. | WDay | Workday type (Weekday and Weekend) |
| 10. | DType | Day type (Monday, Tuesday, …, Sunday) |

4.2.2. Generation of training, evaluation and testing dataset

In this phase, we generate the training, evaluation, and testing samples from the considered building energy consumption dataset to minimize the complexity of the learning model and to achieve high forecasting accuracy. For experimentation, the per-second granularity energy consumption data was down-sampled to 30 min intervals, i.e., 48 data points per day, 1488 data points per month, and 17,520 data points per year. Further, the energy consumption data were categorized into (i) Long term forecasting – energy demand over a number of years, (ii) Midterm forecasting – energy demand for weeks to months, and (iii) Short term forecasting – energy demand for days or weeks. Further, to study the impact of seasonality, the experiments were carried out using the building energy consumption data for the three major seasons in Mumbai, namely (i) Summer – March to May, (ii) Monsoon – June to September, and (iii) Winter – October to February. The training and testing samples for the long term, midterm, and short term forecasting are given in Table 10. The evaluation samples correspond to 20% of the training dataset.

4.3. Performance metrics

In general, the performance of data driven models is assessed using a set of statistical measures based on the actual and predicted outcomes. In this work, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE), Root Mean Square Error (RMSE), Theil U1, and Theil U2 statistics were selected as evaluation metrics to assess the forecasting accuracy of ISCOA-LSTM over six state-of-the-art building energy consumption forecasting approaches, namely Auto Regressive Integrated Moving Average (ARIMA), Deep Belief Network (DBN) regression, Support Vector (SV) regression, Genetic Algorithm-LSTM (GA-LSTM), Particle Swarm Optimization-LSTM (PSO-LSTM), and Sine Cosine Optimization Algorithm-LSTM (SCOA-LSTM). The hyperparameters of DBN regression (learning rate, decay factor, number of hidden units, and penalty) and SV regression (C, epsilon, and kernel) were tuned using grid search (Python package: scikit-learn; GridSearchCV) [50]. The statistical measures are defined as follows:

(i) $MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$

(ii) $MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$

(iii) $MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

(iv) $RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$

(v) $THEIL\ U1 = \frac{\left[ \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \right]^{1/2}}{\left[ \sum_{i=1}^{N} y_i^2 \right]^{1/2}}$

(vi) $THEIL\ U2 = \frac{\left[ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \right]^{1/2}}{\left[ \frac{1}{N} \sum_{i=1}^{N} y_i^2 \right]^{1/2} + \left[ \frac{1}{N} \sum_{i=1}^{N} \hat{y}_i^2 \right]^{1/2}}$

where $\hat{y}_i$ is the forecasted energy consumption at time stamp $i$; $y_i$ is the actual energy consumption at time stamp $i$; $N$ is the total number of data points in the dataset. Further, $P_{MAE}$, $P_{MAPE}$, $P_{MSE}$, $P_{RMSE}$, $P_{THEIL\ U1}$ and $P_{THEIL\ U2}$ were used to demonstrate the improvement of ISCOA-LSTM over the considered data-driven approaches [5]. Positive values of these indices indicate that ISCOA-LSTM performs better than the data driven approach it is compared with. They are defined as follows:

(i) $P_{MAE} = \left( \frac{MAE_2 - MAE_1}{MAE_2} \right) \times 100\%$

(ii) $P_{MAPE} = \left( \frac{MAPE_2 - MAPE_1}{MAPE_2} \right) \times 100\%$

(iii) $P_{MSE} = \left( \frac{MSE_2 - MSE_1}{MSE_2} \right) \times 100\%$

(iv) $P_{RMSE} = \left( \frac{RMSE_2 - RMSE_1}{RMSE_2} \right) \times 100\%$

(v) $P_{THEIL\ U1} = \left( \frac{(THEIL\ U1)_2 - (THEIL\ U1)_1}{(THEIL\ U1)_2} \right) \times 100\%$

(vi) $P_{THEIL\ U2} = \left( \frac{(THEIL\ U2)_2 - (THEIL\ U2)_1}{(THEIL\ U2)_2} \right) \times 100\%$

where subscript 1 represents the evaluation metric of ISCOA-LSTM, and subscript 2 represents the evaluation metric of the contrast model.
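The six error metrics and the improvement index defined above can be computed with a few lines of plain Python. The sketch below is illustrative only: the function names `forecast_metrics` and `improvement` are hypothetical, and MAPE assumes that no actual value is zero.

```python
import math


def forecast_metrics(actual, forecast):
    """Return MAE, MAPE (%), MSE, RMSE, Theil U1 and Theil U2 for two equal-length lists."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    sq_err = sum(e * e for e in errors)
    sq_actual = sum(a * a for a in actual)
    sq_forecast = sum(f * f for f in forecast)
    mse = sq_err / n
    rmse = math.sqrt(mse)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100,  # actual values must be non-zero
        "MSE": mse,
        "RMSE": rmse,
        "THEIL_U1": math.sqrt(sq_err) / math.sqrt(sq_actual),
        "THEIL_U2": rmse / (math.sqrt(sq_actual / n) + math.sqrt(sq_forecast / n)),
    }


def improvement(metric_proposed, metric_contrast):
    """Percentage improvement of the proposed model (subscript 1) over a contrast model (subscript 2)."""
    return (metric_contrast - metric_proposed) / metric_contrast * 100


scores = forecast_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```

As a usage example, calling `improvement(0.0369, 0.3763)` with the long term MAE values reported for ISCOA-LSTM and ARIMA gives an improvement of roughly 90%.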

Fig. 8. Descriptive statistics of KReSIT energy consumption data – mean, maximum, minimum and standard deviation (a) 2017 (b) 2018.

Fig. 9. Annual temperature of Mumbai-2017.

Fig. 10. Annual dew point of Mumbai-2017.

Fig. 11. Annual humidity of Mumbai-2017.

Table 10 Training and testing samples.

| Forecasting scenario | Training set: From | Training set: To | Training set: No. of data points | Testing set: From | Testing set: To | Testing set: No. of data points |
|---|---|---|---|---|---|---|
| Long Term Forecasting | 1st January 2017 | 31st December 2017 | 17,520 | 1st January 2018 | 31st January 2018 | 1488 |
| Midterm Forecasting: Summer | 1st March 2017 | 31st May 2017 | 1488 | 1st March 2018 | 31st March 2018 | 1488 |
| Midterm Forecasting: Monsoon | 1st June 2017 | 31st August 2017 | 1488 | 1st June 2018 | 30th June 2018 | 1440 |
| Midterm Forecasting: Winter | 1st October 2017 | 31st December 2017 | 1488 | 1st October 2018 | 31st October 2018 | 1488 |
| Short Term Forecasting: Summer | 1st April 2017 | 30th April 2017 | 1440 | 1st May 2017 | 7th May 2017 | 336 |
| Short Term Forecasting: Monsoon | 1st August 2017 | 31st August 2017 | 1488 | 1st September 2017 | 7th September 2017 | 336 |
| Short Term Forecasting: Winter | 1st January 2017 | 31st January 2017 | 1488 | 1st February 2017 | 7th February 2017 | 336 |

4.4. Results and discussions

This section discusses the various experimental and comparative analyses carried out to demonstrate the efficiency of ISCOA-LSTM over the considered contrast data driven models for accurate building energy consumption forecasting. The primary objective of this research work is to design an efficient and robust building energy consumption forecasting model with high forecast accuracy and minimal complexity. ISCOA-LSTM, the proposed version of LSTM, integrates the improved version of the sine cosine optimization algorithm with the LSTM to forecast the short-term, mid-term, and long-term energy consumption of KReSIT, an academic building at Indian Institute of Technology, Bombay. The parameter setting of ISCOA-LSTM is given in Table 11. The effectiveness of ISCOA-LSTM was demonstrated over the six contrast energy consumption forecast models, namely ARIMA, deep belief network regression, support vector regression, GA-LSTM, PSO-LSTM and SCOA-LSTM, in terms of various quality metrics. The LSTM network was implemented using the Keras framework with a TensorFlow backend, and the rest of the approaches were implemented using the scikit-learn machine learning library in Python. The average values (MAE, MAPE, MSE, RMSE, Theil U1, and Theil U2) obtained over 15 runs were considered for the performance evaluation of ISCOA-LSTM and the considered contrast models.

Table 11 Parameter setting: ISCOA-LSTM.

| S. no. | Parameter | Range |
|---|---|---|
| 1. | Learning rate | [0.001, 0.1] |
| 2. | Momentum | [0.1, 0.9] |
| 3. | Weight decay | [$1 \times 10^{-4}$, $1 \times 10^{-6}$] |
| 4. | Number of hidden units | [10, 300] |
| 5. | Batch size | 1 |
| 6. | Epoch | 100 |
| 7. | Number of populations | 5 |
| 8. | Number of iterations | 50 |

(a) Optimal LSTM architecture: Hidden layers, hidden units and dropout

As an initial step, experiments were carried out to find the optimal LSTM architecture, i.e., hidden layers, hidden units, and dropout, for the considered energy consumption forecasting scenarios. Varying the number of hidden layers resulted in three distinct LSTM models, namely (i) Stacked LSTM 1 – one hidden layer, (ii) Stacked LSTM 2 – two hidden layers, and (iii) Stacked LSTM 3 – three hidden layers. Further, the number of hidden units was varied over the range [5,30] with a step size of 5, and the dropout factor over the range [0.1,0.5] with a step size of 0.1. An important point to note is that, as the number of hidden layers varies, a per-layer dropout factor is used and the range of the dropout factor [0.1, 0.5] is kept common for all the hidden layers. Figs. 12–18 present the impact of hidden units and dropout factor for the considered forecasting scenarios in terms of MSE. Table 12 provides the optimal LSTM network architecture (hidden units and dropout factor) for the considered forecasting scenarios. From the above figures and table, it is clear that the lowest MSE is obtained with stacked LSTM 1 (hidden units = 30 and dropout = 0.1) or stacked LSTM 2 (hidden units = 20 and dropout = 0.1) for the long term forecast; stacked LSTM 1 (hidden units = 30 and dropout = 0.1) for the midterm forecast (summer, monsoon, and winter); stacked LSTM 3 (hidden units = 25 and dropout = 0.1) for the short term forecast (summer); and stacked LSTM 1 (hidden units = 30 and dropout = 0.1) for the short term forecast (monsoon and winter).
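The architecture search described above is an exhaustive sweep over a small grid. As a rough sketch, the candidate configurations can be enumerated and the one with the lowest validation MSE selected as shown below. The function names and the `evaluate` callback are hypothetical; in practice `evaluate` would train a stacked LSTM with the given configuration and return its MSE, which is not shown here.

```python
from itertools import product


def architecture_grid():
    """Enumerate (hidden layers, hidden units, dropout) candidates."""
    hidden_layers = [1, 2, 3]                                # Stacked LSTM 1, 2, 3
    hidden_units = list(range(5, 31, 5))                     # 5, 10, ..., 30
    dropouts = [round(0.1 * k, 1) for k in range(1, 6)]      # 0.1, 0.2, ..., 0.5
    return [
        {"layers": layers, "units": units, "dropout": dropout}
        for layers, units, dropout in product(hidden_layers, hidden_units, dropouts)
    ]


def select_best(grid, evaluate):
    """Return the configuration with the lowest score from `evaluate` (e.g. validation MSE)."""
    return min(grid, key=evaluate)


grid = architecture_grid()


def dummy_mse(cfg):
    """Stand-in score used only to make the example runnable; not a real training run."""
    return abs(cfg["units"] - 30) + cfg["dropout"] + 0.01 * cfg["layers"]


best = select_best(grid, dummy_mse)
```

The grid contains 3 × 6 × 5 = 90 configurations, i.e., 30 per stacked model, so each forecasting scenario requires 90 training runs before averaging over repeated runs.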

Fig. 12. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Long term forecasting.

Fig. 13. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Midterm forecasting (Summer).

Fig. 14. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Midterm forecasting (Monsoon).

Fig. 15. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Midterm forecasting (Winter).

Fig. 16. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Short term forecasting (Summer).

Fig. 17. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Short term forecasting (Monsoon).

Fig. 18. Impact of hidden units and dropout for (a) Stacked LSTM 1, (b) Stacked LSTM 2, and (c) Stacked LSTM 3 – Short term forecasting (Winter).

(b) ISCOA-LSTM vs. state-of-the-art energy consumption forecasting models

Further, a comparative analysis of ISCOA-LSTM with the considered contrast models was carried out based on the identified optimal architecture for the different forecasting scenarios in terms of MAE, MAPE, MSE, RMSE, Theil U1 and Theil U2 (Tables 13–19). The identification of the optimal LSTM architecture for the different forecasting scenarios enables ISCOA-LSTM to forecast the energy consumption value with a better fit to the real value, i.e., minimal forecast error. An in-depth analysis of the error metrics shows that ISCOA-LSTM outperforms the considered contrast energy consumption forecasting

Applied Energy 261 (2020) 114131

N. Somu, et al.

Table 12 Optimal LSTM architecture: hidden units and dropout factor for different forecasting scenarios (error index: MSE).

| Forecasting scenario | Stacked LSTM 1: MSE | Stacked LSTM 1: Hidden units | Stacked LSTM 1: Dropout | Stacked LSTM 2: MSE | Stacked LSTM 2: Hidden units | Stacked LSTM 2: Dropout | Stacked LSTM 3: MSE | Stacked LSTM 3: Hidden units | Stacked LSTM 3: Dropout |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Long Term | 0.0031 | 30 | 0.1 | 0.0031 | 20 | 0.1 | 0.0033 | 30 | 0.1 |
| Midterm-Summer | 0.0104 | 20 | 0.2 | 0.0124 | 20 | 0.1 | 0.0129 | 25 | 0.1 |
| Midterm-Monsoon | 0.0078 | 20 | 0.2 | 0.0124 | 20 | 0.1 | 0.0129 | 25 | 0.1 |
| Midterm-Winter | 0.0081 | 25 | 0.1 | 0.0096 | 30 | 0.1 | 0.0095 | 30 | 0.1 |
| Short Term-Summer | 0.0182 | 15 | 0.1 | 0.0166 | 25 | 0.1 | 0.0160 | 25 | 0.1 |
| Short Term-Monsoon | 0.0119 | 30 | 0.1 | 0.0135 | 30 | 0.1 | 0.0161 | 15 | 0.1 |
| Short Term-Winter | 0.0122 | 20 | 0.1 | 0.0125 | 25 | 0.1 | 0.0123 | 30 | 0.2 |
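As a small worked example, the MSE values in Table 12 can be compared per scenario to see which stacked LSTM variant attains the lowest error. The sketch below assumes that "optimal" means lowest MSE; the selection rule is not restated in this section, and the names `TABLE_12_MSE` and `lowest_mse_variants` are hypothetical.

```python
# MSE values transcribed from Table 12, ordered as
# (Stacked LSTM 1, Stacked LSTM 2, Stacked LSTM 3).
TABLE_12_MSE = {
    "Long Term": (0.0031, 0.0031, 0.0033),
    "Midterm-Summer": (0.0104, 0.0124, 0.0129),
    "Midterm-Monsoon": (0.0078, 0.0124, 0.0129),
    "Midterm-Winter": (0.0081, 0.0096, 0.0095),
    "Short Term-Summer": (0.0182, 0.0166, 0.0160),
    "Short Term-Monsoon": (0.0119, 0.0135, 0.0161),
    "Short Term-Winter": (0.0122, 0.0125, 0.0123),
}


def lowest_mse_variants(mse_values):
    """Return the 1-based indices of every variant that attains the minimum MSE."""
    best = min(mse_values)
    return [index + 1 for index, value in enumerate(mse_values) if value == best]


# Assumption: the preferred variant is the one with the lowest MSE.
selection = {scenario: lowest_mse_variants(values)
             for scenario, values in TABLE_12_MSE.items()}

for scenario, variants in selection.items():
    print(scenario, "-> Stacked LSTM", variants)
```

Ties are reported rather than broken, since the table alone does not say how a tie (as in the long term row) would be resolved.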

Table 13 Performance evaluation of different forecast models for long term forecasting.

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.3763 | 27.8963 | 0.1724 | 0.4152 | 0.4112 | 0.2153 |
| DBN Regression | 0.3210 | 10.3215 | 0.2312 | 0.4806 | 0.4759 | 0.2535 |
| SV Regression | 0.1410 | 13.1070 | 0.0249 | 0.1579 | 0.1563 | 0.0786 |
| GA-LSTM | 0.3222 | 5.3272 | 0.0231 | 0.1516 | 0.1501 | 0.0755 |
| PSO-LSTM | 0.4203 | 4.1150 | 0.0100 | 0.1100 | 0.0990 | 0.0496 |
| SCA-LSTM | 0.4451 | 4.3221 | 0.2230 | 0.4690 | 0.4645 | 0.2466 |
| ISCOA-LSTM | 0.0369 | 3.3159 | 0.0031 | 0.0559 | 0.0553 | 0.0276 |
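The error indices reported in Tables 13–19 can be computed along the following lines. This is a minimal sketch, not the authors' code: MAE, MAPE, MSE and RMSE follow their standard definitions, while the two Theil forms are assumptions (one common pair of definitions, and the U1/U2 labels differ between texts), because the exact formulas are not restated in this section. The function names are hypothetical.

```python
import math
from statistics import mean


def forecast_errors(actual, forecast):
    """Return the six error indices for one run as a dict."""
    n = len(actual)
    diffs = [f - a for a, f in zip(actual, forecast)]
    mae = sum(abs(d) for d in diffs) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(d / a) for d, a in zip(diffs, actual)) / n
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    return {
        "MAE": mae,
        "MAPE": mape,
        "MSE": mse,
        "RMSE": rmse,
        "Theil U1": rmse / rms_actual,                   # assumed definition
        "Theil U2": rmse / (rms_actual + rms_forecast),  # assumed definition
    }


def average_over_runs(runs):
    """Average each index over repeated runs (the paper reports averages over 15 runs)."""
    return {key: mean(run[key] for run in runs) for key in runs[0]}


# Toy data standing in for two independent runs.
actual = [1.0, 2.0, 3.0, 4.0]
run_a = forecast_errors(actual, [1.1, 1.9, 3.2, 3.8])
run_b = forecast_errors(actual, [1.0, 2.0, 3.0, 4.0])
averaged = average_over_runs([run_a, run_b])
print(averaged)
```

Averaging per index over runs smooths out the randomness introduced by weight initialisation and by the population-based hyperparameter search.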

Table 17 Performance evaluation of different forecast models for short term forecasting (summer).

Table 14 Performance evaluation of different forecast models for midterm forecasting (summer).

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.4695 | 29.8152 | 0.2634 | 0.5133 | 0.4450 | 0.2352 |
| DBN Regression | 0.3181 | 19.2154 | 0.3101 | 0.5567 | 0.4827 | 0.2580 |
| SV Regression | 0.1470 | 11.1107 | 0.0327 | 0.1809 | 0.1568 | 0.0789 |
| GA-LSTM | 0.4502 | 5.2108 | 0.3242 | 0.5656 | 0.4904 | 0.2627 |
| PSO-LSTM | 0.4274 | 5.1124 | 0.3123 | 0.5567 | 0.4827 | 0.2580 |
| SCA-LSTM | 0.2824 | 5.7858 | 0.1377 | 0.3605 | 0.3126 | 0.1604 |
| ISCOA-LSTM | 0.0659 | 4.9194 | 0.0100 | 0.1000 | 0.0867 | 0.0434 |

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.3440 | 28.6598 | 0.1813 | 0.4258 | 0.3662 | 0.1899 |
| DBN Regression | 0.3961 | 12.5414 | 0.2554 | 0.5476 | 0.4301 | 0.2263 |
| SV Regression | 0.1174 | 9.8208 | 0.0208 | 0.1444 | 0.1242 | 0.0623 |
| GA-LSTM | 0.3275 | 15.3223 | 0.1321 | 0.3633 | 0.3125 | 0.1603 |
| PSO-LSTM | 0.4121 | 17.4572 | 0.1725 | 0.4147 | 0.3567 | 0.1845 |
| SCA-LSTM | 0.3705 | 19.3264 | 0.2414 | 0.4909 | 0.4222 | 0.2217 |
| ISCOA-LSTM | 0.0577 | 4.3102 | 0.0073 | 0.0855 | 0.0735 | 0.0368 |

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.3483 | 25.1728 | 0.1499 | 0.3872 | 0.3559 | 0.1841 |
| DBN Regression | 0.4852 | 9.8709 | 0.2541 | 0.5039 | 0.4632 | 0.2459 |
| SV Regression | 0.0820 | 6.2600 | 0.0131 | 0.1145 | 0.1053 | 0.0528 |
| GA-LSTM | 0.0815 | 5.8321 | 0.0310 | 0.1760 | 0.1618 | 0.0814 |
| PSO-LSTM | 0.0753 | 5.9735 | 0.0212 | 0.1449 | 0.1331 | 0.0669 |
| SCA-LSTM | 0.0842 | 6.2148 | 0.0279 | 0.1643 | 0.1510 | 0.0759 |
| ISCOA-LSTM | 0.0543 | 4.2842 | 0.0077 | 0.0879 | 0.0808 | 0.0405 |

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.3479 | 21.3333 | 0.1661 | 0.4076 | 0.3236 | 0.1663 |
| DBN Regression | 0.2764 | 17.3124 | 0.2125 | 0.4582 | 0.3638 | 0.1884 |
| SV Regression | 0.0949 | 5.8113 | 0.0189 | 0.1375 | 0.1091 | 0.0547 |
| GA-LSTM | 0.1804 | 5.9745 | 0.0432 | 0.2073 | 0.1646 | 0.0828 |
| PSO-LSTM | 0.0935 | 5.7213 | 0.0536 | 0.2302 | 0.1827 | 0.0921 |
| SCA-LSTM | 0.1287 | 5.1185 | 0.0319 | 0.1760 | 0.1397 | 0.0702 |
| ISCOA-LSTM | 0.0819 | 4.9688 | 0.0135 | 0.1164 | 0.0924 | 0.0463 |

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.2459 | 22.4803 | 0.0898 | 0.2997 | 0.2863 | 0.1462 |
| DBN Regression | 0.7256 | 25.2104 | 3.4156 | 1.8466 | 1.7639 | 0.8954 |
| SV Regression | 0.0767 | 6.7231 | 0.0109 | 0.1045 | 0.0998 | 0.0500 |
| GA-LSTM | 0.0831 | 7.3169 | 0.0522 | 0.2280 | 0.2178 | 0.1102 |
| PSO-LSTM | 0.0811 | 7.6724 | 0.0224 | 0.1483 | 0.1416 | 0.0712 |
| SCA-LSTM | 0.0851 | 7.1123 | 0.0416 | 0.2024 | 0.1934 | 0.0976 |
| ISCOA-LSTM | 0.0610 | 5.2961 | 0.0074 | 0.0861 | 0.0822 | 0.0412 |

Table 19 Performance evaluation of different forecast models for short term forecasting (monsoon).

Table 16 Performance evaluation of different forecast models for midterm forecasting (monsoon).

Table 18 Performance evaluation of different forecast models for short term forecasting (winter).

Table 15 Performance evaluation of different forecast models for midterm forecasting (winter).

| Forecast model | MAE | MAPE | MSE | RMSE | Theil U1 | Theil U2 |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 0.3052 | 22.9251 | 0.1386 | 0.3723 | 0.3215 | 0.1652 |
| DBN Regression | 0.8218 | 27.4421 | 2.5125 | 1.5842 | 1.3681 | 0.6985 |
| SV Regression | 0.0827 | 5.7942 | 0.0143 | 0.1196 | 0.1033 | 0.0518 |
| GA-LSTM | 0.0892 | 7.5262 | 0.0321 | 0.1788 | 0.1544 | 0.0777 |
| PSO-LSTM | 0.0828 | 7.7247 | 0.0522 | 0.2280 | 0.1969 | 0.0994 |
| SCA-LSTM | 0.0952 | 7.1012 | 0.0216 | 0.1449 | 0.1251 | 0.0628 |
| ISCOA-LSTM | 0.0733 | 5.1882 | 0.0115 | 0.1076 | 0.0929 | 0.0465 |

Fig. 19. Improvement of the proposed model over the contrast models for different forecasting scenarios: (a) $P_{MAE}$, (b) $P_{MAPE}$, (c) $P_{MSE}$, (d) $P_{RMSE}$, (e) $P_{THEIL\,U1}$, (f) $P_{THEIL\,U2}$.

models for different forecasting scenarios in terms of various quality metrics. Further, it can be noted that DBN regression performs the worst due to over-fitting issues. To further study the performance of ISCOA-LSTM, six evaluation metrics, namely $P_{MAE}$, $P_{MAPE}$, $P_{MSE}$, $P_{RMSE}$, $P_{THEIL\,U1}$, and $P_{THEIL\,U2}$, were used to demonstrate the improvements of ISCOA-LSTM over the contrast forecasting models for the considered forecasting scenarios (Fig. 19(a)–(f)). The experimental analysis and the obtained results show that the ISCOA-LSTM based energy consumption forecasting model has significant improvements over the contrast forecasting models in terms of various quality metrics. For long term forecasting, the MAE, MAPE, MSE, RMSE, Theil U1, and Theil U2 of ISCOA-LSTM decreased by 88.4%, 67.8%, 98.6%, 88.3%, 88.3% and 89% respectively when compared with the worst-performing DBN regression, and by 73.7%, 74.7%, 87.4%, 64.5%, 64.6% and 64.8% respectively when compared with the best performing support vector regression. Similar improvements can be observed for midterm forecasting and short term forecasting scenarios.
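The long term improvement percentages quoted above can be checked against Table 13. The definitions of the improvement indices are not restated in this section, so the sketch below assumes each one is the relative error reduction, 100 × (contrast error − ISCOA-LSTM error) / contrast error. Under that assumption the tabulated values reproduce the quoted percentages to within about 0.2 percentage points, the remainder being attributable to rounding in the table. The helper names are hypothetical.

```python
def improvement_percent(contrast_error, proposed_error):
    """Relative reduction of an error index, in percent of the contrast model's error."""
    return 100.0 * (contrast_error - proposed_error) / contrast_error


METRICS = ("MAE", "MAPE", "MSE", "RMSE", "Theil U1", "Theil U2")

# Long term forecasting values transcribed from Table 13.
DBN_REGRESSION = (0.3210, 10.3215, 0.2312, 0.4806, 0.4759, 0.2535)
SV_REGRESSION = (0.1410, 13.1070, 0.0249, 0.1579, 0.1563, 0.0786)
ISCOA_LSTM = (0.0369, 3.3159, 0.0031, 0.0559, 0.0553, 0.0276)


def improvements(contrast, proposed=ISCOA_LSTM):
    """Improvement of the proposed model over one contrast model, per metric."""
    return {metric: improvement_percent(c, p)
            for metric, c, p in zip(METRICS, contrast, proposed)}


over_dbn = improvements(DBN_REGRESSION)
over_svr = improvements(SV_REGRESSION)

for metric in METRICS:
    print(f"{metric}: {over_dbn[metric]:.1f}% vs DBN regression, "
          f"{over_svr[metric]:.1f}% vs SV regression")
```

The same function can be applied to any of the other scenario tables by swapping in the corresponding rows.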

5. Conclusions

This research presents eDemand, an energy consumption forecasting model that uses an improved sine cosine optimization algorithm and long short term memory networks for accurate building energy consumption forecasting. ISCOA-LSTM employs a Haar wavelet based mutation operator for position update to balance exploration and exploitation of the search space, thereby preventing premature convergence. The improved version of SCOA was used to find the optimal values for the hyperparameters (learning rate, weight decay, momentum and number of hidden layers) of LSTM. A case study on the live energy consumption data obtained from an academic building at the Indian Institute of Technology, Bombay, covering short term, midterm, and long term energy consumption forecasting, was presented. During the experiments, the impact of the number of hidden layers (stacked LSTMs) and the dropout factor was analyzed for long term, midterm, and short term energy demand forecasting. Further, the performance of ISCOA-LSTM was validated against the state-of-the-art energy consumption forecasting models in terms of MAE, MAPE, MSE, RMSE, Theil statistics, $P_{MAE}$, $P_{MAPE}$, $P_{MSE}$, $P_{RMSE}$, $P_{THEIL\,U1}$, and $P_{THEIL\,U2}$. The experimental validations reveal the significance of ISCOA-LSTM in providing accurate and reliable energy demand predictions for efficient energy planning, management, and conservation.

The possible limitations of ISCOA-LSTM, which point to directions for future research, are as follows: (i) the impact of the attributes (power and climate related) on the power consumption value has not been analyzed; (ii) preprocessing techniques for the noise, glitches, and aging of sensor components in the real time data have not been studied; and (iii) the time slot after which hyperparameter retuning needs to be carried out with respect to the real time data characteristics has not been worked out.
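For readers who want to see the kind of search step that the improved algorithm builds on, the sketch below implements the position update of the original sine cosine algorithm as published by Mirjalili, applied to a toy objective. It is an illustration under stated assumptions, not the authors' ISCOA: the Haar wavelet mutation operator is not included, the objective is a simple sphere function rather than an LSTM validation error, and the names `sca_step` and `minimize`, the bounds, and the default population size and iteration count are hypothetical.

```python
import math
import random


def sca_step(positions, best, t, t_max, a=2.0, rng=random):
    """One sine cosine update of every candidate towards/around the best position."""
    r1 = a - t * (a / t_max)  # shrinks linearly from a to 0 over the iterations
    updated = []
    for position in positions:
        new_position = []
        for x_i, p_i in zip(position, best):
            r2 = rng.uniform(0.0, 2.0 * math.pi)
            r3 = rng.uniform(0.0, 2.0)
            r4 = rng.random()
            wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
            new_position.append(x_i + r1 * wave * abs(r3 * p_i - x_i))
        updated.append(new_position)
    return updated


def minimize(objective, lower, upper, dim, pop_size=5, t_max=50, seed=0):
    """Plain SCA loop with clamping to [lower, upper]; returns the best position found."""
    rng = random.Random(seed)
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = list(min(population, key=objective))
    for t in range(1, t_max + 1):
        moved = sca_step(population, best, t, t_max, rng=rng)
        population = [[min(max(v, lower), upper) for v in p] for p in moved]
        candidate = min(population, key=objective)
        if objective(candidate) < objective(best):
            best = list(candidate)
    return best


def sphere(x):
    """Toy objective standing in for a model's validation error."""
    return sum(v * v for v in x)


best_position = minimize(sphere, lower=-5.0, upper=5.0, dim=2)
print(best_position, sphere(best_position))
```

In a hyperparameter search, each position vector would hold candidate hyperparameter values and the objective would train and evaluate a model, which is far more expensive than the toy function used here.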

Acknowledgements

The authors would like to thank the Ministry of Power, Department of Science and Technology, Government of India – Impacting Research Innovation and Technology (IMPRINT - 16MOPIMP002), New Delhi, India, Prof. Kannan Krithivasan, Dean, School of Education, SASTRA Deemed University, Tamil Nadu, India (TATA Realty-SASTRA Srinivasa Ramanujan Research Cell, India) and SEIL members.

Declaration of Competing Interest

The authors declared that there is no conflict of interest.

Appendix

Introduction

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| MSE | Mean square error |
| RMSE | Root mean square error |
| MAE | Mean absolute error |
| MAPE | Mean absolute percentage error |
| ANN | Artificial neural network |
| FFNN | Feed forward neural networks |
| RNN | Recurrent neural networks |
| ARMA | Autoregressive moving average model |
| LSTM | Long short term memory |
| EA | Evolutionary algorithms |
| ISCOA-LSTM | Improved sine cosine optimization algorithm based LSTM |
| LTF | Long term forecasting |
| MTF | Midterm forecasting |
| STF | Short term forecasting |

Long Short Term Memory

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| $x_t$ | Input value of the current time stamp |
| $h_t$ | Hidden state value of the current time stamp |
| $y_t$ | Output value of the current time stamp |
| $W_{xh}$, $W_{hh}$, and $W_{yh}$ | Weight matrix of the input-hidden layer, hidden-hidden layer, hidden-output layer respectively |
| $b_h$ and $b_y$ | Hidden and output bias |
| $i_t$ | Input gate |
| $g_t$ | Update gate |
| $f_t$ | Forget gate |
| $o_t$ | Output gate |
| $W_{ix}$, $W_{cx}$, $W_{fx}$ and $W_{ox}$ | Input weight matrices |
| $W_{fh}$, $W_{ch}$, $W_{ih}$ and $W_{oh}$ | Recurrent weight matrices |
| $W_{hy}$ | Hidden output weight matrix |
| $b_n$ | Bias ($n \in \{i, c, f, h\}$) |

Sine Cosine Optimization Algorithm

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| $X_i^t$ | Current candidate's position in the $i$th dimension at the $t$th iteration |
| $P_i^t$ | Best candidate's position in the $i$th dimension at the $t$th iteration |
| $r_1$, $r_2$, $r_3$ and $r_4$ | Random variables |
|  | Constant |
| $t$ | Current iteration |
| $T$ | Total number of iterations |
| $X$ | Best candidate |

ISCOA-LSTM: The Proposed Data Driven Approach for Energy Consumption Forecasting

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| $D_{PC} = x(t);\ t = \{1, 2, \ldots, n\}$ | Historical power consumption dataset |
| $D_{Train}$, $D_{Eval}$ and $D_{Test}$ | Training, evaluation and testing dataset |
|  | Optimal learning rate, weight decay, momentum rate and number of hidden units |
| $N_{PoP}$ | Number of population/Population size |
| $T_{Max}$ | Maximum number of generation |
| $Fit$, $BestFit$ and $GBestFit$ | Fitness, best fitness and global best fitness respectively |
| $M_t$ | Mutation probability |
| $GBest_{PoP}$ | Position of the global best population |
| $PoP_i$ | Population vector |
| $PoP_{Min}$ and $PoP_{Max}$ | Minimum and maximum value of the hyperparameter |
| $P_v$ | Randomly generated population |
|  | Haar wavelet function |

Dataset Description

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| BEMS | Building energy management system |
| AMI | Advanced metering infrastructure |
| MQTT | Message queuing telemetry transport |

Data pre-processing

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| $S$ | Total number of samples in the dataset |
| $f_{Min}$ and $f_{Max}$ | Minimum and maximum value of the sample in the dataset |

Performance Metrics

| Abbreviations/Acronyms/Notations | Description |
| --- | --- |
| ARIMA | Auto regressive integrated moving average |
| DBN regression | Deep belief network |
| SV regression | Support vector regression |
| GA-LSTM | Genetic algorithm-LSTM |
| PSO-LSTM | Particle swarm optimization-LSTM |
| SCOA-LSTM | Sine cosine optimization algorithm-LSTM |
| $\hat{y}_i$ | Forecasted energy consumption at time stamp $i$ |
| $y_i$ | Actual energy consumption at time stamp $i$ |
| $N$ | Total number of data points in the dataset |
