Lightweight sustainable intelligent load forecasting platform for smart grid applications




Journal Pre-proof Lightweight Sustainable Intelligent Load Forecasting Platform for Smart Grid Applications Amartya Mukherjee, Prateeti Mukherjee, Nilanjan Dey, Debashis De, B.K Panigrahi

PII: S2210-5379(18)30342-1
DOI: https://doi.org/10.1016/j.suscom.2019.100356
Reference: SUSCOM 100356
To appear in: Sustainable Computing: Informatics and Systems
Received Date: 30 August 2018
Revised Date: 4 March 2019
Accepted Date: 8 November 2019

Please cite this article as: Mukherjee A, Mukherjee P, Dey N, De D, Panigrahi BK, Lightweight Sustainable Intelligent Load Forecasting Platform for Smart Grid Applications, Sustainable Computing: Informatics and Systems (2019), doi: https://doi.org/10.1016/j.suscom.2019.100356

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.

Lightweight Sustainable Intelligent Load Forecasting Platform for Smart Grid Applications

Amartya Mukherjee, Dept. of Computer Science & Engineering and BSH, Institute of Engineering & Management, Salt Lake, Kolkata, India, [email protected]

Prateeti Mukherjee, Dept. of Computer Science & Engineering, Institute of Engineering & Management, Salt Lake, Kolkata, India, [email protected]

Nilanjan Dey, Department of Information Technology, Techno India College of Technology, Kolkata, India, [email protected]

Debashis De, Department of Computer Science & Engineering, Maulana Abul Kalam Azad University of Technology, West Bengal, India, [email protected]

B. K. Panigrahi, Dept. of Electrical Engineering, Indian Institute of Technology, Delhi, India, [email protected]


Highlights


 This work gives a direction towards the implementation of a sustainable, miniature, low-cost, and lightweight load forecasting embedded platform.
 The work identifies the most suitable machine learning model for the lightweight Raspberry Pi device.
 The best-performing models achieve a minimum RMSE of 0.09012, a minimum execution time of 27.85 s, and minimum system resource utilization.
 The prototyping platform in this work supports a comparative study of the performance of the machine learning algorithms.
 The work gives a future direction towards intelligent embedded systems for load forecasting.

Abstract

With the global electricity demand witnessing a 3.1 percent jump in 2017, there is an increasing need for incorporating intermittent renewable energy sources and other alternative supply/demand management strategies into supply grid networks. Short-term load forecasting models enable prediction of future power consumption, thereby encouraging load shifting and optimizing the use of stochastic power sources and stored energy. To make the electric grid system smart and sustainable, two-way communication between the utility and consumers must be set up, and the working equipment must respond digitally to quickly changing electric demand. The proposed work exploits the power of embedded systems to design a low-cost solution for interconnecting electrical and electronic devices, controlled by intelligent Internet of Things (IoT) paradigms. This work primarily focuses on implementing standard regression and machine learning-based architectures for smart grid load analysis and forecasting. A state-of-the-art ecosystem for a portable load forecasting device is proposed by means of low-cost, open-source hardware that is experimentally found to function with a high degree of accuracy. Further, the performance of the classical and advanced machine learning models, emulated on the device, is analyzed on the basis of various parameters, including error percentage, execution time, CPU core temperatures, and resource utilization. Overall, impressive performance is demonstrated by some specific machine learning models, which are considered suitable for the proposed framework.

Keywords: Smart Grid, SVR, kNN, RBF, Logistic Regression, Random Forest.


1. Introduction


In the past few years, the number of smart devices has increased drastically, and by 2019 it is expected to exceed 26.66 billion. The primary reason behind such numbers is the significant development of Internet of Things (IoT) infrastructures. IoT technology forms the backbone of smart network systems, comprising different interconnected devices, ranging from smartphones to home appliances and even vehicles. The transformation of the traditional power grid into an intelligent, bidirectional, self-healing system is another significant change in the present era. Incorporating such advanced technology strengthens the chances of integrating environment-friendly measures, such as green energy solutions and diminished CO2 emissions, into the current infrastructure while maximizing profits, both financial and environmental. In the early 2000s, most energy meters were analog in nature. The chances of inaccuracy, theft, and operational loss were, therefore, huge. The system was not intelligent, so it could not identify faults. The introduction of smart meters moved the technology a step ahead. These meters are connected wirelessly between the centralized power station and the consumer, enabling two-way communication between the meter and the user, as well as the grid. In this context, the smart grid is the most promising technology established over the IoT paradigm. It is a modernized electrical power system facilitating the bidirectional flow of electricity and information. The system operates spontaneously, performing energy delivery intelligently and continuously, and users can directly access grid information online. It is interesting to note that any traditional grid can be transformed into a smart grid by introducing certain proactive features.
The smart grid is often visualized as a network of grids, with enough intelligence and communication capability to ensure balanced, reliable, and secure power delivery to the consumer. The numerous functionalities of smart grid networks have been addressed in this context. In the field of smart grid security, a Diffie-Hellman based authentication scheme implemented through AES and RSA key generation techniques was proposed in [6]. Another proposal [7] was for a static, centralized building energy control scheme, in which an IoT-based testbed was developed to demonstrate better energy efficiency. In addition to smart grid systems, IoT technology has played a significant role in developing efficient two-way communication of the huge chunks of grid data deployed for intelligent decision making.


The high demand for electricity throughout the world makes the development of sustainable optimization techniques an absolute necessity. This, however, is a big challenge. As a result, several research opportunities on smart grid and microgrid systems have emerged in their relevant domains. Research focused on renewable energy sources has also grown on account of its popularity, not only in rural areas but in urban areas as well. Hybrid smart grid technology accumulates the power of different conventional and non-conventional energy sources, thereby producing a continuous, uninterrupted supply of power. In order to manage operations smoothly, a sophisticated grid management system is required, and various load management mechanisms have been developed to achieve this goal. A complete system, comprising efficient sensing devices, smart meters, and an accurate prediction mechanism, gives direction to smart and sustainable load management, eventually providing the ability to analyze the possible influence of parameters on a large scale. The demand side of load management is another key challenge, considering the trade of energy between the consumer and the provider. The application of intelligent mechanisms facilitates cutting down peak load values and redesigning the load profile for a uniform supply of electricity. Effective management of the demand side requires accurate, in-depth knowledge of energy load patterns. Knowledge of future load patterns can help make smart scientific decisions and leverage smart policies. It is judicious to implement a smart grid management system that deploys adequate knowledge of the problem at hand and drives the system by simultaneously controlling, monitoring, and supplying the required energy to society. Energy demand patterns are highly influenced by several key factors, such as climate change, working and non-working days, the community, the geographical location, and the economic condition of the residents. Therefore, effective analysis and forecasting of the load is a necessity.


Practically, numerous sensing devices are deployed to capture the energy profile along with relevant environmental data. That huge chunk of data must be stored in a large database, and the information obtained is then analyzed using a heavyweight computing platform. In most cases, smart systems pay enormous amounts of money for the data analysis performed by high-end computing systems and related services. This can be an additional burden for the consumer, who, in turn, is highly charged for those services. Further, in rural areas, where the demand is often unpredictable, the necessity of load forecasting mechanisms increases. However, due to the huge infrastructure cost of such devices, it is difficult to deploy such a smart mechanism in a scalable and sustainable manner. Therefore, there is a pressing need for a sophisticated platform that can perform load forecasting and prediction at a micro level. The proposed device is an alternative solution to the smart meter and consists of a sufficiently optimized prediction algorithm embedded in an inexpensive, yet powerful hardware platform. The rest of the presented work is organized as follows: Section 2 reviews the related research, Section 3 presents the main motivation of the work, and Section 4 illustrates the underlying mathematical models of the algorithms we work with. Section 5 describes the emulation platform, Section 6 illustrates the result analysis, and finally, Section 7 draws the conclusion.

2. Related Research


Several research studies in this field have been reported. In [1], a classical regression model was applied to a historical energy data set, with predictions recorded thereafter. An Artificial Neural Network based demand forecasting mechanism has also been reported [2]. That work constitutes a simulation approach that mimics multi-layered feedforward networks, and highly efficient system performance is claimed. Reka et al. [3] have reported an extensive survey on IoT-enabled smart grid techniques for efficient energy delivery, addressing several IoT-oriented energy management services with respect to parameters including security, energy efficiency, sustainability, and functionality. Mortaji et al. [4] proposed the ARIMA (AutoRegressive Integrated Moving Average) model for time series load data; a prediction model-based smart IoT system is proposed that can be controlled directly, with the aim of reducing the Peak to Average Ratio (PAR). Alberg et al. [5] proposed a sliding window based ARIMA algorithm to predict seasonal and non-seasonal load. Ouyang et al. [8] addressed a deep learning framework for short-term load forecasting, a data-driven framework with a copula model-based deep belief network; Support Vector Regression (SVR), Deep Neural Network (DNN), and Extreme Learning Machine (ELM) approaches were compared as well. Chen et al. [9] proposed a short-term load forecasting model using deep residual networks to improve the forecast results, with a two-stage ensemble strategy to improve the efficiency of the system. Ahmed et al. proposed the AFC-STLF model [10], a combination of a feature selector, a forecaster, and an optimizer. The time series data is fed into the feature selector; once redundancy has been removed from the dataset, the final data serves as input to a neural network that forecasts the electricity demand for the following day.
Fallah et al. [11] give a direction on several challenges and issues in load forecasting systems, addressing different hybrid computational intelligence (CI) oriented load forecasting mechanisms. An IoT-based load forecasting methodology is proposed by Li et al. [12]: a two-step forecasting methodology that yields more precise forecasts than those obtained through conventional methods. Maaji et al. proposed a voltage stability model through a classical machine learning approach [13]; in this case, k-Nearest Neighbor and Naive Bayes classifiers are observed to have achieved the best results. Rafiei et al. [14] proposed a novel hybrid method of probabilistic load forecasting with a Generalized Extreme Learning Machine (GELM); wavelet transformation concepts are applied in their proposed system to divide electricity demand into a sequence of well-behaved productive subseries. A Particle Swarm Optimization (PSO) based technique for short-term load forecasting, known as DPSO-ELM, has been proposed by Zeng et al. [15]. The model is claimed to produce better results by avoiding unnecessary hidden nodes and overtraining problems. In the next section, the motivation of the work is elaborated.

3. Motivation

The global economy is a transient statistical measure, depending on a variety of factors including industrial growth, employment statistics, international transport, energy, and the exchange of goods and services. Electricity and energy determine a major share of a nation's growth and development index, as shown through extensive research in multiple reports [25,26]. Understanding the relationship between electricity production, demand, and consumption, and societal development is crucial for the growth of a nation. While yearly electricity consumption as of 2003 is recorded at 14,280,000 GWh, it is consumption at the household level that determines the growth of a community amidst the economic, social, and cultural conditions prevalent in a specific geographical region. A typical household in the US consumes about 11,700 kWh each year; in France, the value is around 6,400 kWh; the UK consumes 4,600 kWh; and China, around 1,300 kWh. As evident from such statistics, the values vary enormously across nations. Numerous factors could drive such differences, including wealth, the square footage of the housing establishment, appliance standards, prevailing electricity prices, and access to alternative sources of energy. Adequate power supply to households enables improved living standards, better public health, and elevated economic growth.


Developing nations, however, face multiple challenges in planning the power grid infrastructure needed to support rapidly growing urban populations. According to [27], the electricity sector faces three major challenges. The first relates to the investment strategies of a nation, which must be substantial enough to keep up with the growing demand for global energy while keeping final energy costs under control. The magnitude of this first hurdle should not be understated, given the serious and ongoing uncertainty surrounding the global economy, fossil fuel prices, and future environmental regulations. The second challenge deals with the regulation of greenhouse gas (GHG) emissions. The energy sector is on the front line here, given that two-thirds of global GHG emissions are carbon emissions linked to that sector. The third major challenge, prevalent particularly in emerging and developing countries, is massive urbanization. The trend is particularly visible in these nations since cities, even when they develop in a relatively disorganized fashion, offer a better chance of escaping poverty than rural areas. Urban growth rates have reached unprecedented levels, and by 2030 the urban population is expected to double from 2 billion to 4 billion worldwide. With these numbers on the rise, energy optimization is crucial to managing the social and environmental externalities of modern cities. For optimization techniques to succeed, they must go hand in hand with systematic, long-term planning of "sustainable cities". Solving this particular problem is the ambition of this work.


The primary motivation of this work may be credited to the increasing demand for an inexpensive, open-source hardware platform that functions as an easy-to-use device, conveniently connected to a standard computing environment with minimal power requirements. The device takes chunks of load data as input and performs feature extraction, followed by analysis and forecasting of future load requirements. Electricity load forecasting plays a key role in the operation of power systems: an accurate load forecasting system is crucial for optimal planning and energy management in individual households, communities, and the nation at large. Often the concept of short-term load forecasting (STLF) is associated with smart buildings and power systems alone, where powerful computing resources are abundant. Our aim, however, is to take the capabilities of load-forecasting models to the consumer at home, targeting individual households in a smart grid setting as opposed to large power stations and research labs with expensive equipment at their disposal. Hence, a cheap, pint-sized device with an intelligent model embedded in the hardware platform would suffice as advanced, stand-alone load-forecasting equipment that could easily be installed in homes and used on a monthly, weekly, or even daily basis to receive detailed graphs of future load consumption. The users may then decide to cut down on energy use or look for alternative energy sources for durations when the demand is extremely high and the cost of electricity is soaring. The next section elaborates the mathematical models of the regression strategies.

4. Mathematical Models of the Regression Methodologies Used


The fundamental target of this work can be visualized in two separate stages. First, we must identify the machine learning model best suited to the low-power embedded platform in terms of prediction accuracy, computation time, CPU utilization, and core temperatures. Studying these parameters is important for our research, since a low-power, low-memory, limited-CPU embedded device may suffer from very high execution times, a lagging interface, and soaring temperatures when processing bulk data [21]. The procedure begins with the collection of a sample dataset to serve as input to the machine learning regressor models. The dataset used in the emulation prototype is practical time series data from a residential apartment in San Jose, with load values recorded between 1st November 2016 and 30th September 2017. The labels include a start time and end time of consumption, separated by a fifteen-minute interval. The usage in each interval is measured in kilowatts (kW), and the per-unit cost of consumption is provided as well. The data has further been split into training and test datasets. In the following subsections, we discuss the classical machine learning models analyzed in our work.

4.1 k-Nearest Neighbor Regressor Model


The k-Nearest Neighbor (kNN) regressor [18, 30] is a well-known classical non-parametric regression model. The elementary idea behind kNN is that if the k nearest samples in the vicinity of a test sample belong to a particular category in the feature space, then the test sample must belong to the same category. The problem statement is to estimate the response variable based on the value of one or more independent feature variables. We consider the tuple {x, y}, where x is an ordered set of attribute values {x_1, x_2, \ldots, x_d} and y is the predicted metric variable. Here, the attribute x_d implies the value corresponding to the d-th dimension of the feature space. The input to the problem can be determined from the following [28]:

i) a set of n tuples called the training data set,

D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}    (1)

ii) a validation tuple x_t.

The output is the estimated value for the given tuple x_t, which can therefore be expressed as

y_t = f(x_t, D, \rho)    (2)

where \rho is the set of parameters that the function f(\cdot) acquires through some learning method. The actual kNN algorithm works as follows. For a given tuple instance x_t,

y_t = \arg\max_{c \in \{c_1, c_2, \ldots, c_m\}} \sum_{x_i \in N(x_t, k)} E(y_i, c)    (3)

Here y_t is the class predicted from the input instance x_t. Also,

E(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}    (4)

Equation (3) can further be written as

y_t = \arg\max \left\{ \sum_{x_i \in N(x_t, k)} E(y_i, c_1), \; \sum_{x_i \in N(x_t, k)} E(y_i, c_2), \; \ldots, \; \sum_{x_i \in N(x_t, k)} E(y_i, c_m) \right\}    (5)

or, dividing each term by k,

y_t = \arg\max \left\{ \sum_{x_i \in N(x_t, k)} E(y_i, c_1)/k, \; \ldots, \; \sum_{x_i \in N(x_t, k)} E(y_i, c_m)/k \right\}    (6)

Therefore, we can write

p(c_j \mid x_t, k) = \sum_{x_i \in N(x_t, k)} E(y_i, c_j)/k    (7)

where p(c_j \mid x_t, k) is the probability of occurrence of the j-th class among the k neighbors of the tuple instance x_t. Finally, the estimate can be generalized as

y_t = \arg\max \left\{ p(c_1 \mid x_t, k), \; p(c_2 \mid x_t, k), \; \ldots, \; p(c_m \mid x_t, k) \right\}    (8)
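For regression, the class vote of equation (8) is replaced by an average of the k neighbors' target values. A minimal pure-Python sketch of this idea follows; the toy load profile is synthetic and purely illustrative, not the San Jose dataset used in this work:

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Order training indices by distance to the query (1-D Euclidean here).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    neighbors = order[:k]
    # kNN *regression* replaces the majority vote of eq. (8) with the mean target.
    return sum(train_y[i] for i in neighbors) / k

# Toy load profile (hour of day -> kW), illustrative values only.
hours = [0.0, 6.0, 12.0, 18.0, 24.0]
load_kw = [0.3, 0.5, 1.2, 1.6, 0.4]
pred = knn_regress(hours, load_kw, query=13.0, k=2)  # neighbors at 12 h and 18 h
```

With k = 2 the two nearest samples (12 h and 18 h) are averaged, giving a prediction of 1.4 kW.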

4.2 Support Vector Regression Model



This section studies the mathematical architecture behind the Support Vector Regression (SVR) model. Consider a set (x_i, y_i), i = 1, 2, \ldots, n, with inputs x_i \in R^n and outputs y_i \in R. The main objective of this theory is to derive a nonlinear mapping from the input space to the output space; the input is mapped to a higher-dimensional feature space. According to Jensen & Shen, linear regression can then be achieved by the following estimation function [17]:

f(x) = [\omega \cdot \varphi(x)] + b, \qquad \varphi: R^m \to F, \quad \omega \in F    (9)

Here f(x) is the estimation function for regression, \omega is the weight vector, and b is the threshold value. The nonlinear mapping \varphi(x) takes the input space to the higher-dimensional feature space F. The function approximation problem is then equivalent to minimizing

R = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} |y_i - f(x_i)|_\varepsilon    (10)

Here \|\omega\|^2 constrains the capacity of the model structure and can therefore be used for better generalization, and C is a constant that controls the trade-off between the empirical error and the regularization term. The expression |y_i - f(x_i)|_\varepsilon is Vapnik's loss function with an \varepsilon-insensitive zone. It is a measure of the empirical error, defined as

|y_i - f(x_i)|_\varepsilon = \begin{cases} 0, & \text{if } |y_i - f(x_i)| \le \varepsilon \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise} \end{cases}    (11)

Positive slack variables \theta_i and \theta_i^* can further be introduced to minimize the risk R. The objective function then becomes

R = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\theta_i + \theta_i^*)    (12)

subject to

y_i - [\omega \cdot \varphi(x_i)] - b \le \varepsilon + \theta_i, \qquad [\omega \cdot \varphi(x_i)] + b - y_i \le \varepsilon + \theta_i^*, \qquad \theta_i, \theta_i^* \ge 0    (13)

where \theta_i is the upper training error and \theta_i^* is the lower training error with respect to the \varepsilon-insensitive tube.


As the analysis of time series data is of paramount importance in this case, we must keep track of the fundamental time-dependent observations d_t. Classification, forecasting, and similarity measurement on time series data can be treated as purely numerical problems. To obtain better accuracy on time series data, the kernel trick is a major tool. The kernel functions employed in this work are the linear, radial basis, and polynomial functions.

In the case of the linear kernel, the decision function is f(x) = \omega \cdot x + k. Thus, for a time series the function is

f(X_{T-1}, X_{T-2}, \ldots, X_{T-n}) = \sum_{t=1}^{n} \omega_t X_{t-1} + k    (14)

Hence, the model performs as a statistical autoregression algorithm.

In the radial basis function (RBF) kernel, the similarity of two data points is judged by their Euclidean distance: k_\gamma(x, y) = \exp(-\gamma \|x - y\|^2). Specifically, we assume a linear function g and express the generated data as x_T = g(x_{T-1}, x_{T-2}, \ldots, x_{T-n}) + \eta, where \eta is Gaussian noise.

The polynomial kernel function addresses similarities between training samples in a feature space over polynomials of the original variables, allowing a non-linear learning paradigm. A polynomial kernel of degree d is defined as K(m, n) = (m^T n + c)^d, where m and n are input-space vectors carrying the features of the training and validation sets, and the value c \ge 0 trades off the influence of higher-order against lower-order terms. The special case of the quadratic kernel can be expanded by the multinomial theorem (here, the binomial theorem):

K(m, n) = \left( \sum_{i=1}^{n} m_i n_i + c \right)^2    (15)
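The two SVR ingredients above, the \varepsilon-insensitive loss of equation (11) and the RBF kernel, can be evaluated directly. The following is an illustrative sketch of those two formulas, not the solver used on the device:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF similarity: k_gamma(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's loss of eq. (11): errors inside the eps-tube cost nothing."""
    err = abs(y_true - y_pred)
    return 0.0 if err <= eps else err - eps

identical = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # zero distance -> similarity 1.0
inside = eps_insensitive_loss(1.00, 1.05)       # |error| <= eps -> loss 0.0
outside = eps_insensitive_loss(1.0, 1.3)        # |error| - eps = 0.2
```

Note how the tube makes small residuals free: only deviations beyond \varepsilon contribute to the empirical risk of equation (10).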

4.3 Random Forest Regression Model

The random forest regressor [16] fundamentally consists of a collection of randomized regression trees \{r_n(X, \Theta_m, \check{D}_n), m \ge 1\}, where \Theta_1, \Theta_2, \ldots are outputs of a randomizing variable \Theta. These random trees are combined through the following regression estimate [29]:

\bar{r}_n(X, \check{D}_n) = E_\Theta[r_n(X, \Theta, \check{D}_n)]    (16)

where E_\Theta denotes the expectation with respect to the random parameter \Theta, conditional on X and the data set \check{D}_n. It must be noted that \Theta is independent of X and of the data set \check{D}_n. A tree in the random forest is built as follows: each node is associated with a cell, so that each step of the construction produces a partition of [0,1]^d, the root cell being [0,1]^d itself. This procedure is repeated \lceil \log_2 k_n \rceil times.

At each node, a coordinate of X = (X^{(1)}, \ldots, X^{(d)}) is selected, with the j-th feature chosen with probability p_{nj} \in (0,1). A divide-and-conquer approach is then applied so that each node is split on the selected coordinate, with the split made at the midpoint of the cell for each randomized tree r_n(X, \Theta). Overall, the output Y_i for the corresponding vector X_i can be obtained from the random partition of X by the expression

r_n(X, \Theta) = \left( \frac{\sum_{i=1}^{n} Y_i \, 1_{[X_i \in A_n(X, \Theta)]}}{\sum_{i=1}^{n} 1_{[X_i \in A_n(X, \Theta)]}} \right) 1_{\check{E}(X, \Theta)}    (17)

where the event \check{E}(X, \Theta) is defined as

\check{E}(X, \Theta) = \left[ \sum_{i=1}^{n} 1_{[X_i \in A_n(X, \Theta)]} \ne 0 \right]    (18)

Therefore, taking the expectation with respect to \Theta, the random forest estimate can be expressed as

\bar{r}_n(X, \check{D}_n) = E_\Theta\!\left[ \left( \frac{\sum_{i=1}^{n} Y_i \, 1_{[X_i \in A_n(X, \Theta)]}}{\sum_{i=1}^{n} 1_{[X_i \in A_n(X, \Theta)]}} \right) 1_{\check{E}(X, \Theta)} \right]    (19)
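The averaging principle of equations (16) and (19) can be illustrated with a deliberately simplified forest: each "tree" below is a single midpoint split on a bootstrap resample, a stand-in for the full randomized construction rather than the exact estimator analyzed above.

```python
import random

def tree_predict(train, query, rng):
    """One tiny randomized 'tree': a single midpoint split on a bootstrap
    sample, predicting the mean target on the query's side of the split.
    This is an illustrative stand-in for r_n(X, Theta)."""
    sample = [rng.choice(train) for _ in train]   # bootstrap resample
    xs = [x for x, _ in sample]
    split = (min(xs) + max(xs)) / 2.0             # midpoint split, as in the text
    side = [y for x, y in sample if (x <= split) == (query <= split)]
    # Fall back to the sample mean when the query's cell is empty (eq. (18)).
    return sum(side) / len(side) if side else sum(y for _, y in sample) / len(sample)

def forest_predict(train, query, n_trees=50, seed=7):
    """Eq. (16)/(19): the forest estimate is the average over randomized trees."""
    rng = random.Random(seed)
    return sum(tree_predict(train, query, rng) for _ in range(n_trees)) / n_trees

# Synthetic (feature, load) pairs on [0, 1]; values are illustrative only.
data = [(0.0, 0.2), (0.25, 0.3), (0.5, 1.1), (0.75, 1.4), (1.0, 1.5)]
pred = forest_predict(data, query=0.9)
```

Because each tree averages observed targets, the ensemble prediction always lies within the range of the training targets, and fixing the seed makes the randomized estimate reproducible.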

4.4 Logistic Regression Model


The logistic regression model [19,20] analyzes the relationship between one or more explanatory variables and a categorical dependent variable. The main goal is to find the probability of the occurrence of a certain event by fitting the data to a logistic curve. The odds of an event are computed as

odds\{Event\} = P/(1 - P)    (20)

Therefore, the fundamental idea of logistic regression involves the ratio of the probability of occurrence of an event to the probability of its non-occurrence. The odds express the impact of the independent variables. In the simplest case, the response variable p and an explanatory variable x form the relationship

p = \alpha + \beta x    (21)

In the logistic regression concept, the natural log of the odds is defined as a linear function of the explanatory variable. Therefore,

\mathrm{logit}(y) = \ln(\text{odds}) = \ln\!\left(\frac{p}{1 - p}\right) = \alpha + \beta x    (22)

Here p is the probability of the outcome of interest and x is the explanatory variable. Taking the antilog on both sides, we get the prediction equation for the outcome of interest:

P(\text{outcome} \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}    (23)

Further, a complex logistic regression model for multiple predictors can be built by simply extending the properties of the simple logistic regression:

\mathrm{logit}(y) = \ln\!\left(\frac{p}{1 - p}\right) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k    (24)

Therefore,

P(\text{outcome} \mid x_1, \ldots, x_k) = \frac{e^{\alpha + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\alpha + \beta_1 x_1 + \cdots + \beta_k x_k}} = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \cdots + \beta_k x_k)}}    (25)
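Equation (25) is straightforward to evaluate once the coefficients are known. A minimal sketch follows; the coefficients here are arbitrary illustrative values, not fitted ones:

```python
import math

def logistic(alpha, betas, xs):
    """Eq. (25): P = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

p0 = logistic(0.0, [1.0], [0.0])     # z = 0 -> probability exactly 0.5
p_hi = logistic(0.0, [2.0], [10.0])  # large positive z -> probability near 1
```

The linear predictor z is the log-odds of equation (24); the sigmoid maps it back to a probability in (0, 1).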


5. Emulation Platform



Since our objective is to develop a sustainable, miniature load forecasting device for smart grid domestic applications, we have chosen the Raspberry Pi ARM microprocessor platform. When running complex machine learning algorithms over large datasets, our primary hardware concerns are the temperature of the chip, the speed of execution, and CPU utilization. The Model B is built around Broadcom's 64-bit processor, which incorporates power integrity optimizations and a heat spreader. This allows the single circuit board to reach higher clock speeds and provides better control over the temperature of the chip. While training models such as Support Vector Regression on the training samples of the time series data, the board heats up to a certain degree, as depicted in Figure 9(d) in Section 6. To prevent overheating when working on real-time data, heat sinks or cooling cases specific to the model may be used.


The chipset, BCM2837, runs at 1.2 GHz and can be overclocked to attain higher computation speeds on a lightweight device. The models mentioned in this paper have all been implemented at standard processor speeds, with their respective execution times and error analyses listed. The experimental data obtained from the models show that even at the standard specifications of the hardware platform, the device gives reasonably efficient results. Further manipulations to the existing structure, such as pairing overclocking with heat sinks, are expected to increase the efficiency as well as the sustainability of machine learning implementations from a hardware perspective. The platform runs the Ubuntu MATE 16.04 Linux distribution. This robust device, with a 64-bit Cortex quad-core processor from Broadcom and built-in Wi-Fi and Bluetooth connectivity, has proved to be a reliable low-cost platform for Internet of Things (IoT) applications. The system is implemented such that the dataset can be stored in the SD card module, either from a real-time source or from offline resources (the load profile data). The algorithm execution ecosystem for the emulation platform, with the related APIs, is shown in Figure 1.
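Chip temperature and execution time, the two hardware metrics tracked throughout this work, can be sampled in a few lines of Python. The sysfs path below is the conventional thermal-zone location on Raspberry Pi Linux builds (an assumption; the helper returns None where it is absent), and the timing wrapper is an illustrative sketch, not the authors' instrumentation code:

```python
import time

def cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the SoC temperature in degrees C via sysfs; None if unavailable
    (the path is the conventional one on Raspberry Pi Linux builds)."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None

def timed(fn, *args):
    """Wall-clock a call, as done for the model fit/predict comparisons."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Example: time an arbitrary computation standing in for a model's fit step.
result, elapsed = timed(sum, [x * x for x in range(10000)])
```

Sampling `cpu_temp_c()` before and after each `timed(...)` call is enough to reproduce the per-model temperature and execution-time comparisons reported in Section 6.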


Fig 1. Execution ecosystem for the emulation platform


5.1 Feature Engineering


The three base layers, namely, the Hardware, the Kernel Driver and the Native OS layers are system specific and customizable to fit desired requirements. Over the Operating System, the Python-based sustainable machine learning module has been executed by virtue of the support of related APIs. Raw load data has been considered as input which is further split into training and test data. The predicted labels are compared with the actual test labels to determine the error score of the models. Multiple machine learning models are compared to decide the final model that best fits our requirements.

The raw dataset consists of various attributes: Type, Date, Start_Time, End_Time, Usage, Units, Cost, and Notes. After the dataset is loaded into a Python pandas data frame, feature engineering begins. The Start_Time and End_Time values of each power consumption measurement are separated by a 15-minute interval. Having the Date and Time values as separate features is of little use to a load forecasting model. Let the End_Time of a consumption record be n. Then, by convention, the Usage value of the corresponding row gives the electrical energy consumption at the n-th minute, since it is the End_Time that denotes exactly when the consumption is recorded. Thus, the End_Time and Date are concatenated to create a new feature, Date_Time. This modified feature is very important for our load forecasting models and forms the variable against which the prediction graphs are plotted. It is often observed that load datasets follow certain patterns at specific intervals of time. This arises primarily because a previous consumption value may affect future consumption; that is, the usage value for a previous time interval may affect the next. To capture this influence, a few additional attributes are introduced to the data. These attributes are built by an iterative process that shifts usage values, keeping a copy of each previous observation. Once the dataset with these new features is viewed, the pattern becomes more obvious and the influence is exploited by the algorithms. The generalized execution sequence is depicted in Figure 2.
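The Date_Time concatenation and the shifted-usage attributes described above can be sketched with pandas. The toy rows and the lag depth of 3 are illustrative assumptions, not the paper's exact values:

```python
import pandas as pd

# Sketch of the feature-engineering step, assuming the raw dataset's
# column names (Date, End_Time, Usage). The four rows and three lags
# are illustrative placeholders for the real load profile data.
df = pd.DataFrame({
    "Date":     ["2016-01-01"] * 4,
    "End_Time": ["00:15", "00:30", "00:45", "01:00"],
    "Usage":    [0.08, 0.12, 0.10, 0.15],
})

# Concatenate Date and End_Time into the Date_Time feature used for plotting.
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["End_Time"])

# Lagged usage features: copies of previous observations, shifted forward.
for lag in (1, 2, 3):
    df[f"Usage_t-{lag}"] = df["Usage"].shift(lag)

# Rows without a full set of lags are dropped before training,
# leaving the clean dataset the learning algorithms work on.
clean = df.dropna().reset_index(drop=True)
```

Each surviving row then carries the current usage value together with copies of the previous observations, which is what lets the algorithms exploit the repeated patterns mentioned above.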

Fig 2. Generalized execution sequence of the machine learning algorithms on the specified emulation platform

After performing these manipulations on the input data, the key features pertaining to our learning models are picked from the modified dataset and stored in a new, clean dataset for convenience. Beyond this point, the learning algorithms work on this clean dataset. The training and testing samples are specified in the code and are not generated randomly: the training sample consists of the first eight months of data, the following month is set aside for validation, and the final two months form the testing samples. All these samples are derived from the clean dataset.

5.2 The Iterative Process and Choice of Regression Models

The comparative execution flowchart for each model under the emulation platform is illustrated in Figure 3. The Random Forest regression model is essentially an ensemble method that builds multiple decision trees to construct a final prediction. It follows an iterative process that traverses the training dataset, splitting the set at each step to maximize information gain, or minimize entropy. With the target value being a real number, the regression model is fit to the target variable using each independent variable. Subsequently, for each independent variable, the data is split at several candidate points and the Sum of Squared Errors (SSE) between the predicted and actual values is calculated. The variable and split point with the least SSE are selected for the node, and the process recurses until a stopping criterion is reached. Thus, the random forest regression model works as a large collection of decorrelated decision trees.

Fig 3. A comparative flowchart for each regression model under the specified emulation platform

Although random forest regressors are among the most popular machine learning algorithms, they might not seem the best choice for time-series forecasts at first glance because of their inability to extrapolate data points. The random forest algorithm splits on the trend predictor, and any future trend value is simply treated as belonging to the last split bucket, with no concept of extrapolation. Predictions for the test data samples are obtained by finding the predictor corresponding to the partition in which the new input variable resides. Thus, for load prediction, the random forest regression model on its own is not a competitive model. However, the model is worth testing on load datasets by incorporating bagging and trending techniques along with lagged predictor variables, as these have often proved to give better accuracy than benchmarking methods. Hence, the dataset is modified accordingly, and the forecasts made by the model are compared for accuracy with the actual test samples. Further, the model is run on both scaled and unscaled data, and the observations are recorded.
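The lagged-predictor idea can be sketched with scikit-learn's RandomForestRegressor. The synthetic stand-in series, the lag depth of 3, and the tree count are illustrative assumptions; n_jobs and random_state mirror the settings reported later in Section 6.2:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative stand-in for the load series: a noisy periodic signal.
rng = np.random.default_rng(0)
usage = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)

# Lagged predictor variables: previous values in the series act as features.
lags = 3
X = np.column_stack([usage[i : len(usage) - lags + i] for i in range(lags)])
y = usage[lags:]

# Chronological (non-random) split, as in the paper's methodology.
split = int(0.8 * len(y))
model = RandomForestRegressor(n_estimators=100, n_jobs=3, random_state=2017)
model.fit(X[:split], y[:split])
preds = model.predict(X[split:])
```

Because the future values of a stationary-looking load series stay within the range seen during training, the lack of extrapolation matters less once lagged values, rather than raw time, act as predictors.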

Support Vector Regression falls under the category of discriminative regression models. The model takes a set of input vectors and associated responses, uses a kernel function to transform the input data (prior to the learning step), and then fits the model to predict the response for a new input vector. The kernel-based approach to regression is to map the features to some vector space and then perform simple linear regression in that space. The kernel function gives the dot product of vectors in this new space; the upshot is that the space itself never needs to be computed explicitly. All the mathematical details are hidden in the regression performed in the transformed space, and the choice of kernel determines the transformation. Further, no knowledge of the mapping function is required; the kernel function alone is used for the transformation.

The main challenge of using kernel SVR is determining the right kernel and combination of hyperplane parameters suited to the problem statement. A k-fold cross-validation would be a statistically valid procedure for obtaining the right hyperplane parameters; however, the effectiveness of cross-validation is reduced since the consumption changes over time. To deal with this, we took time as an input feature in our proposed model with standard parameter tuning, and forecasting was performed on both scaled and unscaled data. The kernels used in our model were the Radial Basis Function (RBF), Linear, and Polynomial of degree 3, and the results of the prediction were analyzed in each case.
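The three-kernel comparison might be set up as below with scikit-learn's SVR, using the default hyperparameters listed in Table 1; the synthetic time-indexed series and the scaling range are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler

# Illustrative stand-in: time as the single input feature, a noisy
# periodic target. Hyperparameters follow Table 1 (C=1.0, epsilon=0.1,
# degree=3, gamma='auto').
rng = np.random.default_rng(1)
t = np.arange(300, dtype=float).reshape(-1, 1)
y = np.sin(t.ravel() / 20.0) + 0.05 * rng.standard_normal(300)

# Scaled variant of the feature, mirroring the scaled/unscaled comparison.
scaled_t = MinMaxScaler(feature_range=(-1, 1)).fit_transform(t)

models = {
    kernel: SVR(kernel=kernel, C=1.0, epsilon=0.1, degree=3, gamma="auto")
    for kernel in ("linear", "rbf", "poly")
}
fits = {name: m.fit(scaled_t, y).score(scaled_t, y) for name, m in models.items()}
```

The same fit/score loop run against the unscaled feature reproduces the two-column comparison reported in Table 2.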

The logistic model is a variation of the probabilistic, statistical classification models, predicting binary outcomes and giving a binary response from a binary predictor. To make predictions on real-time data on the fly, the training dataset must first be labeled, setting the dependent variable to either 0 or 1. Then, the current estimate of the vector β is used to predict the probability of the dependent variable for newly arrived data.

From the above discussion, it is clear that logistic regression produces either probabilistic (floating point values between 0 and 1) or binary outcomes, and hence the forecast shall also be of the same nature. The historical data used for training and testing the model is numerical with no specific range, and so is the target value. Hence, applying logistic regression to forecast the consumption values is expected to give a very high RMSE score. However, this model is useful when high and low consumption limits are set. By setting such limits to specific values and adding a feature in the feature engineering step that has value 1 for consumption beyond a classification threshold and 0 for consumption below it, a binary predictor may be created. This predictor simply indicates whether future consumption values are greater or less than a particular value, and the Raspberry Pi can accordingly control the state of the device or switch to alternative sources of electricity. The goal of such a model would be to generate an equation that can reliably classify observations into one of two outcomes: high and low power consumption.

The k-Nearest Neighbors model falls under the category of non-parametric regression models. This algorithm stores all the available cases in the training sample and predicts the test samples based on similarity measures. In the model proposed in this paper, the inverse-distance weighted average of the k (here, 5) nearest neighbors to the target variable was used as the measure of similarity. This approach of pattern classification based on the similarity of individuals in the population makes the algorithm simple and intuitive, and it often finds application in time series forecasting models.

Since the load dataset in this paper is a univariate time series, the consistent data-generating process is expected to produce observations with repeated patterns. If a previous pattern can be identified as similar to the current behavior of the time series, the behavior following that previous pattern can provide some insight for predicting the immediate future. Hence the kNN algorithm is an obvious choice for predictions on our dataset. A kNN algorithm is characterized by choices such as the number of neighbors and the adopted distance, and these parameters are tuned in accordance with our dataset. This model is again run on both scaled and unscaled samples, and the observations are recorded.

6. Result Analysis

Prior to the analysis, it is necessary to graphically represent the original, untouched dataset. Figure 4 illustrates the raw time series data with Usage along the y-axis and Date_Time along the x-axis.

Fig 4. Raw load dataset

The execution time of the models is obtained using the time.process_time() function in the time module, available in Python 3.3 and above. The function returns a float value that gives the sum of the system and user CPU time for the current process. The returned value does not include time elapsed during sleep and is hence more accurate for our experiments. Further, the reference point of the returned value is undefined, so only the difference between the results of consecutive function calls is valid. First, the start_time is obtained using this function. Then, the function enclosing our model is run, and a corresponding end_time is obtained once execution is completed. Finally, the difference is calculated and displayed on the screen. It is to be noted that the execution times recorded for all the proposed models include the data manipulation time along with their respective fitting and prediction times; they do not indicate only the time taken by the forecasting model to work on the dataset. The execution time for data manipulation was found to be 23.1365 s.
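The timing procedure described above can be sketched as a small wrapper; the workload passed in is an illustrative placeholder for the model-enclosing function:

```python
import time

# process_time() excludes sleep, so the difference between two calls
# measures CPU time consumed by this process only.
def timed(fn, *args, **kwargs):
    """Run fn and return (result, CPU seconds consumed by the call)."""
    start = time.process_time()
    result = fn(*args, **kwargs)
    elapsed = time.process_time() - start
    return result, elapsed

# Placeholder workload standing in for a fit-and-predict routine.
result, secs = timed(sum, range(1_000_000))
```

Because the reference point of process_time() is undefined, only differences such as `elapsed` above are meaningful, which is exactly the convention followed in the measurements reported here.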

6.1 Support Vector Regression Model

In Support Vector Regression models, the choice of the kernel and the hyperplane parameters directly affects the learning and efficiency of the model. The kernel selection process is dataset dependent; hence three kernels were used for experimental analysis, each characterized by a different complexity: the Linear, Radial Basis Function (RBF), and Polynomial kernel SVRs. The hyperparameter values for the models are listed in Table 1.

Table 1. Hyperparameter values for the different kernels in the SVR

Parameter               Value
KERNEL                  Linear, RBF, Polynomial
C (penalty parameter)   1.0
CACHE_SIZE              200
COEF0                   0.0
EPSILON (ε)             0.1
DEGREE                  3
GAMMA (γ)               Auto
MAX_ITER                -1 (unlimited)
SHRINKING               True
TOL                     0.001
VERBOSE                 False

The prediction graphs for each model, running on scaled and unscaled feature sets, are provided in Figure 5 (a, b, c, d, e, f).

(a) SVR(Linear)-Scaled

(b) SVR(Linear)-Unscaled

(c) SVR(RBF)-Scaled

(d) SVR(RBF)-Unscaled

(e) SVR(polynomial)-Scaled

(f) SVR(polynomial)-Unscaled

Fig 5. The prediction graphs for each model, working on scaled and unscaled feature sets

Table 2. Scores and execution time of the implemented SVR models

KERNEL       FITTING SCORE         RMSE SCORE            EXECUTION TIME (sec)
             Scaled    Unscaled    Scaled    Unscaled
Linear       0.6009    0.6011      0.1888    0.0901      75.5581
RBF          0.5775    0.5825      0.2035    0.0985      115.2076
Polynomial   0.5476    0.4113      0.1530    0.1295      78.0684

On analyzing the prediction graphs and obtaining the fitting and RMSE scores, it is clear that the linear and RBF kernel SVRs produce very similar results on the unscaled feature set, both outperforming the polynomial kernel. On the preprocessed, scaled data, the polynomial kernel achieves the lowest RMSE score; however, the values are not satisfactory when compared to the scores obtained on unscaled data. Table 2 lists the scores and execution times for the SVR models.

With the linear kernel performing marginally better than the RBF kernel on the unscaled feature set, either kernel should serve the purpose of our problem statement. However, the two kernels differ substantially in complexity, the linear being much simpler than the RBF. This may not affect the forecast on this dataset; however, since the RBF kernel SVR is a non-parametric model, the complexity of its hypothesis space grows with the dataset size, while the linear SVR, being a parametric model, does not, which explains the dependence of the kernel choice on the size of the dataset. With increasing complexity, the RBF kernel SVR becomes more expensive to train, since the projection to the infinite-dimensional space becomes more costly, and so does prediction, along with a greater risk of overfitting. On the flip side, the linear kernel seems too simple for complex multivariate forecasting situations where the RBF kernel is mathematically more promising.

The polynomial kernel underperforms on unscaled features and performs better than both the linear and RBF kernels on the scaled feature set. However, the predictions are not satisfactory. A possible explanation is under-penalizing the model. The hyperparameter C, the penalty parameter, is set to its default in all three learning algorithms to maintain consistency and common grounds for comparison. However, tuning this hyperparameter specifies how harshly the model is punished for straying from the data, and the default value may be under-penalizing the model, resulting in unsatisfactory prediction curves. The polynomial kernel is still not advised for short-term load forecasting, considering its lack of computational and predictive efficiency. While the RBF kernel might perform well in the given scenario, the execution time and core temperature variations of the model on the chosen platform present concerning results. The Support Vector Regression models are observed to lie at the higher end of the spectrum in terms of both execution time and CPU temperature. When applied to real-time data for short-term forecasting in households, these factors could potentially generate errors in prediction and cause general inconvenience due to overheating and lagging interface issues.

6.2 Random Forest Regression Model

In the Random Forest Regression model, the number of parallel processes was set to 3 to make the model run faster on the platform, and the random_state parameter was set to 2017 to seed the learning algorithm for reproducibility. The graphical results are illustrated in Figure 6 (a, b).

(a) Random Forest Regression-Scaled

(b) Random Forest Regression-Unscaled

Fig 6. Prediction graphs for Random Forest Regression

Table 3. Fitting score, RMSE score, and execution time for the Random Forest Regression model

MODEL NAME                 FITTING SCORE         RMSE SCORE            EXECUTION TIME (sec)
                           Scaled    Unscaled    Scaled    Unscaled
Random Forest Regression   0.8966    0.8969      0.2239    0.1048      28.2557

Table 3 presents the scores and execution time for the Random Forest model. The fitting score of the random forest regression model does not suffer when the feature set is changed from scaled to unscaled; however, the RMSE scores show a substantial difference. This has to do with the scaling function used in preprocessing the univariate time series, which scaled the feature set to a range of values, here -1 to 1.

In the random forest regression-based model, the iterative partition process is greedy and cannot generally converge to a globally optimum tree. The bagging technique is used to solve this problem: an ensemble of locally optimum trees is generated by sampling uniformly at random, with replacement, from the original dataset, with only a random subset of the features considered at each split, and a new tree is trained on each resampled version of the dataset. The collection of such trees forms a forest, and their predictions are aggregated (by averaging, in the regression case) to obtain the final prediction.
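The bagging procedure can be sketched by hand with scikit-learn decision trees; the tree count, depth, and synthetic data are illustrative assumptions (in practice RandomForestRegressor performs these steps internally):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative data standing in for the load features and target.
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + 0.1 * rng.standard_normal(200)

# Bagging: each tree is trained on a bootstrap resample
# (sampling uniformly at random, with replacement).
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(max_depth=5).fit(X[idx], y[idx]))

# Aggregate by averaging the individual tree predictions.
ensemble_pred = np.mean([t.predict(X) for t in trees], axis=0)
```

Averaging over the decorrelated, locally optimum trees is what smooths out the greediness of any single tree's partitioning.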

When it comes to a time series dataset, better results may be obtained with the technique of recently lagged predictor values, wherein the role of the predictor variable is taken by the previous values in the series. The results obtained from the regression model are satisfactory, and its execution time is impressive, so the model does hold the potential of serving its purpose in real-time load forecasting situations.

6.3 Logistic Regression Model

The logistic regression model works differently from the other models mentioned in this paper, since it produces only binary outcomes. The purpose of using logistic regression here is to predict high and low consumption states. The same could be done with the classification counterparts of the regression methods used for forecasting; however, the concept of logistic regression and the mathematics behind its standard learning algorithm suited this purpose best. During preprocessing, all usage values >= 0.1 kWh were set to 1 (high power consumption) and all values below the threshold were set to 0 (low power consumption). This adds a new column of dependent variables, 0 and 1, to the dataset. The model then works on Date_Time and the dependent variable. The prediction graphs on scaled and unscaled feature sets are illustrated in Figure 7 (a, b).
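The thresholding and binary predictor described above might be sketched as follows; the synthetic usage values and the ordinal time feature are illustrative stand-ins for the real Date_Time column:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in for the 15-minute usage readings (kWh).
rng = np.random.default_rng(2)
usage = rng.uniform(0.0, 0.3, size=400)

# Preprocessing step from the paper: 1 = high consumption (>= 0.1 kWh),
# 0 = low consumption.
labels = (usage >= 0.1).astype(int)

# Simplified stand-in for Date_Time: an ordinal time index.
time_index = np.arange(400).reshape(-1, 1)
clf = LogisticRegression().fit(time_index, labels)

# Probabilistic output: probability of the high-consumption state.
prob_high = clf.predict_proba(time_index[:5])[:, 1]
```

On the device, the binary output of such a classifier is what would drive the Raspberry Pi's decision to switch a connected appliance's state or energy source.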

(a) Logistic Regression-scaled

(b) Logistic Regression-unscaled

Fig 7. Prediction graphs for Logistic Regression

Table 4. Fitting score, RMSE score, and execution time for the Logistic Regression model

MODEL NAME            FITTING SCORE         RMSE SCORE            EXECUTION TIME (sec)
                      Scaled    Unscaled    Scaled    Unscaled
Logistic Regression   1.0       1.0         0.0       0.0         26.4989

The prediction model in this case produces interesting results, as shown in Table 4. The fitting score is a perfect 1.0, implying possible overfitting, and the model produces a clearly misleading RMSE score of 0.0 on the test samples. Since the dependent variable has two integer states, scaling the feature set has no impact on the model. The results obtained from this model seem too perfect, and the model is suited only to this particular dataset. It must also be noted that the RMSE score is not a good route for analyzing the logistic regression method, since it throws a lot of information away. Thus, the regression model gives a perfect prediction on this particular dataset while losing its generality and its ability to perform well on a different dataset. A few different approaches may be employed for the analysis of this model, such as the Receiver Operating Characteristic (ROC) curve and score functions such as the Brier score. However, since the ultimate objective of this paper does not solely concern probabilistic determination of the state of power consumption, further modifications of this model were not performed.
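The alternative evaluation routes mentioned above can be sketched with scikit-learn; the label and probability vectors are illustrative stand-ins for held-out predictions from the classifier:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Illustrative held-out labels and predicted high-consumption probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3, 0.7, 0.2])

# Brier score: mean squared error of the predicted probabilities
# (lower is better), which keeps the probabilistic information
# that RMSE on hard 0/1 outputs throws away.
brier = brier_score_loss(y_true, y_prob)

# Area under the ROC curve: threshold-free ranking quality.
auc = roc_auc_score(y_true, y_prob)
```

Either score would expose an over-confident, overfit classifier far more readily than the degenerate 0.0 RMSE reported in Table 4.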

6.4 k-Nearest Neighbors Regression Model

The k-Nearest Neighbors Regression algorithm is a non-parametric algorithm that finds the k (here, 5) closest samples in the training data and uses the average response of these k samples as the predicted response. The weights in this model are set to uniform, and the distance metric to Minkowski [22,23] (the default). The predictions obtained on scaled and unscaled feature sets are displayed in Figure 8 (a, b), and Table 5 presents the scores and execution time of the kNN regression.
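The kNN configuration described above might look as follows in scikit-learn; the synthetic lagged-feature construction is an illustrative assumption mirroring the setup used for the other models:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative stand-in for the load series.
rng = np.random.default_rng(3)
usage = np.sin(np.linspace(0, 15, 400)) + 0.05 * rng.standard_normal(400)

# Lagged predictor features, as for the other models.
lags = 3
X = np.column_stack([usage[i : len(usage) - lags + i] for i in range(lags)])
y = usage[lags:]

# k=5, uniform weights, Minkowski metric: the configuration above.
split = int(0.8 * len(y))
knn = KNeighborsRegressor(n_neighbors=5, weights="uniform", metric="minkowski")
knn.fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((knn.predict(X[split:]) - y[split:]) ** 2))
```

Because prediction is just a neighbor lookup plus an average, the model's memory and CPU footprint stays modest, which is consistent with the resource observations reported later for this model.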

(a) kNN-scaled

(b) kNN-unscaled

Fig 8. Prediction graphs for kNN Regression

Table 5. Fitting score, RMSE score, and execution time for the k-NN Regression model

MODEL NAME       FITTING SCORE         RMSE SCORE            EXECUTION TIME (sec)
                 Scaled    Unscaled    Scaled    Unscaled
kNN Regression   0.7177    0.7177      0.0957    0.0956      27.8502

The fitting and RMSE scores obtained on scaled and unscaled feature sets differ only marginally, and the scores indicate good forecasting ability. Taking the execution time and the core temperature curves in Fig 9(d) into consideration along with the fitting and RMSE scores, this algorithm works best among all the modified learning models developed in this paper. With feature space construction and distance metric selection being the two most important aspects of non-parametric regression methods such as kNN, further modifications for real-time multivariate datasets may be made to produce similar or better test results with this kNN system design. Hence, the proposed kNN design proves both robust and flexible for short-term load forecasting on the given embedded system platform.

The fitting scores, error scores, execution times, and CPU core temperatures for the algorithms are depicted in Figure 9 (a, b, c, d). Further, the memory consumption graph for every algorithm is depicted in Figures 10 (a, b, c) and 11 (a, b, c), along with a detailed explanation.

(a) Fitting score comparison for each model

(b) RMSE score comparison for each model

(c) Execution time (on the emulation platform) comparison for each model

(d) CPU Core Temperature Comparison for each model

Fig 9. Graphical analysis of the comparison metrics

From our experimental observations, we obtain four competitive models with similar RMSE scores. Since the number of input variables (features) in the analysis of our dataset is one, we can skip the scaling step for our feature set. The experimental results also verify the superior performance of the models on unscaled feature sets; hence, discussions beyond this point concern only the unscaled feature set.

The memory consumption of the algorithms was monitored using a memory profiler, a pure Python module that depends on the psutil module. Line-by-line analysis of memory usage was obtained, and graphs were plotted for each Python process, as shown in Figure 10 (a, b, c) and Figure 11 (a, b, c).
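The profiler used here is the memory_profiler package built on psutil; as a dependency-free sketch of the same measure-around-the-call idea, the standard library's tracemalloc (which traces only Python-level allocations, not the full process RSS) can be used:

```python
import tracemalloc

# Sketch: capture the peak traced allocation for a single call,
# analogous to sampling memory around a model's fit/predict step.
# tracemalloc sees only Python allocations, unlike psutil-based RSS.
def peak_memory_mib(fn, *args, **kwargs):
    """Return (result, peak traced allocation in MiB) for one call."""
    tracemalloc.start()
    result = fn(*args, **kwargs)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / (1024 * 1024)

# Placeholder workload standing in for a training routine.
data, peak = peak_memory_mib(lambda n: list(range(n)), 100_000)
```

Plotting such peak (or periodically sampled) values over the course of a run yields memory profiles of the kind shown in Figures 10 and 11.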

(a) Resource utilization- SVR(Linear) Model

(b) Resource utilization - SVR(RBF) Model

(c) Resource utilization- SVR(Polynomial) Model

Fig 10 (a, b, c). Resource utilization of the SVR model for different kernels

Figures 10 (a), (b), and (c) depict the memory usage statistics of the Support Vector Regression model with different kernels. The graphs have a certain trait in common: a gradual hike followed by a sudden drop in memory consumption at specific points. The change is most drastic in the case of the polynomial kernel. We have already established, while studying the prediction performance of the polynomial kernel SVR in Section 6.1, that this model is unsuited to short-term load forecasting problems. The memory consumption graphs further strengthen this argument, since a sudden hike followed by a steep drop after seconds of stability is observed. This sudden increase in processor utilization, often referred to as a CPU spike, can cause temporary or permanent damage to the CPU and motherboard. The use of a large number of resources and RAM also causes other programs to lag and, during experimentation, often led to the device freezing. These issues render the model unusable in a short-term load forecasting scenario, where large chunks of real-time data need to be processed and meaningful conclusions drawn within a limited period of time. Hence, the polynomial kernel SVR is not discussed any further and is discarded from our list of competent algorithms.

When these memory usage graphs are used to compare the linear and RBF kernels, multiple factors come into play. Firstly, the peak consumption value is lower for the linear kernel SVR than for the RBF kernel; the underlying complexity of the radial basis function is a likely primary reason for this trend. Further, the increase in memory consumption in Fig 10(a) is more gradual, in contrast to the comparatively steep rise observed in Fig 10(b). This trend favors the linear kernel, since sharp CPU spikes are avoided and a steady, gradual rise in RAM usage is beneficial for the proper functioning of the hardware platform. However, once the peak value is attained, the linear kernel SVR continues to use this large amount of resources over a longer period of time than the RBF kernel SVR. The observation is important, since utilizing a large share of CPU resources for a prolonged period affects other applications that might be running on the device, while causing the CPU core to heat up and hampering its performance over time. Hence, the trade-off between a high peak value and the duration of such resource usage is important and could vary with the number of training samples.

(a) Resource utilization – Random Forest Regression Model

(b) Resource utilization –Logistic Regression Model

(c) Resource utilization - kNN Regression Model

Fig 11 (a,b,c). Resource utilization graphs for Random forest, Logistic regression, and kNN models

The memory utilization graphs for the random forest regression-based model (Fig 11(a)) and the logistic regression-based model (Fig 11(b)) display similar trends, with little difference in their peak values. However, since the logistic regressor is a binary classifier that clearly overfits the dataset, as discussed in Section 6.3, further discussion of this model is redundant in view of the interests of our research work.

Comparing the graphs in Fig 11(a) and Fig 11(c), we notice that the peak values are very close and cannot be considered a determining factor in this context. However, a steadier utilization of a smaller share of CPU resources is observed in Fig 11(c), pertaining to the memory usage characteristics of the kNN regression-based model. This observation is directly related to better CPU performance and reduced strain on the cores. Thus, the kNN regressor exhibits better performance than the random forest regressor in terms of resource utilization.

The graphs obtained from the linear and RBF kernel SVRs are now compared against the kNN regressor. Here, the choice is obvious. The peak memory usage of the kNN regressor is far less than that of the SVR models, the latter being roughly three times the former. Further, no drastic hikes and drops in trend are observed in the kNN-based model, thus maintaining healthy CPU levels. Hence, the kNN regressor is singled out as the best-performing model in terms of CPU resource utilization, enabling better system performance, reduced lag, and less strain on the cores of the board.

Based on overall performance, the competitive models are the SVR (linear), the SVR (RBF), the random forest regressor, and the kNN regressor. A detailed comparison between the models based on RMSE scores and execution time is provided in Figure 12 (a, b).

(a) RMSE Comparison between competitive models

(b) Execution Time Comparison between competitive models

Fig 12. Comparison Plots for Competitive Models

When choosing a suitable model for real-time load forecasting systems, it is important to consider both the error score and the execution time. The models vary only marginally in their RMSE scores, as shown in Figure 12(a). The linear kernel SVR performs best with a score of 0.09012, surpassing the RBF kernel SVR by roughly 0.008. Although the underlying mathematics of the RBF kernel is more robust and arguably more suited to time series predictions, this observation can be explained by the simplicity of the linear kernel, in accordance with Occam's razor [24]. The kNN regressor outperforms the RBF kernel SVR marginally, while the random forest regressor yields a score of 0.10481.

With the RMSE values being so close to each other, we resort to the analysis of the execution time. A platform for short-term load forecasting must perform fast and reliable predictions, without lag, on real-time data. To support IoT functionalities, the hardware must also be able to subsequently process the information provided by the predictive algorithm and decide the state of a connected device in the grid. Thus, execution time plays a significant role in choosing the best model for our purpose.

The RBF kernel SVR has the highest execution time, followed by the linear kernel SVR. Such slow execution renders these models unusable for short-term forecasting on the proposed hardware platform. Besides the obvious problem of slow performance, the circuit board also showed significant signs of heating while running these models, as confirmed by the curves in Figure 9(d). Over time, running such computationally expensive algorithms could potentially harm the device, so they are not considered the best choice for our platform.

Therefore, we are left with two choices: the random forest regressor and the kNN regressor. Here, once again, the choice is straightforward. The random forest regressor is inferior, albeit marginally, to the kNN regressor in terms of both RMSE score and execution time. Further, as discussed earlier, the memory consumption curves support this conclusion, while the core temperature lines show similar trends and are therefore not a conclusive parameter in this case. Thus, it is determined experimentally that, for short-term load forecasting on the proposed hardware platform and our load dataset, the best model is the kNN regressor, performing with a loss of 0.09573 and an execution time of 27.85 s.

7. Conclusion

Accurate predictions of future electricity demand would facilitate the economic operation of power systems. Such predictive models, when combined with embedded systems and smart Internet of Things (IoT) technologies, hold the potential of revolutionizing the entire power sector with a mere credit card-sized circuit board. In this paper, we considered a low-priced, pocket-sized hardware set-up that can forecast power consumption on real-time data served as input from an array of sensors and can control the state and source of energy for the connected devices. The price range and size of the hardware platform do pose some challenges when it comes to intelligent learning algorithms. Existing forecasting methods for the estimation of electric demand can be broadly classified into two categories: (1) classical statistical methods, and (2) automated learning techniques. Implementation of the latter requires a robust hardware architecture with multiple heat sinks, modern GPUs, etc. The embedded system platform could be enhanced with hardware add-ons; however, the prediction process could exhaust the device and the controller aspect of the device would suffer, rendering it useless in a smart grid network. Hence, modified versions of traditional regression algorithms were designed to bring our vision to light.

When multiple regression models tuned to the needs of our dataset were employed, two learning methods produced reliable outcomes: Support Vector Regression and k-Nearest Neighbors Regression. Although both models produced similar error scores, the kNN algorithm surpassed the SVR in the following aspects: (1) consistency when working with scaled and unscaled feature sets, (2) execution time, (3) memory consumption, and (4) heating of the CPU core. Through extensive research and multiple comparisons, a reliable prediction algorithm is identified, with efficiency comparable to several deep learning methods commonly used in traditional forecasting scenarios. It is also important to note that, when installed in individual households, the test dataset might include previously unseen labels. Likewise, the labels in the training dataset may also vary with the passage of time, as more and more parameters affect electricity demand and costs. In the future, we would like to tackle such issues and deal with continuously evolving datasets from multiple sources served as input to the device. With the hardware limitations in mind, the model proposed in this work can perform acceptable forecasts without tampering with the intricacies of the circuit board and can easily be deployed in household settings and smart grid networks, thereby providing a smart solution to the increasing global electricity demand.
