Load prediction in short-term implementing the multivariate quantile regression


Journal Pre-proof

Yazhou Xing, Su Zhang, Peng Wen, Limin Shao, Babak Daneshvar Rouyendegh

PII: S0360-5442(20)30142-0
DOI: https://doi.org/10.1016/j.energy.2020.117035
Reference: EGY 117035
To appear in: Energy
Received Date: 07 July 2019
Accepted Date: 24 January 2020

Please cite this article as: Yazhou Xing, Su Zhang, Peng Wen, Limin Shao, Babak Daneshvar Rouyendegh, Load prediction in short-term implementing the multivariate quantile regression, Energy (2020), https://doi.org/10.1016/j.energy.2020.117035

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.

Yazhou Xing 1†, Su Zhang 1†, Peng Wen 1, Limin Shao 1*, Babak Daneshvar Rouyendegh 2

1. College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding, Hebei, China, 071001
2. Department of Industrial Engineering, Ankara Yıldırım Beyazıt University (AYBU), 06010 Ankara, Turkey

* Corresponding author: Limin Shao, College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding, Hebei, China, 071001. E-mail: [email protected]

† Y.Z. Xing and S. Zhang contributed equally to this work.

Abstract: Probabilistic short-term load forecasting plays an important role in managing the grid and optimizing the power transmitted through its lines. Forecasting techniques that aim at high precision must also suit the conditions of short-term operation, so they have to be efficient and fast. Many data-driven forecasting techniques are too cumbersome for this purpose. The difficulty grows when many loads have to be forecast at the same time, for example when assessing and optimizing the energy delivery network. This paper proposes a novel hybrid forecasting framework that improves the probabilistic forecast of each load in real time. The improvement step uses a multivariate quantile regression that is applied to each forecast in real time whenever a new observation enters the system. The procedure is assessed on load data released by the Independent System Operator New England for eight zones covering six U.S. states. The probabilistic forecasts are compared with those of three benchmarks in terms of reliability and accuracy. The proposed approach is more accurate than the highest-ranked benchmark.

Keywords: hybrid prediction method, multi-variable regression, interim probability-based prediction, quantile regression forest

1. Introduction

Power consumption is increasing rapidly worldwide because of the growing population, the demand for a more comfortable life, and the trend toward extensive automation that accompanies economic growth [1]. Short-term and long-term forecasts of power consumption are necessary for investing in electricity infrastructure [2]. For instance, to meet high demand, large amounts of fossil fuel were burned for power generation in the late twentieth century, which has depleted fossil fuel reserves. Since the beginning of the twenty-first century, renewable sources such as solar, hydro, and wind power have been widely used to generate electricity. Compared with conventional fossil fuels, hydro, wind, and solar generation offer high expansion potential, no pollution, and renewability, and they provide a cleaner form of power. However, because wind, hydro, and solar generation are volatile and have to be integrated with the rest of the supply, forecasting electricity consumption has become an essential, though still contested, topic. An accurate forecast can increase the efficient use of renewable power. Accurate consumption forecasts can also support government planning for future power use and development.
The most prominent approaches to power consumption forecasting include regression methods [3], time-series methods [4], fuzzy theory [5, 6], neural networks [7], Bayesian networks [8], and hybrid approaches [9]. Regression and time-series analysis are the most widely used models [10]. With the rapid progress of artificial intelligence, artificial neural networks and population-based techniques have also been applied to electricity forecasting [11]. Reference [12] showed that statistical machine learning methods yield more precise electricity forecasts. In terms of output, current forecasting approaches can be divided into deterministic point forecasts and probabilistic forecasts, according to how they treat uncertainty [13]. A point forecast, however precise, cannot convey the variability of electricity consumption. Actual consumption and its growth are affected by several factors, including economic development, industrial structure, household income, weather, geographical area, and government policy (for example, electricity prices), and all of these factors interact. Moreover, the combination of industrialization and informatization in the power sector has accelerated the growth of electricity data, producing very large data sets that come from many sources, carry diverse attributes, and grow quickly. This makes consumption forecasting more complex [14].

To be practical, however, probability-based demand prediction (PBDP) must strike a trade-off between precision and computational burden. Choosing the forecaster, its hyperparameters, and the parameter estimation procedure usually takes a long time. For example, a time-consuming holistic model selection procedure brings only a small improvement over a simple heuristic [15]. Hence, once a model has been selected, its configuration and hyperparameters are usually kept fixed, and the model is then updated as new observations arrive.
Although some loads might be modeled more accurately with a time-varying representation, applying the model selection and parameter estimation procedures in real time is more practical. This becomes especially relevant when many loads must be forecast at the same time to support risk-sensitive decisions. On the other hand, running a sophisticated forecasting engine for every load can be impractical for economic, computational, and complexity reasons.

This work proposes a two-step hybrid framework for short-term probabilistic forecasting of multiple loads. First, each probabilistic forecast is produced by a quantile regression forest (QRF) [16]. Next, the forecasts are improved with a multivariate quantile regression (MQR) technique and bias modification (BM). The overall goal of such a cooperative structure is to raise the total forecast accuracy [17]. The proposed QR is derived from the univariate quantile regression (QR) technique of reference [18]. The new technique takes the probabilistic forecasts of the connected consumers as input and considers the whole group of loads when constructing the multivariate forecast quantiles simultaneously. The MQR technique is formulated as a linear program to improve computational performance, so that it is fast enough for real-time forecasting tasks. The challenge lies in grouping the loads suitably for the multivariate approach. Depending on the grouping, the proposed approach can yield more accurate forecasts than the initial ones. A comparative assessment of the grouping approaches is therefore carried out. The groupings are based either on geographical similarity or on random selection. The contributions of this work are as follows:

1) A novel cooperative forecasting structure, called MQR and QRF based BM, is proposed to capture the time-varying pattern of the consumption curve through real-time improvement of the univariate forecast of each load.
2) A new version of QR is applied that uses bias correction in the QRF during forecasting. In this model, a novel feature weighting subspace sampling approach is used to increase the efficiency of the proposed model.
3) Location-based and random grouping techniques are compared to analyze the effect of the multivariate approach.

In the following, the proposed model and the related descriptions are presented through application to real-world engineering test cases.

2. Cooperative prediction technique

Our goal is to use the shared information of the interdependent loads to build a multivariate forecasting technique. We also want the technique to be practical under real-time conditions, so that changes in the demand curve are captured in real time by updating the model parameters as new observations arrive. The computation time of the forecasting procedure must nevertheless remain reasonable relative to the lead time.

2.1. Multivariate quantiles as directional quantiles

In this part, we establish the directional properties of our quantiles. We first derive and discuss the subgradient conditions, then state the strong equivariance properties of the empirical quantiles, and finally present some asymptotic results.

2.1.1. Subgradient conditions

Under Hypothesis (H), the objective function $\varphi_\tau$ has a global minimum and is differentiable everywhere on $\mathbb{R}^k$. Hence, the population $\tau$-quantile can be equivalently defined as the collection of hyperplanes associated with the solutions $(a_\tau, b_\tau')'$ of the system of equations

$$\nabla_{(a,b)}\,\varphi_\tau(a,b) = 0 \qquad (1)$$

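These hyperplanes are defined by the following equations

$$\partial_a \varphi_\tau(a,b)\big|_{(a_\tau,b_\tau)} = P\bigl[u'Z < b_\tau'\Gamma_u'Z + a_\tau\bigr] - \tau = P\bigl[Z\in H_\tau^-\bigr] - \tau = 0 \qquad (2)$$

$$\nabla_b \varphi_\tau(a,b)\big|_{(a_\tau,b_\tau)} = -\tau\,E\bigl[\Gamma_u'Z\bigr] + E\bigl[\Gamma_u'Z\,I_{[Z\in H_\tau^-]}\bigr] = 0 \qquad (3)$$

Here $\Gamma_u$ is a $k\times(k-1)$ matrix whose columns complete $u$ to an orthonormal basis of $\mathbb{R}^k$, and $H_\tau^-$ and $H_\tau^+$ denote the lower and upper half-spaces bounded by the quantile hyperplane.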

Clearly, relation (2) gives the multivariate $\tau$-quantiles a natural probabilistic interpretation, because it keeps the probability of their lower half-spaces equal to $\tau$ ($=\|\boldsymbol{\tau}\|$). Equation (3) can be rewritten as

$$\Gamma_u'\left[\frac{1}{1-\tau}E\bigl[Z\,I_{[Z\in H_\tau^+]}\bigr] - \frac{1}{\tau}E\bigl[Z\,I_{[Z\in H_\tau^-]}\bigr]\right] = 0 \qquad (4)$$

Combined with (2), this shows that the straight line through the probability mass centers $\frac{1}{\tau}E[Z\,I_{[Z\in H_\tau^-]}]$ and $\frac{1}{1-\tau}E[Z\,I_{[Z\in H_\tau^+]}]$ of the lower and upper $\tau$-quantile half-spaces is parallel to $u$ ($=\boldsymbol{\tau}/\tau$). In addition, notice that

$$(1-\tau)\left(\frac{1}{1-\tau}E\bigl[Z\,I_{[Z\in H_\tau^+]}\bigr]\right) + \tau\left(\frac{1}{\tau}E\bigl[Z\,I_{[Z\in H_\tau^-]}\bigr]\right) = E[Z] \qquad (5)$$

so that the overall probability mass center lies on the same straight line. Turning to the constrained formulation, the gradient conditions state that $(a_\tau, c_\tau, \lambda_\tau)$ solves the system

$$\nabla_{(a,c,\lambda)}\,L_\tau(a,c,\lambda) = 0 \qquad (6)$$

where $L_\tau(a,c,\lambda) := \varphi^c_\tau(a,c) - \lambda\,(u'c - 1)$.

The function $L_\tau$ above is the Lagrangian of the problem. Since the only points of $\mathbb{R}^{k+2}$ at which $(a,c,\lambda)\mapsto L_\tau(a,c,\lambda)$ is not differentiable are of the form $(a,0,\lambda)$, the gradient conditions can be written as

$$\partial_a L_\tau(a,c,\lambda)\big|_{(a_\tau,c_\tau,\lambda_\tau)} = P\bigl[c_\tau'Z < a_\tau\bigr] - \tau = P\bigl[Z\in H_\tau^-\bigr] - \tau = 0 \qquad (7)$$

$$\nabla_c L_\tau(a,c,\lambda)\big|_{(a_\tau,c_\tau,\lambda_\tau)} = \tau\,E[Z] - E\bigl[Z\,I_{[Z\in H_\tau^-]}\bigr] - \lambda_\tau u = 0 \qquad (8)$$

$$\partial_\lambda L_\tau(a,c,\lambda)\big|_{(a_\tau,c_\tau,\lambda_\tau)} = 1 - u'c_\tau = 0 \qquad (9)$$

For constrained optimization problems of this kind, gradient conditions are in general necessary but not sufficient. Here, however, premultiplying both sides of (8) by $\Gamma_u'$ yields (3), because $\Gamma_u'u = 0$. This shows that, leaving aside the Lagrange multiplier $\lambda_\tau$ and (9) and focusing on the quantile hyperplane $\pi_\tau$ itself, the necessary conditions (7) and (8) are no weaker than the necessary and sufficient conditions (2) and (3); they are therefore necessary and sufficient as well.

The gradient conditions (6) are more informative than the conditions (1) associated with the original definition of the quantiles, which is one of the main reasons for also considering this alternative definition. Indeed, equation (8) can be rewritten as

$$\frac{1}{1-\tau}E\bigl[Z\,I_{[Z\in H_\tau^+]}\bigr] - \frac{1}{\tau}E\bigl[Z\,I_{[Z\in H_\tau^-]}\bigr] = \frac{\lambda_\tau}{\tau(1-\tau)}\,u \qquad (10)$$

This equation is more informative than (3) and (4), and it clarifies the role of the Lagrange multiplier $\lambda_\tau$. A multiplier usually only measures the effect of the boundary constraint (here, constraint (9)); in this case it appears as a functional that may be useful for testing elliptical, spherical, or central symmetry, or for measuring directional outlyingness and tail behavior of the distribution. Furthermore, premultiplying (8) by $c_\tau'$ gives $\lambda_\tau\,(=\lambda_\tau c_\tau'u) = E\bigl[(\tau - I_{[c_\tau'Z - a_\tau<0]})\,c_\tau'Z\bigr]$, which, by (7) and (9), leads to

$$\lambda_\tau = \varphi^c_\tau(a_\tau, c_\tau) \qquad (11)$$
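The step from (8) to (11) can be made explicit. The lines below are a sketch that assumes the constrained objective is the expected check loss, $\varphi^c_\tau(a,c) = E[\rho_\tau(c'Z - a)]$ with $\rho_\tau(x) = x(\tau - I_{[x<0]})$. This is the standard quantile-regression loss and is an assumption here, since the definition is not written out above. The lower half-space is taken to be $\{z : c_\tau'z < a_\tau\}$.

```latex
\begin{align*}
\lambda_\tau = \lambda_\tau\,c_\tau'u
  &= c_\tau'\bigl(\tau E[Z] - E[Z\,I_{[c_\tau'Z<a_\tau]}]\bigr)
  && \text{premultiply (8) by } c_\tau' \text{ and use (9)}\\
  &= E\bigl[(\tau - I_{[c_\tau'Z<a_\tau]})\,c_\tau'Z\bigr] \\
  &= E\bigl[(\tau - I_{[c_\tau'Z<a_\tau]})\,(c_\tau'Z - a_\tau)\bigr]
     + a_\tau\bigl(\tau - P[c_\tau'Z<a_\tau]\bigr) \\
  &= E\bigl[\rho_\tau(c_\tau'Z - a_\tau)\bigr] + 0
  && \text{by (7)}\\
  &= \varphi^c_\tau(a_\tau, c_\tau).
\end{align*}
```

Under that assumption, the multiplier equals the minimum value of the constrained objective, so it is non-negative.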

The sample objective functions $\varphi^{(n)}_\tau(a,b)$ and $\varphi^{c(n)}_\tau(a,c)$ are not differentiable everywhere. They do, however, have directional derivatives in all directions, which can be used to formulate fixed-$u$ subgradient conditions for the empirical $\tau$-quantiles, $\boldsymbol{\tau} = \tau u$. Focusing on the constrained optimization problem, it can be shown that the coefficients $(a^{(n)}_\tau, c^{(n)}_\tau)$ and the associated Lagrange multiplier $\lambda^{(n)}_\tau$ of any empirical $\tau$-quantile $\pi^{(n)}_\tau := \{z\in\mathbb{R}^k : c^{(n)\prime}_\tau z = a^{(n)}_\tau\}$ must satisfy the following conditions (with $r^{(n)}_i := c^{(n)\prime}_\tau Z_i - a^{(n)}_\tau$, $i = 1,\dots,n$):

$$\frac{1}{n}\sum_{i=1}^n I_{[r^{(n)}_i<0]} \le \tau \le \frac{1}{n}\sum_{i=1}^n I_{[r^{(n)}_i\le 0]} \qquad (12)$$

$$-\frac{1}{n}\sum_{i=1}^n Z_i^-\,I_{[r^{(n)}_i=0]} \le \tau\,\frac{1}{n}\sum_{i=1}^n Z_i - \frac{1}{n}\sum_{i=1}^n Z_i\,I_{[r^{(n)}_i<0]} - \lambda^{(n)}_\tau u \le \frac{1}{n}\sum_{i=1}^n Z_i^+\,I_{[r^{(n)}_i=0]} \qquad (13)$$

$$1 - u'c^{(n)}_\tau = 0 \qquad (14)$$

where $z^+ := (\max(z_1,0),\dots,\max(z_k,0))'$ and $z^- := (-\min(z_1,0),\dots,-\min(z_k,0))'$. These necessary conditions are obtained by requiring that, at $(a^{(n)}_\tau, c^{(n)}_\tau, \lambda^{(n)}_\tau)$, the directional derivatives in each of the $2(k+1)$ semi-axial directions are non-negative for $(a,c)$ and zero for $\lambda$. When $n \gg k$, (12) and (13) are approximate versions of their population counterparts (7) and (8), with essentially the same interpretation, while (14) simply restates the constraint. Condition (12) also implies that

$$\frac{N}{n} \le \tau \le \frac{N+Z}{n} \quad\text{and}\quad \frac{P}{n} \le 1-\tau \le \frac{P+Z}{n} \qquad (15)$$

where $N$, $P$, and $Z$ are the numbers of negative, positive, and zero values in the residual series $r^{(n)}_i$, $i=1,\dots,n$. Hence, for non-integer values of $n\tau$, the empirical $\tau$-quantile hyperplanes must pass through some of the $Z_i$'s. In fact, if the data points are in general position (which holds with probability one under Hypothesis ($H_n$)), there exists an empirical $\tau$-quantile hyperplane $\pi^{(n)}_\tau$ that fits exactly $k$ observations, and (15) then holds with $Z = k$. Note that the inequalities in (12), (13), and (15) must be strict if the empirical $\tau$-quantile is to be uniquely defined. Finally, in parallel with the population case, the sample multiplier $\lambda^{(n)}_\tau$ equals the minimum value attained by the sample objective function.

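Condition (15) can be checked numerically in the simplest setting. The sketch below is an illustration for the univariate case only (k = 1, where the quantile hyperplane reduces to a point q and the residuals are z_i - q); the function names are hypothetical, and the order-statistic rule used for the empirical quantile is one standard minimizer of the empirical check loss, not a procedure taken from the text.

```python
import math


def residual_counts(sample, q):
    """Count negative, zero and positive residuals r_i = z_i - q."""
    n_neg = sum(1 for z in sample if z < q)
    n_zero = sum(1 for z in sample if z == q)
    n_pos = sum(1 for z in sample if z > q)
    return n_neg, n_zero, n_pos


def empirical_quantile(sample, tau):
    """Return the ceil(n * tau)-th order statistic, a minimizer of the check loss."""
    ordered = sorted(sample)
    index = max(math.ceil(len(ordered) * tau) - 1, 0)
    return ordered[index]


def satisfies_condition_15(sample, q, tau):
    """Check N/n <= tau <= (N+Z)/n and P/n <= 1 - tau <= (P+Z)/n."""
    n = len(sample)
    n_neg, n_zero, n_pos = residual_counts(sample, q)
    lower_ok = n_neg / n <= tau <= (n_neg + n_zero) / n
    upper_ok = n_pos / n <= 1 - tau <= (n_pos + n_zero) / n
    return lower_ok and upper_ok


if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.0, 1.7, 4.3, 2.9]
    q = empirical_quantile(data, 0.3)
    print(q, residual_counts(data, q), satisfies_condition_15(data, q, 0.3))
```

With seven observations and tau = 0.3, the product n·tau is not an integer, so the quantile has to coincide with one of the observations, in line with the remark that the hyperplane must pass through some of the data points.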

For the unconstrained definition of the empirical quantiles, necessary and sufficient subgradient conditions can be obtained by applying Theorem 2.1 in [25]. Assume that the data points are in general position and define, for a given $k$-tuple of indices $h := (i_1,\dots,i_k)$ with $1\le i_1<\dots<i_k\le n$,

$$Y_u(h) := Z'(h)\,u, \qquad X_u(h) := \bigl(1_k \;\vdots\; Z'(h)\,\Gamma_u\bigr) \qquad (16)$$

where $Z(h) := (Z_{i_1},\dots,Z_{i_k})$ and $1_k := (1,\dots,1)'\in\mathbb{R}^k$. Koenker's results show that $(a^{(n)}_\tau, b^{(n)\prime}_\tau)' = (X_u(h))^{-1}Y_u(h)$ (under these conditions there always exists a quantile hyperplane that fits exactly $k$ observations) if and only if

$$-\tau\,1_k \le \xi_\tau(h) \le (1-\tau)\,1_k \qquad (17)$$

where

$$\xi_\tau(h) := (X_u'(h))^{-1}\sum_{i\notin h}\bigl(\tau - I_{[r_i<0]}\bigr)\begin{pmatrix}1\\ \Gamma_u'Z_i\end{pmatrix} \qquad (18)$$

with $r_i := u'Z_i - b^{(n)\prime}_\tau\Gamma_u'Z_i - a^{(n)}_\tau$. The solution is unique if and only if the inequalities in (17) are strict [25]. In the constrained case, it follows from linear programming theory that $(a^{(n)}_\tau, c^{(n)}_\tau)$ are the coefficients of a $\tau$-quantile hyperplane if and only if (17) holds with $r_i := c^{(n)\prime}_\tau Z_i - a^{(n)}_\tau$ in (18) (again with a unique solution when the inequalities are strict). We emphasize that no assumptions (in particular, no moment assumptions) are needed here, except that the data points are in general position.

2.1.2. Equivariance properties

For simplicity, the results for population quantiles are stated here under Hypothesis (H); more general statements can be obtained by allowing for possible non-uniqueness of the resulting $\tau$-quantiles. The affine-equivariance property is then easy to verify:

$$\pi_{\tau Mu/\|Mu\|}(MZ + d) = M\,\pi_{\tau u}(Z) + d \qquad (19)$$

This holds for every invertible $k\times k$ matrix $M$ and every vector $d\in\mathbb{R}^k$. Furthermore, (19) is also compatible with the general equivariance property advocated in [26]. In particular, for translations, $\pi_{\tau u}(Z+d) = \pi_{\tau u}(Z) + d$ for every $k$-vector $d$, which confirms that our concept of multivariate quantiles is not localized at any point of the $k$-dimensional Euclidean space. This is in clear contrast with other directional quantile contours that are defined relative to some center of the space, such as those in [27] (in the section on quantile biplots) and [28]. Notice that, for every $\tau\in(0,1)$ and every $u\in S^{k-1}$,

$$\pi_{-(1-\tau)u}(Z) = \pi_{\tau u}(Z) \qquad (20)$$

Exchanging the upper and lower parts of the associated half-spaces gives $\operatorname{int} H^{\pm}_{-(1-\tau)u}(Z) = \operatorname{int} H^{\mp}_{\tau u}(Z)$. Clearly, $\pi_{\tau u}(Z)$ and $\pi_{-\tau u}(Z)$ are not related in general, unless the distribution of $Z$ is centrally symmetric about some point $\theta\in\mathbb{R}^k$.

2.1.3. Asymptotic results

Under Hypothesis ($H_n$), strong consistency, asymptotic normality, and Bahadur-type representation results for the sample $\tau$-quantiles and related quantities are derived in this section. Under Hypothesis (H), the population $\tau$-quantiles $(a_\tau, b_\tau)$ and $(a_\tau, c_\tau)$ are always uniquely defined, unlike their sample counterparts $(a^{(n)}_\tau, b^{(n)}_\tau)$ and $(a^{(n)}_\tau, c^{(n)}_\tau)$; in what follows, the latter stand for arbitrary sequences of solutions.

Strong consistency of the sample $\tau$-quantiles, i.e. $(a^{(n)}_\tau, b^{(n)}_\tau)$ converges almost surely to $(a_\tau, b_\tau)$ as $n\to\infty$, holds under Hypothesis ($H_n$) [29]. Asymptotic normality and Bahadur-type representation results, however, require slightly stronger assumptions. Consider the following reinforcement of Hypothesis ($H_n$).

Hypothesis ($H_n'$): The observations $Z_i$, $i = 1,\dots,n$, are i.i.d. with a common distribution that is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^k$, with a density $f$ that has connected support, admits finite second-order moments, and, for some constants $C>0$, $r>k-2$, and $s>0$, satisfies

$$|f(z_1) - f(z_2)| \le C\,\|z_1 - z_2\|^{s}\left(1 + \left\|\frac{z_1+z_2}{2}\right\|^2\right)^{-(3+r+s)/2} \qquad \forall\, z_1, z_2\in\mathbb{R}^k \qquad (21)$$

Condition (21) is very mild. In particular, with $s=1$, it is satisfied by any continuously differentiable density for which there exist constants $C>0$, $r>k-2$, and an invertible $k\times k$ matrix $M$ such that

$$\max_{\|Mz\|\ge R}\|\nabla f(z)\| \le C\,(1+R^2)^{-(r+4)/2} \qquad \forall\, R>0 \qquad (22)$$

Therefore, Hypothesis ($H_n'$) holds, for example, when the $Z_i$'s are i.i.d. multinormal or elliptical $t$ with $\nu>2$ degrees of freedom. Differentiability is not required, however: (21) also holds, for instance, for elliptical densities proportional to $\exp(-\|Mz\|)$, which are not differentiable at the origin.

Hypothesis ($H_n'$) implies that the convex function $(a,b)\mapsto\varphi_\tau(a,b)$ is twice differentiable at $(a_\tau, b_\tau)$, with Hessian matrix

$$H_\tau := \int_{\mathbb{R}^{k-1}}\begin{pmatrix}1 & x'\\ x & xx'\end{pmatrix} f\bigl((a_\tau + b_\tau'x)\,u + \Gamma_u x\bigr)\,dx = J_u'\left[\int_{u^\perp}\begin{pmatrix}1 & z'\\ z & zz'\end{pmatrix} f\bigl((a_\tau - c_\tau'z)\,u + z\bigr)\,d\sigma(z)\right]J_u =: J_u' H^c_\tau J_u \qquad (23)$$

where $u^\perp := \{z\in\mathbb{R}^k : u'z = 0\}$ and $J_u$ denotes the $(k+1)\times k$ block-diagonal matrix with diagonal blocks $1$ and $\Gamma_u$ (with $\Gamma_u$ a $k\times(k-1)$ matrix whose columns complete $u$ to an orthonormal basis of $\mathbb{R}^k$). Convexity implies that $H_\tau$ is positive semi-definite. Moreover, for every $\tau$ and every $w := (w_0, v')' \ne 0$,

$$w'H_\tau w = \int_{\mathbb{R}^{k-1}} (w_0 + v'x)^2\, f\bigl((a_\tau + b_\tau'x)\,u + \Gamma_u x\bigr)\,dx > 0 \qquad (24)$$

so that, under Hypothesis ($H_n'$), $H_\tau$ is positive definite for every $\tau$. Let $\xi_{i,\tau}(a,b) := (\tau - I_{[u'Z_i - b'\Gamma_u'Z_i - a<0]})\,\tilde Z_i$ and $\xi^c_{i,\tau}(a,c) := (\tau - I_{[c'Z_i - a<0]})\,\tilde Z_i$, where $\tilde Z_i := (1, Z_i')'$, and define

$$V_\tau := \operatorname{Var}\bigl[J_u'\,\xi_{1,\tau}(a_\tau,b_\tau)\bigr] = J_u'\begin{pmatrix}\tau(1-\tau) & \tau(1-\tau)E[Z']\\ \tau(1-\tau)E[Z] & \operatorname{Var}\bigl[(\tau - I_{[Z\in H_\tau^-]})Z\bigr]\end{pmatrix}J_u = J_u'\operatorname{Var}\bigl[\xi^c_{1,\tau}(a_\tau,c_\tau)\bigr]J_u =: J_u' V^c_\tau J_u \qquad (25)$$

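We can now state the asymptotic normality and Bahadur-type representation result for the sample $\tau$-quantile coefficients, which is the main result of this section.

Theorem 2.2.1. Under Hypothesis ($H_n'$),

$$\sqrt{n}\left(\begin{pmatrix}a^{(n)}_\tau\\ b^{(n)}_\tau\end{pmatrix} - \begin{pmatrix}a_\tau\\ b_\tau\end{pmatrix}\right) = \frac{1}{\sqrt{n}}\,H_\tau^{-1}J_u'\sum_{i=1}^n \xi_{i,\tau}(a_\tau,b_\tau) + o_P(1) \qquad (26)$$

$$\xrightarrow{\;\mathcal{L}\;} N_k\bigl(0,\; H_\tau^{-1}V_\tau H_\tau^{-1}\bigr) \qquad (27)$$

as $n\to\infty$. Similarly, with $P_k$ denoting the $(k+1)\times(k+1)$ diagonal matrix with diagonal entries $1, -1, \dots, -1$,

$$\sqrt{n}\left(\begin{pmatrix}a^{(n)}_\tau\\ c^{(n)}_\tau\end{pmatrix} - \begin{pmatrix}a_\tau\\ c_\tau\end{pmatrix}\right) = \frac{1}{\sqrt{n}}\,P_k\,(H^c_\tau)^-\sum_{i=1}^n \xi^c_{i,\tau}(a_\tau,c_\tau) + o_P(1) \qquad (28)$$

$$\xrightarrow{\;\mathcal{L}\;} N_{k+1}\bigl(0,\; P_k\,(H^c_\tau)^- V^c_\tau (H^c_\tau)^- P_k'\bigr) \qquad (29)$$

where $(H^c_\tau)^-$ denotes the Moore-Penrose pseudoinverse of $H^c_\tau$. In addition,

$$\sqrt{n}\,\bigl(\lambda^{(n)}_\tau - \lambda_\tau\bigr) = \frac{1}{\sqrt{n}}\sum_{i=1}^n\bigl(\rho_\tau(c_\tau'Z_i - a_\tau) - \lambda_\tau\bigr) + o_P(1) \qquad (30)$$

$$\xrightarrow{\;\mathcal{L}\;} N\bigl(0,\; \operatorname{Var}[\rho_\tau(c_\tau'Z_1 - a_\tau)]\bigr) \qquad (31)$$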

where $\operatorname{Var}$ denotes the variance operator. Since $\rho_\tau(\cdot)$ is a non-negative function, the distribution of $\sqrt{n}\,(\lambda^{(n)}_\tau - \lambda_\tau)$ tends to be skewed for finite $n$ (see (30)), which can be partly corrected by a normalizing transformation such as the one in [30]. Furthermore, the proof of the theorem above extends readily to the asymptotic distribution of vectors of the form $(a^{(n)}_{\tau_1}, b^{(n)\prime}_{\tau_1},\dots,a^{(n)}_{\tau_J}, b^{(n)\prime}_{\tau_J})'$, $J\in\mathbb{N}_0$.

Theorem 2.2.1 makes inference on the $\tau$-quantiles straightforward; for example, it allows confidence regions to be built for them. Testing linear restrictions on the $\tau$-quantile coefficients, that is, testing null hypotheses of the form $H_0: (a_\tau, b_\tau')'\in M(a_0,b_0,\Upsilon) := \{(a_0, b_0')' + \Upsilon v : v\in\mathbb{R}^l\}$ (specified by some $k$-vector $(a_0, b_0')'$ and some full-rank $k\times l$ matrix $\Upsilon$, with $l<k$), can be carried out along the lines of [31]. Constructing and studying these tests requires a detailed analysis of the asymptotic behavior of the constrained estimator below, which is beyond the scope of this work.

$$\bigl(\tilde a^{(n)}_\tau, \tilde b^{(n)\prime}_\tau\bigr)' := \arg\min_{(a,b')'\in M(a_0,b_0,\Upsilon)} \varphi^{(n)}_\tau(a,b) \qquad (32)$$

3. Random forests for regression

3.1. Regression random forest

Given the training data L, a regression random forest is built in the following stages:

- Stage 1: Draw a bootstrap sample $L_k$ from L by bagging [19, 20], i.e. by sampling with replacement.

- Stage 2: Grow a regression tree $T_k$ from $L_k$. At each node $t$, the split is chosen using the decrease in impurity, where the impurity of the node is

$$g(t) = \frac{1}{N(t)}\sum_{x_i\in t}\bigl(Y_i - \bar Y_t\bigr)^2$$

in which $N(t)$ is the number of items in node $t$ and $\bar Y_t$ is the mean of all $Y_i$ in node $t$. At each leaf node, $\bar Y_t$ is used as the predicted value.

- Stage 3: Let $\hat Y^k$ be the prediction of tree $T_k$ for input $X$. The prediction of the regression random forest with $K$ trees is

$$\hat Y = \frac{1}{K}\sum_{k=1}^{K}\hat Y^k$$

Because each tree is grown from a bootstrap sample, it uses only about 2/3 of the items in L. About 1/3 of the items are left out; these are called out-of-bag samples and are used to estimate the prediction error [19, 21, 22].

3.2. Quantile regression forests

A quantile regression forest (QRF) grows its trees with the same steps as a regression random forest [23]. At each leaf node, however, it retains all the Y values instead of only their mean. A QRF therefore keeps the raw distribution of the Y values at each leaf node. Following the notation of [21], let $\theta_k$ be the random parameter vector that determines the growth of the $k$-th tree, and let $\Theta = \{\theta_k\}_{k=1}^{K}$ be the set of random parameter vectors of the forest generated from L. A positive weight $w_i(x_i,\theta_k)$ can be computed for every item $x_i\in L$.

Let $l(x,\theta_k,t)$ be a leaf node $t$ in $T_k$. The items $X_i\in l(x,\theta_k,t)$ are all assigned the same weight $w_i(x,\theta_k) = N(t)^{-1}$, where $N(t)$ is the number of items in $l(x,\theta_k,t)$. Hence, every case in $L_k$ is assigned a positive weight, and the cases that are not in $L_k$ are assigned zero weight. For a single-tree prediction, given $X = x$, the predicted value is

$$\hat Y^k = \sum_{i=1}^{N} w_i(x,\theta_k)\,Y_i = \sum_{X_i\in l(x,\theta_k,t)} w_i(x,\theta_k)\,Y_i \qquad (33)$$

The weight $w_i(x)$ assigned by the random forest is the average of the weights over all trees:

$$w_i(x) = \frac{1}{K}\sum_{k=1}^{K} w_i(x,\theta_k) \qquad (34)$$

The prediction of the regression random forest is then

$$\hat Y = \sum_{i=1}^{N} w_i(x)\,Y_i \qquad (35)$$

Note that $\hat Y$ is the average of the conditional means of all trees in the regression random forest. Given an input $X = x$, the leaf nodes $l_k(x,\theta_k)$ of all trees to which $x$ belongs, and the sets of $Y_i$ in these leaf nodes, can be identified. Given the $Y_i$ and the associated weights $w_i(x)$, the conditional distribution function of Y given X is estimated as

$$\hat F(y \mid X = x) = \sum_{i=1}^{N} w_i(x)\,I(Y_i\le y) \qquad (36)$$

where the indicator function $I(\cdot)$ equals one if $Y_i\le y$ and zero if $Y_i > y$. For a probability $\alpha$, the quantile $Q_\alpha(X)$ is estimated as

$$Q_\alpha(X = x) = \inf\{y : \hat F(y \mid X = x)\ge\alpha\} \qquad (37)$$
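Equations (34), (36), and (37) can be traced on a toy example. The sketch below is a minimal illustration that assumes the trees have already been grown and that, for a query point x, the indices of the training items sharing its leaf in each tree are known; the function names and the toy data are hypothetical.

```python
def forest_weights(leaf_members_per_tree, n_items):
    """Average the per-tree weights 1/N(t) over all trees, as in eq. (34).

    leaf_members_per_tree[k] lists the training indices that fall in the
    same leaf as the query point in tree k.
    """
    n_trees = len(leaf_members_per_tree)
    weights = [0.0] * n_items
    for members in leaf_members_per_tree:
        for i in members:
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def conditional_cdf(y, responses, weights):
    """Weighted empirical distribution function of eq. (36)."""
    return sum(w for y_i, w in zip(responses, weights) if y_i <= y)


def conditional_quantile(alpha, responses, weights):
    """Smallest observed response whose weighted CDF reaches alpha, eq. (37)."""
    cumulative = 0.0
    for y_i, w in sorted(zip(responses, weights)):
        cumulative += w
        if cumulative >= alpha - 1e-12:  # small tolerance for float rounding
            return y_i
    return max(responses)


if __name__ == "__main__":
    responses = [10.0, 12.0, 15.0, 11.0, 20.0]
    leaves = [[0, 1, 3], [1, 2], [0, 3]]  # three trees, one leaf each
    w = forest_weights(leaves, len(responses))
    print(conditional_quantile(0.5, responses, w))  # prints 11.0
```

Because the estimated distribution function is a step function that only jumps at observed responses, the infimum in (37) is always attained at one of the training values, which is why a single sorted pass is enough.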

For interval prediction, the following is proposed:

$$[Q_{\alpha_l}(X),\,Q_{\alpha_h}(X)] = \bigl[\inf\{y : \hat F(y\mid X = x)\ge\alpha_l\},\;\inf\{y : \hat F(y\mid X = x)\ge\alpha_h\}\bigr] \qquad (38)$$

where $\alpha_l<\alpha_h$ and $\alpha_h - \alpha_l = \tau$. Here $\tau$ is the probability that the prediction Y falls in the interval $[Q_{\alpha_l}(X), Q_{\alpha_h}(X)]$. For point regression, the prediction can be a value in the interval, e.g. the median or the mean of the $Y_i$ values. The median is more robust to outliers than the mean. The median of the Y values in the interval between the two quantiles is therefore used as the prediction of Y for the input $X = x$.

4. Feature weighting for subspace selection

4.1. Feature p-value assessment

The permutation procedure only provides a ranking of feature importance. For better feature selection at each node of a tree, however, important features must be separated from less important ones. This can be done with Welch's two-sample t-test [24], which compares the importance score of a feature with the maximum importance score of generated noisy features called shadows. The shadow features have no power to predict the response feature. Hence, any feature whose importance score is smaller than the maximum importance score of the noisy features is regarded as less important; otherwise, it is regarded as important. This idea was first proposed in [25] and developed further in [26-31].

A random forest model (AF) is built from this extended data set, which includes the shadow features. Using the permutation importance measure, the AF model computes 2M importance scores for the 2M features. The same procedure is repeated R times to obtain R replicates. The data thus comprise the scores of M input features and M shadow features, the latter created by permuting the values of the corresponding input features. From the shadow-feature columns, the maximum value of each row is extracted and placed in the comparison sample $V^*_r = \max_j\{A_{rj}\}$, $r = 1,\dots,R$; $j = M+1,\dots,2M$. The t-statistic for each input feature $X_j$ is computed as

$$t_j = \frac{\overline{VI}_{X_j} - \overline{V}^*}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (39)$$

where $s_1^2$ and $s_2^2$ are the unbiased estimators of the variances of the two samples, $\overline{VI}_{X_j}$ is the mean of the R importance scores of the $j$-th input feature, and $\overline{V}^*$ is the mean of the R comparison values in $V^*$. For the significance test, the distribution of $t_j$ is approximated by an ordinary Student's distribution with degrees of freedom $FD$ computed as

$$FD = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}} \qquad (40)$$

$$n_1 = n_2 = R \;\Rightarrow\; FD = (R-1)\,\frac{(s_1^2 + s_2^2)^2}{(s_1^2)^2 + (s_2^2)^2} \qquad (41)$$
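The quantities in (39) to (41) are straightforward to compute. The sketch below is a minimal illustration with made-up importance scores; the function names are hypothetical, and the p-value step is left out because it needs the Student t distribution function, which the Python standard library does not provide (with SciPy installed, `scipy.stats.t.sf(t_stat, df)` would give the one-sided p-value).

```python
import math
from statistics import mean, variance


def welch_t_and_df(importance_scores, shadow_maxima):
    """Welch t-statistic (39) and degrees of freedom (40) for two samples."""
    n1, n2 = len(importance_scores), len(shadow_maxima)
    v1 = variance(importance_scores) / n1  # s1^2 / n1, unbiased sample variance
    v2 = variance(shadow_maxima) / n2      # s2^2 / n2
    t_stat = (mean(importance_scores) - mean(shadow_maxima)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


def equal_size_df(s1_sq, s2_sq, r):
    """Simplified degrees of freedom (41), valid when n1 = n2 = R."""
    return (r - 1) * (s1_sq + s2_sq) ** 2 / (s1_sq ** 2 + s2_sq ** 2)


if __name__ == "__main__":
    scores = [0.52, 0.48, 0.55, 0.50, 0.45]   # R = 5 scores of one input feature
    shadows = [0.10, 0.12, 0.08, 0.11, 0.09]  # R = 5 shadow maxima (V*)
    t_stat, df = welch_t_and_df(scores, shadows)
    print(t_stat, df, equal_size_df(variance(scores), variance(shadows), 5))
```

With equal sample sizes the two degrees-of-freedom formulas agree, which is a quick way to check an implementation of (40) against (41).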

The p-value of the feature is obtained from the t-statistic and $FD$, and the hypothesis test $\overline{VI}_{X_j} > \overline{V}^*$ is carried out. Features that are statistically significant are identified as important. The test confirms that an important feature consistently scores higher than the shadows over multiple permutations.

4.2. Feature partition and subspace selection

The p-value of a feature expresses its importance in the forest. The smaller the p-value, the stronger the correlation between the predictor feature and the response feature, and the greater the feature's power in prediction. Given the p-values of all features, a significance level is set as the threshold λ, for example λ = 0.05. Every feature whose p-value is smaller than λ is added to the important feature subset $X_{high}$; otherwise it is added to the less important subset $X_{low}$. The two subsets partition the set of features in the data. At each node, features are randomly selected from $X_{high}$ and $X_{low}$ to form the feature subspace used to split the node. For a given subspace size, the subspace is built with eighty percent of its features sampled from $X_{high}$ and twenty percent sampled from $X_{low}$.

4.3. The proposed quantile regression forest technique

Using the new feature weighting subspace sampling approach to generate splits at the nodes of the decision trees, and choosing the predicted value of Y from the interval between the lower and upper quantiles with high probability, the quantile regression forest can now be extended. The new quantile regression forest method, eQRF, is summarized below:


1. Given L, create the expanded data group $L_e$ with 2M dimensions by permuting the values of the associated forecaster properties to generate shadow properties.
2. Build a regression arbitrary forest model $AF_e$ from $L_e$ and compute R replicates of the raw significance scores of all forecaster properties and shadows using $AF_e$. Extract the highest significance score of every replicate to form the comparative sample $V^*$ of R components.
3. For every estimator property, use its R significance scores to calculate the t statistic according to equation (12).
4. Calculate $FD$ according to equation (41).
5. Using $FD$ and the t statistic, calculate the p-value of every forecaster property.
6. Given a significance threshold λ, divide the properties into the high-significance and low-significance sub-sets $X_{high}$ and $X_{low}$.
7. Sample the learning set L with replacement to generate the bagged candidates $L_1, L_2, \ldots, L_K$.
8. For every candidate set $L_k$, create a regression tree $T_k$ as explained below:
   - At every point, randomly choose a property sub-space of size greater than one from both $X_{low}$ and $X_{high}$, and use the properties in this sub-space as candidates for splitting the point.
   - Every tree is grown non-deterministically and without pruning until the minimum point size $n_{min}$ is reached. At every terminal (leaf) point, the Y values of all components in that point are retained.
   - Using the individual trees and the tree-set with the out-of-bag candidates, calculate the weights of every $X_i$.


9. Letting $\alpha_h - \alpha_l = \tau$, calculate the associated quantiles $Q_{\alpha_l}$ and $Q_{\alpha_h}$ using equation (10) (in this paper $\alpha_l = 0.05$, $\alpha_h = 0.95$ and $\tau = 0.9$).
10. For an input X, produce the forecast from the quantile values in the interval between $Q_{\alpha_l}$ and $Q_{\alpha_h}$, e.g. their average or median.

4.4. Method for two-stage bias correction

An adaptive bagging approach, formulated as a stepwise iterative procedure, was presented in [20]. The response values Y are used in the 1st step; with $\hat{Y}$ denoting the forecasted values obtained by subtracting the estimators, the 2nd step of bagging is conducted using $\hat{Y}$. The iteration stops when the mean squared error for new cases at the next step exceeds 1.1 times the smallest error obtained so far. As a consequence, the residuals $Y - \hat{Y}$ in the 2nd step introduce more variance. Adding further iterative steps drives the bias toward zero while the variance keeps increasing. Therefore, more than two steps are not needed.

A two-stage bias-modified technique, bmQRF, is used here to correct the forecast bias as an alternative to the method of [20]. The 1st-step quantile regression forest is built from the learning data. The forecast errors of the 1st-step QRF then replace the response property in the initial learning data. The new learning data, with the forecast errors as the response property, are used to build the 2nd-step QRF. The final bias-modified values are computed as the forecast of the 1st-step model minus the forecast of the 2nd-step model. The bmQRF method for interval forecasting is as follows:

Stage 1: Grow the 1st-step QRF from the learning data L with response property Y.


Stage 2: Obtain the forecasted quantile $Q_{\alpha}(X = x)$ of x from the learning data. Estimate the bias as the median of the forecasted values in the quantiles minus the actual response of the input data:

$$E = Q_{\alpha}(X = x) - Y \tag{42}$$

Stage 3: Letting $X = x_{new}$, use the 1st-step QRF to produce the quantiles and the interval $[Q_{\alpha_l}(X = x_{new}),\ Q_{\alpha_h}(X = x_{new})]$.

Stage 4: Extend the learning data collection L with the bias errors $E$ as a new response property to generate the expanded data collection $L_e = [L, E]$. Grow the 2nd-step QRF from $L_e$ with the response property $E$. Use the 2nd-step QRF to predict the learning data and obtain a new collection of errors $E_{new}$.

Stage 5: The bias-modified quantiles are calculated as:

$$[Q_{\alpha_l}^{new},\ Q_{\alpha_h}^{new}] = [Q_{\alpha_l}(X = x_{new}) - E_{new},\ Q_{\alpha_h}(X = x_{new}) - E_{new}] \tag{43}$$
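The five stages can be illustrated with a small sketch. It does not use a real quantile regression forest: a toy nearest-neighbour quantile estimator (`knn_quantile`, a hypothetical helper) stands in for both QRF steps, and the second-step model is evaluated directly at the new input, which is an assumption made here for brevity.

```python
def knn_quantile(xs, ys, x, alpha, k=5):
    """Toy stand-in for a QRF: empirical alpha-quantile of the k responses
    whose inputs are nearest to x (not an actual forest)."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    values = sorted(y for _, y in nearest)
    return values[round(alpha * (len(values) - 1))]


def bias_corrected_interval(xs, ys, x_new, a_low=0.05, a_high=0.95, k=5):
    """Two-stage bias-corrected interval following Stages 1 to 5."""
    # Stage 2: bias on the learning data, E = Q_0.5(X = x) - Y (eq. 42).
    errors = [knn_quantile(xs, ys, x, 0.5, k) - y for x, y in zip(xs, ys)]
    # Stage 3: first-step interval at the new input.
    q_low = knn_quantile(xs, ys, x_new, a_low, k)
    q_high = knn_quantile(xs, ys, x_new, a_high, k)
    # Stage 4: second-step model with E as the response (assumed here to be
    # evaluated at x_new to obtain E_new).
    e_new = knn_quantile(xs, errors, x_new, 0.5, k)
    # Stage 5: shift both ends of the interval by the predicted bias (eq. 43).
    return q_low - e_new, q_high - e_new


# Hypothetical one-dimensional learning data.
inputs = [float(i) for i in range(20)]
responses = [2.0 * x + (1.0 if i % 2 else -1.0) for i, x in enumerate(inputs)]
low, high = bias_corrected_interval(inputs, responses, 10.0)
```

Because the same $E_{new}$ is subtracted from both quantiles, the correction shifts the interval without changing its width.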

For point forecasts, the forecasted values are taken as $Q_{0.5}$.

5. Performance assessment

Three probability-based assessments (benchmarks) are chosen in this work to examine and contrast the accuracy of the suggested combination of QRF and MQR with BM. The first one is the improved Seasonal Naive (ISN) approach. The ISN forecast is the demand, at the same hour of the day, of the nearest day that is of the same type as the target day according to the TOD categorization [39]. This prevents workday consumption patterns from being used for holidays in the ISN, and vice versa. The probability-based ISN forecasts are then obtained by keeping this value in every forecast quantile.


The next assessment, named the quantile regression forest assessment (QRFA), uses single-variable QRFs. It requires no post-processing of the results. All the information available before the prediction is issued is used to train the QRF. Each load is predicted using equations (1) to (4). An OST approach is taken to choose the most informative attributes among the LILs, ISLLs, LT and date parameters. The procedure is the same as the one used to create the basic QRF model in the combination of QRF and the proposed MQR with BM.

The final assessment, called the quantile regression assessment (QRA), uses the single-variable QR technique for each load, where no further QRA predictor joins the forecasters. The QRA forecasters are chosen from the LIL, ISLL, LT and date parameters, and the OST is used to extract the QRA model.

6. Mathematical practice

The ISO-NE demand information presented in [40] is used here to examine the suggested method and to compare it with other methods. The information covers eight separate areas, which include, but are not limited to, the American states of Vermont, Connecticut, Rhode Island, New Hampshire, Maine and Massachusetts. Each of the first five states corresponds to one load area, and their loads are denoted P1, P2, P3, P4 and P5, respectively, whereas Massachusetts is divided into P6, P7 and P8. Two more parameters, P9 and P10, are also considered: P9 is the total load of Massachusetts and P10 is the aggregate load of all the states. The information gives the consumption in each hour, adjusted for daylight saving time. Weather station data with hourly dry-bulb temperature readings are also used.


This section uses 43848 hourly records of temperature and consumption from Jan 1 2012 to Dec 31 2016. Table 1 provides the statistical variables of the data. The first 48 months of data are used for training, and the last year is used to examine the suggested method by comparing it with the assessment criteria. This section presents the outcome of some of the mathematical tests carried out to assess the suggested QRF and MQR. Probability-based predictions are made with 99 forecasting quantiles at levels 0.01, …, 0.99, and most of the attention is on the prediction of the next-day load (k = 24 h). The prediction error parameters used to evaluate the proposed method are presented in the Appendix.

Table 1: Consumption and Temperature Statistical Variables

| State | Consumption average [GWh] | Consumption standard deviation [GWh] | Consumption median [GWh] | Temperature average [°C] | Temperature standard deviation [°C] | Temperature median [°C] |
|---|---|---|---|---|---|---|
| Vermont (P1, T1) | 0.599 | 0.112 | 0.601 | 9.038 | 11.732 | 9.444 |
| Connecticut (P2, T2) | 3.438 | 0.738 | 3.410 | 11.01 | 10.794 | 11.111 |
| Rhode Island (P3, T3) | 0.908 | 0.212 | 0.901 | 11.355 | 10.088 | 11.667 |
| New Hampshire (P4, T4) | 1.282 | 0.264 | 1.301 | 8.837 | 11.276 | 8.889 |
| Maine (P5, T5) | 1.169 | 0.222 | 1.186 | 8.887 | 10.288 | 9.444 |
| Massachusetts (P6, T6) | 1.617 | 0.365 | 1.586 | 11.355 | 10.088 | 11.667 |
| Massachusetts (P7, T7) | 1.996 | 0.372 | 1.990 | 9.353 | 10.495 | 10.000 |
| Massachusetts (P8, T8) | 2.816 | 0.542 | 2.815 | 11.266 | 9.940 | 11.667 |
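As a quick arithmetic check of the sample sizes quoted above, the number of hourly records can be counted from the calendar dates, assuming 24 records per day (the data are described as adjusted for daylight saving time). The helper name `hourly_records` is hypothetical.

```python
from datetime import date


def hourly_records(first_day, last_day):
    """Number of hourly records between two dates, both inclusive,
    assuming exactly 24 records per day."""
    return ((last_day - first_day).days + 1) * 24


total_hours = hourly_records(date(2012, 1, 1), date(2016, 12, 31))     # 43848
training_hours = hourly_records(date(2012, 1, 1), date(2015, 12, 31))  # 35064
test_hours = hourly_records(date(2016, 1, 1), date(2016, 12, 31))      # 8784
```

The total matches the 43848 records stated in the text; the first 48 months give 35064 training hours and the examination year 2016, a leap year, gives 8784 hours.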


Figure 1 shows the time ranges used in the mathematical tests. The proposed QR is estimated in real time in every month of the year 2016. The location-based sets used here are set one (i.e. states in the north: P5, P4, P1), set two (i.e. states in the south: P2, P3) and set three (i.e. different parts of Massachusetts: P6, P7, P8). MATLAB is used to develop the predictors on a computer with a Core i3-6300 @ 3.8 GHz dual-core CPU and 12 GB of RAM. Table 2 shows the mean time taken to train the candidates and verify the models needed to generate the hourly prediction of the loads during a year.

Table 2: Time Taken for Training and Assessment

| Method | Time taken [h] |
|---|---|
| QRFA | 50.8 |
| QRA | 2.9 |
| MQR using linear program | 12.6 |
| MQR using non-linear program | 25.8 |


[Figure 1, panels (a) and (b): timeline from 2012/01/01 to 2016/12/31 (48 months followed by 12 months, with 36-month, 12-month and 1-month windows), marking the training data group for QRF, the training data group for MQR, the training data for the hybrid QRF and MQR, the examination data group for the combinatorial method, and the unutilized data group.]

Figure 1: Time Ranges Utilized in the Mathematical Tests

a. Prediction outcomes of the areas

The candidates of the combination of MQR with BM and QRF, and the outcomes of the different methods, are summarized in Table 3. In this table, "S" denotes a selected candidate, "N" denotes a candidate that was not selected and "I" denotes a candidate ignored by the models.

Table 3: The Input Forecaster Review for the Combination of MQR with BM and QRF


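| Candidate forecasters | P1h | P2h | P3h | P4h | P5h | P6h | P7h | P8h |
|---|---|---|---|---|---|---|---|---|
| P1h24, T1h24 | S | S | S | I | I | I | I | I |
| P1h168, T1h168 | S | S | S | I | I | I | I | I |
| P2h24, T2h24 | S | S | N | I | I | I | I | I |
| P2h168, T2h168 | S | S | N | I | I | I | I | I |
| P3h24, T3h24 | S | S | S | I | I | I | I | I |
| P3h168, T3h168 | S | S | S | I | I | I | I | I |
| P4h24, T4h24 | I | I | I | S | S | N | N | N |
| P4h168, T4h168 | I | I | I | S | S | N | N | N |
| P5h24, T5h24 | I | I | I | S | S | N | N | N |
| P5h168, T5h168 | I | I | I | S | S | N | N | N |
| P6h24, T6h24 | I | I | I | I | I | S | S | S |
| P6h168, T6h168 | I | I | I | I | I | S | S | S |
| P7h24, T7h24 | I | I | I | I | I | S | S | N |
| P7h168, T7h168 | I | I | I | I | I | S | S | N |
| P8h24, T8h24 | I | I | I | I | I | S | N | S |
| P8h168, T8h168 | I | I | I | I | I | S | N | S |
| $\hat{P}^{(1)}_{QRF,1h}, \ldots, \hat{P}^{(S)}_{QRF,1h}$ | S | S | S | I | I | I | I | I |
| $\hat{P}^{(1)}_{QRF,2h}, \ldots, \hat{P}^{(S)}_{QRF,2h}$ | S | S | S | I | I | I | I | I |
| $\hat{P}^{(1)}_{QRF,3h}, \ldots, \hat{P}^{(S)}_{QRF,3h}$ | S | S | S | I | I | I | I | I |
| $\hat{P}^{(1)}_{QRF,4h}, \ldots, \hat{P}^{(S)}_{QRF,4h}$ | I | I | I | S | S | I | I | I |
| $\hat{P}^{(1)}_{QRF,5h}, \ldots, \hat{P}^{(S)}_{QRF,5h}$ | I | I | I | S | S | I | I | I |
| $\hat{P}^{(1)}_{QRF,6h}, \ldots, \hat{P}^{(S)}_{QRF,6h}$ | I | I | I | I | I | S | S | S |
| $\hat{P}^{(1)}_{QRF,7h}, \ldots, \hat{P}^{(S)}_{QRF,7h}$ | I | I | I | I | I | S | S | S |
| $\hat{P}^{(1)}_{QRF,8h}, \ldots, \hat{P}^{(S)}_{QRF,8h}$ | I | I | I | I | I | S | S | S |
| TODh | S | S | S | S | S | S | S | S |
| DOMh | S | S | S | S | S | S | S | S |
| MOYh | S | S | S | S | S | S | S | S |
| KODh | S | S | S | S | S | S | S | S |

The columns are the objective parameters.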

Moreover, Table 4 presents the average error indexes for P1, P2, …, P8. In load set one, the combination of QRF and MQR with BM reduces PL by 2.5% for P1, 3% for P2 and about 1% for P3 compared to the reference value, which is QRA. In addition, the Average Absolute Coverage Error (AACE) is used to verify the improved accuracy of the prediction. The suggested approach gives better prediction reliability than the three aforementioned benchmarks: its AACE index is about two fifths smaller than that of QRA. Inevitably, the combination of MQR with BM and QRF is generally less sharp than the reference methods. An improvement of more than 49% over ISN is also achieved. It is worth mentioning that, since the forecast periods of ISN collapse to point predictions, the Forecast Period Normalized Mean Width (FPNMW), the Forecast Period Coverage Probability (FPCP) and the AACE are not meaningful for the ISN predictors. The same can be said of load sets two and three. The suggested hybrid method decreases PL for P4, P5, P6, P7 and P8 by approximately 3%, 2.5%, 6.5%, 4.5% and 1% compared to the best reference method (QRA). The suggested method has an AACE that is consistently about 40% lower than that of the reference method, hence it gives the most reliable prediction. In fact, its FPCPs are closer to the nominal values. However, this comes with less sharpness, which means that the FPNMWs of the proposed method are higher than QRFA's.

A number of prediction results are shown in Figure 2, which presents the forecast periods obtained by QRA, QRFA and the combination of QRF and MQR for week one of the examination year. The nominal values of the forecast periods are 10%, 50%, 80% and 98%. The figure also shows the real consumption and the ISN consumption (the ISN forecast periods collapse to points by construction). The PL of the prediction of P1 in each month of the examination year, i.e. the efficiency assessment of the prediction of P1 throughout the examination interval, is shown in Figure 3. The QRA, QRFA and the suggested method have almost identical curves, and all of them have their highest error in April. The PL of ISN also has a local maximum in April, which explains this error. The suggested method, QRA and QRFA are the best for seven, three and two months, respectively.

Table 4: Results of Next-Day Prediction for P1, P2, …, P8, Shown in Average Form During the Examination Year

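| Load | Method | PL [GWh] | PLn | AACE [%] | FPCP(0.1) | FPCP(0.9) | FPNMW(0.1) | FPNMW(0.9) |
|---|---|---|---|---|---|---|---|---|
| P1 | QRF+MQR+BM | 2.07 | 1.78 | 3.48 | 9.43 | 93.15 | 0.88 | 15.01 |
| P1 | ISN | 4.08 | 3.48 | | | | | |
| P1 | QRA | 2.12 | 1.81 | 6.99 | 11.78 | 86.09 | 0.86 | 16.31 |
| P1 | QRFA | 2.13 | 1.83 | 6.29 | 8.50 | 79.95 | 0.80 | 11.25 |
| P2 | QRF+MQR+BM | 2.39 | 1.87 | 3.66 | 9.92 | 89.98 | 0.81 | 13.46 |
| P2 | ISN | 5.01 | 3.91 | | | | | |
| P2 | QRA | 2.46 | 1.92 | 7.72 | 10.61 | 90.69 | 0.70 | 13.66 |
| P2 | QRFA | 2.65 | 2.06 | 8.36 | 8.22 | 79.05 | 0.75 | 9.84 |
| P3 | QRF+MQR+BM | 1.35 | 2.26 | 3.91 | 9.76 | 91.35 | 0.89 | 14.45 |
| P3 | ISN | 2.80 | 4.67 | | | | | |
| P3 | QRA | 1.35 | 2.26 | 6.13 | 8.96 | 93.21 | 0.75 | 17.78 |
| P3 | QRFA | 1.44 | 2.41 | 7.26 | 8.06 | 78.15 | 0.79 | 11.62 |
| P5 | QRF+MQR+BM | 1.87 | 2.06 | 3.77 | 9.71 | 91.71 | 0.83 | 14.36 |
| P5 | ISN | 4.29 | 4.73 | | | | | |
| P5 | QRA | 1.93 | 2.12 | 8.11 | 11.24 | 94.05 | 0.87 | 16.92 |
| P5 | QRFA | 2.09 | 2.30 | 7.10 | 7.97 | 75.38 | 0.83 | 11.26 |
| P6 | QRF+MQR+BM | 2.92 | 1.80 | 3.45 | 9.92 | 91.95 | 0.77 | 12.36 |
| P6 | ISN | 6.36 | 3.93 | | | | | |
| P6 | QRA | 3.11 | 1.93 | 9.18 | 10.50 | 93.97 | 0.64 | 12.68 |
| P6 | QRFA | 3.40 | 2.10 | 6.11 | 8.21 | 78.43 | 0.80 | 10.66 |
| P7 | QRF+MQR+BM | 5.79 | 2.89 | 3.74 | 10.28 | 91.02 | 1.22 | 20.32 |
| P7 | ISN | 10.68 | 5.37 | | | | | |
| P7 | QRA | 6.05 | 3.03 | 6.22 | 8.97 | 82.98 | 0.77 | 14.99 |
| P7 | QRFA | 6.27 | 3.14 | 11.37 | 7.11 | 70.11 | 0.85 | 11.59 |
| P8 | QRF+MQR+BM | 4.49 | 1.60 | 3.66 | 9.59 | 91.82 | 0.69 | 12.03 |
| P8 | ISN | 9.68 | 3.44 | | | | | |
| P8 | QRA | 4.52 | 1.58 | 9.69 | 11.25 | 97.35 | 0.61 | 12.38 |
| P8 | QRFA | 4.68 | 1.67 | 7.32 | 6.61 | 77.58 | 0.65 | 6.63 |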
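The error criteria themselves are defined in the paper's Appendix, which is not reproduced here. As a rough illustration only, the sketch below assumes that PL is the standard pinball (quantile) loss averaged over observations and quantile levels, and that a coverage probability is the percentage of observations falling inside a forecast interval; all function names are hypothetical.

```python
def pinball_loss(y, q, alpha):
    """Standard pinball loss of the quantile forecast q at level alpha
    for the observation y."""
    if y >= q:
        return alpha * (y - q)
    return (1.0 - alpha) * (q - y)


def mean_pinball_loss(observations, quantile_forecasts, levels):
    """Average pinball loss over all observations and quantile levels.
    quantile_forecasts[i][j] is the forecast for observation i at levels[j]."""
    total, count = 0.0, 0
    for y, forecasts in zip(observations, quantile_forecasts):
        for q, alpha in zip(forecasts, levels):
            total += pinball_loss(y, q, alpha)
            count += 1
    return total / count


def coverage_percent(observations, lower, upper):
    """Percentage of observations inside the interval [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observations, lower, upper))
    return 100.0 * hits / len(observations)
```

Under these assumptions, a lower pinball loss indicates better quantile forecasts, and a coverage close to the nominal level of the interval indicates a reliable forecast.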

[Figure 2: load (MWh) against hour (h), hours 1 to 168, showing the ISN forecast, the real load signal and the 98%, 80%, 50% and 20% PIs.]

Figure 2: Analysis of P1 Prediction in Week One of the Examination Interval.


[Figure 3: monthly values (months 1 to 12) for ISN, QRA, QRF+MQR and QRFA.]

Figure 3: Efficiency Assessment of Prediction of P1 throughout the Examination Interval.

b. Prediction outcomes of the summed loads

The summed loads (i.e. P9 and P10) are predicted with the suggested method by aggregating the predicted value of every quantile of each load. To maintain the statistical consistency of the 99 quantiles, they are arranged from lowest to highest. Table 5 presents the mean prediction error criteria for the summed loads. The improvement in PL is modest in this case: just over 1% for P10 and 1.5% for P9 in comparison with QRA and QRFA. Nonetheless, QRFA is more reliable than the other approaches. Moreover, since it has the highest FPNMWs, the suggested hybrid method is the least sharp. This suggests that simply summing the predictions via a sorted set of QRF and MQR with BM quantiles is a poor approach in this case. Aggregating loads requires more effort, for example establishing a more sophisticated prediction framework.

Table 5: Prediction Average of the Aggregated Loads through the Examination Year.


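| Load | Method | PL [GWh] | PLn | AACE [%] | FPCP(0.1) | FPCP(0.9) | FPNMW(0.1) | FPNMW(0.9) |
|---|---|---|---|---|---|---|---|---|
| P9 | QRF+MQR+BM | 11.23 | 1.75 | 6.03 | 10.91 | 95.42 | 0.89 | 14.97 |
| P9 | ISN | 24.51 | 3.45 | | | | | |
| P9 | QRA | 11.41 | 1.77 | 8.63 | 10.94 | 92.83 | 0.69 | 13.50 |
| P9 | QRFA | 11.38 | 1.77 | 4.18 | 9.09 | 84.65 | 0.77 | 10.35 |
| P10 | QRF+MQR+BM | 21.23 | 1.54 | 8.79 | 12.90 | 96.62 | 0.86 | 14.50 |
| P10 | ISN | 49.95 | 3.61 | | | | | |
| P10 | QRA | 21.78 | 1.58 | 10.21 | 11.63 | 94.75 | 0.74 | 14.25 |
| P10 | QRFA | 21.36 | 1.54 | 6.03 | 10.32 | 89.25 | 0.76 | 10.41 |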
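The aggregation step described for P9 and P10 (summing the per-load forecasts at each quantile level and then sorting the result) can be sketched as follows. The function name and the example numbers are hypothetical; three quantile levels are used instead of the paper's 99 to keep the example short.

```python
def aggregate_quantiles(load_quantiles):
    """Sum the quantile forecasts of several loads level by level, then sort
    the sums in ascending order so that the aggregated quantiles do not cross.

    load_quantiles: one list per load, each holding that load's forecasts
    at the same ordered quantile levels.
    """
    summed = [sum(level_values) for level_values in zip(*load_quantiles)]
    return sorted(summed)


# Hypothetical forecasts of two loads at three quantile levels. The second
# load has crossing quantiles, so the raw sums are not monotone.
zone_a = [1.0, 2.0, 3.0]
zone_b = [2.0, 0.5, 3.0]
aggregated = aggregate_quantiles([zone_a, zone_b])
```

Sorting only restores monotonicity of the summed quantiles; as the text notes, it does not by itself make the aggregated forecast well calibrated.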

c. Prediction outcomes using random categorization of loads

To evaluate the need for suitable load categorization, loads P2 and P5 are placed in one set in the multi-variable approach and the results are reported. As presented in Table 1, these loads belong to the Connecticut and Maine areas, which are farther from each other than the other areas in the grid. The error criteria for this case are shown in Table 6. The PL improvement over QRA drops below 1% in this procedure, which is significantly lower than the 2.5% and 3% obtained with location-based categorization. Furthermore, the table compares the proposed method with other well-known forecasting approaches, i.e. the Bayesian model, quantile regression and neural network methods. This significant difference in improvement shows the need for proper categorization when creating a multi-variable prediction framework; it should be noted that this need increases when working with a large number of loads.

Table 6: Results of Next-Day Prediction for P2 and P8, Shown in Average Form during the Examination Year, Categorized Randomly

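| Load | Method | PL [GWh] | PLn | AACE [%] | FPCP(0.1) | FPCP(0.9) | FPNMW(0.1) | FPNMW(0.9) |
|---|---|---|---|---|---|---|---|---|
| P1 | QRF+MQR+BM | 2.12 | 1.81 | 3.93 | 9.39 | 91.75 | 0.88 | 14.38 |
| P1 | ISN | 4.05 | 3.57 | | | | | |
| P1 | QRA | 2.12 | 1.81 | 6.96 | 11.71 | 85.62 | 0.86 | 16.22 |
| P1 | QRFA | 2.13 | 1.82 | 6.26 | 8.44 | 79.31 | 0.80 | 11.18 |
| P1 | Bayesian model | 2.87 | 3.15 | 7.54 | - | - | - | - |
| P1 | Quantile regression | 2.63 | 2.16 | 6.15 | - | - | - | - |
| P1 | Neural network | 4.12 | 4.47 | 9.45 | - | - | - | - |
| P4 | QRF+MQR+BM | 8.93 | 1.73 | 4.44 | 9.64 | 92.85 | 0.73 | 12.67 |
| P4 | ISN | 13.21 | 3.83 | | | | | |
| P4 | QRA | 5.97 | 1.74 | 8.89 | 11.12 | 92.91 | 0.69 | 12.41 |
| P4 | QRFA | 6.05 | 1.76 | 6.52 | 8.64 | 78.60 | 0.66 | 9.15 |
| P4 | Bayesian model | 9.14 | 2.15 | 6.87 | - | - | - | - |
| P4 | Quantile regression | 8.67 | 1.78 | 6.14 | - | - | - | - |
| P4 | Neural network | 13.55 | 3.52 | 9.54 | - | - | - | - |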

7. Conclusion

This work suggests a novel solution to the prominent short-term probability-based demand prediction problem. It satisfies the need for high-performance prediction of a number of loads in the transmission section and is practical in the real world. The suggested method is a hybrid one. Firstly, a QRF is trained for each specific load. Next, the QRF predictions, which are called forecasting quantiles, are adjusted and used in the multi-variable approach. The multi-variable quantile regression can conduct multi-variable prediction for categorized loads. This approach is formulated as a constrained linear program, which is able to reduce quantile crossing. The MQR with BM technique is trained in real time on a rolling basis as newer observations become available, so that it stays in sync with the ever-varying state of the loads. This procedure leads to more flexible prediction. The suggested hybrid prediction framework is evaluated using real-world information from ISO-NE. From this test network, eight areas are chosen to assess the performance of the suggested work. The assessment results suggest that this approach leads to high-performance predictors with respect to PL, which is superior to that of the other reference methods, as shown in Tables 4 to 6. Because the suggested method is developed to be practical in the real world and to have a dynamic response, its improvement does not require much computing power or additional input data.

References

1. Azmoodeh, A., et al., Detecting crypto-ransomware in IoT networks based on energy consumption footprint. Journal of Ambient Intelligence and Humanized Computing, 2018. 9(4): p. 1141-1152.

2. Tang, X., et al., Research on the energy control of a dual-motor hybrid vehicle during engine start-stop process. Energy, 2019. 166: p. 1181-1193.

3. He, Y., et al., Electricity consumption probability density forecasting method based on LASSO-Quantile Regression Neural Network. Applied Energy, 2019. 233: p. 565-575.

4. Divina, F., et al., A Comparative Study of Time Series Forecasting Methods for Short Term Electric Energy Consumption Prediction in Smart Buildings. Energies, 2019. 12(10): p. 1934.

5. Poczeta, K., E.I. Papageorgiou, and A. Yastrebov. Application of Fuzzy Cognitive Maps to Multi-Step Ahead Prediction of Electricity Consumption. in 2018 Conference on Electrotechnology: Processes, Models, Control and Computer Science (EPMCCS). 2018. IEEE.

6. Tang, L., et al., Long-term electricity consumption forecasting based on expert prediction and fuzzy Bayesian theory. Energy, 2019. 167: p. 1144-1154.

7. Kim, T.-Y. and S.-B. Cho, Predicting Residential Energy Consumption using CNN-LSTM Neural Networks. Energy, 2019.

8. Leicester, P.A., C.I. Goodier, and P.N. Rowley, Probabilistic analysis of solar photovoltaic self-consumption using Bayesian network models. IET Renewable Power Generation, 2016. 10(4): p. 448-455.

9. Barak, S. and S.S. Sadegh, Forecasting energy consumption using ensemble ARIMA–ANFIS hybrid algorithm. International Journal of Electrical Power & Energy Systems, 2016. 82: p. 92-104.

10. Zhang, F., et al., Time series forecasting for building energy consumption using weighted Support Vector Regression with differential evolution optimization technique. Energy and Buildings, 2016. 126: p. 94-103.

11. Taylor, J.W., Triple seasonal methods for short-term electricity demand forecasting. European Journal of Operational Research, 2010. 204(1): p. 139-152.

12. Li, C., et al., Building energy consumption prediction: An extreme deep learning approach. Energies, 2017. 10(10): p. 1525.

13. Hong, T. and S. Fan, Probabilistic electric load forecasting: A tutorial review. International Journal of Forecasting, 2016. 32(3): p. 914-938.

14. Abdel-Basset, M., et al., Neutrosophic association rule mining algorithm for big data analysis. Symmetry, 2018. 10(4): p. 106.

15. Xie, J. and T. Hong, Variable selection methods for probabilistic load forecasting: Empirical evidence from seven states of the united states. IEEE Transactions on Smart Grid, 2017. 9(6): p. 6039-6046.

16. Vaysse, K. and P. Lagacherie, Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma, 2017. 291: p. 55-64.

17. Yang, Z., L. Ce, and L. Lian, Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Applied Energy, 2017. 190: p. 291-305.

18. Gallego-Castillo, C., et al., On-line quantile regression in the RKHS (Reproducing Kernel Hilbert Space) for operational probabilistic forecasting of wind power. Energy, 2016. 113: p. 355-365.

19. Peters, A., Improved predictors. 2019.

20. Breiman, L., Using adaptive bagging to debias regressions. 1999, Technical Report 547, Statistics Dept. UCB.

21. Liaw, A. and M. Wiener, Classification and regression by randomForest. R news, 2002. 2(3): p. 18-22.

22. Gonzalez, O., et al., Analyzing Monte Carlo simulation studies with classification and regression trees. Structural Equation Modeling: A Multidisciplinary Journal, 2018. 25(3): p. 403-413.

23. Meinshausen, N., Quantile regression forests. Journal of Machine Learning Research, 2006. 7(Jun): p. 983-999.

24. Sampson, C., Generalized Two-Sample t-Test for Box-Cox Transformations of the Sample Means. 2018.

25. Stoppiglia, H., et al., Ranking a random feature for variable and feature selection. Journal of Machine Learning Research, 2003. 3(Mar): p. 1399-1414.

26. Tuv, E., et al., Feature selection with ensembles, artificial variables, and redundancy elimination. Journal of Machine Learning Research, 2009. 10(Jul): p. 1341-1366.

27. Kursa, M.B., W.R. Rudnicki, and M.M.B. Kursa, Package ‘Boruta’. 2018.

28. Nembrini, S., I.R. König, and M.N. Wright, The revival of the Gini importance? Bioinformatics, 2018. 34(21): p. 3711-3718.

29. Xue, X., M. Yao, and Z. Wu, A novel ensemble-based wrapper method for feature selection using extreme learning machine and genetic algorithm. Knowledge and Information Systems, 2018. 57(2): p. 389-412.

30. Lei, Y., et al., Learning‐based CBCT correction using alternating random forest based on auto‐context model. Medical Physics, 2019. 46(2): p. 601-618.

31. Aghajani, G. and N. Ghadimi, Multi-objective energy management in a micro-grid. Energy Reports, 2018. 4: p. 218-225.

32. Liu, Y., W. Wang, and N. Ghadimi, Electricity load forecasting by an improved forecast engine for building level consumers. Energy, 2017. 139: p. 18-30.

33. Gollou, A.R. and N. Ghadimi, A new feature selection and hybrid forecast engine for day-ahead price forecasting of electricity markets. Journal of Intelligent & Fuzzy Systems, 2017. 32(6): p. 4031-4045.

34. Mirzapour, F., et al., A new prediction model of battery and wind-solar output in hybrid power system. Journal of Ambient Intelligence and Humanized Computing, 2019. 10(1): p. 77-87.

35. Hosseini Firouz, M. and N. Ghadimi, Optimal preventive maintenance policy for electric power distribution systems based on the fuzzy AHP methods. Complexity, 2016. 21(6): p. 70-88.

36. Samworth, R.J., Recent progress in log-concave density estimation. Statistical Science, 2018. 33(4): p. 493-509.

37. Wilson, T.J., The Effects of Power Transformations on Consumer Expenditure Survey Data. 2018.

38. Donoho, D. and A. Montanari, High dimensional robust m-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields, 2016. 166(3-4): p. 935-969.

39. Bracale, A., et al., Short-term industrial reactive power forecasting. International Journal of Electrical Power & Energy Systems, 2019. 107: p. 177-185.

40. Hong, T., J. Xie, and J. Black, Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. International Journal of Forecasting, 2019.

41. Bracale, A., et al., Multivariate Quantile Regression for Short-Term Probabilistic Load Forecasting. IEEE Transactions on Power Systems, 2019.

42. Ovcharov, E.Y., Proper scoring rules and Bregman divergence. Bernoulli, 2018. 24(1): p. 53-79.


Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:


1) Proposal of a new cooperative prediction structure
2) Extracting the time-dependent arrangement in the consumption curve of each load
3) Application of a new version of QR which uses bias correction in QRF
4) Contrasting the location-based and random categorization techniques