Applied Energy 264 (2020) 114636
Neural-network-based Lagrange multiplier selection for distributed demand response in smart grid☆
Guangchun Ruan a, Haiwang Zhong a,⁎, Jianxiao Wang b, Qing Xia a, Chongqing Kang a
a State Key Laboratory of Power System, Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
b School of Electrical and Electronic Engineering, North China Electric Power University, Beijing 100084, China
HIGHLIGHTS

• A Lagrange multiplier selection model is formulated.
• A neural-network-based two-stage algorithm is designed to solve the nonlinear and nonsmooth programming.
• Some special designs of the neural network are proposed.
• The price response feature is found to be nonlinear and temporally coupled.

ARTICLE INFO
ABSTRACT
Keywords: Distributed demand response; Neural network; Optimal step size; Lagrange multiplier selection; Convergence acceleration
The slow convergence of existing distributed demand response methods greatly hinders their reliable application in smart grids. To overcome this problem, this paper proposes a new distributed method, the neural-network-based Lagrange multiplier selection (NN-LMS), which markedly reduces the number of iterations and avoids oscillation. The key improvement lies in the forecast strategy of the load serving entity (LSE), which applies a specially designed neural network to capture the users' price response features. A novel Lagrange multiplier selection model, the NN-LMS model, is formulated to optimize the iterative step sizes. This complex model is then solved by a two-stage NN-LMS algorithm, which applies an interval bisection method in the first stage and an improved sensitivity method in the second. In addition, data selection, batch normalization and an additional network are applied to boost the performance of the neural networks. Case studies validate the optimality, improved convergence and numerical stability of the proposed method, and demonstrate its great potential in distributed demand response and other smart grid applications.
1. Introduction

There is a widespread belief that a tremendous amount of distributed resources from the demand side will be integrated into the future power grid [1,2], so more flexible and intelligent operation modes are urgently needed [3,4]. Accordingly, distributed demand response [5], as a substitute for centralized dispatch, has played an important role in system scheduling, providing a general, reliable and efficient platform for distributed resources. Intelligent control approaches in distributed demand response are believed to further improve system efficiency [6].

Data have become a potential resource for analyzing demand-side behaviors [7]. This paper considers the intelligent decision strategy of a load serving entity (LSE) in a day-ahead retail market. The demand-side features can be fully learned by analyzing historical data, which is highly beneficial for a fast convergence procedure [8].

Convergence acceleration is an important issue for all distributed applications in smart grids. This paper focuses on this topic: we intend to design an efficient update rule for the Lagrange multipliers when applying the dual decomposition framework [9]. Mainstream studies approach this issue from two directions [10,11]: improving the search direction, and selecting a proper iterative step size. Unlike the remarkable achievements in search direction design, there are fewer studies on step size selection rules
☆ This work was supported in part by the National Natural Science Foundation of China (No. 51777102, No. U1766212) and the Beijing Natural Science Foundation (No. 3182017).
⁎ Corresponding author.
E-mail addresses: [email protected] (G. Ruan), [email protected] (H. Zhong), [email protected] (J. Wang), [email protected] (Q. Xia), [email protected] (C. Kang).
https://doi.org/10.1016/j.apenergy.2020.114636 Received 5 December 2019; Received in revised form 2 February 2020; Accepted 8 February 2020 0306-2619/ © 2020 Elsevier Ltd. All rights reserved.
[12,13]. Step sizes in distributed optimization are usually determined by simple rules or heuristics, which lack theoretical guidance and are sometimes unsatisfactory [8]. Reference [13] studied five traditional and commonly used step size rules; the results showed that none could guarantee both efficiency and stability. An adaptive (dynamic) step size rule was discussed in [14], where the authors designed an extended Polyak step size, but it relied on an accurate prediction of the final optimal solution. Reference [15] proposed a greedy iterative algorithm with an at-most-twice optimality bound, but its heuristic step size rule was not general. A Lagrange multiplier optimal selection algorithm based on sensitivity analysis was proposed in [8], but its linear expression for the users' responsive reaction was oversimplified. A similar idea was adopted in [16], where the responsive reactions were modeled by an exponential function with some prior assumptions. Reference [17] adjusted the step sizes by the slope of historical P–D curves, again relying on a simplified linear expression. In [18], the control signal was updated by an improved consensus method, but the step size selection problem in the iteration formula remained. Other studies with different perspectives, e.g., game theory [19,20], consensus theory [21], preventing peak rebounds [22], and minimizing social cost [23], generally applied a subgradient algorithm with fixed step sizes. Compared with the above research, intelligent methods with flexible formulations are likely to perform better. For example, reference [7] estimated the price response features by a data-driven method, but its distributed iterations relied on a subgradient algorithm with a fixed step size; reference [24] applied a gravitational search algorithm, but the required information is unavailable in a distributed application.
Machine learning models, e.g., neural networks [25], are widely applied because of their strong nonlinear fitting ability. A prediction-based framework with several machine learning models was proposed in [26], but its convergence procedure could sometimes be slow. In [27], a neural network was directly applied to predict a clearing price without any guarantee of optimality. Similar tasks were completed for price-responsive load modeling [28,29] and short-term load forecasting considering demand response factors [30]. Combining analytical and intelligent methods has shown the potential to retain the advantages of both. The early success of the Hopfield network motivated several studies [31,32], but this network has since been abandoned because its training procedure is too time-consuming. Other advanced neural networks have appeared in recent research, e.g., multi-layer perceptrons [33], self-organizing networks [34], chaotic neural networks [35] and radial basis function networks [36]. Reference [37] embedded a neural network in hybrid system modeling; however, it was a preliminary attempt restricted to centralized optimization. In [38], a neural network and a genetic algorithm were utilized together to solve a multi-objective optimization without any optimality guarantee. In demand response, reference [39] implemented reinforcement learning models to decide dynamic prices for coordination among customers, but these models needed a large amount of training data and were thus inapplicable to real-world applications. Reference [40] adopted a neural network to predict the optimal ON/OFF status of home appliances. In [41], a neural network was applied to identify the effective period to participate in demand response programs. An optimal time-of-use pricing scheme was proposed in [42], where the combined model of a neural network and an optimization was hard to solve reliably.
In summary, neural networks have been successfully applied in various forecasting tasks, but their application in optimization (especially distributed optimization) remains difficult and needs improvement. According to the above literature review, the combination of analytical and neural network models is a promising way to find a flexible, general and effective solution for distributed demand response. Within this paradigm, how to efficiently and reliably adopt advanced machine learning techniques to improve a distributed algorithm is still challenging. This paper bridges this knowledge gap by combining the advantages of neural networks and conventional distributed methods.

In brief, this paper applies a neural network to improve the decision efficiency of an LSE by making full use of the available data. We observe an obvious convergence acceleration in the distributed optimization with this neural network embedded. Note that this idea is completely different from those in existing research. It also has broad application prospects in smart grids for increasing computational efficiency and thus system performance. The proposed framework and method are general and can easily be extended to other applications with minor modifications. The major contributions of this paper are listed below.

1. A neural-network-based Lagrange multiplier selection model (NN-LMS model) is formulated to accelerate the distributed demand response procedure. This model can estimate a proper step size by analyzing historical data. Extensive numerical studies validate its practical value and its efficiency in reducing iterations and avoiding oscillation. The proposed method can also be readily extended to other smart grid applications.
2. The NN-LMS model involves a nonlinear and nonsmooth programming problem with complex data-driven components. To solve this model reliably, a two-stage NN-LMS algorithm is developed, with an interval bisection method and an improved sensitivity method applied in the first and second stages, respectively.
3. An in-depth analysis of the price response features is presented to reveal their nonlinear and temporally coupled nature. Some special design measures, including data selection, batch normalization and an additional network, are proposed to capture these features.

The rest of this paper is organized as follows: Section 2 proposes the system framework. Section 3 analyzes the price response forecast and the special neural network designs.
Then, in Section 4, the technical details of the NN-LMS model and algorithm are proposed. Case studies are presented in Section 5. Section 6 concludes this paper.

2. Framework

As shown in Fig. 1, this paper considers the interactions between a single load serving entity (LSE) and multiple users in a day-ahead retail market. Each user optimizes his/her power demand independently, while the LSE delivers the price signals to coordinate among all of them. The whole convergence can benefit from a better decision of the LSE with the proposed technical highlights marked yellow in Fig. 1.

2.1. Problem formulation

Consider the following optimization (1) that maximizes the total welfare subject to several operation constraints and a system capacity limitation:

$$\max_{p_{it}} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( U_{it} - \lambda_t^{\mathrm{DAP}} p_{it} \right) \tag{1a}$$

$$\text{s.t. } \{p_{it}, \forall t\} \in \Omega_i \quad \forall i \tag{1b}$$

$$\sum_{i=1}^{N} p_{it} \leqslant CAP_t \quad \forall t \tag{1c}$$

where $p_{it}$ is the power demand of user $i$ in period $t$, $\lambda_t^{\mathrm{DAP}}$ is the day-ahead price, $U_{it}$ is the utility function [43] with respect to $p_{it}$, $\Omega_i$ denotes the operation constraint set of user $i$, which is regarded as personal privacy, and $CAP_t$ is the system capacity of the distribution network. The above model (1) is a general form of various demand response resources that have different formulations of $U_{it}$ and $\Omega_i$. Note that the subsequent analyses do not rely on specific forms of the operation
Fig. 1. Proposed system framework of the distributed demand response problem. This framework contains an interaction procedure between one LSE and multiple users, where the LSE updates and delivers price signals and users optimize their power demand independently. The iteration will terminate upon reaching a consensus. The highlighted parts in the red and bronze dotted boxes show the key improvements.
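The interaction loop in Fig. 1 can be sketched in a few lines of code. The quadratic utility, the parameter values and the fixed step size below are illustrative assumptions, not the paper's models; only the structure follows the text: each user best-responds to the current total price, and the LSE raises the congestion price (floored at the day-ahead price) while total demand exceeds capacity.

```python
# Minimal sketch of the LSE-user interaction for one period, under toy assumptions:
# each user has a quadratic utility U_i(p) = a*p - 0.5*b*p^2 (hypothetical, not
# from the paper), so the best response to a total price lam is in closed form.

def user_best_response(a, b, lam, p_max):
    """User-side step: maximize U_i(p) - lam*p subject to 0 <= p <= p_max."""
    return min(max((a - lam) / b, 0.0), p_max)

def dual_iteration(users, cap, lam_dap, alpha=0.05, max_iter=500, eps=1e-6):
    """LSE-side step: raise the total price while aggregate demand exceeds capacity."""
    lam = lam_dap
    for _ in range(max_iter):
        total = sum(user_best_response(a, b, lam, p_max) for a, b, p_max in users)
        # Subgradient-style price update, floored at the day-ahead price
        lam_new = max(lam + alpha * (total - cap), lam_dap)
        if abs(lam_new - lam) < eps:
            break
        lam = lam_new
    return lam, total

users = [(10.0, 1.0, 8.0), (12.0, 2.0, 8.0), (8.0, 1.0, 8.0)]  # (a, b, p_max), toy data
cap, lam_dap = 10.0, 2.0
lam, total = dual_iteration(users, cap, lam_dap)
```

With a small fixed step size this loop converges, but slowly; the paper's point is precisely that the choice of the step size governs how many such interaction rounds are needed.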
model, but to guarantee the optimality, we focus on the convex versions, e.g. [8,23,19,20,44].

Define a Lagrange function as follows by introducing the congestion price $\lambda_t^{\mathrm{CP}}$ [45] and the total energy price $\lambda_t = \lambda_t^{\mathrm{CP}} + \lambda_t^{\mathrm{DAP}}$:

$$\mathcal{L} = -\sum_{i=1}^{N} \sum_{t=1}^{T} U_{it} + \sum_{t=1}^{T} \lambda_t \left( \sum_{i=1}^{N} p_{it} - CAP_t \right) + H(p_{it}) + \Lambda \tag{2}$$

where $H(p_{it})$ represents the terms that relate to constraints (1b), and $\Lambda = \sum_t \lambda_t^{\mathrm{DAP}} CAP_t$ is a constant. The dual decomposition procedure can minimize the Lagrange function in a distributed manner with two steps:

• Users update their power demand: set the price as a known constant, and determine the power demand for the next iteration by an individual optimization model, i.e., maximize their own welfare subject to constraints (1b).
• The LSE updates the price: set the power demand as a fixed value, and update the price with the following formula in iteration k:

$$\lambda_t(k+1) = \max\{\lambda_t(k) - \alpha(k) \cdot g_t(k),\ \lambda_t^{\mathrm{DAP}}\} \tag{3}$$

where $\alpha(k)$ is the step size in iteration $k$ and $g_t(k)$ is the search direction, e.g. a subgradient. The main problem of existing distributed algorithms is the poor choice of the step size $\alpha(k)$ in (3), which may greatly slow down the convergence. The proposed method intends to improve the step size selection efficiency, while remaining compatible with many direction-improving methods. Numerical explanations are provided later in the case study.

2.2. Proposed framework and highlights

The proposed system framework in Fig. 1 is distinguished by the intelligent decision procedure of the LSE, which improves a particular step of the whole distributed optimization. This framework can help significantly reduce the iterations and avoid oscillation. Our basic idea is to increase the LSE's knowledge of the users' responsive behaviors with historical datasets. We mark the special parts in the red and bronze dotted boxes of Fig. 1, and provide some detailed interpretations below.

2.2.1. Historical data
A large amount of historical data has become a potential resource. This paper considers the price response data, i.e., the iterative prices delivered by the LSE and the users' responsive power demand. These data are useful in describing the demand-side features, which are defined as the price response features in Section 3.

2.2.2. Neural network with special designs
A neural network is utilized to learn from the historical data in Section 3. Beyond the general measures to increase prediction accuracy, this paper further proposes some special designs.

2.2.3. NN-LMS model and algorithm
The NN-LMS model is executed by the LSE to decide an improved update rule for the iterative prices. This model searches for the optimal step size in each iteration of (3). See Section 4 for more technical details.

3. Price response forecast via a neural network

This section introduces the price response feature and the forecast strategy, and then discusses the neural network with special designs. The content in this section is fundamental to understanding the demand-side features from an LSE's perspective.

3.1. Price response feature

The price response feature describes the reaction of price-incentivized users, who adjust their power demand when the external prices change. This feature can help understand the responsive behavior of the demand side. In this section, a more complex feature across multiple scheduling periods is considered, which can be defined by the following mapping:

$$[\lambda_1, \ldots, \lambda_T] \mapsto [p_1, \ldots, p_T] \tag{4}$$

where $p_t = \sum_i p_{it}$ is the total power demand. A typical illustration of such a price response feature is shown in Fig. 2. There are two important properties of this price response feature:

• Nonlinearity. The rate of change in Fig. 2 is not constant, so the linear estimations of some well-known concepts, such as sensitivity [8] and elasticity [46], are inconsistent with this property.
• Temporally coupled influence. The feature should capture the cross-impact between different periods. Sensitivity [8], however, is often applied to a single period only.

Fig. 2. Price response feature of the total power demand in period 12 (11 AM to 12 PM). The curves on the top show the detailed responsive features, and the bar graphs below show the associated rate of change (reciprocal of the slope). Two situations and two methods are considered. Case (a) is generated by scanning the price in period 12, while case (b) scans the prices of periods 8–16. The real price response is calculated by individual optimizations of the users, and the estimation results in grey are based on the sensitivity analysis from [8]. The nonlinear curves show the complexity of the responsive features, and the differences between the two cases verify the existence of the temporally coupled influence.

3.2. Price response forecast

A forecast strategy, namely the price response forecast, is designed for the LSE to make better decisions by estimating the price response feature. It enables the LSE to consider the responsive reactions of the users when making a decision. In the update procedure (3), this forecast leads to quite a different result because the demand-side model changes from a fixed state $p_t$ to a flexible one $p_t(\lambda_t)$.

Technically speaking, the LSE adopts a neural network to find a mapping $f: \mathbb{R}_+^T \to \mathbb{R}_+^T$ that fits (4) with high precision. We use $p_t(\lambda_t)$ to denote this function $f$. Note that the neural network is proven to be a universal approximator with a powerful ability to fit nonlinear and coupled relationships, and is thus suitable for modeling the price response feature.

3.3. Special designs of the neural network

Fully connected neural networks are well established and widely applied in many tasks. Based on this structure, this paper further proposes some special designs whose main purpose is to make the neural network specifically efficient for the proposed problem. Fig. 3 shows a schematic of the special designs.

3.3.1. Training and test data selection
Data quality is an important factor affecting the performance of a neural network. The predictive ability in a local area is particularly important for the successive predictions within a convergence iteration. With this knowledge, highly relevant data are carefully chosen for the training and test procedures to increase the prediction accuracy. Instead of training a large network with the entire dataset, the recommended action is to build a small-scale network with selected data. A typical comparison is provided in the case study to illustrate this statement. The k-nearest neighbor search technique is applied to pick out the most relevant data. This algorithm calculates the distances between all price pairs and finds the k prices closest to the current one. These data are then split into training and test sets proportionally.

3.3.2. Batch normalization
A poor data distribution has a significant impact on the learning quality of a neural network. Batch normalization [47] is an efficient technique that improves the performance and stability of a neural network by adjustment and scaling processes. It can automate the adjustment procedure and outperform hand-designed methods based on sensitivity analysis [48]. This technique is utilized in all layers to increase the prediction accuracy.

3.3.3. Additional network
Although small prediction errors are acceptable, large errors on a few samples may severely mislead the decision. To avoid large errors, an additional network is proposed with the structure shown in Fig. 3. Within this structure, a tolerance rate is set to 5%, and any prediction error beyond the tolerance bound is penalized with a large number M. In practice, if a neural network can hardly avoid large prediction errors, its loss function value will become very large. The recommended action in this situation is to expand the network scale.

4. Neural-network-based Lagrange multiplier selection

This section focuses on the update procedure of the Lagrange multipliers. Our derivations show that the LSE can estimate the Karush-Kuhn-Tucker (KKT) conditions by making a price response forecast. The NN-LMS model for optimal multiplier selection is then formulated, and its solution method, the two-stage NN-LMS algorithm, is discussed in detail.

4.1. Estimation of the KKT conditions

Given that all users make their optimal decisions, some of the KKT conditions of optimization (1) are automatically satisfied, i.e., $\partial \mathcal{L}/\partial p_{it} = 0$. We next discuss the remaining partial derivative:
$$\frac{d\mathcal{L}}{d\lambda_t} = -\sum_{i=1}^{N} \frac{dU_{it}}{dp_{it}} \cdot \frac{dp_{it}}{d\lambda_t} + \sum_{i=1}^{N} \lambda_t \cdot \frac{dp_{it}}{d\lambda_t} + \sum_{i=1}^{N} p_{it} - CAP_t = \sum_{i=1}^{N} \left( \lambda_t - \frac{dU_{it}}{dp_{it}} \right) \cdot \frac{dp_{it}}{d\lambda_t} + \sum_{i=1}^{N} p_{it} - CAP_t \tag{5}$$
The first part is proven to be zero [49], so we obtain

$$\frac{d\mathcal{L}}{d\lambda_t} = \sum_{i=1}^{N} p_{it} - CAP_t \approx p_t(\lambda_t) - CAP_t \tag{6}$$

Eq. (6) is an estimation of the KKT conditions: all the KKT conditions are satisfied if and only if (6) equals zero, so (6) can be regarded as a compact form of multiple KKT conditions. Eq. (6) also shows that the LSE can apply this estimation without any detailed parameters of the users; the only necessary resource is the ability to make a price response forecast.

From a physical perspective, Eq. (6) can be interpreted as an excess power demand, i.e., the part of the total power demand above the system capacity. During the interaction between the LSE and the users, the excess power demand iteratively decreases and finally disappears. This reduction in the excess power demand can therefore be regarded as a physical outcome of the convergence.

Fig. 3. Special design schematic of a neural network with the batch normalization and additional network. Batch normalization is applied to improve the performance and stability, and the additional network provides extra penalty terms to bound the prediction errors. Here, $e_t^+$ and $e_t^-$ are the prediction errors beyond the upper and lower tolerance bounds respectively; at least one of them is zero by definition. $\varepsilon$ is the tolerance rate. $M$ is a large penalty coefficient. ReLU is a commonly used activation function, with $\mathrm{ReLU}(x) = \max\{x, 0\}$.

4.2. NN-LMS model

The basic idea for accelerating the convergence is based on a deep understanding of the iteration procedure. Revisiting formula (3), a dynamic update rule for $\alpha(k)$ can be carefully designed with guidance from (6). As analyzed previously, a rapid reduction in the excess power demand contributes to a fast convergence. A novel model, namely the NN-LMS model, is thus proposed as follows:

$$\min_{\alpha(k)} \sum_{t \in \mathbb{T}} \left[ p_t(\lambda_t(k+1)) - CAP_t \right] \tag{7a}$$

$$\text{s.t. } \lambda_t(k+1) = \max\{\lambda_t(k) - \alpha(k) \cdot g_t(k),\ \lambda_t^{\mathrm{DAP}}\} \quad \forall t \in \mathbb{T} \tag{7b}$$

$$\underline{\alpha}(k) \leqslant \alpha(k) \leqslant \overline{\alpha}(k) \tag{7c}$$

$$\mathbb{T} = \{t \mid p_t(\lambda_t(k)) > CAP_t\} \tag{7d}$$

where $\mathbb{T}$ is the set of congestion periods under the prices $\lambda_t(k)$, and $\underline{\alpha}(k)$ and $\overline{\alpha}(k)$ are the lower and upper bounds of the step size in iteration $k$. The objective function is the sum of excess power demand, which is used as a measure of the distance from the optimal solution. A typical selection procedure can be found in Fig. 4. Note that optimization (7) is a nonsmooth and nonlinear programming problem with a complex data-driven component $p_t(\lambda_t(k+1))$. A derivative-free solution method is of interest because the derivative of the objective function is unavailable.

4.3. Two-stage NN-LMS algorithm

A two-stage, derivative-free algorithm, namely the two-stage NN-LMS algorithm, is proposed to solve optimization (7) efficiently. The interval bisection method is applied first, followed by the improved sensitivity method.

4.3.1. Stage 1. Interval bisection method
Let us focus on the changing trend of the objective function (7a). In the beginning, as the step size increases, the excess power demand declines because of power demand shifting. When the step size is large and becoming larger, the excess power demand increases again because the total consumption is already below the capacity limit. The analysis of the middle part is complex, especially when prediction errors are considered; our practice, however, shows that the fluctuation can be smooth. The interval bisection method is suitable for these features, and it can efficiently output a suitable choice within several loops. The pseudocode is shown in Algorithm 1.

Algorithm 1. Interval bisection method

4.3.2. Stage 2. Improved sensitivity method
The improved sensitivity method is inspired by [8], but the sensitivity is now updated by analyzing the iterative information. This stage is motivated by the fact that the stage-1 method may show poor performance after a few iterations; the improved sensitivity method remains robust and efficient in this situation. To estimate the sensitivity properly, we should account for the influence of power demand shifting. To do this, a period for the sensitivity prediction is selected by fully assessing the price and power demand changes of each period. The pseudocode is shown in Algorithm 2.

Algorithm 2. Improved sensitivity method

Note that the improved sensitivity method (stage 2) relies on the iterative information from the interval bisection method (stage 1). This procedure can capture the convergence feature of stage 1 and maintain its fast convergence speed.

4.3.3. Two-stage optimization procedure
The overall procedure is executed as follows. The stage-1 method runs for a few iterations until the transition criterion is met; the algorithm then enters stage 2, with the iteration series of stage 1 as one of its inputs. The transition criterion between the two stages is designed by judging the local monotonicity of the sum of excess power demand: when the first S (e.g., S = 4) points increase monotonically (they should decrease if no prediction error exists), the algorithm transitions to the next stage.

Fig. 5 shows the overall flowchart of the NN-LMS algorithm. The algorithm terminates in iteration k when the multiplier difference between two consecutive iterations is smaller than a given tolerance $\epsilon$:

$$\|\lambda_t(k+1) - \lambda_t(k)\|_2 < \epsilon \tag{8}$$

Let us now discuss the convergence. The step sizes chosen in both stages are bounded by $\underline{\alpha}(k)$ and $\overline{\alpha}(k)$, so when these bounds form nonsummable diminishing series, the squeeze theorem guarantees that the chosen step sizes $\alpha(k)$ also follow the nonsummable diminishing rule. With this property, the NN-LMS algorithm is guaranteed to converge to the optimal value [10].
5. Case study

5.1. General setup

5.1.1. Test systems
Three test systems are considered in this case study. The first test system applies the operation model and data from [8] with a slight modification. Another set of parameters from [23] is adopted for the second test system. Furthermore, a real-world microgrid in Houston, Texas is formulated as the third system in the last subsection. These three test systems can represent and cover various distributed demand response applications.

The hourly day-ahead prices from the Pennsylvania-New Jersey-Maryland (PJM) market are applied [50] for the first and second test systems. The historical data are simulated with the prices from Oct. 1st, 2017 to Oct. 29th, 2018, so the whole dataset contains 27400 pairs of data $(\lambda_t, p_t)$. The demand response applications are implemented on Oct. 30th and Nov. 15th, 2018. Similarly, the last test system uses the day-ahead prices from the Electric Reliability Council of Texas (ERCOT) market [51]. The applications are implemented on Dec. 20th, 2017, and the past one-year historical dataset is constructed for training.

5.1.2. Solution methods
• Subgradient algorithm with a fixed step size. The step size parameter is scanned over a typical range [10], finally giving [SG-fix] $\alpha(k) = 0.03$.
• Subgradient algorithm with a diminishing step size. Two commonly used forms from [10] are considered, i.e., [SG-dmsh1] $\alpha(k) = 0.2k^{-0.5}$ and [SG-dmsh2] $\alpha(k) = 0.5k^{-1}$.
• LMOS algorithm from [8]. The sensitivity estimation is assumed to be accurate.
• Nesterov's momentum algorithm from [52] with the same step size as above: [NM] $\alpha(k) = 0.03$.
• Alternating direction method of multipliers algorithm from [53] with the same step size as above: [ADMM] $\alpha(k) = 0.03$.
• NN-LMS algorithm in this paper (Fig. 5) with the following parameters: $\underline{\alpha}(k) = 0.01k^{-0.1}$, $\overline{\alpha}(k) = 0.4k^{-0.1}$, and tol = 0.01. When different search directions are considered, various versions can be formulated as NN-LMS(SG), NN-LMS(NM), and NN-LMS(ADMM).

Fig. 4. Typical graph of the sum of excess power demand and the impact of different step sizes on the total power demand. Three step sizes are selected for illustration. Subfigure (a) shows the sum of excess power demand with respect to different step sizes; the optimal step size is selected by finding the lowest point. Subfigure (b) shows the total power demand for the different step sizes; the optimal step size performs best in reducing the excess power demand without a severe rebound.
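The stage-1 idea behind NN-LMS, shrinking a step-size interval to minimize the forecast excess power demand, can be sketched as follows. The quadratic surrogate `excess_demand` and its minimizer 0.18 are hypothetical stand-ins for the neural-network forecast; the paper's Algorithm 1 is not reproduced here, and a ternary-style interval shrink is used under the assumption (stated in Section 4.3.1) that the objective is roughly unimodal in the step size.

```python
# Sketch: derivative-free search over [alpha_lo, alpha_hi] for the step size that
# minimizes the (forecast) sum of excess power demand, objective (7a).

def excess_demand(alpha, alpha_star=0.18):
    """Toy unimodal surrogate for the forecast objective, minimized at alpha_star
    (a made-up value standing in for the neural-network prediction)."""
    return (alpha - alpha_star) ** 2 + 0.5

def bisect_step_size(f, lo, hi, tol=1e-4):
    """Shrink [lo, hi] by comparing f at two interior points (ternary-style)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2   # minimizer cannot lie in (m2, hi]
        else:
            lo = m1   # minimizer cannot lie in [lo, m1)
    return 0.5 * (lo + hi)

alpha = bisect_step_size(excess_demand, 0.01, 0.4)
```

Each loop discards one third of the interval, so a few dozen forecast evaluations suffice; in NN-LMS each evaluation is one forward pass of the trained network rather than an actual interaction round with the users.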
Fig. 5. Flowchart of the NN-LMS algorithm. The left and middle parts show the internal decision procedure of the LSE, while the right part shows the external interaction with all users. Within this flowchart, a neural network is trained offline and applied online.
Fig. 6. Optimal solution of the central optimization and the NN-LMS(SG) algorithm. The operation model is from [8]. (a) shows the central optimization result. (b) shows the iteration procedure of the NN-LMS(SG) algorithm, illustrating the iterative elimination of the excess power demand. The identical final convergence results verify the accuracy of the NN-LMS(SG) algorithm.
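The excess power demand tracked in Fig. 6(b) is simply the capacity violation summed over periods; a minimal sketch with made-up hourly numbers:

```python
# Excess power demand: the portion of total demand above capacity, summed over
# all periods (zero once the iteration reaches a feasible state).

def excess_power_demand(p_total, cap):
    """Sum of per-period demand above capacity; the measure minimized in (7a)."""
    return sum(max(p - c, 0.0) for p, c in zip(p_total, cap))

p_total = [95.0, 110.0, 120.0, 98.0]   # hypothetical hourly totals
cap     = [100.0, 100.0, 100.0, 100.0]  # hypothetical capacity limits
print(excess_power_demand(p_total, cap))  # → 30.0
```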
Above algorithm parameters are properly scaled according to the system sizes if needed.
settings from [23] in the second test system, and assess the methods with different search directions. Three groups of methods are considered in Fig. 8, they are SG-fix and NN-LMS(SG), NM and NN-LMS (NM), ADMM and NN-LMS(ADMM). Simulation results indicate that the NN-LMS algorithms outperform the origin ones in all cases, and save 45.7% iteration steps for average. This comparison truly verify that NNLMS, as an algorithm for step size selection, can combine with various direction-improving methods for further convergence benefits.
5.2. Analysis of the convergence performance Consider the first test system on Oct. 30th, 2018, and an operation model in [8] is adopted. The optimal total welfare is 21738.52$ according to the central optimization (1). Our simulations verify that all the distributed methods reach this optimum. Fig. 6(a) shows the optimal power demand and prices. In this figure, the users reduce their demand in period 7 and 8 according to the total prices. The congestion prices reflect the shortage of system capacity and they will increase only when the congestion happens. Fig. 6(b) shows the reduction in excess power demand and a demand shifting phenomena. One can observe that the result in iteration 8 is fairly similar to the optimal state, showing a fast convergence in the early stage. Detailed analyses are shown in Fig. 7. NN-LMS(SG) has fewer iteration steps (23 iterations, 63% reduction on average), a faster convergence speed and smoother iterations without an oscillation. This advantage is significant especially in the first few iterations, because the estimated step sizes successfully avoid a few conservative iterations. Fig. 7(a) also illustrates the transition process between two stages, and the stage 2 method approximately maintains a same speed as the previous stage. Consider a different day-ahead price on Nov. 15th, 2018 for the case scanning. The case scales vary from 100 to 1000 [8], and the convergence results are shown in Table 1. NN-LMS(SG) still maintains the best performance with an average iteration reduction of 74.7%. Note that NN-LMS(SG) has also shown a good scalability characteristic, because the neural network scale is unrelated to the case scale—the estimated formula (4) only depends on the number of time slots. In order to test the generality of proposed algorithm, we apply other
5.3. Analysis of the neural network configurations

Table 2 compares several configurations of the network structures and activation functions. In Table 2, the loss function values before and after the slash verify the effectiveness of batch normalization: remarkable improvements are observed in most cases, especially when ReLU is used as the activation function. The network structure also affects the final performance, e.g., 24-24-24 is the optimal structure when using ReLU. Moreover, the ReLU function shows clear advantages in terms of a low loss value and a short training time. Based on these analyses, the best configuration is a 24-24-24 network with batch normalization and the ReLU function.

Fig. 9 compares the optimal step size selection results in the first test system under different network scales and prediction errors. Fig. 9(a) shows the step sizes selected by the different networks, and the small-scale network performs much better, with fewer iteration steps. Here, the configuration of the small-scale network is the one discussed above, and the large-scale network is a 24-96-96-24 network with batch normalization and ReLU. Additionally, the large-scale network is trained on the whole data set, while the small-scale network uses the k-nearest neighbor technique to select 1770 relevant samples for its training and test procedures.
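To make the preferred configuration concrete, the sketch below builds a fully connected network with batch normalization and ReLU in plain NumPy. It is a structural sketch only: the weights are random, the learnable batch-norm scale/shift parameters are omitted, and the reading of "24-24-24" as layer widths (24 inputs, one hidden layer of 24, 24 outputs) is our assumption based on the 24-96-96-24 notation above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    # He initialization, a common choice for ReLU layers
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def mlp(widths):
    # widths like [24, 24, 24]: input, hidden..., output layer sizes
    return [dense(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]

def batch_norm(z, eps=1e-5):
    # Training-mode batch normalization (learnable gamma/beta omitted)
    mu, var = z.mean(axis=0), z.var(axis=0)
    return (z - mu) / np.sqrt(var + eps)

def forward(x, net):
    for i, (w, bias) in enumerate(net):
        z = x @ w + bias
        if i < len(net) - 1:
            x = np.maximum(batch_norm(z), 0.0)  # hidden layers: BN then ReLU
        else:
            x = z                               # linear output layer
    return x

net = mlp([24, 24, 24])                 # the 24-24-24 structure from Table 2
batch = rng.normal(size=(32, 24))       # a mini-batch of 32 price profiles
out = forward(batch, net)               # 32 predicted 24-slot responses
```

Because the input and output widths are fixed at the number of time slots, this structure is independent of the number of users, which is the scalability property noted in Section 5.2.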
Applied Energy 264 (2020) 114636
G. Ruan, et al.
Fig. 8. Comparison of different methods and their improved versions with the NN-LMS algorithm. The improved versions outperform the original ones in all cases, showing that the NN-LMS algorithm can be combined with other direction-improving methods to obtain further convergence benefits.
5.4. Analysis of a real-world application

The last test system is analyzed in detail to verify the numerical stability and potential value of the proposed method in a real-world application. Specifically, we carry out a series of simulations on a microgrid whose parameters come from real-world experiments and surveys in Houston, Texas. This microgrid contains 16 commercial buildings with different typical hourly load profiles from the Office of Energy Efficiency and Renewable Energy (EERE) [54]. Each building owner has a minimum daily consumption plan, given by its typical load profile, but can modify at most 20% of the consumption in each hour for feasibility. The typical monthly purchase plans [55] and monthly electricity consumption data [56] in Houston are collected to derive the utility functions by quadratic regression [8]. The system operates on Dec. 20th, 2017, and the day-ahead price is obtained from ERCOT [51] as mentioned before. With a system capacity of 5 MW, this microgrid needs to manage the congestion occurring from 13:00 to 17:00.

Our major focus is the numerical stability of NN-LMS(SG) when its step size upper bound, lower bound and neural network structure vary across scenarios. We consider four network structures and ±30% fluctuations of the bounds (i.e., a large upper bound means 1.3 times the original one, and a small lower bound means 0.7 times). A comparison of the convergence performance under different settings is shown in Table 3.

Two observations can be made from Table 3. First, the upper bound appears to have a greater impact than the lower bound. A large upper bound and a small lower bound mean more flexibility in step size selection, and one can see that a large upper bound accelerates the convergence considerably. In this case, SG-fix is regarded as the benchmark, which converges in 82 iterations. All the settings, even those with a small upper bound or a large lower bound, converge faster than the benchmark and achieve a 74.9% iteration reduction on average. This is evidence that the proposed NN-LMS method robustly maintains good performance under different configurations. Second, not surprisingly, a better neural network leads to faster convergence. We train several neural networks with different prediction accuracies, and all of them perform well in the test. Even the worst case in Table 3 converges in 43 iterations, 48.2% fewer steps than the benchmark.
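The quadratic regression step for the utility functions can be sketched as an ordinary least-squares fit. The data below are synthetic stand-ins with hypothetical coefficients and noise, since the actual billing data come from [55,56]; the point is only the assumed shape of the fit, U(d) = a·d - 0.5·b·d².

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (consumption, benefit) pairs standing in for the Houston billing
# data; the assumed utility form is U(d) = a*d - 0.5*b*d^2 with b > 0 (concave).
d = rng.uniform(50.0, 150.0, size=24)        # monthly consumption samples
a_true, b_true = 80.0, 0.3                   # hypothetical coefficients
u = a_true * d - 0.5 * b_true * d**2 + rng.normal(0.0, 20.0, size=24)

# Ordinary least squares on the two basis functions d and -0.5*d^2
X = np.column_stack([d, -0.5 * d**2])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, u, rcond=None)
print(a_hat, b_hat)   # close to the coefficients that generated the data
```

With a fitted concave quadratic, each user's price response has the closed form used throughout the distributed iterations, which is why this regression step matters for the overall method.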
The above observations and results provide strong evidence that the proposed framework and method can support a reliable and efficient real-world implementation. With proper configurations, the NN-LMS method converges in around 10–20 iterations, which imposes only a very low computational and communication burden.
Table 1. Scan results of the 100–1000 users cases.
Table 2. Comparison between different neural network settings.
Fig. 7. Comparison of the iteration procedures of the discussed methods. (a) shows the convergence speed of the different methods; the NN-LMS algorithm converges in only 23 iterations, the fewest of all. (b) shows the first 23 iterations of the total power demand in period 5; the NN-LMS algorithm is superior in terms of fast convergence and a smooth iteration procedure.
Figs. 9(b) and (c) exhibit the curves of the estimated sum of excess power demand at different iterations. In iteration 6, as Fig. 9(b) shows, both networks provide high prediction precision, and the prediction error has only a limited impact in some areas. The situation in iteration 10, however, is quite different: in Fig. 9(c), both networks fail to capture the monotonicity feature. The transition criteria are then satisfied, and stage 2 is executed until convergence. By means of this two-stage procedure, the influence of a large prediction error can be properly managed.
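The transition logic can be sketched as follows. This is a simplified stand-in: the paper's actual criteria and stage-2 improved sensitivity method are richer than a monotonicity check plus plain interval bisection, and the function names here are hypothetical.

```python
import numpy as np

def choose_step(surrogate, true_excess, lo, hi, n_grid=50, tol=1e-8):
    """Pick an iterative step size in [lo, hi].
    surrogate: neural-network estimate of excess demand after a candidate step.
    true_excess: the actual (expensive) excess-demand response, queried in stage 2."""
    grid = np.linspace(lo, hi, n_grid)
    pred = np.array([surrogate(s) for s in grid])
    diffs = np.diff(pred)
    monotone = np.all(diffs <= 0) or np.all(diffs >= 0)
    if monotone and pred[0] * pred[-1] < 0:
        # Stage 1: trust the network and take the step whose predicted
        # excess demand is closest to zero.
        return grid[np.argmin(np.abs(pred))], "stage1"
    # Stage 2: the prediction is unreliable, so bisect on the true response.
    a, b = lo, hi
    while b - a > tol:
        m = 0.5 * (a + b)
        if true_excess(a) * true_excess(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b), "stage2"

f = lambda s: 4.0 - 8.0 * s        # true excess demand, root at s = 0.5
good = lambda s: 3.8 - 7.8 * s     # accurate monotone prediction
bad = lambda s: np.sin(12.0 * s)   # non-monotone prediction (large error)

s1, mode1 = choose_step(good, f, 0.0, 1.0)   # accepted by stage 1
s2, mode2 = choose_step(bad, f, 0.0, 1.0)    # rejected, falls back to stage 2
```

The accurate surrogate is accepted and yields a step near the true root cheaply, while the non-monotone one triggers the fallback, mirroring the iteration-6 versus iteration-10 behavior in Fig. 9.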
Table 1 (iteration steps for the 100–1000 users cases):

Case   SG-fix  SG-dmsh1  SG-dmsh2  LMOS  NN-LMS(SG)
100    136     185       161       63    25
250    164     177       196       61    32
400    171     178       215       66    33
550    169     172       187       60    35
700    170     188       194       61    34
850    177     168       183       65    28
1000   183     175       186       71    30

Note: these cases are generated by a random generation technique, as applied in [8].

Table 2 (loss function values for different neural network settings):

Structure  ReLU         Sigmoid        Tanh
24-12-24   96.7/2457.4  1887.2/4924.7  3040.1/7472.2
24-24-24   9.3/1107.9   2144.1/3513.1  4022.3/7596.9
24-36-24   14.1/996.1   1395.2/3291.5  6983.4/7668.1
24-48-24   10.8/961.2   2594.9/4542.6  6474.5/7484.7

Note: the numbers before and after the slash are from the neural network with and without batch normalization, respectively.
Fig. 9. Step size selection results with different neural network scales and prediction errors. Subfigure (a) shows the selected step sizes of the large- and small-scale networks. Subfigures (b) and (c) show the step size selection procedure at iterations 6 and 10, respectively. It can be observed from (b) that the small-scale network performs better in estimating the ideal curve, and its estimated optimal step size is close to the true optimum. In the later iteration, however, both neural networks fail to achieve a good estimation, and the NN-LMS algorithm enters stage 2. These transition points are also marked in (a).
Chongqing Kang: Supervision, Writing - review & editing.
Table 3. Numerical stability assessment for NN-LMS(SG) in a real-world application (iteration steps):

Structure  Large UB, Large LB  Large UB, Small LB  Small UB, Large LB  Small UB, Small LB
24-12-24   30                  23                  37                  43
24-24-24   11                  12                  20                  20
24-36-24   12                  15                  20                  21
24-48-24   12                  12                  21                  20

References
[1] Fang X, Misra S, Xue GL, Yang DJ. Smart grid - the new and improved power grid: a survey. IEEE Commun Surv Tutorials 2012;14(4):944–80. https://doi.org/10.1109/Surv.2011.101911.00087. [2] Wang Q, Zhang CY, Ding Y, Xydis G, Wang JH, Ostergaard J. Review of real-time electricity markets for integrating distributed energy resources and demand response. Appl Energy 2015;138:695–706. https://doi.org/10.1016/j.apenergy.2014.10.048. [3] Aghaei J, Alizadeh MI. Demand response in smart electricity grids equipped with renewable energy sources: a review. Renew Sustain Energy Rev 2013;18:64–72. https://doi.org/10.1016/j.rser.2012.09.019. [4] Vardakas JS, Zorba N, Verikoukis CV. Power demand control scenarios for smart grid applications with finite number of appliances. Appl Energy 2016;162:83–98. https://doi.org/10.1016/j.apenergy.2015.10.008. [5] Palensky P, Dietrich D. Demand side management: demand response, intelligent energy systems, and smart loads. IEEE Trans Industr Inf 2011;7(3):381–8. https://doi.org/10.1109/tii.2011.2158841. [6] O'Connell N, Pinson P, Madsen H, O'Malley M. Benefits and challenges of electrical demand response: a critical review. Renew Sustain Energy Rev 2014;39:686–99. https://doi.org/10.1016/j.rser.2014.07.098. [7] Lu T, Wang Z, Wang J, Ai Q, Wang C. A data-driven Stackelberg market strategy for demand response-enabled distribution systems. IEEE Trans Smart Grid 2018:1. https://doi.org/10.1109/tsg.2018.2795007. [8] Wang J, Zhong H, Lai X, Xia Q, Shu C, Kang C. Distributed real-time demand response based on Lagrangian multiplier optimal selection approach. Appl Energy 2017;190:949–59. https://doi.org/10.1016/j.apenergy.2016.12.147. [9] Nedic A, Ozdaglar A. Distributed subgradient methods for multi-agent optimization. IEEE Trans Autom Control 2009;54(1):48–61. https://doi.org/10.1109/Tac.2008.2009515. [10] Boyd S. Subgradient methods, web.stanford.edu/class/ee364b/lectures [Online]. Stanford University; 2014.
[11] Molzahn DK, Dorfler F, Sandberg H, Low SH, Chakrabarti S, Baldick R, et al. A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans Smart Grid 2017;8(6):2941–62. https://doi.org/10.1109/Tsg.2017.2720471. [12] Yuan Y-X. Step-sizes for the gradient method. AMS IP Studies in Advanced Mathematics. [13] Han M. Computational study of the step size parameter of the subgradient optimization method. Sweden: School of Technology and Business Studies, Dalarna University; 2013. [14] Yuan GL, Zhang MJ. A three-terms Polak-Ribiere-Polyak conjugate gradient algorithm for large-scale nonlinear equations. J Comput Appl Math 2015;286:186–95. https://doi.org/10.1016/j.cam.2015.03.014. [15] Chavali P, Yang P, Nehorai A. A distributed algorithm of appliance scheduling for home energy management system. IEEE Trans Smart Grid 2014;5(1):282–90. https://doi.org/10.1109/tsg.2013.2291003. [16] Wei W, Wang D, Jia H, Wang C, Zhang Y, Fan M. Hierarchical and distributed demand response control strategy for thermostatically controlled appliances in smart grid. J Modern Power Syst Clean Energy 2017;5(1):30–42. https://doi.org/10.1007/s40565-016-0255-y. [17] Liu F, Duan S, Liu F, Liu B, Kang Y. A variable step size INC MPPT method for PV systems. IEEE Trans Industr Electron 2008;55(7):2622–8. https://doi.org/10.1109/Tie.2008.920550. [18] Xiong YQ, Wang B, Chu CC, Gadh R. Vehicle grid integration for demand response with mixture user model and decentralized optimization. Appl Energy
Note: the table shows iteration steps. UB and LB denote the upper and lower bounds of the step size; large and small bounds are 1.3 and 0.7 times the original ones, respectively.
6. Conclusion

To coordinate the tremendous number of distributed resources in smart grid, efficient dispatch and operation frameworks are showing great potential nowadays. This paper proposes a novel framework for the convergence acceleration of distributed demand response. The neural-network-based Lagrange multiplier selection model and algorithm are then formulated, and the price response feature is carefully modeled by a neural network with special designs. Various case studies have validated the computational efficiency of the proposed method, and a real-world application in Houston also shows its practical value.

This paper provides a series of convincing examples that neural networks are not only powerful in prediction tasks, but also helpful for improving distributed optimization. This insight is beneficial for a wide range of smart grid applications. In a smart grid with rich data sources, the proposed framework and method can inspire other applications, e.g., distributed generation control and multi-area economic dispatch.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Guangchun Ruan: Methodology, Software, Data curation, Visualization, Writing - original draft, Writing - review & editing. Haiwang Zhong: Resources, Methodology, Writing - original draft, Writing - review & editing. Jianxiao Wang: Resources, Writing - original draft, Writing - review & editing. Qing Xia: Supervision, Writing - review & editing.
544–8. https://doi.org/10.1109/PCT.2007.4538375. [37] Baek S, Park J., Venayagamoorthy GK. Power system control with an embedded neural network in hybrid system modeling. In: Conference record of the 2006 IEEE industry applications conference forty-first IAS annual meeting, vol. 2; 2006. p. 650–7. doi: 10.1109/IAS.2006.256595. [38] Nabavi-Pelesaraei A, Shaker-Koohi S, Dehpour MB. Modeling and optimization of energy inputs and greenhouse gas emissions for eggplant production using artificial neural network and multi-objective genetic algorithm. Int J Adv Biol Biomed Res 2013;1(11):1478–89. [39] Lu RZ, Hong SH, Zhang XF. A dynamic pricing demand response algorithm for smart grid: reinforcement learning approach. Appl Energy 2018;220:220–30. https://doi.org/10.1016/j.apenergy.2018.03.072. [40] Ahmed MS, Mohamed A, Shareef H, Homod RZ, Ali JA. Artificial neural network based controller for home energy management considering demand response events. 2016 international conference on advances in electrical, electronic and systems engineering 2016. p. 506–9. https://doi.org/10.1109/ICAEES.2016. 7888097. [41] Kamruzzaman M, Benidris M, Commuri S. An artificial neural network based approach to electric demand response implementation. In: 2018 North American power symposium; 2018. p. 1–5. doi: 10.1109/NAPS.2018.8600595. [42] Holtschneider T, Erlich I. Optimization of electricity pricing considering neural network based model of consumers’ demand response. In: 2013 IEEE computational intelligence applications in smart grid; 2013. p. 154–60. doi: 10.1109/CIASG.2013. 6611512. [43] Houthakker HS. Revealed preference and the utility function. Economica 1950;17(66):159–74. https://doi.org/10.2307/2549382. [44] Mohsenian-Rad AH, Leon-Garcia A. Optimal residential load control with price prediction in real-time electricity pricing environments. IEEE Trans Smart Grid 2010;1(2):120–33. https://doi.org/10.1109/Tsg.2010.2055903. [45] Fan Z. 
A distributed demand response algorithm and its application to phev charging in smart grids. IEEE Trans Smart Grid 2012;3(3):1280–90. https://doi.org/10. 1109/tsg.2012.2185075. [46] Su CL, Kirschen D. Quantifying the effect of demand response on electricity markets. IEEE Trans Power Syst 2009;24(3):1199–207. https://doi.org/10.1109/ Tpwrs.2009.2023259. [47] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. [48] Sadeghzadeh A, Ostadkelayeh MY, Nabavi-Pelesaraei A. Modeling and sensitivity analysis of environmental impacts for eggplant production using artificial neural networks. Biol Forum 2015;7(1):375–81. [49] Fiacco AV, Ishizuka Y. Sensitivity and stability analysis for nonlinear programming. Ann Oper Res 1990;27(1):215–35. https://doi.org/10.1007/BF02055196. [50] PJM Data Miner 2. Day-ahead hourly lmps, Available: dataminer2; 2018. pjm.com/ feed/da_hrl_lmps [Online]. [51] Electric Reliability Council of Texas, Historical DAM load zone and hub prices, Available: www.ercot.com/mktinfo/prices [Online], 2017. [52] Sutskever I, Martens J, Dahl G, Hinton G. On the importance of initialization and momentum in deep learning. In: International conference on machine learning. [53] Stephen B, Neal P, Eric C, Borja P, Jonathan E. Distributed optimization and statistical learning via the alternating direction method of multipliers, now Publishers Inc.; 2011. p. 1. [54] Office of Energy Efficiency and Renewable Energy, Commercial and residential hourly load profiles for all TMY3 locations in the United States; 2014. Available: openei.org/datasets/files/961/pub [Online]. [55] Public Utility Commission of Texas, Monthly retail electric service bill comparison archive; 2017. Available: www.puc.texas.gov/industry/electric/rates/RESbill/ RESbillarc.aspx [Online]. 
[56] Public Utility Commission of Texas, Report cards on retail competition and summary of market share data; 2017. Available: www.puc.texas.gov/industry/electric/ reports/RptCard/Default.aspx [Online].
2018;231:481–93. https://doi.org/10.1016/j.apenergy.2018.09.139. [19] Mohsenian-Rad A-H, Wong VWS, Jatskevich J, Schober R, Leon-Garcia A. Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid. IEEE Trans Smart Grid 2010;1(3):320–31. https://doi.org/10.1109/tsg.2010.2089069. [20] Maharjan S, Zhu Q, Zhang Y, Gjessing S, Basar T. Dependable demand response management in the smart grid: a Stackelberg game approach. IEEE Trans Smart Grid 2013;4(1):120–32. https://doi.org/10.1109/tsg.2012.2223766. [21] Xu YL, Zhang W, Liu WX, Wang X, Ferrese F, Zang CZ, et al. Distributed subgradient-based coordination of multiple renewable generators in a microgrid. IEEE Trans Power Syst 2014;29(1):23–33. https://doi.org/10.1109/tpwrs.2013. 2281038. [22] Safdarian A, Fotuhi-Firuzabad M, Lehtonen M. A distributed algorithm for managing residential demand response in smart grids. IEEE Trans Industr Inf 2014;10(4):2385–93. https://doi.org/10.1109/tii.2014.2316639. [23] Deng RL, Xiao GX, Lu RX, Chen JM. Fast distributed demand response with spatially and temporally coupled constraints in smart grid. IEEE Trans Industr Inf 2015;11(6):1597–606. https://doi.org/10.1109/tii.2015.2408455. [24] Marzband M, Ghadimi M, Sumper A, Dominguez-Garcia JL. Experimental validation of a real-time energy management system using multi-period gravitational search algorithm for microgrids in islanded mode. Appl Energy 2014;128:164–74. https://doi.org/10.1016/j.apenergy.2014.04.056. [25] Bandbafha HH, Safarzadeh D, Ahmadi E, Nabavi-Pelesaraei A. Modeling output energy and greenhouse gas emissions of dairy farms using adaptive neural fuzzy inference system. Agric Commun 2016;4(2):14–23. [26] Pallonetto F, De Rosa M, Milano F, Finn DP. Demand response algorithms for smartgrid ready residential buildings using machine learning models. Appl Energy 2019;239:1265–82. https://doi.org/10.1016/j.apenergy.2019.02.020. [27] Gelazanskas L, Gamage KAA. 
Neural network based real-time pricing in demand side management for future smart grid. In: 7th IET international conference on power electronics, machines and drives. [28] Paterakis NG, Tascikaraoglu A, Erdinc O, Bakirtzis AG, Catalao JPS. Assessment of demand-response-driven load pattern elasticity using a combined approach for smart households. IEEE Trans Industr Inf 2016;12(4):1529–39. https://doi.org/10. 1109/Tii.2016.2585122. [29] Xu FY, Wang X, Lai LL, Lai CS. Agent-based modeling and neural network for residential customer demand response. In: 2013 IEEE international conference on systems, man, and cybernetics; 2013. p. 1312–6. doi: 10.1109/Smc.2013.227. [30] Liu ZJ, Xiao N, Wang X, Xu H. Elman neural network model for short term load forecasting based on improved demand response factor. In: 2017 7th international conference on power electronics systems and applications - smart mobility, power transfer and security; 2017. p. 1–5. doi: 10.1109/PESA.2017.8277754. [31] Schaller HN. Problem solving by global optimization: the rolling-stone neural network. In: Proceedings of 1993 international conference on neural networks, vol. 2, 1993, p. 1481–4. doi: 10.1109/IJCNN.1993.716825. [32] Sakurai K, Nishimura K, Hayashi H. A practical method based on structural neural networks to optimize power system operation. In: Proceedings of 1993 international conference on neural networks, vol. 1; 1993. p. 451–4. doi: 10.1109/IJCNN.1993. 713952. [33] Nabavi-Pelesaraei A, Abdi R, Rafiee S. Applying artificial neural networks and multi-objective genetic algorithm to modeling and optimization of energy inputs and greenhouse gas emissions for peanut production. Int J Biosci 2014;4(7):170–83. [34] Wu M, Rastgoufard P. Optimum decision by artificial neural networks for reactive power control equipment to enhance power system stability and security performance. In: IEEE power engineering society general meeting, vol. 2; 2004. p. 2120–5 doi: 10.1109/PES.2004.1373257. [35] Xue W. 
Chaotic artificial neural network in reactive power optimization of distribution network. CICED 2010 proceedings. 2010. p. 1–4. [36] Paucar BC, Ortiz JLR, Collazos KSL, Leite LC, Pinto JOP. Power operation optimization of photovoltaic stand alone system with variable loads using fuzzy voltage estimator and neural network controller. 2007 IEEE Lausanne Power Tech 2007. p.