A distributed stochastic gradient algorithm for economic dispatch over directed network with communication delays

Electrical Power and Energy Systems 110 (2019) 759–771

Hao Zhang a, Huaqing Li a,⁎, Yifan Zhu a, Zheng Wang a, Dawen Xia b

a Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, PR China
b College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, PR China

ARTICLE INFO

Keywords: Economic dispatch problem (EDP); Distributed optimization algorithm; Stochastic gradient descent; Communication delays; Uncoordinated step-sizes; Directed network

ABSTRACT

The economic dispatch problem (EDP) is one of the fundamental optimization problems in power systems; it involves a coupling linear constraint and several individual box constraints. In this paper, we propose a distributed stochastic gradient descent algorithm based on consensus theory to solve the EDP over a directed network, where the convex cost function of each generator only needs to be strictly convex with Lipschitz continuous gradient. The proposed algorithm uses stochastic gradient descent to update the values of the generators in order to deal with the noise incurred during gradient estimation, and the step-sizes are uncoordinated. Under the strict convexity assumption on the objective functions, the algorithm seeks the exact optimal solution with probability one at the rate of $O\left(\frac{\ln K}{K}\right)$, where $K$ is the number of iterations. Furthermore, the algorithm remains suitable and effective for networks with communication delays, provided the communication delays are bounded. Simulation results illustrate the effectiveness of the algorithm.

1. Introduction

1.1. Literature review

The EDP is one of the fundamental problems of energy management in power systems, and in essence it can be cast as an optimization problem. The objective of the EDP is to schedule the power generation so as to minimize the total cost of all generators while meeting the actual load demand and satisfying the individual capacity constraints [1]. This problem is traditionally solved in a centralized manner, and there exist many basic and classical algorithms [1–4] that adopt centralized methods to work out the EDP, mainly because centralized methods are simple to implement. All of these centralized algorithms require a powerful control center to access the information of the whole network and to process large amounts of data. Therefore, centralized control approaches suffer from high communication requirements and cost. In addition, the robustness of the network is compromised, one main cause being the single point of failure. The communication and control networks of future power systems will accordingly be inseparably integrated [5,6], and this trend requires the designed algorithms to be scalable and robust. In order to overcome these limitations and to accommodate various resources when new loads and generators are installed, it is reasonable to develop distributed algorithms for future power systems.

A particular advantage of distributed algorithms is that no substantial computational burden is imposed on any single node, which makes them more practical and robust than their centralized counterparts. Motivated by these favorable aspects of inexpensive computation, network robustness, and practical applicability, a variety of distributed algorithms have been adopted in power-system applications [7–14] over the last few years, largely driven by the rise of research on distributed control and multi-agent systems [15–25]. In particular, some of these works take into account the impact of the random noise that often occurs during the communication process among the nodes of the network. Considering the effect of random noise on the (sub-)gradients of the nodes, a distributed stochastic (sub-)gradient projection algorithm is developed in [24], and it is proved that the algorithm converges to an optimal solution with probability one. The work in [25] proposes a distributed (sub-)gradient algorithm with a random sleep scheme, which also takes stochastic (sub-)gradients into consideration. Its theoretical analysis indicates that the estimates of the nodes reach consensus almost surely and in mean, and that the consensus point is the optimal solution with probability one. With regard to the EDP, solution methodologies have shifted over the past



Corresponding author. E-mail address: [email protected] (H. Li).

https://doi.org/10.1016/j.ijepes.2019.03.024 Received 25 September 2018; Received in revised form 23 January 2019; Accepted 11 March 2019 0142-0615/ © 2019 Elsevier Ltd. All rights reserved.


few years from centralized methods [3,26] to distributed ones [27,28]. There has been a surge of interest in developing consensus-based distributed algorithms, in which the generation incremental cost is chosen as the consensus variable. In these algorithms each generator holds several variables, and these variables are updated through information exchange with its neighboring generators only. For instance, the leader–follower consensus-based algorithm proposed in [29] first uses the leader to collect the deviation between demand and supply, and the marginal cost of the system is updated afterwards. The results in [6,28,29] indicate that the convergence rate of these algorithms is affected by the chosen leader; hence, it is necessary to design methods for electing the leader. In order to solve the EDP without the need to select a leader, a consensus-and-innovation framework is introduced in [28]; the innovation term is the mismatch between demand and supply, and the incremental cost of each generator is chosen as the consensus variable. To achieve consensus on the incremental costs, distributed algorithms based on consensus theory are proposed in [30,31]. These algorithms, however, can only be applied to undirected communication networks; that is, the information flow must be transmitted bidirectionally among generators. Because the communication cost of an undirected network exceeds that of its directed counterpart, researchers have become interested in developing control and coordination algorithms over more general directed networks, which is favorable for future electrical power systems. To solve the EDP over a directed network, a distributed algorithm is proposed in [32], in which the estimation of the mismatch is computed by, and shared among, all the generators. In [33], the push-sum consensus idea is adopted to solve the EDP over a fixed directed network. The minimum-time consensus-based algorithm proposed in [34] solves the EDP in fewer iterations. All of these works [30–34] consider the case of quadratic cost functions, whereas the cost functions for the EDP in [35,36] are relaxed to more general convex functions. Most of the existing literature considers the case in which the communication links of the network are time-invariant, and the case in which communication links may suffer communication delays is not taken into account. However, the communication of each generator is often unreliable in practice, and communication between any pair of generators in a large-scale power system can no longer be guaranteed. For a given network, the information passed over communication links may fail to reach the target generators in time because of communication delays. Therefore, investigating the influence of communication delays on existing distributed EDP algorithms is desirable, and it is also reasonable to develop robust algorithms that overcome the negative impact of this situation in the future smart grid. The authors of [37] study the performance of a two-level consensus algorithm under communication delays and find that the algorithm cannot converge to the optimal solution because of the delays. In [38], the influence of communication delays on the algorithm presented in [32] is investigated; it is found that communication delays may cause a slower convergence rate, an incorrect convergence point, and even divergence of the algorithm.

1.2. Motivations

The recent literature [39–42] is the most relevant to our work. Xi et al. [39] propose a distributed gradient descent algorithm, based on the gradient descent method in [43], for distributed optimization over a directed network. Compared with most of the existing literature, the algorithm uses both the out-degree and the in-degree information of every agent at each time step during the information exchange among agents, and theoretical analysis indicates that the optimal solution can be reached. A stochastic (sub-)gradient-push algorithm is presented in [40] for distributed optimization over time-varying directed networks, where the noisy (sub-)gradients of the individual objective functions can be accessed by the agents themselves. Although this algorithm adopts stochastic (sub-)gradients, it still achieves a fast convergence rate under different assumptions on the objective functions. However, it only uses the out-degree information of every agent at each time step; the in-degree information of each agent is not taken into consideration. To address the distributed average consensus problem over a directed network with communication delays, Nedic et al. [41] investigate the convergence rate of consensus with delays. They establish the convergence rate to consensus under a bounded-delay condition and provide a bound on the communication delays that is necessary to achieve consensus. However, the works in [39–41] only solve unconstrained optimization problems, and the result in [41] is only applicable to distributed consensus problems with communication delays. In addition, Yang et al. [42] propose a novel distributed algorithm to solve the EDP, which arouses our attention; however, that algorithm only uses the out-degree information of every agent at each time step, rather than both the out-degree and the in-degree information of the agents.

1.3. Contributions

Motivated by the primal-dual method for constrained optimization problems [44] together with the recent works [39–42], a novel distributed optimization algorithm is proposed in this paper to solve the EDP over a directed network. Compared with the existing literature on the EDP, the main contributions of this paper can be summarized in the following three aspects:

(1) We propose a distributed primal-dual algorithm to solve constrained optimization problems over a directed network. The proposed algorithm adopts the stochastic gradient descent method to update the values of the nodes in order to deal with the noise that is often incurred during gradient estimation. If the strict convexity assumption on the objective functions is satisfied, theoretical analysis indicates that the algorithm exactly seeks the optimal solution to the EDP with probability one. The algorithm is quite different from those in [39–42]: the results in [39–41] are only applicable to distributed optimization problems without constraints; the algorithms developed in [40,41] must use the out-degree information to update the values of the nodes; and instead of using both the out-degree and the in-degree information of every node at each iteration, the algorithm in [42] uses only the out-degree information to solve a class of constrained optimization problems.

(2) The proposed algorithm allows uncoordinated step-sizes for the local optimization of each node, which also guarantees a fast convergence rate to the optimal solution in contrast to most existing literature. Furthermore, uncoordinated step-sizes are more significant in practical applications than a coordinated step-size. In [40–42], the step-sizes adopted by the generators are coordinated.

(3) To demonstrate the robustness and scalability of the proposed algorithm, we extend it to networks with communication delays. If the delays on the communication links are bounded, it is proved that the algorithm still exactly reaches the optimal solution of the EDP with probability one. The effectiveness of the algorithm for the EDP in power systems is demonstrated through simulations.

1.4. Organization

The rest of this paper is structured as follows. In Section 2, we introduce the notation and the communication model of the network. A formal statement of the optimization problem is given in Section 3. The convergence and convergence-rate results of the proposed algorithm are presented in Section 4. In Section 5, the robustness of the algorithm to communication delays is studied. Numerical examples on the EDP are provided to verify the effectiveness of the algorithm in Section 6. Finally, conclusions and discussions of future extensions are drawn in Section 7.

2. Preliminaries

For the convenience of later statements, this section introduces the notation used throughout the paper and some preliminary results on the communication network.

2.1. Notations

We use $\mathbb{Z}_+$ to denote the set of positive integers, $\mathbb{R}$ the set of real numbers, $\mathbb{R}^N$ the set of $N$-dimensional column vectors, and $\mathbb{R}^{N\times N}$ the set of $N\times N$ real matrices. For a matrix $A$, $[A]_{ij}$ denotes its $(i,j)$-th entry. Let $\mathbf{1}_N$ denote the column vector whose elements are all equal to one, and $I_N$ the $N\times N$ identity matrix. The standard Euclidean norm of a vector $x$ is denoted by $\|x\|$, and the spectral norm of a matrix $A$ by $\|A\|$. For two vectors $x$ and $y$, their inner product is denoted by $\langle x,y\rangle$. Given a convex function $f(\cdot):\mathbb{R}^n\to\mathbb{R}$, its conjugate is defined as $f^*(y)=\sup_{x\in\mathbb{R}^n}\{y^Tx-f(x)\}$.

2.2. Communication network

A directed network is denoted by $\mathcal{G}=\{\mathcal{V},\mathcal{E},W\}$, which consists of a node set $\mathcal{V}=\{1,2,\dots,N\}$, an edge set $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$, and a weight matrix $W=[[W]_{ij}]\in\mathbb{R}^{N\times N}$. In particular, $[W]_{ij}>0$ if there exists a directed edge from node $j$ to node $i$, and $[W]_{ij}=0$ otherwise. A directed edge $(j,i)\in\mathcal{E}$ implies that node $i$ can directly receive information from node $j$. We use $\mathcal{N}_i^+=\{j\mid(j,i)\in\mathcal{E}\}$ to denote the in-neighbor set of node $i$, that is, the set of nodes that can directly send information to node $i$. Similarly, the out-neighbor set of node $i$ is defined as $\mathcal{N}_i^-=\{j\mid(i,j)\in\mathcal{E}\}$. The network $\mathcal{G}$ is strongly connected if and only if, for any two distinct nodes $i,j\in\mathcal{V}$, there exists a directed path from node $i$ to node $j$. The network is balanced when $\sum_{j=1}^N[W]_{ij}=\sum_{j=1}^N[W]_{ji}$ for all $i\in\mathcal{V}$, and unbalanced otherwise. Furthermore, the matrix $W$ is row-stochastic if $\sum_{j=1}^N[W]_{ij}=1$ for all $i\in\mathcal{V}$, and column-stochastic if $\sum_{i=1}^N[W]_{ij}=1$ for all $j\in\mathcal{V}$.

3. Problem formulation

In this section, the mathematical formulation of the EDP is presented, together with the necessary assumptions.

3.1. Formulation of EDP

The EDP in power systems can be formulated as the following optimization problem:

$$\min_{x\in\mathbb{R}^{Nn}}\ f(x)=\sum_{i=1}^{N} f_i(x_i)\quad\text{s.t.}\quad \sum_{i=1}^{N} x_i = M,\qquad x_i^{\min}\le x_i\le x_i^{\max},\quad i\in\{1,\dots,N\},\qquad (1)$$

where $x=(x_1^T,x_2^T,\dots,x_N^T)^T$, and the variable $x_i\in\mathbb{R}^n$ and the private convex cost function $f_i(\cdot):\mathbb{R}^n\to\mathbb{R}$ are held by node $i$. The set $\mathcal{C}=\{x\in\mathbb{R}^{Nn}\mid\sum_{i=1}^N x_i=M\}$ represents the coupling constraint, and $X_i=\{x_i\in\mathbb{R}^n\mid x_i^{\min}\le x_i\le x_i^{\max}\}$ the box constraint of node $i$. Moreover, let $X$ represent the Cartesian product of the sets $X_i$, i.e., $X=\prod_{i=1}^N X_i$. We use $x^\star$ to denote the optimal solution of problem (1) and $X\cap\mathcal{C}$ its feasible set.

Remark 1. In model (1), $f_i(x_i)$ and $x_i$ describe the private convex cost function and the generated power of generator $i$, respectively. For the whole network, the total cost $f(x)$ is the sum of the costs of all generators, i.e., $f(x)=\sum_{i=1}^N f_i(x_i)$. The total power generation of all generators must satisfy the entire load of the network, which is a finite positive constant $M$, i.e., $\sum_{i=1}^N x_i=M$. Considering the actual situation, this assumption is reasonable, since each generator $i$ produces a limited amount of power, restricted to the interval $x_i^{\min}\le x_i\le x_i^{\max}$.

3.2. Necessary assumptions

Assumption 1. The directed network $\mathcal{G}$ is strongly connected.

Remark 2. Assumption 1 is a mild condition on the connectivity of the communication topology of a general fixed directed network, and it is commonly used to describe information exchange. If there exists a path in each direction between every pair of nodes of the directed network, the network is strongly connected, which ensures that every node can always obtain information, directly or indirectly, from any other node. For more details, see the related literature [39,45].

Assumption 2. For each $i\in\{1,\dots,N\}$, the function $f_i$ is strictly convex, continuously differentiable, and has a Lipschitz continuous gradient:

$$\|\nabla f_i(x)-\nabla f_i(y)\|\le L_i\|x-y\|,\qquad \forall x,y\in\mathbb{R}^n,$$

where the constant $L_i\in(0,+\infty)$.

Remark 3. The convexity of $f_i$ guarantees that any local minimum is also a global minimum. The Lipschitz continuity condition is the central condition of the Picard–Lindelöf theorem in the theory of differential equations; for an initial value problem, the existence and uniqueness of its solution are guaranteed by this condition. In the analysis of optimization algorithms, Assumption 2 is standard and mild, see e.g. [39,40].

Remark 4. For the sake of simplicity, we assume that each generator $i$ has a one-dimensional decision variable, i.e., $x_i\in\mathbb{R}$ ($n=1$). The obtained results can easily be extended to the high-dimensional case by means of the Kronecker product.

4. Main results

In this section, we prove that the proposed distributed primal-dual algorithm converges to an optimal point of the optimization problem (1) over a directed network.

4.1. Conversion of the optimization problem

The Lagrangian function $\mathcal{L}:X\times\mathbb{R}\to\mathbb{R}$ of optimization problem (1) can be defined as

$$\mathcal{L}(x,\lambda)=\sum_{i=1}^N f_i(x_i)-\lambda\Big(\sum_{i=1}^N x_i-M\Big),\qquad (2)$$

where $\lambda\in\mathbb{R}$ denotes the Lagrangian multiplier corresponding to the coupling constraint in problem (1). Given a convex function $f(x)$, $x\in\mathbb{R}$, its conjugate is $f^*(y)=\sup_x\{y^Tx-f(x)\}$. The dual function $\Phi(\cdot):\mathbb{R}\to\mathbb{R}$ of problem (1) is defined by

$$\Phi(\lambda)=\min_{x\in X}\Big\{\sum_{i=1}^N f_i(x_i)-\lambda\Big(\sum_{i=1}^N x_i-M\Big)\Big\}
=\lambda M-\sum_{i=1}^N\Big\{\sup_{x_i\in X_i}\big[\lambda x_i-f_i(x_i)\big]\Big\}
=\sum_{i=1}^N\big[-f_i^*(\lambda)\big]+\lambda M.\qquad (3)$$

Remark 5. If $f_i$ is strictly convex and closed, the conjugate $f_i^*$ of $f_i$ is also strictly convex and closed. See the related literature [46–48].

Definition 1. For a given Lagrangian function $\mathcal{L}:X\times\mathbb{R}\to\mathbb{R}$, the pair of optimal vectors $(x^\star,\lambda^\star)$ is a Lagrangian saddle point of $\mathcal{L}(x,\lambda)$ if the following inequalities hold:
$$\mathcal{L}(x^\star,\lambda)\le\mathcal{L}(x^\star,\lambda^\star)\le\mathcal{L}(x,\lambda^\star),\qquad \forall x\in X,\ \forall\lambda\in\mathbb{R},$$
see [49].

Moreover, the dual problem of (1) can be reformulated as

$$\max_{\lambda\in\mathbb{R}}\ \Phi(\lambda)=\sum_{i=1}^N\Phi_i(\lambda)+\lambda M,\qquad (4)$$

where

$$\Phi_i(\lambda)=\min_{x_i\in X_i}\big\{f_i(x_i)-\lambda x_i\big\}.\qquad (5)$$

According to Assumption 2, $\nabla f_i(\cdot)$ is continuous and strictly monotone, so its inverse function $\nabla f_i^{-1}$ is well defined. For any given $\lambda\in\mathbb{R}$, under Assumption 2 there exists a unique minimizer $x_i$ of the right-hand side of (5), given by

$$x_i(\lambda)=\min\big\{\max\{\nabla f_i^{-1}(\lambda),\,x_i^{\min}\},\,x_i^{\max}\big\},\qquad i\in\mathcal{V}.\qquad (6)$$

Moreover, the Lagrangian dual problem (4) has at least one optimal solution, and the unique optimal solution to the primal problem (1) can be represented as

$$x_i^\star=x_i(\lambda^\star),\qquad i\in\mathcal{V},\qquad (7)$$

where $x^\star$ and $\lambda^\star$ denote the primal and the dual optimal solution, respectively.

We have already seen that $x_i(\lambda)$ is unique for any given $\lambda\in\mathbb{R}$. In addition, the dual function $\Phi(\lambda)$ is differentiable at $\lambda$, and its gradient is $-\big(\sum_{i=1}^N x_i(\lambda)-M\big)$. Using the gradient method, the dual problem (4) can be solved as follows:

$$\lambda(k+1)=\lambda(k)-\gamma(k)\Big(\sum_{i=1}^N x_i(\lambda(k))-M\Big),\qquad (8)$$

where $\gamma(k)$ is the step-size at the $k$-th iteration and the initial state is $\lambda(0)\in\mathbb{R}$. The update (8) depends on the global quantity $\sum_{i=1}^N x_i(\lambda)-M$, which requires centralized collection of the information of every generator; this mechanism prevents us from implementing the algorithm in a distributed manner. To overcome this defect, each generator must estimate the multiplier based only on local interaction with its neighbors. In order to apply a distributed algorithm to the dual problem (4), we first convert it into a minimization problem and then adopt the gradient descent method to solve it in a distributed way. The dual problem (4) is equivalent to the following optimization problem:

$$\min_{\lambda\in\mathbb{R}}\ \Psi(\lambda)=\sum_{i=1}^N\Psi_i(\lambda),\qquad (9)$$

where

$$\Psi_i(\lambda)=f_i^*(\lambda)-\lambda M_i,\qquad (10)$$

and $M_i$ represents the virtual local demand at each generator, so that $\sum_{i=1}^N M_i=M$. The gradient of $\Psi_i(\lambda)$ is $\nabla\Psi_i(\lambda)=x_i(\lambda)-M_i$, and it is uniformly bounded, i.e.,

$$|\nabla\Psi_i(\lambda)|=|x_i(\lambda)-M_i|\le\max_{i\in\mathcal{V}}x_i^{\max}+\max_{i\in\mathcal{V}}M_i\triangleq\eta.$$

In addition, by Remark 5, each $\Psi_i(\lambda)$ is a strictly convex function.

4.2. Distributed primal-dual stochastic gradient optimization algorithm

To address (9), we present a distributed primal-dual algorithm by introducing an additional surplus variable $y_i(k)$, $i\in\mathcal{V}$, for every generator at the $k$-th iteration ($k\ge1$). The proposed algorithm combines the primal-dual method [44] with the distributed gradient descent algorithm of [39].

In the algorithm, each generator $i$ holds three variables, $\lambda_i(k)$, $y_i(k)$ and $x_i(k)$. At the $k$-th iteration ($k\ge1$), each generator $j$ sends its state $\lambda_j(k)$ and a weighted surplus variable $[A_c]_{ij}y_j(k)$ to its out-neighbors $i\in\mathcal{N}_j^-$. After that, generator $i$ updates its variables $\lambda_i(k+1)$ and $y_i(k+1)$ with the information received from its in-neighbors $\mathcal{N}_i^+$. By relation (6), at the $k$-th iteration the primal solution $x_i(k)$ is recovered from the dual variable $\lambda_i(k)$, which usually represents the incremental cost of generator $i$. The update of the distributed primal-dual algorithm can then be summarized as follows:

$$\begin{aligned}
\lambda_i(k+1)&=\sum_{j=1}^N[A_r]_{ij}\lambda_j(k)+\epsilon\,y_i(k)-\alpha_i(k)\big(\nabla\Psi_i(\lambda_i(k))+\xi_i(k)\big),\\
y_i(k+1)&=\lambda_i(k)-\sum_{j=1}^N[A_r]_{ij}\lambda_j(k)+\sum_{j=1}^N[A_c]_{ij}y_j(k)-\epsilon\,y_i(k),\\
x_i(k+1)&=\min\big\{\max\{\nabla f_i^{-1}(\lambda_i(k+1)),\,x_i^{\min}\},\,x_i^{\max}\big\},
\end{aligned}\qquad (11)$$

where $\xi_i(k)$ is the stochastic noise incurred when generator $i$ evaluates its gradient. The constant $\epsilon\in(0,\bar\epsilon)$ plays an important role in the convergence of this algorithm, where $\bar\epsilon=\frac{(1-|\mu_3|)^{N}}{(20+8N)^{N}}$ [45] and $\mu_3$ is the third largest eigenvalue of the weight matrix $A$ obtained by setting $\epsilon=0$; the form of the weight matrix $A$ is given in (15), where $A_r=[[A_r]_{ij}]\in\mathbb{R}^{N\times N}$ and $A_c=[[A_c]_{ij}]\in\mathbb{R}^{N\times N}$ are the row-stochastic and column-stochastic weight matrices, respectively. We summarize (11) in Algorithm 1. Further, we use $\mathcal{F}_k$ to denote the $\sigma$-algebra generated by the entire history of Algorithm 1 up to time $k-1$, i.e., $\mathcal{F}_k=\{(\lambda_i^T(l),y_i^T(l))^T,\ i\in\mathcal{V},\ 1\le l\le k-1\}$, with $\mathcal{F}_0=\{(\lambda_i^T(0),y_i^T(0))^T,\ i\in\mathcal{V}\}$.

For $A_r$, the elements satisfy $[A_r]_{ij}>0$ if $j\in\mathcal{N}_i^+$, and $[A_r]_{ij}=0$ otherwise. For $A_c$, the elements satisfy $[A_c]_{ij}>0$ if $i\in\mathcal{N}_j^-$, and $[A_c]_{ij}=0$ otherwise. It is indeed possible to construct such weights, e.g., by choosing

$$[A_r]_{ij}=\begin{cases}1/|\mathcal{N}_i^+|, & j\in\mathcal{N}_i^+,\\ 0, & \text{otherwise},\end{cases}\qquad
[A_c]_{ij}=\begin{cases}1/|\mathcal{N}_j^-|, & i\in\mathcal{N}_j^-,\\ 0, & \text{otherwise}.\end{cases}\qquad (12)$$

Assumption 3. For each $i\in\{1,\dots,N\}$, let $\alpha_i(k)=\frac{\beta_i}{k+1}+a_i(k)$, where $\beta_i>0$ and $a_i(k)\le a(k)$ for $k\ge1$. Assume that

$$\sum_{k=0}^{\infty}a(k)<\infty.\qquad (13)$$

Remark 6. Assumption 3 is a diminishing step-size condition, and the uncoordinated step-sizes satisfy the persistence conditions for diminishing step-sizes, i.e., $\sum_{k=0}^{\infty}\alpha_i(k)=\infty$ and $\sum_{k=0}^{\infty}\alpha_i^2(k)<\infty$ for all $i\in\mathcal{V}$. Assumption 3 is standard and mild in the existing literature. For more details, see the related literature [39,43].
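To make the weight construction in (12) and the primal recovery step (6) concrete, the following minimal Python sketch builds the row- and column-stochastic weights for a small directed graph and recovers the generation level from an incremental cost for quadratic costs $f_i(x)=a_ix^2+b_ix$. The quadratic form is only used later in the simulations of Section 6, the example graph is arbitrary, and the self-loops are an implementation assumption (added so that the uniform weights are well defined); none of these choices are prescribed by the paper.

```python
import numpy as np

def build_weights(edges, N):
    """Uniform row-stochastic A_r (in-neighbor based) and column-stochastic
    A_c (out-neighbor based) in the spirit of Eq. (12); self-loops assumed."""
    in_nbrs = {i: {i} for i in range(N)}    # assumption: each node counts itself
    out_nbrs = {i: {i} for i in range(N)}
    for (j, i) in edges:                    # edge (j, i) means j -> i
        in_nbrs[i].add(j)
        out_nbrs[j].add(i)
    Ar = np.zeros((N, N))
    Ac = np.zeros((N, N))
    for i in range(N):
        for j in in_nbrs[i]:
            Ar[i, j] = 1.0 / len(in_nbrs[i])     # each row sums to one
    for j in range(N):
        for i in out_nbrs[j]:
            Ac[i, j] = 1.0 / len(out_nbrs[j])    # each column sums to one
    return Ar, Ac

def primal_recovery(lam, a, b, x_min, x_max):
    """Eq. (6) for f_i(x) = a x^2 + b x, where grad f_i^{-1}(lam) = (lam - b)/(2a)."""
    return np.clip((lam - b) / (2.0 * a), x_min, x_max)

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]  # assumed strongly connected digraph
    Ar, Ac = build_weights(edges, 5)
    print(np.allclose(Ar.sum(axis=1), 1.0), np.allclose(Ac.sum(axis=0), 1.0))  # True True
    print(primal_recovery(0.07, a=4e-4, b=0.03, x_min=20.0, x_max=130.0))      # 50.0
```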

Assumption 4. For each $i\in\{1,\dots,N\}$, noisy samples are generated only when node $i$ accesses its gradient at a given point $\lambda_i(k)$. The gradient noise is an independent random vector with zero mean and uniformly bounded mean square, that is,

$$\mathbb{E}\big[\xi_i(k)\mid\mathcal{F}_{k-1}\big]=0,\qquad \mathbb{E}\big[\|\xi_i(k)\|^2\mid\mathcal{F}_{k-1}\big]\le\nu^2,\qquad\forall k,$$

where $\nu>0$ is a deterministic constant.

Remark 7. In simulation-based optimization, the noise at each node is often incurred during gradient estimation. This is primarily due to the randomness of the noise: the stochastic gradient noise generated by each node propagates through the network to every other node and also across time, which makes the iterates statistically dependent across time and nodes. Various sources often incur such noise, for example modeling and discretization errors, incomplete convergence, and finite sample sizes in Monte-Carlo methods [40,50].

Algorithm 1. Distributed stochastic gradient algorithm for EDP over a directed network
1: Input: a given directed network $\mathcal{G}=\{\mathcal{V},\mathcal{E},W\}$, step-sizes $\alpha_i(k)$, and assigned initial values $\lambda_i(0)$, $y_i(0)$ and $x_i(0)$ for all $i\in\mathcal{V}$.
2: Output: the optimal incremental cost $\lambda^\star$ and the optimal generation $x_i^\star$.
3: Repeat
4: Set $k=1$.
5: for $i=1$ to $N$ do
6:  Execute the update rule (11).
7:  Return $\lambda_i(k+1)$ and $x_i(k+1)$.
8: end for
9: Set $k=k+1$.
10: Until a predefined stopping rule is satisfied.

Because the proposed algorithm (11) is based on consensus theory, for the convenience of the following analysis we first define

$$z_i(k)=\begin{cases}\lambda_i(k), & i\in\{1,\dots,N\},\\ y_{i-N}(k), & i\in\{N+1,\dots,2N\},\end{cases}\qquad
g_i(k)=\begin{cases}x_i(k)-M_i+\xi_i(k), & i\in\{1,\dots,N\},\\ 0, & i\in\{N+1,\dots,2N\},\end{cases}$$

and $\bar\alpha_i(k)=\alpha_i(k)$ for $i\in\{1,\dots,N\}$, $\bar\alpha_i(k)=0$ for $i\in\{N+1,\dots,2N\}$. The distributed primal-dual stochastic gradient optimization algorithm (11) can then be rewritten in the compact form

$$z_i(k+1)=\sum_{j=1}^{2N}[A]_{ij}z_j(k)-\bar\alpha_i(k)g_i(k),\qquad
x_i(k+1)=\min\big\{\max\{\nabla f_i^{-1}(\lambda_i(k+1)),\,x_i^{\min}\},\,x_i^{\max}\big\},\qquad (14)$$

where

$$A=\begin{bmatrix}A_r & \epsilon I_N\\ I_N-A_r & A_c-\epsilon I_N\end{bmatrix}.\qquad (15)$$

We also define

$$\bar z(k)=\frac{1}{N}\sum_{i=1}^{2N}z_i(k)=\frac{1}{N}\sum_{i=1}^{N}\lambda_i(k)+\frac{1}{N}\sum_{i=1}^{N}y_i(k).\qquad (16)$$

Remark 8. Most of the existing literature on distributed algorithms solves optimization problems under the assumption of an undirected network; that is, the weight matrix associated with the communication network needs to be doubly stochastic [27]. The row-stochasticity of the weight matrix ensures that all nodes reach consensus, while the column-stochasticity guarantees optimality. It is well known that the in-degree and out-degree information of each node of a directed network can be used to construct a row-stochastic and a column-stochastic weight matrix, respectively. If we do not want to add an extra step to eliminate the imbalance of the directed network caused by using only its row-stochastic or only its column-stochastic weight matrix, we should consider using both. In fact, by introducing the surplus variable $y_i(k)$ to record the state updates and using both the in-degree and out-degree information of the directed network, such an extra step is not needed in the proposed algorithm (11). At each iteration, the surplus variable $y_i(k)$ ensures that the weighted information sent out sums to one, which is guaranteed by the column-stochastic weight matrix $A_c$; the incremental cost $\lambda_i(k)$ ensures that the weighted information received sums to one, which is guaranteed by the row-stochastic weight matrix $A_r$. The algorithm can be classified as a surplus-based algorithm. For more details, see [39,45].

4.3. Supporting lemmas

To establish the main results, we introduce some supporting lemmas in this subsection. They are indispensable and crucial for the following convergence analysis.

Lemma 1 ([39,45]). Suppose Assumption 1 holds, and consider the weight matrix $A$ defined in (15) with $\epsilon\in(0,\bar\epsilon)$, where $\bar\epsilon=\frac{(1-|\mu_3|)^{N}}{(20+8N)^{N}}$ and $\mu_3$ is the third largest eigenvalue of $A$ obtained by setting $\epsilon=0$. Then, for all $i,j\in\{1,\dots,2N\}$, the entries $[A^k]_{ij}$ converge to their limits as $k\to\infty$ at a geometric rate; that is,

$$\Big\|A^k-\begin{bmatrix}\frac{\mathbf{1}_N\mathbf{1}_N^T}{N} & \frac{\mathbf{1}_N\mathbf{1}_N^T}{N}\\ 0_{N\times N} & 0_{N\times N}\end{bmatrix}\Big\|\le\Gamma\gamma^k,\qquad k\ge1,\qquad (17)$$

where the constants satisfy $\Gamma>0$ and $0<\gamma<1$.

Lemma 2 ([39]). For a given non-negative scalar sequence $\{\alpha(k)\}_{k\ge1}$ and a constant $\gamma\in(0,1)$, the following inequalities hold:

$$\sum_{k=1}^{K}\gamma^k\alpha(k)\le\frac{1}{2}\sum_{k=1}^{K}\alpha^2(k)+\frac{1}{2(1-\gamma^2)},\qquad
\sum_{k=1}^{K}\sum_{r=1}^{k-1}\gamma^{k-r}\alpha(r)\alpha(k)\le\frac{1}{1-\gamma}\sum_{k=1}^{K}\alpha^2(k).$$

Lemma 3. Suppose Assumptions 2–4 hold, and consider the sequence $\{z_i(k)\}_{k\ge1}$ generated by algorithm (14). There exist bounded constants $\Gamma>0$ and $0<\gamma<1$ such that, for all $k\ge1$, the following hold, where $\bar\beta=\max_{i\in\mathcal{V}}\beta_i$:

(i) For $i\in\{1,\dots,N\}$,
$$\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\big]\le\Gamma\gamma^{k}\sum_{j=1}^{2N}\|z_j(0)\|+N\Gamma(\eta+\nu)\sum_{r=1}^{k-1}\gamma^{k-r}\Big(a(r)+\frac{\bar\beta}{r}\Big)+2(\eta+\nu)\Big(a(k)+\frac{\bar\beta}{k}\Big);$$

(ii) For $i\in\{N+1,\dots,2N\}$,
$$\mathbb{E}\big[\|z_i(k)\|\big]\le\Gamma\gamma^{k}\sum_{j=1}^{2N}\|z_j(0)\|+N\Gamma(\eta+\nu)\sum_{r=1}^{k-1}\gamma^{k-r}\Big(a(r)+\frac{\bar\beta}{r}\Big)+2(\eta+\nu)\Big(a(k)+\frac{\bar\beta}{k}\Big).$$

Proof. For $k\ge1$, writing the first relation in (14) recursively gives

$$z_i(k)=\sum_{j=1}^{2N}[A^k]_{ij}z_j(0)-\sum_{r=1}^{k-1}\sum_{j=1}^{2N}[A^{k-r}]_{ij}\bar\alpha_j(r-1)g_j(r-1)-\bar\alpha_i(k-1)g_i(k-1).\qquad (18)$$

From the recursive relation of $z_i(k)$ in (18), $\bar z(k)$ can be expressed as

$$\bar z(k)=\frac{1}{N}\sum_{j=1}^{2N}z_j(0)-\frac{1}{N}\sum_{r=1}^{k-1}\sum_{j=1}^{2N}\bar\alpha_j(r-1)g_j(r-1)-\frac{1}{N}\sum_{j=1}^{2N}\bar\alpha_j(k-1)g_j(k-1).\qquad (19)$$

Subtracting (19) from (18), and taking norms and conditional expectations, for $i\in\{1,\dots,N\}$ we obtain

$$\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\mid\mathcal{F}_{k-1}\big]\le\sum_{j=1}^{2N}\Big|[A^k]_{ij}-\tfrac{1}{N}\Big|\|z_j(0)\|+\sum_{r=1}^{k-1}\sum_{j=1}^{2N}\Big|[A^{k-r}]_{ij}-\tfrac{1}{N}\Big|\bar\alpha_j(r-1)\mathbb{E}\big[\|g_j(r-1)\|\mid\mathcal{F}_{k-1}\big]+\Big(\bar\alpha_i(k-1)+\tfrac{1}{N}\sum_{j=1}^{2N}\bar\alpha_j(k-1)\Big)\max_j\mathbb{E}\big[\|g_j(k-1)\|\mid\mathcal{F}_{k-1}\big].\qquad (20)$$

Recalling that $\alpha_i(k)=\frac{\beta_i}{k+1}+a_i(k)$, it follows from Assumption 4 that

$$\mathbb{E}\big[\|g_i(k)\|\mid\mathcal{F}_{k-1}\big]\le\|\nabla\Psi_i(\lambda_i(k))\|+\mathbb{E}\big[\|\xi_i(k)\|\mid\mathcal{F}_{k-1}\big]\le\eta+\nu.\qquad (21)$$

The gradient $\nabla\Psi_i(\lambda)$ is bounded by the constant $\eta$, and the noise $\xi_i$ is bounded in mean square by Assumption 4. Invoking Lemma 1, substituting (21) into (20), and taking the total expectation, relation (20) simplifies to

$$\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\big]\le\Gamma\gamma^{k}\sum_{j=1}^{2N}\|z_j(0)\|+N\Gamma(\eta+\nu)\sum_{r=1}^{k-1}\gamma^{k-r}\Big(a(r)+\frac{\bar\beta}{r}\Big)+2(\eta+\nu)\Big(a(k)+\frac{\bar\beta}{k}\Big),\qquad (22)$$

which proves part (i). Similarly, taking norms and conditional expectations in (18), for $i\in\{N+1,\dots,2N\}$ it yields

$$\mathbb{E}\big[\|z_i(k)\|\mid\mathcal{F}_{k-1}\big]\le\sum_{j=1}^{2N}\big|[A^k]_{ij}\big|\|z_j(0)\|+\sum_{r=1}^{k-1}\sum_{j=1}^{2N}\big|[A^{k-r}]_{ij}\big|\bar\alpha_j(r-1)\mathbb{E}\big[\|g_j(r-1)\|\mid\mathcal{F}_{k-1}\big]+\bar\alpha_i(k-1)\mathbb{E}\big[\|g_i(k-1)\|\mid\mathcal{F}_{k-1}\big].\qquad (23)$$

Applying Lemma 1, substituting (21) into (23), recalling that the gradient $\nabla\Psi_i$ and the noise $\xi_i$ are uniformly bounded, and taking the total expectation, the result in part (ii) follows. □
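The compact form (14)–(15) is convenient for simulation. The following self-contained Python sketch runs the surplus-based update of Algorithm 1 on a small strongly connected digraph with noisy gradients. The quadratic costs, the example network, the virtual demands, the noise level and the step-size constants are assumptions made only for this illustration and are not the paper's case-study data; it is a minimal sketch rather than a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps, K = 5, 0.05, 20000
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]   # assumed digraph, (j, i) means j -> i
a = rng.uniform(2.25e-4, 7.65e-4, N)                       # quadratic costs f_i(x) = a_i x^2 + b_i x (assumed)
b = rng.uniform(0.0185, 0.0415, N)
x_min, x_max = rng.uniform(20, 40, N), rng.uniform(70, 130, N)
M = 300.0
Mi = np.full(N, M / N)                                     # virtual local demands, sum to M
beta = rng.uniform(9e-4, 1e-3, N)                          # uncoordinated step-size constants (Assumption 3)

# Row-stochastic A_r and column-stochastic A_c in the spirit of Eq. (12); self-loops assumed.
Ar, Ac = np.eye(N), np.eye(N)
for (j, i) in edges:
    Ar[i, j] = 1.0
    Ac[i, j] = 1.0
Ar /= Ar.sum(axis=1, keepdims=True)
Ac /= Ac.sum(axis=0, keepdims=True)

def x_of_lambda(lam):
    """Primal recovery, Eq. (6), for quadratic costs."""
    return np.clip((lam - b) / (2.0 * a), x_min, x_max)

lam = np.zeros(N)          # incremental costs lambda_i(k)
y = np.zeros(N)            # surplus variables y_i(k)
for k in range(1, K + 1):
    alpha = beta / (k + 1)                                   # diminishing, uncoordinated step-sizes
    grad = x_of_lambda(lam) - Mi + rng.normal(0.0, 1.0, N)   # noisy gradient of Psi_i (noise assumed Gaussian)
    lam_new = Ar @ lam + eps * y - alpha * grad              # first line of Eq. (11)
    y = lam - Ar @ lam + Ac @ y - eps * y                    # second line of Eq. (11)
    lam = lam_new

x = x_of_lambda(lam)
print("incremental costs:", np.round(lam, 5))   # should be roughly equal across nodes (consensus)
print("total generation vs demand:", round(x.sum(), 2), M)
```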

4.4. Convergence analysis

With the above lemmas, we are ready to present the main results, which consist of two theorems. In Theorem 1, we prove that the expectation $\mathbb{E}[z_i(k)]$ for $i\in\{1,\dots,N\}$ asymptotically converges to the accumulation point $\mathbb{E}[\bar z(k)]$ as $k\to\infty$. That the accumulation point $\mathbb{E}[\bar z(k)]$ converges to $\lambda^\star$ as $k\to\infty$ is proved in Theorem 2.

Theorem 1 (Convergence of $\mathbb{E}[z_i(k)]$ to $\mathbb{E}[\bar z(k)]$). Suppose Assumptions 1–4 hold, and let the sequence $\{z_i(k)\}_{k\ge1}$ be generated by algorithm (14). Then, for $k\ge1$:

(1) For $i\in\{1,\dots,N\}$, $\lim_{k\to\infty}\mathbb{E}[z_i(k)]=\lim_{k\to\infty}\mathbb{E}[\bar z(k)]$;
(2) For $i\in\{N+1,\dots,2N\}$, $\lim_{k\to\infty}\mathbb{E}[z_i(k)]=0$.

Proof. According to Lemma 3, for $i\in\{1,\dots,N\}$ we have

$$\sum_{k=1}^{K}\alpha_i(k)\,\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\big]\le\Gamma\sum_{j=1}^{2N}\|z_j(0)\|\sum_{k=1}^{K}\gamma^{k}\alpha_i(k)+N\Gamma(\eta+\nu)\sum_{k=1}^{K}\sum_{r=1}^{k-1}\gamma^{k-r}\alpha_i(k)\Big(a(r)+\frac{\bar\beta}{r}\Big)+2(\eta+\nu)\sum_{k=1}^{K}\alpha_i(k)\Big(a(k)+\frac{\bar\beta}{k}\Big).\qquad (24)$$

Applying Lemma 2 to (24) yields

$$\sum_{k=1}^{K}\alpha_i(k)\,\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\big]\le\frac{\Gamma\sum_{j=1}^{2N}\|z_j(0)\|}{2}\Big(\sum_{k=1}^{K}\alpha_i^2(k)+\frac{1}{1-\gamma^2}\Big)+\frac{N\Gamma(\eta+\nu)}{2(1-\gamma)}\Big(\sum_{k=1}^{K}\alpha_i^2(k)+\sum_{k=1}^{K}a^2(k)+\sum_{k=1}^{K}\Big(\frac{\bar\beta}{k}\Big)^2\Big)+2(\eta+\nu)\sum_{k=1}^{K}\alpha_i(k)\Big(a(k)+\frac{\bar\beta}{k}\Big).\qquad (25)$$

It can be deduced that $\sum_{k=1}^{K}\alpha_i(k)a(k)$, $\sum_{k=1}^{K}\alpha_i(k)\frac{\bar\beta}{k}$, $\sum_{k=1}^{K}\alpha_i^2(k)$, $\sum_{k=1}^{K}a^2(k)$ and $\sum_{k=1}^{K}(\bar\beta/k)^2$ remain finite as $K\to\infty$ by Assumption 3, which means that the right-hand side of (25) is bounded as $K\to\infty$. That is, for $i\in\{1,\dots,N\}$,

$$\sum_{k=1}^{\infty}\alpha_i(k)\,\mathbb{E}\big[\|z_i(k)-\bar z(k)\|\big]<\infty,\qquad (26)$$

which means that $\mathbb{E}[z_i(k)]$ asymptotically converges to $\mathbb{E}[\bar z(k)]$ as $k\to\infty$. The proof of part (2) follows the same spirit as that of part (1). Hence, using Lemma 2 and Lemma 3, for $i\in\{N+1,\dots,2N\}$ we have

$$\sum_{k=1}^{\infty}\alpha_i(k)\,\mathbb{E}\big[\|z_i(k)\|\big]<\infty,\qquad (27)$$

and relation (27) indicates that $\mathbb{E}[z_i(k)]$ asymptotically converges to zero as $k\to\infty$. □

Theorem 2 (Convergence of $\mathbb{E}[\bar z(k)]$ to $\lambda^\star$). Suppose Assumptions 1–4 hold, and let the sequence $\{z_i(k)\}_{k\ge1}$ be generated by algorithm (14). Then, for $k\ge1$,

$$\lim_{k\to\infty}\mathbb{E}[\bar z(k)]=\lambda^\star.$$

Proof. Since $A$ is a column-stochastic weight matrix, using the first relation in (14) we obtain

$$\bar z(k+1)=\bar z(k)-\frac{1}{N}\sum_{i=1}^{N}\alpha_i(k)\big(\nabla\Psi_i(z_i(k))+\xi_i(k)\big).\qquad (28)$$

Hence,

$$\|\bar z(k+1)-\lambda^\star\|^2=\|\bar z(k)-\lambda^\star\|^2+\Big\|\frac{1}{N}\sum_{i=1}^{N}\alpha_i(k)\big(\nabla\Psi_i(z_i(k))+\xi_i(k)\big)\Big\|^2-\frac{2}{N}\sum_{i=1}^{N}\alpha_i(k)\big\langle\nabla\Psi_i(z_i(k))+\xi_i(k),\,\bar z(k)-\lambda^\star\big\rangle.\qquad (29)$$

Since $\mathbb{E}[\xi_i(k)\mid\mathcal{F}_{k-1}]=0$ and $\|\nabla\Psi_i(\lambda)\|\le\eta$, by the convexity of $\Psi_i$ we obtain

$$\mathbb{E}\big[\langle\nabla\Psi_i(z_i(k))+\xi_i(k),\,\bar z(k)-\lambda^\star\rangle\mid\mathcal{F}_{k-1}\big]\ge\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\mid\mathcal{F}_{k-1}\big]-(2\eta+\nu)\,\mathbb{E}\big[\|\bar z(k)-z_i(k)\|\mid\mathcal{F}_{k-1}\big].\qquad (30)$$

Substituting (30) into the conditional expectation of (29), rearranging terms and taking the total expectation, it yields

$$\frac{2}{N}\sum_{i=1}^{N}\alpha_i(k)\,\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\big]\le\mathbb{E}\big[\|\bar z(k)-\lambda^\star\|^2\big]-\mathbb{E}\big[\|\bar z(k+1)-\lambda^\star\|^2\big]+\frac{1}{N^2}\sum_{i=1}^{N}\alpha_i^2(k)\,\mathbb{E}\big[\|\nabla\Psi_i(z_i(k))+\xi_i(k)\|^2\big]+\frac{2(2\eta+\nu)}{N}\sum_{i=1}^{N}\alpha_i(k)\,\mathbb{E}\big[\|\bar z(k)-z_i(k)\|\big].\qquad (31)$$

Recall from (21) that $\mathbb{E}[\|\nabla\Psi_i(z_i(k))+\xi_i(k)\|^2]\le(\eta+\nu)^2$. Let $b(k)=a(k)+\frac{\bar\beta}{k+1}$, so that $b(k)\ge\alpha_i(k)$ for all $i$ and $k\ge1$; according to Assumption 3, the sequence $b(k)$ satisfies $\sum_{k=0}^{\infty}b(k)=\infty$ and $\sum_{k=0}^{\infty}b^2(k)<\infty$. Invoking (21) in (31), we have

$$\frac{2}{N}\sum_{i=1}^{N}\alpha_i(k)\,\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\big]\le\mathbb{E}\big[\|\bar z(k)-\lambda^\star\|^2\big]-\mathbb{E}\big[\|\bar z(k+1)-\lambda^\star\|^2\big]+\frac{(\eta+\nu)^2}{N}b^2(k)+\frac{2(2\eta+\nu)}{N}\sum_{i=1}^{N}b(k)\,\mathbb{E}\big[\|\bar z(k)-z_i(k)\|\big].\qquad (32)$$

Summing (32) over $k$ from $1$ to $\infty$ and rearranging terms, we obtain

$$\frac{2}{N}\sum_{k=1}^{\infty}\sum_{i=1}^{N}\alpha_i(k)\,\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\big]\le\mathbb{E}\big[\|\bar z(1)-\lambda^\star\|^2\big]+\frac{(\eta+\nu)^2}{N}\sum_{k=1}^{\infty}b^2(k)+\frac{2(2\eta+\nu)}{N}\sum_{k=1}^{\infty}\sum_{i=1}^{N}b(k)\,\mathbb{E}\big[\|\bar z(k)-z_i(k)\|\big].\qquad (33)$$

According to Lemma 3, it can be deduced that the right-hand side of (33) is finite, which also indicates that $\mathbb{E}[\bar z(k)]$ asymptotically converges to $\lambda^\star$ as $k\to\infty$. □

Remark 9. Combining the results shown in Theorem 1 and Theorem 2, we have $\lim_{k\to\infty}\mathbb{E}[z_i(k)]=\lambda^\star$ for $i\in\{1,\dots,N\}$. In addition, when the incremental cost $\lambda_i$ achieves its optimal value $\lambda^\star$ with probability one, relation (7) indicates that the primal variable $x_i$ also reaches its optimal solution $x_i^\star$ with probability one. Thus, we conclude that the proposed algorithm (11) converges to its optimal solution with probability one; that is, the point $(\mathbb{E}[x(k)],\mathbb{E}[\lambda(k)])$ asymptotically converges to the saddle point $(x^\star,\lambda^\star)$ of problem (2) with probability one as $k\to\infty$.

4.5. Convergence rate

In this subsection, the convergence rate of the proposed algorithm (11) is established.

Proposition 1. Suppose Assumptions 1–4 hold. The proposed algorithm (11) converges to its optimal solution with probability one, and its global convergence rate is $O\big(\frac{\ln K}{K}\big)$, where $K$ is the number of iterations.

Proof. According to Theorem 2, define

$$\Delta_m=\inf_{1\le k\le K}\ \sum_{i=1}^{N}\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\big].$$

Since $\alpha_i(k)\le b(k)$ for all $i$ and $k\ge1$, we have

$$\Delta_m\sum_{k=1}^{K}b(k)\le\sum_{k=1}^{K}b(k)\sum_{i=1}^{N}\mathbb{E}\big[\Psi_i(\bar z(k))-\Psi_i(\lambda^\star)\big].\qquad (34)$$

Replacing $\alpha_i(k)$, $a(k)$ and $\bar\beta/k$ with $b(k)$, inequality (24) still holds; together with Lemma 2, this yields

$$\sum_{k=1}^{K}b(k)\sum_{i=1}^{N}\mathbb{E}\big[\|\bar z(k)-z_i(k)\|\big]\le\Big(2N\Gamma\sum_{j=1}^{2N}\|z_j(0)\|+\frac{2N^{2}\Gamma(\eta+\nu)}{1-\gamma}+4N(\eta+\nu)\Big)\sum_{k=1}^{K}b^2(k)+\frac{N\Gamma\sum_{j=1}^{2N}\|z_j(0)\|}{1-\gamma^2}.\qquad (35)$$

Combining (33)–(35), relation (34) can be rewritten in the following form:

$$\Delta_m\le\frac{C_1}{\sum_{k=1}^{K}b(k)}+C_2\,\frac{\sum_{k=1}^{K}b^2(k)}{\sum_{k=1}^{K}b(k)},\qquad (36)$$

where $C_1$ and $C_2$ are finite constants determined by the initial values $z_j(0)$, the distance $\mathbb{E}[\|\bar z(1)-\lambda^\star\|^2]$, the bounds $\eta$ and $\nu$, and the network constants $\Gamma$ and $\gamma$; their explicit forms follow from (33) and (35). (37)

In particular, letting $b(k)=k^{-1/2}$ for $k\ge1$, we obtain

$$\frac{C_1}{\sum_{k=1}^{K}b(k)}=O\Big(\frac{1}{K}\Big),\qquad C_2\,\frac{\sum_{k=1}^{K}b^2(k)}{\sum_{k=1}^{K}b(k)}=O\Big(\frac{\ln K}{K}\Big).$$

It can be observed that the second term on the right-hand side of (36) dominates. Hence, the overall convergence rate of the proposed algorithm (11) is $O\big(\frac{\ln K}{K}\big)$. □

5. Robustness against communication delays

Since communication delays usually occur in a communication network, studying the robustness of the proposed algorithm (11) for the EDP under communication delays is necessary and meaningful, and it will make future power systems more reliable and stable. In this section, we show that algorithm (11) remains competent to solve the EDP over a directed network with communication delays.

5.1. Model of the algorithm with communication delays

We first formulate the model of the proposed algorithm (11) with communication delays in a directed communication network. In particular, at the $k$-th iteration, the communication link $(j,i)$ is subject to an a priori unknown delay $\tau_{ji}(k)$, where $\tau_{ji}(k)\in\mathbb{Z}_+$. Without loss of generality, we impose the following assumption on the communication delays.

Assumption 5. The communication delays are uniformly bounded at all times, i.e., $0\le\tau_{ji}(k)\le\tau$ for all $k\in\mathbb{Z}_+$ and some bounded positive integer $\tau$. In addition, each generator accesses its own value without any communication delay, i.e., $\tau_{ii}(k)=0$ for all $i\in\mathcal{V}$ and $k\in\mathbb{Z}_+$.

Remark 10. For a given network, when some communication links are subject to communication delays, the information passed over them cannot reach the target generators in time. However, the delayed information will eventually be delivered to the target generators as time goes on if and only if the communication delays are bounded. If the communication delays were infinite, the delayed information could not reach the target generators within a finite period of time, so the corresponding nodes could become isolated, which contradicts Assumption 1. Hence, Assumption 5 is a mild condition on the existence of communication delays. For more details, see the related literature [41,42].

In algorithm (11), note that the updates of the incremental cost $\lambda_i(k+1)$ and the surplus variable $y_i(k+1)$ both depend on communication with neighbors, while the update of $x_i(k+1)$ is executed locally for all $i\in\mathcal{V}$. That is, when the communications of the network suffer from delays, each generator $i$ executes the following steps:

$$\begin{aligned}
\lambda_i(k+1)&=\sum_{j=1}^{N}[A_r]_{ij}\lambda_j\big(k-\tau_{ji}(k)\big)+\epsilon\,y_i(k)-\alpha_i(k)\big(\nabla\Psi_i(\lambda_i(k))+\xi_i(k)\big),\\
y_i(k+1)&=\lambda_i(k)-\sum_{j=1}^{N}[A_r]_{ij}\lambda_j\big(k-\tau_{ji}(k)\big)+\sum_{j=1}^{N}[A_c]_{ij}y_j\big(k-\tau_{ji}(k)\big)-\epsilon\,y_i(k),\\
x_i(k+1)&=\min\big\{\max\{\nabla f_i^{-1}(\lambda_i(k+1)),\,x_i^{\min}\},\,x_i^{\max}\big\}.
\end{aligned}\qquad (38)$$

5.2. Convergence analysis with communication delays

Theorem 3. Suppose Assumptions 1–5 hold. When the communication links of the directed network are subject to arbitrarily large but bounded communication delays, the proposed primal-dual optimization algorithm (38) still converges to the optimal solution of problem (2) with probability one.

Proof. The proof is mainly based on an augmented directed network representation [43], which allows us to transform the original bounded-delay communication model of the directed network into a communication model without delays. In particular, for each generator $i\in\mathcal{V}$ and its neighbors $j\in\mathcal{N}_i^+$, if communication delays occur in the network, we introduce virtual generators $\{i_1,i_2,\dots,i_\tau\}$ between them as buffers. Any information from generator $j$ to generator $i$ is first transmitted to these virtual generators, where the time the information spends passing through the buffers equals the desired delay, and the information is then finally delivered to generator $i$. Note that the in-degree and out-degree of these virtual generators are equal to 1, and they have no computing capability. Specifically, the real generators are indexed as $\{1,\dots,N\}$ and the virtual generators as $\{N+1,\dots,N(\tau+1)\}$. Furthermore, for a generator $i$, the virtual generators indexed as $\{N+i,2N+i,\dots,\tau N+i\}$ are used to model delays of $\{1,\dots,\tau\}$ steps, respectively.

Introducing the variables $\tilde\lambda_i(k)$ and $\tilde y_i(k)$ held by node $i$ of the enlarged system at time $k$, algorithm (38) for the generators of the augmented system is given, for all $i\in\{1,\dots,N(\tau+1)\}$, by

$$\begin{aligned}
\tilde\lambda_i(k+1)&=\sum_{h=1}^{N(\tau+1)}[\tilde A_r]_{ih}\tilde\lambda_h(k)+\epsilon\,\tilde y_i(k)-\tilde\alpha_i(k)\big(\nabla\Psi_i(\tilde\lambda_i(k))+\tilde\xi_i(k)\big),\\
\tilde y_i(k+1)&=\tilde\lambda_i(k)-\sum_{h=1}^{N(\tau+1)}[\tilde A_r]_{ih}\tilde\lambda_h(k)+\sum_{j=1}^{N(\tau+1)}[\tilde A_c]_{ij}\tilde y_j(k)-\epsilon\,\tilde y_i(k),\\
x_i(k+1)&=\min\big\{\max\{\nabla f_i^{-1}(\tilde\lambda_i(k+1)),\,x_i^{\min}\},\,x_i^{\max}\big\},
\end{aligned}$$

where, for all $h\in\{1,\dots,N(\tau+1)\}$ and $j\in\{1,\dots,N(\tau+1)\}$,

$$[\tilde A_r]_{ih}=\begin{cases}[A_r]_{ij}, & h=j+N\tau_{ji}(k),\\ 0, & \text{otherwise},\end{cases}\qquad
[\tilde A_c]_{hj}=\begin{cases}[A_c]_{ij}, & h=i+N\tau_{ji}(k),\\ 0, & \text{otherwise},\end{cases}\qquad (39)$$

and $[A_r]_{ij}$, $[A_c]_{ij}$ are as predefined for (11). The evolution of the states of the virtual generators is given, for all $i\in\{N+1,\dots,N(\tau+1)\}$ and $k\ge1$, by

$$\tilde\lambda_i(k+1)=\tilde\lambda_{i-N}(k),\qquad \tilde y_i(k+1)=\tilde y_{i-N}(k),$$

with initial values $\tilde\lambda_i(0)=0$ and $\tilde y_i(0)=0$ for $i\in\{N+1,\dots,N(\tau+1)\}$. In addition, for a virtual generator $i$ and a generator $j$, we have

$$[\tilde A_r]_{ih}=\begin{cases}1, & h=i-N,\\ 0, & \text{otherwise},\end{cases}\qquad
[\tilde A_c]_{hj}=\begin{cases}1, & h=j+N,\\ 0, & \text{otherwise}.\end{cases}$$

After defining these weights for the augmented system, for any virtual generators $i,j\in\{N+1,\dots,N(\tau+1)\}$ we have $\sum_{h=1}^{N(\tau+1)}[\tilde A_r]_{ih}=1$ and $\sum_{h=1}^{N(\tau+1)}[\tilde A_c]_{hj}=1$. Therefore, the weight matrices $\tilde A_r,\tilde A_c\in\mathbb{R}^{N(\tau+1)\times N(\tau+1)}$ of the augmented directed network are still row-stochastic and column-stochastic, respectively. The algorithm (38) dealing with communication delays can thus be rewritten in a compact form without delays as

$$\tilde z_i(k+1)=\sum_{j=1}^{2N(\tau+1)}[\tilde A]_{ij}\tilde z_j(k)-\tilde\alpha_i(k)\tilde g_i(k),\qquad
x_i(k+1)=\min\big\{\max\{\nabla f_i^{-1}(\tilde\lambda_i(k+1)),\,x_i^{\min}\},\,x_i^{\max}\big\},\qquad (40)$$

where

$$\tilde z_i(k)=\begin{cases}\tilde\lambda_i(k), & i\in\{1,\dots,N(\tau+1)\},\\ \tilde y_{i-N(\tau+1)}(k), & i\in\{N(\tau+1)+1,\dots,2N(\tau+1)\},\end{cases}\qquad
\tilde g_i(k)=\begin{cases}x_i(k)-M_i+\xi_i(k), & i\in\{1,\dots,N\},\\ 0, & \text{otherwise},\end{cases}$$

and

$$\tilde A=\begin{bmatrix}\tilde A_r & \epsilon I_{N(\tau+1)}\\ I_{N(\tau+1)}-\tilde A_r & \tilde A_c-\epsilon I_{N(\tau+1)}\end{bmatrix}.\qquad (41)$$

The constant $\epsilon\in(0,\tilde\epsilon)$ plays an important role in the convergence of this algorithm, where $\tilde\epsilon=\frac{(1-|\tilde\mu_3|)^{N(\tau+1)}}{(20+8N(\tau+1))^{N(\tau+1)}}$ [45], and $\tilde\mu_3$ is the third largest eigenvalue of the augmented weight matrix $\tilde A$ obtained by setting $\epsilon=0$; the form of $\tilde A$ is given in (41). For each original generator $i\in\mathcal{V}$, its gradient $\nabla\Psi_i$ and noise $\xi_i$ are uniformly bounded, and there is no perturbation at the virtual generators. The rest of the proof follows from the proofs of Theorem 1 and Theorem 2. □
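Operationally, the delayed update (38) can be simulated by keeping a short history of past states, so that each generator reads $\lambda_j(k-\tau_{ji}(k))$ and $y_j(k-\tau_{ji}(k))$ from a bounded buffer. The Python sketch below does this with random bounded delays; the network, costs, noise and constants are illustrative assumptions (the same hypothetical setup as the earlier sketch), not the paper's case-study data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, K, tau = 5, 0.05, 20000, 9
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]   # assumed digraph, (j, i) means j -> i
a = rng.uniform(2.25e-4, 7.65e-4, N)
b = rng.uniform(0.0185, 0.0415, N)
x_min, x_max = rng.uniform(20, 40, N), rng.uniform(70, 130, N)
M = 300.0
Mi = np.full(N, M / N)

Ar, Ac = np.eye(N), np.eye(N)
for (j, i) in edges:
    Ar[i, j] = 1.0
    Ac[i, j] = 1.0
Ar /= Ar.sum(axis=1, keepdims=True)
Ac /= Ac.sum(axis=0, keepdims=True)

def x_of_lambda(lam):
    return np.clip((lam - b) / (2.0 * a), x_min, x_max)

# History buffers: row d holds the state from d steps ago (zeros before time 0).
lam_hist = np.zeros((tau + 1, N))
y_hist = np.zeros((tau + 1, N))
lam, y = lam_hist[0].copy(), y_hist[0].copy()
for k in range(1, K + 1):
    alpha = 0.001 / (k + 1)
    delay = rng.integers(0, tau + 1, size=(N, N))   # delay[i, j] = tau_ji(k), uniform on {0, ..., tau}
    np.fill_diagonal(delay, 0)                      # tau_ii = 0 (Assumption 5)
    lam_del = lam_hist[delay, np.arange(N)]         # lam_del[i, j] = lambda_j delayed by tau_ji(k)
    y_del = y_hist[delay, np.arange(N)]
    grad = x_of_lambda(lam) - Mi + rng.normal(0.0, 1.0, N)
    lam_new = (Ar * lam_del).sum(axis=1) + eps * y - alpha * grad      # first line of Eq. (38)
    y = lam - (Ar * lam_del).sum(axis=1) + (Ac * y_del).sum(axis=1) - eps * y
    lam = lam_new
    lam_hist = np.roll(lam_hist, 1, axis=0); lam_hist[0] = lam         # shift the buffer by one step
    y_hist = np.roll(y_hist, 1, axis=0); y_hist[0] = y

print("incremental costs with delays:", np.round(lam, 5))
```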

6. Numerical simulation results

To verify the effectiveness of the distributed primal-dual optimization algorithm (11) and its extension to systems subject to communication delays, we provide three numerical simulation examples, each studied both with and without communication delays. In the first case study, a numerical example is used to evaluate the algorithm's performance in a relatively small-scale system. In the second case study, the stability of the algorithm is tested under a time-varying demand. In the third case study, the scalability of the algorithm is demonstrated in a large-scale network.

6.1. Case study I

In this case study, the system in Fig. 1 is used to study the EDP. The network has five generators, marked as generators 1, 2, 3, 4 and 5. Each generator $i$ holds a cost function $f_i(P_i)=a_iP_i^2+b_iP_i$, where $a_i$ and $b_i$ are its cost coefficients and $P_i$ is the power generated by generator $i$. The coefficients of each function $f_i$ lie in the ranges $a_i\in[0.000225,0.000765]$ and $b_i\in[0.0185,0.0415]$, respectively. Taking the actual situation into consideration, the generators $i\in\mathcal{V}$ may have different power production capacities, and we use a bounded interval $[P_i^{\min},P_i^{\max}]$ to represent the capacity of generator $i$. The unit of power in this system is kW. The upper bound $P_i^{\max}$ and lower bound $P_i^{\min}$ of generator $i$ are randomly chosen from the intervals $[70,130]$ and $[20,40]$, respectively. For the directed network shown in Fig. 1, we set $\epsilon=0.3$; for more details on the influence of $\epsilon$ on the convergence properties, see the related literature [45]. We assume that all generators know the entire required system demand $P=300$ kW. Recall that the virtual local demand at each generator $i$ can be assigned arbitrarily as long as the summation equals the total demand. To implement the proposed algorithm (11), we first choose the virtual local demands as $M_1=60$ kW, $M_2=40$ kW, $M_3=110$ kW, $M_4=40$ kW, $M_5=50$ kW, so that $\sum_{i=1}^{5}M_i=P=300$ kW. The uncoordinated step-sizes are $\alpha_i(k)=\frac{\beta_i}{k+1}$, with $\beta_i$ in the range $[0.0009,0.001]$.

[Fig. 1. The directed network with five generators.]

6.1.1. Without communication delays

We first test the proposed algorithm (11) in the case where the communication network is perfect, without communication delays. In this case, the gradient computed by each generator $i$ is corrupted by zero-mean Gaussian noise $\mathcal{N}(0,8)$, but its observation is exact. From the simulation results in Fig. 2(a), all the incremental costs converge to the optimal solution $\lambda^\star=0.07173\,\$/\mathrm{kWh}$. As shown in Fig. 2(b), the power outputs of the generators also converge to the optimal solution, $x_1^\star=100.4$ kW, $x_2^\star=42.69$ kW, $x_3^\star=67.2$ kW, $x_4^\star=32.85$ kW, $x_5^\star=56.86$ kW. As $\lambda_i(k)$ and $x_i(k)$ converge to the optimal solution, Fig. 2(c) shows that the total generation meets the total demand $M=300$ kW.

[Fig. 2. Simulation results for five generators over directed network without communication delays.]
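Because the optimal incremental cost is common to all generators at the optimum, a reported value of $\lambda^\star$ can be cross-checked centrally by a simple bisection on the power-balance equation $\sum_i x_i(\lambda)=M$. The sketch below does this for quadratic costs; the coefficient values are placeholders drawn from the stated ranges, so the resulting $\lambda^\star$ will not reproduce the exact numbers reported above.

```python
import numpy as np

def bisect_lambda(a, b, x_min, x_max, M, tol=1e-10):
    """Centralized check: find lambda with sum_i x_i(lambda) = M, where
    x_i(lambda) = clip((lambda - b_i)/(2 a_i), x_i^min, x_i^max) as in Eq. (6)."""
    def total(lam):
        return np.clip((lam - b) / (2.0 * a), x_min, x_max).sum()
    lo = b.min()                                     # total(lo) = sum of lower bounds
    hi = b.max() + 2.0 * a.max() * x_max.max()       # total(hi) = sum of upper bounds
    while hi - lo > tol:                             # total(.) is nondecreasing, so bisection applies
        mid = 0.5 * (lo + hi)
        if total(mid) < M:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    a = rng.uniform(2.25e-4, 7.65e-4, 5)             # placeholder coefficients from the stated ranges
    b = rng.uniform(0.0185, 0.0415, 5)
    x_min, x_max = rng.uniform(20, 40, 5), rng.uniform(70, 130, 5)
    lam_star = bisect_lambda(a, b, x_min, x_max, M=300.0)
    x_star = np.clip((lam_star - b) / (2.0 * a), x_min, x_max)
    print(lam_star, round(x_star.sum(), 6))          # x_star.sum() should be close to 300.0
```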

6.1.2. With communication delays

In this case, we consider the situation in which the communication links are subject to communication delays, while the gradient computed by each generator $i$ is again corrupted by Gaussian noise $\mathcal{N}(0,8)$. The communication delays are time-varying and are assumed to have the upper bound $\tau=9$; therefore, at each time step and for each communication link, the probability that the delay equals any given integer in $\{0,1,\dots,9\}$ is $1/10$. The remaining parameters are set as in Case Study I. Nevertheless, the proposed algorithm (38) still converges to its optimal solution. The simulation results in Fig. 3 indicate that, although the communication links suffer delays, the variables still converge to the same optimal solution as in Case Study I without communication delays, within a few more iterations.

[Fig. 3. Simulation results for five generators over directed network with communication delays.]

6.2. Case study II

In practical situations, the load demand is unlikely to remain constant over time. To demonstrate the stability of the proposed algorithm (11), we apply it to the system in Fig. 1 with a time-varying demand in this case study. As in Case Study I, this case is studied both with and without communication delays, with the same parameter settings as in Case Study I. The initial demand is $P=300$ kW, and it is purposely changed to $P=250$ kW at some time step: at $k=300$ for the case without communication delays, and at $k=3000$ for the case with communication delays. Hence, the algorithm needs to modify the local estimated mismatch at these time steps before continuing to update the variables. Figs. 4 and 5 show the results corresponding to the cases without and with communication delays, respectively. After the demand changes at $k=300$ and $k=3000$, respectively, the algorithm asymptotically converges to the new optimal solution, i.e., $\lambda^\star=0.06468\,\$/\mathrm{kWh}$, $x_1^\star=81.13$ kW, $x_2^\star=36.44$ kW, $x_3^\star=51.53$ kW, $x_4^\star=28.26$ kW and $x_5^\star=46.64$ kW. Compared with the corresponding results of Case Study I, all the outputs decrease since the demand is reduced.

[Fig. 4. Simulation results for five generators over directed network under time-varying demand without communication delays.]

[Fig. 5. Simulation results for five generators over directed network under time-varying demand with communication delays.]

6.3. Case study III

In order to demonstrate the scalability of the proposed algorithm (11), we apply it to a large-scale system that includes fifty-four generators. A quadratic cost function is held by each generator $i\in\mathcal{V}$, i.e., $f_i(P_i)=a_i+b_iP_i+c_iP_i^2$. The coefficients $a_i$, $b_i$ and $c_i$ of the functions $f_i$ are selected from the ranges $a_i\in[6.78,74.33]$, $b_i\in[0.0185,0.0415]$ and $c_i\in[0.000225,0.000765]$; their units are MBtu, MBtu/MW and MBtu/MW$^2$, respectively. Each generator $i$ is assumed to operate within a bounded interval $[P_i^{\min},P_i^{\max}]$, where $P_i^{\min}\in[30,70]$ and $P_i^{\max}\in[150,240]$ are randomly chosen from these intervals. Each generator $i$ is assumed to know the total load required by the system, which is $P=6000$ MW. The uncoordinated step-sizes are $\alpha_i(k)=\frac{\beta_i}{k+1}$, with $\beta_i$ in the range $[0.001,0.0011]$. To implement the algorithm, the parameter of the directed network is set to $\epsilon=0.1$. Meanwhile, the virtual local demand $M_i$ at each generator $i\in\mathcal{V}$ is arbitrarily assigned, with $\sum_{i=1}^{54}M_i=P=6000$ MW.

6.3.1. Without communication delays

The proposed algorithm (11) is first applied to the network whose communication is perfect, without delays, and we only consider the situation in which the gradient calculation is corrupted by Gaussian noise $\mathcal{N}(0,8)$. The simulation results in Fig. 6 show the scalability of the algorithm to a large-scale network. Compared with the number of iterations required in Fig. 2 of Case Study I, the algorithm successfully drives the variables to an optimal solution within a few more iterations.

[Fig. 6. Simulation results for fifty-four generators over directed network without communication delays.]

6.3.2. With communication delays

In this case, not only are the communication links subject to communication delays, but the gradient calculation is also corrupted by Gaussian noise $\mathcal{N}(0,8)$. The communication delays are time-varying and bounded, with $\tau=4$; thus, at each time step and for each communication link, the probability that the delay equals any given integer in $\{0,1,\dots,4\}$ is $1/5$. The remaining parameter settings are the same as before. In spite of this, the proposed algorithm (38) still converges to its optimal solution. The simulation results in Fig. 7 indicate that, although the communication links are subject to time-varying delays, the variables still converge to the optimal solution, as in the case without communication delays, at a slower rate.

[Fig. 7. Simulation results for fifty-four generators over directed network with communication delays.]

7. Conclusion

We have proposed and analyzed a distributed primal-dual algorithm for the EDP over a directed network using the stochastic gradient descent method. The algorithm takes into account the fact that the gradient calculation is often corrupted by noise, which gives it greater practical significance. Furthermore, if the strict convexity assumption on the objective functions is satisfied, the algorithm not only allows uncoordinated step-sizes but also seeks the exact optimal solution to the EDP with probability one at the rate of $O\left(\frac{\ln K}{K}\right)$, where $K$ is the number of iterations. Moreover, the algorithm is robust to communication delays and has been extended to networks with time-varying but bounded communication delays. Simulation examples verify the correctness of the algorithm. Future work will relax the assumption that the cost functions are strictly convex with Lipschitz continuous gradient to general convex functions, speed up the convergence rate, and extend the results to general time-varying directed communication networks.

Acknowledgments

The work described in this paper was supported in part by the Innovation Support Program for Chongqing Overseas Returnees under Grant No. cx2017043, in part by the Special Financial Support from Chongqing Postdoctoral Science Foundation under Grant No. Xm2017100, in part by the National Natural Science Foundation of China under Grant Nos. 61773321, 61673080 and 61762020, in part by the High-level Innovative Talents Project of Guizhou under Grant No. QRLF201621, in part by the Science and Technology Top-notch Talents Support Project of Colleges and Universities in Guizhou under Grant No. QJHKY2016065, in part by the Science and Technology Foundation of Guizhou under Grant Nos. QKHJC20161076 and QKHJC20181083, and in part by the National Statistical Science Research Project of China under Grant No. 2018LY66.

References

[1] Spangler R, Shoults R. Power generation, operation, and control. IEEE Power Energy Mag 2014;12(4):90–3.
[2] Abido M. Optimal design of power system stabilizers using particle swarm optimization. IEEE Power Eng Rev 2007;22(7):53.
[3] Bakirtzis A, Petridis V, Kazarlis S. Genetic algorithm solution to the economic dispatch problem. IEE Proc – Gen Transm Distrib 1994;141(4):377–82.
[4] Lin C, Viviani G. Hierarchical economic dispatch for piecewise quadratic cost functions. IEEE Trans Power App Syst PAS-103 2007(6):1170–5.
[5] Huang A, Crow M, Heydt G, Zheng J, Dale S. The future renewable electric energy delivery and management (FREEDM) system: the energy internet. Proc IEEE 2010;99(1):133–48.
[6] Zhang Z, Chow M. Incremental cost consensus algorithm in a smart grid environment. In: Power and energy society general meeting; 2011. p. 1–6.
[7] Zhao C, Topcu U, Low S. Optimal load control via frequency measurement and neighborhood area communication. IEEE Trans Power Syst 2013;28(4):3576–87.
[8] Chen G, Zhao Z, Li Z. Distributed finite-step iterative algorithm for economic dispatch of generation. IEEE Trans Industr Inf 2019;14(12):5221–32.
[9] Wang S, Xing J, Jiang Z, Li J. Decentralized economic dispatch of an isolated distributed generator network. Int J Electric Power Energy Syst 2019;105:297–304.
[10] Chen G, Yang Q. An ADMM-based distributed algorithm for economic dispatch in islanded microgrids. IEEE Trans Industr Inf 2018;14(9):3892–903.
[11] Liu W, Chi M. Distributed optimal active power dispatch with energy storage units and power flow limits in smart grids. Int J Electric Power Energy Syst 2019;105:420–8.
[12] Yi P, Hong Y, Liu F. Distributed gradient algorithm for constrained optimization with application to load sharing in power systems. Syst Control Lett 2015;83(711):45–52.
[13] Liu M, Shi Y, Liu X. Distributed MPC of aggregated heterogeneous thermostatically controlled loads in smart grid. IEEE Trans Industr Electron 2016;63(2):1120–9.
[14] Dorfler F, Simpson-Porco J, Bullo F. Breaking the hierarchy: distributed control and economic optimality in microgrids. IEEE Trans Control Netw Syst 2016;3(3):241–53.
[15] Olfati-Saber R, Murray R. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans Autom Control 2004;49(9):1520–33.
[16] Li H, Huang C, Chen G, Liao X, Huang T. Distributed consensus optimization in multiagent networks with time-varying directed topologies and quantized communication. IEEE Trans Cybernet 2017;47(8):2044–57.
[17] Cui S, Wang Y, Lin X, Liu X. Distributed auction optimization algorithm for the nonconvex economic dispatch problem based on the gossip communication mechanism. Int J Electric Power Energy Syst 2018;95:417–26.
[18] Han D, Sun W, Fan X. Dynamic energy management in smart grid: a fast randomized first-order optimization algorithm. Int J Electric Power Energy Syst 2018;94:179–87.
[19] Ren W, Beard R. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans Autom Control 2005;50(5):655–61.
[20] Li H, Lü Q, Huang T. Distributed projection sub-gradient algorithm over time-varying general unbalanced directed graphs. IEEE Trans Autom Control 2019;64(3):1309–16.
[21] Gupta R, Chow M. Networked control system: overview and research trends. IEEE Trans Industr Electron 2010;57(7):2527–35.
[22] Qin J, Yu C, Gao H. Coordination for linear multiagent systems with dynamic interaction topology in the leader-following framework. IEEE Trans Industr Electron 2013;61(5):2412–22.
[23] Cao Y, Yu W, Ren W, Chen G. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Industr Inf 2013;9(1):427–38.
[24] Sundhar S, Nedic A, Veeravalli V. Distributed stochastic sub-gradient projection algorithms for convex optimization. J Optim Theory Appl 2010;147(3):516–45.
[25] Yi P, Hong Y. Stochastic sub-gradient algorithm for distributed optimization with random sleep scheme. Control Theory Technol 2015;13(4):333–47.
[26] Chowdhury B, Rahman S. A review of recent advances in economic dispatch. IEEE Trans Power Syst 1990;5(4):1248–59.
[27] Doan T, Olshevsky A. On the geometric convergence rate of distributed economic dispatch/demand response in power systems. arXiv:1609.06660.
[28] Zhang Z, Ying X, Chow M. Decentralizing the economic dispatch problem using a two-level incremental cost consensus algorithm in a smart grid environment. In: North American power symposium; 2011. p. 1–7.
[29] Zhang Z, Chow M. Convergence analysis of the incremental cost consensus algorithm under different communication network topologies in a smart grid. IEEE Trans Power Syst 2012;27(4):1761–8.
[30] Hug G, Kar S, Wu C. Consensus + innovations approach for distributed multiagent coordination in a microgrid. IEEE Trans Smart Grid 2017;6(4):1893–903.
[31] Kar S, Hug G. Distributed robust economic dispatch in power systems: a consensus + innovations approach. In: IEEE power and energy society general meeting; 2012. p. 1–8.
[32] Yang S, Tan S, Xu J. Consensus based approach for economic dispatch problem in a smart grid. IEEE Trans Power Syst 2013;28(4):4416–26.
[33] Xing H, Mou Y, Fu M, Lin Z. Distributed bisection method for economic power dispatch in smart grid. IEEE Trans Power Syst 2015;30(6):3024–35.
[34] Yang T, Wu D, Sun Y, Lian J. Minimum-time consensus-based approach for power system applications. IEEE Trans Industr Electron 2016;63(2):1318–28.
[35] Cherukuri A, Cortes J. Distributed generator coordination for initialization and anytime optimization in economic dispatch. IEEE Trans Control Network Syst 2015;2(3):226–37.
[36] Lakshmanan H, Defarias D. Decentralized resource allocation in dynamic networks of agents. SIAM J Optim 2008;19(2):911–40.
[37] Zhang Z, Chow M. The influence of time delays on decentralized economic dispatch by using incremental cost consensus algorithm. In: Control and optimization methods for electric smart grids. New York, NY: Springer; 2012. p. 313–26.
[38] Yang T, Wu D, Sun Y, Lian J. Impacts of time delays on distributed algorithms for economic dispatch. In: Power & energy society general meeting; 2015. p. 1–5.
[39] Xi C, Wu Q, Khan U. On the distributed optimization over directed networks. Neurocomputing 2017;267:508–15.
[40] Nedic A, Olshevsky A. Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control 2015;61(12):3936–47.
[41] Nedic A, Ozdaglar A. Convergence rate for consensus with delays. J Global Optim 2010;47(3):437–56.
[42] Yang T, Lu J, Wu D, Wu J, Shi G, Meng Z, et al. A distributed algorithm for economic dispatch over time-varying directed networks with delays. IEEE Trans Industr Electron 2017;64(6):5095–106.
[43] Nedic A, Ozdaglar A. Distributed sub-gradient methods for multi-agent optimization. IEEE Trans Autom Control 2009;54(1):48–61.
[44] Pettie S, Stout Q. Parallel and distributed computation. RAIRO Oper Res 1999;33(1):87–92.
[45] Cai K, Ishii H. Average consensus on general strongly connected digraphs. Automatica 2012;48(11):2750–61.
[46] Bertsekas D. Convex optimization theory. Cambridge, MA: Athena Scientific; 2011.
[47] Bauschke H, Combettes P. Convex analysis and monotone operator theory in Hilbert spaces. New York: Springer; 2011.
[48] Rockafellar R, Wets R. Variational analysis. Berlin: Springer; 1998.
[49] Boyd S, Vandenberghe L. Convex optimization. IEEE Trans Autom Control 2006;51(11):1859.
[50] Kleijnen J. Design and analysis of simulation experiments. Biometrical J 2008;47(3):299–308.