Applied Energy 178 (2016) 198–211
A wolf pack hunting strategy based virtual tribes control for automatic generation control of smart grid
Lei Xi a,b, Tao Yu a,*, Bo Yang c, Xiaoshun Zhang a, Xuanyu Qiu a
a School of Electric Power, South China University of Technology, Guangzhou 510641, China
b College of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002, China
c Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650504, China
Highlights
A novel distributed autonomous virtual tribes control system is proposed.
A WPH-VTC strategy is designed to solve the distributed virtual tribes control.
A stochastic consensus game on mixed homogeneous and heterogeneous multi-agents is resolved.
The optimal total power reference and its dispatch are resolved simultaneously in a dynamic way.
The utilization rate of renewable energy is increased with reduced carbon emissions.
Article info
Article history: Received 4 November 2015 Received in revised form 25 May 2016 Accepted 11 June 2016
Keywords: Electric power autonomy Virtual tribes control Wolf pack hunting Multi-agent Carbon emissions
Abstract
This paper proposes a novel electric power autonomy to satisfy the requirement of power generation optimization of the smart grid and a decentralized energy management system. A decentralized virtual tribes control (VTC) is developed which can effectively coordinate the regional dispatch centre and the distributed energy. Then a wolf pack hunting (WPH) strategy based VTC (WPH-VTC) is designed by combining the multi-agent system stochastic game and multi-agent system collaborative consensus, which is called the multi-agent system stochastic consensus game, to achieve the coordination and optimization of the decentralized VTC, such that different types of renewable energy can be effectively integrated into the electric power autonomy. The proposed scheme is implemented on a flexible and dynamic multi-agent stochastic game-based VTC simulation platform, whose control performance is evaluated on a typical two-area load–frequency control power system and a practical Guangdong power grid model in southern China. Simulation results verify that it can improve the closed-loop system performance, increase the utilization rate of renewable energy, reduce carbon emissions, and achieve a fast convergence rate with significant robustness compared with existing schemes.
© 2016 Elsevier Ltd. All rights reserved.
1. Introduction

The recent development of the smart grid [1,2] and distributed energy may provide an ultimate solution to the global energy crisis and the sustainable development of modern society. This drives the current energy management system (EMS) [3,4] to evolve from a centralized coordinated control into a decentralized autonomous control.
This work was partially supported by the National Key Basic Research Program of China (973 Program) (Grant no. 2013CB228205) and the National Natural Science Foundation of China (Grant no. 51477055).
* Corresponding author. E-mail addresses: [email protected] (L. Xi), [email protected] (T. Yu), [email protected] (B. Yang), [email protected] (X. Zhang), [email protected] (X. Qiu).
http://dx.doi.org/10.1016/j.apenergy.2016.06.041
0306-2619/© 2016 Elsevier Ltd. All rights reserved.
In general, the generation dispatch and control of the EMS are achieved by the cooperation of automatic generation control (AGC) [5,6], the plant controller (PLC) and the generator speed governor, among which AGC plays a crucial role in the active power balance and autonomous frequency control of an island power grid. AGC refers to the automatic frequency control and active power control of power systems, and the AGC of an interconnected grid is the fundamental function of generation dispatch and control. Frequency control can be divided into primary frequency regulation (FR), secondary FR and tertiary FR. In particular, the secondary FR is a centralized control among all areas in order to achieve a balance between the generation power and the load. The AGC of the dispatch centre tracks the grid frequency and tie-line power deviation in real time, and the total power reference is then dispatched to the PLCs to achieve a global closed-loop control.
AGC is the most crucial part of the zero-error FR in the main power grid. Conventionally, a linear IEEE two-area load frequency control (LFC) [7,8] based power system model is adopted, which ignores the topology of power grids. LFC stabilizes the grid frequency and tie-line power through a regulation of the output power of the units. In practice, the control objectives of LFC are the area control error (ACE) [9,10] and economic dispatch (ED) [11,12], where ACE is calculated from the frequency deviation and the tie-line power deviation, and ED is an optimal dispatch of active power. However, the modern power grid is developing into a large interconnected system, in which AGC is employed as the main tool to achieve active power control and frequency control of the interconnected power grid. Thus, the control performance of AGC directly and mainly influences the power quality and security of the power grid.

The conventional AGC adopts a centralized control structure, which is however inadequate to satisfy the requirement of the smart grid as ever-increasing technological innovations emerge. An obvious weakness of the centralized AGC is its inefficient cooperation with ED due to the ignorance of the power grid topology. In particular, the total power references of the regional dispatch centre are assigned through a fixed proportion of the adjustable capacity rather than a dynamic optimization, so it cannot effectively cooperate with the distribution networks and microgrids. Furthermore, no frequency support can be provided for the island power grid when the whole power grid is split.

Much research has been undertaken to resolve the above issue. Wu et al. [13] pointed out that the structure of the future EMS should be in a decentralized, highly integrated, flexible and open pattern. Bose [14] proposed an explicit decentralized control for the smart grid. Sun et al. [15] deemed that the existing EMS needs to be upgraded into a hybrid of both decentralized and centralized control, in which a decentralized autonomous coordinated EMS will be constructed to achieve an optimal coordination of sources, grids and loads. In fact, the decentralized autonomous control of power systems has motivated active power grid splitting [16] and the frequency support of the island power grid [17], from which a novel concept known as electric power autonomy (EPA) has been proposed. However, the AGC of the island power grid was not taken into account by EPA. So far, the decentralized autonomous control has been adopted in the AGC of microgrids [18], which is very attractive since a virtual power plant (VPP) was introduced for the system FR [19]. Currently, how to integrate multiple microgrids, VPPs, and distributed energy into the whole power grid to achieve an optimal AGC remains an open problem. In order to resolve the above challenge, EPA and virtual tribes control (VTC) are employed for a novel optimal AGC design in this paper, which are illustrated as follows:

EPA: This is defined as the autonomous control of electric power at the regional power grids. A regional power grid is separated into several smaller regional power grids (territories) based on graph theory, whose boundary is defined as the territory border. Each territory exchanges electric power and maintains the whole grid frequency under normal operation or small disturbance conditions. However, if a severe disturbance occurs, it will be split from the whole power grid and rapidly degrade into an island power grid; hence each territory has to autonomously control its own electric power.
VTC: This is a supplementary control in a decentralized autonomous pattern between the regional dispatch centre and distributed energy. It includes the PLC of large power plants, distribution networks and microgrids, and load control systems. Under this framework, a group of units is defined as a virtual generation tribe (VGT), as specified in Fig. 1, which can be interpreted as an equivalent virtual unit. Based on the hybrid
of centralized AGC and decentralized VTC, a smart generation control (SGC) can be developed which has the following four merits: (a) optimal coordination of the secondary FR and tertiary FR; (b) effective coordination of sources and load control systems; (c) implementable cooperation between the regional dispatch centre and distributed energy; and (d) enhanced frequency support of the island power grid.

The last obstacle for the implementation of VTC into power systems is an appropriate optimal coordinated control design of the decentralized VTC. Recently, the coordinated control of multi-agents (MA) has inspired numerous studies in the gas turbine power plant energy management system [20], the optimal time-of-use pricing for urban gas [21], and smart buildings [22]. As power systems have various uncertainties, they can be regarded as a non-Markov decision process (non-MDP) and a stochastic game. The non-MDP and game theory [23] based multi-agent system (MAS) stochastic game (MAS-SG) [24] is therefore developed to handle the complicated dynamic gaming and decision making among heterogeneous MA. Based on game theory, many advanced algorithms have been employed to obtain system equilibriums, such as correlated-Q (CE-Q) [25], asymmetric-Q [26], and their modifications [24]. The authors' previous work on single-agent reinforcement learning (SARL) and the MAS-SG has demonstrated that an optimal AGC can be achieved when the agent number is relatively small [27–35]. However, multiple equilibriums may emerge as the agent number increases, which inevitably consumes a longer time resulting from the extensive online calculation of all system equilibriums, and may even lead to a severe collapse of system stability. Moreover, the aforementioned literature [23–35] only calculated an optimal total power reference, which is then dispatched through a fixed proportion of the adjustable capacity. In general, this may not yield an optimal dispatch due to the static optimization. Thus, some alternatives based on dynamic dispatch are needed for the optimal coordinated control of the decentralized VTC.

Multi-agent system collaborative consensus (MAS-CC) has been an interdisciplinary research area for the past two decades, inspired by the collaborative consensus phenomenon of wild animal groups in the struggle for existence in nature, e.g., the group hunting or safari of a school of fish, a flock of birds, or a herd of horses. In 1986, Reynolds initially proposed the three well-known rules of MAS-CC: (a) gathering; (b) collision avoidance; and (c) speed synchronization. Based on these rules, Vicsek et al. [36] modelled the collaborative consensus of speed direction. Moreover, Olfati-Saber and Murray [37] developed the fundamental theory of MAS-CC based on graph theory and system stability, whose performance was discussed in the presence of a dynamic system topology in [38]. The MAS-CC for homogeneous MA has also been used in the control system design of military, logistics and robotic applications [39]. So far, few studies have been undertaken to investigate the application of MAS-CC to power systems and to the hybrid of homogeneous and heterogeneous MA. One possible approach to sort out this issue is to employ the leader–follower mode in the MA. Basically, a leader is an agent that has obtained some important information for the benefit of the whole group, e.g., the closest path to food locations; thus it is eligible to lead the other group members (followers) to approach that destination.
Mathematically, this mode can be described by assuming a virtual leader with a constant speed that is regarded as the speed reference, which is then fed back to all followers to reach a collaborative consensus among the MA. Hu and Hong [40] studied the collaborative consensus in both fixed and switching topologies, whose convergence was analysed in [41] via the combination of a collaboration diagram and an energy function. The effectiveness of the leader–follower mode has been verified by robot formation control [42].
Fig. 1. The decentralized framework of AGC based on VGT.
A VTC based collaborative consensus was developed in [43]. However, the approaches [36–43] mentioned above are based on collaborative consensus only, and hence cannot dynamically calculate the optimal total power reference. Therefore, a novel multi-agent system stochastic consensus game (MAS-SCG) is proposed for the hybrid of homogeneous and heterogeneous MA, in which the MAS-CC is chosen when a system has many followers, while the MAS-SG is used when a system has few leaders. The idea of the MAS-SCG stems from the group hunting of a wild wolf pack in a harsh environment, which ensures the survival and prosperity of the whole wolf pack via collaborative consensus. To summarize, reinforcement learning (RL) can only obtain the optimal total power reference (control), while collaborative consensus can only obtain the optimal dispatch of the total power reference (dispatch); neither of them can achieve a global optimization (control + dispatch). As a consequence, this paper combines RL and collaborative consensus to achieve a global optimization.

In this paper, a novel EPA and VTC are developed to satisfy the requirement of power generation optimization of the smart grid, which can effectively coordinate the regional dispatch centre and the distributed energy. A wolf pack hunting (WPH) strategy based VTC (WPH-VTC) is designed, which is a hybrid of control (DWoLF-PHC(λ) based on the MAS-SG) and optimization (collaborative consensus based on the MAS-CC). The VGTs game with each other to obtain the optimal total power reference, in which each wolf pack learns a given strategy (a sinusoidal wave in this paper) in the pre-learning, then the Q values and look-up tables are saved, and the optimal action strategy is selected during online operation from the saved Q values and look-up tables (this whole process belongs to the MAS-SG). Within a given VGT, all agents then try to reach a consensus on the ramp time, such that an optimal dispatch can be achieved dynamically. A highly reliable simulation platform based on the multi-area coordination of interconnected complex power grids is then built for its online implementation. The effectiveness of the proposed control scheme is evaluated by both a two-area load–frequency control power system and a practical Guangdong (GD) power grid model in southern China, in which the penetration of distributed renewable energy and
electric vehicles has been modelled as different types of intensive stochastic disturbances. Simulation results verify that the proposed scheme can improve the closed-loop system performance, increase the utilization rate of renewable energy, reduce carbon emissions (CE), and achieve a fast convergence rate with significant robustness compared with existing schemes.

2. Framework development

Based on the North American Electric Reliability Council (NERC) [44] and the Institute of Electrical and Electronics Engineers (IEEE) standard [45] on smart grid dispatching control, the overall framework of the VTC and the multi-area WPH-VTC is described by Figs. 1 and 2, respectively. The related concepts are defined as follows:

Territory: This is a small regional power grid in an independent cutset, which is generally a small regional transmission–distribution network of an island power grid, and can cooperate with the active splitting system in the regional power grid. Note that the territory power grid is different from microgrids or active distribution networks, as it is often connected with large power sources.

Tribe: This is the group of all actual units and virtual units (e.g., storage systems and interruptible loads) for FR in a territory. Note that each territory can have one and only one tribe.

Chief: This is the dispatch end of a whole tribe. It communicates and coordinates with the superior regional dispatch end and the other tribes' chiefs, and sends commands to the patriarch of each family in the tribe. Normally the unit with the largest total capacity is chosen as the chief, which does not belong to any family.

Family: This is a group of units of the same type of resource in a tribe, such as thermal, gas, hydro, wind power, solar, and nuclear.

Patriarch: This is the leader of the generation control units of a family (a large wolf in the WPH-VTC) with a significant dispatch capacity; it can actively search and execute complex commands independently.
Fig. 2. The overall VTC framework.
Family member: This is a follower among the generation control units (a small wolf in the WPH-VTC); it can only follow the patriarch's behaviour and execute simple commands.

Reserve: This is a standby group of pumped-storage hydro plants; it is only put into operation if a load disturbance exceeds 50% of the default value.

Each decentralized autonomous generation tribe and its VTC constitute one VPP, thus the original power balance and optimal dispatch of the regional dispatch centre can be achieved by that VPP, while the regional dispatch centre does not need to significantly adjust its AGC. Under such a framework, the novel VTC has the following three features: (a) the optimal power flow of all units in the territory can be calculated and distributed daily; (b) the secondary FR can be rapidly achieved under the grid-interconnected operation; and (c) the secondary FR and the coordination of protection systems can be achieved for the EPA of the island power grid.

3. WPH-VTC

The WPH-VTC is developed by combining the MAS-CC and MAS-SG frameworks to resolve the coordination and optimization of the distributed VTC.

3.1. MAS-SG framework

A pack of wolves adopts the MAS-SG to game with other packs. Note that each VGT contains one and only one wolf pack. The DWoLF-PHC(λ) method based MAS-SG has been developed by the authors; some basic results are recalled in this section, while more details can be found in [31]. The optimal state value function V*(s) and strategy π*(s) under state s in Q-learning can be expressed as follows
V*(s) = max_{a∈A} Q(s, a)    (1)

π*(s) = arg max_{a∈A} Q(s, a)    (2)
where A is the set of actions. The state–action–reward–state–action algorithm with eligibility traces (SARSA(λ)) [46] provides a straightforward way to design an on-policy temporal-difference approach, such that the algorithm efficiency can be significantly increased. The eligibility trace of SARSA(λ) is chosen as
e_{k+1}(s, a) = γλ e_k(s, a) + 1, if (s, a) = (s_k, a_k);  e_{k+1}(s, a) = γλ e_k(s, a), otherwise    (3)
where e_k(s, a) denotes the eligibility trace at the kth iteration under state s and action a; γ is the discount factor; and λ is the trace-attenuation factor. Q(λ) adopts the SARSA(λ) returns as a value function estimator through the traces, which combine the frequency and recency of heuristic events. The estimates of the current value function errors are calculated by
ρ_k = R(s_k, s_{k+1}, a_k) + γ Q_k(s_{k+1}, a_g) − Q_k(s_k, a_k)    (4)

δ_k = R(s_k, s_{k+1}, a_k) + γ Q_k(s_{k+1}, a_g) − Q_k(s_k, a_g)    (5)
where R(s_k, s_{k+1}, a_k) indicates the agent's reward function from state s_k to s_{k+1} under the selected action a_k; a_g is the greedy action; ρ_k denotes the Q-function error of the agent at the kth iteration; and δ_k is the estimate of the Q-function error. The Q-functions are updated as follows
Q_{k+1}(s, a) = Q_k(s, a) + α δ_k e_k(s, a)    (6)

Q_{k+1}(s_k, a_k) = Q_{k+1}(s_k, a_k) + α ρ_k    (7)
where α is the Q-learning rate. With sufficient trial-and-error, the state-action value function Q_k(s, a) converges to the optimal joint action strategy, denoted by matrix Q*, with probability one. For a given agent, based on the mixed strategy set U(s_k, a_k), an exploration action a_k is executed under state s_k, after which the system transits to state s_{k+1} with a reward R, while the updating law of U(s_k, a_k) is chosen as
U(s_k, a_k) ← U(s_k, a_k) + { −φ_{s_k a_k}, if a_k ≠ arg max_{a_{k+1}} Q(s_k, a_{k+1});  Σ_{a_{k+1} ≠ a_k} φ_{s_k a_{k+1}}, otherwise }    (8)

φ_{s_k a_k} = min( U(s_k, a_k), φ_i / (|A_i| − 1) )    (9)
where φ is a variable learning rate with φ_lose > φ_win. If the average mixed strategy value is lower than the current value, then the agent wins and φ_win will be selected, otherwise φ_lose will be chosen. The updating law is given as
φ_i = φ_win, if Σ_{a_i∈A} U(s_k, a_i) Q(s_k, a_i) > Σ_{a_i∈A} Ũ(s_k, a_i) Q(s_k, a_i);  φ_i = φ_lose, otherwise    (10)

where Ũ(s_k, a_i) is the average mixed strategy. After action a_i is executed, the average mixed strategy table of all actions is updated under state s_k as

Ũ(s_k, a_i) ← Ũ(s_k, a_i) + (U(s_k, a_i) − Ũ(s_k, a_i)) / visit(s_k),  ∀ a_i ∈ A    (11)

where visit(s_k) is the total number of visits to state s_k from the initial state to the current state.

3.2. MAS-CC framework

The MAS-CC is introduced into the WPH-VTC, in which the family members, as a homogeneous MAS, follow the patriarch of a wolf pack.

3.2.1. Graph theory
The topology of the MAS is represented by a directed graph G = (V, E, B) with a set of nodes V = {v_1, v_2, . . ., v_n}, edges E ⊆ V × V, and a weighted adjacency matrix B = [b_ij] ∈ R^{n×n} [47]. Here node v_i denotes the ith agent, an edge represents the relationship between two agents, and the constant b_ij ≥ 0 is the weight between v_i and v_j. A graph G is strongly connected if any vertex can be reached from any other vertex by a directed path. The Laplacian matrix L = [l_ij] ∈ R^{n×n} of the graph G can be written as follows:

l_ij = −b_ij, ∀ i ≠ j;  l_ii = Σ_{j=1, j≠i}^{n} b_ij    (12)

where L determines the topology of the MAS.

3.2.2. Collaborative consensus
An MAS consisting of n autonomous agents is regarded as a set of nodes in a directed graph G. Collaborative consensus aims to reach an agreement among the agents, in which each agent updates its state in real time after communicating with its adjacent agents. Due to the communication delay among agents, the first-order algorithm of a discrete-time system is chosen as [48]

x_i[k + 1] = Σ_{j=1}^{n} d_ij[k] x_j[k]    (13)

where x_i is the state of the ith agent; k is the discrete time index; and d_ij[k] is the (i, j) entry of the stochastic row matrix D = [d_ij] ∈ R^{n×n} in the kth communication, which is given by

d_ij[k] = |l_ij| / Σ_{j=1}^{n} |l_ij|,  i = 1, . . ., n    (14)

With time-invariant communication and constant gains b_ij, a collaborative consensus can be achieved if and only if the directed graph is strongly connected [49].

3.2.3. Ramp time consensus
The ramp time is chosen as the consensus variable among all units in a VGT. A unit which has a higher ramp rate will be allocated more of the disturbance. The wth unit's ramp time in the ith VGT can be obtained as

t_iw = ΔP_iw / ΔP_iw^rate    (15)

where ΔP_iw is the wth unit's regulation power in the ith VGT, and ΔP_iw^rate is the ramp rate of that unit determined by

ΔP_iw^rate = UR_iw, if ΔP_i > 0;  ΔP_iw^rate = −DR_iw, if ΔP_i < 0    (16)

where UR_iw and DR_iw are the upper and lower bounds of the ramp rate, respectively. The ramp time of each family member can be updated according to (13) as

t_iw[k + 1] = Σ_{v=1}^{m_i} d̃_wv^{(i)}[k] t_iv[k]    (17)

where m_i is the total number of units in the ith VGT and D̃^{(i)} = [d̃_wv^{(i)}] ∈ R^{m_i×m_i} is the stochastic row matrix of the ith VGT. Then the ramp time of the chief can be updated as [50]

t_iw[k + 1] = Σ_{v=1}^{m_i} d̃_wv^{(i)}[k] t_iv[k] + μ_i ΔP_error-i, if ΔP_i > 0;  t_iw[k + 1] = Σ_{v=1}^{m_i} d̃_wv^{(i)}[k] t_iv[k] − μ_i ΔP_error-i, if ΔP_i < 0    (18)

where μ_i > 0 represents the ith VGT's adjustment factor of the power error, and ΔP_error-i denotes the power error between the total power reference ΔP_i and the sum of the units' regulation power in the ith VGT, which is obtained by

ΔP_error-i = ΔP_i − Σ_{w=1}^{m_i} ΔP_iw    (19)

If ΔP_i > 0, then ṫ_iw ΔP_error-i > 0; otherwise ṫ_iw ΔP_error-i < 0. Similarly, the maximum ramp time and ΔP_iw will be modified if their bounds are reached, as

ΔP_iw = ΔP_iw^max, if ΔP_iw > ΔP_iw^max;  ΔP_iw = ΔP_iw^min, if ΔP_iw < ΔP_iw^min    (20)

t_iw^max = ΔP_iw^max / UR_iw, if ΔP_iw > ΔP_iw^max;  t_iw^max = ΔP_iw^min / DR_iw, if ΔP_iw < ΔP_iw^min    (21)

Again, if ΔP_iw exceeds its bound, the adjacency elements become

b̃_wv^{(i)} = 0,  v = 1, 2, . . ., m_i    (22)

where B̃^{(i)} = [b̃_wv^{(i)}] ∈ R^{m_i×m_i} is the weighted adjacency matrix of the ith VGT.

3.3. WPH-VTC procedure

The WPH-VTC has the following three features: (a) the wolf packs, each embedded with the DWoLF-PHC(λ) method, game with each other to obtain greater benefits (better control performance); each wolf pack learns the given strategy (a sinusoidal wave in this paper) in the pre-learning process, then the Q values and look-up tables are saved, and the optimal action strategy is selected during online operation from the saved Q values and look-up tables (this whole process belongs to the MAS-SG); (b) meanwhile, all the other agents try to reach a consensus, and the small wolves (family members) permanently follow their chief (which corresponds to the MAS-CC); (c) the optimal strategy for a given region is only valid in that region, and the value function Q_{k+1}(s, a) cannot be updated simultaneously in all regions, which results in a time-delay for the obtained optimal strategy. The overall WPH-VTC procedure is described in Fig. 3.
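A minimal sketch of the ramp-time consensus dispatch of Eqs. (13)–(22) within a single VGT is given below. The unit parameters, tolerance and adjustment factor are illustrative values (not taken from the paper), and unit 0 is simply taken as the chief; it is a sketch of the technique, not the authors' implementation.

import numpy as np

def consensus_dispatch(dP_total, rates_up, rates_down, p_max, p_min, adjacency,
                       mu=0.05, eps=1.0, max_iter=500):
    """Dispatch a total power reference dP_total among the units of one VGT by
    driving their ramp times toward a common value (Eqs. (15)-(19)).
    Unit 0 plays the role of the chief; all numerical values are illustrative."""
    m = len(rates_up)
    rate = rates_up if dP_total > 0 else -rates_down            # Eq. (16)
    t = np.zeros(m)                                              # initial ramp times
    # Row-stochastic matrix built from the weighted adjacency, Eqs. (12)-(14)
    L = -adjacency + np.diag(adjacency.sum(axis=1))
    D = np.abs(L) / np.abs(L).sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        dP = np.clip(t * rate, p_min, p_max)                     # Eqs. (15), (20)
        err = dP_total - dP.sum()                                # Eq. (19)
        if abs(err) < eps:                                       # step 21 of Fig. 3
            break
        t_new = D @ t                                            # followers, Eq. (17)
        t_new[0] += mu * err if dP_total > 0 else -mu * err      # chief, Eq. (18)
        t = t_new
    return np.clip(t * rate, p_min, p_max)

# Toy example: three coal-fired units sharing a 60 MW regulation command
dP = consensus_dispatch(60.0,
                        rates_up=np.array([6.0, 6.75, 15.0]),
                        rates_down=np.array([6.0, 6.75, 15.0]),
                        p_max=np.array([120.0, 135.0, 300.0]),
                        p_min=np.array([-120.0, -135.0, -300.0]),
                        adjacency=np.array([[0, 1, 1],
                                            [1, 0, 1],
                                            [1, 1, 0]], dtype=float))
print(dP)  # units with higher ramp rates take a larger share of the command

The sketch reproduces the essential behaviour of the scheme: the chief integrates the power error while the followers average their ramp times with their neighbours, so that the shares converge to the ramp-rate-proportional dispatch.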
4. WPH-VTC design for AGC This section aims to design a novel WPH-VTC for the adaptive coordinated AGC. In every iteration, each area monitors the current operation state online to update the value function and Q-functions, and then an action will be executed based on the average mixed strategy.
4.1. Reward function selection

In general, ACE is used to maximize the long-term benefit of the control performance standards (CPS) and to alleviate severe power fluctuations, while CE takes the effect of the EMS on the environment into account. Hence, the weighted sum of ACE and CE is selected as the reward function, in which a larger weighted sum results in a smaller reward. The reward function R in each VGT is chosen as

R(s_{k−1}, s_k, a_{k−1}) = −( μ [ACE(k)]² + (1 − μ) Σ_{w=1}^{m_i} D_iw ΔP_iw(k) ) / 1000,  ΔP_iw^min ≤ ΔP_iw(k) ≤ ΔP_iw^max    (23)

where ACE(k) and ΔP_iw(k) indicate the instantaneous value of ACE and the regulation power of the wth power unit at the kth iteration, and μ and (1 − μ) weight the controlled area's ACE and CE metrics, respectively. D_iw is the CE intensity coefficient of the wth unit in kg/kWh, and ΔP_iw^max and ΔP_iw^min are the upper and lower bounds of the wth unit's regulation capacity, respectively. In particular, for the coal-fired units D_iw = 0.87 when |ΔP_iw(k)| > 600 MW, D_iw = 0.89 when 600 MW ≥ |ΔP_iw(k)| > 300 MW, and D_iw = 0.99 when |ΔP_iw(k)| < 300 MW. The coefficients D_iw of the oil-fired units, liquefied natural gas (LNG) units and hydro units are set to 0.7, 0.5 and 0 in each VGT, respectively. The magnitude of μ is the same in each area; here μ = 0.5 is chosen.
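A small sketch of the reward of Eq. (23), with the piecewise CE intensity coefficients described above, may be helpful; the unit list and plant-type labels in the call are illustrative.

def ce_coefficient(dP_iw, plant_type):
    """CE intensity coefficient D_iw (kg/kWh): piecewise for coal-fired units as
    described above, fixed per-type values for oil (0.7), LNG (0.5) and hydro (0)."""
    if plant_type == "coal":
        p = abs(dP_iw)
        return 0.87 if p > 600 else 0.89 if p > 300 else 0.99
    return {"oil": 0.7, "lng": 0.5, "hydro": 0.0}[plant_type]

def reward(ace, dP_units, plant_types, mu=0.5):
    """Reward of Eq. (23): a larger weighted sum of squared ACE and CE
    gives a smaller (more negative) reward."""
    ce_term = sum(ce_coefficient(dP, t) * dP for dP, t in zip(dP_units, plant_types))
    return -(mu * ace ** 2 + (1.0 - mu) * ce_term) / 1000.0

# Illustrative call with three units of one VGT
print(reward(ace=-35.0, dP_units=[120.0, 350.0, 80.0], plant_types=["coal", "coal", "lng"]))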
Initialize Q0(s, a), R(0), U0(s, a), Ũ0(s, a), visit(s0), and e0(s, a), for all s∈S, a∈A; Set parameters φwin, φlose, φ, γ, λ, α, and Tstep = AGC decision time; Give the initial state s0, k = 0; Repeat 1. Choose an exploration action ak based on the mixed strategy set U(sk, ak); 2. Execute the exploration action ak to AGC plants and run the LFC system for the next Tstep seconds; 3. Observe a new state sk+1 via the moving averages of CPS1/ACE; 4. Obtain a short-term reward R(k) from (23); 5. Calculate the one step Q-function error ρk by (4); 6. Estimate the SARSA(0) value function error δk using (5); 7. For each state-action pair (s, a), execute: i) Let ek+1(s, a)←γλek(s, a); ii) Update Q-function Qk(s, a) to Qk+1(s, a) using (6); 8. Resolve the mixed strategy Uk(sk, ak) according to (8) and (9); 9. Update the value function Qk(sk, ak) to Qk+1(sk, ak) by (7); 10. Update the eligibility trace by (3), let e(sk, ak) ←e(sk, ak)+1; 11. Select the variable learning rate φ by (10); 12. Resolve average mixed strategy table through (11); 13. Set visit(sk)←visit(sk)+1; 14. Output total regulation power ∆Pi (i=1,2,…,n); 15. Determine the ramp rate (16); 16. Consensus algorithm (17) or (18); 17. Calculate the unit’s regulation power ∆Piw; 18. If the generation bound is not exceeded, then execute step 20; 19. Correct ∆Piw (20), tiw (21) and dwv (12),(14),(22); 20. Calculate the power error ∆Perror-i; 21. If |∆Perror-i|<εi is false, execute step 16; 22. Output regulation power ∆Piw (w=1,2,…,mi); 23. Set k = k + 1, return to step 1; End Fig. 3. Execution steps of the WPH-VTC.
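A minimal sketch of one learning step of the DWoLF-PHC(λ) controller (Eqs. (3)–(11), steps 4–13 of Fig. 3) may make the update order concrete. The table sizes, the assumption that the tabulated φ = 0.06 is φ_lose (with φ_win = 0.015 so that φ_lose/φ_win = 4), and the policy update written over all actions of the visited state are illustrative assumptions, not the authors' implementation.

import numpy as np

class DWoLFPHCLambda:
    """A sketch of one DWoLF-PHC(lambda) learning step; values are illustrative."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, lam=0.9,
                 phi_win=0.015, phi_lose=0.06):
        self.Q = np.zeros((n_states, n_actions))
        self.e = np.zeros((n_states, n_actions))                   # eligibility traces
        self.U = np.full((n_states, n_actions), 1.0 / n_actions)   # mixed strategy
        self.U_avg = np.full((n_states, n_actions), 1.0 / n_actions)
        self.visits = np.zeros(n_states)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.phi_win, self.phi_lose = phi_win, phi_lose
        self.phi = phi_lose                                         # variable learning rate

    def act(self, s):
        # Step 1 of Fig. 3: sample an exploration action from the mixed strategy
        return int(np.random.choice(len(self.U[s]), p=self.U[s]))

    def step(self, s, a, r, s_next):
        a_g = int(np.argmax(self.Q[s_next]))                        # greedy action in s_next
        rho = r + self.gamma * self.Q[s_next, a_g] - self.Q[s, a]       # Eq. (4)
        delta = r + self.gamma * self.Q[s_next, a_g] - self.Q[s, a_g]   # Eq. (5)
        self.e *= self.gamma * self.lam                              # Fig. 3 step 7(i)
        self.Q += self.alpha * delta * self.e                        # Eq. (6)
        # Policy hill-climbing over the actions of state s, Eqs. (8)-(9)
        best = int(np.argmax(self.Q[s]))
        d = np.minimum(self.U[s], self.phi / (self.Q.shape[1] - 1))
        self.U[s] -= d
        self.U[s, best] += d.sum()                                   # probabilities still sum to one
        self.Q[s, a] += self.alpha * rho                             # Eq. (7)
        self.e[s, a] += 1.0                                          # Eq. (3)
        # Win-or-learn-fast variable learning rate, Eq. (10)
        win = self.U[s] @ self.Q[s] > self.U_avg[s] @ self.Q[s]
        self.phi = self.phi_win if win else self.phi_lose
        # Average mixed strategy and visit counter, Eq. (11)
        self.visits[s] += 1
        self.U_avg[s] += (self.U[s] - self.U_avg[s]) / self.visits[s]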
4.2. Parameter setting

There are four parameters, λ, γ, α and φ, that need to be appropriately selected, as follows [28,31]:

The trace-attenuation factor 0 < λ < 1 allocates the credits among the state-action pairs. It determines the convergence rate and the non-MDP effects for large time-delay systems. In general, λ can be interpreted as a time-scaling element in the backtracking. A small λ means that little credit will be given to the historical state-action pairs for the Q-function errors, while a large λ denotes that much credit will be assigned. Trial-and-error shows that 0.7 < λ < 0.95 is acceptable.

The discount factor 0 < γ < 1 discounts the future rewards in the Q-functions. A value close to 1 should be chosen, as the latest rewards in the thermal-dominated LFC process are the most important [28]. Experiments demonstrate that 0.6 < γ < 0.95 is proper.

The Q-learning rate 0 < α < 1 compromises between the convergence rate and the algorithm stability of the Q-functions. Note that a small α decelerates the learning while enhancing the system stability. Simulation studies show that a value in the range of 0.001–0.95 is acceptable.

The variable learning rate 0 < φ < 1 derives an optimal policy by maximizing the value of actions. In particular, the algorithm degrades into Q-learning if φ = 1, as a maximum-value action is then permanently executed in every iteration. For a fast convergence rate, a stochastic game ratio φ_lose/φ_win = 4 is selected.

Based on the above guidance, the parameter values used in the WPH-VTC are given in Table 1, which can provide optimal control performance through trial-and-error.

5. Case studies

The overall VTC-SP structure of multi-area power systems is developed for the WPH-VTC online implementation, as illustrated by Fig. 4. In Fig. 4, everything except the system model has been implemented in JADE; the system model is implemented in Matlab, which interacts with JADE. The related concepts and mechanism of the agents used in the proposed VTC-SP structure are the same as those of [30]. For the control performance analysis, the multi-agent virtual tribes control (MA-VTC) module plays the core part of the VTC-SP, in which Q-learning, Q(λ), DWoLF-PHC(λ), and WPH-VTC are all embedded.

5.1. Two-area LFC power system

In order to test the control performance of the proposed MA-VTC, an IEEE two-area LFC power system [51] has been used,
Table 1. The parameter values used in the WPH-VTC.
Parameter | Value
λ (trace-attenuation factor) | 0.9
γ (discount factor) | 0.9
α (Q-learning rate) | 0.5
φ (variable learning rate) | 0.06
Fig. 4. The overall VTC-SP structure of multi-area power systems.
whose framework is shown in Fig. 5, while the system parameters are taken from [52] and those of VGT1 and VGT2 are provided in Table 2. The period of the VTC at the dispatch end is chosen to be 4 s, with different time-delays Ts in the secondary FR. Note that the WPH-VTC has to undergo a sufficient pre-learning through offline trial-and-error before the online implementation, which includes extensive explorations in the CPS state space for the optimization of the Q-functions and state-value functions. The previous work [28–31] and this paper choose a sinusoidal load profile in the pre-learning, due to the fact that most signals can be transformed into the sum of a series of sinusoidal signals via the Fourier transform. Thus a sinusoidal load contains sufficient information for the algorithm to learn various signals in the pre-learning procedure, such that an effective online operation of the algorithm can be achieved. A sinusoidal load disturbance was firstly adopted in the pre-learning to obtain the Q values and look-up table, which were saved and will be employed for the future online operation. Then, in the online operation, a step change of load disturbance was used to simulate a sudden load increase, which often occurs in power system operation, to evaluate the control performance of the WPH-VTC. In fact, we have already tested the algorithm on a more realistic load profile with no symmetry, by using an additional white noise load disturbance representing an unpredictable penetration of distributed generations (a white noise, Table 4).

Fig. 6 presents the pre-learning of each area, in which a consistent 10-min sinusoidal load disturbance is applied. It is obvious that the WPH-VTC can converge to the optimal strategy in each VGT with qualified CPS1 (10-min average value of CPS1) and E_AVE-10-min (10-min average value of ACE). Furthermore, the Q matrix 2-norm ‖Q_i^k(s, a) − Q_i^{k−1}(s, a)‖₂ ≤ ε (ε = 0.1 is a specified positive constant) is used as the criterion for the pre-learning termination of an optimal strategy [28]. Both the Q values and the look-up table will be saved after the pre-learning, such that the WPH-VTC can be applied to a real power system. The convergence result of the Q-function differences obtained in each VGT during the pre-learning is given by Fig. 7. Apparently, the WPH-VTC can accelerate the convergence rate by nearly 60% over that of Q(λ). Note that the system will converge faster but with a poorer learning performance when the load profile changes from the current sinusoidal signal to other signals. However, the convergence rate of the pre-learning is not very important compared with the learning performance, thus the sinusoidal signal is used in the pre-learning [28–31].
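The pre-learning loop with the sinusoidal load and the 2-norm termination criterion can be sketched as follows. The load amplitude, the 10-min check interval and the environment object env (returning the CPS1/ACE state and the reward) are placeholders and assumptions, not parts of the paper's simulation platform; the agent is assumed to expose the interface of the DWoLFPHCLambda sketch above.

import numpy as np

def prelearn(agent, env, period=600.0, amplitude=500.0, t_step=4.0, eps=0.1):
    """Offline pre-learning on a sinusoidal load disturbance until the Q matrix
    stops changing, i.e. ||Q_k - Q_{k-1}||_2 <= eps (spectral norm)."""
    q_prev = agent.Q.copy()
    k, t = 0, 0.0
    while True:
        load = amplitude * np.sin(2.0 * np.pi * t / period)    # 10-min sinusoidal load
        s = env.observe()
        a = agent.act(s)
        s_next, r = env.step(a, load, t_step)
        agent.step(s, a, r, s_next)
        k, t = k + 1, t + t_step
        if k % 150 == 0:                                       # check every 10 min of simulated time
            if np.linalg.norm(agent.Q - q_prev, 2) <= eps:     # termination criterion
                break
            q_prev = agent.Q.copy()
    return k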
Fig. 5. The two-area LFC power system framework.
Table 2. Model parameters of the VGT units used in the GD power grid model.
VGT no. | Plant type | Units | Ts (s) | ΔP_iw^max (MW) | ΔP_iw^min (MW) | UR_iw/DR_iw (MW/min) | D_iw (kg/kWh)
VGT1 | Coal-fired | G1–G3 | 40 | 120 | −120 | 6 | 0.99
VGT1 | Coal-fired | G4–G5 | 45 | 135 | −135 | 6.75 | 0.99
VGT1 | Coal-fired | G6–G7 | 45 | 300 | −300 | 15 | 0.99
VGT1 | Coal-fired | G8 | 45 | 320 | −320 | 16 | 0.89
VGT1 | LNG | G9 | 8 | 188 | −188 | 18.81 | 0.5
VGT1 | Reserve (pumped-storage hydro) | G10 | 5 | 180 | 0 | 180 | 0
VGT2 | Coal-fired | G11 | 40 | 500 | −500 | 25 | 0.89
VGT2 | Coal-fired | G12 | 43 | 330 | −330 | 13.2 | 0.89
VGT2 | Coal-fired | G13–G14 | 43 | 125 | −125 | 5.625 | 0.99
VGT2 | Coal-fired | G15–G17 | 38 | 150 | −150 | 5.85 | 0.99
VGT2 | LNG | G18 | 10 | 280 | −280 | 30.8 | 0.5
VGT2 | Oil-fired | G19–G20 | 25 | 120 | −120 | 9 | 0.7
VGT3 | Coal-fired | G21–G23 | 43 | 220 | −220 | 8.25 | 0.99
VGT3 | Coal-fired | G24 | 43 | 660 | −660 | 24.75 | 0.87
VGT3 | Coal-fired | G25–G26 | 40 | 180 | −180 | 7.2 | 0.99
VGT3 | LNG | G27–G28 | 10 | 200 | −200 | 22 | 0.5
VGT3 | LNG | G29 | 12 | 200 | −200 | 25 | 0.5
VGT4 | Coal-fired | G30 | 45 | 600 | −600 | 30 | 0.89
VGT4 | Coal-fired | G31–G32 | 45 | 100 | −100 | 5 | 0.99
VGT4 | Coal-fired | G33–G35 | 45 | 200 | −200 | 10 | 0.99
VGT4 | Coal-fired | G36 | 45 | 210 | −210 | 10.5 | 0.99
VGT4 | Coal-fired | G37–G38 | 40 | 240 | −240 | 12 | 0.99
VGT4 | Coal-fired | G39–G41 | 40 | 280 | −280 | 14 | 0.99
VGT4 | Coal-fired | G42–G43 | 40 | 250 | −250 | 12.5 | 0.99
VGT4 | Coal-fired | G44–G45 | 38 | 360 | −360 | 14.19 | 0.89
VGT4 | Coal-fired | G46–G47 | 38 | 400 | −400 | 15.77 | 0.89
VGT4 | LNG | G48–G50 | 12 | 180 | −180 | 22.5 | 0.5
VGT4 | Oil-fired | G51–G52 | 20 | 150 | −150 | 9 | 0.7
VGT4 | Oil-fired | G53 | 20 | 180 | −180 | 10.8 | 0.7
VGT4 | Oil-fired | G54–G55 | 22 | 180 | −180 | 9 | 0.7
VGT4 | Reserve (pumped-storage hydro) | G56–G57 | 5 | 300 | 0 | 300 | 0
VGT4 | Reserve (pumped-storage hydro) | G58 | 5 | 400 | 0 | 400 | 0
The control performance of the WPH-VTC, DWoLF-PHC(λ), Q(λ), and Q-learning is evaluated in the presence of a step load disturbance in the VGT2 area, in which the same reward function (23) as the WPH-VTC is chosen in each VGT. Fig. 8(a) shows that their overshoots are around 8.1%, 10.4%, 8.3%, and 6.6%, respectively, while their steady-state errors are 0%, 0%, 4.3%, and 4.2%, respectively. In addition, Fig. 8(b) and (c) illustrate that their minimum CPS1 values are 188.8%, 187.5%, 188%, and 184.9%, respectively, while the average absolute values of ACE are 0.2533 MW, 0.9225 MW, 1.0877 MW, and 1.5026 MW, respectively. As a consequence, the WPH-VTC can provide a better CPS performance and relaxation effect for the AGC plants with lower control costs, such that the wear-and-tear on the units can be significantly reduced.

The control performance of each algorithm obtained in each VGT is summarized in Fig. 9, in which a stochastic white noise load fluctuation is used as the load disturbance after the pre-learning process. Tc is the average convergence time of the pre-learning; CE, |Δf| (absolute value of the frequency deviation), |E_AVE-min| (absolute value of the 1-min ACE) and CPS1 are the average indices over 24 h after the pre-learning.
Fig. 6. The pre-learning of WPH-VTC obtained in each VGT: (a) the average 10-min CPS1; (b) the average 10-min ACE; (c) the WPH-VTC controller output.
Fig. 7. The convergence result of Q-function differences obtained in each VGT during the pre-learning: (a) Q(λ); (b) WPH-VTC.
Fig. 8. Control performance of the four VGT controllers (Q-learning, Q(λ), DWoLF-PHC(λ), WPH-VTC): (a) the VGT2 power regulating commands; (b) the CPS1 of the VGT2 area; (c) the ACE of the VGT2 area.
Fig. 9. Statistic performance of each VTC obtained in the two-area LFC power system (|E_AVE-min|, |Δf|, CPS1, Tc and CE for each VGT).
Finally, Fig. 9 demonstrates that the WPH-VTC improves Tc over the other methods in VGT1 by 17.21–65.97%, reduces |Δf| from 1.0604E−04 Hz to 9.3988E−05 Hz, and improves CPS1 by 0.50–0.65%, |E_AVE-min| by 14.60–50%, and CE by 1.21–1.51%, respectively.
5.2. The GD power grid model in southern China

Fig. 10 shows the interconnected network of the GD power grid model consisting of 58 units [43], including the VGT1, VGT2, VGT3 and VGT4 power grids.
Table 3. Statistic experiment results obtained under the impulsive perturbations in the GD power grid model.
Controlled area | Performance index | Q-learning | Q(λ) | DWoLF-PHC(λ) | WPH-VTC
VGT1 | |Δf| (Hz) | 0.0046 | 0.0041 | 0.0041 | 0.0038
VGT1 | |ACE| (MW) | 43.0747 | 41.5545 | 32.9518 | 17.5770
VGT1 | CPS1 (%) | 195.5091 | 196.4034 | 197.6214 | 198.2325
VGT1 | CPS2 (%) | 100 | 100 | 100 | 100
VGT1 | CPS (%) | 100 | 100 | 100 | 100
VGT1 | CE (t/h) | 500.1727 | 478.0845 | 479.7273 | 464.6656
VGT2 | |Δf| (Hz) | 3.56E−03 | 0.0041 | 0.0041 | 0.0038
VGT2 | |ACE| (MW) | 46.1218 | 45.7591 | 44.106 | 16.4318
VGT2 | CPS1 (%) | 195.2115 | 196.1198 | 196.5922 | 196.8649
VGT2 | CPS2 (%) | 100 | 100 | 100 | 100
VGT2 | CPS (%) | 100 | 100 | 100 | 100
VGT2 | CE (t/h) | 479.2516 | 458.4044 | 446.6685 | 435.3203
VGT3 | |Δf| (Hz) | 3.57E−03 | 0.0041 | 0.0041 | 0.0038
VGT3 | |ACE| (MW) | 39.2178 | 44.0816 | 38.7376 | 29.8905
VGT3 | CPS1 (%) | 195.7111 | 196.1342 | 197.3540 | 198.2111
VGT3 | CPS2 (%) | 100 | 99.3056 | 99.31 | 100
VGT3 | CPS (%) | 100 | 99.50 | 99.47 | 100
VGT3 | CE (t/h) | 455.2462 | 442.0078 | 429.0699 | 424.1595
VGT4 | |Δf| (Hz) | 3.56E−03 | 0.0041 | 0.0041 | 0.0038
VGT4 | |ACE| (MW) | 48.9634 | 35.4877 | 37.5741 | 21.7724
VGT4 | CPS1 (%) | 193.0020 | 196.7254 | 196.0185 | 199.1086
VGT4 | CPS2 (%) | 100 | 100 | 100 | 100
VGT4 | CPS (%) | 100 | 100 | 100 | 100
VGT4 | CE (t/h) | 477.3273 | 443.0146 | 467.8777 | 399.0937
Fig. 10. The interconnected network of the GD power grid model in southern China.
Fig. 11. The GD power grid model (governor, turbine, GRC and power system blocks of units G1–G10 in VGT1 and G30–G58 in VGT4, with wind, photovoltaic and electric vehicle disturbances and the tie-line power exchange).

Table 4. Statistic experiment results obtained under the stochastic white noise load fluctuation in the GD power grid model.
Controlled area | Performance index | Q-learning | Q(λ) | DWoLF-PHC(λ) | WPH-VTC
VGT1 | |Δf| (Hz) | 0.0007 | 0.0019 | 0.0010 | 0.0001
VGT1 | |ACE| (MW) | 31.9768 | 14.4401 | 16.3316 | 1.9741
VGT1 | CPS1 (%) | 198.5660 | 199.5793 | 199.7001 | 199.9787
VGT1 | CPS2 (%) | 100 | 100 | 100 | 100
VGT1 | CPS (%) | 100 | 100 | 100 | 100
VGT1 | CE (t/h) | 761.7248 | 766.0606 | 765.2647 | 757.9095
VGT2 | |Δf| (Hz) | 0.0007 | 0.0019 | 0.0010 | 0.0001
VGT2 | |ACE| (MW) | 42.1753 | 20.1592 | 39.6439 | 1.8831
VGT2 | CPS1 (%) | 196.6866 | 199.1828 | 197.4925 | 199.9776
VGT2 | CPS2 (%) | 100 | 100 | 100 | 100
VGT2 | CPS (%) | 100 | 100 | 100 | 100
VGT2 | CE (t/h) | 712.1337 | 713.3994 | 709.2844 | 700.4521
VGT3 | |Δf| (Hz) | 0.0007 | 0.0019 | 0.0010 | 0.0001
VGT3 | |ACE| (MW) | 27.9780 | 13.3999 | 23.8873 | 1.2649
VGT3 | CPS1 (%) | 198.5247 | 199.6203 | 199.7138 | 199.9816
VGT3 | CPS2 (%) | 100 | 100 | 100 | 100
VGT3 | CPS (%) | 100 | 100 | 100 | 100
VGT3 | CE (t/h) | 695.1489 | 694.1036 | 673.9260 | 661.6600
VGT4 | |Δf| (Hz) | 0.0007 | 0.0019 | 0.0010 | 0.0001
VGT4 | |ACE| (MW) | 39.2592 | 21.7010 | 37.6097 | 2.6243
VGT4 | CPS1 (%) | 195.5287 | 198.6943 | 196.3786 | 199.9615
VGT4 | CPS2 (%) | 100 | 100 | 100 | 100
VGT4 | CPS (%) | 100 | 100 | 100 | 100
VGT4 | CE (t/h) | 644.0882 | 637.1655 | 660.7673 | 635.5505
The MA-VTC is then analyzed in the GD power grid model shown in Fig. 11. It includes an alternating current (AC)/direct current (DC) hybrid transmission system, which satisfies the CPS with a VTC period of 4 s. The L10 of the GD power grid model is 288 MW. ΔPg is the turbine output, ΔXg is the governor output, Ts is the secondary time-delay, Tg is the time constant of the governor, Tt is the time constant of the turbine, and Kp/(1 + sTp) is the equivalent transfer function of the AC frequency response. Here Tg = 0.08, Tt = 0.3, Kp1 = 0.002667, Kp2 = 0.00285, Kp3 = 0.002667, Kp4 = 0.0025, and Tp = 20 are adopted; the generation rate constraint (GRC) corresponds to UR_iw/DR_iw in this paper, and the GRC and all the other system parameters are given in Table 2 [43]. The system includes four types of plant, i.e., coal-fired, LNG, hydro, and oil-fired. The output of each plant is controlled by its own governor, and the set point of the VTC is obtained according to the optimal dispatch.

The long-term MA-VTC control performance is evaluated by statistic experiments, in which the system undergoes a specific disturbance during a period of 30 days. Four types of controller are tested, i.e., Q-learning, Q(λ), DWoLF-PHC(λ), and WPH-VTC. The statistic experiment results obtained under the impulsive perturbations and the stochastic white noise load fluctuation are tabulated in Tables 3 and 4, respectively, where |Δf| and |ACE| are the average absolute values of the frequency deviation and ACE over the entire simulation, and CPS1, CPS2 and CPS are the monthly compliance percentages. Note that the renewable energy sources are considered as a load disturbance: both an impulsive perturbation (amplitude: 1000; period: 1200 s; pulse width: 50% of period) and a white noise (noise power: 10,000; sample time: 60; seed: 23,341) are adopted. The previous work [28–31] adopted the same type of load disturbance to represent the renewable energy sources, but with different parameter values for different utilization rates of renewable energy. The same weight of the WPH-VTC in each VGT is chosen through a more effective joint cooperation than other policies, such that a higher scalability and self-learning efficiency can be achieved.

Remark. Each territory grid adopts the DWoLF-PHC(λ), which is an independent self-play [31], so the computation and communication cost is only determined by its own scale, not by the total number of territory grids. Furthermore, within each territory grid, the computation and communication cost is only determined by the number of adjacent units, not the total number of units in that territory grid. Hence, if the number of units increases significantly, e.g., to 10, 50 or 200 units, one can easily divide them into more territory grids in such a way that the total unit number and adjacent unit number of each territory grid are limited to a resolvable number; thus the proposed method is scalable to large-scale power systems.

Table 3 illustrates that the WPH-VTC improves |ACE| in VGT1 over the other methods by 46.7–59.2%, CPS1 by 0.3–1.4%, |Δf| by 0.0003–0.0008 Hz, and CE by 2.8–7.1%, respectively, while Table 4 shows that the WPH-VTC improves |ACE| in VGT1 over the others by 86.3–93.8%, CPS1 by 0.1–0.7%, |Δf| by 0.0006–0.0018 Hz, and CE by 0.2–3.8%, respectively. Similar results can be found in the other areas. As a result, the WPH-VTC is more adaptive under various operation conditions and has a superior self-learning capability to the others, particularly when the system is perturbed by a stochastic white noise load fluctuation.

Since both the joint decision actions and previous state-action pairs are employed, the MA-VTC uses the average policy value to design a variable learning rate to achieve the VTC coordination. Both Tables 3 and 4 verify that the WPH-VTC has a better control performance than the others in terms of CPS. Since the average mixed strategy needs to be resolved online for the mixed strategy update of each area, the real-time control performance must be considered to design the variable learning rate and the average policy value. Furthermore, it is straightforward to obtain a relative weight of each area, which can dynamically update its Q values and look-up tables through experience sharing, such that the control can be properly and timely loosened or tightened to optimize the overall control performance.

Moreover, for a given total generation power in the power grid, an increase in the utilization rate of the renewable energy (power) will cause a decrease of the same amount of power from the conventional units, e.g., coal-fired. Thus one can obtain the conversion ratio between the utilization rate of the renewable energy and the CE. Precisely, according to (23), an increase of the renewable energy penetration (ΔP_iw(k)) will result in a decrease of CE by D_iw ΔP_iw(k)/1000. Thus the experiment results verify that the utilization rate of the renewable energy has been significantly increased with a reduced CE.

To this end, the VTC control performance is evaluated by CPS and |Δf| with the following criteria: (i) CPS is qualified if CPS1 ≥ 200% and CPS2 is an arbitrary value; (ii) CPS is qualified if 100% ≤ CPS1 < 200% and CPS2 ≥ 90%; (iii) CPS is unqualified if CPS1 < 100%. It is worth mentioning that Δf must be strictly regulated within (0.05 to 0.2) Hz when the system frequency is required to be maintained at its rated value during steady-state operation.
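These qualification criteria amount to a simple check, sketched below for illustration only.

def cps_qualified(cps1, cps2):
    """CPS assessment used to evaluate the VTC performance: qualified if
    CPS1 >= 200% (CPS2 arbitrary), or if 100% <= CPS1 < 200% and CPS2 >= 90%;
    unqualified otherwise (in particular whenever CPS1 < 100%)."""
    if cps1 >= 200.0:
        return True
    if 100.0 <= cps1 < 200.0 and cps2 >= 90.0:
        return True
    return False

print(cps_qualified(198.23, 100.0))  # True for the WPH-VTC results of VGT1 in Table 3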
6. Discussion

The differences between the algorithms are summarized in Table 5. This paper proposes a novel EPA and decentralized VTC, which has the following three main advantages:
From the fundamental theory perspective, the WPH-VTC resolves the issue of optimal coordinated control for the decentralized VTC, which achieves the stochastic consensus game (MAS-SCG) for the hybrid of homogeneous and heterogeneous MA.

From the control engineering perspective, VTC is based on active power control and area frequency autonomy, while the existing automatic voltage control (AVC) is based on reactive power control and node voltage control. This similarity inspires a hybrid of VTC and AVC for future studies. As a result, the implementation of the VTC of the decentralized EMS is feasible and the cost is acceptable.

From the power system perspective, the decentralized VTC can optimize the power generation in the presence of an increasing penetration of wind, solar, and electric vehicles. The introduction of a decentralized autonomous middle coordinated control layer can fully exploit the power generated from the centralized large sources (hydro, thermal, gas, and nuclear energy, etc.), distributed small sources (wind, solar and ocean energy, etc.), controllable loads, and static/dynamic storage systems.

Note that the WPH-VTC has the fastest convergence rate among the tested methods for the VTC, converging within the 4–16 s control period. Hence it is also adequate for the control design of relatively small time-scale systems, such as drone groups and robot groups.

There are four contributions of this paper: (1) The conventional centralized AGC cannot achieve an optimal AGC when a large number of distribution networks, microgrids, VPPs, and distributed energy sources are integrated into the main power grid, which results from the malignant effect of significant stochastic load disturbances. Thus this paper proposes a decentralized AGC, called virtual tribes control (VTC), to achieve an optimal AGC when a large number of microgrids, VPPs, and distributed energy sources are integrated into the whole power grid, as the total stochastic load disturbances can be divided among the territory power grids (regional power grids), such that the malignant effect of the stochastic load disturbances can be dramatically reduced. (2) The conventional centralized AGC cannot provide frequency support for the island power grid when the whole power grid is split. Thus this paper introduces a new concept called EPA to provide frequency support for the island power grid. (3) A WPH-VTC is developed to resolve the decentralized VTC. The WPH-VTC is a hybrid of control (DWoLF-PHC(λ) based on the MAS-SG) and optimization (collaborative consensus based on the MAS-CC). In particular, DWoLF-PHC(λ) is firstly used under the MAS-SG to rapidly obtain the total power reference (control), then collaborative consensus is adopted under the MAS-CC to optimally distribute the obtained total power reference to each unit (optimization). (4) The current literature has only considered either obtaining the overall power reference of AGC (control) or optimally distributing the obtained total power reference to each unit (optimization). In contrast, the WPH-VTC can simultaneously calculate the optimal total power reference (control) and optimally dispatch it to each unit (optimization). In the optimization process, only collaborative consensus can be used, while in the control process there are some alternatives available. In fact, the authors' published work has thoroughly compared the control performance of different methods, such as Q, Q(λ), R(λ), DCEQ(λ) and DWoLF-PHC(λ) [31], which has shown that DWoLF-PHC(λ) has the best control performance; thus this paper adopts DWoLF-PHC(λ) for the control process.
The WPH-VTC is a decentralized control which can be easily extended to a larger system by simply increasing the number of VGTs, dividing the large system into several smaller subsystems. To choose a VGT, it only requires high-voltage tie-lines to be used as the boundaries based on the power grid topology. Normally a chosen VGT must include at least one whole city, for easy communication and power dispatch in practice.
Table 5. Feature comparison of different algorithms.
Algorithm | Convergence | Agent type | Mixed policy | Game type | Framework | Feature | CE
Q-learning | No | SA | No | General sum games | MDP | Decentralized, autonomous | High
Q(λ) | Yes | SA | No | General sum games | MDP | Decentralized, autonomous | Medium
DWoLF-PHC(λ) | Yes | MA | Yes | General sum games and self play | MAS-SG | Decentralized, autonomous | Medium
WPH-VTC | Yes | MA | Yes | General sum games and self play | MAS-SCG | Decentralized, autonomous, strong robustness | Low
7. Conclusion

This paper proposes EPA and VTC for the AGC coordination in the presence of a significant penetration of distributed energy. A novel WPH-VTC is firstly developed by combining MAS-SG and MAS-CC to resolve the decentralized VTC, which can simultaneously handle a stochastic consensus game on mixed homogeneous and heterogeneous MA. In particular, the WPH-VTC can simultaneously calculate the optimal total power reference and dispatch it to each unit in a dynamic way; it consists of two parts, i.e., DWoLF-PHC(λ) and collaborative consensus. The former is associated with a variable learning rate and an average policy adopted in the MAS-SG to obtain the total power reference, while the latter is employed in the MAS-CC to dispatch the regulation power to each unit.

A flexible VTC-SP framework for multi-area power systems has been developed for the online implementation of the WPH-VTC, which provides a powerful tool for the analysis of different control schemes under various operation conditions. Two case studies for AGC coordination have been carried out with Q-learning, Q(λ), DWoLF-PHC(λ), and WPH-VTC, respectively. Simulation results verify that the WPH-VTC is highly adaptive and robust in the multi-regional, intensively stochastic, interconnected complex power grid, and can significantly increase the utilization rate of renewable energy and reduce CE.

References
[1] Crespo P, Granado D, Pang Z, Wallace SW. Synergy of smart grids and hybrid distributed generation on the value of energy storage. Appl Energy 2016;170:476–88.
[2] Coelho VN, Coelho IM, Coelho BN, Reis AJR, Enayatifar R, Souza MJF, et al. A self-adaptive evolutionary fuzzy model for load forecasting problems on smart grid environment. Appl Energy 2016;169:567–84.
[3] Casals M, Gangolells M, Forcada N, Macarulla M, Giretti A, Vaccarini M. SEAM4US: an intelligent energy management system for underground stations. Appl Energy 2016;166:150–64.
[4] Zhao B, Xue M, Zhang X, Wang C, Zhao J. An MAS based energy management system for a stand-alone microgrid at high altitude. Appl Energy 2015;143:251–61.
[5] Li Z, Qiu F, Wang J. Data-driven real-time power dispatch for maximizing variable renewable generation. Appl Energy 2016;170:304–13.
[6] Muttaqi KM, Le ADT, Aghaei J, Mahboubi-Moghaddam E, Negnevitsky M, Ledwich G. Optimizing distributed generation parameters through economic feasibility assessment. Appl Energy 2016;165:893–903.
[7] Lakshmanan V, Marinelli M, Hu J, Bindner HW. Provision of secondary frequency control via demand response activation on thermostatically controlled loads: solutions and experiences from Denmark. Appl Energy 2016;173:470–80.
[8] Kaneko T, Uehara A, Senjyu T, Yona A, Urasaki N. An integrated control method for a wind farm to reduce frequency deviations in a small power system. Appl Energy 2011;88(4):1049–58.
[9] Park J, Law KH. A data-driven, cooperative wind farm control to maximize the total power production. Appl Energy 2016;165:151–65.
[10] Chiang N, Zavala VM. Large-scale optimal control of interconnected natural gas and electrical transmission systems. Appl Energy 2016;168:226–35.
[11] Chen Y, Wei W, Liu F, Mei S. Distributionally robust hydro-thermal-wind economic dispatch. Appl Energy 2016;173:511–9.
[12] Biswas (Raha) S, Mandal KK, Chakraborty N. Pareto-efficient double auction power transactions for economic reactive power dispatch. Appl Energy 2016;168:610–27.
[13] Wu FF, Khosrow M, Anjan B. Power system control centers: past, present, and future. Proc IEEE 2005;93(11):1890–908.
[14] Bose A. Smart transmission grid applications and their supporting infrastructure. IEEE Trans Smart Grid 2010;1(1):11–9.
[15] Sun H, Zhang B, Wu W. Family of energy management system for smart grid. In: IEEE PES innovative smart grid technologies Europe, Berlin, Germany; 2012. p. 1–5.
[16] You H, Vittal V, Yang Z. Self-healing in power systems: an approach using islanding and rate of frequency decline-based load shedding. IEEE Trans Power Syst 2003;18(1):174–81.
[17] Ali R, Mohamed TH, Qudaih YS, Mitani Y. A new load frequency control approach in an isolated small power systems using coefficient diagram method. Int J Electr Power Energy Syst 2014;56(3):110–6.
[18] Bevrani H, Hiyama T. Intelligent automatic generation control. CRC Press; 2011.
[19] Pudjianto D, Ramsay C, Strbac G. Virtual power plant and system integration of distributed energy resources. IET Renew Power Gen 2007;1(1):10–6.
[20] Roche R, Idoumghar L, Suryanarayanan S, Daggag M, Solacolu C, Miraoui A. A flexible and efficient multi-agent gas turbine power plant energy management system with economic and environmental constraints. Appl Energy 2013;101(1):644–54.
[21] Gong C, Tang K, Zhu K, Hailu A. An optimal time-of-use pricing for urban gas: a study with a multi-agent. Appl Energy 2016;163:283–94.
[22] Wang Z, Wang L, Dounis AI, Yang R. Multi-agent control system with information fusion based comfort model for smart buildings. Appl Energy 2012;99:247–54.
[23] Nash J. Non-cooperative games. Ann Math 1951;54(2):286–95.
[24] Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C Appl Rev 2008;38(2):156–72.
[25] Greenwald A, Hall K. Correlated-Q learning. In: Proceedings of ICML-2003. Washington (DC); 2003. p. 242–9.
[26] Könönen V. Dynamic pricing based on asymmetric multiagent reinforcement learning. Int J Intell Syst 2006;21(1):73–98.
[27] Yu T, Wang Y, Ye W, Zhou B, Chan KW. Stochastic optimal generation command dispatch based on improved hierarchical reinforcement learning approach. IET Gener Transm Distrib 2011;5(8):789–97.
[28] Yu T, Zhou B, Chan KW, Chen L, Yang B. Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step Q(λ) learning. IEEE Trans Power Syst 2011;26(3):1272–82.
[29] Yu T, Zhou B, Chan KW, Yuan Y, Yang B, Wu Q. R(λ) imitation learning for automatic generation control of interconnected power grids. Automatica 2012;48(9):2130–6.
[30] Yu T, Xi L, Yang B, Xu Z, Jiang L. Multiagent stochastic dynamic game for smart generation control. J Energy Eng 2016;142(1):04015012.
[31] Xi L, Yu T, Yang B, Zhang X. A novel multi-agent decentralized win or learn fast policy hill-climbing with eligibility trace algorithm for smart generation control of interconnected complex power grids. Energy Convers Manage 2015;103(10):82–93.
[32] Yu T, Liu J, Hu X, Chan KW, Wang J. Distributed multi-step Q(λ) learning for optimal power flow of large-scale power grid based on distributed multi-step backtrack Q(λ) learning. Int J Electr Power 2012;42(1):614–20.
[33] Zhou B, Chan KW, Yu T. Equilibrium-inspired multiple group search optimizer with synergistic learning for multi-objective electric power dispatch. IEEE Trans Power Syst 2013;28(4):3534–45.
[34] Zhou B, Chan KW, Yu T, Wei H, Tang J. Strength Pareto multi-group search optimizer for multiobjective optimal VAR dispatch. IEEE Trans Ind Inform 2014;10(2):1012–22.
[35] Yu T, Zhou B, Chan KW, Lu E. Stochastic optimal CPS relaxed control methodology for interconnected power systems using Q-learning method. J Energy Eng ASCE 2011;137(3):116–29.
[36] Vicsek T, Czirok A, Ben-Jacob E. Novel type of phase transition in a system of self-driven particles. Phys Rev Lett 1995;75(6):1226–9.
[37] Olfati-Saber R, Murray RM. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans Automat Contr 2004;49(9):1520–33.
[38] Ren W, Beard R. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans Automat Contr 2005;50(5):655–61.
[39] Wang Z, Wang L, Dounis AI, Yang R. Multi-agent control system with information fusion based comfort model for smart buildings. Appl Energy 2012;99(6):247–54.
[40] Hu J, Hong Y. Leader-following coordination of multi-agent systems with coupling time delays. Phys A: Stat Mech Appl 2007;374(2):853–63.
[41] Leonard NE, Fiorelli E. Virtual leaders, artificial potentials and coordinated control of groups. In: IEEE conference on decision & control; 2001. p. 2968–73.
[42] Consolini L, Morbidi F. Leader-follower formation control of nonholonomic mobile robots with input constraints. Automatica 2008;44(5):1343–9.
[43] Zhang X, Yu T, Yang B. Virtual generation tribe based robust collaborative consensus algorithm for dynamic generation command dispatch optimization of smart grid. Energy 2015;101:34–51.
[44] Jaleeli N, VanSlyck LS. NERC's new control performance standards. IEEE Trans Power Syst 1999;14(3):1092–9.
[45] Jaleeli N, VanSlyck LS. A review of computer tools for modeling electric vehicle energy requirements and their impact on power distribution networks. IEEE Trans Appl Energy 2016;172:337–59.
[46] Sutton RS, Barto AG. Reinforcement learning: an introduction. MIT Press; 1998.
[47] Godsil C, Royle G. Algebraic graph theory. New York: Springer-Verlag; 2001.
[48] Moreau L. Stability of multiagent systems with time-dependent communication links. IEEE Trans Automat Contr 2005;50(2):169–82.
[49] Ren W, Beard RW. Distributed consensus in multi-vehicle cooperative control: theory and applications. London: Springer-Verlag; 2008.
[50] Conradt L, Roper TJ. Consensus decision making in animals. Trends Ecol Evol 2005;20:449–56.
[51] Elgerd OI. Electric energy system theory: an introduction. New Delhi: McGraw-Hill; 1983.
[52] Ray G, Prasad AN, Prasad GD. A new approach to the design of robust load frequency controller for large scale power systems. Electr Power Syst Res 1999;52(1):13–22.