Real-time stochastic optimal scheduling of large-scale electric vehicles: A multidimensional approximate dynamic programming approach


Z.N. Pan a, T. Yu a,⁎, L.P. Chen a, B. Yang b, B. Wang c, W.X. Guo c

a College of Electric Power, South China University of Technology, Guangzhou 510640, China
b Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650500, China
c Electric Dispatch and Control Center, Guangdong Power Grid Co., Ltd., Guangzhou 510600, China

ARTICLE INFO

Keywords: Electric vehicle; Charging optimization; Real-time optimization; Stochastic optimization; Approximate dynamic programming

ABSTRACT

This paper studies the real-time charging optimization (RTCO) of large-scale electric vehicles (EVs), which is a multistage and multidimensional stochastic resource allocation problem. To handle the complex RTCO of large-scale EVs, a multidimensional approximate dynamic programming approach (ADP-RTCO) is designed for sequential optimal decision making. The hierarchy of ADP-RTCO contains two layers. In the upper layer, the RTCO of large-scale EVs is formulated as a multidimensional energy storage problem by classifying EVs into several virtual EV clusters (EVCs). A temporal difference learning based policy iteration method is then used to obtain the value function approximation of each EVC. In the lower layer, a priority based reallocation algorithm is employed to obtain the detailed charging power of the connected EVs. ADP-RTCO is designed with on-line learning ability to enhance its adaptability and robustness to ambiguous environments. Comprehensive simulation results demonstrate the optimality and robustness of ADP-RTCO in both cost-saving and load-flattening problems. Furthermore, ADP-RTCO obtains higher-quality solutions than other algorithms and, owing to its high computational efficiency, is applicable to the RTCO of large-scale EVs under uncertainties.

1. Introduction

The smart grid has attracted enormous attention around the globe; it involves various renewable energy sources, e.g., wind [1], solar [2], and biomass, to handle the energy crisis and to reduce greenhouse gas emissions. An alternative way to resolve this thorny obstacle is the application of electric vehicles (EVs), thanks to their low pollution emission and high energy efficiency [3]. So far, many countries have promoted the domestic deployment of EVs [3,4], especially the United States, China, and Japan [5,6]. In the presence of high EV penetration, uncoordinated charging might degrade power system operation, e.g., through transformer overloading or unsatisfactory power quality. Therefore, the real-time charging optimization (RTCO) of large-scale EVs should be considered [7,8]. In essence, the RTCO of EVs attempts to utilize the flexibilities of distributed resources [9] with large number, small capacity, and diverse characteristics [10]. It is a typical dynamic resource allocation problem with multivariate stochastic factors, because individual EV charging behaviors, the real-time price (RTP), conventional load profiles, and renewable generation outputs [11] are all highly stochastic. The RTCO of EVs has attracted significant attention in recent years.



Note that RTCO is originally a multistage problem, which can be temporally decomposed into several sub-problems and then solved by finite-horizon dynamic programming. However, due to the large number of EVs and their continuous state and decision spaces, it is almost intractable for traditional approaches, e.g., dynamic programming (DP) [12]. As a compromise, model predictive control (MPC) is the most widely used method for obtaining close-to-optimal solutions for RTCO, by assuming that the stochastic factors are known in advance [13–19]. In particular, MPC was used to achieve load-flattening considering the random charging behaviors of EVs in [13–15]. The performance gap between the MPC-based solution and the optimal solution, in terms of the probability distributions of the stochastic variables, was analyzed in [15]. Ref. [16] applied MPC to minimize the power fluctuation of renewable energy. A transactive control based EV charging scheme was proposed in [17], assuming that the future market price and EV behaviors were known a priori. Ref. [18] applied MPC to match uncertain renewable generation with EV charging demand. Generally speaking, MPC based approaches are easily implemented; however, their performance is highly dependent on forecasting accuracy because they require probability distributions [19] or expected values of the stochastic variables [15] over a long prediction horizon. Consequently, MPC is inadequate for obtaining high-quality solutions for RTCO in the absence of accurate and detailed forecasting information.

Corresponding author. E-mail address: [email protected] (T. Yu).

https://doi.org/10.1016/j.ijepes.2019.105542 Received 13 April 2019; Received in revised form 6 July 2019; Accepted 8 September 2019 0142-0615/ © 2019 Elsevier Ltd. All rights reserved.


Nomenclature

A. Acronyms
ACTES  average computation time for each stage
ADP  approximate dynamic programming
CS/LF  cost-saving/load-flattening
DP  dynamic programming
EV/EVA/EVC  electric vehicle/electric vehicle aggregator/electric vehicle cluster
MPC  model predictive control
MDP  Markov decision process
OOS  optimal off-line solution
RTCO  real-time charging optimization
RTP  real-time price
SOC  state of charge
TD  temporal difference
TSADP  two-stage ADP
UL/LL  upper/lower layer
V2G  vehicle-to-grid
VFA  value function approximation

B. Sets and parameters
t  index of scheduling stages
T  set of scheduling stages, T = {1, 2, …, |T|}
b  index of segments of the resource state
B  set of segments of the resource state, B = {1, 2, …, |B|}
Δt  time interval
N_t^EV  set of EVs connected to the EVA at time-slot t
N_t^EVC  set of EVCs of the EVA at time-slot t
N_td,t^EV  set of EVs in EVC td at time-slot t
λ(t)  real-time price at time-slot t
β  penalty factor of charging demand curtailment
P_CON(t)  conventional load profile at time-slot t
η  efficiency of the charging pile
P_EVA^max  maximum allowed charging power of the EVA
t_n^arr, t_n^dep  arrival and departure time of EV n
D_n  charging demand of EV n
q_n^max  maximum allowed charging power of EV n

C. Variables
P_EVA(t)  charging power of the EVA at time-slot t
P_td(t)  charging power of EVC td at time-slot t
q_td,i(t)  charging power of the ith EV in EVC td at time-slot t
P_EVA^cur(t)  charging power curtailment of the EVA at time-slot t
P_td^cur(t)  charging power curtailment of EVC td at time-slot t
q_td,i^cur(t)  charging power curtailment of the ith EV in EVC td at time-slot t
e_td,i(t)  cumulative energy trajectory of the ith EV in EVC td at time-slot t
e_td,i^cur(t)  cumulative charging demand curtailment of the ith EV in EVC td at time-slot t
E_td(t)  cumulative energy trajectory of EVC td at time-slot t
E_td^cur(t)  cumulative charging demand curtailment of EVC td at time-slot t
r_td,i(t)  uncharged capacity of the ith EV in EVC td at time-slot t
R_td(t), R_td^x(t)  pre-decision and post-decision uncharged capacity of EVC td at time-slot t
e_n^max(t)  upper cumulative energy trajectory boundary of EV n at time-slot t
e_n^min(t)  lower cumulative energy trajectory boundary of EV n at time-slot t
P_td^max(t)  maximum allowed charging power of EVC td at time-slot t
E_td^max(t)  upper cumulative energy trajectory boundary of EVC td at time-slot t
E_td^min(t)  lower cumulative energy trajectory boundary of EVC td at time-slot t
ΔE_td^max(t)  change amount of the upper energy trajectory boundary of EVC td at time-slot t
ΔE_td^min(t)  change amount of the lower energy trajectory boundary of EVC td at time-slot t
ΔP_td^max(t)  change amount of the maximum allowed charging power of EVC td at time-slot t
X_t  decision variables of ADP-RTCO at time-slot t
S_t  state variables of ADP-RTCO at time-slot t
W_t  exogenous information of ADP-RTCO at time-slot t
Π_t  feasible region of X_t

To overcome MPC's heavy dependence on forecasting information, this paper proposes, for the first time, an approximate dynamic programming (ADP) [20–23] based RTCO (ADP-RTCO) for large-scale EVs. ADP is capable of solving multistage stochastic problems that may be intractable for DP. Once properly trained, it provides a computationally efficient and close-to-optimal real-time policy without requiring forecasting information. Due to these superiorities, ADP has been successfully implemented in similar dynamic resource allocation problems, e.g., energy storage problems. Refs. [22] and [23] used ADP to minimize the generation cost of a power system with grid-level energy storage. Ref. [24] applied ADP to tackle the real-time stochastic optimization of a microgrid with energy storage. Ref. [25] leveraged ADP for smart home energy management with electrical and thermal storage. To some extent, the RTCO of large-scale EVs is similar to the energy storage problems in [22–25]. The key distinction between the two problems lies in the complexity of approximating the value function, because the number of both state and decision variables is huge and random, owing to the stochastic charging behaviors of large-scale EVs. Ref. [26] proposed a stochastic dynamic programming approach for smart home energy management considering uncertainties of EV mobility and charging requirements; however, whether this approach can be applied to large-scale EV charging management was not verified. Ref. [27] proposed a two-stage ADP (TSADP) for RTCO in a charging station to minimize the charging cost; however, TSADP only utilized the total uncharged demand of the EVA to obtain single-dimensional VFAs, in which the impact of different departure times on the value function was not considered. As a result, the performance gap between TSADP and the optimal off-line solution was large (more than 10%). Besides, the charging constraints of the EVs are directly incorporated in the optimization, which may make the problem intractable when large-scale EVs are connected. Ref. [28] leveraged a DP based three-step hierarchical approach for the transactive energy management of an EV aggregator (EVA); however, its performance gap was more than 40% because the uncertainties of the volumes and time frames of the demands of future arriving EVs were not considered in the DP process.


To tackle the weaknesses of current ADP/DP based RTCO, i.e., poor accuracy, insufficient consideration of stochastic factors, and the incapability of handling large-scale EV optimization, ADP-RTCO consists of a two-layer dispatch and uses multidimensional value function approximations to handle the difficulties brought by large-scale EVs and their stochastic charging behaviors. Compared with previous works, it significantly improves optimality, computational efficiency, and scalability. The main novelties/contributions of this paper are summarized as follows:

• An ADP-based hierarchical approach is introduced, for the first time, to the RTCO of large-scale EVs. The complex multistage problem is decomposed into time-indexed single-stage problems, so a highly close-to-optimal real-time solution can be rapidly obtained by a multidimensional ADP with temporal difference (TD) learning [23], without requiring forecasting information.
• By embedding empirical knowledge in the off-line training phase, ADP-RTCO provides real-time solutions that are robust to multivariate stochastic uncertainties. Besides, ADP-RTCO is designed with on-line learning ability and operates in a learning-while-operating manner, so promising results can still be achieved in the complex scenarios of the on-line operation phase.
• By using the equivalent model of large-scale EVs via virtual EV clusters (EVCs), the computational complexity is significantly reduced. The computation time of ADP-RTCO is short and grows slowly as the EV number increases, so it is applicable to the RTCO of extremely large-scale EVs.

The remainder of this paper is organized as follows: In Section 2, the problem formulation of the RTCO of large-scale EVs is developed. The detailed description of ADP-RTCO is presented in Section 3. In Section 4, simulation results are presented. Finally, conclusions are drawn in Section 5.

2. Problem formulation

2.1. EV flexibility modeling

According to [29], the flexibility of an EV can be represented by its cumulative energy boundaries. As shown in Fig. 1, EV n with charging demand D_n arrives at the EVA at t_n^arr and departs at t_n^dep. Its upper cumulative energy boundary e_n^max(t) (A-B-C) is obtained if it is instantaneously charged at its maximum charging power until its cumulative energy trajectory e_n(t) reaches D_n, while its lower cumulative energy boundary e_n^min(t) (A-D-C) is obtained if charging is delayed as long as possible without charging demand curtailment. The slopes of A-B and D-C equal the maximum charging power. Note that in practice, due to the limited charging capacity, non-optimal control strategies may cause charging demand curtailment (at some time-slots the EVA would have to consume enough energy to satisfy the EVs' charging demand, which may cause transformer overloading), so charging demand curtailment has to be implemented. When e_n(t) goes below its lower boundary, the difference between e_n^min(t) and e_n(t) is defined as the cumulative charging demand curtailment e_n^cur(t). Two possible trajectories of e_n(t) are shown in Fig. 1, in which e1 is a feasible trajectory (e_n^cur(t) = 0 for t ∈ [t_n^arr, t_n^dep]) while e2 causes charging demand curtailment.

Fig. 1. Flexibility modeling of EV.

2.2. Objective function and constraints

The real-time stochastic charging optimization of large-scale EVs is formulated as a Markov decision process (MDP). The general objective function of an MDP can be described by

F = \max \, \mathbb{E}\Big[\sum_{t\in T} C_t\Big]    (1)

where E represents the expectation operator and C_t is the contribution function at time-slot t. Two common problems are considered in this paper, i.e., the cost-saving (CS) problem and the load-flattening (LF) problem [15]. Besides, the penalty for partial fulfillment of the charging demand is considered in the objective function, as

C_t = \begin{cases} -\lambda(t)\, P_{EVA}(t)\,\Delta t - \beta\, P_{EVA}^{cur}(t)\,\Delta t, & \text{(cost saving)} \\ -\big(P_{CON}(t) + P_{EVA}(t)\big)^2 - \beta\, P_{EVA}^{cur}(t), & \text{(load flattening)} \end{cases}    (2), (3)

s.t.

P_{EVA}(t) = \sum_{n\in N_t^{EV}} q_n(t), \qquad P_{EVA}^{cur}(t) = \sum_{n\in N_t^{EV}} q_n^{cur}(t), \qquad \forall t\in T    (4)

0 \le P_{EVA}(t) \le P_{EVA}^{\max}, \qquad \forall t\in T    (5)

e_n(t) = \sum_{k=t_n^{arr}}^{t} \eta\, q_n(k)\,\Delta t, \qquad \forall n\in N_t^{EV},\ t\in[t_n^{arr}, t_n^{dep}]    (6)

e_n^{\min}(t) - e_n^{cur}(t) \le e_n(t) \le e_n^{\max}(t), \qquad \forall n\in N_t^{EV},\ t\in[t_n^{arr}, t_n^{dep}]    (7)

0 \le q_n(t) \le q_n^{\max}, \qquad \forall n\in N_t^{EV},\ t\in[t_n^{arr}, t_n^{dep}]    (8)

e_n^{cur}(t) = \sum_{k=t_n^{arr}}^{t} \eta\, q_n^{cur}(k)\,\Delta t, \qquad \forall n\in N_t^{EV},\ t\in[t_n^{arr}, t_n^{dep}]    (9)

0 \le q_n^{cur}(t) \le q_n^{\max}, \qquad \forall n\in N_t^{EV},\ t\in[t_n^{arr}, t_n^{dep}]    (10)

where the charging power limits of the EVA are given in (5) and the charging constraints of each individual EV considering charging demand curtailment are given in (6)–(10) [29]. Specifically, the constraints on the cumulative energy trajectory are represented in (6)–(7); the charging power limits are given in (8); constraint (9) denotes the relationship between the charging power curtailment and the cumulative charging demand curtailment; and the charging power curtailment limits are formulated in (10). Note that the state of charge (SOC) of the battery is assumed to be accurately estimated and only unidirectional energy flow between the EVA and the EVs is considered in this paper; however, vehicle-to-grid (V2G) can be directly incorporated into the problem if needed. When V2G is applied to utilize the flexibilities of EVs, the battery degradation cost is another important issue that should be considered. The cycle life of a battery can be affected by multiple factors, e.g., depth of discharge, discharge rate, ambient temperature, and charging regime [30]. For more detailed modeling of the battery degradation cost, readers can refer to Refs. [30] and [31]. In this paper, the stochastic factors include the RTP, the conventional load, the EVs' charging demands, their maximum charging power, and their arrival and departure times, which are unknown to the EVA in advance. Therefore, problem (1) is a multistage stochastic resource allocation problem with multivariate stochastic variables, which is difficult to solve directly.
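To make the flexibility model of Section 2.1 concrete, the following is a minimal Python sketch (not from the original paper) that builds the cumulative energy boundaries e_n^max(t) and e_n^min(t) of a single EV on a discrete time grid, under the stated assumptions (charging efficiency η, time step Δt, and demand D_n); the function name and data layout are illustrative.

```python
import numpy as np

def ev_energy_boundaries(t_arr, t_dep, demand_kwh, q_max_kw,
                         n_slots=96, dt_h=0.25, eta=0.98):
    """Upper/lower cumulative-energy boundaries of one EV (Section 2.1).

    e_max: charge at q_max from arrival until the demand is met (A-B-C).
    e_min: delay charging as long as possible while still meeting the
           demand exactly at departure (A-D-C).
    Assumes a feasible instance: demand <= eta * q_max * dt * (t_dep - t_arr).
    """
    e_max = np.zeros(n_slots + 1)
    e_min = np.zeros(n_slots + 1)
    step = eta * q_max_kw * dt_h          # energy added per slot at full power
    for t in range(1, n_slots + 1):
        if t <= t_arr or t > t_dep:
            # outside the connection window the trajectory is frozen
            e_max[t] = e_max[t - 1]
            e_min[t] = e_min[t - 1]
            continue
        # upper boundary: charge immediately, capped by the demand
        e_max[t] = min(e_max[t - 1] + step, demand_kwh)
        # lower boundary: only charge when waiting any longer would make
        # the remaining demand infeasible before departure
        slots_left_after_t = t_dep - t
        e_min[t] = max(0.0, demand_kwh - step * slots_left_after_t)
    return e_max, e_min

if __name__ == "__main__":
    e_max, e_min = ev_energy_boundaries(t_arr=40, t_dep=80,
                                        demand_kwh=18.0, q_max_kw=6.0)
    assert np.all(e_min <= e_max + 1e-9)
    print(e_max[80], e_min[80])   # both reach the demand at departure
```

The band between the two arrays is exactly the feasible region enforced by constraints (6)–(8) for that EV.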


Based on Bellman's equation [12], the original multistage problem can be decomposed into several single-stage sub-problems, which yields

V_t(S_t) = \max_{X_t}\big(C_t(S_t, X_t, W_t) + \mathbb{E}[V_t^x(S_t^x)]\big)    (11)

where S_t and S_t^x are the pre-decision state and post-decision state, respectively; V_t and V_t^x are the pre-decision value function and post-decision value function, respectively; X_t is the vector of decision variables at time-slot t; and W_t is the exogenous information [21]. DP is a basic way to solve (11) using a backward approach. However, DP encounters "the curse of dimensionality" because problem (11) has a multidimensional and continuous state space, which is too large to be fully explored. In contrast, ADP is a powerful method to overcome this issue by using value function approximations (VFAs). Compared with other VFA approaches, e.g., stochastic dual dynamic programming [32], it is more practical because problem linearity and stage-wise independence are not required [33]. ADP has been successfully applied to similar stochastic resource allocation problems, e.g., the stochastic control of energy storage. Compared with the energy storage problem, two issues need further consideration when applying ADP to the RTCO of large-scale EVs. Firstly, RTCO has a much higher dimension resulting from the large EV number; secondly, the dimensions of problem (11) change stochastically due to the random charging behaviors of EVs, so it is extremely hard to create VFAs.

3. ADP-RTCO algorithm

In view of the above-mentioned difficulties of solving (11), an ADP based hierarchical approach, named ADP-RTCO, is proposed to obtain close-to-optimal real-time charging policies for large-scale EVs. As shown in Fig. 2, ADP-RTCO consists of a two-layer dispatch, i.e., (a) the upper layer (UL) for the coordinated dispatch of the EVCs, which form the equivalent model of the large-scale connected EVs, and (b) the lower layer (LL) for charging power reallocation from each EVC to its subordinate EVs. In particular, the UL is solved by a multidimensional ADP with TD learning, while the LL can be rapidly tackled by the priority based reallocation algorithm (PBRA), which will be introduced later.

Fig. 2. Hierarchical framework of ADP-RTCO.

3.1. Equivalent model of large-scale EVs

To overcome "the curse of dimensionality" brought by large-scale EVs, in this paper EVs with the same departure time are grouped into the same EVC. For example, at time-slot t, EVs with departure time td are classified into EVC td, and the cardinality of N_t^EVC is (|T| − t + 1). Aggregating the constraints of the EVs in the same EVC, the models of the EVCs are obtained by

E_{td}^{\min}(t) - E_{td}^{cur}(t) \le E_{td}(t) \le E_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (12)

E_{td}(t) = \sum_{k=1}^{t} P_{td}(k)\,\Delta t, \qquad \forall td\in N_t^{EVC},\ t\le td    (13)

E_{td}^{cur}(t) = \sum_{k=1}^{t} P_{td}^{cur}(k)\,\Delta t, \qquad \forall td\in N_t^{EVC},\ t\le td    (14)

P_{td}(t) = \sum_{i\in N_{td,t}^{EV}} q_{td,i}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (15)

P_{td}^{cur}(t) = \sum_{i\in N_{td,t}^{EV}} q_{td,i}^{cur}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (16)

0 \le P_{td}(t) \le P_{td}^{\max}(t) = \sum_{i\in N_{td,t}^{EV}} q_{td,i}^{\max}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (17)

0 \le P_{td}^{cur}(t) \le P_{td}^{\max}(t) = \sum_{i\in N_{td,t}^{EV}} q_{td,i}^{\max}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (18)

E_{td}^{\max}(t) = \sum_{i\in N_{td,t}^{EV}} e_{td,i}^{\max}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (19)

E_{td}^{\min}(t) = \sum_{i\in N_{td,t}^{EV}} e_{td,i}^{\min}(t), \qquad \forall td\in N_t^{EVC},\ t\le td    (20)
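As an illustration of the EVC aggregation in (15)–(20), the following hedged Python sketch (not part of the original paper) groups connected EVs by departure time and sums their per-EV limits into cluster-level quantities; the EV record fields are assumptions made for illustration only.

```python
from collections import defaultdict

def aggregate_evcs(evs, t):
    """Aggregate per-EV quantities into EVC-level quantities at time-slot t.

    Each EV is a dict with (assumed) fields:
      't_dep'  departure slot, 'q_max' max charging power (kW),
      'e_max'  upper cumulative-energy boundary at t (kWh),
      'e_min'  lower cumulative-energy boundary at t (kWh).
    Returns {td: {'P_max', 'E_max', 'E_min'}} following (17), (19), (20).
    """
    clusters = defaultdict(lambda: {"P_max": 0.0, "E_max": 0.0, "E_min": 0.0})
    for ev in evs:
        td = ev["t_dep"]
        if td < t:                 # already departed, no longer connected
            continue
        c = clusters[td]
        c["P_max"] += ev["q_max"]  # (17): cluster charging-power limit
        c["E_max"] += ev["e_max"]  # (19): cluster upper energy boundary
        c["E_min"] += ev["e_min"]  # (20): cluster lower energy boundary
    return dict(clusters)

if __name__ == "__main__":
    evs = [{"t_dep": 60, "q_max": 6.0, "e_max": 10.0, "e_min": 2.0},
           {"t_dep": 60, "q_max": 8.0, "e_max": 14.0, "e_min": 0.0},
           {"t_dep": 72, "q_max": 10.0, "e_max": 20.0, "e_min": 5.0}]
    print(aggregate_evcs(evs, t=40))
```

The number of clusters is bounded by the number of remaining time-slots, which is what reduces the decision dimension in the upper layer.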


Note that the energy trajectory boundaries and the maximum allowed charging power of each EVC are stochastic due to the random charging behaviors of EVs [25]. The uncharged capacity of an EVC at time-slot t is defined as

R_{td}(t) = E_{td}^{\max}(t) - E_{td}(t) - E_{td}^{cur}(t)    (21)

By employing EVCs, the EVA is able to evaluate the overall state of the large-scale EVs. Furthermore, the EVC model is quite similar to an energy storage model. Specifically, at time-slot t, EVC td can be regarded as a "virtual" energy storage whose capacity is E_td^max(t), maximum charging power is P_td^max(t), and SOC level is E_td(t)/E_td^max(t). Different from energy storage, the capacities and charging power limits of the EVCs change stochastically due to the random charging behaviors of EVs.

Remark 1. The EVC model is equivalent to the large-scale EVs, and there always exists a reallocation policy satisfying constraints (6)–(10) if the charging power of each EVC meets constraints (12)–(20). (A detailed proof can be found in the Appendix.)

3.2. Problem reformulation based on multidimensional ADP in upper layer

According to Remark 1, the RTCO of large-scale EVs can be transferred to the EVCs. The problem of the UL is written as

F_1 = \max \, \mathbb{E}\Big[\sum_{t\in T} C_t\Big]    (22)

s.t.

P_{EVA}(t) = \sum_{td\in N_t^{EVC}} P_{td}(t), \qquad P_{EVA}^{cur}(t) = \sum_{td\in N_t^{EVC}} P_{td}^{cur}(t), \qquad \forall t\in T    (23)

and (12) to (20).

It is obvious that problem (22) has much lower computational complexity than problem (1) owing to the reduction of variables. For example, at time-slot t, the number of decision variables decreases from |N_t^EV| to |N_t^EVC|, regardless of the EV number in the LL. In order to solve (22) efficiently, some important definitions of ADP-RTCO are given below.

Decision variables: The decision variables at time-slot t include the charging power and the charging power curtailment of each EVC, i.e.,

X_t = (P_{td}(t), P_{td}^{cur}(t)), \qquad \forall td\in N_t^{EVC}    (24)

Exogenous information: The exogenous information represents the influences of the random variables. It includes the changes of the RTP, the conventional load, and the EVC parameters, i.e.,

W_t = (\Delta E_{td}^{\max}(t), \Delta E_{td}^{\min}(t), \Delta P_{td}^{\max}(t), \Delta\lambda(t), \Delta P_{CON}(t)), \qquad \forall td\in N_t^{EVC}    (25)

where Δλ(t) and ΔP_CON(t) are the change amounts of the RTP and the conventional load at time-slot t, respectively.

System state: The pre-decision state at time-slot t is defined as the state after W_t has been realized but before decision X_t is made, while the post-decision state at time-slot t is the state after decision X_t has been made but before W_{t+1} arrives, i.e.,

S_t = (R_{td}(t), E_{td}^{\max}(t), E_{td}^{\min}(t), P_{td}^{\max}(t), \lambda(t), P_{CON}(t)), \qquad \forall td\in N_t^{EVC}
S_t^x = (R_{td}^x(t), E_{td}^{\max}(t), E_{td}^{\min}(t), P_{td}^{\max}(t), \lambda(t), P_{CON}(t)), \qquad \forall td\in N_t^{EVC}    (26)

The uncharged capacity of an EVC is defined as the resource state of the system; thus R_td(t) and R_td^x(t) are named the pre-decision resource state and the post-decision resource state, respectively [23].

State transition: The state transition function is a mapping from S_{t−1}^x to S_t^x, which can be divided into two stages, i.e., the pre-decision stage and the post-decision stage [33]. In the pre-decision stage, the EVA makes no decision but updates W_t; thus the state transition function in this stage can be formulated as

\lambda(t) = \lambda(t-1) + \Delta\lambda(t)    (27)
P_{CON}(t) = P_{CON}(t-1) + \Delta P_{CON}(t)    (28)
E_{td}^{\max}(t) = E_{td}^{\max}(t-1) + \Delta E_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}    (29)
E_{td}^{\min}(t) = E_{td}^{\min}(t-1) + \Delta E_{td}^{\min}(t), \qquad \forall td\in N_t^{EVC}    (30)
P_{td}^{\max}(t) = P_{td}^{\max}(t-1) + \Delta P_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}    (31)
R_{td}(t) = R_{td}^x(t-1) + \Delta E_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}    (32)

Based on the updated pre-decision state, the EVA makes its decision in the post-decision stage, where the state transition function can be formulated as

R_{td}^x(t) = R_{td}(t) - \big(P_{td}(t) + P_{td}^{cur}(t)\big)\,\Delta t, \qquad \forall td\in N_t^{EVC}    (33)

Fig. 3. EVC dynamics in pre-decision stage and post-decision stage.

The state transitions of the EVCs in the pre-decision stage and the post-decision stage are shown in Fig. 3, and the detailed explanations of Fig. 3 are as follows:

Pre-decision stage: From t to t* is the pre-decision stage, where the EVA makes no decision but updates W_t. In particular, as shown in Fig. 3, due to newly arrived EVs, E_td^max(t) changes from A to B and E_td^min(t) changes from D to E according to (29) and (30). Since no decision is made in this stage, E_td(t) remains unchanged. As a consequence, the uncharged capacity changes from the post-decision state R_td^x(t − 1) to the pre-decision state R_td(t), i.e., from (A-G) to (B-H), according to (32).

Post-decision stage: The post-decision stage is from t* to t + 1, where the EVA makes decision X_t based on S_t. As shown in Fig. 3, in this stage E_td^max(t) and E_td^min(t) remain unchanged while E_td(t) changes from H to I, which makes the uncharged capacity change from R_td(t) to R_td^x(t) according to (33), i.e., from (B-H) to (C-I).
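The pre-decision and post-decision bookkeeping of (27)–(33) can be summarized by the following small Python sketch (an illustrative example under the stated assumptions, not the paper's implementation); the cluster dictionary fields mirror the state components of (26).

```python
def pre_decision_update(cluster, dE_max, dE_min, dP_max):
    """Pre-decision stage, Eqs. (29)-(32): realize exogenous information.

    `cluster` holds E_max, E_min, P_max and the post-decision resource
    state R_x from the previous slot; newly arrived EVs enlarge the
    boundaries and therefore the uncharged capacity.
    """
    cluster["E_max"] += dE_max              # (29)
    cluster["E_min"] += dE_min              # (30)
    cluster["P_max"] += dP_max              # (31)
    cluster["R"] = cluster["R_x"] + dE_max  # (32): pre-decision resource state
    return cluster

def post_decision_update(cluster, P, P_cur, dt=0.25):
    """Post-decision stage, Eq. (33): apply the charging decision."""
    cluster["R_x"] = cluster["R"] - (P + P_cur) * dt
    return cluster

if __name__ == "__main__":
    evc = {"E_max": 30.0, "E_min": 10.0, "P_max": 20.0, "R": 0.0, "R_x": 12.0}
    evc = pre_decision_update(evc, dE_max=6.0, dE_min=2.0, dP_max=8.0)
    evc = post_decision_update(evc, P=15.0, P_cur=0.0)
    print(evc["R"], evc["R_x"])   # 18.0 and 14.25
```

The same two-step update is applied to every EVC in parallel at each time-slot before and after solving the single-stage decision problem.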


With the definitions of the decision variables, exogenous information, system state, and state transition in the UL, the optimal real-time charging policies of the EVCs could in principle be obtained through (11). However, this is still intractable for DP because problem (22) still has multidimensional and continuous spaces of state and decision variables. Instead, ADP is used to obtain close-to-optimal real-time charging policies, as

X_t(S_t) = \arg\max_{X_t}\big(C_t(S_t, X_t) + \bar V_t^x(S_t^x)\big), \qquad \forall t\in T    (34)

s.t.

P_{EVA}(t) = \sum_{td\in N_t^{EVC}} P_{td}(t), \qquad P_{EVA}^{cur}(t) = \sum_{td\in N_t^{EVC}} P_{td}^{cur}(t)
0 \le P_{EVA}(t) \le P_{EVA}^{\max}
0 \le P_{td}(t) \le P_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}
0 \le P_{td}^{cur}(t) \le P_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}
E_{td}(t) = E_{td}(t-1) + P_{td}(t)\,\Delta t, \qquad \forall td\in N_t^{EVC}
E_{td}^{cur}(t) = E_{td}^{cur}(t-1) + P_{td}^{cur}(t)\,\Delta t, \qquad \forall td\in N_t^{EVC}
E_{td}^{\min}(t) - E_{td}^{cur}(t) \le E_{td}(t) \le E_{td}^{\max}(t), \qquad \forall td\in N_t^{EVC}
R_{td}^x(t) = R_{td}(t) - \big(P_{td}(t) + P_{td}^{cur}(t)\big)\,\Delta t, \qquad \forall td\in N_t^{EVC}    (35)–(43)

where \bar V_t^x is the approximated post-decision value function.

Remark 2. The value function V_t^x is concave in R_td^x(t) for both the CS and LF problems, because CS and LF are both convex problems with linear constraints and R_td^x(t) appears on the right-hand side of the constraints [33].

By exploiting the concavity of the value function, firstly, a concave and piecewise linear function is used to estimate the value function; secondly, exploitation can be employed instead of exploration in the VFA training [22]. Therefore, problem (34) is rewritten as

X_t(S_t) = \arg\max_{X_t}\Big(C_t(S_t, X_t) + \sum_{td\in N_t^{EVC}}\sum_{b\in B} \bar v_t^{td}(b)\, y_t^{td}(b)\Big)    (44)

s.t. (35) to (43), and

\sum_{b\in B} y_t^{td}(b) = R_{td}^x(t), \qquad 0 \le y_t^{td}(b) \le \frac{E_{td}^{\max}(t) - E_{td}^{\min}(t)}{|B|}    (45)

where B denotes the set of piecewise linear function segments, \bar v_t^td(b) is the slope of each segment, and y_t^td(b) is the resource coordinate variable of each segment.

3.3. Value function approximations using temporal difference learning

Several approaches can be used to train the VFAs with empirical data, among which the TD(0) based policy iteration algorithm is the most common one and has been widely used in energy storage problems [22,24]. However, it suffers from slow convergence. Hence, this paper adopts the TD(1) based double pass algorithm thanks to its merits of fast convergence and scalability [23]. The procedures of the double pass algorithm can be summarized as follows:

Step 1: Initialization

Generate a series of samples by using empirical data or the Monte Carlo method, and set the initial slopes.

Step 2: Policy improvement iteration

Update the optimal policy based on the current VFAs in the forward pass; the detailed procedures are given as:

(2.1) Choose a sample and start from t = 1.
(2.2) Observe W_{t+1}, calculate the optimal policy X_t, and obtain R_td^x(t) by solving (44).
(2.3) For each EVC, calculate the left and right observations of the marginal contributions by imposing a negative and a positive perturbation on R_td^x(t − 1), respectively:

\Delta C_t^{td-}(S_t) = \frac{C_t^{td-}(S_t^{td-}, X_t^{td-}) - C_t(S_t, X_t)}{\rho^-}    (46)

\Delta C_t^{td+}(S_t) = \frac{C_t^{td+}(S_t^{td+}, X_t^{td+}) - C_t(S_t, X_t)}{\rho^+}    (47)

where ρ− and ρ+ are the negative and positive perturbations, respectively; ΔC_t^td− and ΔC_t^td+ are the left and right marginal contributions, respectively; and S_t^td−, S_t^td+, X_t^td−, X_t^td+, C_t^td−, and C_t^td+ are the pre-decision states, decision variables, and contributions after a negative or positive perturbation is applied, respectively.

(2.4) Calculate the changes of all EVCs' post-decision states, ΔR^{x,td−}(t) and ΔR^{x,td+}(t), due to the perturbations, which yields

\Delta R^{x,td-}(t) = \frac{R^{x,td-}(t) - R^x(t)}{\rho^-}, \qquad \Delta R^{x,td+}(t) = \frac{R^{x,td+}(t) - R^x(t)}{\rho^+}

where R^{x,td−}(t) and R^{x,td+}(t) are the post-decision states of all EVCs resulting from the perturbations imposed on R_td^x(t − 1).

(2.5) Update the states of the connected EVs by using the PBRA (see Section 3.4).
(2.6) If t ≤ |T|, set t = t + 1 and return to step (2.2).

Step 3: Policy evaluation iteration

Update the VFAs in the backward pass; the detailed procedures are as follows:

(3.1) Start from t = |T|.
(3.2) For each EVC, update the left and right marginal values of R_td^x(t − 1) as

\bar v_{t-1}^{td-}(R_{td}^x(t-1)) = (1-\mu)\,\bar v_{old,t-1}^{td-}(R_{td}^x(t-1)) + \mu\,\hat v_t^{td-}(R_{td}(t))
\bar v_{t-1}^{td+}(R_{td}^x(t-1)) = (1-\mu)\,\bar v_{old,t-1}^{td+}(R_{td}^x(t-1)) + \mu\,\hat v_t^{td+}(R_{td}(t))    (48)

where \bar v_{t-1}^{td-} and \bar v_{t-1}^{td+} are the updated left and right marginal values of EVC td, respectively; \bar v_{old,t-1}^{td-} and \bar v_{old,t-1}^{td+} are the marginal values obtained from the last iteration; μ is the step size, and the harmonic step size rule is used in off-line training [23]; \hat v_t^{td-} and \hat v_t^{td+} are obtained from (49) and (50), respectively:

\hat v_t^{td-}(R_{td}(t)) = \begin{cases} \Delta C_t^{td-}(S_t) + \Delta R^{x,td-}(t)\,\hat v_{t+1}^{td-}(S_{t+1}), & t < |T| \\ \Delta C_{|T|}^{td-}(S_{|T|}), & t = |T| \end{cases}    (49)

\hat v_t^{td+}(R_{td}(t)) = \begin{cases} \Delta C_t^{td+}(S_t) + \Delta R^{x,td+}(t)\,\hat v_{t+1}^{td+}(S_{t+1}), & t < |T| \\ \Delta C_{|T|}^{td+}(S_{|T|}), & t = |T| \end{cases}    (50)

(3.3) Update the VFAs by using the concave adaptive value estimation algorithm [34].
(3.4) Set t = t − 1 and return to step (3.2) if t ≥ 1.
(3.5) Return to Step 2 until all samples are used.
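To illustrate how the piecewise linear VFA of (44)–(45) is stored and how the marginal-value smoothing of Step 3 acts on it, here is a hedged, simplified Python sketch (illustrative only; the concave adaptive value estimation of [34] and the harmonic step size are not reproduced, a fixed step size and a plain monotonicity repair are used instead).

```python
import numpy as np

class PiecewiseLinearVFA:
    """Concave piecewise-linear approximation of one EVC's value function.

    The post-decision resource R^x is split into |B| equal segments of
    width `seg_width`; `slopes[b]` approximates the marginal value of the
    b-th segment, cf. Eqs. (44)-(45).
    """
    def __init__(self, n_segments, seg_width):
        self.slopes = np.zeros(n_segments)
        self.seg_width = seg_width

    def value(self, r_x):
        """Evaluate the VFA at post-decision resource level r_x."""
        filled = np.clip(r_x - np.arange(len(self.slopes)) * self.seg_width,
                         0.0, self.seg_width)      # the y_t^td(b) coordinates
        return float(np.dot(self.slopes, filled))

    def update(self, r_x, v_hat_minus, v_hat_plus, mu=0.1):
        """Smooth the observed marginal values into the local slopes (Eq. (48)),
        then keep the slopes non-increasing in b (a simple stand-in for the
        CAVE projection of [34], done here by lifting earlier segments)."""
        b = min(int(r_x // self.seg_width), len(self.slopes) - 1)
        self.slopes[b] = (1 - mu) * self.slopes[b] + mu * v_hat_minus
        if b + 1 < len(self.slopes):
            self.slopes[b + 1] = (1 - mu) * self.slopes[b + 1] + mu * v_hat_plus
        for k in range(len(self.slopes) - 2, -1, -1):
            self.slopes[k] = max(self.slopes[k], self.slopes[k + 1])

if __name__ == "__main__":
    vfa = PiecewiseLinearVFA(n_segments=80, seg_width=1.0)
    vfa.update(r_x=12.3, v_hat_minus=0.8, v_hat_plus=0.5)
    print(vfa.value(15.0))
```

Because the slopes are kept non-increasing, the approximation stays concave and the single-stage problem (44) remains a small convex program for each EVC.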


3.4. Priority based reallocation algorithm (PBRA) for EV clusters in lower layer

In the LL, the charging power of each EVC obtained in the UL should be reallocated to the connected EVs. The uncharged capacity of the ith EV of EVC td at time-slot t before reallocation is defined as

r_{td,i}(t) = e_{td,i}^{\max}(t) - e_{td,i}(t-1) - e_{td,i}^{cur}(t-1)    (51)

Since EVs in the same EVC have the same departure time, in order to retain the EVC's charging feasibility, the EVC's charging power should be reallocated with higher priority to those EVs that have a higher r_td,i(t). Therefore, the priority based reallocation algorithm (PBRA) is proposed to obtain the charging power q_td,i(t) of each connected EV. Pseudo code of the detailed procedure of PBRA is presented in Algorithm 1.

Algorithm 1: Priority based reallocation algorithm
Input: P_td(t), N_td,t^EV
1:  for i = 1, 2, …, |N_td,t^EV| do
2:      q_td,i(t) = max(0, (e_td,i^min(t) − e_td,i(t−1) − e_td,i^cur(t−1)) / (η Δt))
3:      r_td,i(t) = r_td,i(t) − η q_td,i(t) Δt
4:      P_td(t) = P_td(t) − q_td,i(t)
5:      R_td(t) = R_td(t) − η q_td,i(t) Δt
6:  end for
7:  while P_td(t) ≠ 0 do
8:      for i = 1, 2, …, |N_td,t^EV| do
9:          δq_td,i(t) = min(r_td,i(t) / (η Δt), q_td,i^max − q_td,i(t), P_td(t) r_td,i(t) / R_td(t))
10:         q_td,i(t) = q_td,i(t) + δq_td,i(t)
11:         r_td,i(t) = r_td,i(t) − η δq_td,i(t) Δt
12:         P_td(t) = P_td(t) − δq_td,i(t)
13:         R_td(t) = R_td(t) − η δq_td,i(t) Δt
14:     end for
15: end while
16: for i = 1, 2, …, |N_td,t^EV| do
17:     e_td,i(t) = e_td,i(t−1) + η q_td,i(t) Δt
18:     e_td,i^cur(t) = max(0, e_td,i^min(t) − e_td,i(t))
19: end for
Output: {q_td,i(t) | i ∈ N_td,t^EV}
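For readers who prefer an executable form, below is a hedged Python transcription of Algorithm 1 (a sketch under the stated assumptions, with illustrative data structures; it is not the authors' code). Each EV record carries its boundaries and previous-slot trajectory.

```python
def pbra(P_td, evs, R_td, eta=0.98, dt=0.25, tol=1e-6):
    """Priority based reallocation (Algorithm 1) for one EVC at one time-slot.

    P_td : cluster charging power decided by the upper layer (kW)
    evs  : list of dicts with fields e_prev, e_cur_prev, e_min, e_max, q_max
    R_td : cluster uncharged capacity before reallocation (kWh)
    Returns the per-EV charging powers q_i (kW). Assumes a feasible instance
    (mandatory charging never exceeds q_max, P_td within the cluster limit).
    """
    n = len(evs)
    # uncharged capacity of each EV before reallocation, Eq. (51)
    r = [ev["e_max"] - ev["e_prev"] - ev["e_cur_prev"] for ev in evs]
    q = [0.0] * n

    # lines 1-6: mandatory charging so that e_i(t) does not fall below e_min
    for i, ev in enumerate(evs):
        need = ev["e_min"] - ev["e_prev"] - ev["e_cur_prev"]
        q[i] = max(0.0, need / (eta * dt))
        r[i] -= eta * q[i] * dt
        P_td -= q[i]
        R_td -= eta * q[i] * dt

    # lines 7-15: distribute the remaining cluster power in proportion to r_i
    while P_td > tol and R_td > tol:
        allocated = 0.0
        for i, ev in enumerate(evs):
            dq = max(0.0, min(r[i] / (eta * dt),
                              ev["q_max"] - q[i],
                              P_td * r[i] / R_td))
            q[i] += dq
            r[i] -= eta * dq * dt
            P_td -= dq
            R_td -= eta * dq * dt
            allocated += dq
        if allocated <= tol:
            break   # nothing more can be allocated (safety guard)

    # lines 16-19: update trajectories and curtailment bookkeeping
    for i, ev in enumerate(evs):
        ev["e"] = ev["e_prev"] + eta * q[i] * dt
        ev["e_cur"] = max(0.0, ev["e_min"] - ev["e"])
    return q

if __name__ == "__main__":
    evs = [{"e_prev": 2.0, "e_cur_prev": 0.0, "e_min": 1.0, "e_max": 8.0, "q_max": 6.0},
           {"e_prev": 0.0, "e_cur_prev": 0.0, "e_min": 1.0, "e_max": 10.0, "q_max": 8.0}]
    R_td = sum(ev["e_max"] - ev["e_prev"] - ev["e_cur_prev"] for ev in evs)
    print(pbra(P_td=10.0, evs=evs, R_td=R_td))
```

Since the allocation is proportional to each EV's remaining uncharged capacity, EVs with the largest headroom absorb most of the cluster power, which preserves the feasibility argued in Remark 1.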


3.5. Overall procedure of ADP-RTCO with on-line learning ability

Generally speaking, the overall procedure of ADP-RTCO includes two phases, i.e., the off-line training phase and the on-line operation phase. In the former, pre-selected training samples are used to obtain the VFAs according to Section 3.3. The finely trained VFAs are then used in the on-line operation phase, which allows close-to-optimal real-time charging policies of the connected EVs to be given stage by stage under uncertainties. To improve the on-line performance and adaptability of ADP-RTCO, it should be capable of adjusting its policies on-line and learning in practice, considering the following two issues: firstly, the training samples may be mismatched with reality, and the realized exogenous information can be quite different from that of the training samples; secondly, the EVA may be uninformed of the distributions of the stochastic variables or short of empirical data. Thus ADP-RTCO is designed with on-line learning ability and operates in a learning-while-operating manner during the on-line operation phase. The on-line operation phase contains two procedures that operate at the same time, i.e., on-line operation and on-line learning. Based on the pre-trained VFAs and the newly observed exogenous information, the EVA obtains real-time charging policies for each connected EV; in the meantime, it also observes the marginal contributions of each EVC based on (46) and (47). When all scheduling stages are completed, the EVA can calculate the optimal off-line solution based on the realized exogenous information and evaluate the performance gap between ADP-RTCO and the optimal off-line solution according to (52):

G = \left|\frac{F - F^{*}}{F^{*}}\right|    (52)

where G is the performance gap, and F and F* are the objective function values obtained by ADP-RTCO and the off-line solution, respectively. If the performance gap is larger than a preset threshold ε, the EVA will update the VFAs by executing Step (3.1) to Step (3.4) in Section 3.3. The detailed procedures of ADP-RTCO with on-line learning ability in the on-line operation phase are given in Fig. 4.

Fig. 4. Procedures of ADP-RTCO with on-line learning ability in on-line operation phase.

In both the off-line training and on-line learning phases, \bar V_t^x is updated after the entire optimization horizon. However, it makes no difference to the EVA whether \bar V_t^x is updated at time-slot t + 1 or after the entire optimization horizon, because the updated \bar V_t^x can only be used at time-slot t of the next optimization horizon at the earliest. If \bar V_t^x should be updated at time-slot t + 1, the policy iteration algorithm without TD learning proposed in [22] can be used. Moreover, it is a challenging issue if \bar V_t^x should be updated at time-slot t, i.e., before the exogenous information of t + 1 arrives, which is left for future investigation.

4. Case studies

4.1. Comparison algorithms and evaluation indices

Three algorithms are used for comparison, as follows:

Optimal off-line solution (OOS): The OOS can be obtained by directly solving (1), which is a linear problem (CS problem) or a quadratic problem (LF problem), if the stochastic variables are assumed to be known in advance.

MPC-τ: A multi-scenario based MPC is adopted to obtain the upper layer solutions according to (53):

X_t(S_t) = \arg\max_{X_t, X_{t+1}, \ldots, X_{t+\tau}} \Big( C_t(S_t, X_t) + \sum_{\omega\in\Omega} p_\omega \sum_{k=t+1}^{\min(|T|,\, t+\tau)} C_k(S_k^{\omega}, X_k^{\omega}) \Big)    (53)

where τ is the prediction horizon (e.g., MPC-0 is the myopic strategy, MPC-4 is the MPC with a prediction horizon of 4, and MPC-T is the receding horizon control); Ω is the set of scenarios; ω is the scenario subscript; and p_ω is the probability of the ωth scenario. At each time-slot t, MPC-τ obtains a series of decisions {X_t, X_{t+1}, …, X_{t+τ}} based on the predicted system states, but only X_t is implemented.

TSADP: The two-stage ADP with a prediction horizon of 2 time-slots is used. Detailed descriptions of TSADP can be found in [27].

The optimality of the solutions can be quantified by comparing the objective function value F obtained by a given algorithm with the optimal value F* obtained by the OOS, as

\xi = F / F^{*}    (54)

A ξ value closer to 1 means a better solution.

4.2. Parameter settings

The optimization horizon starts from 8 am and ends at 8 am of the next day, and is divided into |T| = 96 time-slots, i.e., time-slot 1 starts at 8 am and time-slot 96 ends at 8 am of the next day. The expectations of the RTP and the conventional load profiles are shown in Fig. 5. The forecasting errors of the RTP and the conventional load are assumed to follow the normal distributions ε_t^RTP ~ N(0, 0.08²) and ε_t^CON ~ N(0, 0.03²), respectively. The charging scenarios involve 4000 EVs unless otherwise specified. Three types of EV are considered, with battery capacities of 24, 36, and 48 kWh and corresponding maximum charging powers of 6, 8, and 10 kW. The initial SOC is assumed to follow the normal distribution N(0.4, 0.1²). The efficiency η is set to 98%, while the maximum charging power of the EVA is set to 10 MW. To avoid unnecessary charging demand curtailment, β is set to 0.25 $/kWh and 50 MW in the CS and LF problems, respectively. Finally, the segment number of the piecewise linear function is set to 80 and the initial slopes of all segments are set to 0.

Fig. 5. The expectation of RTP and conventional load profiles.

To evaluate the performance of the different algorithms under different EV charging behaviors, two EV charging behavior scenarios are considered, i.e., EV-S1 and EV-S2. The EV charging behaviors are assumed to follow the arrival and departure probabilities in Fig. 6. It is obvious that the arrival and departure times in EV-S1 are more concentrated than in EV-S2, which indicates that the EV charging behaviors in EV-S2 are much more stochastic than those in EV-S1.

Fig. 6. The EV arrival and departure probability distributions in EV-S1 and EV-S2.

Remark 3. If the EVA attempts to save charging cost and the EV charging behaviors follow EV-S1, this case is abbreviated as CS-EV-S1. Similarly, if the EVA attempts to minimize load fluctuations and the EV charging behaviors follow EV-S2, this case is abbreviated as LF-EV-S2.

All simulations are carried out in MATLAB R2016a on a PC with an Intel(R) Core(TM) i7-6700 3.40 GHz CPU, while each sub-problem is solved by the CPLEX solver.
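The stochastic test samples described in Section 4.2 can be generated along the following lines; this is a hedged Python sketch using the stated distributions (initial SOC ~ N(0.4, 0.1²), three battery types, relative forecasting errors of 8% and 3% assumed here), with the arrival/departure sampling left as a placeholder since the exact EV-S1/EV-S2 distributions are only given graphically in Fig. 6.

```python
import numpy as np

rng = np.random.default_rng(0)

# battery capacity (kWh) -> maximum charging power (kW), as in Section 4.2
EV_TYPES = [(24.0, 6.0), (36.0, 8.0), (48.0, 10.0)]

def sample_ev_fleet(n_ev=4000, n_slots=96):
    """Draw one stochastic charging scenario (illustrative placeholder)."""
    fleet = []
    for _ in range(n_ev):
        cap, q_max = EV_TYPES[rng.integers(len(EV_TYPES))]
        soc0 = float(np.clip(rng.normal(0.4, 0.1), 0.0, 1.0))
        # placeholder arrival/departure sampling; the paper draws these from
        # the EV-S1 / EV-S2 probability distributions of Fig. 6
        t_arr = int(rng.integers(1, n_slots - 8))
        t_dep = int(rng.integers(t_arr + 8, n_slots + 1))
        fleet.append({"capacity": cap, "q_max": q_max,
                      "demand": (1.0 - soc0) * cap,
                      "t_arr": t_arr, "t_dep": t_dep})
    return fleet

def sample_price_and_load(rtp_mean, load_mean):
    """Perturb the expected RTP and conventional load with the stated
    forecasting errors (sigma = 0.08 and 0.03, assumed relative here)."""
    rtp = rtp_mean * (1.0 + rng.normal(0.0, 0.08, size=rtp_mean.shape))
    load = load_mean * (1.0 + rng.normal(0.0, 0.03, size=load_mean.shape))
    return rtp, load

if __name__ == "__main__":
    fleet = sample_ev_fleet(n_ev=10)
    rtp, load = sample_price_and_load(np.full(96, 0.1), np.full(96, 30.0))
    print(len(fleet), rtp.shape, load.shape)
```

Each generated sample plays the role of one Monte Carlo training or test episode in the following subsections.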


4.3. Performance of ADP-RTCO in off-line training phase

In this section, both deterministic and stochastic cases of the CS and LF scenarios are used to evaluate the performance of ADP-RTCO in the off-line training phase. The objective function values (denoted by F) obtained by the OOS and ADP-RTCO in the different cases are shown in Figs. 7 and 8, respectively.

(1) Performance of ADP-RTCO in deterministic cases: Firstly, 25 identical samples of CS-EV-S1 and LF-EV-S1 are used to verify the optimality of ADP-RTCO. As shown in Fig. 7, ADP-RTCO with TD learning shows excellent convergence. ξ at the end of the training is greater than 0.99999 in both cases, which means that ADP-RTCO achieves more than 99.999% of the optimality of the original problems.

Fig. 7. The performance of ADP-RTCO in deterministic cases. ((a) CS-EV-S1. (b) LF-EV-S1.)

(2) Performance of ADP-RTCO in stochastic cases: Then, 100 stochastic samples generated by the Monte Carlo method are applied to train ADP-RTCO. Due to the different realizations of the exogenous information, the OOS of each sample is different. However, after sufficient training (around 30 samples), high-quality solutions can be provided by ADP-RTCO under uncertainties (ξ in the remaining samples is always greater than 0.99).

(3) Comparison with ADP without TD learning: The performances of ADP-RTCO and ADP without TD learning [22] in CS-EV-S1 are shown in Fig. 9. It can be found that ADP-RTCO with TD learning converges much faster than the version without TD learning. By employing TD learning, ADP-RTCO is capable of reaching 98% of the optimality of the original problems in about 20 samples, while it takes more than 40 samples if TD learning is not used. Besides, more close-to-optimal policies are obtained by ADP-RTCO with TD learning at the end of the training.

4.4. Performance of ADP-RTCO in on-line operation phase

To evaluate the on-line learning ability of ADP-RTCO, a more complex on-line case is applied instead of using samples drawn from the same distributions as the training samples:

On-line case: The on-line case contains 50 samples, of which the first 20 samples are drawn from EV-S1 while the remaining samples are drawn from EV-S2. ADP-RTCO is well trained with training samples from EV-S1.

The objective function values obtained by ADP-RTCO and the OOS in this on-line case are shown in Fig. 10. It can be found from Fig. 10 that ADP-RTCO with on-line learning ability shows great robustness to complex scenarios. In the first 20 samples, ADP-RTCO is guaranteed to provide promising results since the exogenous information realized in real time and that of the training samples follow the same distributions; ξ in the first 20 samples is always greater than 0.999. In samples 21–25, the performance gap between the OOS and ADP-RTCO becomes obviously larger due to the sudden change in the charging behaviors of the EVs. However, ADP-RTCO is capable of tracking the OOS thanks to its on-line learning ability. It can be found that the gap gradually becomes smaller as ADP-RTCO converges to a new policy.

4.5. Comparison results with TSADP [27]

Here, 100 samples of different scenarios are employed to evaluate the on-line performance of ADP-RTCO and TSADP in terms of optimality and computational efficiency.

(1) Optimality: Fig. 11 shows the boxplots of ξ obtained by ADP-RTCO and TSADP in different scenarios. It can be found from Fig. 11 that ADP-RTCO obtains much higher quality solutions than TSADP in all scenarios. In contrast, the performance of TSADP is highly affected by the EV charging behaviors: its performance significantly degrades in the EV-S2 scenarios, where the EV charging behaviors are much more random than in EV-S1.


Fig. 8. The performance of ADP-RTCO in stochastic cases. ((a) CS-EV-S1. (b) LF-EV-S1.)

The reason for this phenomenon is that TSADP only utilizes the total uncharged demand of the EVA to obtain single-dimensional VFAs, in which the different charging deadlines of the EVs are not incorporated. If the charging deadlines of the EVs are not considered as a necessary part of the system state, the EVA will mistakenly regard post-decision states whose value functions are different as the same post-decision state. As a result, single-dimensional VFAs are inadequate to accurately reflect the actual post-decision value function, especially when the EVs' arrival and departure times vary greatly. In contrast, ADP-RTCO uses multidimensional VFAs that fully consider the different departure times of the EVs, so the actual post-decision value function can be well approximated. For example, assume an EVA has two EVs (EV 1 and EV 2) with different departure times (t_1^dep and t_2^dep) and uncharged demands (r_1^x(t) = 3 kWh and r_2^x(t) = 5 kWh in case 1, while r_1^x(t) = 5 kWh and r_2^x(t) = 3 kWh in case 2); it is obvious that the post-decision value functions of these two cases are different. ADP-RTCO can easily distinguish these two states because EV 1 and EV 2 are grouped into different EVCs, while TSADP fails because the total uncharged demand of the two cases is the same.

(2) Computation efficiency and scalability: The average computation time for each stage (ACTES) of ADP-RTCO and TSADP under different EV numbers is tabulated in Table 1. It shows that the ACTES of TSADP grows rapidly as the EV number increases due to the rapid growth of the variable number; this is because TSADP directly incorporates the variables and constraints of each single EV into the optimization. Indeed, the problem becomes intractable for TSADP as the solver runs out of memory when the EV number reaches one million. In contrast, by employing a hierarchical approach, the ACTES of ADP-RTCO in the UL remains almost unchanged (less than 0.2 s) because the number of variables in the UL is small and stays the same regardless of the EV number, while the ACTES of its LL grows linearly and slowly as the EV number increases. It can be found that ADP-RTCO is capable of obtaining the optimal charging policy in less than 12.5 s when the EV number is one million; therefore, it can operate efficiently even with an extremely large EV number.

Fig. 9. The performances of ADP-RTCO with/without TD learning in CS-EV-S1.

4.6. Comparison results with MPC-based approaches

In this section, 100 samples generated from LF-EV-S1 are used to compare the on-line performance of ADP-RTCO and MPC based approaches (constraint (5) is not considered). First, the average charging power of the EVA obtained by the different algorithms is shown in Fig. 12. It is clear that the performance of MPC depends on its prediction horizon, so poor results are obtained in the absence of complete forecasting information. In comparison, the smallest gap from the OOS is achieved by ADP-RTCO. Then, to compare the computational efficiency and scalability, the ACTES of ADP-RTCO and MPC-T in problems with different numbers of scheduling stages is shown in Fig. 13. As the number of scheduling stages of the original problem (1) increases, the ACTES of both approaches increases. However, ADP-RTCO has a much smaller slope than MPC-T, and its ACTES is only 23% of that of MPC-T when the number of scheduling stages reaches 672 (a week).

In summary, ADP-RTCO and MPC have their own advantages and drawbacks. MPC is easily implemented in practice since forecast information can be directly incorporated into the optimization, but its performance is highly dependent on the accuracy of the forecasting information.


In contrast, ADP-RTCO can offer highly close-to-optimal solutions without forecasting information because empirical knowledge about the impact of the current decision on future rewards is embedded in the value function. Such a feature is quite meaningful in on-line operation, as complete and accurate forecasting information is usually hard to collect in real time. Although the optimality of ADP-RTCO can be strongly affected by the accuracy of the VFAs, this can be overcome by its on-line learning ability, which also improves its robustness to complex or ambiguous environments. Furthermore, owing to the prominent computational efficiency and scalability of ADP-RTCO, it can easily be applied to problems with extremely large EV numbers and much longer scheduling horizons (weekly or monthly), which may be intractable for MPC.

Fig. 10. The performance of ADP-RTCO in on-line operation phase.

Table 1
ACTES comparisons for different EV numbers.

EV number     CS-EV-S1 (UL / LL / TSADP)        LF-EV-S1 (UL / LL / TSADP)
4000          0.18 s / 0.06 s / 1.18 s          0.18 s / 0.07 s / 1.22 s
10,000        0.18 s / 0.14 s / 2.28 s          0.19 s / 0.16 s / 2.67 s
40,000        0.18 s / 0.65 s / 5.46 s          0.19 s / 0.68 s / 6.49 s
100,000       0.18 s / 1.18 s / 13.89 s         0.19 s / 1.20 s / 14.62 s
1,000,000     0.18 s / 11.9 s / Intractable     0.19 s / 12.1 s / Intractable

Fig. 11. The boxplots of ξ of different algorithms in on-line operation phase.

4.7. Discussion of the practicality of ADP-RTCO

In practical applications, ADP-RTCO is a practical and powerful tool for an EVA operator to obtain fast, high-quality on-line solutions under uncertainties. Against the background of increasing EV penetration, ADP-RTCO has a meaningful application prospect since the computational difficulties brought by large-scale EVs can be easily handled. By such means, the full utilization of EVs' flexibilities and the coordinated operation of the power system and EVs become possible. The future smart grid will incorporate more and more flexible resources with large number, small capacity, and diverse characteristics, and how to efficiently manage those resources becomes a challenging problem. Although this paper uses EVs as an illustration, ADP-RTCO is also compatible with other distributed resources, e.g., smart home devices, flexible thermal loads, and energy storage. Thus ADP-RTCO can also be extended to other topical problems, e.g., smart home energy management, multi-energy system scheduling, and microgrid operation.

5. Conclusion

This paper proposes a multidimensional ADP based hierarchical approach to solve the real-time stochastic optimal scheduling of large-scale EVs. ADP-RTCO reformulates the complex RTCO as a multidimensional energy storage problem by equating large-scale EVs to several EVCs. TD learning based ADP is employed to obtain highly close-to-optimal real-time charging policies for large-scale EVs. ADP-RTCO is designed with on-line learning ability to enhance its on-line performance. The performance of ADP-RTCO in different scenarios and comparison results with different algorithms are investigated. Based on the simulations, the following conclusions can be drawn:

(1) By employing TD(1) learning, ADP-RTCO has a remarkably high convergence rate in the off-line training phase. Simulations show that it achieves more than 99.9% of the optimality of the off-line optimal solution in both the cost-saving and load-flattening problems.
(2) ADP-RTCO provides great robustness to uncertainties in the on-line operation phase. Promising results can still be achieved in complex scenarios since ADP-RTCO is designed with on-line learning ability. Simulations demonstrate that ADP-RTCO can adjust its policy in response to a dynamic environment.


Fig. 12. Charging power of EVA obtained by different algorithms.

Fig. 13. ACTES of UL comparison of ADP-RTCO and MPC-T.

(3) ADP-RTCO is more effective than MPC and other ADP based approaches in terms of optimality, computational efficiency, and scalability. Simulations illustrate that ADP-RTCO can easily be applied to problems with extremely large numbers of EVs (even millions) and long scheduling horizons.

For future studies, it would be interesting to incorporate more types of distributed resources into the ADP-RTCO framework. Besides, a multi-agent learning based decentralized real-time stochastic optimal scheduling scheme for multiple EVAs considering network constraints will be developed.

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China (51777078), the Fundamental Research Funds for the Central Universities (D2172920), the Key Projects of Basic Research and Applied Basic Research in Universities of Guangdong Province (2018KZDXM001), and the Science and Technology Projects of China Southern Power Grid (GDKJXM20172831).

Appendix

Proof of Remark 1. Consider the case in which the optimized control strategies do not cause charging demand curtailment, i.e., E_td^cur(t) = 0. Ref. [13] has proved that if P_td(t) satisfies the following inequality (A1) for t = 1, 2, …, |T|,

\sum_{k=1}^{t}\ \sum_{n\in\{n\,|\,t_n^{dep}=k\}} D_n \;\le\; \sum_{k=1}^{t} P_{td}(k) \;\le\; \sum_{k=1}^{t}\ \sum_{n\in\{n\,|\,t_n^{arr}=k\}} D_n    (A1)

where η and Δt are set to 1, then there always exists a set of q_n(t) that is feasible to (6)–(10). Since the EVs in the same EVC have the same departure time, i.e., td, the following relations can be obtained according to (6)–(10) and (12)–(20):

\sum_{k=1}^{t}\ \sum_{n\in\{n\,|\,t_n^{dep}=k\}} D_n = \begin{cases} 0 \le E_{td}(t), & t < td \\ E_{td}^{\min}(td) = E_{td}^{\max}(td) = E_{td}(td), & t = td \end{cases}

E_{td}(t) \le E_{td}^{\max}(t) \le \sum_{k=1}^{t}\ \sum_{n\in\{n\,|\,t_n^{arr}=k\}} e_n^{\max}(t_n^{dep}) = \sum_{k=1}^{t}\ \sum_{n\in\{n\,|\,t_n^{arr}=k\}} D_n, \qquad t \le td,\ t\in T    (A2)

According to (A2), it is obvious that P_td(t) satisfies (A1); therefore, there always exists a reallocation strategy that satisfies (6)–(10) as long as the EVC strategy satisfies (12)–(20). This completes the proof.

References

[1] Yang B, Yu T, Shu HC, Dong J, Jiang L. Robust sliding-mode control of wind energy conversion systems for optimal power extraction via nonlinear perturbation observers. Appl Energy 2018;210:711–23.
[2] Yang B, Yu T, Shu HC, Zhang YM, Chen J, Sang YY, et al. Passivity-based sliding-mode control design for optimal power extraction of a PMSG based variable speed wind turbine. Renew Energy 2018;119:577–89.
[3] Hadley SW, Tsvetkova AA. Potential impacts of plug-in hybrid electric vehicles on regional power generation. Electr J 2009;22(10):56–68.
[4] IEA Publications. Global EV Outlook 2017: Two Million and Counting. International Energy Agency, Tech. Rep.; 2017. [Online] Available: <https://www.iea.org/publications/freepublications/publication/GlobalEVOutlook2017.pdf>.
[5] Hu X, Wang H, Tang X. Cyber-physical control for energy-saving vehicle following with connectivity. IEEE Trans Ind Electron 2017;64(11):8578–87.
[6] Martinez XM, Hu X, Cao D, Velenis E, Cao B, Wellers M. Energy management in plug-in hybrid electric vehicles: recent progress and a connected vehicles perspective. IEEE Trans Veh Technol 2017;66(6):4534–49.
[7] Denholm P, Short W. An evaluation of utility system impacts and benefits of optimally dispatched plug-in hybrid electric vehicles. National Renewable Energy Laboratory, Golden, CO, USA, Tech. Rep. NREL/TP-620-40293; Jul. 2006.
[8] Wang L, Chen B. Distributed control for large-scale plug-in electric vehicle charging with a consensus algorithm. Int J Elect Power Energy Syst 2019;109:369–83.
[9] Yang B, Zhang XS, Yu T, Shu HC, Fang ZH. Grouped grey wolf optimizer for maximum power point tracking of doubly-fed induction generator based wind turbine. Energ Convers Manage 2017;133:427–43.
[10] Loukarakis E, Dent CJ, Bialek JW. Decentralized multi-period economic dispatch for real-time flexible demand management. IEEE Trans Power Syst 2016;31(1):672–84.
[11] Yang B, Jiang L, Wang L, Yao W, Wu QH. Nonlinear maximum power point tracking control and modal analysis of DFIG based wind turbine. Int J Elect Power Energy Syst 2016;74:429–36.
[12] Bellman RE. Dynamic programming. Princeton, NJ, USA: Princeton Univ. Press; 1957.
[13] Li Z, Guo Q, Sun H, Xin S, Wang J. A new real-time smart-charging method considering expected electric vehicle fleet connections. IEEE Trans Power Syst 2014;29(6):3114–5.
[14] Jian L, Zheng Y, Xiao X, Chan CC. Optimal scheduling for vehicle-to-grid operation with stochastic connection of plug-in electric vehicles to smart grid. Appl Energy 2015;146:150–61.
[15] Tang W, Angela Zhang YJ. A model predictive control approach for low-complexity electric vehicle charging scheduling: optimality and scalability. IEEE Trans Power Syst 2017;32(2):1050–63.
[16] Yang X, Zhang Y, He H, Ren S, Weng G. Real-time demand side management for a microgrid considering uncertainties. IEEE Trans Smart Grid 2018. https://doi.org/10.1109/TSG.2018.2825388. [in press].
[17] Liu Z, Wu Q, Shahidehpour M, Li C. Transactive real-time electric vehicle charging management for commercial buildings with PV on-site generation. IEEE Trans Smart Grid 2018. https://doi.org/10.1109/TSG.2018.2871171. [in press].
[18] Kou P, Feng Y, Liang D, Gao L. A model predictive control approach for matching uncertain wind generation with PEV charging demand in a microgrid. Int J Elect Power Energy Syst 2019;105:488–99.
[19] Zhang T, Chen W, Han Z, Cao Z. Charging scheduling of electric vehicles with local renewable energy under uncertain electric vehicle arrival and grid power price. IEEE Trans Veh Technol 2014;63(6):2600–12.
[20] Powell WB, Meisel S. Tutorial on stochastic optimization in energy—part I: Modeling and policies. IEEE Trans Power Syst 2016;31(2):1459–67.
[21] Powell WB, Meisel S. Tutorial on stochastic optimization in energy—part II: An energy storage illustration. IEEE Trans Power Syst 2016;31(2):1468–75.
[22] Nascimento J, Powell WB. An optimal approximate dynamic programming algorithm for concave, scalar storage problems with vector-valued controls. IEEE Trans Autom Control 2013;58(12):2995–3010.
[23] Salas DF, Powell WB. Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems. Department of Operations Research and Financial Engineering, Princeton Univ., Princeton, NJ, USA, Tech. Rep. 2004; 2015.
[24] Shuai H, Fang J, Ai X, Tang Y, Wen J, He H. Stochastic optimization of economic dispatch for microgrid based on approximate dynamic programming. IEEE Trans Smart Grid 2018. https://doi.org/10.1109/TSG.2018.2798039. [in press].
[25] Keerthisinghe C, Verbič G, Chapman AC. A fast technique for smart home management: ADP with temporal difference learning. IEEE Trans Smart Grid 2018;9(4):3291–303.
[26] Wu X, Hu X, Moura S, Yin X, Pickert V. Stochastic control of smart home energy management with plug-in electric vehicle battery energy storage and photovoltaic array. J Power Sources 2016;333:203–12.
[27] Zhang L, Li Y. Optimal management for parking-lot electric vehicle charging by two-stage approximate dynamic programming. IEEE Trans Smart Grid 2017;8(4):1722–30.
[28] Vandael S, Claessens B, Hommelberg M, Holvoet T, Deconinck G. A scalable three-step approach for demand side management of plug-in hybrid vehicles. IEEE Trans Smart Grid 2013;4(2):720–8.
[29] Xu Z, Hu Z, Song Y, Wang J. Risk-averse optimal bidding strategy for demand-side resource aggregators in day-ahead electricity markets under uncertainty. IEEE Trans Smart Grid 2017;8(1):96–105.
[30] Zhou C, Qian K, Allen M, Zhou W. Modeling of the cost of EV battery wear due to V2G application in power system. IEEE Trans Energy Conver 2011;26(4):1041–50.
[31] Petit M, Prada E, Sauvant-Moynot V. Development of an empirical aging model for Li-ion batteries and application to assess the impact of vehicle-to-grid strategies on battery lifetime. Appl Energy 2016;172:398–407.
[32] Shapiro A, Dentcheva D, Ruszczynski A. Lectures on stochastic programming: modeling and theory. Philadelphia, PA, USA: SIAM; 2009.
[33] Asamov T, Salas DF, Powell WB. SDDP vs. ADP: the effect of dimensionality in multistage stochastic optimization for grid level energy storage. [Online]. Available: <https://arxiv.org/abs/1605.01521>.
[34] Godfrey GA, Powell WB. An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution. Manag Sci 2011;47(8):1101–12.
