Real-Time Markov Chain Driver Model for Tracked Vehicles

4th IFAC Workshop on Engine and Powertrain Control, Simulation and Modeling
August 23-26, 2015. Columbus, OH, USA

IFAC-PapersOnLine 48-15 (2015) 361–367

Dexing Liu, Yuan Zou, Teng Liu

School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
(e-mail: [email protected], [email protected], [email protected])

Abstract: The design of an energy management strategy for a hybrid electric vehicle typically requires an estimate of the power requested by the driver. If the driving cycle is not known a priori, a stochastic method such as a Markov chain driver model (MCDM) must be employed. For tracked vehicles, steering power, which is related to the vehicle angular velocity, is a significant component of the driver demand. In this paper, a three-dimensional MCDM incorporating angular velocity is proposed for a tracked vehicle. Based on the nearest-neighborhood method (NNM), an online transition probability matrix (TPM) updating algorithm is implemented for the three-dimensional MCDM. Simulation results show that the TPM is able to update online as driving cycle data become available. Moreover, older and recent observations can be weighted appropriately by adjusting a forgetting factor.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Markov chain driver model (MCDM), tracked vehicle, nearest-neighborhood method (NNM), transition probability matrix (TPM), online updating algorithm, energy management.

1. INTRODUCTION

With hybrid vehicles being introduced to the market in recent years and having the potential to reduce fuel consumption and emissions, an energy management strategy is needed to coordinate the operation of the multiple energy sources on board (Choi et al. [2014], Emadi et al. [2005]). Even though it is impossible to know the future driving conditions (speed, road slope, etc.) exactly, the global optimal approach solved by deterministic dynamic programming (DDP) is widely used as a benchmark for other strategies (Salmasi [2007], Malikopoulos [2014]) and to improve performance by appropriate rule extraction (Lin et al. [2003]). As an alternative, stochastic control avoids perfect assumptions about future driving by modelling the driver's behaviour as a stochastic input to the energy management controller. Owing to its simple mathematical expression and easy incorporation into optimal control, the Markov chain (Grimmet and Stirzaker [2004]) based driver model has been successfully utilized to lay the foundation for the stochastic control approaches, as opposed to the deterministic ones.

A Markov process (or Markov chain) is a system that can be in one of several states and can move from one state to another, including itself, at each time step according to a transition probability matrix (TPM). The Markov property states that the future states are independent of the past states given the present state (Grimmet and Stirzaker [2004]). Verification of the Markov property can be found in previous papers (Shi et al. [2013]). Much research has focused on Markov chain based stochastic energy management strategies. Stochastic Model Predictive Control (SMPC) promises to outperform other real-time energy management strategies under the assumption that the power requested by the driver is represented by a Markov model (Cairano et al. [2014]). Because the drivers' power request is described by a probability distribution, the cost function can be minimized in an expected form. By also considering the vehicle velocity, a two-dimensional Markov chain describes the drivers' behaviour more accurately. An infinite-horizon optimization problem with discounted future costs is solved using Stochastic Dynamic Programming (SDP) (Liu and Peng [2008]). A weakness of the SDP approach is that the optimization criterion discounts the future costs and assigns a penalty to the battery State of Charge (SOC) at every instant (Tate et al. [2008]). To solve this problem, a terminal state representing when the vehicle turns off is added to the two-dimensional Markov chain mentioned above. The terminal state is absorbing: every state transitions into it in finite time, and once in the terminal state, no costs are incurred and there is zero probability of transitioning out of it. The existence of this terminal state forms a Shortest-Path Stochastic Dynamic Programming (SP-SDP) problem, guaranteeing a finite expected objective cost with no discount and penalizing only the SOC deviation from a set point when the vehicle is turned off (Tate et al. [2008]). A similar Markov chain containing the terminal state was proposed in recent years (Opila et al. [2012]), where the vehicle velocity and the acceleration constitute the state space. The above Markov chain models are all stationary, which means the model is invariant in time and position; hence, a position-dependent Markov chain with vehicle velocity and acceleration as the states was established to assess the potential of SDP's predictive control ability, in contrast to a homogeneous Markov chain with vehicle velocity and power request as the states (Johannesson et al. [2007]).

2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.10.052


Even though much attention has been paid to the application of Markov chain models to stochastic energy management strategies for HEVs, research on Markov chain driver models (MCDMs) for tracked vehicles is scarce. Significantly differing from wheeled vehicles, the power consumption of a tracked vehicle during steering is much higher than that during straight-line driving (Wang et al. [1983]). A stochastic driver model incorporating both the heading and the steering motion is therefore in high demand. Previous research on stochastic control for tracked vehicles neglects the steering power, so the MCDM remains a two-dimensional Markov chain considering only the heading power and velocity (Zou et al. [2012a]). Moreover, the driver behaviour is affected by the surrounding environment all the time, for instance by varying traffic conditions, road types, and the driver's emotional states and objectives. Therefore, the MCDM should have the flexibility to update in real time to reflect changes in the driver behaviour. In other words, the TPM can be updated online by utilizing new velocity data provided by on-board telematics systems (Cairano et al. [2014]), unlike offline estimation of the TPM on the basis of observed sample data such as standard driving cycles or past driving records (Liu and Peng [2008]).

This paper proposes a new MCDM for tracked vehicles and a TPM online updating algorithm based on the nearest-neighbourhood method (NNM), with a forgetting factor adjusting the weights between older and recent observations. A comparison measure for TPMs, the Kullback-Leibler (KL) divergence (Rached et al. [2004]), is applied to quantify the difference between the updated time-variable TPMs.

The remainder of the paper is arranged as follows: Section 2 introduces the MCDM for tracked vehicles, formulates the TPM online updating algorithm, and expounds the KL divergence; the results are discussed in Section 3; Section 4 concludes this paper.

2. A NEW MCDM FOR TRACKED VEHICLES AND ONLINE UPDATING ALGORITHM FOR NNM

2.1 MCDM for Tracked Vehicles

An essential task in constructing an MCDM is to express the power demand in a form that is computationally simple but adequately precise. The general force diagram of a tracked vehicle is shown in Fig. 1. The demanded power to propel the vehicle, Pdem, is calculated as the combination of the heading power and the steering power (Wong [2001])

    Pdem = (Fi + Fa + Fr) vave + M ω    (1)

where the first product is the heading power and the second is the steering power. Fi is the inertial force, Fa is the aerodynamic drag, Fr is the rolling resistance, vave is the average speed of the vehicle, M is the resisting yaw moment from the ground assuming steady-state turning, and ω is the rotational speed of the vehicle.

Fig. 1. Force diagram of tracked vehicles

When the slippage of the tracks is not considered, the vehicular average heading speed vave is calculated as

    vave = (v1 + v2) / 2    (2)

where v1 and v2 are the speeds of the two tracks. The vehicular rotational speed is calculated as

    ω = (v1 − v2) / B    (3)

where B is the tread of the vehicle.

The value of Fi is evaluated by

    Fi = m a    (4)

where m is the curb weight and a is the acceleration. The value of Fa is calculated by

    Fa = Cd A vave^2 / 21.15    (5)

where A is the frontal area and Cd is the aerodynamic coefficient. The rolling resistance Fr is computed by

    Fr = m g f    (6)

where f is the rolling resistance coefficient and g is the acceleration of gravity. The value of M is calculated by

    M = (1/4) ut m g L    (7)

where L is the contacting length of the tracks and the coefficient of lateral resistance ut is computed based on empirical results (Wang et al. [1983])

    ut = umax / (0.925 + 0.15 R / B)    (8)

    R = vave / |ω|    (9)

where R is the turning radius of the vehicle and umax is the maximum value of the coefficient of lateral resistance, depending on the terrain type.

Steering power for tracked vehicles is the product of the yaw moment and the angular velocity, as shown in Eq. (1), and can be a significant component of the total requested power when the vehicle steers, i.e., when the difference between v1 and v2 is not zero, as illustrated by the field test driving cycle data in Fig. 2. Thus the rotational velocity ω is incorporated in the Markov state to predict the power request more precisely, forming the TPM description in Eq. (10). The characteristic parameters of the tracked vehicle used in the field test are given in Table 1 (Zou et al. [2012b]).

Fig. 2. Heading power and steering power of a natural driving schedule

Table 1. Parameters of the test tracked vehicle

Description | Value
Curb weight m/kg | 15200
Frontal area A/m^2 | 5.4
Aerodynamic coefficient Cd | 1
Rolling resistance coefficient f | 0.049
Tread of the vehicle B/m | 2.55
Maximum lateral resistance coefficient umax | 1
Contacting track length L/m | 3.57

    pij,l = Pr{(Pdem, ω)k+1 = (Pdem, ω)^j | (Pdem, ω)k = (Pdem, ω)^i, vave,k = vave^l},
    i, j = 1, 2, ..., Np × Nw;  l = 1, 2, ..., Nv    (10)

where pij,l is the one-step probability that the system is in (Pdem, ω)^j at time instant k+1, given that at time step k the values are (Pdem, ω)^i and vave^l. Notice that pij,l is actually a simplification of a three-state Markov chain pij,lm (with m = 1, 2, ..., Nv), justified by the fact that the variation of velocity from one time step to the next is a deterministic process, since it can be predicted using the current power demand, angular velocity, and average velocity. Besides, the power demand is assumed to take on a finite number of values

    Pdem ∈ {Pdem^1, Pdem^2, ..., Pdem^Np}    (11)

The average velocity and the angular velocity are also discretized into a finite number of values

    vave ∈ {vave^1, vave^2, ..., vave^Nv}
    ω ∈ {ω^1, ω^2, ..., ω^Nw}    (12)

Note that the pairs (Pdem, ω)^i span the 2-D Cartesian space of {ω^1, ω^2, ..., ω^Nw} and {Pdem^1, Pdem^2, ..., Pdem^Np}.

2.2 Online Updating Algorithm for NNM

The nearest-neighbourhood method transforms a value in the continuous domain X to the single element, from a finite set of discrete granules representing the state space of the Markov chain, whose value is closest to the continuous original (Filev and Kolmanovsky [2014]). The MCDM proposed in the above section is based on the NNM. Assume that the state space S is taken as the set {xj, j = 1, 2, ..., M}. Based on offline TPM estimation (Liu and Peng [2008]), the TPM p can be identified from a set of measured data

    pij = Nij / Noi    (13)

where Nij denotes the number of transitions from state xi to state xj, and Noi = Σj=1..M Nij is the total number of transitions initiated from state xi. The values of i and j belong to the set {1, 2, ..., M} throughout the text unless declared otherwise. A variant can be obtained by modifying (13) into a frequency rate-based counterpart

    pij = Nij(k) / Noi(k) = [Nij(k)/k] / [Noi(k)/k] = Fij(k) / Foi(k)    (14)

where Fij(k) is the frequency rate of transition events fij(k) from state xi to state xj and Foi(k) is the total frequency rate of transition events initiated from state xi, fi(k) = Σj=1..M fij(k)

    Fij(k) = Nij(k)/k = (1/k) Σt=1..k fij(t)    (15)

    Foi(k) = Noi(k)/k = (1/k) Σt=1..k fi(t) = Σj=1..M Fij(k)    (16)

where fij(t) = 1 if a transition from xi to xj occurs at time instant t, and fi(t) = 1 if a transition initiated from xi occurs at time instant t; otherwise, these take zero values. From Eqs. (14)-(16), a recursive expression for online updating of the TPM is derived (Filev and Kolmanovsky [2014]).
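Eqs. (13)-(16) reduce to quantizing each measurement to its nearest grid point (the NNM) and counting transitions. The following Python sketch illustrates the offline counting estimate of Eq. (13) for a one-dimensional state; the grid values and the sample sequence are illustrative only, not data from the paper:

```python
import numpy as np

def nearest_neighbor(value, grid):
    """NNM: index of the grid point closest to a continuous value."""
    return int(np.argmin(np.abs(np.asarray(grid) - value)))

def estimate_tpm(sequence, grid):
    """Offline TPM estimate p_ij = N_ij / N_oi from a measured sequence (Eq. 13)."""
    M = len(grid)
    N = np.zeros((M, M))                      # N_ij: transition counts
    states = [nearest_neighbor(v, grid) for v in sequence]
    for i, j in zip(states[:-1], states[1:]):
        N[i, j] += 1
    Noi = N.sum(axis=1, keepdims=True)        # N_oi: totals per initial state
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.where(Noi > 0, N / Noi, 0.0)   # unvisited rows stay zero
    return p

# Illustrative power-demand sequence (kW) quantized to a coarse grid
grid = [0, 20, 40, 60]
seq = [3, 18, 22, 41, 39, 60, 58, 21, 2, 19]
p = estimate_tpm(seq, grid)
assert np.allclose(p[p.sum(axis=1) > 0].sum(axis=1), 1.0)  # rows are distributions
```

For the three-dimensional MCDM of Eq. (10), the same counting applies after mapping each quantized (Pdem, ω) pair and velocity level to an index in the Cartesian grid.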



More specifically,

1 k 1 f ij (t ) [(k  1) Fij (k  1)  fij (k )]  k t 1 k (17) 1  Fij (k  1)  [ f ij (k )  Fij (k  1)] k

Fij (k )

p(k )  [diag( Fo (k ))]1 F (k )

where the matrix F(k) and the vector Fo(k) are formed by Fij(k) and Foi(k) respectively

1 k 1 f i (t ) [(k  1) Foi (k  1)  f i (k )]  k t 1 k (18) 1  Foi (k  1)  [ fi ( k )  Foi (k  1)] k

F (k )  F (k  1)   [ (k ) (k )T  F (k  1)]

Foi (k )

pij 

Foi (k  1)   [ f i (k )  Foi (k  1)]

with initial conditions

F (0)   E ; Fo (0)  F (0) E

A metric is needed to measure the difference between the TPMs. The KL divergence between two TPMs, namely, P and Q, is given by the following equation (Rached et al. [2004])

(19)

DKL (P || Q)  [P(x | x)P*(x)]log[P(x | x)/ Q(x | x)] (26) x x

where x and x+ are the current and next states, respectively, and P* is the steady-state probability distribution of P, with size 1×M; the summations are performed over all possible x and x+. The steady-state probability distribution P* is obtained by solving the following equation

P* P = P*  (27)

so P* can be interpreted as a left eigenvector of the TPM P whose eigenvalue corresponds to 1. The KL divergence quantifies the difference between P and Q when Q is used for prediction while the real transitions are governed by P. For the logarithm operator to be meaningful, the elements of P and Q must be greater than zero. To satisfy this condition, we replace P and Q by the regularized matrices Preg and Qreg, respectively:

Preg = (1 − β)P + (β/M) EM×M  (28)

Qreg = (1 − β)Q + (β/M) EM×M  (29)

where β is a small constant ranging from 0 to 1, M is the number of states of the TPMs, and EM×M is the M×M matrix with all elements equal to 1. Eqs. (28)-(29) secure non-zero transition probabilities between any two states.

Three properties are listed here for a better understanding of the KL divergence:

1) It is always non-negative;
2) DKL(P||Q) = 0 if, and only if, P = Q;
3) It is a non-symmetric measure, i.e., DKL(P||Q) ≠ DKL(Q||P) in general.

By replacing 1/k in (17)-(18) with a constant δ ranging from 0 to 1, the old data are assigned exponentially decreasing weights, so the recursive algorithm can be interpreted as an autoregressive (AR) model implementing an exponential smoothing algorithm with forgetting factor δ. The forgetting factor updates the frequency rates Fij(k) and Foi(k) by assigning a set of exponentially decreasing weights to the older observations:

W = [(1 − δ)^k, δ(1 − δ)^(k−1), δ(1 − δ)^(k−2), …, δ(1 − δ), δ]  (20)

where the sum of the elements of the vector W equals 1, so W can be interpreted as a weighted-average-type aggregating operator applied to the sequence fij(t) (Filev and Kolmanovsky [2014]). The effective memory depth Kδ, that is, the length of the moving window of W, approximates the reciprocal of δ, i.e., Kδ = 1/δ. Another distinct point is that the transition probabilities are recursively iterated over the transition events belonging to the soft interval {k − Kδ + 1, k], where the symbol { represents a soft lower interval limit, indicating that transition events with time indexes lower than k − Kδ have relatively little impact. Varying the forgetting factor thus balances the weights of older and recent observations, which enables continuous adaptation to changing traffic conditions, road types, and driving styles. The recursive algorithm then becomes

Fij(k) = Fij(k − 1) + δ[fij(k) − Fij(k − 1)]  (21)

Foi(k) = Foi(k − 1) + δ[fi(k) − Foi(k − 1)]  (22)

Eq. (19) can be rewritten in matrix form by replacing fij(k) and fi(k) with their vector counterparts τ(k) and γ(k): if fij(k) = 1 and fi(k) = 1, the two updates take the forms τ(k)γ(k)^T and τ(k)γ(k)^T eM, where

τ(k) = [0 … 1 … 0]^T, with the i-th element equal to 1, if x(k − 1) = xi

γ(k) = [0 … 1 … 0]^T, with the j-th element equal to 1, if x(k) = xj

and eM is an M-dimensional column vector with all elements equal to 1. The matrix-form recursion is

F(k) = F(k − 1) + δ[τ(k)γ(k)^T − F(k − 1)]  (23)

Fo(k) = Fo(k − 1) + δ[τ(k)γ(k)^T eM − Fo(k − 1)]  (24)

F(0) = λE  (25)

where λ is a small nonnegative number introduced to avoid singularity and E is an M×M matrix with all elements equal to 1.
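As an illustration, the recursion in Eqs. (21)-(25) can be sketched in a few lines of Python. The class and variable names below are our own; this is a sketch of the updating scheme, not the authors' implementation.

```python
import numpy as np

class TPMEstimator:
    """Sketch of the recursive TPM update with forgetting factor, Eqs. (21)-(25)."""

    def __init__(self, n_states, delta=0.01, lam=1e-12):
        self.delta = delta                            # forgetting factor (effective memory depth ~ 1/delta)
        self.F = lam * np.ones((n_states, n_states))  # F(0) = lambda*E avoids singular rows, Eq. (25)
        self.Fo = self.F.sum(axis=1)                  # origin-state frequency rates Foi

    def observe(self, i, j):
        """Update the frequency rates after an observed transition from state i to state j."""
        n = self.F.shape[0]
        tau = np.zeros(n); tau[i] = 1.0               # indicator of x(k-1) = x_i
        gamma = np.zeros(n); gamma[j] = 1.0           # indicator of x(k) = x_j
        outer = np.outer(tau, gamma)                  # tau(k) * gamma(k)^T
        self.F += self.delta * (outer - self.F)                   # Eq. (23)
        self.Fo += self.delta * (outer @ np.ones(n) - self.Fo)    # Eq. (24)

    def tpm(self):
        """Current transition probability matrix, P_ij = F_ij / Fo_i."""
        return self.F / self.Fo[:, None]
```

Because the row sums of F and the entries of Fo obey the same recursion from the same initialization, each row of the estimated TPM always sums to 1; feeding the estimator a strictly alternating two-state sequence drives P toward the corresponding permutation matrix.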

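The steady-state distribution of Eq. (27) and the regularized KL divergence rate of Eqs. (28)-(29) can likewise be sketched. The function names are ours, and solving P*P = P* via a left eigenvector is one standard route, assumed here for illustration.

```python
import numpy as np

def regularize(T, beta=1e-4):
    """T_reg = (1 - beta)*T + (beta/M)*E, Eqs. (28)-(29); entries become positive, rows still sum to 1."""
    m = T.shape[0]
    return (1.0 - beta) * T + (beta / m) * np.ones((m, m))

def steady_state(P):
    """Solve P* P = P*, Eq. (27): left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

def kl_rate(P, Q, beta=1e-4):
    """KL divergence rate between Markov chains with TPMs P (true) and Q (model)."""
    Pr, Qr = regularize(P, beta), regularize(Q, beta)
    ps = steady_state(Pr)
    return float(np.sum(ps[:, None] * Pr * np.log(Pr / Qr)))
```

The three properties above can be checked directly on small examples: kl_rate(P, P) is 0, kl_rate(P, Q) is non-negative, and kl_rate(P, Q) generally differs from kl_rate(Q, P).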

The closer the KL divergence is to zero, the more similar P is to Q. Thus, the KL divergence provides a metric to assess the difference between TPMs that are updated by the online updating algorithm presented above.

3. RESULTS

The driving cycle shown in Fig. 1 and the tracked vehicle with the parameters listed in Table 1 are chosen to validate the proposed method. After appropriate evaluation, the grids of the velocity, angular velocity, and steering power are determined as the sets {0,5,10,15,20,25,30}, {-0.6,-0.3,0,0.3}, and {0,10,…,100}, respectively. The forgetting factor is set to 0.01, the constant λ for the initialization in (25) is 10^-12, and the constant β in Eqs. (28)-(29) is 0.0001. The steering power transition probabilities at 900 s, obtained by the online updating algorithm in Eqs. (22)-(25) for different vehicle velocities and angular velocities, are shown in Fig. 3.

Fig. 3. The steering power TPMs obtained by the online updating algorithm for different vehicle velocities and angular velocities at 900 s

The labels Psk and Psk+1 represent the current and the next steering power, respectively. Zero probabilities occur when the vehicle velocity is above 25 km/h while steering, whereas nonzero probabilities arise as the vehicle velocity decreases. The steering power is therefore highly coupled not only with the vehicle velocity but also with the angular velocity. As a consequence, the power demand is also closely associated with the vehicle velocity and the angular velocity, because the steering power is a part of the power demand. That is why we add the angular velocity to the Markov state to form the new MCDM.

Next, the grids of the velocity, angular velocity, and power demand are discretized as the sets {0,10,20,30}, {-0.6,-0.3,0,0.3}, and {-20,0,20,…,100}, respectively; all other parameters are inherited from the above subsection. The TPMs defined in Eq. (10), obtained by the online updating algorithm every 100 seconds when the velocity vave is 10 km/h, are illustrated in Fig. 4.

Fig. 4. The TPMs obtained by the online updating algorithm at vave = 10 km/h

The labels xk and xk+1 represent the current and the next state index, respectively. The TPMs at 100 s, 200 s, and 300 s differ from each other significantly, whereas the TPMs at 300 s, 400 s, …, and 900 s remain almost unchanged. These differences can be further quantified with the KL divergence. Fig. 5 shows the KL divergences between successive TPMs: they are relatively large at the first two comparison points but almost 0 at the others, in good accordance with Fig. 4. However, the KL divergences for the cases of 0 km/h and 20 km/h change more frequently. Fig. 6 demonstrates the TPMs at vave = 20 km/h; almost every TPM at this velocity differs slightly from the others, so the KL divergences fluctuate more frequently. Furthermore, only a single peak appears when vave equals 30 km/h because few transitions occur at that velocity.

The above procedure inspires us to change the control policy when a significant variation is observed in the driver's power request, using the KL divergence criterion. When the KL divergence exceeds some threshold, indicating that the learning algorithm has updated the power request transition probability model while forgetting the previously identified one, the control policy should be recalculated based on the statistical information available up to that point. This idea will be explored in our future research on stochastic energy management control for HEVs.
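A minimal sketch of such a trigger follows. The threshold value is hypothetical, chosen only for illustration; the paper does not specify one, and in practice it would be tuned against data such as the divergences in Fig. 5.

```python
# Hypothetical threshold for a "significant" change in the driver model;
# the value 0.1 is illustrative, not taken from the paper.
KL_THRESHOLD = 0.1

def policy_recalculation_needed(kl_between_successive_tpms, threshold=KL_THRESHOLD):
    """Flag a control-policy recalculation when the latest KL divergence
    between successive TPMs exceeds the threshold."""
    return bool(kl_between_successive_tpms) and kl_between_successive_tpms[-1] > threshold
```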


Fig. 5. The KL divergences between successive TPMs for vave = 0, 10, 20, and 30 km/h

Fig. 6. The TPMs obtained by the online updating algorithm at vave = 20 km/h

4. CONCLUSION

This paper presents a new three-dimensional MCDM for tracked vehicles that takes the angular velocity into account. Subsequently, the NNM-based online updating algorithm for the TPM is applied to this new three-dimensional MCDM. With the help of the KL divergence, the differences between TPMs produced by the online updating algorithm are quantified in order to trigger the update of the TPM. Future work will combine the new three-dimensional MCDM and the online TPM updating algorithm with reinforcement learning-based energy management for hybrid tracked vehicles.

ACKNOWLEDGEMENT

This research work is supported by NSF China with grant 51375044, the National Science-Technology Pillar Plan Project (2013BAG10B00), and the University Talent Introduction 111 Project (B12022).

REFERENCES

Cairano S. D., Bernardini D., Bemporad A., and Kolmanovsky I. (2014). Stochastic MPC with learning for driver-predictive vehicle control and its application to HEV energy management. IEEE Trans. Control Syst. Technol., 22(3), 1018-1031.
Choi M.-E., Lee J.-S., and Seo S.-W. (2014). Real-time optimization for power management systems of a battery/supercapacitor hybrid energy storage system in electric vehicles. IEEE Trans. Veh. Technol., 63(8), 3600-3611.
Emadi A., Rajashekara K., Williamson S. S., and Lukic S. M. (2005). Topological overview of hybrid electric and fuel cell vehicular power system architectures and configurations. IEEE Trans. Veh. Technol., 54(3), 763-770.
Filev D. P. and Kolmanovsky I. (2014). Generalized Markov models for real-time modeling of continuous systems. IEEE Trans. Fuzzy Syst., 22(4), 983-998.
Grimmett G. and Stirzaker D. (2004). Probability and Random Processes. Oxford Univ. Press, London, U.K.
Johannesson L., Asbogard M., and Egardt B. (2007). Assessing the potential of predictive control for hybrid vehicle powertrains using stochastic dynamic programming. IEEE Trans. Intell. Transp. Syst., 8(1), 71-83.
Kosko B. (1996). Fuzzy Engineering. Prentice-Hall, Upper Saddle River, NJ, USA.
Lin C.-C., Peng H., Grizzle J. W., and Kang J.-M. (2003). Power management strategy for a parallel hybrid electric truck. IEEE Trans. Control Syst. Technol., 11(6), 839-849.
Liu J. and Peng H. (2008). Modeling and control of a power-split hybrid vehicle. IEEE Trans. Control Syst. Technol., 16(6), 1242-1251.
Malikopoulos A. A. (2014). Supervisory power management control algorithms for hybrid electric vehicles: a survey. IEEE Trans. Intell. Transp. Syst., 15(5), 1869-1885.
Opila D. F., Wang X., McGee R., Gillespie R. B., Cook J. A., and Grizzle J. W. (2012). An energy management controller to optimally trade off fuel economy and drivability for hybrid vehicles. IEEE Trans. Control Syst. Technol., 20(6), 1490-1505.
Rached Z., Alajaji F., and Campbell L. L. (2004). The Kullback-Leibler divergence rate between Markov sources. IEEE Trans. Inform. Theory, 50(5), 917-921.
Salmasi F. R. (2007). Control strategies for hybrid electric vehicles: evolution, classification, comparison, and future trends. IEEE Trans. Veh. Technol., 56(5), 2393-2404.
Shi S. M., Lin N., Zhang Y., Huang C. S., Liu L., and Lu B. W. (2013). Research on Markov property analysis of driving cycle. IEEE Vehicle Power and Propulsion Conference, 453-457.
Tate E. D., Grizzle J. W., and Peng H. (2008). Shortest path stochastic control for hybrid electric vehicles. Int. J. Robust Nonlinear Control, 18, 1409-1429.


Wong J.-Y. (2001). Theory of Ground Vehicles, 3rd ed. J. Wiley & Sons, New York, NY, USA.
Wang M.-D., Zhao Y.-Q., and Zhu J.-G. (1983). The Driving Principles for Tank. Defense Industry Press, Beijing, China.
Zou Y., Chen R., Hou S. J., and Hu X. S. (2012a). Energy management strategy for hybrid electric tracked vehicle based on stochastic dynamic programming. Journal of Mechanical Engineering, 48(14), 91-96.
Zou Y., Sun F. C., Hu X. S., and Guzzella L. (2012b). Combined optimal sizing and control for a hybrid tracked vehicle. Energies, 5, 4697-4710.
