Applied Energy 254 (2019) 113648
Synchronous multi-parameter prediction of battery systems on electric vehicles using long short-term memory networks
Jichao Hong a,b,c, Zhenpo Wang a,b,⁎, Wen Chen c,⁎, Yongtao Yao d

a National Engineering Laboratory for Electric Vehicles, Beijing Institute of Technology, Beijing 100081, China
b Beijing Co-innovation Center for Electric Vehicles, Beijing 100081, China
c Division of Engineering Technology, Wayne State University, Detroit, MI 48201, USA
d Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

⁎ Corresponding authors (Z. Wang at: National Engineering Laboratory for Electric Vehicles, Department of Vehicle Engineering, School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China).
E-mail addresses: [email protected] (J. Hong), [email protected] (Z. Wang), [email protected] (W. Chen).

https://doi.org/10.1016/j.apenergy.2019.113648
Received 3 February 2019; Received in revised form 16 July 2019; Accepted 1 August 2019
0306-2619/ © 2019 Elsevier Ltd. All rights reserved.
HIGHLIGHTS
• Synchronous multi-parameter prediction for battery systems is investigated.
• All-climate and all-state application in electric vehicles can be obtained.
• A developed pre-dropout technique is introduced to prevent LSTM from overfitting.
• Different models can be intelligently applied in driving/charging states.
• The predicted horizons are adjustable to provide adequate emergency time.
ARTICLE INFO
ABSTRACT
Keywords: Battery systems; Electric vehicles; Parameter prediction; Long short-term memory; Hyperparameter; Fault prognosis
Voltage, temperature, and state of charge (SOC) are the main characterizing parameters of battery systems; various battery faults manifest as abnormal fluctuations of these parameters. Accurate prediction of these parameters is therefore critical for the safe, durable, and reliable operation of battery systems in electric vehicles. This paper investigates a new deep-learning-enabled method to perform accurate synchronous multi-parameter prediction for battery systems using a long short-term memory (LSTM) recurrent neural network. A year-long dataset of an electric taxi was retrieved from the Service and Management Center for electric vehicles (SMC-EV) in Beijing to train the LSTM model and verify the model's validity and stability. To improve the prediction accuracy by taking into account the impacts of weather and driver behavior on a battery system's performance, a Weather-Vehicle-Driver analysis method is proposed, and a developed pre-dropout technique is introduced to prevent the LSTM from overfitting. Besides, the many-to-many(m-n) model structure with a developed dual-model-cooperation prediction strategy is applied for offline training of the LSTM model after all hyperparameters are pre-optimized. Additionally, the stability and robustness of the method are verified through 10-fold cross-validation and comparative analysis of multiple sets of hyperparameters. The results show that the proposed model has powerful and precise online prediction ability for the three target parameters. This paper also establishes the feasibility of synchronous multiple-fault prognosis based on accurate parameter prediction of the battery system, and it is the first of its kind to apply LSTM to synchronous multi-parameter prediction for battery systems.
1. Introduction

Global warming and the depletion of fossil fuels have gained tremendous attention in the past decades, which has also paved the way for the rapid development of electric vehicles (EVs). As a critical onboard energy-storage component, the battery system is vital to the safe, durable, and reliable operation of an EV. Traffic accidents caused by battery faults/failures have also occurred continually in recent years, leading to an increasing demand for safer and more reliable battery systems. Among the various battery parameters, voltage, temperature, and state of charge (SOC) are the three main characterizing parameters: various battery faults can be indicated by abnormal fluctuations of these parameters, and a potential battery thermal runaway may occur without appropriate
safety measures. For instance, over-voltage may indicate that the battery system has been over-charged and the charge-protection circuit has failed; over-temperature of the battery pack may indicate an internal short circuit in a single cell or an internal/external short circuit of the battery pack; an over-fast SOC drop may indicate a current abnormity (the forward current of the Hall sensor is too large and the feedback current too small) or a cell voltage so low that the voltage drops too fast. Therefore, human interventions such as real-time monitoring and accurate prediction of these parameters are essential and meaningful for identifying potential battery problems and safeguarding safe vehicular operation.

1.1. Literature review

However, on the one hand, the existing prediction research for battery systems is mainly model-based and conducted in experimental environments or under pre-set vehicular operating conditions. On the other hand, the existing typical battery models can only handle very limited input data (usually current, voltage, etc.) without considering driver behaviors. These existing studies are therefore inefficient at learning complex road conditions and drivers' driving behaviors during real-time vehicular operation.

Due to the difficulty of predicting battery voltage from physical experiments and mathematical models, very few papers have reported battery voltage prediction in the past decades. In 2005, Rakhmatov [1] presented a model that can accurately predict the battery voltage for a given current. Using a novel kernel adaptive filter for multiple inputs, Tobar et al. [2] showed that incorporating external variables could improve the prediction of the evolution of the battery voltage. Based on laboratory-scale experimental tests, Li et al. [3] proposed a Li-ion battery model for estimating battery terminal voltage under dynamic loads and battery aging conditions. To diagnose faults by predicting battery voltage, Farid et al. [4] presented a novel method for predicting the voltage trajectory, whose calculation speed increases remarkably while retaining precision. Wang et al. [5] proposed an in-situ voltage fault diagnosis method based on the modified Shannon entropy, in which preset abnormity coefficients enable real-time evaluation of voltage abnormities. With the gradual and in-depth development of machine learning in recent years, there has been an increasing amount of machine-learning-based literature on battery voltage prediction. For instance, Zhao et al. [6] investigated a prediction approach combining accurate lithium-ion battery dynamic modeling with a recurrent neural network (RNN), with an error rate below 5%. However, the methods presented in the literature are mostly experiment-based; many difficulties remain in applying them to battery packs in real operating EVs because of unpredictable road conditions and drivers' driving behaviors.

Recent developments in thermal-runaway prevention research have led to increased interest in temperature prediction for battery systems. The performance of a battery system depends strongly on its ambient temperature [7], so an effective and stable battery thermal management system is essential, since extreme heat significantly affects the safety and durability of EVs. To accurately predict the temperature of battery packs and mitigate these potential hazards, Hong et al. [8] proposed a thermal runaway prognosis scheme for battery systems in EVs based on an entropy method. Sun et al. [9] proposed an online internal temperature estimation method using the Kalman filter. However, these methods evaluate the future temperature trend from the current temperature; they cannot produce one or more future values. To predict the battery temperature, Feng [10] presented a prediction model using a simplified equivalent circuit with modified correction factors to improve precision. Chen et al. [11] investigated a two-step prediction approach based on support vector machines, whose mean prediction error was 3.05% in validation against experimental data. Several more efforts have been made toward accurate temperature prediction of battery systems, but these studies were mostly developed in laboratory environments, and their effectiveness in practical applications has yet to be verified. In addition, few of them attempt real-time temperature prediction and related thermal fault prognosis based on actual vehicular operation data.

SOC is one of the most critical factors for a battery system since it relates directly to the remaining capacity and energy of the battery system. Accurate SOC estimation is a cumbersome task because the battery system undergoes unpredictable vehicle operating conditions, and the uncertain nonlinear relationships between SOC and battery parameters such as voltage and current have not been clearly resolved. Up to now, there is no direct method to measure the real SOC; only estimation methods are available [12]. A considerable amount of literature has been published on SOC estimation; open-circuit voltage [13] and coulomb counting [14] were the two earliest techniques. Owing to their limitations, many more efficient algorithms have since been developed, such as the Luenberger observer [15], the sliding-mode observer [16,17], and the H-infinity observer [18]. Besides, Xiong, He et al. [19–22] presented several model-based online SOC estimation methods built on Kalman filtering (KF) and extended KF algorithms. However, all these methods are usually computationally intensive, and additional parameters or multiple models are typically required to accommodate different working environments. In recent years, several studies have applied machine-learning techniques to SOC estimation, such as support vector machines [23] and feedforward neural networks (FNN) [24]. However, these methods mostly use constant pulse discharge/charge profiles under experimental conditions, so their predictive ability under dynamic real-world operating conditions is unknown. Moreover, FNN-based methods with a unidirectional multi-layer structure cannot learn and remember long-term historical information, so they fail at multi-forward-step SOC prediction. An RNN may therefore be a better candidate for SOC estimation, particularly the long short-term memory (LSTM) neural network with its memory cells [25].

Machine learning has made great strides and is now deeply entrenched in our lives [26]. Its applications cover all areas of artificial intelligence, such as data mining, computer vision, search engines, and intelligent robots. In recent years, applications of machine learning to EVs and power battery systems have gradually increased, mostly focusing on the estimation of SOC, state of power, or state of health [27,28]. However, most of these efforts were carried out under laboratory conditions, which limits their persuasiveness. Furthermore, most of them predict an individual parameter one step forward, leaving multi-parameter and multi-forward-step prediction unexplored. There is no doubt that the various battery parameters and their relationships have significant impacts on the safe operation of a battery system, and neglecting the correlations among these parameters may worsen model accuracy, especially for individual-parameter prediction. Separate predictions for multiple battery parameters may also lead to mutual interference or efficiency-offsetting problems due to inconsistent prediction methods. So far, no research has explored the possibility of synchronous prediction of multiple battery parameters during actual vehicular operation.

To predict the parameter status of battery systems more accurately, this paper first investigates LSTM-based synchronous multi-parameter prediction. The dataset of an electric taxi is retrieved from a big data platform for EVs, and we showcase how LSTM can simultaneously predict battery voltage, probe temperature, and SOC by self-learning the historical data and network parameters. Compared with traditional methods, the machine-learning technique has a robust deep-learning capacity, and the learning accuracy will be
gradually improved with the accumulation of historically learned information, so the continuously accumulated vehicle operation data will undoubtedly provide broader application prospects for the proposed method.

1.2. Contributions of the work

This paper attempts to make several notable contributions and improvements to the current prediction techniques for battery systems, as follows.

(1) A Weather-Vehicle-Driver analysis method is proposed to take into account the impacts of weather and driver behavior on a battery system's performance under real operating conditions, thereby allowing both environmental factors and human interventions to be factored in when predicting the future characteristics of battery parameters.

(2) An improved pre-dropout technique is proposed to prevent LSTM from overfitting by effectively appointing the most suitable input parameters before training. This technique also maximally preserves the neurons and connections that are most conducive to the prediction results.

(3) By intelligently applying two models separately in the driving/charging states, the developed dual-model-cooperation (DMC) prediction strategy can maximally ensure prediction accuracy across all vehicular operation states.

(4) To reduce the calculation time of each real-time prediction, the LSTM model is well-trained offline using a huge input sample (IS), and multiple target parameters can share one generic model with high accuracy through the many-to-many(m-n) mapping architecture. Hence each online prediction is very fast (less than the sampling interval) and is not interfered with by the training process.

(5) The predicted horizons can be flexibly adjusted according to the prediction-accuracy requirements of the target parameters, providing sufficient emergency processing time for drivers or administrative staff if any parameter anomaly is predicted.
1.3. Organization of the paper

After this brief introduction, Section 2 describes the collected data and preprocessing. Section 3 introduces the methodology for predicting battery parameters and the related technologies. Section 4 details the manual optimisation of hyperparameters and discusses the training results. Section 5 concludes this work.

2. Data description and preprocessing

2.1. Data description

Some battery parameters fluctuate significantly and randomly during actual vehicular operation due to the variability and unpredictability of road conditions, drivers' driving behaviors, and other known/unknown factors. Over the past decade, big data platforms for EVs have developed rapidly, and the actual running data of a large number of vehicles have been acquired and stored. This provides a viable solution to the limitations of the existing experiment-based studies on the prediction and safety management of battery parameters. In this context, the Service and Management Center for electric vehicles (SMC-EV) was built in Beijing; its monitoring and management mechanism is shown in Fig. 1. The operational goal of this platform is to monitor public-service vehicles in real time and give feedback to the vehicles to ensure their safe and efficient operation. Its main functions include monitoring and collecting the real-time running data of EVs, such as the voltage and temperature of battery systems, and conducting in-depth analysis and research through big-data techniques.

Fig. 1. The monitoring and management mechanism of SMC-EV.

It can also obtain motion information of the monitored vehicles and the states of critical components through vehicle-to-platform communication. The data of the studied vehicle were collected from SMC-EV with a sampling interval of 10 s. The vehicle is an electric taxi assembled with a high-energy 18650 lithium-ion battery pack, and its specifications are summarized in Table 1. The power battery pack consists of 100 single cells, and 16 temperature probes are distributed inside/outside the battery box. It is well known that voltage, temperature, and SOC are the main battery parameters closely related to battery faults on a running EV, so this article focuses on predicting battery voltage, probe temperature, and SOC. To simplify the preliminary analysis, the cell voltage is represented by the voltage of Cell 1, and the probe temperature by the temperature of Probe 1.

2.2. Data preprocessing

For a more comprehensive analysis, a Weather-Vehicle-Driver analysis method is developed in this work, through which meteorological factors, vehicle factors, and drivers' behaviors can be simultaneously taken into account for multi-parameter prediction of the battery system. Historical weather data that may affect vehicular operation are exported from the weather site https://www.wunderground.com. Six weather parameters are added for analysis: humidity, precipitation, barometric pressure, air temperature, visibility, and wind speed. The flow chart of preprocessing the vehicle data and meteorological data is demonstrated in Fig. 2.

Fig. 2. The flow chart of preprocessing vehicle data and meteorological data.

The sampling interval of the original weather data is three hours, which is not consistent with the vehicle's sampling interval of 10 s. Therefore, the Lagrangian interpolation method is applied to interpolate the weather parameters to a 10 s resolution. Taking the air temperature on January 1, 2016 as an example, the air temperature curves before and after interpolation are demonstrated in Fig. 3. Then, the interpolated weather data are merged with the vehicle data according to the time-alignment principle. The weather parameters usually fluctuate very slowly, and their impacts on battery performance are relatively lower than other factors, so the interpolated results meet our research requirement of approximately representing the real weather characteristics.

Fig. 3. The air temperature curves before and after Lagrangian interpolation.
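As a concrete illustration of this step, the sketch below upsamples 3-hourly weather samples to the vehicle's 10 s grid with piecewise Lagrange polynomials. It is a minimal example under assumed variable names, not the authors' preprocessing code; a local low-order window is used because a single global Lagrange polynomial over a whole day would oscillate badly.

```python
# Minimal sketch (not the authors' code): piecewise Lagrange interpolation
# of 3-hourly weather samples onto a 10 s grid, prior to time-aligned merging.
import numpy as np
from scipy.interpolate import lagrange

def upsample_weather(times, values, step=10, order=3):
    """Interpolate coarse samples (times [s], values) onto a `step`-second grid,
    fitting a local Lagrange polynomial over `order`+1 neighboring points."""
    fine_t = np.arange(times[0], times[-1] + step, step)
    fine_v = np.empty(len(fine_t), dtype=float)
    for k, t in enumerate(fine_t):
        i = np.searchsorted(times, t)               # locate the support window
        lo = max(0, min(i - (order + 1) // 2, len(times) - order - 1))
        win = slice(lo, lo + order + 1)
        fine_v[k] = lagrange(times[win], values[win])(t)
    return fine_t, fine_v

# Example: 3-hourly air temperature over one day (hypothetical values, deg C)
t3h = np.arange(0, 24 * 3600 + 1, 3 * 3600)
temp = np.array([-4.0, -5.1, -3.2, 0.5, 2.8, 3.9, 1.2, -1.5, -3.0])
t10s, temp10s = upsample_weather(t3h, temp)
```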
Table 1
Specifications of the studied vehicle.

Vehicle type: Pure electric car
Curb weight: 1350 kg
Wheelbase: 2500 mm
Maximum speed: 125 km/h
Rated motor power: 20 kW
Total motor power: 45 kW
Total motor torque: 144 N·m
Maximum driving range: 200 km
Concerning the vehicle factors that affect the predicted parameters and vehicular operation, apart from the three predicted parameters themselves, current and remaining useful life (RUL) are also relevant. However, it is difficult to precisely monitor RUL in real time, and many existing RUL estimation algorithms are not sufficient to obtain an accurate real-time RUL evaluation [29]. It is well known that a battery's RUL is negatively related to the vehicle mileage; in other words, the longer the driving distance, the shorter the RUL. Therefore, the vehicle mileage is used to represent the RUL of the battery system. A driver's driving behavior is an inevitable common factor for every running vehicle and has a significant impact on the safety and durability of the power battery system [30,31]. Parameters that reflect driving behavior include the brake pedal stroke value, vehicle speed, and acceleration. Unfortunately, the SMC-EV platform currently cannot measure and record acceleration in real time, and the 10 s sampling interval is too coarse to estimate acceleration by differentiating the vehicle speed. Hence, acceleration must be calculated or represented by another parameter. To solve this problem, we first analyze the relationships among acceleration, motor speed, rated power, wheel radius, and other parameters as follows:
$$T_{motor} = \frac{9550\,P}{n} \tag{1}$$

$$T_{wheel} = T_{motor}\, i \tag{2}$$

$$a = \frac{F_{wheel}}{m} = \frac{T_{wheel}}{R\, m} \tag{3}$$

So, acceleration can be calculated as follows:

$$a = \frac{9550\,P\, i}{R\, m\, n} \tag{4}$$

where T_motor is the motor torque, P is the rated motor power, n is the motor speed, T_wheel is the wheel torque, i is the motor-to-wheel transmission ratio, F_wheel is the driving force of the vehicle, m is the curb weight of the vehicle, and R is the tire radius. In (4), all parameters are constants except n and a, so acceleration and motor speed are inversely proportional to each other. However, owing to errors in i and the other parameters, calculating acceleration directly would introduce an inevitable error. Therefore, to avoid such calculation errors, motor speed is used to represent acceleration directly in this research.
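To make this argument concrete, the sketch below evaluates Eq. (4) and shows how an error in the assumed transmission ratio propagates one-to-one into the computed acceleration, which is why motor speed itself is used as the behavioral feature. The values of R and i are illustrative assumptions; P and m come from Table 1.

```python
# Sketch of Eq. (4): a = 9550*P*i / (R*m*n). With P, i, R, m fixed,
# the implied acceleration is inversely proportional to motor speed n,
# so motor speed can stand in for acceleration as a driving-behavior feature.
P = 20.0     # rated motor power [kW] (Table 1)
m = 1350.0   # curb weight [kg] (Table 1)
R = 0.28     # tire radius [m] -- assumed, not given in the paper excerpt
i = 7.9      # motor-to-wheel transmission ratio -- assumed

def accel_from_motor_speed(n_rpm: float) -> float:
    """Acceleration [m/s^2] implied by Eq. (4) for motor speed n [rpm]."""
    return 9550.0 * P * i / (R * m * n_rpm)

# A 5% error in the assumed ratio i propagates 1:1 into the result:
a_nominal = accel_from_motor_speed(3000.0)
a_biased = 9550.0 * P * (1.05 * i) / (R * m * 3000.0)
print(a_nominal, a_biased / a_nominal)   # the ratio is exactly 1.05
```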
Fig. 4. All parameter curves for the year 2016.
According to the above Weather-Vehicle-Driver analysis, fifteen parameters are finally determined for training the LSTM model: cell voltage, probe temperature, brake pedal stroke value, motor speed, vehicle speed, pack voltage, current, SOC, humidity, precipitation, barometric pressure, air temperature, visibility, wind speed, and mileage. All parameter curves for the year 2016 are available and shown in Fig. 4.

3. Methodology

When applying LSTM to predict battery parameters during actual vehicular operation, four concerns arise. First, traditional training algorithms based on batch gradient descent or stochastic gradient descent converge too slowly to the correct network weights and are thus not applicable to online parameter prediction of battery systems. Second, most of the existing literature on LSTM-oriented parameter prediction uses one or more parameters as input without prior correlation analysis, so the prediction accuracy is limited by the characteristics of the trained parameters, leaving no adjustable leeway without configuration optimization. Third, battery parameters under experimental conditions usually follow constant-current and constant-voltage charging/discharging cycles, but the battery parameters of real-world vehicles are diversified due to unpredictable road conditions and drivers' driving behaviors; under this circumstance, the parameters' fluctuations in the driving state differ from those in the stable charging state, which makes finding the best full-state training parameters challenging. Fourth, overfitting is an inevitable issue in deep neural networks like LSTM and leads to poor learning outcomes. To address these challenges, the developed LSTM model in this work includes four components: a many-to-many(m-n) LSTM architecture, a network parameter optimizer using the Adam method, a DMC multi-input technique with separate charging/driving states, and a pre-dropout technology to prevent LSTM from overfitting.

3.1. LSTM RNN

The LSTM network is a kind of RNN that can model time- or sequence-dependent behavior, such as language, stock prices, and electricity demand. This is performed by feeding back the output of a network layer at time t to the input of the same network layer at time t + 1. In principle, an RNN can be useful for any time-series data [32]. However, the problem with a regular RNN is that, when we try to model dependencies among InPs or sequence values that are separated by a significant number of other OutPs, we experience the vanishing or exploding gradient problem [33]. The reason is that small gradients or weights are multiplied many times over multiple time intervals, so the gradients shrink asymptotically toward zero; when the gradient is too small or disappears, the network cannot adjust its weights in the direction that reduces the error, so the RNN stops learning and cannot learn long-term dependencies. In the mid-1990s, the German scholars Hochreiter et al. [34] proposed the RNN with LSTM cells to solve the vanishing-gradient problem, enabling an RNN to learn over many time steps (more than 1000) and opening the path to establishing long-distance causal links.

The architecture of the unfolded LSTM RNN is shown in Fig. 5; it can represent various nonlinear dynamic systems by mapping input sequences to output sequences. When we apply LSTM to multi-parameter prediction, a typical dataset used to train the networks is given by D = {(InP_1, OutP_1*), (InP_2, OutP_2*), …, (InP_n, OutP_n*)}, where OutP_k* is the actual value at time step k and InP_k is the input at the same time step. The network schematic of the LSTM predictor is shown in Fig. 6.

Fig. 5. Architecture of unfolded LSTM-RNN.

Fig. 6. The network schematic of LSTM predictor.

During training, the input, output, and forget gates allow the LSTM to forget or write new information into the memory cell. A new sequence value InP_t is concatenated with the previous output of the cell, OutP_{t-1}. This combined input is squashed via a tanh layer and then passed through an input gate, a layer of sigmoid-activated nodes whose output is multiplied by the squashed input. These sigmoids can absorb or eliminate any undesired element of the input vector: the sigmoid function outputs values between 0 and 1, so the weights connecting the input to these nodes can be trained to produce values close to 0 or 1, blocking or releasing certain input values. Besides, LSTM cells have an internal state variable p_t that lags by one time step; adding p_{t-1} creates an effective recurrence layer, which reduces the risk of vanishing gradients. This recurrence loop is controlled by a forget gate. Finally, the tanh squashing function of the output layer is controlled by an output gate, which determines which values are finally outputted from the cell as OutP_t.

As the built LSTM predictor for multiple parameters shows, the standard neural network layer is replaced by cell blocks. Each cell consists of three components: the input gate, the forget gate, and the output gate.

Input gate: a tanh activation function squashes the input between −1 and 1, which can be expressed as follows:

$$g_t = \tanh(b_g + InP_t\,IW_g + OutP_{t-1}\,OW_g) \tag{5}$$

where IW_g and OW_g represent the weights for the input and the previous output, respectively, and b_g is the input bias. Then, the squashed input is multiplied by the output elements of the
input gate, which is a series of sigmoid-activated nodes, as discussed above:
$$i_t = \sigma(b_i + InP_t\,IW_i + OutP_{t-1}\,OW_i) \tag{6}$$
The output of the input section of the LSTM cell is then given by:

$$g \circ i \tag{7}$$
where the operator ∘ denotes element-wise multiplication. Forget gate and state loop: these decide what information is to be discarded, which is the first step of the LSTM. The decision is made by a forget gate, which takes InP_t and OutP_{t-1} as inputs and outputs a number between 0 and 1: an output of 1 indicates full retention, and 0 means full discard. For multi-parameter prediction of a battery system, the discarded information can be outliers, noise, or irrelevant parameters. The forget gate can be calculated as follows:
$$f_t = \sigma(b_f + InP_t\,IW_f + OutP_{t-1}\,OW_f) \tag{8}$$

So, the output from the forget gate/state loop will be

$$p_t = p_{t-1} \circ f_t + g \circ i \tag{9}$$

Output gate: the information that is finally outputted is determined by the output gate, which can be implemented as follows:

$$O = \sigma(b_O + InP_t\,IW_O + OutP_{t-1}\,OW_O) \tag{10}$$

$$OutP_t = \tanh(p_t) \circ O \tag{11}$$
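For reference, Eqs. (5)-(11) can be executed directly as one cell step. The NumPy sketch below is a didactic re-implementation with arbitrary weight initialization, not the paper's code or a library kernel:

```python
# One LSTM cell step following Eqs. (5)-(11); didactic sketch only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(InP_t, OutP_prev, p_prev, W):
    g = np.tanh(W["bg"] + InP_t @ W["IWg"] + OutP_prev @ W["OWg"])   # Eq. (5)
    i = sigmoid(W["bi"] + InP_t @ W["IWi"] + OutP_prev @ W["OWi"])   # Eq. (6)
    f = sigmoid(W["bf"] + InP_t @ W["IWf"] + OutP_prev @ W["OWf"])   # Eq. (8)
    p = p_prev * f + g * i                                           # Eqs. (7), (9)
    o = sigmoid(W["bO"] + InP_t @ W["IWO"] + OutP_prev @ W["OWO"])   # Eq. (10)
    OutP = np.tanh(p) * o                                            # Eq. (11)
    return OutP, p

# Shapes: 5 input features (after pre-dropout), 100 hidden units per layer.
rng = np.random.default_rng(0)
n_in, n_hid = 5, 100
W = {}
for gate in ("g", "i", "f", "O"):
    W[f"IW{gate}"] = rng.normal(0, 0.1, (n_in, n_hid))
    W[f"OW{gate}"] = rng.normal(0, 0.1, (n_hid, n_hid))
    W[f"b{gate}"] = np.zeros(n_hid)
OutP, p = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W)
```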
Recurrent networks allow us to operate over sequences of vectors, so the LSTM has five kinds of mapping types, as represented in Fig. 7: one-to-one, many-to-one, one-to-many, many-to-many(m-n), and many-to-many(m-m). In this research, we perform multi-forward-step prediction and optimize various window sizes for multi-parameter outputs, so the many-to-many(m-n) LSTM is selected to build the desired model. The specific input and output steps will be described explicitly in the following sections.
Fig. 7. Five mapping types of LSTM.

3.2. Pre-dropout to prevent LSTM from overfitting

When LSTM networks use a limited dataset to optimize their network learning, gradient descent is an iterative process. Therefore, updating the network weights with a single pass or one epoch is not enough; the weights change as the number of epochs increases, and the predicted curves show a trend from underfitting to optimum and then to overfitting. A considerable amount of literature has been published on techniques to prevent neural networks from overfitting, among which L1 and L2 regularisation are the two most universal and typical [35,36]. These two methods add extra terms to the cost function to regularise the learning weights: L1 regularisation adds the sum of the absolute network weights, and L2 regularisation adds the sum of the squared weights. To solve this problem more efficiently, Srivastava et al. [37] proposed a dropout technique that drops neurons randomly from the network during training. Each neuron is retained with an independent probability, and once a neuron is removed from the network, all its incoming and outgoing connections are temporarily deleted. This dropout method is equivalent to sampling a "sparse" network. However, it also has a significant problem: only the randomly surviving neurons remain in the sparse networks, and the neurons randomly removed during training may contain essential neurons. With this dropout method, each thinned sparse network receives less training and becomes less sensitive to specific neuron weights, which naturally results in a weaker network for predicting the target parameters.

To address the above problems, we present a pre-dropout technique in this work. Via the Weather-Vehicle-Driver analysis, many parameters that affect multi-parameter prediction are chosen for training the LSTM model, and the proposed pre-dropout technology is combined with a correlation analysis method, detailed in Section 4.1 (Correlation analysis among parameters). After the correlation analysis, the parameters that are irrelevant or too strongly correlated with the target parameters are removed, and the most suitable parameters are appointed as input parameters. Neural network models with and without dropout are demonstrated in Fig. 8. As illustrated in Fig. 8(c), by applying the proposed pre-dropout technique, the favorable parameters can be effectively picked out, and the neurons and connections most conducive to the prediction results can be maximally preserved. Hence, on the premise of preventing the neural network from overfitting, better training and prediction effects can be obtained with this method.

Fig. 8. Neural network models with and without dropout: (a) standard neural network with 2 hidden layers; (b) neural network after applying dropout; (c) neural network after applying pre-dropout.

3.3. LSTM-oriented multi-parameter prediction

To build the LSTM-oriented multi-parameter prediction model, Keras, an easy-to-use and powerful neural network library, is used in this paper [38]. Keras is a high-level API developed with a focus on enabling fast experimentation; it is written in Python and capable of running on the TensorFlow [39], Computational Network Toolkit (CNTK) [40], or Theano [41] platforms. Keras model construction includes five steps: define, compile, fit, evaluate, and predict. First, when defining a model, we determine the number of layers and the number of neurons in each layer. Second, we compile the model to configure the learning process, specifying parameters such as the optimizer and the loss function. Third, the compiled model is fitted, which can be understood as the process of determining the connection weights between neurons; neural network training usually uses the back-propagation algorithm, so we need to specify the number of epochs and the batch size (BS) (the data volume processed in each step). When training is completed, the well-trained model and the training history can both be saved. Fourth, the training effect is evaluated against our expectations, generally using a portion of the testing data to assess the model loss. Finally, when the model meets the performance requirements, we use the well-trained model to predict on new data. Based on the above analysis, we design the LSTM-oriented multi-parameter model schematic of the battery system for EVs, as shown in Fig. 9. In addition to the modeling process above, we also use the latest vehicular operation data to periodically update the model to maintain its prediction accuracy.

Fig. 9. The LSTM-oriented multi-parameter model schematic.
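A minimal Keras rendering of these five steps might look as follows. The layer sizes mirror the network reported in Section 4.2 (three LSTM hidden layers with 100 neurons each and a dense layer with linear activation), while the loss choice and the commented calls are assumptions for illustration:

```python
# Sketch of the define/compile/fit/evaluate/predict workflow in Keras.
# Shapes follow the many-to-many(5-3) setup derived later in the paper:
# WS=360 input steps x 5 features -> PWS=30 output steps x 3 targets.
from tensorflow import keras
from tensorflow.keras import layers

WS, PWS, N_IN, N_OUT = 360, 30, 5, 3

model = keras.Sequential([
    keras.Input(shape=(WS, N_IN)),
    layers.LSTM(100, return_sequences=True),
    layers.LSTM(100, return_sequences=True),
    layers.LSTM(100),                        # summary vector of the input window
    layers.Dense(PWS * N_OUT, activation="linear"),
    layers.Reshape((PWS, N_OUT)),            # map to the predicted time sequence
])
model.compile(optimizer="adam", loss="mse")            # step 2: compile
# history = model.fit(x_train, y_train, epochs=50,
#                     batch_size=16, validation_split=0.1)  # step 3: fit
# model.evaluate(x_test, y_test)                       # step 4: evaluate
# y_pred = model.predict(x_new)                        # step 5: predict
```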
When Keras-LSTM configures the network learning process, the optimizer is one of the most significant components in determining the compilation effect of the built model. The frequently-used optimizers include SGD, Adagrad, Adadelta, Adam, RMSprop [42], Adamax, and Nadam. In current machine-learning and deep-learning applications, the most used optimizer is Adam [43]. Its main advantage is that, after bias correction, each iteration's learning rate stays within a definite range, keeping the parameter updates relatively stable. Considerable practice has shown that Adam is more effective than other adaptive learning methods. The Adam update of the network parameters θ (weights and biases) is as follows:

$$m_t = \mu\, m_{t-1} + (1-\mu)\, g_t \tag{12}$$

$$n_t = \upsilon\, n_{t-1} + (1-\upsilon)\, g_t^2 \tag{13}$$

$$\hat{m}_t = \frac{m_t}{1-\mu^t} \tag{14}$$

$$\hat{n}_t = \frac{n_t}{1-\upsilon^t} \tag{15}$$

$$\theta_{t+1} = \theta_t - \frac{\hat{m}_t}{\sqrt{\hat{n}_t} + \epsilon}\,\eta \tag{16}$$

where m_t and n_t are the first-order and second-order moment estimates of the gradient g_t, respectively; they can be considered estimations of the expectations E|g_t| and E|g_t^2|, and m̂_t and n̂_t are the bias-corrected versions of m_t and n_t, so they can be taken as approximately unbiased estimations of the expectations. This direct gradient estimation has no additional memory requirements and adjusts dynamically according to the gradient, and −m̂_t/(√n̂_t + ϵ) forms a dynamic constraint on the learning rate with a clear scope.

Based on the above analysis, the Adam optimizer has the following benefits. First, it handles not only sparse gradients but also non-stationary targets. Second, it has low memory demands. Besides, it calculates individual adaptive learning rates for different parameters. More importantly, it is more applicable to large datasets and high-dimensional spaces than other learners. A great deal of data and multiple parameters are retrieved and used during the LSTM-oriented modeling in this work; therefore, the Adam optimizer is best suited for this research.
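Eqs. (12)-(16) translate line-for-line into code. The sketch below is a bare-bones reference implementation with the standard default coefficients (μ = 0.9, υ = 0.999, ϵ = 1e-8); in practice the built-in Keras Adam optimizer is used rather than such a loop:

```python
# Reference implementation of the Adam update, Eqs. (12)-(16).
import numpy as np

def adam_step(theta, g, m, n, t, eta=1e-3, mu=0.9, nu=0.999, eps=1e-8):
    m = mu * m + (1 - mu) * g           # Eq. (12): first-moment estimate
    n = nu * n + (1 - nu) * g**2        # Eq. (13): second-moment estimate
    m_hat = m / (1 - mu**t)             # Eq. (14): bias correction
    n_hat = n / (1 - nu**t)             # Eq. (15): bias correction
    theta = theta - eta * m_hat / (np.sqrt(n_hat) + eps)   # Eq. (16)
    return theta, m, n

# Usage: t counts update steps starting from 1.
theta, m, n = np.zeros(4), np.zeros(4), np.zeros(4)
for t, g in enumerate(np.random.default_rng(1).normal(size=(100, 4)), start=1):
    theta, m, n = adam_step(theta, g, m, n, t)
```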
4. Results and discussion

4.1. Correlation analysis among parameters

This section details how to build the LSTM model and predict the future voltage, temperature, and SOC of the battery system. According to the above Weather-Vehicle-Driver analysis, fifteen parameters covering vehicle/battery factors, drivers' driving behaviors, and meteorological factors are considered. However, during the training of the LSTM model, training high-correlation parameters together with low-correlation parameters worsens the training effect and increases the probability of overfitting. When all fifteen parameters are taken as inputs to train the LSTM model, training ends at the very beginning due to overfitting, no matter which parameter is the output. This is because the correlations among many of these parameters are either particularly strong or nonexistent, so a correlation analysis before modeling is essential to pick out the most suitable input parameters and eliminate the detrimental ones. In statistics, the Pearson correlation coefficient (PCC) is widely used in scientific research; Karl Pearson developed it from a related idea introduced by Francis Galton in the 1880s [44,45]. It takes values between −1 and 1, where 1/−1 indicates a total positive/negative linear correlation and 0 represents no linear relationship. The PCCs among all considered parameters are illustrated in Fig. 10, where all PCCs are taken as absolute values for simplified analysis.

Fig. 10. The PCCs among all considered parameters.

The correlation between two parameters can be judged via the absolute value of the PCC between them: the interval 0.8–1.0 means extremely strong correlation, 0.6–0.8 strong correlation, 0.4–0.6 moderate correlation, 0.2–0.4 weak correlation, and 0–0.2 very weak correlation or irrelevance. The PCCs among all considered parameters are exhibited in Fig. 11. After eliminating the irrelevant and extremely strongly correlated parameters, the parameters that satisfy the training requirements can be obtained. It appears from Fig. 11 that the parameters matching cell voltage are the brake pedal stroke value, current, motor speed, SOC, and vehicle speed; the parameters matching SOC are pack voltage and cell voltage; and no parameter matches probe temperature. Finally, after analyzing the correlations among all matchable parameters, only five input parameters are available for training the LSTM model: cell voltage, probe temperature, SOC, brake pedal stroke value, and vehicle speed.

Fig. 11. The PCCs among all considered parameters.
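In code, this input-selection (pre-dropout) step reduces to thresholding the absolute PCC matrix. The pandas sketch below assumes hypothetical column names for the candidate parameters; pandas computes the Pearson coefficient by default:

```python
# Sketch of the pre-dropout input selection via PCC thresholds.
# Assumed: DataFrame `df` holds the fifteen candidate columns (names hypothetical).
import pandas as pd

TARGETS = ["cell_voltage", "probe_temperature", "soc"]

def select_inputs(df: pd.DataFrame, low=0.2, high=0.8):
    """Keep a parameter if its |PCC| with any target lies in [low, high):
    drop irrelevant (< low) and extremely strongly correlated (>= high) ones."""
    pcc = df.corr(method="pearson").abs()
    keep = set(TARGETS)   # the target parameters themselves are always retained
    for col in df.columns:
        if col in TARGETS:
            continue
        scores = pcc.loc[col, TARGETS]
        if ((scores >= low) & (scores < high)).any():
            keep.add(col)
    return sorted(keep)
```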
4.2. Selection and debugging of hyperparameters

Many model parameters need to be set and optimized before training an LSTM model. Notably, a model with a large dataset takes a long time to complete one round of training, and finding all the optimal parameters in a short time is impossible, so we pre-set a set of hyperparameters empirically and then gradually optimize them for better effects. Data need to be divided into smaller batches and fed to the computer one by one during the machine-learning process, and the weights of the neural network are updated at the end of every step to fit the given data; hence the terminologies epoch and BS. The BS is the total number of training samples in a single batch. One epoch signifies the entire dataset passing forward and backward through the LSTM neural network exactly once. We work on limited datasets, and the learning process is an iterative gradient-descent process, so updating the weights with a single pass or one epoch is not enough. Furthermore, the window size (WS) also has a significant impact on the prediction effect because it determines the number of training samples per input, and the predicted window size (PWS) is the window size of each predicted result. To make the training process remember more information, the sliding window (SW) technique is applied in this paper; Fig. 12 illustrates how LSTM training samples are prepared with the SW technique. However, an LSTM network is prone to overfitting, and a great deal of redundant data will inevitably be added by the SW, so the sizes of WS, PWS, and SW should be chosen appropriately at the beginning.

Fig. 12. Illustration of preparing LSTM training samples.

Additionally, the size of the IS has a significant influence on the predicted result, and a larger IS will theoretically result in a more accurate prediction. There are also other crucial structural parameters, such as the number of layers and the number of neurons in each layer. Therefore, based on the above analysis and our research objectives, a set of hyperparameters is initially determined as: IS = one-month data, WS = 360 (one-hour data), PWS = 6 (one-minute data), and sliding window size (SWS) = 60 (10-min data). The applied LSTM network in this paper is constructed as a sequence of 6 layers: an input layer, 3 LSTM hidden layers with 100 neurons per layer, a dense hidden layer with linear activation, and an output layer. The initial epoch = 50 and BS = 128.

4.3. Training results and discussion

In this section, to validate the predictive effects of the proposed approach, comparisons of different model parameters are conducted. Python 3.6 is used to program and train the LSTM model. The simulation is implemented on a ThinkPad S5 laptop equipped with an Intel(R) Core(TM) i7-6700HQ CPU, 20 GB RAM, and a 4 GB discrete graphics card.

4.3.1. Determination of verification method

In machine-learning algorithms, the original dataset is usually divided into three parts: training data, validation data, and testing data. The training data are used to calculate gradients and update the weights; the validation data are used to determine some hyperparameters during the training process and avoid overfitting (such as deciding the number of epochs according to the validation loss); the testing data give the accuracy used to evaluate the training quality. Due to the sensitivity to initial conditions during training, the training results are usually good on the training data, but the fit on data outside the training set is often less satisfactory. Therefore, we typically do not train on the entire IS but separate out a part of it that does not participate in training; this separated part is the validation data, used to objectively test the parameters generated from the training data. The one-month vehicular operation data of February 2016 is determined as the initial IS. Because the testing data do not participate in the network training at all, and in order to compare the prediction effects of different ISs (one-year, one-month, and one-week data of 2016) in the comparative analysis below, the one-day data of February 8, 2017 is taken as the testing data. To verify the prediction effect, cross-validation is one of the most commonly used verification methods in modeling applications. Cross-validation, sometimes called rotation estimation, is a practical way to statistically cut data samples into smaller subsets, proposed by Seymour Geisser [46]. Common forms of cross-validation are holdout verification, K-fold cross-validation, and leave-one-out verification. Among these, K-fold cross-validation is the most widely used; it repeatedly uses randomly generated subsamples for training and verification, and each training result is verified once. To ensure the effectiveness and stability of the LSTM model, 10-fold cross-validation is performed in this paper, whose schematic is indicated in Fig. 13; the training data and validation data occupy 90% and 10% of the entire IS, respectively. Besides, the mean relative error (MRE) is used to evaluate the prediction result on the testing data after cross-validation, and the average MRE over the ten cross-validations is taken as the final MRE for each set of hyperparameters.
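For illustration, the 10-fold procedure with MRE scoring described above can be sketched with scikit-learn's KFold splitter; `build_model` is a hypothetical factory returning a freshly compiled Keras model:

```python
# Sketch of 10-fold cross-validation with MRE as the testing metric.
import numpy as np
from sklearn.model_selection import KFold

def mre(y_true, y_pred):
    """Mean relative error over all predicted points."""
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

def cross_validate(build_model, X, Y, X_test, Y_test, folds=10):
    scores = []
    for train_idx, val_idx in KFold(n_splits=folds, shuffle=True).split(X):
        model = build_model()                       # hypothetical model factory
        model.fit(X[train_idx], Y[train_idx],
                  validation_data=(X[val_idx], Y[val_idx]),
                  epochs=50, batch_size=16, verbose=0)
        scores.append(mre(Y_test, model.predict(X_test, verbose=0)))
    return float(np.mean(scores))   # final MRE for this hyperparameter set
```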
4.3.2. Effect of the pre-dropout method to prevent overfitting

Based on the preliminarily selected hyperparameters, the prediction results of cell voltage, probe temperature, and SOC based on 10-fold cross-validation are shown in Fig. 14. The MREs of these three target parameters are 1.15%, 5.56%, and 3.55%, respectively, which demonstrates that the predicted values of the three target parameters are very close to the real values. Besides, Fig. 14(c) illustrates that the studied vehicle performed one driving cycle and one charging cycle on February 8, 2017.

Fig. 14. The prediction results of LSTM-oriented multi-parameter prediction based on the preliminarily selected hyperparameters: (a) prediction results of cell voltage; (b) prediction results of probe temperature; (c) prediction results of SOC.
Fig. 13. The 10-fold cross-validation schematic.
Fig. 15. Training loss and validation loss: (a) loss of overfitting; (b) loss of the selected hyperparameters.

Overfitting usually occurs when the network is overtrained or the training data are insufficient. The intuitive appearance of overfitting is indicated in Fig. 15(a). Typically, the complexity of the model increases as training progresses. Once the trained network has fitted the training data, the loss on the training data gradually decreases, but at the same time the loss on the validation data gradually increases, since the validation data outside the training data cannot work with the overfitted network. The training result obtained with the proposed pre-dropout method is demonstrated in Fig. 15(b), which shows that overfitting can be effectively prevented by the pre-dropout technique. Besides, by using the five picked-out favorable input parameters, this technology maximally preserves the most beneficial neurons and connections, which stabilizes the training process; meanwhile, the initial number of epochs ensures that the model loss reaches its lowest value and stays steady.

4.3.3. Determination of parameters of the LSTM network

As addressed above, the developed LSTM network consists of three LSTM hidden layers, a dense hidden layer, an input layer, and an output layer. In general, the hidden layers are dedicated to modeling the relationship between the past time series and the future time series; they also enable the LSTM network to build more complex models than networks without hidden layers. The dense hidden layer changes the output vector dimensions of the previous layer and maps these outputs into the final predicted time sequence. To determine the optimal number of hidden layers, 10-fold cross-validation is performed on the IS, and the average validation losses are calculated for different numbers of hidden layers. The results are plotted in Fig. 16, and the lowest average validation loss is obtained with three hidden layers. The results first demonstrate that the Adam optimizer used in this research can make the LSTM model converge effectively. More hidden layers give the LSTM network architecture deeper levels and enable the LSTM model to learn more complex time-series relationships, generating more accurate predicted results. For the sake of superior prediction accuracy, three hidden layers are adopted in this work.

Fig. 16. Validation losses based on different numbers of hidden layers.

BS is a vital model parameter involving many trade-offs, and it differs for different datasets. The BS first determines the direction in which the gradient descends. For example, a small dataset can use a full batch for learning, but for a large dataset such as the research object of this paper, it is not workable to load all the data at once due to the memory limitation of the laptop used in this study. Increasing the BS can improve memory utilization, reduce the iterations required to run one epoch (the full dataset), and improve the processing speed for the same data volume; once the BS increases beyond a certain value, there is no significant change in the gradient direction. However, blindly increasing the BS tends to exceed the computer's capacity and also extends the running time. Thus, the BS should be adjusted within a reasonable range according to the demanded prediction accuracy. In this investigation, the BS is studied in the range {8, 16, 32, 64, 128, 256, 512, 1024}. The cross-validation losses based on different BSs are shown in Fig. 17, and the MREs on the testing data are calculated for the different BSs to verify the prediction accuracy of the three target parameters, as indicated in Fig. 18. Fig. 17 illustrates that the convergence of the validation loss is fastest when BS = 16, stabilizing substantially by epoch = 10. As the BS increases, the validation loss increases, the convergence becomes slower, and the oscillation of the loss curves becomes much more dramatic. Besides, Fig. 18 shows that the best testing performance with the lowest MRE also appears when BS = 16. Therefore, the BS is determined to be 16.

Fig. 17. Validation losses based on different BSs.

Fig. 18. MREs based on different BSs.
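The BS study then amounts to a simple sweep over candidate values, re-running the cross-validation routine for each one; `run_cv` below is a hypothetical wrapper around the 10-fold procedure sketched earlier:

```python
# Sketch of the BS sweep: re-run 10-fold cross-validation per candidate batch
# size and keep the one with the lowest final MRE on the testing data.
# `run_cv(batch_size)` is a hypothetical wrapper that returns the averaged MRE.
candidate_bs = [8, 16, 32, 64, 128, 256, 512, 1024]
results = {bs: run_cv(batch_size=bs) for bs in candidate_bs}
best_bs = min(results, key=results.get)   # the paper reports BS = 16
```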
4.3.4. Determination of different window sizes

In this section, we introduce the determination of the different windows: WS, PWS, and SWS. According to the above description of the hyperparameters, WS is defined as WS = 360, which means that each prediction predicts the parameter values of the next PWS using the previous one hour of actual operation data. PWS and SWS are determined by manual optimization based on 10-fold cross-validation.

Following the previous descriptions, the LSTM network carries out prediction throughout the entire degradation process with different PWSs: PWS = {6, 30, 60, 90, 120, 150, 180, 360}, which respectively represent the predicted values within the future 1 min, 5 min, 10 min, 15 min, 20 min, 25 min, 30 min, and one hour. The MREs and absolute errors (AEs) on the testing data are compared for the different PWSs (see Figs. 19 and 20). Fig. 19 indicates that the MRE curves of the three parameters gradually increase as the PWS increases, with the MREs of voltage changing most gently. The results show that a longer prediction horizon leads to greater learning difficulty and lower prediction accuracy due to the limited WS. Generally, the smallest PWS gives the highest prediction accuracy; however, once a battery system is predicted to be faulty via a parameter abnormity, the time available for the driver to take action will be short if the PWS is too small. So, to predict battery faults for the sake of battery safety during actual vehicular operation, the prediction horizon is initially set to five minutes in this study, i.e., PWS = 30. The prediction results of the LSTM-oriented multi-parameter model based on the selected PWS are depicted in Fig. 21, and the MREs of the three target parameters are 1.15%, 7.36%, and 1.55%, respectively.

Fig. 20 shows that the AE curves of the three parameters have different fluctuation characteristics in the driving range and the charging range as the PWS increases. The AE curves exhibit small up-and-down fluctuations when the vehicle is running, but the absolute values of the AEs gradually increase during charging, and this rise becomes more prominent as the PWS increases. Considering that the fluctuation characteristics of each parameter differ between the charging and driving processes, training the LSTM model in all states with the same input parameters may produce different prediction results; this is detailed in the following section on the determination of prediction strategies.

By applying the SW technique, the training window is iteratively trained as every SW is swept, so that the LSTM network can learn more historical information. Following the previous descriptions, different SWSs, SWS = {1, 6, 30, 60, 90, 120, 180, 360}, are used for comparison, corresponding to sliding steps of 10 s, 1 min, 5 min, 10 min, 15 min, 20 min, 30 min, and 1 h; when SWS = 360 = WS, there is no iteration and no sliding of the training window. The MREs and AEs on the testing data are computed for the different SWSs (see Figs. 22 and 23). Fig. 23 shows that the AE curves of the three parameters have different fluctuation characteristics in the driving range and the charging range as the SWS increases; this finding is consistent with the analysis of the different PWSs. Fig. 22 indicates that the MRE curves of the three parameters gradually increase as the SWS increases; the change in the voltage MREs is not obvious, but the MREs of temperature and SOC fluctuate sharply, possibly because the number of repeated trainings is too small and the optimal repeatable prediction results have not been obtained. Fig. 22 shows that the prediction errors on the testing data increase as the SWS increases; that is, a bigger SWS results in less iterative training and less available historical information. In general, the smallest SWS (SWS = 1) acquires all the available historical information, but a smaller SWS also requires a higher computer configuration and a longer training time. Fig. 22 shows that the MRE at SWS = 6 is very close to that at SWS = 1, meaning a similar learning capability for our prediction purpose can be obtained with SWS = 6. Therefore, the SWS was set to 6 in this study. The prediction results of the LSTM-oriented multi-parameter model based on the selected SWS are depicted in Fig. 24, and the MREs of the three target parameters are 1.27%, 4.94%, and 2.87%, respectively.

Fig. 19. MREs of different PWSs on the testing data.

Fig. 20. AEs on the testing data of different PWSs: (a) voltage errors; (b) temperature errors; (c) SOC errors.

Fig. 22. MREs of different SWSs on the testing data.
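The bookkeeping behind WS, PWS, and SWS can be stated precisely with a small sample builder; this is a sketch of the sliding-window scheme of Fig. 12 under assumed array layouts, not the authors' code:

```python
# Build (input, target) pairs from a multivariate series with a sliding window.
# series: shape (T, n_features); targets: shape (T, n_targets), time-aligned.
import numpy as np

def make_samples(series, targets, ws=360, pws=30, sws=6):
    X, Y = [], []
    start = 0
    while start + ws + pws <= len(series):
        X.append(series[start:start + ws])               # WS past steps as input
        Y.append(targets[start + ws:start + ws + pws])   # next PWS steps as labels
        start += sws                   # slide by SWS (windows overlap when SWS < WS)
    return np.stack(X), np.stack(Y)

# e.g. one month at 10 s sampling: T = 30*24*360 rows, 5 inputs, 3 targets
T = 30 * 24 * 360
X, Y = make_samples(np.zeros((T, 5)), np.zeros((T, 3)))
print(X.shape, Y.shape)   # (samples, 360, 5), (samples, 30, 3)
```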
Fig. 20. AEs on the testing data of different PWSs: (a) voltage errors; (b) temperature errors; (c) SOC errors.
from Fig. 25 that the proposed many-to-many(5–3) LSTM model has the lowest MREs for the prediction on three target parameters except that the MRE of one-to-one temperature model is slightly lower. According to the correlation analysis in Fig. 10, no matchable temperature-related parameter can be available for training the proposed many-to-many (5–3) LSTM model. Besides, the probe temperature has slow self-fluctuation and will be less affected by other parameters so that the one-toone LSTM model can work better than the proposed many-to-many (5–3) LSTM model. Whatever, the proposed many-to-many(5–3) LSTM model still has the best overall effect on the synchronous prediction of the three target parameters using only one LSTM model. In the section of correlation analysis, it has been verified not feasible to take all parameters as an input due to overfitting. Besides, Fig. 25 demonstrates that one-to-one and many-to-many(3–3) LSTM models without correlation analysis also can not achieve an ideal prediction efficiency. Therefore, these results denote the proposed Weather-Vehicle-Driver analysis method, and the pre-dropout technique can provide the most suitable input parameters, leading to the best prediction effect without overfitting.
Fig. 22. MREs of different SWSs on the testing data.
4.3.6. Determination of ISs

After all the other hyperparameters are determined, the prediction effects of different ISs (one-year data of 2016, one-month data of February 2016, and one-week data of the first week of February 2016) are compared; the testing data is the one-day data of February 8, 2017. The MREs of the different ISs on the testing data are shown in Fig. 26. As Fig. 26 illustrates, the MREs of the three parameters clearly decrease as the IS grows. The gap between the MREs for IS = one-year data and IS = one-month data is slight, but the MREs for IS = one-week data are much higher. As noted above, the one-month data of February 2016 was chosen as the initial IS. In principle, a larger IS naturally yields a model with better prediction performance; however, the laptop used in this study has limited memory for processing such a large dataset within a short period. For a big-data monitoring and management platform for EVs, by contrast, one or more well-configured GPU servers can be used to build and optimize the ideal LSTM model. Moreover, the proposed technique supports offline modeling and fast online prediction. Hence, considering the timeliness and update cycle of the built model, one-year data is the more appropriate IS.

4.3.7. Determination of prediction strategies

Fig. 14 indicates that the studied vehicle performs one driving cycle and one charging cycle in the testing data. Figs. 20 and 23 display that the AE curves of the three target parameters show different fluctuation characteristics during charging and driving. Three possible reasons can be inferred. First, the timeliness of the model deteriorates as the prediction horizon extends. Second, the characteristics of the battery parameters themselves differ between the charging and driving states: battery behavior is very stable under the steady charging state, whereas it fluctuates dramatically under the driving state because of the randomness of road conditions and the driver's behavior, so the parameters in the charging state are more predictable than those in the driving state. Third, in addition to the three target parameters, the brake pedal stroke value and vehicle speed are also used as input parameters in this paper. However, as indicators of the driver's driving behavior, these two parameters are informative only while the vehicle is running and remain zero throughout charging, so training the LSTM model with the same five input parameters during the charging period may adversely affect the prediction results. The first two causes cannot be controlled. To address the third, and to analyze the difference between the charging and driving states for the proposed LSTM model, the testing data are divided into two parts: charging data and driving data. Then, under the existing hyperparameter settings, the prediction results of the proposed many-to-many(5–3) model and the many-to-many(3–3) model are compared in these two states. Based on the 10-fold cross-validation method, the MREs of the many-to-many(5–3) and many-to-many(3–3) models tested on the charging data and driving data are shown in Fig. 27. The comparison in Fig. 27 shows that the two models perform differently on the charging and driving data. When the vehicle is running, the many-to-many(5–3) model predicts much better, and its MREs are significantly smaller than those of the many-to-many(3–3) model because its input parameters account for the driver's driving behaviors.
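As a rough illustration of this charging/driving separation, the testing records could be partitioned with the Table 2 conventions (I < 0 and V = 0 marks charging); the array names and example values are assumptions:

```python
import numpy as np

def split_by_state(I, V, records):
    """Partition test records into charging and driving subsets,
    following the sign conventions of Table 2."""
    charging = (I < 0) & (V == 0)
    driving = (I != 0) & ~charging        # every active, non-charging sample
    return records[charging], records[driving]

# Hypothetical example: I, V, and records share the same first dimension.
I = np.array([-50.0, -50.0, 120.0, 0.0])
V = np.array([0.0, 0.0, 35.0, 0.0])
records = np.arange(4)
chg, drv = split_by_state(I, V, records)   # chg -> [0, 1], drv -> [2]
```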
Fig. 21. The prediction results of LSTM-oriented multi-parameter based on the selected PWS: (a) prediction results of cell voltage; (b) prediction results of probe temperature; (c) prediction results of SOC.
Fig. 23. AEs on the testing data of different SWSs: (a) voltage errors; (b) temperature errors; (c) SOC errors.
When the vehicle is being charged, the vehicle speed and brake pedal stroke value are constantly zero, so the input parameters actually carrying information are only the three target parameters; yet all five input parameters are involved in training the many-to-many(5–3) model, which adversely affects its prediction on the charging data. Therefore, the many-to-many(3–3) model has slightly smaller MREs and a better prediction effect on the charging data than the many-to-many(5–3) model. To ensure prediction accuracy in both the charging and driving states, we propose the novel DMC prediction strategy, which makes the many-to-many(5–3) and many-to-many(3–3) models work separately during the driving and charging stages. The strategy is detailed as follows. First, the vehicle state is judged from the battery pack current I and vehicle speed V, as shown in Table 2 (a minimal dispatch sketch follows the table). Under the sign convention of Table 2, I > 0 when the battery is discharging and I < 0 when it is being charged. It follows from Table 2 that the vehicle is judged to be in the charging state when I < 0 and V = 0, and all other conditions except I = 0 are regarded as the driving state. Prediction is then performed with the many-to-many(5–3) model when the vehicle is in the complex and variable driving state, and the many-to-many(3–3) model is switched in when the vehicle is in the stable charging state. Additionally, since both models are well trained offline in advance, the DMC prediction strategy adds no testing time compared with using a single model. Therefore, the DMC prediction strategy maximizes prediction accuracy without affecting prediction time. Under the existing computer configuration and the selected hyperparameters, with 10-fold cross-validation, the average times required to train the LSTM model are 69.38 h, 16.10 h, and 4.26 h for IS = one-year data, IS = one-month data, and IS = one-week data, respectively. Training a required LSTM model thus takes a long time, but once the hyperparameters are pre-optimized offline, the LSTM model needs only one training session. The LSTM model is trained offline on historical data, so the training time has no impact on the online prediction of the target parameters. The testing time on the testing data is 7 s regardless of the IS, less than the 10 s sampling interval of the SMC-EV. Hence, the proposed LSTM model can perform synchronous online prediction of multiple battery parameters. Furthermore, in actual applications, real-time prediction of battery parameters requires only the current one-hour data of a WS to predict the parameters over the next PWS, so the required computing time is much shorter.
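The one-hour rolling buffer behind this real-time prediction might look like the following sketch; the class name and the Keras-style predict call are assumptions:

```python
import collections
import numpy as np

WS = 360   # one hour of 10-s samples

class OnlinePredictor:
    """Rolling-window online prediction: each new SMC-EV record is appended,
    and once a full WS is buffered, the next PWS steps are predicted."""
    def __init__(self, model):
        self.model = model
        self.buffer = collections.deque(maxlen=WS)

    def on_new_sample(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) < WS:
            return None                                # still filling the first window
        window = np.asarray(self.buffer)[np.newaxis]   # shape (1, WS, n_features)
        return self.model.predict(window, verbose=0)   # shape (1, PWS, 3)
```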
Fig. 25. MREs of different LSTM model architectures on the testing data.
Fig. 26. MREs of different ISs on the testing data.
Fig. 27. MREs of the many-to-many(5–3) and many-to-many(3–3) models tested on charging data and driving data.
Table 2
Vehicle status analysis based on I and V.

| Status of I and V | Possible vehicle status |
|---|---|
| I > 0 and V = 0 | Parking during driving |
| I > 0 and V > 0 | Driving |
| I > 0 and V < 0 | Reversing |
| I < 0 and V = 0 | Charging |
| I < 0 and V > 0 | Brake energy recovery |
| I < 0 and V < 0 | Never happens |
| I = 0 | The vehicle was not activated |
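A minimal sketch of the DMC dispatch implied by Table 2; the model handles and the assumption that the three target channels sit last in the input window are illustrative:

```python
def dmc_predict(window, I, V, model_53, model_33):
    """Dual-model cooperation: many-to-many(5-3) while driving,
    many-to-many(3-3) while charging (Table 2 sign conventions)."""
    if I == 0:
        return None                                   # vehicle not activated
    if I < 0 and V == 0:                              # stable charging state
        # The 3-3 model consumes only the three target-parameter channels,
        # assumed here to be the last three columns of the window.
        return model_33.predict(window[:, :, -3:], verbose=0)
    return model_53.predict(window, verbose=0)        # variable driving state
```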
Fig. 24. The prediction results of LSTM-oriented multi-parameter based on the selected SWS: (a) prediction results of cell voltage; (b) prediction results of probe temperature; (c) prediction results of SOC.
Fig. 28. MREs for different PWSs when IS = one-year data and IS = one-month data using the DMC prediction strategy.
Using the DMC prediction strategy, we retrain the LSTM models and test them with the same hyperparameters and testing data. The MREs for several PWSs with IS = one-year data and IS = one-month data under the DMC prediction strategy are shown in Fig. 28. Fig. 28 indicates that the MRE values gradually increase as the PWS grows, consistent with the conclusions of Fig. 19. When PWS = 30 under the DMC prediction strategy, the MREs with IS = one-year data for voltage, temperature, and SOC are 0.7%, 2.02%, and 0.58%, respectively; with IS = one-month data they are 0.75%, 2.28%, and 0.66%. Both sets of MREs are smaller than those obtained without this strategy, as shown in Fig. 26. Moreover, the prediction with IS = one-year data is better than that with IS = one-month data for the same PWS, but the results are very close: the two models differ little in predictive ability on the testing data. The main reasons lie in the ISs and the testing data. First, although the data volumes of the two ISs differ hugely, the underlying characteristics of the training data are very close, so the learned content is also very close. Second, both ISs contain all the attributes of the selected testing data, so both models predict the testing data well. Fig. 28 also indicates that different PWSs lead to different prediction effects for the proposed LSTM model: a smaller PWS yields a more accurate prediction, and vice versa. For example, with IS = one-year data and PWS = 1, the MREs of the three target parameters reach their minima of 0.58%, 0.82%, and 0.51%, respectively, and the corresponding prediction results are shown in Fig. 29. However, this research aims to predict multiple battery-system parameters synchronously and to serve future battery fault prognosis, so an excessive pursuit of a small PWS does not necessarily give the ideal result. As shown in Fig. 30, a small PWS represents a short-term prediction, with higher accuracy but a shorter available time to take urgent action once a battery fault is predicted; a large PWS means a long-term prediction, with lower accuracy but more time to take safety measures. Therefore, the choice of PWS depends on the target prediction duration and the required prediction accuracy. For future practical applications of the proposed LSTM model, to simultaneously improve the prediction accuracy and extend the prediction horizon, it is necessary to continuously enrich the IS and further optimize the model hyperparameters.
Fig. 30. The relationship between different PWSs and prediction accuracies.
4.3.8. Comparison of different prediction methods

This paper presents a novel multi-forward-step prediction technique for multiple battery-system parameters based on LSTM networks. Existing work on predicting future battery states relies mainly on the characteristics of the current battery parameters, mostly estimating the current parameter states or predicting only one step ahead; few studies, such as [5,8], predict future parameters directly. The proposed LSTM model realizes real-time prediction, with one PWS ahead predicted at each time step. The prediction results of the proposed LSTM model with IS = one-year data and PWS = 1 are therefore taken as the comparison object and are compared with the other algorithms reported in the literature, as shown in Table 3. The comparison results in Table 3 indicate that the proposed LSTM model achieves competitive prediction performance relative to the other methods. More significantly, the proposed LSTM model, trained on a one-year IS, can serve all-climate electric vehicle applications, and it can accurately predict multiple battery parameters synchronously without being restricted by complex operating environments. This method therefore has broad application prospects in synchronous multi-parameter prediction and safety management for the battery systems of EVs.

5. Conclusions

This paper has combined the LSTM method, for the first time, with multi-forward-step parameter prediction for battery systems. Machine-learning algorithms have proven to be a powerful tool for parameter prediction. Given the year-long data acquired from the SMC-EV, the LSTM networks realize synchronous multi-parameter prediction of battery systems for all-climate and all-state applications in EVs. This paper has shown how the method can self-learn the historical data of multiple parameters simultaneously even when exposed to scarce datasets. The developed Weather-Vehicle-Driver analysis method allows the LSTM model to consider both environmental factors and human interventions, and the improved pre-dropout technique shows its capacity to prevent the LSTM from overfitting by picking out the most suitable parameters before training. Additionally, the proposed DMC prediction strategy can intelligently apply the two models in the vehicle's driving and charging states, respectively. Meanwhile, the
Fig. 29. The prediction results of LSTM-oriented multi-parameter when IS = one-year data and PWS = 1: (a) prediction results of cell voltage; (b) prediction results of probe temperature; (c) prediction results of SOC.
Table 3
Comparison of errors for various studies.

| Method | Error | Lithium battery | Test environment |
|---|---|---|---|
| Battery model based on simplified physical analysis [1] | Voltage: MRE < 5% | 2.9 Ah Panasonic 18650 | −10, 0, 20, and 45 °C |
| LSTM-RNN battery model [6] | Voltage: MRE < 5% | 2.9 Ah Panasonic NCR18650PF | −20 to 25 °C |
| KLMS-X filter algorithm [2] | Voltage: RMSE < 0.329 | 4 Ah | A running electric bicycle |
| Two-step prediction approach for temperature rise [11] | Temperature: MRE < 3.05% | 2 Ah 18650 | 20, 30, and 45 °C (ambient) |
| Kalman filter [9] | Temperature: RMSE < 0.269 | 40 Ah | 60 A charge/discharge |
| AUKF with LSSVM battery model [47] | SOC: MRE < 2% | 70 Ah Kokam | 25 °C (ambient) |
| LSTM-RNN battery model [25] | SOC: MRE < 0.6% | 2.9 Ah Panasonic 18650 | 0, 10, and 25 °C (ambient) |
| Fuzzy NN with genetic algorithm [48] | SOC: MRE < 0.9% | 10 Ah Lyno Power LYS347094S | 25 °C (ambient) |
| The LSTM-oriented model (predictor in this paper) | Voltage: MRE < 0.58%, RMSE < 0.037; Temperature: MRE < 0.82%, RMSE < 0.1513; SOC: MRE < 0.49% | 2.7 Ah 18650 | All-climate application |
comparative simulation results have also shown this strategy's superiority in improving prediction accuracy without extending prediction time. Furthermore, the LSTM model well trained on the offline dataset can implement fast online prediction, and the training process does not interfere with the speed or accuracy of prediction, which guarantees the stability and robustness of the prediction results. More importantly, based on accurate multi-parameter prediction, the proposed LSTM model is also feasible for the prognosis of various battery faults. With prediction accuracy ensured, the adjustable PWSs can provide sufficient reaction time for drivers and passengers once a serious fault is predicted. More datasets will be integrated into the IS in the future, especially historical data containing various faults from vehicles of the same model and batch, so that the learning capability of the LSTM model can be gradually improved. Updating the LSTM model periodically or as needed will make the proposed technique more applicable to the multi-parameter prediction and safety management of battery systems in EVs.
Acknowledgements

This work is supported in part by China NSF under Grants U1564206 and 61733003, by USA NSF under Grant 1507096, and by the Chinese Scholarship Council (CSC) [2017] 3109.

References

[1] Rakhmatov DN. Battery voltage prediction for portable systems; 2005.
[2] Tobar F, Castro I, Silva J, Orchard M. Improving battery voltage prediction in an electric bicycle using altitude measurements and kernel adaptive filters. Pattern Recogn Lett 2017.
[3] Li K, Wei F, Tseng KJ, Soong BH. A practical lithium-ion battery model for state of energy and voltage responses prediction incorporating temperature and ageing effects. IEEE Trans Industr Electron 2017;PP(99):1.
[4] Karbalaei F, Shahbazi H. A quick method to solve the optimal coordinated voltage control problem based on reduction of system dimensions. Electric Power Syst Res 2017;142:310–9.
[5] Wang Z, Hong J, Liu P, Zhang L. Voltage fault diagnosis and prognosis of battery systems based on entropy and Z-score for electric vehicles. Appl Energy 2017;196:289–302.
[6] Zhao R, Kollmeyer PJ, Lorenz RD, Jahns TM. A compact unified methodology via a recurrent neural network for accurate modeling of lithium-ion battery voltage and state-of-charge; 2017.
[7] Yuksel T, Litster S, Viswanathan V, Michalek JJ. Plug-in hybrid electric vehicle LiFePO4 battery life implications of thermal management, driving conditions, and regional climate. J Power Sources 2017;338:49–64.
[8] Hong J, Wang Z, Liu P. Big-data-based thermal runaway prognosis of battery systems for electric vehicles. Energies 2017;10(7):919.
[9] Sun J, Wei G, Pei L, Lu R, Song K, Wu C, et al. Online internal temperature estimation for lithium-ion batteries based on Kalman filter. Energies 2015;8(5):4400–15.
[10] Feng X, Gooi HB, Chen SX. An improved lithium-ion battery model with temperature prediction considering entropy; 2012.
[11] Chen Z, Xiong R, Lu J, Li X. Temperature rise prediction of lithium-ion battery suffering external short circuit for all-climate electric vehicles application. Appl Energy 2018;213:375–83.
[12] Liu X, Chen Z, Zhang C, Wu J. A novel temperature-compensated model for power Li-ion batteries with dual-particle-filter state of charge estimation. Appl Energy 2014;123(3):263–72.
[13] Dong G, Wei J, Zhang C, Chen Z. Online state of charge estimation and open circuit voltage hysteresis modeling of LiFePO4 battery using invariant imbedding method. Appl Energy 2016;162(1):163–71.
[14] Kong SN, Moo CS, Chen YP, Hsieh YC. Enhanced coulomb counting method for estimating state-of-charge and state-of-health of lithium-ion batteries. Appl Energy 2009;86(9):1506–11.
[15] Hu X, Sun F, Zou Y. Estimation of state of charge of a lithium-ion battery pack for electric vehicles using an adaptive Luenberger observer. Energies 2010;3(9):1586–603.
[16] Kim IS. The novel state of charge estimation method for lithium battery using sliding mode observer. J Power Sources 2006;163(1):584–90.
[17] Barillas JK, Li J, Günther C, Danzer MA. A comparative study and validation of state estimation algorithms for Li-ion batteries in battery management systems. Appl Energy 2015;155:455–62.
[18] Lin C, Mu H, Xiong R, Cao J. Multi-model probabilities based state fusion estimation method of lithium-ion battery for electric vehicles: state-of-energy. Appl Energy 2017;194:560–8.
[19] Xiong R, Sun F, He H, Nguyen TD. A data-driven adaptive state of charge and power capability joint estimator of lithium-ion polymer battery used in electric vehicles. Energy 2013;63(1):295–308.
[20] He H, Xiong R, Guo H. Online estimation of model parameters and state-of-charge of LiFePO4 batteries in electric vehicles. Appl Energy 2012;89(1):413–20.
[21] Xiong R, Sun F, Gong X, Gao C. A data-driven based adaptive state of charge estimator of lithium-ion polymer battery used in electric vehicles. Appl Energy 2014;113(1):1421–33.
[22] He H, Xiong R, Peng J. Real-time estimation of battery state-of-charge with unscented Kalman filter and RTOS µC/OS-II platform. Appl Energy 2015;162.
[23] Zheng LC, Li LX, Zheng YH, Wang X, Zhao JM, Chen HT. Simplified least squares support vector machines for lead-acid batteries SOC estimation. Appl Mech Mater 2014;672–674:680–3.
[24] Eddahech A, Briat O, Vinassa JM. Adaptive voltage estimation for EV Li-ion cell based on artificial neural networks state-of-charge meter; 2012.
[25] Chemali E, Kollmeyer P, Preindl M, Ahmed R, Emadi A. Long short-term memory networks for accurate state of charge estimation of Li-ion batteries. IEEE Trans Industr Electron 2017;PP(99):1.
[26] Lv C, Xing Y, Zhang J, Na X, Li Y, Liu T, et al. Levenberg–Marquardt backpropagation training of multilayer neural networks for state estimation of a safety-critical cyber-physical system. IEEE Trans Industr Inf 2018;14(8):3436–46.
[27] Michel PH, Heiries V. An adaptive sigma point Kalman filter hybridized by support vector machine algorithm for battery SoC and SoH estimation; 2015.
[28] Wang H, Hao Z, Hu Y, Li G. Power state prediction of battery based on BP neural network; 2012.
[29] Zhang Y, Xiong R, He H, Pecht M. Lithium-ion battery remaining useful life prediction with Box–Cox transformation and Monte Carlo simulation. IEEE Trans Industr Electron 2018;PP(99):1.
[30] Lv C, Hu X, Sangiovanni-Vincentelli A, Li Y, Martinez CM, Cao D. Driving-style-based codesign optimization of an automated electric vehicle: a cyber-physical system approach. IEEE Trans Industr Electron 2019;66(4):2965–75.
[31] Lv C, Xing Y, Lu C, Liu Y, Guo H, Gao H, et al. Hybrid-learning-based classification and quantitative inference of driver braking intensity of an electrified vehicle. IEEE Trans Veh Technol 2018;67(7):5718–29.
[32] Sutskever I. Training recurrent neural networks. Doctoral thesis; 2013.
[33] Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. World Scientific Publishing Co., Inc.; 1998.
[34] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
[35] Tikhonov AN. On the stability of inverse problems. Dokl Akad Nauk SSSR 1943;39(5):176–9.
[36] Tibshirani R. Regression shrinkage and selection via the lasso. J Roy Stat Soc 2011;73(3):273–82.
[37] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
[38] Brownlee J. Time series prediction with LSTM recurrent neural networks in Python with Keras. machinelearningmastery.com; 2016.
[39] Kapoor A, Guili A. TensorFlow 1.x Deep Learning Cookbook; 2017.
[40] Seide F, Agarwal A. CNTK: Microsoft's open-source deep-learning toolkit; 2016.
[41] Bergstra J, Bastien F, Breuleux O, Lamblin P, Pascanu R, Delalleau O, et al. Theano: deep learning on GPUs with Python. NIPS 2011.
[42] Zhang Y, Xiong R, He H, Pecht M. Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE Trans Veh Technol 2018;PP(99):1.
[43] Kingma DP, Ba J. Adam: a method for stochastic optimization. Comput Sci 2014.
[44] Galton F. Section H: Anthropology: opening address. Nature 1885;32:507–10.
[45] Pearson K. Note on regression and inheritance in the case of two parents. Proc Roy Soc Lond 1895;58:240–2.
[46] Geisser S. Predictive inference: an introduction. Chapman & Hall; 1993.
[47] Meng J, Luo G, Gao F. Lithium polymer battery state-of-charge estimation based on adaptive unscented Kalman filter and support vector machine. IEEE Trans Power Electron 2016;31(3):2226–38.
[48] Lee YS, Wang WY, Kuo TY. Soft computing for battery state-of-charge (BSOC) estimation in battery string systems. IEEE Trans Industr Electron 2008;55(1):229–39.