ARTICLE IN PRESS
Engineering Applications of Artificial Intelligence 19 (2006) 277–287 www.elsevier.com/locate/engappai
Prediction of automotive engine power and torque using least squares support vector machines and Bayesian inference

Chi-Man Vong (a, *), Pak-Kin Wong (b), Yi-Ping Li (a)

(a) Department of Computer and Information Science, University of Macau, P.O. Box 3001, Macau, China
(b) Department of Electromechanical Engineering, University of Macau, P.O. Box 3001, Macau, China

Received 12 November 2004; received in revised form 25 July 2005; accepted 30 August 2005. Available online 21 October 2005.
Abstract

Automotive engine power and torque are significantly affected by engine tune-up. Current practice of engine tune-up relies on the experience of the automotive engineer: the tune-up is usually done by trial and error, and the vehicle engine is then run on a dynamometer to show the actual engine output power and torque. This practice costs a large amount of time and money, and may even fail to tune up the engine optimally because no formal power and torque model of the engine has been determined. With an emerging technique, least squares support vector machines (LS-SVM), an approximate power and torque model of a vehicle engine can be determined by training on sample data acquired from the dynamometer. The number of dynamometer tests for an engine tune-up can therefore be reduced, because the estimated engine power and torque functions can replace the dynamometer tests to a certain extent. In addition, a Bayesian framework is applied to infer the hyperparameters used in LS-SVM, which eliminates the work of cross-validation and leads to a significant reduction in training time. In this paper, the construction, validation and accuracy of the functions are discussed. The study shows that the results predicted by the estimated LS-SVM model are in good agreement with the actual test results. To illustrate the significance of the LS-SVM methodology, the results are also compared with those obtained from a multilayer feedforward neural network.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Automotive engine power and torque; Least squares support vector machines; Bayesian inference
* Corresponding author. Tel.: +86 853 3974476. E-mail address: [email protected] (C.-M. Vong).
0952-1976/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.engappai.2005.09.001

1. Introduction

1.1. ECU tune-up

Modern automotive gasoline engines are controlled by an electronic control unit (ECU). The engine output power and torque are significantly affected by the setup of the control parameters in the ECU, many of which are stored as look-up tables or maps. Normally, engine power and torque data are obtained through dynamometer tests. An example of engine output horsepower and torque against engine speed is shown in Fig. 1. The engine power and torque reflect the dynamic performance of an engine. Traditionally, the ECU setup is done by the vehicle manufacturer. In recent years, however, programmable ECUs and ECU read-only memory (ROM) editors have been widely adopted in many passenger cars. These devices allow non-OEM engineers to tune up engines according to different add-on components and drivers' requirements.

Current practice of engine tune-up relies on the experience of the automotive engineer, who handles a huge number of combinations of engine control parameters. The relationship between the input and output parameters of a modern car engine is a complex multivariable nonlinear model, which is very difficult to estimate because a modern automotive engine integrates thermofluid, electromechanical and computer control systems. Consequently, engine tune-up is usually done by trial and error. First, the engineer guesses an ECU setup based on his or her experience and stores the setup values in the ECU. Then, the engine is run on a dynamometer to test the actual engine power and torque. If the performance has degraded, the engineer adjusts the ECU setting and repeats the procedure until the performance is satisfactory. This is why vehicle manufacturers normally spend many months tuning up an ECU optimally for a new car model. Moreover, the power and torque functions are engine dependent, so every engine requires a similar tune-up procedure.

[Fig. 1. Example of engine output horsepower and torque curves (torque and horsepower against engine speed, 2000 to 5000 RPM).]

By knowing the power and torque functions, automotive engineers can predict whether a trial ECU setup yields a gain or a loss. The car engine then only needs a dynamometer test for verification after a satisfactory setup has been estimated from the functions. Hence, the number of unnecessary dynamometer tests for trial setups can be drastically reduced, saving a large amount of time and money.

1.2. Neural networks and their drawbacks

Recent studies (Brace, 1998; Traver et al., 1999; Su et al., 2002; Yan et al., 2003; Liu and Fei, 2004) have described the use of neural networks for modeling diesel engine emission performance based on experimental data. It is well known that a neural network (Bishop, 1995; Haykin, 1999) is a universal estimator. In general, however, it has two main drawbacks in its learning process (Smola et al., 1996; Schölkopf and Smola, 2002):

(1) The architecture, including the number of hidden neurons, has to be determined a priori or modified heuristically during training, which results in a network structure that is not necessarily optimal.
(2) The training process (i.e., the minimization of the residual squared error cost function) in neural networks can easily get stuck in local minima. Various ways of preventing this, such as early stopping and weight decay, are employed. However, these methods greatly affect the generalization of the estimated function, i.e., its capacity to handle new input cases.

1.3. Nonlinear regression and its drawbacks
Traditional mathematical methods for nonlinear regression (Sen and Srivastava, 1990; Ryan, 1996; Harrell, 2001; Tabachnick and Fidell, 2001; Seber and Wild, 2003) may be applied to estimate the engine power and torque models. These methods work by transforming the nonlinear data space into a linear data space, i.e., removing the nonlinearity, and then performing linear regression over the transformed data space. The drawbacks of nonlinear regression methods are:

(1) The nonlinear transformations are not guaranteed to retain the information in the data. After the transformation, the training data are usually distorted, which affects the predictive ability of the model regressed from the transformed training data.
(2) The nonlinear transformations work well only for low-dimensional data sets. In the current application, an engine setup involves many parameters, and constructing prediction models in such a high-dimensional, nonlinear data space is very difficult for traditional regression methods. It is therefore not recommended to apply traditional nonlinear regression methods to high-dimensional data sets.
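As a concrete illustration of the transform-then-regress idea, the minimal Python sketch below (our own example, not taken from the cited references) fits an exponential model y = A·exp(Bx) by taking logarithms and running ordinary least squares on (x, ln y). The function name and the synthetic data are hypothetical.

```python
import math

def fit_exponential_by_log(xs, ys):
    """Fit y = A * exp(B * x) via ordinary least squares on (x, ln y).

    Taking logarithms turns the nonlinear model into the straight line
    ln y = ln A + B * x, so a linear fit in the transformed space yields
    ln A (intercept) and B (slope). All ys must be positive.
    """
    n = len(xs)
    ls = [math.log(v) for v in ys]              # transformed targets ln y
    mean_x, mean_l = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    B = sxl / sxx                               # slope in the transformed space
    A = math.exp(mean_l - B * mean_x)           # back-transform the intercept
    return A, B

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]      # noise-free synthetic data
print(fit_exponential_by_log(xs, ys))
```

Because the fit minimizes squared errors in ln y rather than in y, it effectively weights relative rather than absolute deviations; this is one concrete way the transformation can distort the training data, as described in drawback (1).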
1.4. Support vector machines (SVM) for engine output prediction subject to ECU tune-up

With the emerging technique of support vector machines (SVM) (Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002; Suykens et al., 2002), which combines the advantages of neural networks (handling large amounts of highly nonlinear data) and nonlinear regression (high generalization), the issue of high dimensionality as well as the drawbacks of neural networks described above can be overcome. For this reason, SVM is employed to estimate the engine power and torque models for precise performance prediction, so that the number of dynamometer tests, which normally cost a large amount of money and time, can be significantly reduced. Moreover, a dynamometer is not always available, particularly in the case of on-road fine tune-up. Research on predicting modern gasoline engine output power and torque under various ECU parameter setups is still quite rare, so the use of SVM for modeling engine output power and torque is the first attempt.

2. Support vector machines

SVM is an interdisciplinary field of machine learning, optimization, statistical learning and generalization theory. It can also be viewed as another category of feedforward network, as illustrated in Fig. 2 (Haykin, 1999). Basically, it can be used for pattern classification and function estimation (Gunn, 1998). Since this paper focuses on function estimation, the discussion is restricted to function estimation issues; for pattern classification, Gunn (1998), Haykin (1999) and Smola et al. (1996) provide valuable references.

[Fig. 2. Neural network interpretation of SVM (Haykin, 1999): an input layer of size m0 receiving the input vector x, a hidden layer of m1 inner-product kernels K(x, x_i), and a linear output neuron with bias b producing y.]

SVM is a general framework for formulating the mathematical program for the training error function in any application. Regardless of the application, SVM formulates the training process (i.e., minimization of the squared residual error function) as a quadratic programming (QP) problem in the weights, with a regularization factor included. Since this QP problem is convex, the solution returned is global (or even unique) rather than one of many local ones, unlike neural networks. This property underlies the high generalization of trained SVM models compared with neural networks. Another important appeal of SVM over traditional regression methods is its ability to handle very high nonlinearity. Similar to nonlinear regression, SVM transforms the low-dimensional nonlinear input data space into a high-dimensional linear feature space through a nonlinear mapping φ: R^n → R^{n_h} (Fig. 3), where n is the dimension of the data space and n_h is the (very high, possibly infinite) dimension of the unknown feature space. Linear function estimation can then be performed over the feature space. The problem now becomes finding this nonlinear mapping φ for the primal formulation (Fig. 4). Nevertheless, the SVM dual formulation (Fig. 4) provides an inner-product kernel trick, K(x_k, x_l) = φ(x_k)^T φ(x_l), which completely eliminates the effort of finding the nonlinear mapping φ, as would be necessary in the primal formulation and in traditional nonlinear regression methods. This trick is also illustrated in Section 2.4 (Eqs. (6) and (7)). Hence, in the dual formulation it is the kernel function K that must be defined rather than the nonlinear mapping φ.

[Fig. 3. Nonlinear mapping from nonlinear data space to high-dimensional linear feature space (Suykens et al., 2002).]

Fortunately, three common kernel functions are available (Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002; Suykens et al., 2002), among which the radial basis function (RBF) kernel is usually the best choice for nonlinear function estimation.
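The kernel trick can be checked numerically. The Python sketch below is only an illustration: it uses a degree-2 polynomial kernel on R^2 because its feature map φ is small enough to write out explicitly, whereas the RBF kernel used in this paper has an infinite-dimensional φ, so for RBF only K itself is computed. All function names are hypothetical.

```python
import math

def phi_poly2(x):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel on R^2:
    # phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k_poly2(x, z):
    # Kernel trick: equals dot(phi(x), phi(z)) without ever forming phi
    return dot(x, z) ** 2

def k_rbf(x, z, sigma2):
    # RBF kernel in the form used in Eqs. (2) and (7): exp(-||x - z||^2 / sigma^2)
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / sigma2)

x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(phi_poly2(x), phi_poly2(z)), k_poly2(x, z))
print(k_rbf(x, x, 0.5), k_rbf(x, z, 0.5))
```

Both values on the first printed line agree up to floating-point rounding, which is exactly the identity K(x_k, x_l) = φ(x_k)^T φ(x_l).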
2.1. SVM formulation for nonlinear function estimation

Consider the data set D = {(x_1, y_1), ..., (x_N, y_N)} with N data points, where x_i ∈ R^n and y_i ∈ R. The SVM dual formulation for nonlinear regression is expressed as follows (Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002; Suykens et al., 2002):

$$
\begin{aligned}
\min_{\alpha,\alpha^*}\; J(\alpha,\alpha^*) &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(x_i,x_j) + \varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i^*) - \sum_{i=1}^{N}y_i(\alpha_i-\alpha_i^*)\\
\text{s.t.}\quad &\sum_{i=1}^{N}(\alpha_i-\alpha_i^*) = 0,\qquad \alpha_i,\alpha_i^*\in[0,c]
\end{aligned}
\tag{1}
$$

where α and α* are the Lagrange multipliers (each expressed as an N-dimensional vector), with α_i, α_j ∈ α and α_i*, α_j* ∈ α* for 1 ≤ i, j ≤ N, and α_i, α_j, α_i*, α_j* ∈ [0, c]; K is the kernel function; ε is a user-predefined constant specifying the width of the ε-insensitive zone; and c is a user-predefined positive real constant for capacity control.
[Fig. 4. Primal-dual neural network interpretations of SVM (Suykens et al., 2002). Primal problem (P): the parameter to estimate is w ∈ R^{n_h}, with y(x) = sign[w^T φ(x) + b], where n_h is the number of hidden neurons φ_1(x), ..., φ_{n_h}(x). Kernel trick: K(x_k, x_l) = φ(x_k)^T φ(x_l), where x_k and x_l are the kth and lth training data points. Dual problem (D): the parameter to estimate is α ∈ R^N, with y(x) = sign[Σ_{k=1}^{#sv} α_k y_k K(x, x_k) + b], where #sv, the number of support vectors, is less than N; the hidden neurons are K(x, x_1), ..., K(x, x_{#sv}).]
Nonzero α_i and α_i* are known as support values corresponding to the ith data point, where the ith data point means the ith engine setup and its output torque. The RBF kernel with user-predefined sample variance σ² is chosen as the kernel function because it often gives good results for nonlinear regression (Suykens et al., 2002; Seeger, 2004). After Eq. (1) has been solved with a commercial optimization package, such as MATLAB and its Optimization Toolbox, two N-vectors α and α* are obtained as the solution, resulting in the following target nonlinear model:

$$
M(x) = \sum_{i=1}^{N}(\alpha_i-\alpha_i^*)\,K(x,x_i) + b = \sum_{i=1}^{N}(\alpha_i-\alpha_i^*)\,e^{-\|x-x_i\|^2/\sigma^2} + b, \tag{2}
$$

where b is the bias constant, x is the new engine input setup with n parameters and σ² is the user-specified sample variance. To obtain the bias b, m training data points d_k = ⟨x_k, y_k⟩ ∈ D, k = 1, 2, ..., m, are selected such that their corresponding α_k or α_k* lies in (0, c), i.e., 0 < α_k < c or 0 < α_k* < c. By substituting x_k into Eq. (2) and setting M(x_k) = y_k, a bias b_k can be obtained. Since there are m such biases, the optimal bias value b* is usually obtained by taking their average, as shown in Eq. (3):

$$
b^* = \frac{1}{m}\sum_{k=1}^{m} b_k. \tag{3}
$$
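The following Python sketch evaluates the model of Eq. (2) and the averaged bias of Eq. (3) for a given dual solution. It assumes α and α* have already been obtained from a QP solver for Eq. (1); the values used below are placeholders that satisfy the equality constraint of Eq. (1), not an actual solution, and the helper names are hypothetical. Following the text, each b_k is obtained from M(x_k) = y_k.

```python
import math

def rbf(x, z, sigma2):
    # K(x, z) = exp(-||x - z||^2 / sigma^2), as in Eq. (2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)

def kernel_sum(x, X, alpha, alpha_star, sigma2):
    # sum_i (alpha_i - alpha_i*) K(x, x_i); points with alpha_i = alpha_i* = 0
    # (non-support vectors) are skipped, which is the sparseness property
    return sum((a - a_s) * rbf(x, xi, sigma2)
               for xi, a, a_s in zip(X, alpha, alpha_star)
               if a != 0.0 or a_s != 0.0)

def average_bias(X, y, alpha, alpha_star, c, sigma2):
    # Eq. (3): average b_k over points whose alpha_k or alpha_k* lies strictly in (0, c),
    # using the text's rule M(x_k) = y_k, i.e. b_k = y_k - kernel_sum(x_k)
    biases = [yk - kernel_sum(xk, X, alpha, alpha_star, sigma2)
              for xk, yk, a, a_s in zip(X, y, alpha, alpha_star)
              if 0.0 < a < c or 0.0 < a_s < c]
    if not biases:
        raise ValueError("no free support vectors available to estimate b")
    return sum(biases) / len(biases)

def predict(x, X, alpha, alpha_star, b, sigma2):
    # Eq. (2): M(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b
    return kernel_sum(x, X, alpha, alpha_star, sigma2) + b

# Placeholder dual values (hypothetical, not from the paper); sum(alpha - alpha*) = 0
X = [(0.0,), (0.5,), (1.0,)]
y = [0.2, 0.6, 0.9]
alpha = [0.0, 0.3, 0.0]
alpha_star = [0.3, 0.0, 0.0]   # the third point is not a support vector
c, sigma2 = 1.0, 0.5
b = average_bias(X, y, alpha, alpha_star, c, sigma2)
print(b, predict((0.25,), X, alpha, alpha_star, b, sigma2))
```

Only points with a nonzero support value enter the sum, so the third point never affects M(x); this is the sparseness discussed in Section 2.2.2.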
2.2. Comparing SVM with neural networks and nonlinear regression

Since SVM combines the advantages of neural networks and nonlinear regression, it is compared with each of them below and the respective advantages are listed.

2.2.1. SVM vs. neural networks

Both SVM and neural networks can handle highly nonlinear function estimation. However, they differ in the following respects:

(1) The architecture of an SVM model does not have to be determined before training. Input data of arbitrary dimensionality can be treated at only linear cost in the number of input dimensions. In addition, the number of support vectors (i.e., hidden neurons in neural networks) need not be specified a priori.
(2) SVM treats function estimation as a QP problem of minimizing the data-fitting error function plus regularization, which produces a global (or even unique) solution with minimal fitting error, while high generalization of the estimated model is also obtained. For neural networks, the formulation of the training error function usually leads to many local solutions.

2.2.2. SVM vs. nonlinear regression

Both SVM and nonlinear regression can estimate nonlinear functions with high generalization. However, they differ in the following respects:

(1) The inner-product kernel trick K(x_k, x_l) = φ(x_k)^T φ(x_l) is applied in the SVM dual formulation so that the nonlinear mapping φ need not be specified, whereas this difficult nonlinear mapping must be explicitly specified or guessed in traditional nonlinear regression methods. This is illustrated in Section 2.4 (Eqs. (6) and (7)). This factor usually prevents nonlinear regression from handling high nonlinearity.
(2) An interesting property of SVM is sparseness, which results from solving the QP formulation. In solving Eq. (1), most of the support values α_i and α_i* are set to zero. Hence, only the data points x_k with nonzero support values α_k or α_k* are involved in computing Eq. (2); these data points are called support vectors. The number of support vectors (#sv) is determined during training, which explains why SVM does not require the number of support vectors to be specified a priori. The estimated model in Eq. (2) thus involves a set of #sv ≪ N support vectors, which makes it very compact and efficient at run time. Under these circumstances, SVM can handle much larger amounts of training data (up to millions of points), whereas nonlinear regression can usually handle only up to hundreds of training data points.
2.3. Least squares support vector machines

Least squares support vector machines (LS-SVM) (Suykens et al., 2002) is a variant of SVM that employs a least squares error in the training error function. SVM solves nonlinear function estimation problems by means of convex quadratic programs, and sparseness is obtained as a result of this QP problem. However, QP problems are relatively difficult to solve. Although many commercial packages exist for solving QP problems, a simpler formulation is still preferable. LS-SVM modifies the original SVM formulation so that training amounts to solving a set of linear equations, which is easier than solving a QP problem, while most of the important advantages of SVM are retained. In addition, LS-SVM has the following advantages over standard SVM:

(1) The threshold b is returned automatically as part of the LS-SVM solution, whereas SVM must calculate b separately.
(2) The number of hyperparameters to tune is reduced from three (ε, c, σ) to two (γ, σ).
(3) A Bayesian inference procedure has been developed to automatically find the most appropriate values for the hyperparameters γ and σ, which eliminates the burden of a manual cross-validation procedure for estimating the values of ε, c and σ.
2.4. LS-SVM formulation for nonlinear function estimation

Consider the data set D = {(x_1, y_1), ..., (x_N, y_N)} with N data points, where x_k ∈ R^n, y_k ∈ R, k = 1, ..., N. LS-SVM deals with the following optimization problem in the primal weight space:

$$
\begin{aligned}
\min_{w,b,e}\; J_P(w,e) &= \frac{1}{2}w^{T}w + \gamma\,\frac{1}{2}\sum_{k=1}^{N}e_k^2\\
\text{s.t.}\quad e_k &= y_k - [w^{T}\varphi(x_k) + b],\qquad k = 1,\dots,N
\end{aligned}
\tag{4}
$$

where w ∈ R^{n_h} is the weight vector of the target function, e = [e_1, ..., e_N] is the residual vector, φ: R^n → R^{n_h} is a nonlinear mapping, n is the dimension of x_k, and n_h is the dimension of the unknown feature space. Solving the dual of Eq. (4) avoids the high (and unknown) dimensionality of w. The LS-SVM dual formulation for nonlinear function estimation is then expressed as follows (Suykens et al., 2002):

Solve in α, b:

$$
\begin{bmatrix} 0 & \mathbf{1}_v^{T} \\ \mathbf{1}_v & \Omega + \frac{1}{\gamma} I_N \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix}
\tag{5}
$$

where I_N is the N × N identity matrix, y = [y_1, ..., y_N]^T, 1_v = [1, ..., 1]^T is an N × 1 vector of ones, α = [α_1, ..., α_N]^T, and γ ∈ R is a scalar for regularization (a hyperparameter to be tuned). The kernel trick is employed as follows:

$$
\Omega_{k,l} = \varphi(x_k)^{T}\varphi(x_l) = K(x_k,x_l),\qquad k,l = 1,\dots,N, \tag{6}
$$

where K is a predefined kernel function. The resulting LS-SVM model for function estimation becomes

$$
M(x) = \sum_{k=1}^{N}\alpha_k\,\varphi(x_k)^{T}\varphi(x) + b = \sum_{k=1}^{N}\alpha_k\,K(x_k,x) + b = \sum_{k=1}^{N}\alpha_k \exp\!\left(-\frac{\|x_k - x\|^2}{\sigma^2}\right) + b, \tag{7}
$$

where α_k, b ∈ R are the solutions of Eq. (5), x_k is the training data, x is the new input case, and the RBF is chosen as the kernel function K. For the current application, the quantities in Eqs. (5) and (6) are specified as follows:

N: total number of engine setups (data points);
x_k: engine input control parameters in the kth sample data point, k = 1, 2, ..., N (i.e., the kth engine setup);
y_k: engine output torque in the kth sample data point.
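Because Eq. (5) is just a linear system, LS-SVM training can be sketched in a few lines. The Python below is an illustrative sketch, not LS-SVMlab: the hyperparameters γ and σ² are fixed by hand (the paper infers them by Bayesian inference, Section 4.3), the helper names are hypothetical, the data are a toy 1-D set rather than engine data, and a plain Gaussian elimination stands in for a numerical library solver.

```python
import math

def rbf(x, z, sigma2):
    # K(x, z) = exp(-||x - z||^2 / sigma^2), the RBF kernel of Eq. (7)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)

def solve_linear(A, rhs):
    # Gaussian elimination with partial pivoting on an augmented copy of [A | rhs]
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = s / M[i][i]
    return sol

def lssvm_train(X, y, gamma, sigma2):
    # Assemble Eq. (5): [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
    N = len(X)
    A = [[0.0] + [1.0] * N]
    for k in range(N):
        row = [1.0] + [rbf(X[k], X[l], sigma2) for l in range(N)]  # Omega_{k,l}, Eq. (6)
        row[k + 1] += 1.0 / gamma                                  # + (1/gamma) I_N
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[1:], sol[0]  # (alpha, b)

def lssvm_predict(x, X, alpha, b, sigma2):
    # Eq. (7): M(x) = sum_k alpha_k K(x_k, x) + b
    return sum(a_k * rbf(x_k, x, sigma2) for a_k, x_k in zip(alpha, X)) + b

# Toy 1-D data; gamma and sigma2 are fixed by hand for this sketch
X = [(0.0,), (0.25,), (0.5,), (0.75,), (1.0,)]
y = [0.0, 0.4, 0.7, 0.85, 0.9]
gamma, sigma2 = 10.0, 0.2
alpha, b = lssvm_train(X, y, gamma, sigma2)
print(sum(alpha))  # first row of Eq. (5) states sum(alpha) = 0, so this should be ~0
print(lssvm_predict((0.6,), X, alpha, b, sigma2))
```

Each row k of Eq. (5) below the first states b + Σ_l Ω_{k,l} α_l + α_k/γ = y_k, i.e., M(x_k) = y_k − α_k/γ, so α_k/γ plays the role of the residual e_k. A larger γ therefore forces a closer fit to the training data, which is the regularization trade-off that γ controls.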
3. Application of LS-SVM to gasoline engine modeling

In the current application, M(x) in Eq. (7) is the torque function of an automotive engine. The engine power is calculated from the engine torque, as discussed in Section 4. The issues of LS-SVM for this application domain are discussed in the following subsections.

3.1. Schema

The training data set is expressed as D = {d_k} = {(x_k, y_k)}, k = 1, ..., N. In practice, there are many input control parameters, and they are also ECU and engine dependent. Moreover, engine power and torque curves are normally obtained at full-load condition. To demonstrate the LS-SVM methodology, the following common adjustable engine parameters and one environmental parameter are selected as the input (i.e., the engine setup) at engine full-load condition:

x = ⟨I_r, O, t_r, f, J_r, d, a, p⟩ and y = ⟨T_r⟩,
where r is the engine speed (RPM), r ∈ {1000, 1500, 2000, 2500, ..., 8000}; I_r is the ignition spark advance at engine speed r (degrees before top dead centre); O is the overall ignition trim (± degrees before top dead centre); t_r is the fuel injection time at engine speed r (ms); f is the overall fuel trim (±%); J_r is the timing for stopping the fuel injection at engine speed r (degrees before top dead centre); d is the ignition dwell time at 15 V (ms); a is the air temperature (°C); p is the fuel pressure (bar); and T_r is the engine torque at engine speed r (kg m).

The engine speed range for this project has been selected as 1000 to 8000 rpm. Although engine speed r is a continuous variable, in practical ECU setup the engineer normally fills in the setup parameters for each engine speed category in a map format. The map usually divides the speed range discretely at intervals of 500 rpm, i.e., r ∈ {1000, 1500, 2000, 2500, ...}. It is therefore unnecessary to build a function across all speeds, so r is discretized with a specified interval of 500 rather than taking any integer value from 0 to 8000. As the training data are engine speed dependent, the notation D_r is used to denote the data set containing the data for a specific r. For example, D_1000 contains the parameters ⟨I_1000, O, t_1000, f, J_1000, d, a, p, T_1000⟩, while D_8000 contains ⟨I_8000, O, t_8000, f, J_8000, d, a, p, T_8000⟩. Consequently, D is separated into fifteen subsets, namely D_1000, D_1500, ..., D_8000. An example of the training data (ECU setup) for D_1000 is shown in Table 1. Each subset D_r is passed to the LS-SVM regression module, Eq. (5), to construct the torque function M_r (Eq. (7)) for engine speed r. According to this division of the training data, there are 15 torque functions in total, i.e., {M_1000, M_1500, ..., M_8000}, and the LS-SVM module is run fifteen times. In every run, a different subset D_r is used as the training set to estimate its corresponding torque function. An engine torque versus engine speed curve is then obtained by fitting a curve through all the data points generated by M_1000, M_1500, M_2000, ..., M_8000.
Table 1
Example of training data d_i in data set D_1000

        I1000   O     t1000   f     J1000   d     a     p     T1000
d1      8       0     7.1     0     385     3     25    2.8   20
d2      10      2     6.5     0     360     3     25    2.8   11
...     ...     ...   ...     ...   ...     ...   ...   ...   ...
dN      12      0     7.5     3     360     2.7   30    2.8   12
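To make the per-speed schema concrete, the Python sketch below groups samples into the subsets D_r. The (r, x, T) record layout and the function name are assumptions for illustration; the three records are rows d1, d2 and dN of Table 1.

```python
from collections import defaultdict

# Map grid from Section 3.1: r in {1000, 1500, ..., 8000}
SPEEDS = list(range(1000, 8001, 500))

def split_by_speed(records):
    """Group samples into the per-speed subsets D_r.

    Each record is a hypothetical (r, x, T) tuple: engine speed r, the setup
    x = (I_r, O, t_r, f, J_r, d, a, p) and the measured torque T_r.
    """
    subsets = defaultdict(list)
    for r, x, torque in records:
        if r not in SPEEDS:
            raise ValueError(f"speed {r} rpm is not on the 500 rpm map grid")
        subsets[r].append((tuple(x), torque))
    return dict(subsets)

# Rows d1, d2 and dN of Table 1, all belonging to D_1000
records = [
    (1000, (8, 0, 7.1, 0, 385, 3, 25, 2.8), 20),
    (1000, (10, 2, 6.5, 0, 360, 3, 25, 2.8), 11),
    (1000, (12, 0, 7.5, 3, 360, 2.7, 30, 2.8), 12),
]
D = split_by_speed(records)
print(len(SPEEDS), sorted(D), len(D[1000]))
```

Each D[r] would then go to its own LS-SVM training run, giving the fifteen torque functions M_r.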
4. Data sampling and implementation

In practical engine setup, the automotive engineer determines an initial setup that can basically start the engine, and the engine is then fine-tuned by adjusting the parameters around the initial setup values. Therefore, the input parameters are sampled around an initial setup supplied by the engine manufacturer. In the experiment, a sample data set D of 200 different engine setups, along with the output torque, has been acquired from a Honda B16A DOHC engine controlled by a programmable ECU, MoTeC M4, running on a chassis dynamometer (Fig. 5) at wide open throttle. The engine output data are only the torque against engine speed, because the horsepower of an engine is calculated using

$$
HP = \frac{2\pi r}{60}\cdot\frac{9.81\,T}{746}, \tag{8}
$$

where HP is the engine horsepower (hp), r is the engine speed (RPM: revolutions per minute) and T is the engine torque (kg m). After collection of the sample data set D, every data subset D_r ⊂ D is randomly divided into two sets, TRAIN_r for training and TEST_r for testing, such that D_r = TRAIN_r ∪ TEST_r, where TRAIN_r contains 80% of D_r and TEST_r holds the remaining 20% (Fig. 6). Every TRAIN_r is then sent to the LS-SVM module for training, which has been implemented using LS-SVMlab (Pelckmans et al., 2003), a MATLAB toolbox, under MS Windows XP. The implementation and other important issues are discussed in the following subsections.
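Eq. (8) and the 80/20 split can be sketched in Python as follows. The seed, the default train_fraction argument and the rounding of the training-set size are our assumptions; the paper does not state how the random split was drawn, and the function names are hypothetical.

```python
import math
import random

def horsepower(r_rpm, torque_kgm):
    # Eq. (8): HP = (2*pi*r/60) * (9.81*T) / 746
    omega = 2.0 * math.pi * r_rpm / 60.0   # engine speed in rad/s
    torque_nm = 9.81 * torque_kgm           # kg m -> N m
    return omega * torque_nm / 746.0        # W -> hp (746 W per hp)

def split_train_test(subset, train_fraction=0.8, seed=0):
    # Random split of one D_r into TRAIN_r and TEST_r (80% / 20% by default)
    rng = random.Random(seed)
    shuffled = list(subset)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

print(horsepower(5000, 14.0))
train, test = split_train_test(range(10))
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which matters when the same TEST_r sets are reused to compare LS-SVM against the neural network benchmark.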
[Fig. 5. Car engine performance data acquisition on a chassis dynamometer.]

[Fig. 6. Further division of data randomly into training sets (TRAIN_r) and test sets (TEST_r).]

4.1. Data pre-processing and post-processing

To obtain a more accurate regression result, the data set is conventionally normalized before training (Pyle, 1999). This prevents any parameter from dominating the output value. All input and output values are normalized to the range [0, 1] through the following transformation:

$$
N(v) = \bar{v} = \frac{v - v_{\min}}{v_{\max} - v_{\min}}, \tag{9}
$$

where v_min and v_max are the minimum and maximum domain values of the input or output parameter v, respectively. For example, if v ∈ [7, 39], then v_min = 7 and v_max = 39. The limits for each input and output parameter of an engine should be predetermined via experiments, expert knowledge or manufacturer data sheets. As all values are normalized, the output torque value v* produced by the LS-SVM is not the actual value; it must be de-transformed using the inverse N^{-1} of Eq. (9) to obtain the actual output value v.

4.2. Error function

To verify the accuracy of each function M_r, an error function has been established. For a given function M_r, the corresponding validation error is

$$
E_r = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\frac{y_k - M_r(x_k)}{y_k}\right)^2}, \tag{10}
$$

where x_k ∈ R^n is the vector of engine input parameters of the kth data point in a test set or validation set, y_k is the true torque value in the data point d_k (d_k = (x_k, y_k) represents the kth data point) and N is the number of data points in the test or validation set. The error E_r is the root-mean-square of the difference between the true torque value y_k of a test point d_k and its corresponding estimated torque value M_r(x_k), where each difference is divided by the true torque y_k so that the error is relative. Provided every prediction M_r(x_k) lies between 0 and 2y_k, each relative difference, and hence E_r, lies in [0, 1]. The accuracy rate for each torque function M_r is then calculated as

$$
\text{Accuracy}_r = (1 - E_r)\times 100\%. \tag{11}
$$
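A minimal Python sketch of Eqs. (9) to (11) follows. It assumes the LS-SVM outputs have already been de-normalized with N^{-1} before the error is computed, so the true and predicted torques are both in kg m; the function names and sample numbers are hypothetical.

```python
import math

def normalize(v, v_min, v_max):
    # Eq. (9): map v from [v_min, v_max] to [0, 1]
    return (v - v_min) / (v_max - v_min)

def denormalize(v_bar, v_min, v_max):
    # Inverse N^{-1} of Eq. (9): recover the value in physical units
    return v_min + v_bar * (v_max - v_min)

def relative_rms_error(y_true, y_pred):
    # Eq. (10): root-mean-square of (y_k - M_r(x_k)) / y_k over the test points
    n = len(y_true)
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def accuracy_percent(y_true, y_pred):
    # Eq. (11): Accuracy_r = (1 - E_r) * 100%
    return (1.0 - relative_rms_error(y_true, y_pred)) * 100.0

print(normalize(23, 7, 39))
print(accuracy_percent([20.0, 10.0], [19.0, 11.0]))
```

Note that the 1 kg m miss on the 10 kg m point contributes four times as much to E_r as the same miss on the 20 kg m point, because Eq. (10) measures relative rather than absolute error.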
4.3. Procedures for tuning hyperparameters

According to Eqs. (5) and (7), the user has to adjust two hyperparameters (γ, σ), where γ is the regularization factor and σ² is the kernel sample variance. Without good values for these hyperparameters, the estimated engine torque functions cannot achieve high generalization. To select the best values, 10-fold cross-validation is usually applied, but it takes a very long time. More recently, a more sophisticated Bayesian framework (Suykens et al., 2002; Van Gestel et al., 2001a, b) has been developed that can infer the hyperparameter values for LS-SVM. Given a set of training examples, Bayesian inference is a robust framework for computing the distribution of the estimated model parameters; based on this distribution, the optimal model parameter values can be predicted. As the theory of using Bayesian inference to predict the hyperparameters γ and σ is beyond the scope of this research, it is not discussed in detail. The basic idea of the hyperparameter inference procedure using the Bayesian framework (MacKay, 1995; Seeger, 2004; Van Gestel et al., 2001a, b) is based on a modified version of the LS-SVM program, Eq. (12), where μ is now the regularization factor instead of γ, and ζ corresponds to the inverse variance of the noise on the residual e_k (assumed constant):

$$
\begin{aligned}
\min_{w,b,e}\; J_P(w,e) &= \mu E_W + \zeta E_D\\
\text{s.t.}\quad e_k &= y_k - [w^{T}\varphi(x_k) + b],\qquad k = 1,\dots,N
\end{aligned}
\tag{12}
$$

with

$$
E_W = \frac{1}{2}w^{T}w,\qquad E_D = \frac{1}{2}\sum_{k=1}^{N}e_k^2 = \frac{1}{2}\sum_{k=1}^{N}\left(y_k - [w^{T}\varphi(x_k) + b]\right)^2, \tag{13}
$$
whose dual program is the same as Eq. (5), where w ∈ R^{n_h} is the weight vector of the target function and e = [e_1, ..., e_N] is the residual vector. The relationship of γ to μ and ζ is γ = ζ/μ. Note that after substituting Eq. (13) and this relationship into Eq. (12), the objective becomes μ times the objective of Eq. (4), so both programs have the same minimizer. Fig. 7 briefly illustrates the Bayesian inference algorithm for these two hyperparameters on a given data set TRAIN_r; the figure is drawn with reference to Van Gestel et al. (2001a). Although the inference procedure is theoretically complicated, Pelckmans et al. (2003) provide a MATLAB/C toolbox that handles it.

4.4. Training

The training data are first preprocessed using Eq. (9). The hyperparameters (γ, σ) in Eqs. (5) and (7) for the target torque functions are then inferred. Since there are 15 target torque functions, fifteen individual sets of hyperparameters (γ_r, σ_r) are inferred, one for each r. The detailed inference procedure for a given training data set TRAIN_r is listed in Fig. 7. After the 15 pairs of inferred hyperparameters (γ_MP,r, σ_MP,r) have been obtained, where the subscript MP stands for maximum posterior, the training data set TRAIN_r is used to calculate the support values α and the threshold b in Eq. (5). Finally, the target function M_r is constructed using Eq. (7).

5. Results

To illustrate the advantages of LS-SVM regression, the results are compared with those obtained by training a multilayer feedforward neural network (MFN) with backpropagation. Since MFN is similar to SVM and LS-SVM and is also a well-known universal estimator, its results can be considered a standard benchmark.

5.1. LS-SVM results

After all torque functions for an engine have been obtained, their accuracies are evaluated one by one against their own test sets TEST_r using Eqs. (10) and (11).
According to the accuracies shown in Table 2, the predicted results are in good agreement with the actual test results under the hyperparameters (γ_MP,r, σ_MP,r) inferred using the procedure described in Fig. 7. Nevertheless, it is believed that the function accuracy could be improved by increasing the amount of training data. An example comparison between the predicted and actual engine torque and horsepower under the same ECU configuration is shown in Fig. 8.
[Fig. 7. Inference procedure for hyperparameters (γ, σ) (Van Gestel et al., 2001a).]
5.2. MFN results

Fifteen neural networks $NET_R = \{NET_{1000}, NET_{1500}, \ldots, NET_{8000}\}$, one per engine speed $r$, are built from the fifteen training data sets $TRAIN_r = TR_r \cup Valid_r$. $TR_r$ is used to train the corresponding network $NET_r$, whereas $Valid_r$ serves as a validation set for early stopping, which improves network generalization. Every network consists of 8 input neurons (the parameters of an engine setup at engine speed $r$), 1 output neuron (the output torque value $T_r$), and 50 hidden neurons. The number of hidden neurons was chosen heuristically, since 50 hidden neurons normally provide enough capacity to approximate a highly nonlinear function. The hidden neurons use the tan-sigmoid transfer function, while the output neuron uses a pure linear transfer function (Fig. 9). Training uses the standard backpropagation algorithm (i.e., gradient descent, which updates the weights along the negative gradient), so the MFN results can be regarded as a standard baseline. The learning rate for the weight updates is set to 0.05, and each network is trained for 1000 epochs. The training results of all $NET_r$ are shown in Table 3. The same test sets $TEST_r$ are used so that the accuracies of the engine torque functions built by LS-SVM and MFN can be compared fairly. The average accuracy of each $NET_r$ in Table 3 is calculated using Eqs. (7) and (8).

5.3. Comparison of results

According to Tables 2 and 3, LS-SVM outperforms MFN by about 8.08 percentage points in overall average accuracy (90.32% versus 82.24%) on the same test sets $TEST_r$. The hyperparameters and the training time have also been compared. LS-SVM requires two hyperparameters $(\gamma_{MP}, \sigma_{MP})$, which are inferred automatically by Bayesian inference, so the user does not have to tune them. In MFN, the learning
rate and the number of hidden neurons must be specified by the user. These parameters could also be selected by 10-fold cross-validation, but the user then has to prepare a grid of candidate values, and the grid may not cover the best values. Partly for this reason, LS-SVM often generalizes better than MFN, as indicated in Tables 2 and 3. MFN attains a lower training error than LS-SVM because it has no regularization factor controlling the tradeoff between training error and generalization. In contrast, LS-SVM generalizes better, by about 8.08 percentage points, owing to the regularization factor $\gamma_{MP}$ in the training error function. Another issue is the training time; all timings were measured on a 2.4 GHz Pentium 4 PC with 1 GB RAM.
Fig. 9. Architecture (layer diagram) of MFN. Input $p^1$ (8 × 1); hidden layer: $a^1 = \mathrm{tansig}(IW^{1,1} p^1 + b^1)$, with $IW^{1,1}$ (50 × 8) and $b^1$, $a^1$ (50 × 1); output layer: $a^2 = \mathrm{purelin}(IW^{2,1} a^1 + b^2) = y$, with $IW^{2,1}$ (1 × 50) and $b^2$, $a^2$ (1 × 1).

Table 2
Accuracy of the torque functions $M_r$ and their corresponding hyperparameter values

Torque function $M_r$   $\gamma_{MP,r}$   $\sigma_{MP,r}$   Mean square error with training set $TRAIN_r$ (%)   Average accuracy with test set $TEST_r$ (%)
M1000             0.28    2.32     0.43    91.2
M1500             0.31    8.77     0.65    91.1
M2000             0.22    4.91     0.89    90.5
M2500             1.14    5.64     0.44    91.2
M3000             0.59    2.42     0.32    91.3
M3500             0.74    4.37     0.27    91.6
M4000             0.98    3.38     0.08    92.5
M4500             1.33    5.89     1.25    84.2
M5000             0.10    10.71    2.10    81.1
M5500             0.49    6.87     1.89    83.2
M6000             0.59    10.92    1.24    88.7
M6500             1.23    7.43     0.58    90.0
M7000             0.43    3.05     0.77    91.3
M7500             0.75    6.34     0.66    90.5
M8000             0.61    3.28     0.39    90.4
Overall average   –       –        0.80    90.32

Table 3
Training errors and average accuracy of the 15 neural networks

Neural network $NET_r$   Training error (mean square error) (%)   Average accuracy with test set $TEST_r$ (%)
NET1000           0.01    86.6
NET1500           0.01    85.4
NET2000           0.03    84.3
NET2500           0.15    82.9
NET3000           0.06    83.7
NET3500           0.14    80.3
NET4000           0.03    84.0
NET4500           0.05    78.2
NET5000           0.10    76.2
NET5500           0.65    78.1
NET6000           0.18    81.8
NET6500           0.25    84.2
NET7000           0.03    82.1
NET7500           0.12    84.1
NET8000           0.19    81.7
Overall average   0.13    82.24

Fig. 8. Example of comparison between predicted and actual engine torque and power.
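The MFN configuration of Section 5.2 and Fig. 9 (8 inputs, 50 tan-sigmoid hidden neurons, one linear output, gradient descent with learning rate 0.05 for at most 1000 epochs, early stopping on $Valid_r$) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' MATLAB setup: the weight initialization, the full-batch updates, the `patience` of 6 non-improving validation epochs, the name `train_mfn` and the synthetic data are assumptions.

```python
import numpy as np


def train_mfn(X_tr, y_tr, X_val, y_val, n_hidden=50, lr=0.05,
              epochs=1000, patience=6, seed=0):
    """8-50-1 tansig/purelin network trained by full-batch gradient descent.

    Keeps the weights with the lowest validation MSE (early stopping).
    """
    rng = np.random.default_rng(seed)
    n_in = X_tr.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hidden, n_in))  # IW{1,1}
    b1 = np.zeros(n_hidden)                                      # b1
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden)      # IW{2,1}
    b2 = 0.0                                                     # b2

    def forward(X, W1, b1, W2, b2):
        a1 = np.tanh(X @ W1.T + b1)  # tansig hidden layer
        return a1, a1 @ W2 + b2      # purelin output layer

    best = (np.inf, W1, b1, W2, b2)
    fails = 0
    n = len(y_tr)
    for _ in range(epochs):
        a1, out = forward(X_tr, W1, b1, W2, b2)
        d_out = (out - y_tr) / n                    # d(0.5 * MSE) / d(out)
        d_a1 = np.outer(d_out, W2) * (1.0 - a1**2)  # backpropagate through tanh
        # Gradient-descent step: move each parameter against its gradient.
        W2 = W2 - lr * (a1.T @ d_out)
        b2 = b2 - lr * d_out.sum()
        W1 = W1 - lr * (d_a1.T @ X_tr)
        b1 = b1 - lr * d_a1.sum(axis=0)
        val_mse = np.mean((forward(X_val, W1, b1, W2, b2)[1] - y_val) ** 2)
        if val_mse < best[0]:
            best, fails = (val_mse, W1, b1, W2, b2), 0
        else:
            fails += 1
            if fails >= patience:
                break
    _, W1, b1, W2, b2 = best
    return lambda X: forward(X, W1, b1, W2, b2)[1]


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 8))  # synthetic engine setups
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]  # synthetic torque target
predict = train_mfn(X[:70], y[:70], X[70:], y[70:])
val_mse = np.mean((predict(X[70:]) - y[70:]) ** 2)
print("validation MSE:", val_mse)
```

Each epoch here is one full pass with one weight update, which is why the MFN cost scales with the number of epochs, whereas the LS-SVM fit is a single linear solve.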
LS-SVM takes about 10 min to train once on 200 data points with 8 attributes, and the Bayesian inference of the two hyperparameters takes about 16 min. Hence the fifteen engine torque functions require $(10 + 16) \times 15 = 390$ min of training in total. For MFN, one epoch takes about 10 s = 1/6 min and each network is trained for 1000 epochs, so the fifteen networks take about $(1000 \times 1/6) \times 15 = 2500$ min. By this estimate, LS-SVM needs only 15.6% (390/2500) of the training time of MFN. The main saving comes from LS-SVM minimizing its error function once, by solving a single linear system, as opposed to 1000 epochs of iterative updates in MFN. Even compared with standard SVM, LS-SVM requires less training time here because Bayesian inference eliminates the 10-fold cross-validation otherwise needed to select the hyperparameters.
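The training-time estimate and the accuracy gap quoted above can be checked with a few lines of arithmetic. The figures are taken from the text and from the overall averages of Tables 2 and 3; the variable names are illustrative.

```python
# Figures quoted in Section 5.3 and the overall averages of Tables 2 and 3.
N_FUNCTIONS = 15          # torque functions / networks, one per engine speed
LSSVM_TRAIN_MIN = 10      # one LS-SVM training run (min)
BAYES_INFER_MIN = 16      # Bayesian inference of (gamma, sigma) (min)
MFN_EPOCH_S = 10          # one MFN epoch (s)
MFN_EPOCHS = 1000

lssvm_total_min = (LSSVM_TRAIN_MIN + BAYES_INFER_MIN) * N_FUNCTIONS
mfn_total_min = MFN_EPOCH_S * MFN_EPOCHS * N_FUNCTIONS / 60
time_ratio = lssvm_total_min / mfn_total_min

ACC_LSSVM, ACC_MFN = 90.32, 82.24  # overall average accuracy (%)
acc_gap_points = ACC_LSSVM - ACC_MFN

print(f"LS-SVM: {lssvm_total_min} min, MFN: {mfn_total_min:.0f} min")
print(f"LS-SVM needs {time_ratio:.1%} of the MFN time ({1 - time_ratio:.1%} less)")
print(f"accuracy gap: {acc_gap_points:.2f} percentage points")
```

The 8.08 figure is an absolute difference in percentage points; relative to the MFN average of 82.24%, it corresponds to an improvement of roughly 9.8%.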
6. Conclusions

In this research, LS-SVM combined with Bayesian inference is applied, for the first time, to produce a set of torque functions for an automotive engine at different engine speeds. The engine power is then calculated from the engine torque according to Eq. (8). The torque functions are regressed separately from fifteen sets of sample data acquired from an automotive engine on a chassis dynamometer. The resulting engine torque functions are very useful for vehicle fine tune-up because the effect of any trial ECU setup can be predicted as a gain or a loss before the engine is run on a dynamometer or in a road test. If the trial ECU setup is predicted to be a gain, the engine is then run on a dynamometer for verification. If it is predicted to be a loss, the dynamometer test is unnecessary and another engine setup should be tried. Hence, the prediction functions can greatly reduce the number of expensive dynamometer tests, saving not only the time taken for optimal tune-up but also a large amount of expenditure on fuel, spare parts, lubricants, etc. The functions may also let automotive engineers predict whether a new engine setup is a gain or a loss during road tests, where no dynamometer is available. Moreover, experiments show that the torque functions are accurate, and the results are highly satisfactory. Compared with the traditional neural network method, LS-SVM plus Bayesian inference achieves about 8.08 percentage points higher overall accuracy on the same test sets, and its training time is approximately 84.4% less than that of a standard neural network. From the perspective of automotive engineering, constructing the power and torque functions of a modern automotive gasoline engine with LS-SVM is a new attempt, and the methodology can also be applied to other kinds of vehicle engines.
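The screening workflow described above can be sketched as follows. The decision criterion (a rise in the mean predicted torque over the fifteen engine speeds) and the names `screen_trial_setup` and `min_gain` are hypothetical, since this section does not specify how a gain or loss is quantified. The toy models stand in for the fitted torque functions $M_r$.

```python
from typing import Callable, Dict

import numpy as np

SPEEDS = range(1000, 8001, 500)  # the fifteen engine speeds r (rpm)


def screen_trial_setup(
    models: Dict[int, Callable[[np.ndarray], float]],
    baseline_setup: np.ndarray,
    trial_setup: np.ndarray,
    min_gain: float = 0.0,
) -> str:
    """Decide whether a trial ECU setup is worth a dynamometer run."""
    speeds = sorted(models)
    base = np.array([models[r](baseline_setup) for r in speeds])
    trial = np.array([models[r](trial_setup) for r in speeds])
    # Hypothetical criterion: mean predicted torque over all speeds must rise.
    if np.mean(trial - base) > min_gain:
        return "gain: verify on dynamometer"
    return "loss: skip dynamometer, try another setup"


# Toy stand-ins for the fitted torque functions M_r (8 setup parameters each).
toy_models = {r: (lambda s, r=r: float(np.sum(s)) * (1.0 + r / 8000.0)) for r in SPEEDS}
baseline = np.full(8, 1.0)
print(screen_trial_setup(toy_models, baseline, baseline + 0.1))
print(screen_trial_setup(toy_models, baseline, baseline - 0.1))
```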
References

Bishop, C., 1995. Neural Networks for Pattern Recognition. Oxford University Press, New York.
Brace, C., 1998. Prediction of Diesel Engine Exhaust Emission using Artificial Neural Networks. IMechE Seminar S591, Neural Networks in Systems Design, UK.
Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, UK.
Gunn, S., 1998. Support vector machines for classification and regression. ISIS Technical Report ISIS-1-98. Image Speech & Intelligent Systems Research Group, University of Southampton, May 1998, UK.
Harrell, F., 2001. Regression Modelling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation, second ed. Prentice-Hall, USA.
Liu, Z., Fei, S., 2004. Study of CNG/diesel dual fuel engine's emissions by means of RBF neural network. Journal of Zhejiang University Science 5 (8), 960–965.
MacKay, D., 1995. Probable networks and plausible predictions—a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems 6, 469–505.
Pelckmans, K., Suykens, J., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., De Moor, B., Vandewalle, J., 2003. LS-SVMlab: a MATLAB/C toolbox for Least Squares Support Vector Machines. Available at http://www.esat.kuleuven.ac.be/sista/lssvmlab
Pyle, D., 1999. Data Preparation for Data Mining. Morgan Kaufmann, USA.
Ryan, T., 1996. Modern Regression Methods. Wiley-Interscience, USA.
Schölkopf, B., Smola, A., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, USA.
Seber, G., Wild, C., 2003. Nonlinear Regression, New Edition. Wiley-Interscience, USA.
Seeger, M., 2004. Gaussian processes for machine learning. International Journal of Neural Systems 14 (2), 1–38.
Sen, A., Srivastava, M., 1990. Regression Analysis: Theory, Methods, and Applications. Springer, New York.
Smola, A., Burges, C., Drucker, H., Golowich, S., Van Hemmen, L., Müller, K., Schölkopf, B., Vapnik, V., 1996. Regression estimation with support vector learning machines. Available at http://www.first.gmd.de/smola
Su, S., Yan, Z., Yuan, G., Cao, Y., Zhou, C., 2002. A method for prediction in-cylinder compound combustion emissions. Journal of Zhejiang University Science 3 (5), 543–548.
Suykens, J., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific, Singapore.
Tabachnick, B., Fidell, L., 2001. Using Multivariate Statistics, fourth ed. Allyn and Bacon, USA.
Traver, M., Atkinson, R., Atkinson, C., 1999. Neural network-based diesel engine emissions prediction using in-cylinder combustion pressure. SAE Paper 1999-01-1532.
Van Gestel, T., Suykens, J., De Moor, B., Vandewalle, J., 2001a. Automatic relevance determination for least squares support vector machine classifiers. In: Proceedings of the European Symposium on Artificial Neural Networks (ESANN'2001), Bruges, Belgium, April 2001, pp. 13–18.
Van Gestel, T., Suykens, J., Lambrechts, D., Lanckriet, A., Vandaele, G., De Moor, B., Vandewalle, J., 2001b. Predicting financial time series using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, Special Issue on Financial Engineering 12 (4), 809–821.
Yan, Z., Zhou, C., Su, S., Liu, Z., Wang, X., 2003. Application of neural network in the study of combustion rate of natural gas/diesel dual fuel engine. Journal of Zhejiang University Science 4 (2), 170–174.