Particle swarm optimization-least squares support vector regression based forecasting model on dissolved gases in oil-filled power transformers


Electric Power Systems Research 81 (2011) 2074–2080


Ruijin Liao (a), Hanbo Zheng (a,∗), Stanislaw Grzybowski (b), Lijun Yang (a)

(a) State Key Laboratory of Power Transmission Equipment & System Security and New Technology, Chongqing University, Chongqing, China
(b) Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, USA

Article info

Article history: Received 1 April 2011; received in revised form 23 July 2011; accepted 27 July 2011; available online 27 August 2011

Keywords: Least squares support vector machine (LS-SVM); Particle swarm optimization (PSO); Dissolved gas analysis (DGA); Power transformers; Forecasting models

Abstract

This paper presents a forecasting model based upon least squares support vector machine (LS-SVM) regression and the particle swarm optimization (PSO) algorithm for dissolved gases in oil-filled power transformers. First, the LS-SVM regression model, with a radial basis function (RBF) kernel, is established as the basis of the forecasting model. Then a global optimizer, PSO, is employed to optimize the hyper-parameters needed in LS-SVM regression. Afterward, a procedure is put forward to serve as an effective tool for forecasting gas contents in transformer oil. The application of the proposed model to actual transformer gas data has given promising results. Moreover, four other forecasting models, derived from the back propagation neural network (BPNN), radial basis function neural network (RBFNN), generalized regression neural network (GRNN) and support vector regression (SVR), are selected for comparison. The experimental results further demonstrate that the proposed model achieves better forecasting performance than its counterparts when samples are limited. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

Power transformers are among the most essential and costly pieces of equipment in the power system. The reliable, efficient, fault-free operation of large power transformers plays a decisive role in the availability of power supply and transmission [1,2]. Many efforts have therefore been dedicated to detecting incipient faults in power transformers. Studies over the past decades have shown that the gases dissolved in transformer oil are closely related to incipient faults, and dissolved gas analysis (DGA) has gained worldwide acceptance as a diagnostic method for detecting them [3–5]. The fault-related gases commonly used are hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4) and ethane (C2H6). Various practical DGA techniques exist, such as the Rogers, modified Rogers, Dornenburg, IEC, and Duval Triangle methods [6–8]. These methods relate the gases (gas concentrations and the ratios of specific gas concentrations) to the fault types. Therefore, if the future gas contents of a transformer can be forecast from historical data, incipient faults and their development trend can be identified early [9].

∗ Corresponding author. Tel.: +86 15902397021; fax: +86 023 65112258. E-mail address: [email protected] (H. Zheng). 0378-7796/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.epsr.2011.07.020

Gas content forecasting underpins a number of applications such as fault diagnosis, condition monitoring and other maintenance schemes for oil-filled transformers. It requires techniques that can forecast trends and expected behavior accurately. In the past, forecasting approaches have mostly used neural network models such as the back propagation neural network (BPNN) [10], radial basis function neural network (RBFNN) [11], and generalized regression neural network (GRNN) [12]. These models are generally not well suited to field application because they need a large amount of historical data, while the data available in practice are very limited [13,14]. More recently, several effective alternatives have been developed, including the fuzzy linear regression model [15], grey model (GM) [13], grey-extension model [9], and support vector machine (SVM) model [14,16,17].

The least squares support vector machine (LS-SVM) was introduced by Suykens et al. [18] as a reformulation of the standard SVM [19]. It greatly simplifies the standard SVM by applying a linear least squares criterion to the loss function instead of the traditional quadratic programming method. This simplicity, together with the advantages inherited from SVM such as excellent generalization ability and a unique solution, has promoted the application of LS-SVM to many pattern recognition and regression problems [20–24].

In this paper, an LS-SVM regression model combined with particle swarm optimization (PSO) is proposed for forecasting gas contents in transformer oil. The hyper-parameters in the regularization term and kernel function of LS-SVM are optimized with a global optimizer, PSO, by minimizing an error fitness function based upon cross validation. PSO, developed by Kennedy and Eberhart, is a stochastic global optimization technique inspired by the social behavior of bird flocking [25]. Compared with other evolutionary algorithms such as the genetic algorithm, PSO does not need evolutionary operators such as crossover and mutation. Furthermore, PSO is able to escape from local optima, is easy to implement, and has fewer parameters to adjust [26,27]. Thus in this study, PSO is applied to optimize the hyper-parameters. The model is evaluated with two criteria, used separately or jointly, namely the mean absolute percentage error (MAPE) and the squared correlation coefficient (r2), which express the learning and generalization capabilities of the estimator. Real-world DGA data are used for gas forecasting with this model, and the results highlight its potential, with satisfactory forecasting accuracy and valuable information.

The rest of this paper is organized as follows. Section 2 describes the least squares support vector regression model. Section 3 explains in detail the hyper-parameter optimization of LS-SVM based upon PSO. Section 4 presents the forecasting procedure and illustrates how to use the model on a practical problem. Section 5 reports the experiments. Finally, conclusions are drawn in Section 6.

2. Model of least squares support vector regression

Support vector regression (SVR) is based upon the idea of deducing an estimate f(x) of the true but unknown relationship y = f(x) between the vector of observations x and the desired output y from a given set of training samples. This is usually performed by mapping the data from the original feature space to a higher dimensional transformed feature space, to increase the flatness of the function and, accordingly, to approximate it in a linear way as follows

$$f(x) = \omega^{T}\varphi(x) + b, \quad x \in \mathbb{R}^{n} \tag{1}$$

where $f(x) \in \mathbb{R}$ and $\varphi(x)$ denotes a set of nonlinear transformations. One commonly used SVM regression model is called ε-SVR. Consider a given set of training samples $\{(x_1, y_1), \ldots, (x_l, y_l)\} \subset \mathbb{R}^{n} \times \mathbb{R}$, where $x_i$ is the input and $y_i$ is the corresponding target value for sample i. The primal optimization problem for ε-SVR can be defined as follows

$$\min \Phi(\omega, \xi, \xi^{*}) = \frac{1}{2}\omega^{T}\omega + C\sum_{i=1}^{l}(\xi_i + \xi_i^{*}) \tag{2}$$

subject to the constraints

$$\begin{cases} y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i \\ \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, l \end{cases} \tag{3}$$

where $\xi$ and $\xi^{*}$ are the slack variables, and C is the regularization parameter. Compared with standard SVR, LS-SVM regression applies linear least squares criteria to the loss function and replaces the inequality constraints with equality ones. In the primal space, the LS-SVM formulation can be described as

$$\min \Phi(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{1}{2}C\sum_{i=1}^{l}e_i^{2} \tag{4}$$

subject to the equality constraints

$$y_i = \omega^{T}\varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, l \tag{5}$$

The Lagrangian function can be constructed by

$$L(\omega, b, e, \alpha) = \Phi(\omega, e) - \sum_{i=1}^{l}\alpha_i\left(\omega^{T}\varphi(x_i) + b + e_i - y_i\right) \tag{6}$$

where $\alpha_i$ (i = 1, 2, ..., l) are Lagrange multipliers. The Karush–Kuhn–Tucker (KKT) conditions for optimality are given by

$$\begin{cases} \dfrac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{l}\alpha_i\varphi(x_i) \\ \dfrac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{l}\alpha_i = 0 \\ \dfrac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = Ce_i, \quad i = 1, 2, \ldots, l \\ \dfrac{\partial L}{\partial \alpha_i} = 0 \Rightarrow \omega^{T}\varphi(x_i) + b + e_i - y_i = 0, \quad i = 1, 2, \ldots, l \end{cases} \tag{7}$$

After elimination of the variables $\omega$ and $e_i$, the optimization problem can be transformed into the following linear system.

$$\begin{bmatrix} 0 & 1_l^{T} \\ 1_l & \Omega + C^{-1}I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{8}$$

where $1_l = [1, \ldots, 1]^{T}$, $\alpha = [\alpha_1, \ldots, \alpha_l]^{T}$, and $y = [y_1, \ldots, y_l]^{T}$. Mercer's condition is applied within the matrix $\Omega = \{\Omega_{ij}\}_{l \times l}$ as follows

$$\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j) \tag{9}$$

Then the resulting LS-SVM model for regression becomes

$$f(x) = \sum_{i=1}^{l}\alpha_i K(x, x_i) + b \tag{10}$$

where $\alpha_i$ and b are the solution to the linear system (8). In Eq. (10), several different kinds of Mercer kernel function $K(x, x_i)$ can be used, such as polynomial, sigmoid, and radial basis function (RBF). Because it has fewer parameters to set and an excellent overall performance, the RBF is an effective option for the kernel function [18,28]. Therefore, this study applies an RBF kernel function, shown as Eq. (11), to help the LS-SVM regression model obtain the optimal solution.

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right) \tag{11}$$

Consequently, in contrast with ε-SVR, only two major hyper-parameters, C and σ², need to be chosen a priori by the user in the LS-SVM regression model. The selection of these hyper-parameters plays an important role in the performance of LS-SVM. One effective approach is to apply cross validation to select the best among some candidate parameters. Based upon this idea, several disciplined approaches [24–31] can be used to obtain the optimal hyper-parameters for the regression model, among which evolutionary methods such as the genetic algorithm, simulated annealing and PSO are some of the most widely used. This paper employs the PSO algorithm.
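As an illustration of how Eqs. (8) to (11) fit together, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`rbf_kernel`, `solve_linear_system`, `lssvm_fit`, `lssvm_predict`) are hypothetical, and the dense Gaussian elimination is only an assumption that is reasonable for the small sample sizes discussed in this paper.

```python
import math


def rbf_kernel(x, z, sigma):
    """RBF kernel of Eq. (11): exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u


def lssvm_fit(X, y, C, sigma):
    """Build and solve the (l+1) x (l+1) system of Eq. (8); return (b, alpha)."""
    l = len(X)
    A = [[0.0] + [1.0] * l]  # first row: [0, 1_l^T]
    for i in range(l):
        row = [1.0]
        for j in range(l):
            omega_ij = rbf_kernel(X[i], X[j], sigma)  # Eq. (9)
            row.append(omega_ij + (1.0 / C if i == j else 0.0))
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x, X, b, alpha, sigma):
    """Evaluate the regression function of Eq. (10) at a new input x."""
    return sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b
```

A call such as `b, alpha = lssvm_fit(X, y, C=10.0, sigma=1.0)` followed by `lssvm_predict(x_new, X, b, alpha, 1.0)` gives a forecast. By construction the solution satisfies the KKT conditions of Eq. (7): the multipliers sum to zero and each training residual equals `alpha[i] / C`.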


3. Hyper-parameter optimization based upon PSO

Fig. 1. Flowchart of PSO for hyper-parameter optimization.

3.1. Algorithm of particle swarm optimization

Consider a swarm of size S. Each particle, which is characterized by its velocity $v_i^{d}(t)$, its best position $pbest^{d}(t)$, and its current position $p_i^{d}(t)$ (i = 1, 2, ..., S), flies in a d-dimensional search space (in this study, d = 2) according to its own historical experience and that of the others. Let t denote the current generation and $gbest^{d}(t)$ denote the global best position thus far. To search for the optimal solution, each particle updates its velocity and position according to the following equations

$$\begin{cases} v_i^{d}(t+1) = w\,v_i^{d}(t) + c_1 r(t)\left(pbest^{d}(t) - p_i^{d}(t)\right) + c_2 r(t)\left(gbest^{d}(t) - p_i^{d}(t)\right) \\ p_i^{d}(t+1) = p_i^{d}(t) + v_i^{d}(t+1) \end{cases} \tag{12}$$

where $c_1$ and $c_2$ are two acceleration constants regulating the relative velocities with respect to the particle's own best position and the global best position, respectively, and r(t) is a random variable drawn from a uniform distribution in the open interval (0, 1). The inertia weight w is used to balance the capabilities of global exploration and local exploration, and can be determined by

$$w = w_{max} - \frac{w_{max} - w_{min}}{T}\,t \tag{13}$$

where $w_{max}$ is the initial weight, $w_{min}$ is the final weight, and T is the maximum number of generations.

3.2. PSO-based hyper-parameter selection

By means of the PSO algorithm, the two major hyper-parameters of the LS-SVM regression model, C and σ, can be optimized. In hyper-parameter selection, each particle represents a potential solution, comprised of a vector d = (C, σ). The quality of a hyper-parameter pair is measured by a fitness function defined for the optimization problem considered. In the training and testing process of LS-SVM, the objective is to improve the generalization performance of the regression model, namely, to minimize the errors between the true values and the forecasting values of the testing samples. Therefore, the fitness function can be defined as follows

$$\mathrm{Fitness} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{j=1}^{m}\left(f(x_{ij}) - y_{ij}\right)^{2} \tag{14}$$

where n is the number of folds for cross validation, m is the number of samples in each validation subset, $y_{ij}$ is the true value, and $f(x_{ij})$ is the forecasting value of the validation samples. The objective is to minimize the fitness, so the particle with the minimal fitness value outperforms the others and should be retained during the optimization process. Accordingly, the optimal hyper-parameters can be selected.

4. A procedure for forecasting using proposed regression model

As a result of the discussion in Sections 2 and 3, we are now in a position to formulate a procedure for forecasting dissolved gases in oil-filled transformers. For a PSO-based LS-SVR forecasting model, a time series of a certain gas content (for example, H2) is chosen and defined as an input vector as follows:

$$X_I = \{a_1, a_2, \ldots, a_n\} \tag{15}$$

Thus the forecasting result $X_O = \{a_{n+1}\}$, which is calculated by formula (10), is taken as the output. The procedure is summarized in the following steps.

Step 1: Data preprocessing
Extract a collection of raw data and generate training and testing sets. The raw samples, which are usually obtained from electric power companies, are mostly sampled aperiodically, so they must be preprocessed before modeling. Commonly, the primary sampling data can be transformed into an equal-interval time series by interpolation. In this study, we employ cubic Hermite spline interpolation, which processes the raw data more smoothly than linear interpolation and avoids Runge's phenomenon, which occurs when high-degree polynomials are used [32]. The raw data, including training data and testing data, are then normalized, which can improve the generalization of LS-SVM regression.

Step 2: Hyper-parameter optimization
The process of PSO for hyper-parameter optimization can be described as follows.
Step 2.1: Initialize the swarm size, the maximum number of generations, and the velocity and position of each particle, which is comprised of the hyper-parameters C and σ. The velocity is restricted to the range [−vmax, vmax], in which vmax is a predefined boundary value chosen according to the corresponding experimental data.
Step 2.2: Evaluate each particle's fitness according to formula (14) and set the best position from the particle with the minimal fitness in the swarm.
Step 2.3: For each candidate particle, train an LS-SVM regression model with the corresponding hyper-parameters based upon cross validation.
Step 2.4: Update the velocity and position of each particle by means of (12), and update the inertia weight generation by generation according to (13).
Step 2.5: Check the stopping criterion. If the maximal number of generations is not yet reached, return to Step 2.2. Otherwise go to the next substep.
Step 2.6: Terminate the algorithm and give the optimal hyper-parameters.
The flowchart of PSO for hyper-parameter optimization is shown in Fig. 1.

Step 3: Training and testing
Train an LS-SVM estimator on the training samples with the optimal hyper-parameters obtained from Step 2. To validate the trained regression model, we can use some measures to evaluate it.
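Step 2 can be sketched as a short PSO loop built on Eqs. (12) and (13). This is an illustration under stated assumptions rather than the authors' code: the function name `pso_minimize` is hypothetical, the acceleration constants `c1 = c2 = 2.0` and the velocity bound of 20% of each search range are assumed values that the paper does not specify, and a fresh random number is drawn for each term of the velocity update.

```python
import random


def pso_minimize(fitness, bounds, swarm_size=20, generations=100,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.1, seed=0):
    """Minimize fitness(position) inside box bounds using Eqs. (12)-(13)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Assumed velocity bound: 20% of each range (the paper only says "predefined").
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[rng.uniform(-vm, vm) for vm in v_max] for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness found so far, one entry per generation

    for t in range(generations):
        w = w_max - (w_max - w_min) * t / generations  # Eq. (13)
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))  # Eq. (12), velocity
                v = max(-v_max[d], min(v_max[d], v))      # clamp to [-vmax, vmax]
                lo, hi = bounds[d]
                vel[i][d] = v
                pos[i][d] = max(lo, min(hi, pos[i][d] + v))  # Eq. (12), position
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history
```

In the setting of this paper, `fitness` would take a position (C, σ), train LS-SVM models under cross validation, and return the value of formula (14). For a quick check the loop can be run on any cheap function, for example `pso_minimize(lambda p: (p[0] - 3.0) ** 2 + (p[1] - 1.0) ** 2, [(0.0, 10.0), (0.0, 10.0)])`.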


Table 1. Testing data of chromatogram (μL/L).

Case number  Testing date  H2     CH4    C2H2  C2H4   C2H6   Data type
1 [31]       2007-11-15    4.50   20.50  0.00  10.00  5.90   Training
             2007-11-22    7.40   23.70  0.00  12.90  7.00   Training
             2007-11-29    10.20  32.30  0.00  17.50  10.10  Training
             2007-12-06    7.70   32.70  0.00  16.10  8.30   Training
             2007-12-13    8.80   37.50  0.00  17.00  10.90  Training
             2007-12-20    12.60  39.40  0.00  16.80  11.60  Training
             2007-12-27    15.20  45.60  0.00  20.60  11.80  Training
             2008-01-03    14.00  41.80  0.00  20.10  12.10  Training
             2008-01-10    14.90  45.50  0.00  21.80  13.20  Training
             2008-01-17    16.80  58.00  0.00  26.80  17.50  Training
             2008-01-24    13.20  47.70  0.00  22.40  13.00  Testing
2 [31]       2005-04-14    18.50  62.30  5.50  22.50  70.00  Training
             2005-04-28    20.90  66.80  6.20  23.50  79.80  Training
             2005-05-05    22.10  68.80  6.30  23.90  83.20  Training
             2005-05-12    23.10  71.10  6.60  24.50  87.30  Training
             2005-05-19    23.30  71.40  6.70  24.60  89.00  Training
             2005-05-26    24.60  72.70  6.50  24.90  90.20  Training
             2005-06-02    23.90  73.40  6.70  24.30  92.20  Testing
3 [32]       1990          18.37  24.20  0.00  2.71   3.40   Training
             1991          19.42  24.70  0.00  2.89   3.80   Training
             1992          27.30  25.40  0.00  2.52   3.70   Training
             1993          22.40  22.30  0.00  2.64   3.50   Training
             1994          23.70  23.50  0.00  2.95   3.70   Training
             1995          24.80  23.80  0.00  2.70   3.64   Testing
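To make the input/output construction of Eq. (15) concrete, the sketch below normalizes the Case 1 hydrogen training series from Table 1 and cuts it into windows of n past values with the next value as the target. Two points are assumptions, since the paper does not state them: min-max scaling is used as the normalization, and the window length n = 3 is purely illustrative. The helper names are hypothetical.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; also return (min, max) so forecasts can be rescaled."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series], (lo, hi)


def make_windows(series, n):
    """Build (X_I, X_O) pairs: n consecutive values as input, the next value as output."""
    inputs = [series[k:k + n] for k in range(len(series) - n)]
    targets = [series[k + n] for k in range(len(series) - n)]
    return inputs, targets


# H2 training values of Case 1 in Table 1 (uL/L), in sampling order.
h2_case1 = [4.50, 7.40, 10.20, 7.70, 8.80, 12.60, 15.20, 14.00, 14.90, 16.80]

scaled, (lo, hi) = min_max_normalize(h2_case1)
X, y = make_windows(scaled, n=3)  # n = 3 is an illustrative choice, not from the paper
```

With ten training samples and n = 3 this yields seven training pairs. A forecast `f` made on the normalized scale is mapped back to μL/L with `f * (hi - lo) + lo`.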

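Let $x_1, \ldots, x_l$ be the training data and $f(x_1), \ldots, f(x_l)$ be the approximate values given by LS-SVM regression. Suppose that the true values of the training data are denoted as $y_1, \ldots, y_l$. We evaluate the regression models by MAPE and $r^2$ [33], defined as follows

$$\mathrm{MAPE} = \frac{1}{l}\sum_{i=1}^{l}\left|\frac{f(x_i) - y_i}{y_i}\right| \tag{16}$$

$$r^{2} = \frac{\left(l\sum_{i=1}^{l}f(x_i)y_i - \sum_{i=1}^{l}f(x_i)\sum_{i=1}^{l}y_i\right)^{2}}{\left(l\sum_{i=1}^{l}f(x_i)^{2} - \left(\sum_{i=1}^{l}f(x_i)\right)^{2}\right)\left(l\sum_{i=1}^{l}y_i^{2} - \left(\sum_{i=1}^{l}y_i\right)^{2}\right)} \tag{17}$$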
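The two criteria of Eqs. (16) and (17) translate directly into code. The sketch below uses hypothetical function names. The MAPE is multiplied by 100 so that it is expressed in percent, which is an assumption based on the tables in this paper reporting MAPE (%).

```python
def mape(forecast, actual):
    """Eq. (16), multiplied by 100 to express the result as a percentage."""
    l = len(actual)
    return 100.0 * sum(abs((f - y) / y) for f, y in zip(forecast, actual)) / l


def squared_correlation(forecast, actual):
    """Eq. (17): squared correlation coefficient r^2 between forecasts and true values."""
    l = len(actual)
    s_f, s_y = sum(forecast), sum(actual)
    s_fy = sum(f * y for f, y in zip(forecast, actual))
    s_ff = sum(f * f for f in forecast)
    s_yy = sum(y * y for y in actual)
    numerator = (l * s_fy - s_f * s_y) ** 2
    denominator = (l * s_ff - s_f ** 2) * (l * s_yy - s_y ** 2)
    return numerator / denominator
```

Note that `mape` is undefined when a true value is zero, and `squared_correlation` is undefined when either series is constant. Both situations apply to the acetylene series of Cases 1 and 3 in Table 1, where all values are 0.00.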

Then, forecast the trend of DGA concentrations on the testing samples via the trained regression model. Finally, end the procedure and analyze the outcomes. The diagram of the procedure structure is illustrated in Fig. 2.

Fig. 2. The diagram of procedure structure.

5. Experimental results

5.1. Forecasting results based upon proposed regression model

This study employed dissolved gas data from 110 kV and 220 kV power transformers of several electric power companies in China [34,35]. The DGA data of some of the forecasting cases are shown in Table 1. As shown, the dissolved gas content data are divided into training samples and a testing sample in each case. Taking Case 1 as an example, the samples were taken periodically between November 2007 and January 2008. Before applying the model, the experimental data, including training data and testing data, are normalized first.

Then, we run the PSO algorithm to find the optimal hyper-parameters C and σ for each feature gas model (namely hydrogen, methane, ethylene and ethane) by using 5-fold cross validation. The initial population size of the swarm is set to 20, which is enough to cover the search space within the limited generations according to the experimental runs. Again based upon many experiments, the maximum number of generations is fixed to 100, and the inertia weight is initially set to 0.9 and reduced linearly to 0.1 according to (13).

Fig. 3. Convergence curves of PSO for hydrogen in Case 1. (Curve 1: best fitness; Curve 2: average fitness; horizontal axis: generations, 0 to 100; vertical axis: normalized fitness.)

The convergence process of PSO for hydrogen in Case 1 is drawn in Fig. 3. Curve 1 is the convergence curve of the best fitness for the training samples, obtained from the minimal fitness of the swarm in each generation. Curve 2 is the average fitness for the training samples over all particles of each generation. From Fig. 3, it can be observed that both fitness curves decrease significantly in the first generations and afterward appear almost flat, which shows that PSO converges to the best solution quickly. Consequently, the parameters with the minimum validation error are selected as the most appropriate hyper-parameters.

Fig. 4. Forecasting results of hydrogen in Case 1 using proposed model. (Panels: H2, CH4, C2H4 and C2H6 versus date, 11/15/2007 to 01/24/2008; each panel compares actual data with forecasting values.)
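The 5-fold cross-validation fitness that PSO minimizes in this section can be sketched as follows. Several points are assumptions: the fitness is read from formula (14) as the validation squared error averaged within each fold and then across folds, the interleaved assignment of samples to folds is one simple choice that the paper does not describe, and `train` is a hypothetical callback standing in for LS-SVM training with a given (C, σ). A trivial mean predictor is used here only to keep the example self-contained.

```python
def cv_fitness(X, y, n_folds, train):
    """Average validation squared error over n_folds folds, in the form of Eq. (14).

    `train(X_train, y_train)` is a hypothetical callback that returns a
    predictor function f(x); any regression model can be plugged in.
    Requires len(X) >= n_folds so that no fold is empty.
    """
    l = len(X)
    total = 0.0
    for i in range(n_folds):
        val_idx = list(range(i, l, n_folds))  # interleaved fold assignment (assumed)
        held_out = set(val_idx)
        X_tr = [X[k] for k in range(l) if k not in held_out]
        y_tr = [y[k] for k in range(l) if k not in held_out]
        f = train(X_tr, y_tr)
        m = len(val_idx)
        total += sum((f(X[k]) - y[k]) ** 2 for k in val_idx) / m
    return total / n_folds


def train_mean_model(X_train, y_train):
    """Stand-in model: always predicts the mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean
```

To score a particle, `train` would be replaced by a function that fits an LS-SVM model with that particle's C and σ, and the returned value would serve as the particle's fitness.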

In the following step, the optimal hyper-parameters are used to train the LS-SVM regression models. The two standard criteria, namely MAPE and r2, are used to measure the performance of the models. The testing data are used to examine the accuracy of the final forecasting results. Fig. 4 shows an example of forecasting results for Case 1. For the separate PSO-aided LS-SVR forecasting models, the forecasting performance in MAPE and r2 and the optimal hyper-parameters for the models of the different dissolved gases in Case 1 are listed in Table 2. From Fig. 4 and Table 2, we can see that the proposed model has strong learning capability for small samples and at the same time achieves excellent generalization performance, since LS-SVM is a good compromise between stability and accuracy. As listed in Table 2, the MAPEs between the actual data (hydrogen, methane, ethylene and ethane) and the forecasting values are only 2.8931%, 3.6229%, 2.3804% and 3.6414%, respectively. From the above results, we conclude that the proposed model combining LS-SVM regression and PSO obtains very promising results in forecasting dissolved gas contents for oil-filled transformers. This forecasting information is important for decisions about transformer routine tests or refurbishment, so that testing costs and the maintenance schedule can be optimized.

Table 2. Forecasting performance and the optimal hyper-parameters for the models of different dissolved gases in Case 1.

Gas    C         σ       Training MAPE (%)  Training r2  Testing MAPE (%)
H2     145.1024  2.4174  0.2073             0.9999       2.8931
CH4    202.7623  2.2068  0.0979             0.9999       3.6229
C2H2   –         –       –                  –            –
C2H4   201.0028  2.3186  0.1075             0.9999       2.3804
C2H6   158.7771  1.8083  0.1371             0.9999       3.6414

Table 3. Comparisons of forecasting results and performance among BPNN, RBFNN, GRNN, ε-SVR, and the proposed model.

Actual value and forecasting results:

Case  Gas   Actual   BPNN     RBFNN    GRNN     ε-SVR    LS-SVM
1     H2    13.2000  18.1459  10.4497  14.2030  12.1435  13.5819
1     CH4   47.7000  60.4083  51.7889  44.8038  45.0798  45.9719
1     C2H2  –        –        –        –        –        –
1     C2H4  22.4000  26.3767  20.6137  20.8881  21.1039  21.8668
1     C2H6  13.0000  15.0194  12.0881  13.4180  12.4433  12.5266
2     H2    23.9000  22.6509  20.8832  24.6000  22.0707  23.3195
2     CH4   73.4000  67.8535  67.4326  71.7300  68.5998  69.8354
2     C2H2  6.7000   6.0481   6.1684   6.5000   6.3143   6.2871
2     C2H4  24.3000  25.3112  23.6377  24.9000  23.9631  24.3396
2     C2H6  92.2000  84.7425  86.1958  90.1999  90.2169  90.2545
3     H2    24.8000  16.3806  18.3700  21.3176  22.2487  22.3429
3     CH4   23.8000  22.5236  22.3000  23.5000  24.1167  24.0137
3     C2H2  –        –        –        –        –        –
3     C2H4  2.7000   2.9080   2.5200   2.7987   2.7423   2.7420
3     C2H6  3.6400   3.3976   3.4000   3.5804   3.6198   3.6200

MAPE (%):

Case  Gas   BPNN     RBFNN    GRNN     ε-SVR    LS-SVM
1     H2    37.4690  20.8354  7.5984   8.0040   2.8931
1     CH4   26.6420  8.5721   6.0717   5.4931   3.6229
1     C2H2  –        –        –        –        –
1     C2H4  17.7531  7.9744   6.7495   5.7862   2.3804
1     C2H6  15.5342  7.0146   3.2152   4.2822   3.6414
2     H2    5.2262   12.6228  2.9289   7.6540   2.4287
2     CH4   7.5566   8.1300   2.2753   6.5398   4.8563
2     C2H2  9.7288   7.9337   2.9851   5.7575   6.1624
2     C2H4  4.1612   2.7256   2.4691   1.3866   0.1628
2     C2H6  8.0844   6.5121   2.1693   2.1509   2.1101
3     H2    33.9492  25.9274  14.0417  10.2874  9.9077
3     CH4   5.3629   6.3025   1.2605   1.3308   0.8982
3     C2H2  –        –        –        –        –
3     C2H4  7.7032   6.6667   3.6550   1.5655   1.5556
3     C2H6  6.6580   6.5934   1.6386   0.5563   0.5495
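As a worked check of how the MAPE entries of Table 3 relate to the forecasts, the short sketch below recomputes the percentage errors for hydrogen in Case 1 from the actual testing value and the five forecasts listed in the table. Since each case has a single testing sample, the MAPE reduces to one absolute percentage error. The variable and function names are illustrative only.

```python
actual_h2 = 13.2000  # Case 1 testing value of H2, from Table 1 and Table 3

# Forecasts of the five models for the same sample, from Table 3.
forecasts = {
    "BPNN": 18.1459,
    "RBFNN": 10.4497,
    "GRNN": 14.2030,
    "eps-SVR": 12.1435,
    "LS-SVM": 13.5819,
}


def percentage_error(forecast, actual):
    """Absolute percentage error of a single forecast."""
    return 100.0 * abs(forecast - actual) / actual


errors = {name: percentage_error(value, actual_h2) for name, value in forecasts.items()}
for name, err in errors.items():
    print(f"{name:8s} {err:7.4f} %")
```

The recomputed values agree with the tabulated MAPEs to within rounding of the published forecasts, and the LS-SVM forecast has the smallest error of the five.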

5.2. Comparisons with BPNN, RBFNN, GRNN, and SVR models


To further verify the approximation and generalization performance of the LS-SVM regression model based upon PSO, additional experimental comparisons are carried out against BPNN, RBFNN, GRNN and ε-SVR models, which gives a clearer picture. In Case 2, the sampling data form an unequal-interval series, which needs to be preprocessed by cubic Hermite spline interpolation. All experimental data are then normalized before the selected models are applied. As with LS-SVM regression, we choose RBF as the kernel function for ε-SVR and employ the PSO algorithm to select the optimal hyper-parameters. In the RBFNN and GRNN models, the spread of the RBF plays an important role in the successful application of the networks. In this study, we apply cross validation to select the optimal spread among some candidate values, which ensures that the networks provide the best generalization performance. Moreover, we use a network with two hidden layers and the log-sigmoid transfer function to find the best network model for BPNN, and the Levenberg–Marquardt optimization method is used to reach the predetermined error goal as fast as possible.

Comparisons of the performance in the training and testing phases among BPNN, RBFNN, GRNN, ε-SVR and LS-SVM regression are shown in Fig. 5 (taking hydrogen in Case 1 as an example). The forecasting results and the MAPE of all five models are listed in Table 3. From Fig. 5 and Table 3, it can be seen that BPNN makes the largest training and forecasting errors, especially at points of sharp fluctuation. This may be because BPNN requires a large amount of training data. Compared with BPNN, the RBFNN and GRNN perform better. However, some forecasting values from these two models are inaccurate, so that several of their MAPE values reach 10% or more. The ε-SVR and LS-SVM regression models perform better than the three models above. The difference between the ε-SVR and LS-SVM regression models is not as significant as the differences from the others, partly because the two models belong to the same SVM family and share some characteristics. However, the proposed LS-SVM regression model has a lower MAPE than the ε-SVR in most cases. Moreover, from analysis of the percentage errors and MAPEs in the training and forecasting phases, it can be concluded that the proposed forecasting model is more stable than the others. Therefore, we conclude that the LS-SVM regression model based upon PSO performs better than its counterparts.

Fig. 5. Comparisons of the performance in the training and testing phases among BPNN, RBFNN, GRNN, ε-SVR and LS-SVM regression models. (Panel a: H2 versus date for actual data, BPNN, RBFNN, GRNN, SVR and LS-SVM; panel b: curves for BPNN, RBFNN, GRNN, SVR and LS-SVM versus date.)

6. Conclusion

This paper has investigated a novel forecasting model, namely least squares support vector regression based upon PSO, to handle real-world DGA data and forecast dissolved gas contents in power transformer oil. Compared with standard SVR, the LS-SVM regression model applies linear least squares criteria to the loss function instead of the traditional quadratic programming method. Advantages of this regression model include those inherited from SVR, for example a unique solution and the support of statistical learning theory. Moreover, the PSO optimization method has been used to obtain the optimal hyper-parameters needed in the LS-SVM regression model. In the cases studied, the forecasting procedure serves as an effective tool for gas content forecasting in oil-filled transformers. Furthermore, compared with BPNN, RBFNN, GRNN and the ε-SVR with PSO, the proposed model has better forecasting performance in MAPE and also exhibits other attractive properties, such as strong learning capability for small samples, excellent generalization performance and stable forecasting.

Although a satisfactory forecasting accuracy has been achieved in this study, some further work remains. The experiments so far are based upon limited actual DGA data, and the forecasting model should become more reliable when it is built with larger training and testing sets. More DGA data are being collected, and the proposed model will be applied to them to further verify its capability and reliability for gas content forecasting in transformer oil. The model can also be combined conveniently with fault diagnosis methods to offer useful information for future transformer fault analysis. This combination is left for future study and is not demonstrated in this paper.

Acknowledgments

The authors gratefully acknowledge the support provided by the Funds for Innovative Research Groups of China (51021005), the National Basic Research Program of China (973 Program) (2009CB724505-1), and the Scientific Research Fund of SKL of Power Transmission Equipment & System Security and New Technology (2007DA10512708103), Chongqing University, China. The authors also thank the anonymous reviewers and the editor for their valuable comments.

References

[1] C.P. Hung, M.H. Wang, Diagnosis of incipient faults in power transformers using CMAC neural network approach, Electr. Power Syst. Res. 71 (3) (2004) 235–244.
[2] G.Y. Lv, H.Z. Cheng, H.B. Zhai, L.X. Dong, Fault diagnosis of power transformer based on multi-layer SVM classifier, Electr. Power Syst. Res. 74 (1) (2005) 1–7.
[3] W.H. Tang, J.Y. Goulermas, Q.H. Wu, Z.J. Richardson, J. Fitch, A probabilistic classifier for transformer dissolved gas analysis with a particle swarm optimizer, IEEE Trans. Power Deliv. 23 (2) (2008) 751–759.
[4] L.X. Dong, D.M. Xiao, Y.S. Liang, Y.L. Liu, Rough set and fuzzy wavelet neural network integrated with least square weighted fusion algorithm based fault diagnosis research for power transformers, Electr. Power Syst. Res. 78 (1) (2008) 129–136.
[5] Z. Yang, W.H. Tang, A. Shintemirov, Q.H. Wu, Association rule mining-based dissolved gas analysis for fault diagnosis of power transformers, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 39 (6) (2009) 597–610.
[6] Y. Zhang, X. Ding, Y. Liu, P.J. Griffin, An artificial neural network approach to transformer fault diagnosis, IEEE Trans. Power Deliv. 11 (4) (1996) 1836–1841.
[7] J.L. Guardado, J.L. Naredo, P. Moreno, C.R. Fuerte, A comparative study of neural network efficiency in power transformers diagnosis using dissolved gas analysis, IEEE Trans. Power Deliv. 16 (4) (2001) 643–647.
[8] M.H. Wang, A novel extension method for transformer fault diagnosis, IEEE Trans. Power Deliv. 18 (1) (2003) 164–169.
[9] M.H. Wang, Grey-extension method for incipient fault forecasting of oil-immersed power transformer, Electr. Power Compon. Syst. 32 (10) (2004) 959–975.
[10] H.S. Hippert, C.E. Pedreira, R.C. Souza, Neural networks for short-term load forecasting: A review and evaluation, IEEE Trans. Power Syst. 16 (1) (2001) 44–55.
[11] F.J. Chang, Y.C. Chen, Estuary water-stage forecasting by using radial basis function neural network, J. Hydrol. 270 (1/2) (2003) 158–166.
[12] M.T. Leung, A.S. Chen, H. Daouk, Forecasting exchange rates using general regression neural networks, Comput. Operations Res. 27 (11/12) (2000) 1093–1110.
[13] M.H. Wang, C.P. Hung, Novel grey model for the prediction of trend of dissolved gases in oil-filled power apparatus, Electr. Power Syst. Res. 67 (1) (2003) 53–58.
[14] S.W. Fei, Y. Sun, Forecasting dissolved gases content in power transformer oil based on support vector machine with genetic algorithm, Electr. Power Syst. Res. 78 (3) (2008) 507–514.
[15] C. Kao, C.L. Chyu, A fuzzy linear regression model with better explanatory power, Fuzzy Sets Syst. 126 (3) (2002) 401–409.
[16] F.E.H. Tay, L.J. Cao, Application of support vector machines in financial time series forecasting, Omega-Int. J. Manage. Sci. 29 (4) (2001) 309–317.
[17] Q. Wu, The forecasting model based on wavelet v-support vector machine, Expert Syst. Appl. 36 (4) (2009) 7604–7610.
[18] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.
[19] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1998.
[20] T. Van Gestel, J.A.K. Suykens, D.E. Baestaens, Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Trans. Neural Netw. 12 (4) (2001) 809–821.
[21] B.J. de Kruif, T.J.A. de Vries, Pruning error minimization in least squares support vector machines, IEEE Trans. Neural Netw. 14 (3) (2003) 696–702.
[22] T. Van Gestel, J.A.K. Suykens, B. Baesens, Benchmarking least squares support vector machine classifiers, Mach. Learn. 54 (1) (2004) 5–32.
[23] Y. Zhang, Y.C. Liu, Traffic forecasting using least squares support vector machines, Transportmetrica 5 (3) (2009) 193–213.
[24] Z. Yang, X.S. Gu, X.Y. Liang, L.C. Ling, Genetic algorithm-least squares support vector regression based predicting and optimizing model on carbon fiber composite integrated conductivity, Mater. Des. 31 (3) (2010) 1042–1049.
[25] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proc. IEEE Int. Conf. Neural Networks, IV, IEEE Service Center, Piscataway, NJ, 1995, pp. 1942–1948.
[26] M. Clerc, J. Kennedy, The particle swarm-explosion, stability, and convergence in a multidimensional complex space, IEEE Trans. Evolut. Comput. 6 (1) (2002) 58–73.
[27] J.J. Liang, A.K. Qin, P.N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE Trans. Evolut. Comput. 10 (3) (2006) 281–295.
[28] S.S. Keerthi, C.J. Lin, Asymptotic behaviors of support vector machines with Gaussian kernel, Neural Comput. 15 (7) (2003) 1667–1689.
[29] B.K. Panigrahi, V.R. Pandi, Bacterial foraging optimization: Nelder–Mead hybrid algorithm for economic load dispatch, IET Gener. Trans. Distrib. 2 (4) (2008) 556–565.
[30] E. Atashpaz-Gargari, C. Lucas, Imperialist competitive algorithm: An algorithm for optimization inspired by imperialistic competition, IEEE Congress Evol. Comput. 7 (2007) 4661–4667.
[31] C. Lucas, Z. Nasiri-Gheidari, F. Tootoonchian, Application of an imperialist competitive algorithm to the design of a linear induction motor, Energy Convers. Manage. 51 (7) (2010) 1407–1411.
[32] Q. Duan, S.L. Li, F.X. Bao, E.H. Twizell, Hermite interpolation by piecewise rational surface, Appl. Math. Comput. 198 (1) (2008) 59–72.
[33] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, [Online] Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
[34] Y.C. Xiao, Application study of support vector machine in transformer condition assessment, PhD thesis, Beijing Jiaotong University, Beijing, BJ, China, 2008 (in Chinese).
[35] R.J. Liao, Study on blackboard-style expert system and prediction model based on genetic algorithm for insulation fault of power transformer, PhD thesis, Chongqing University, Chongqing, CQ, China, 2003 (in Chinese).