Time series forecasting by neural networks: A knee point-based multiobjective evolutionary algorithm approach

Expert Systems with Applications 41 (2014) 8049–8061
http://dx.doi.org/10.1016/j.eswa.2014.06.041

Wei Du (a), Sunney Yung Sun Leung (a,*), Chun Kit Kwong (b)

(a) Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong, China
(b) Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China
(*) Corresponding author. E-mail address: [email protected] (S.Y.S. Leung).

Article history: Available online 15 July 2014
Keywords: Artificial neural network (ANN); Multiobjective evolutionary algorithm (MOEA); Time series forecasting (TSF); Knee point

Abstract: In this paper, we investigate the problem of time series forecasting using single hidden layer feedforward neural networks (SLFNs), which are optimized via multiobjective evolutionary algorithms. By utilizing adaptive differential evolution (JADE) and the knee point strategy, a nondominated sorting adaptive differential evolution (NSJADE) and its improved version, knee point-based NSJADE (KP-NSJADE), are developed for optimizing SLFNs. JADE, which aims at refining the search area, is introduced into nondominated sorting genetic algorithm II (NSGA-II). The presented NSJADE shows superiority on multimodal problems when compared with NSGA-II. NSJADE is then applied to train SLFNs for time series forecasting. It is revealed that individuals with better forecasting performance in the whole population gather around the knee point. Therefore, KP-NSJADE is proposed to explore the neighborhood of the knee point in the objective space. Simulation results on eight popular time series databases illustrate the effectiveness of the proposed algorithm in comparison with several popular algorithms. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

An artificial neural network (ANN) is a mathematical model consisting of a group of artificial neurons connected with each other, an idea originally inspired by biological neural networks. An ANN is like a human brain, capable of computing and storing specific information. Because of their satisfactory ability to detect and extract nonlinear relationships in the given information, ANNs have been widely utilized in pattern recognition, image processing, data mining, time series forecasting and so on (Bhaduri, Stefanski, & Srivastava, 2011; Leung, Tang, & Wong, 2012; Tang, Gao, & Kurths, 2014; Tang & Wong, 2013; Wong, Leung, & Guo, 2012; Wong, Seng, & Ang, 2011; Xu, Cao, & Qiao, 2011; Zhang, Tang, Miao, & Du, 2013). Among all these applications, time series forecasting (TSF) is quite an intriguing one. Traditional TSF has been performed via statistical methods, such as exponential smoothing models (Gardner, 2006; Taylor, 2006) and autoregressive integrated moving average (ARIMA) models (Contreras, Espínola, Nogales, & Conejo, 2003), which are categorized as linear models. In the past few decades, however, ANNs have played a dominant role in TSF problems owing to their superior performance on regression and classification problems. The main difference that distinguishes ANNs from traditional

methods is their ability to capture the nonlinear relations hidden in time series data. Recently, a large number of empirical studies have indicated that ANNs perform better than traditional methods on TSF problems (Hill, O'Connor, & Remus, 1996; Wong & Guo, 2010; Zhang, Cao, & Schniederjans, 2004). A variety of ANNs have been used to forecast time series data. For instance, the single multiplicative neuron (SMN) model, a recently introduced neural network, was used for TSF (Yeh, 2013); the generalized regression neural network (GRNN) was proven effective in the prediction of time series data (Yan, 2012); and the radial basis function neural network (RBFNN) also performed well on TSF problems (Du & Zhang, 2008). Generally speaking, during the course of training, the training error is the first objective minimized by the different learning algorithms (Yeh, 2013). In practice, however, there are other objectives that need to be optimized besides the training error, such as the number of hidden layer nodes and the L2-norm of the hidden layer weights (Goh, Teoh, & Tan, 2008). These objectives naturally conflict with each other, which means that the improvement of one objective may lead to the deterioration of another (Zhou et al., 2011). In the past two decades, multiobjective evolutionary algorithms (MOEAs) have attracted much attention in the field of evolutionary computation, largely owing to their capability of dealing with a multiobjective optimization problem and finding nondominated sets in a single run (Deb, 2002; Tang, Wang, Gao, Swift, & Kurths, 2012). Among them, nondominated sorting genetic


algorithm-II (NSGA-II) (Deb, Pratap, Agarwal, & Meyarivan, 2002) is one of the most popular ones. Although NSGA-II finds solutions with satisfactory convergence and diversity on most test problems, it gets into trouble when facing multimodal problems (Deb et al., 2002). We attribute this to the genetic algorithm inside NSGA-II, since the genetic algorithm does not utilize useful information to explore the preferred region effectively. Recently, a new evolutionary algorithm called adaptive differential evolution (JADE) was proposed, which has been proven efficient and versatile in exploration and exploitation (Das & Suganthan, 2011). Therefore, considering that TSF problems solved by ANNs contain many local optimal solutions (Zhang, Patuwo, & Hu, 1998), replacing the non-adaptive genetic algorithm part with JADE is the first motivation of this paper. The second motivation comes from the analysis of the prediction results of several time series databases. According to the results published so far, there are only a few studies applying MOEAs to TSF problems (Chiam, Tan, & Mamun, 2007; Katagiri, Nishizaki, Hayashida, & Kadoma, 2012), and these works have not deeply explored where, within the whole population of ANNs, the ANN with the best forecasting results is distributed. In this paper, after employing the proposed nondominated sorting adaptive differential evolution (NSJADE) on several TSF problems, we find that the individuals with the best forecasting results mostly gather around the "corner" of the corresponding Pareto front (PF). This phenomenon reminds us of the concept of the knee point proposed in recent years, which has been proven useful in the optimization of many engineering problems, such as signalized intersection design and metal cutting process design (Branke, Deb, Dierolf, & Osswald, 2004; Das, 1999; Deb & Gupta, 2011). Therefore, to the best of the authors' knowledge, this paper makes the first attempt to employ the concept of the knee point in TSF problems, which is shown to be a promising way to ensure both accuracy and reliability at the same time. When TSF problems are solved by ANNs, most previous research utilizes single objective evolutionary algorithms to optimize the ANNs (Gu, Zhu, & Jiang, 2011; Ren & Zhou, 2011; Wong et al., 2012). In these studies, the training error is the only objective optimized by the evolutionary algorithm, and the ANN with the minimum training error is selected as the final network for the TSF problem. However, due to the existence of noise, the features of the training samples do not represent the inherent underlying distribution of new observations, so it is not reasonable to merely minimize the training error of ANNs when forecasting. Fortunately, MOEAs serve as promising candidates to optimize ANNs in TSF problems. As introduced before, only a few published studies apply MOEAs to TSF problems (Chiam et al., 2007; Katagiri et al., 2012), and they did not deeply explore the distribution of the ANN with the best forecasting results within the whole population of ANNs. This distribution information is quite important and helpful when solving TSF problems, and it will be utilized in our research to make the forecasting more accurate and reliable.
Briefly, the contributions of this paper can be summarized as follows: (1) JADE is introduced into MOEAs, which delivers a promising performance in exploring the search space; (2) the knee point mechanism helps investigate the intrinsic properties of TSF problems, where our proposed KP-NSJADE guarantees both accuracy and reliability of the prediction of time series data. The organization of this paper is as follows. Some preliminaries of multiobjective optimization and the new algorithm NSJADE are given in Section 2. In Section 3, NSJADE is used to optimize SLFNs, and experiments are performed to validate the effectiveness of NSJADE. In Section 4, the concept of the knee point is introduced. Based on

it, KP-NSJADE, an improved version of NSJADE, is proposed. In Section 5, eight benchmark time series data sets are predicted by KP-NSJADE-trained SLFNs; in addition, several experiments show the superiority of KP-NSJADE. Finally, conclusions are given in Section 6.

2. Background and methods

In this section, we first give some preliminaries of multiobjective optimization, as well as our proposed NSJADE. In addition, for easy reading, the abbreviations used in this paper are listed below.

ANN: artificial neural network
SLFN: single hidden layer feedforward neural network
MOEA: multiobjective evolutionary algorithm
DE: differential evolution
JADE: adaptive differential evolution
CoDE: composite differential evolution
CLPSO: comprehensive learning particle swarm optimizer
SPEA2: strength Pareto evolutionary algorithm 2
NSGA-II: nondominated sorting genetic algorithm II
NSJADE: nondominated sorting adaptive differential evolution
KP-NSJADE: knee point-based nondominated sorting adaptive differential evolution
TSF: time series forecasting
PF: Pareto front
ELM: extreme learning machine

2.1. Multiobjective optimization

Nowadays, many real-world optimization problems involve various objectives that often conflict with each other. A multiobjective optimization problem can be formulated as follows (without loss of generality, we consider a minimization problem with decision space $\Omega$):

$$\begin{aligned}
\text{minimize} \quad & F(x) = (f_1(x), \ldots, f_n(x)) \\
\text{s.t.} \quad & x \in \Omega,
\end{aligned} \tag{1}$$

where $\Omega$ is the decision space and $x \in \Omega$ is a decision vector. $F(x) = (f_1(x), \ldots, f_n(x))$ is the objective vector with $n$ objectives to be minimized. The objectives in (1) conflict with one another, which means that no single solution optimizes all the objectives simultaneously. So it is necessary to seek a group of solutions that balance all the objectives. Here we introduce the definitions of Pareto dominance, Pareto optimal solution and Pareto front.

Definition 1 (Pareto Dominance). Given two objective vectors $x^1, x^2 \in \mathbb{R}^n$, $x^1$ dominates $x^2$, denoted $x^1 \prec x^2$, iff $x^1_i \le x^2_i$ for all $i \in \{1, 2, \ldots, n\}$ and $x^1_j < x^2_j$ for some $j \in \{1, 2, \ldots, n\}$.

Definition 2 (Pareto Optimal Solution). A feasible solution $x^* \in \Omega$ of (1) is called a Pareto optimal solution iff there exists no $x \in \Omega$ such that $F(x) \prec F(x^*)$.

Definition 3 (Pareto Front). The image of all the Pareto optimal solutions in the objective space is called the Pareto front (PF).

Fig. 1 illustrates the dominance relationships between different solutions, where the solutions represented by closed blue circles are dominated by the solutions denoted by closed red squares.
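In code, Definition 1 translates directly into a dominance test. The following minimal Python sketch (our own illustration; the function names `dominates` and `nondominated` are ours, not the paper's) filters the nondominated members of a set of objective vectors, assuming minimization:

```python
import numpy as np

def dominates(f1, f2):
    # f1 Pareto-dominates f2 iff it is no worse in every objective
    # and strictly better in at least one (minimization, Definition 1).
    f1, f2 = np.asarray(f1), np.asarray(f2)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def nondominated(F):
    # Indices of the objective vectors in F that no other vector dominates.
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]

# Example: (1, 3) and (3, 1) are mutually nondominated; (4, 4) is dominated.
print(nondominated([(1, 3), (3, 1), (4, 4)]))  # -> [0, 1]
```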


2.2. NSGA-II

NSGA-II was proposed in Deb et al. (2002) and is composed of a fast nondominated sorting approach and a crowded-comparison operator. These two features ensure both lower computational complexity and higher solution diversity for NSGA-II. In addition, an elitism mechanism is introduced to keep better individuals in the population. NSGA-II also avoids specifying a sharing parameter. It is well recognized that NSGA-II handles multiobjective optimization problems efficiently.
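For completeness, the crowded-comparison machinery retained from NSGA-II relies on the crowding distance sketched below. This is a minimal implementation of the standard operator from Deb et al. (2002), not code taken from this paper:

```python
import numpy as np

def crowding_distance(F):
    # F: objective values of one nondominated front, shape (N, M).
    # Boundary solutions get infinite distance so they are always kept.
    F = np.asarray(F, dtype=float)
    N, M = F.shape
    d = np.zeros(N)
    for m in range(M):
        order = np.argsort(F[:, m])
        d[order[0]] = d[order[-1]] = np.inf
        span = F[order[-1], m] - F[order[0], m]
        if span == 0.0:
            continue
        for k in range(1, N - 1):
            # Normalized distance between the two neighbors along objective m.
            d[order[k]] += (F[order[k + 1], m] - F[order[k - 1], m]) / span
    return d
```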


2.3. JADE

JADE was developed in Zhang and Sanderson (2009) and is dedicated to solving single objective optimization problems. The algorithm is effective on both unimodal and multimodal functions. The strategy "DE/current-to-pbest" with an optional archive helps JADE achieve a good balance between being "greedy" and "diverse". "DE/current-to-pbest" means that in the mutation stage, any of the top 100p% (p ∈ (0, 1]) solutions can be randomly selected to play the role of the single best solution, while the optional archive, which stores the inferior individuals encountered during the evolution process, aims to increase the diversity of the population. Moreover, at each generation, the crossover probability CR and the mutation factor F of each individual are adaptively generated according to a normal distribution and a Cauchy distribution, respectively. Owing to these advantages, as shown in Zhang and Sanderson (2009), JADE achieves faster convergence and higher accuracy.

2.4. Nondominated sorting adaptive differential evolution (NSJADE)

NSGA-II has exhibited its effectiveness in various multiobjective optimization problems and practical applications. However, in Deb et al. (2002), NSGA-II gets stuck at different local Pareto-optimal sets on ZDT4, a multimodal test problem. This phenomenon inspires us to update the search engine of NSGA-II, which is a genetic algorithm with the simulated binary crossover (SBX) operator and polynomial mutation. This search engine does not balance exploration and exploitation in the search space very well; meanwhile, it does not take advantage of useful information to explore the preferred region. Hence JADE, a simple but powerful DE algorithm with an effective adaptive mechanism, is selected as the new search engine in NSGA-II, while the other two efficient parts, i.e., the nondominated sorting approach and the crowded-comparison operator, are retained. In this way we first develop a nondominated sorting adaptive differential evolution (NSJADE). The pseudocode of NSJADE is given in Algorithm 1. First, the parameters are set and a population of size NP is initialized. Then the Nondominated_Sort function generates different levels of PFs according to the objectives of x, and a crowding-distance value is assigned to each individual x (Deb et al., 2002). Steps 5 to 32 form the search engine part of our algorithm, which is similar to JADE except for the selection operator (steps 22 to 31). If a parent individual is dominated by its offspring, it is added to the optional archive A. After examining the size of A and regenerating the values of μ_CR and μ_F, the algorithm combines the parent and offspring populations (Combine). The new generation is then obtained after the Nondominated_Sort and Select_Elitism operations.

Remark 1. In steps 22 to 24, we define an individual dominated by its offspring as a loser, and it is put into A. Here we leave out one situation: the parent has the same rank as its offspring but a smaller crowding distance. According to the comparison rules defined in Deb et al. (2002), this parent

individual is also a loser and should be put in A. Handling this situation, i.e., computing the crowding-distance values of the two individuals, has a worst-case complexity of O(NP log(2NP)). Therefore, in order to reduce the computational complexity of the algorithm, we merely calculate the dominance relationship between the two individuals, and not the crowding distance, if they have the same rank. In addition, this treatment guarantees that the optional archive A contains "worse" individuals, which helps increase the diversity of our algorithm.

Algorithm 1. Nondominated sorting adaptive differential evolution (NSJADE)

 1: Begin
 2: Set μ_CR = 0.5; μ_F = 0.5; A = ∅
 3: Create a random initial population {x_{i,0} | i = 1, 2, ..., NP}
 4: x = Nondominated_Sort(x)
 5: for g = 1 to G do
 6:   S_F = ∅; S_CR = ∅
 7:   for i = 1 to NP do
 8:     Generate CR_i = randn_i(μ_CR, 0.1), F_i = randc_i(μ_F, 0.1)
 9:     Randomly select x_{pbest,g} from the 100p% best individuals
10:     Randomly select x_{r1,g} ≠ x_{i,g} from the current population P
11:     Randomly select x̃_{r2,g} ≠ x_{r1,g} ≠ x_{i,g} from P ∪ A
12:     v_{i,g} = x_{i,g} + F_i · (x_{pbest,g} − x_{i,g}) + F_i · (x_{r1,g} − x̃_{r2,g})
13:     Check the boundary of v_{i,g}
14:     Generate j_rand = randint(1, D)
15:     for j = 1 to D do
16:       if j = j_rand or rand(0, 1) < CR_i then
17:         u_{j,i,g} = v_{j,i,g}
18:       else
19:         u_{j,i,g} = x_{j,i,g}
20:       end if
21:     end for
22:     if f(u_{i,g}) ≺ f(x_{i,g}) then
23:       x_{i,g} → A; CR_i → S_CR; F_i → S_F
24:     end if
25:   end for
26:   Randomly remove individuals from A to maintain size(A) = NP
27:   μ_CR = (1 − c) · μ_CR + c · mean_A(S_CR)
28:   μ_F = (1 − c) · μ_F + c · mean_L(S_F)
29:   x_all = Combine(x, u)
30:   x_all = Nondominated_Sort(x_all)
31:   x = Select_Elitism(x_all)
32: end for
33: End
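The JADE ingredients that NSJADE inherits can be sketched in Python as follows. This is a minimal illustration of steps 8–12 and 27–28 of Algorithm 1 under our own naming (`sample_cr_f`, `current_to_pbest`, `adapt_means`), not the authors' implementation; in NSJADE the "best" individuals are ranked by nondominated sorting rather than by a scalar fitness:

```python
import numpy as np

def sample_cr_f(mu_CR, mu_F, rng=np.random):
    # CR_i ~ Normal(mu_CR, 0.1) truncated to [0, 1]; F_i ~ Cauchy(mu_F, 0.1),
    # redrawn while non-positive and truncated to 1, as in JADE (step 8).
    CR = float(np.clip(rng.normal(mu_CR, 0.1), 0.0, 1.0))
    F = 0.0
    while F <= 0.0:
        F = min(mu_F + 0.1 * rng.standard_cauchy(), 1.0)
    return CR, F

def current_to_pbest(pop, archive, i, F, p=0.05, rng=np.random):
    # "DE/current-to-pbest/1" with optional archive (steps 9-12).
    # Assumes pop is ordered so that better individuals come first.
    NP = len(pop)
    pbest = pop[rng.randint(max(1, int(round(p * NP))))]
    r1 = pop[rng.randint(NP)]
    pool = np.vstack([pop, archive]) if len(archive) else pop
    r2 = pool[rng.randint(len(pool))]
    return pop[i] + F * (pbest - pop[i]) + F * (r1 - r2)

def adapt_means(mu_CR, mu_F, S_CR, S_F, c=0.1):
    # Steps 27-28: arithmetic mean for CR, Lehmer mean for F
    # (the Lehmer mean favors larger successful F values).
    if S_CR:
        mu_CR = (1 - c) * mu_CR + c * float(np.mean(S_CR))
    if S_F:
        sf = np.asarray(S_F, dtype=float)
        mu_F = (1 - c) * mu_F + c * float((sf ** 2).sum() / sf.sum())
    return mu_CR, mu_F
```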

2.5. Experiments with test functions

In Deb et al. (2002), nine test functions were used to examine the performance of NSGA-II: SCH, FON, POL, KUR, ZDT1–ZDT4 and ZDT6. Among them, ZDT4 is a multimodal problem with 21^9 (about 7.94 × 10^11) different local Pareto-optimal fronts in the search space. The results in Deb et al. (2002) show that real-coded NSGA-II gets stuck at different local Pareto-optimal sets, whereas our proposed NSJADE easily finds the global Pareto-optimal front owing to the modified search engine. Next we give the convergence behavior of NSGA-II and NSJADE on ZDT4. For the integrity of the experiment, we also list the results of these two algorithms on the rest of the ZDT functions.
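For reference, ZDT4 is commonly defined as below; this is a sketch based on the standard ZDT benchmark definitions, not on this paper. The Rastrigin-like g term is what creates the 21^9 local fronts:

```python
import numpy as np

def zdt4(x):
    # x[0] in [0, 1], x[1:] in [-5, 5]; two objectives to minimize.
    x = np.asarray(x, dtype=float)
    g = 1.0 + 10.0 * (len(x) - 1) \
        + np.sum(x[1:] ** 2 - 10.0 * np.cos(4.0 * np.pi * x[1:]))
    f1 = x[0]
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2
```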

Both approaches run for 250 generations with the same population size of 100. For NSGA-II, we use the same distribution indices for the crossover and mutation operators as in Deb et al. (2002): η_c = 20, η_m = 20. For NSJADE, the parameters of the JADE part are the same as in Zhang and Sanderson (2009): the value of the parameter c is 0.1; the size of the optional archive equals that of the population; the parameter p in the pbest mechanism is set to 0.05; the other parameters are given in Algorithm 1. Fig. 2 shows the nondominated solutions obtained by NSGA-II and NSJADE on the ZDT functions. On ZDT4 and ZDT6, the solutions acquired by NSGA-II lie somewhat far from the true PF, while the results of NSJADE are quite satisfying; for ZDT1–ZDT3, the convergence of the two algorithms is similar, with the solutions overlapping the true PF of each test function.

Fig. 1. Illustration of the relationship between dominated and nondominated solutions.

3. Time series forecasting by NSJADE-trained neural network

3.1. Experimental setup

Since NSJADE is effective on multimodal problems and TSF problems solved by ANNs contain many local optimal solutions (Zhang et al., 1998), we use it here to optimize ANNs for TSF problems. As mentioned in the Introduction, there are usually three objectives to be optimized: minimization of the training error, minimization of the number of hidden nodes, and minimization of the L2-norm of the hidden layer weights (Goh et al., 2008). In this paper, we follow the common setting of fixing the number of hidden nodes for each problem and optimize the other two objectives. Meanwhile, we use single hidden layer feedforward networks (SLFNs) for the TSF problems, since the universal approximation theorem (Hornik, Stinchcombe, & White, 1989) shows that a single hidden layer of neurons suffices to approximate any given function. To be specific, the learning algorithm for the SLFNs is the extreme learning machine (ELM), proposed in Huang, Zhu, and Siew (2006). When using ELM, the input weights and hidden biases are randomly generated; thereafter, the output weights are determined by finding the minimum-norm least-squares solution for the given network. Our MOEA is employed to optimize this ELM-based SLFN. Next, we use the NSJADE-trained SLFN to forecast three common time series databases; the parameters of the experiments are given in Table 1.
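The ELM step just described admits a compact sketch: random input weights and biases, a sigmoid hidden layer, and output weights obtained from the Moore–Penrose pseudoinverse (the minimum-norm least-squares solution). This is a generic ELM illustration consistent with the setup above; the function names are ours, not the authors' code:

```python
import numpy as np

def elm_train(X, T, n_hidden=30, init_range=0.25, rng=np.random):
    # Random input weights W and biases b; [-0.25, 0.25] follows Table 1
    # (extending that range to the biases is our assumption).
    W = rng.uniform(-init_range, init_range, (X.shape[1], n_hidden))
    b = rng.uniform(-init_range, init_range, n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ T             # minimum-norm least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```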

Table 1. The parameters of the experiments.

| Parameter | Value |
| Population size | 100 |
| Generation | 1000 |
| Initial range of the input weights | [-0.25, 0.25] |
| Number of hidden nodes | 30 |
| Activation function | Sigmoid |

Remark 2. As shown in Table 1, the initial range of the input weights of our SLFN is set to [-0.25, 0.25]. Normally this range is assigned as [-1, 1]; however, [-1, 1] is so wide that many networks with good performance can easily be disturbed during the evolution process, which drives us to limit the initial range of the input weights. In addition, after carrying out the experiments, we find that this change makes the convergence of the algorithm more stable but does not affect the forecasting accuracy.

3.2. Examples

In this subsection, three popular databases are used to test the forecasting performance of the SLFNs optimized by our proposed NSJADE.

Example 1. In this example, we use an NSJADE-trained SLFN to forecast the Wolf's sunspot numbers (DataMarket, 2013), which record sunspot cycles from 1700 to 1988. This time series is hard to forecast because it is nonlinear, nonstationary and non-Gaussian. We predict the value y(t) of the series from the three preceding values y(t-1), y(t-2) and y(t-3), which is the common configuration for this problem (Leung, Lam, Ling, & Tam, 2003). The training set has 180 samples (i.e., 1705-1884), and the following 96 samples (i.e., 1885-1980) are used to test the performance of the algorithm.

Example 2. In this example, an NSJADE-trained SLFN is employed to predict the brightness observations at midnight of a variable star over 600 successive days (Time Series Data Library, 2013a). The first 299 observations (i.e., 4-302) are training samples, and the later 298 (i.e., 303-600) are test samples. The values y(t-1), y(t-2) and y(t-3) are used to forecast the value y(t), as in Li and Hu (2012).

Example 3. The daily closing stock price of IBM is forecasted by an NSJADE-trained SLFN in this example. This data set contains the daily closing price of IBM stock from Jan. 1st, 1980 to Oct. 8th, 1992 (Time Series Data Library, 2013b). We divide the data set into two parts: the first 1665 data points (i.e., 4-1668) are used as training samples and the later 1665 (i.e., 1669-3333) as test samples. We predict the value y(t) of the series from the three preceding values y(t-1), y(t-2) and y(t-3).

3.3. Results and analysis

For the above three examples, all the values are normalized between 0.1 and 0.9 (Yeh, 2013), since data normalization can speed up the training of neural networks. The SLFNs in the experiment are all first trained by ELM. We then compare the performance of NSJADE, NSGA-II, JADE and the case without any evolutionary algorithm. All experiments are run 50 times; the mean MSE (i.e., solution quality) and the standard deviation of the MSE (i.e., robustness) of the different algorithms are given in Table 2, denoted M_MSE and S_MSE, respectively. The subscripts tr and te refer to the training set and the test set. "-" in Table 2 indicates the case without any evolutionary algorithm.
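The data preparation common to Examples 1–3 can be sketched as follows: values are rescaled to [0.1, 0.9] and each target y(t) is paired with its three preceding values. A minimal sketch; the file name is hypothetical and the split sizes follow Example 1:

```python
import numpy as np

def normalize(y, lo=0.1, hi=0.9):
    # Rescale the whole series into [0.1, 0.9].
    return lo + (hi - lo) * (y - y.min()) / (y.max() - y.min())

def make_lagged(y, lags=(1, 2, 3)):
    # Pair each target y(t) with y(t-1), y(t-2), y(t-3).
    m = max(lags)
    X = np.column_stack([y[m - k: len(y) - k] for k in lags])
    return X, y[m:]

y = normalize(np.loadtxt("sunspots.txt"))       # hypothetical data file
X, t = make_lagged(y)
X_tr, t_tr = X[:180], t[:180]                   # 180 training samples
X_te, t_te = X[180:276], t[180:276]             # next 96 test samples
```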

Fig. 2. Comparison of nondominated solutions of NSGA-II and NSJADE on the ZDT functions. (a) ZDT1; (b) ZDT2; (c) ZDT3; (d) ZDT4; (e) ZDT6.

Remark 3. Among the four algorithms in Table 2, NSJADE and NSGA-II are multiobjective algorithms, while JADE is a single objective one. As discussed in the previous section, the mechanism of multiobjective algorithms differs from that of single objective ones: a multiobjective algorithm ultimately yields a set of solutions, whereas only the optimal solution is chosen when using single objective algorithms.


Therefore, in order to unify the metrics, for NSJADE and NSGA-II in this experiment we first average the training-set MSE values of the 100 individuals, then take the mean and standard deviation of these values over the 50 runs, denoted M_MSEtr and S_MSEtr, respectively. The same operation is performed on the test sets, giving M_MSEte and S_MSEte. For JADE, we only utilize the optimal solution and calculate the mean and standard deviation of the values obtained over the 50 runs, for both training and test samples.
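In executable terms, the reported statistics reduce to a mean and standard deviation over runs; in this sketch, `mse_per_run` is a hypothetical list holding one aggregated MSE per run (the 100-individual average for the MOEAs, the single best value for JADE):

```python
import numpy as np

def summarize(mse_per_run):
    # One aggregated MSE value per independent run (50 runs here).
    runs = np.asarray(mse_per_run, dtype=float)
    return runs.mean(), runs.std()   # (M_MSE, S_MSE)
```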

Table 2. Performance comparison of different algorithms.

| Data set | Method | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte |
| Sunspot | NSJADE | 2.0767E-03 | 1.0513E-05 | 1.9995E-03 | 2.3541E-05 |
| | NSGA-II | 2.0961E-03 | 1.6381E-05 | 2.0071E-03 | 4.9025E-05 |
| | JADE | 1.9593E-03 | 6.7057E-06 | 2.0212E-03 | 6.2096E-05 |
| | - | 2.2512E-03 | 2.1673E-05 | 3.4647E-03 | 0.0218 |
| Star | NSJADE | 1.4066E-04 | 3.2359E-07 | 1.4389E-04 | 1.5601E-07 |
| | NSGA-II | 1.4087E-04 | 3.3187E-07 | 1.4401E-04 | 2.1956E-07 |
| | JADE | 1.4190E-04 | 4.1198E-07 | 1.4838E-04 | 1.2097E-06 |
| | - | 1.4295E-04 | 4.7609E-07 | 1.5567E-04 | 7.3381E-07 |
| IBM | NSJADE | 6.4083E-05 | 4.9835E-08 | 1.1680E-04 | 2.3192E-07 |
| | NSGA-II | 6.4126E-05 | 6.9841E-08 | 1.1683E-04 | 4.0916E-07 |
| | JADE | 6.3716E-05 | 5.9044E-08 | 1.1698E-04 | 7.6204E-07 |
| | - | 6.4606E-05 | 1.1733E-07 | 9.4441E-04 | 4.8737E-04 |

According to the results above, M_MSEte and S_MSEte in the three examples are the smallest when our proposed NSJADE is used to train the SLFN, while the case without any evolutionary algorithm performs worst on all three examples, which shows the necessity of using evolutionary algorithms. Moreover, in Examples 1 and 3, though NSJADE does not have the smallest M_MSE or S_MSE on the training samples, the best test results in the last two columns reflect the advantage of optimizing two objectives, ensuring both accuracy and generalization of the network.

Remark 4. From the results in Table 2, we can see that NSJADE and NSGA-II have better forecasting performance than JADE, which shows that multiobjective evolutionary algorithms are quite competitive with single objective ones on TSF problems.

Next, we plot the exact location of the individual with the minimum test MSE in each run for these three TSF problems optimized by NSJADE, as shown in Fig. 3. These three figures exhibit an interesting phenomenon: most of the individuals with the minimum test MSE value (i.e., the red dots) lie around the "corner" of the PF curves (i.e., the green dots) when trained by NSJADE. Therefore, in the next subsection, we examine more common databases to see whether this phenomenon also occurs for them; if so, our proposed NSJADE can be modified to achieve better forecasting performance on these time series data sets.

LGD and CGS are two of the databases presented in the Santa Fe Time Series Competition (The Santa Fe Time Series Competition Data, 1991). LGD was recorded from a Far-Infrared-Laser, which is a univariate time series (The Santa Fe Time Series Competition Data, 1991). The training set has 400 samples, and 400 samples are used to test the performance of the algorithm. CGS was generated artificially, which has the following features: synthetically generated, relatively high-dimensional dynamics, long data sets, no background information, finite states and drifting parameters (The Santa Fe Time Series Competition Data, 1991). The training data consist of 450 samples, and the test data have 500 samples. We predict the value of yðtÞ from the earlier points yðt  1Þ; yðt  2Þ; yðt  4Þ and yðt  8Þ for both LGD and CGS. Now, we plot the location of the individual with the best test performance in each PF of these five TSF problems optimized by NSJADE, as shown in Fig. 4. All the values of the five time series data sets are normalized between 0.1 and 0.9, as in Yeh (2013). And the parameters of the experiments are the same as those in Table 1. According to Fig. 4, we find that these five figures do exhibit the phenomenon that happens to the previous three databases: most of the individuals with the minimum test MSE value (i.e., the red dots) are around the ‘‘corner’’ of the PF curves (i.e., the green dots) when trained by NSJADE. Among them, the first four figures (e.g., Fig. 4(a)–(d)) are extremely obvious. Remark 5. Figs. 3 and 4 offer the distribution information of the best individual of each run in the Pareto front obtained by NSJADE of the eight databases. This deeply-explored information will be utilized for upgrading our previous NSJADE, which will be more proper for solving the given TSF problems.

4. Knee point-based nondominated sorting adaptive differential evolution (KP-NSJADE)

The phenomenon exhibited in Figs. 3 and 4 of Section 3 reminds us of the concept of the knee point (Branke et al., 2004; Das, 1999; Deb & Gupta, 2011), whose use in TSF problems has been widely overlooked. Therefore, in this section, we intend to narrow the distribution range of the individuals along the PF curves obtained by our proposed NSJADE, which guarantees both accuracy and robustness in TSF problems.

4.1. Knee point

A knee point, or knee solution, usually occurs in a multiobjective optimization problem. Generally speaking, it is a point located at the "corner" of the PF in a biobjective optimization problem. Fig. 5 illustrates the knee point within the whole PF: the filled red circle is the knee point, while the empty blue circles are the other solutions forming the PF of the problem; the front looks like a bent leg (turn the curve upside down) and the point lies on the knee of the leg.

Fig. 3. Distribution of the best individual of each run in the PF obtained by NSJADE for the Sunspot, Star Brightness and IBM Stock Price data sets (the red dots are the individuals with the minimum test MSE value; the green dots form the PF curves). (a) Sunspot; (b) Star Brightness; (c) IBM Stock Price. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

There are several definitions of a knee point (Branke et al., 2004; Das, 1999; Deb & Gupta, 2011). Here we introduce the one proposed most recently, the bend-angle based knee point (Deb & Gupta, 2011), which is simple but useful.

Definition 4 (Bend-Angle). For a given Pareto optimal point x, let x_L and x_R be two other Pareto optimal points to the left and right of x. The bend angle at x is defined as follows:

$$\theta(x, x_L, x_R) = \theta_L - \theta_R, \tag{2}$$

where

$$\theta_L = \arctan \frac{f_2(x_L) - f_2(x)}{f_1(x) - f_1(x_L)}, \tag{3}$$

$$\theta_R = \arctan \frac{f_2(x) - f_2(x_R)}{f_1(x_R) - f_1(x)}. \tag{4}$$

From the definition above, the bend angle of a point can be calculated from any two points to its left and right, and the point in the Pareto front with the largest positive bend angle is called the bend-angle based knee point.

4.2. Knee point-based nondominated sorting adaptive differential evolution (KP-NSJADE)

In this part, we modify our previously proposed NSJADE based on the knee point, so that the new algorithm converges to the region around the knee point of the optimization problem. From the definition of a knee point in Definition 4, the purpose of the new algorithm is to find the point with the maximum bend angle, as well as the other points around it, in the optimal PF. So there

are two steps for the algorithm: first, converge as closely as possible to the optimal PF; second, converge around the knee point of this optimal PF. Referring to Algorithm 1, in steps 30 and 31 the Nondominated_Sort and Select_Elitism functions sort the individuals according to the objectives and the crowding-distance value. It is therefore feasible, in our improved algorithm, to replace sorting by crowding-distance value with sorting by bend-angle value. In this way we obtain a knee point-based nondominated sorting adaptive differential evolution (KP-NSJADE). The main body of KP-NSJADE is the same as Algorithm 1, except for the modifications in the Nondominated_Sort function, since we prefer individuals with larger bend-angle values.

Remark 6. The second criterion in the sorting process of KP-NSJADE is the bend-angle value, which must therefore be calculated for each individual in each generation. When computing the bend-angle value in the Nondominated_Sort function, the objective values are first normalized. Then the bend-angle value of every individual is computed using any two individuals to its left and right. If none is found to the left or right, as for the two boundary points of the PF, we assign -1 to their bend-angle values. This assignment is feasible since individuals with large bend-angle values are preferred in the evolution and the boundary individuals are normally unlikely to be knee points.

Remark 7. Note that in Branke et al. (2004), the authors used another measure to find the knee point, which requires the two or four closest neighbors to the left and right of the target point. The bend angle used in KP-NSJADE merely requires any given points to the left and right of the target, which is much more convenient to compute.
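The bend-angle criterion of Eqs. (2)–(4) can be sketched as below, assuming the objectives are already normalized to [0, 1] and the front is sorted by f1; here the nearest neighbors play the roles of x_L and x_R (Remark 7 permits any points to the left and right), and boundary points receive -1 as in Remark 6. Illustrative only, not the authors' implementation:

```python
import numpy as np

def bend_angles(front):
    # front: (N, 2) nondominated objective vectors, normalized and sorted by f1.
    F = np.asarray(front, dtype=float)
    theta = np.full(len(F), -1.0)              # boundary points get -1
    for i in range(1, len(F) - 1):
        xl, x, xr = F[i - 1], F[i], F[i + 1]
        th_left = np.arctan2(xl[1] - x[1], x[0] - xl[0])    # Eq. (3)
        th_right = np.arctan2(x[1] - xr[1], xr[0] - x[0])   # Eq. (4)
        theta[i] = th_left - th_right                        # Eq. (2)
    return theta

# The knee point is the front member with the largest positive bend angle:
# knee = front[np.argmax(bend_angles(front))]
```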

Fig. 4. Distribution of the best individual of each run in the PF obtained by NSJADE for MG, BJ, EEG, LGD and CGS (the red dots are the individuals with the minimum test MSE value; the green dots form the PF curves). (a) MG; (b) BJ; (c) EEG; (d) LGD; (e) CGS. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5. Time series forecasting by KP-NSJADE-trained neural network

In this section, five benchmark time series data sets, as well as the three databases in Section 3, are utilized to test the performance of our proposed algorithm, KP-NSJADE. We compare the forecasting performance of KP-NSJADE with that of some other popular algorithms and then investigate the convergence trend of KP-NSJADE; the results and the related analyses are given below.

5.1. Experimental setup

Here we introduce the experimental setup for TSF by a KP-NSJADE-trained neural network. We again apply SLFNs, and the learning algorithm is ELM, as in Section 3. The other parameters are the same as those in Table 1. The time series

data sets to be predicted are MG, BJ, EEG, LGD, CGS, Sunspot, Star and IBM, which were introduced in Section 3. Likewise, their values are normalized between 0.1 and 0.9.

5.2. Comparison of NSJADE and KP-NSJADE

NSJADE and KP-NSJADE are first applied to predicting the eight time series data sets. All the experiments are run 50 times. In Table 3, we record the values of M_MSEtr, S_MSEtr, M_MSEte and S_MSEte obtained by KP-NSJADE at the 250th, 500th, 750th and 1000th generations of the evolution process, which are compared with the corresponding results of NSJADE. At each generation, the best M_MSEte value for each database can be identified by direct comparison. For Star and IBM, KP-NSJADE shows a better effect than NSJADE at all four generations. For MG, BJ, EEG, LGD and Sunspot, KP-NSJADE outperforms NSJADE at three of the four generations.

For CGS, however, KP-NSJADE is no better than NSJADE; from Fig. 4(e), we can find the reason mentioned before: for CGS, the phenomenon that the individuals with the best M_MSEte values gather around the knee point is less obvious than for the other four databases.

In order to show the effect of KP-NSJADE more clearly, we score the performance of the two algorithms at each generation: the one with the better M_MSEte value gets 1 point, while the other gets 0. From Table 4, we can see that KP-NSJADE outperforms NSJADE at every generation, and the total score is 23 to 9.

Table 4. Scores of KP-NSJADE and NSJADE per generation.

| Method | 250th | 500th | 750th | 1000th | Total |
| KP-NSJADE | 5 | 6 | 7 | 5 | 23 |
| NSJADE | 3 | 2 | 1 | 3 | 9 |

Fig. 5. Illustration of the knee point (the filled red circle is the knee point; the empty blue circles are the solutions forming the PF of the problem). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Remark 8. Our purpose in improving NSJADE is to further reduce the M_MSEte values in the forecasting problems, since obtaining the minimum test MSE values is always the objective researchers pursue. From the results in Table 3 and the analysis above, KP-NSJADE has achieved this goal. Therefore, although the S_MSEte values of the eight examples by KP-NSJADE are not always better than those by NSJADE, the performance of KP-NSJADE on M_MSEte values is satisfying.

5.3. Convergence analysis of KP-NSJADE

We then investigate the convergence trend of KP-NSJADE on the different data sets (i.e., MG, BJ, EEG, LGD and CGS) by plotting the convergence curves at the 250th, 500th, 750th and 1000th generations in Fig. 6.

Table 3. Comparison of NSJADE and KP-NSJADE (M_MSE: mean MSE; S_MSE: standard deviation of the MSE; subscripts tr/te: training/test set).

250th generation (left block) and 500th generation (right block):

| Data set | Method | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte |
| MG | KP-NSJADE | 3.6332E-06 | 1.9802E-07 | 2.3721E-06 | 1.0461E-07 | 3.2248E-06 | 2.3386E-07 | 2.2687E-06 | 1.1987E-07 |
| | NSJADE | 3.5932E-06 | 2.5231E-07 | 2.3581E-06 | 9.1302E-08 | 3.3248E-06 | 1.6570E-07 | 2.2856E-06 | 8.4918E-08 |
| BJ | KP-NSJADE | 1.9256E-04 | 1.4118E-06 | 3.9699E-04 | 1.0889E-06 | 1.9099E-04 | 6.1819E-07 | 3.9622E-04 | 2.3764E-06 |
| | NSJADE | 1.9311E-04 | 7.1968E-07 | 3.9796E-04 | 6.5377E-07 | 1.9201E-04 | 3.5613E-07 | 3.9748E-04 | 7.1319E-07 |
| EEG | KP-NSJADE | 5.7652E-03 | 1.0153E-04 | 4.4778E-03 | 8.3185E-05 | 5.6424E-03 | 3.3641E-05 | 4.4743E-03 | 9.8273E-05 |
| | NSJADE | 5.8802E-03 | 6.3295E-05 | 4.4777E-03 | 3.8545E-05 | 5.8091E-03 | 5.5457E-05 | 4.4793E-03 | 4.2989E-05 |
| LGD | KP-NSJADE | 8.1985E-05 | 2.0498E-06 | 3.7034E-04 | 1.5172E-05 | 7.9736E-05 | 2.0123E-06 | 3.7365E-04 | 1.8504E-05 |
| | NSJADE | 8.3901E-05 | 6.7307E-06 | 3.7521E-04 | 9.6164E-06 | 7.8930E-05 | 8.3485E-06 | 3.7447E-04 | 1.0332E-05 |
| CGS | KP-NSJADE | 3.2416E-04 | 4.3111E-06 | 6.7796E-04 | 8.1675E-06 | 3.1514E-04 | 4.8583E-06 | 6.8852E-04 | 1.4069E-05 |
| | NSJADE | 3.1978E-04 | 4.8643E-06 | 6.7728E-04 | 6.2446E-06 | 3.1319E-04 | 2.4777E-06 | 6.8002E-04 | 7.1246E-06 |
| Sunspot | KP-NSJADE | 2.1878E-03 | 1.8735E-05 | 2.0024E-03 | 2.5892E-05 | 2.1286E-03 | 4.1360E-05 | 2.0074E-03 | 5.7699E-05 |
| | NSJADE | 2.1577E-03 | 3.7742E-05 | 2.0080E-03 | 2.1741E-05 | 2.1017E-03 | 2.0728E-05 | 2.0057E-03 | 2.0658E-05 |
| Star | KP-NSJADE | 1.4173E-04 | 2.7965E-07 | 1.4402E-04 | 1.7321E-07 | 1.4107E-04 | 2.1499E-07 | 1.4387E-04 | 2.1874E-07 |
| | NSJADE | 1.4166E-04 | 3.5502E-07 | 1.4407E-04 | 1.2425E-07 | 1.4113E-04 | 3.1147E-07 | 1.4392E-04 | 1.5593E-07 |
| IBM | KP-NSJADE | 6.4394E-05 | 6.4521E-08 | 1.1660E-04 | 2.5136E-07 | 6.4226E-05 | 1.2117E-07 | 1.1664E-04 | 6.0677E-07 |
| | NSJADE | 6.4361E-05 | 8.9465E-08 | 1.1663E-04 | 1.9011E-07 | 6.4164E-05 | 1.0312E-07 | 1.1680E-04 | 2.6932E-07 |

750th generation (left block) and 1000th generation (right block):

| Data set | Method | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte |
| MG | KP-NSJADE | 3.0137E-06 | 2.3948E-07 | 2.2063E-06 | 1.3391E-07 | 2.8880E-06 | 2.2349E-07 | 2.1354E-06 | 1.4356E-07 |
| | NSJADE | 3.1819E-06 | 1.6619E-07 | 2.2082E-06 | 9.3583E-08 | 3.1054E-06 | 1.6171E-07 | 2.1489E-06 | 7.5395E-08 |
| BJ | KP-NSJADE | 1.9088E-04 | 5.4362E-07 | 3.9726E-04 | 3.3554E-06 | 1.9088E-04 | 1.9569E-07 | 3.9741E-04 | 3.1748E-06 |
| | NSJADE | 1.9172E-04 | 3.4448E-07 | 3.9745E-04 | 6.6327E-07 | 1.9148E-04 | 3.4153E-07 | 3.9733E-04 | 5.4844E-07 |
| EEG | KP-NSJADE | 5.6150E-03 | 3.7731E-05 | 4.4727E-03 | 1.1013E-04 | 5.5927E-03 | 5.9128E-05 | 4.4652E-03 | 1.3940E-04 |
| | NSJADE | 5.7802E-03 | 4.1346E-05 | 4.4747E-03 | 4.1955E-05 | 5.7621E-03 | 3.4214E-05 | 4.4665E-03 | 4.3000E-05 |
| LGD | KP-NSJADE | 7.8032E-05 | 2.4321E-06 | 3.7533E-04 | 2.3646E-05 | 7.6530E-05 | 3.6018E-06 | 3.7814E-04 | 2.7511E-05 |
| | NSJADE | 7.5422E-05 | 4.4600E-06 | 3.7554E-04 | 8.8047E-06 | 7.6371E-05 | 1.1526E-05 | 3.7720E-04 | 1.1526E-05 |
| CGS | KP-NSJADE | 3.1297E-04 | 4.9575E-06 | 6.9219E-04 | 1.3308E-05 | 3.0747E-04 | 6.0645E-06 | 6.9285E-04 | 1.2054E-05 |
| | NSJADE | 3.1158E-04 | 2.1249E-06 | 6.8158E-04 | 6.1746E-06 | 3.1084E-04 | 2.6977E-06 | 6.8153E-04 | 6.4867E-06 |
| Sunspot | KP-NSJADE | 2.0740E-03 | 4.4914E-05 | 1.9974E-03 | 6.5796E-05 | 2.0607E-03 | 4.8623E-05 | 1.9975E-03 | 7.6175E-05 |
| | NSJADE | 2.0898E-03 | 1.6351E-05 | 1.9977E-03 | 2.2212E-05 | 2.0767E-03 | 1.0513E-05 | 1.9995E-03 | 2.3541E-05 |
| Star | KP-NSJADE | 1.4083E-04 | 1.6705E-07 | 1.4378E-04 | 1.7301E-07 | 1.4075E-04 | 1.7215E-07 | 1.4380E-04 | 2.2153E-07 |
| | NSJADE | 1.4065E-04 | 3.6653E-07 | 1.4385E-04 | 1.9379E-07 | 1.4066E-04 | 3.2359E-07 | 1.4389E-04 | 1.5601E-07 |
| IBM | KP-NSJADE | 6.4037E-05 | 1.5792E-07 | 1.1672E-04 | 7.4501E-07 | 6.3959E-05 | 1.6988E-07 | 1.1668E-04 | 5.8410E-07 |
| | NSJADE | 6.4112E-05 | 5.2842E-08 | 1.1674E-04 | 3.1645E-07 | 6.4083E-05 | 4.9835E-08 | 1.1680E-04 | 2.3192E-07 |


All the experiments are run 50 times. For a better comparison, in Fig. 6 we also plot the convergence curve of NSJADE at the 1000th generation, as in Fig. 4. From Fig. 6, we find that the improved algorithm does help the population converge to the knee point along the PF of each problem, and the speed is very fast. Moreover, Fig. 6 explains, from a different perspective, the variation trend over the generations of the M_MSEte values obtained by KP-NSJADE, as shown in Table 3. Due to the different features of these five time series data sets, the convergence conditions are not the same. Overall, as the points on the PF curve gradually approach the knee point, the forecasting performance on the test sets varies accordingly. For every time series data set, there is a span in its PF curve obtained by NSJADE at the 1000th generation. If the range covered by KP-NSJADE is too small (like LGD at the 1000th generation) or too large (like MG at the 250th, 500th and 750th generations, as well as EEG at the 250th generation), the forecasting effect on the test sets is not better than that of NSJADE at the 1000th generation.

5.4. Comparison of KP-NSJADE and other common algorithms

In this part, we compare the forecasting performance of our proposed KP-NSJADE with that of some popular evolutionary algorithms, including CLPSO (Liang, Qin, Suganthan, & Baskar, 2006), CoDE (Wang, Cai, & Zhang, 2011), SPEA2 (Zitzler, Laumanns, & Thiele, 2001) and NSGA-II (Deb et al., 2002). The parameters of the latter four algorithms are the same as in the original papers where they were published, and all the experiments are run 50 times. The results are shown in Table 5, where the smallest M_MSEte value at each generation can be identified by comparing across algorithms. From Table 5, we can find that for BJ, EEG and IBM, KP-NSJADE outperforms the other four algorithms in M_MSEte at the 250th, 500th, 750th and 1000th generations. For MG, CoDE performs best in M_MSEte at the 250th and 500th generations, while at the 750th and 1000th generations KP-NSJADE shows its superiority. Note that for MG, the M_MSEte value obtained by KP-NSJADE at the 1000th generation is much smaller than any value obtained by the other algorithms at any generation shown in Table 5.

Fig. 6. The convergence trend of KP-NSJADE on MG, BJ, EEG, LGD and CGS (legend: NSJADE at 1000 generations; KP-NSJADE at 250, 500, 750 and 1000 generations). (a) MG; (b) BJ; (c) EEG; (d) LGD; (e) CGS.

Table 5. Comparison of KP-NSJADE and other algorithms (M_MSE: mean MSE; S_MSE: standard deviation of the MSE; subscripts tr/te: training/test set).

250th generation (left block) and 500th generation (right block):

| Data set | Method | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte |
| MG | KP-NSJADE | 3.6332E-06 | 1.9802E-07 | 2.3721E-06 | 1.0461E-07 | 3.2248E-06 | 2.3386E-07 | 2.2687E-06 | 1.1987E-07 |
| | CLPSO | 4.1969E-06 | 1.1932E-07 | 2.7458E-06 | 4.1498E-07 | 4.1143E-06 | 1.4544E-07 | 2.7439E-06 | 3.9319E-07 |
| | CoDE | 3.3552E-06 | 8.9659E-08 | 2.3001E-06 | 1.9024E-07 | 3.1898E-06 | 1.1651E-08 | 2.2660E-06 | 1.6811E-07 |
| | SPEA2 | 3.7604E-06 | 9.5038E-08 | 2.3937E-06 | 5.1590E-08 | 3.6515E-06 | 9.9985E-08 | 2.3635E-06 | 5.6903E-08 |
| | NSGA-II | 3.6714E-06 | 1.8737E-07 | 2.4303E-06 | 1.1357E-07 | 3.5624E-06 | 1.8533E-07 | 2.4200E-06 | 1.0847E-07 |
| BJ | KP-NSJADE | 1.9256E-04 | 1.4118E-06 | 3.9699E-04 | 1.0889E-06 | 1.9099E-04 | 6.1819E-07 | 3.9622E-04 | 2.3764E-06 |
| | CLPSO | 1.9405E-04 | 1.4791E-06 | 4.0369E-04 | 3.3679E-06 | 1.9409E-04 | 1.3369E-06 | 4.0301E-04 | 3.2494E-06 |
| | CoDE | 1.9015E-04 | 1.2095E-07 | 4.0015E-04 | 3.0603E-06 | 1.9000E-04 | 9.3396E-08 | 3.9984E-04 | 2.7206E-06 |
| | SPEA2 | 1.9390E-04 | 7.1559E-07 | 3.9785E-04 | 8.5527E-07 | 1.9383E-04 | 8.0663E-07 | 3.9772E-04 | 8.7129E-07 |
| | NSGA-II | 1.9255E-04 | 5.0912E-07 | 3.9832E-04 | 1.7847E-06 | 1.9226E-04 | 4.3659E-07 | 3.9794E-04 | 1.6757E-06 |
| EEG | KP-NSJADE | 5.7652E-03 | 1.0153E-04 | 4.4778E-03 | 8.3185E-05 | 5.6424E-03 | 3.3641E-05 | 4.4743E-03 | 9.8273E-05 |
| | CLPSO | 5.7561E-03 | 4.5440E-05 | 4.4828E-03 | 8.9111E-05 | 5.7402E-03 | 4.7069E-05 | 4.4873E-03 | 1.1944E-04 |
| | CoDE | 5.6245E-03 | 1.7190E-05 | 4.4852E-03 | 1.0162E-04 | 5.6094E-03 | 1.7479E-05 | 4.4795E-03 | 1.0741E-04 |
| | SPEA2 | 5.9907E-03 | 5.2506E-05 | 4.4791E-03 | 2.9911E-05 | 5.9906E-03 | 5.0325E-05 | 4.4778E-03 | 3.0097E-05 |
| | NSGA-II | 5.7715E-03 | 6.6135E-05 | 4.4802E-03 | 6.4171E-05 | 5.7544E-03 | 7.4241E-05 | 4.4840E-03 | 6.8513E-05 |
| LGD | KP-NSJADE | 8.1985E-05 | 2.0498E-06 | 3.7034E-04 | 1.5172E-05 | 7.9736E-05 | 2.0123E-06 | 3.7365E-04 | 1.8504E-05 |
| | CLPSO | 8.7088E-05 | 3.0264E-06 | 3.8425E-04 | 2.6211E-05 | 8.5790E-05 | 2.7708E-06 | 4.0201E-04 | 4.8950E-05 |
| | CoDE | 7.5384E-05 | 2.4880E-06 | 3.7868E-04 | 2.0989E-05 | 7.3248E-05 | 2.7986E-06 | 3.7529E-04 | 1.7416E-05 |
| | SPEA2 | 9.0077E-05 | 3.3000E-06 | 3.7807E-04 | 6.3798E-06 | 8.7783E-05 | 4.1492E-06 | 3.7773E-04 | 6.0596E-06 |
| | NSGA-II | 8.0920E-05 | 5.1792E-06 | 3.7072E-04 | 1.2522E-05 | 7.9247E-05 | 6.4660E-06 | 3.7121E-04 | 1.4496E-05 |
| CGS | KP-NSJADE | 3.2416E-04 | 4.3111E-06 | 6.7796E-04 | 8.1675E-06 | 3.1514E-04 | 4.8583E-06 | 6.8852E-04 | 1.4069E-05 |
| | CLPSO | 3.2819E-04 | 3.2583E-06 | 6.7676E-04 | 9.4921E-06 | 3.2608E-04 | 3.1101E-06 | 6.7524E-04 | 1.0229E-05 |
| | CoDE | 3.1167E-04 | 2.3635E-06 | 6.7595E-04 | 1.2956E-05 | 3.0875E-04 | 2.8306E-06 | 6.7552E-04 | 1.0190E-05 |
| | SPEA2 | 3.2527E-04 | 1.8967E-06 | 6.7874E-04 | 3.3904E-06 | 3.2330E-04 | 1.9360E-06 | 6.7975E-04 | 3.3333E-06 |
| | NSGA-II | 3.1418E-04 | 4.8276E-06 | 6.7370E-04 | 1.0384E-05 | 3.1101E-04 | 4.2927E-06 | 6.7483E-04 | 1.3297E-05 |
| Sunspot | KP-NSJADE | 2.1878E-03 | 1.8735E-05 | 2.0024E-03 | 2.5892E-05 | 2.1286E-03 | 4.1360E-05 | 2.0074E-03 | 5.7699E-05 |
| | CLPSO | 2.1516E-03 | 2.1847E-05 | 2.0132E-03 | 6.5902E-05 | 2.1419E-03 | 2.0440E-05 | 2.0180E-03 | 8.6720E-05 |
| | CoDE | 2.0294E-04 | 1.3071E-05 | 2.0090E-03 | 6.6220E-05 | 2.0093E-04 | 1.3031E-05 | 2.0066E-03 | 5.8891E-05 |
| | SPEA2 | 2.1694E-03 | 8.8915E-06 | 2.0043E-03 | 1.0344E-05 | 2.1624E-03 | 1.0739E-05 | 2.0009E-03 | 1.0206E-05 |
| | NSGA-II | 2.1135E-03 | 2.0040E-05 | 2.0083E-03 | 4.9890E-05 | 2.1009E-03 | 2.2086E-05 | 2.0098E-03 | 4.3891E-05 |
| Star | KP-NSJADE | 1.4173E-04 | 2.7965E-07 | 1.4402E-04 | 1.7321E-07 | 1.4107E-04 | 2.1499E-07 | 1.4387E-04 | 2.1874E-07 |
| | CLPSO | 1.4141E-04 | 2.7694E-07 | 1.4431E-04 | 3.6145E-07 | 1.4133E-04 | 2.0864E-07 | 1.4428E-04 | 3.5512E-07 |
| | CoDE | 1.4023E-04 | 2.1886E-07 | 1.4393E-04 | 5.2833E-07 | 1.4004E-04 | 2.3886E-07 | 1.4390E-04 | 6.7355E-07 |
| | SPEA2 | 1.4179E-04 | 9.6670E-08 | 1.4407E-04 | 6.1963E-08 | 1.4174E-04 | 8.3472E-08 | 1.4405E-04 | 5.8404E-08 |
| | NSGA-II | 1.4116E-04 | 2.9763E-07 | 1.4403E-04 | 2.2026E-07 | 1.4101E-04 | 3.5711E-07 | 1.4395E-04 | 2.3458E-07 |
| IBM | KP-NSJADE | 6.4394E-05 | 6.4521E-08 | 1.1660E-04 | 2.5136E-07 | 6.4226E-05 | 1.2117E-07 | 1.1664E-04 | 6.0677E-07 |
| | CLPSO | 6.4277E-05 | 7.6441E-08 | 1.1710E-04 | 8.5915E-07 | 6.4237E-05 | 6.6909E-08 | 1.1696E-04 | 1.0122E-06 |
| | CoDE | 6.3968E-05 | 4.1931E-08 | 1.1668E-04 | 8.8622E-07 | 6.3945E-05 | 3.2940E-08 | 1.1679E-04 | 5.4348E-07 |
| | SPEA2 | 6.4337E-05 | 2.1381E-08 | 1.1663E-04 | 1.7246E-07 | 6.4314E-05 | 2.0566E-08 | 1.1672E-04 | 1.4087E-07 |
| | NSGA-II | 6.4186E-05 | 5.5515E-08 | 1.1680E-04 | 3.5710E-07 | 6.4161E-05 | 7.2881E-08 | 1.1676E-04 | 4.2160E-07 |

750th generation (left block) and 1000th generation (right block):

| Data set | Method | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte | M_MSEtr | S_MSEtr | M_MSEte | S_MSEte |
| MG | KP-NSJADE | 3.0137E-06 | 2.3948E-07 | 2.2063E-06 | 1.3391E-07 | 2.8880E-06 | 2.2349E-07 | 2.1354E-06 | 1.4356E-07 |
| | CLPSO | 4.0665E-06 | 1.5168E-07 | 2.7810E-06 | 9.3208E-07 | 4.0438E-06 | 1.2722E-07 | 2.7797E-06 | 4.4527E-07 |
| | CoDE | 3.1269E-06 | 1.0799E-07 | 2.2458E-06 | 1.4685E-07 | 3.1030E-06 | 1.0335E-07 | 2.2054E-06 | 1.4853E-07 |
| | SPEA2 | 3.5879E-06 | 9.6067E-08 | 2.3416E-06 | 5.4373E-08 | 3.5302E-06 | 9.8439E-08 | 2.3280E-06 | 5.2827E-08 |
| | NSGA-II | 3.5098E-06 | 1.9817E-07 | 2.3702E-06 | 1.3571E-07 | 3.4557E-06 | 2.4183E-07 | 2.3570E-06 | 1.3521E-07 |
| BJ | KP-NSJADE | 1.9088E-04 | 5.4362E-07 | 3.9726E-04 | 3.3554E-06 | 1.9088E-04 | 1.9569E-07 | 3.9741E-04 | 3.1748E-06 |
| | CLPSO | 1.9348E-04 | 1.6316E-06 | 4.0310E-04 | 2.9279E-06 | 1.9369E-04 | 1.4220E-06 | 4.0336E-04 | 3.1448E-06 |
| | CoDE | 1.8992E-04 | 9.6766E-08 | 3.9910E-04 | 2.7003E-06 | 1.8985E-04 | 9.0342E-08 | 3.9995E-04 | 2.5841E-06 |
| | SPEA2 | 1.9367E-04 | 7.2482E-07 | 3.9755E-04 | 7.6163E-07 | 1.9353E-04 | 6.8349E-07 | 3.9743E-04 | 6.4610E-07 |
| | NSGA-II | 1.9215E-04 | 4.3604E-07 | 3.9763E-04 | 1.6924E-06 | 1.9217E-04 | 5.2025E-07 | 3.9770E-04 | 1.7718E-06 |
| EEG | KP-NSJADE | 5.6150E-03 | 3.7731E-05 | 4.4727E-03 | 1.1013E-04 | 5.5927E-03 | 5.9128E-05 | 4.4652E-03 | 1.3940E-04 |
| | CLPSO | 5.7360E-03 | 3.5070E-05 | 4.4987E-03 | 1.1183E-04 | 5.7134E-03 | 2.6992E-05 | 4.4854E-03 | 1.0179E-04 |
| | CoDE | 5.5999E-03 | 1.9053E-05 | 4.4753E-03 | 9.8694E-05 | 5.5896E-03 | 1.8673E-05 | 4.5021E-03 | 1.0592E-04 |
| | SPEA2 | 5.9613E-03 | 4.8860E-05 | 4.4779E-03 | 3.4001E-05 | 5.9282E-03 | 5.3170E-05 | 4.4718E-03 | 3.1531E-05 |
| | NSGA-II | 5.7483E-03 | 6.3855E-05 | 4.4855E-03 | 6.9419E-05 | 5.7432E-03 | 6.4058E-05 | 4.4682E-03 | 7.5017E-05 |
| LGD | KP-NSJADE | 7.8032E-05 | 2.4321E-06 | 3.7533E-04 | 2.3646E-05 | 7.6530E-05 | 3.6018E-06 | 3.7814E-04 | 2.7511E-05 |
| | CLPSO | 8.4638E-05 | 2.3554E-06 | 3.8468E-04 | 3.3981E-05 | 8.4507E-05 | 1.8859E-06 | 3.8209E-04 | 2.6212E-05 |
| | CoDE | 7.2008E-05 | 2.9449E-06 | 3.7469E-04 | 2.2128E-05 | 7.0284E-05 | 3.2032E-06 | 3.7102E-04 | 1.8749E-05 |
| | SPEA2 | 8.5575E-05 | 4.6535E-06 | 3.7855E-04 | 6.4584E-06 | 8.3749E-05 | 5.6980E-06 | 3.7820E-04 | 6.0568E-06 |
| | NSGA-II | 7.7755E-05 | 5.1752E-06 | 3.7036E-04 | 1.8351E-05 | 7.8421E-05 | 5.4627E-06 | 3.7270E-04 | 1.3489E-05 |
| CGS | KP-NSJADE | 3.1297E-04 | 4.9575E-06 | 6.9219E-04 | 1.3308E-05 | 3.0747E-04 | 6.0645E-06 | 6.9285E-04 | 1.2054E-05 |
| | CLPSO | 3.2638E-04 | 2.5754E-06 | 6.7696E-04 | 1.0576E-05 | 3.2508E-04 | 2.9226E-06 | 6.7946E-04 | 1.4966E-05 |
| | CoDE | 3.0746E-04 | 2.5736E-06 | 6.7641E-04 | 1.3061E-05 | 3.0590E-04 | 2.4205E-06 | 6.7862E-04 | 1.1751E-05 |
| | SPEA2 | 3.2227E-04 | 1.9793E-06 | 6.8069E-04 | 3.9781E-06 | 3.2185E-04 | 1.8509E-06 | 6.8079E-04 | 4.1125E-06 |
| | NSGA-II | 3.1010E-04 | 4.7957E-06 | 6.7527E-04 | 1.3296E-05 | 3.0935E-04 | 4.3647E-06 | 6.7809E-04 | 1.2393E-05 |
| Sunspot | KP-NSJADE | 2.0740E-03 | 4.4914E-05 | 1.9974E-03 | 6.5796E-05 | 2.0607E-03 | 4.8623E-05 | 1.9975E-03 | 7.6175E-05 |
| | CLPSO | 2.1396E-03 | 2.5367E-05 | 2.0001E-03 | 7.7794E-05 | 2.1336E-03 | 2.2420E-05 | 2.0026E-03 | 5.9239E-05 |
| | CoDE | 2.0043E-04 | 1.0957E-05 | 2.0153E-03 | 7.6118E-05 | 1.9974E-04 | 9.7343E-06 | 2.0087E-03 | 6.9432E-05 |
| | SPEA2 | 2.1600E-03 | 1.0232E-05 | 1.9991E-03 | 1.3030E-05 | 2.1613E-03 | 1.1778E-05 | 2.0001E-03 | 1.2437E-05 |
| | NSGA-II | 2.0996E-03 | 2.2043E-05 | 2.0040E-03 | 4.5988E-05 | 2.0961E-03 | 1.6381E-05 | 2.0071E-03 | 4.9025E-05 |
| Star | KP-NSJADE | 1.4083E-04 | 1.6705E-07 | 1.4378E-04 | 1.7301E-07 | 1.4075E-04 | 1.7215E-07 | 1.4380E-04 | 2.2153E-07 |
| | CLPSO | 1.4127E-04 | 2.3059E-07 | 1.4437E-04 | 3.2622E-07 | 1.4124E-04 | 2.1012E-07 | 1.4440E-04 | 3.2582E-07 |
| | CoDE | 1.3993E-04 | 2.8101E-07 | 1.4388E-04 | 7.8961E-07 | 1.3983E-04 | 2.5044E-07 | 1.4382E-04 | 5.4550E-07 |
| | SPEA2 | 1.4172E-04 | 1.0830E-07 | 1.4404E-04 | 6.2258E-08 | 1.4169E-04 | 9.7596E-08 | 1.4404E-04 | 5.8096E-08 |
| | NSGA-II | 1.4096E-04 | 3.1408E-07 | 1.4395E-04 | 1.6538E-07 | 1.4087E-04 | 3.3187E-07 | 1.4401E-04 | 2.1956E-07 |
| IBM | KP-NSJADE | 6.4037E-05 | 1.5792E-07 | 1.1672E-04 | 7.4501E-07 | 6.3959E-05 | 1.6988E-07 | 1.1668E-04 | 5.8410E-07 |
| | CLPSO | 6.4238E-05 | 6.5298E-08 | 1.1682E-04 | 1.0266E-06 | 6.4205E-05 | 8.2341E-08 | 1.1672E-04 | 1.0139E-06 |
| | CoDE | 6.3925E-05 | 3.3803E-08 | 1.1681E-04 | 6.6838E-07 | 6.3919E-05 | 3.0478E-08 | 1.1674E-04 | 7.9857E-07 |
| | SPEA2 | 6.4299E-05 | 2.3825E-08 | 1.1677E-04 | 1.4860E-07 | 6.4303E-05 | 3.4834E-08 | 1.1681E-04 | 1.2730E-07 |
| | NSGA-II | 6.4143E-05 | 5.3333E-08 | 1.1678E-04 | 3.7424E-07 | 6.4126E-05 | 6.9841E-08 | 1.1683E-04 | 4.0916E-07 |

Table 6. Scores of KP-NSJADE and the other algorithms per generation.

| Method | 250th | 500th | 750th | 1000th | Total |
| KP-NSJADE | 5 | 4 | 6 | 6 | 21 |
| CLPSO | 0 | 0 | 0 | 0 | 0 |
| CoDE | 2 | 1 | 0 | 1 | 4 |
| SPEA2 | 0 | 1 | 0 | 0 | 1 |
| NSGA-II | 1 | 2 | 2 | 1 | 6 |

For LGD, KP-NSJADE has a better M_MSEte value than the other algorithms at the 250th generation, but this value is not the smallest when compared with those of the other algorithms at the 500th, 750th and 1000th generations. For Sunspot and Star, KP-NSJADE performs better than the rest of the algorithms at three of the four generations shown in Table 5. For CGS, KP-NSJADE does not perform well in M_MSEte; from Fig. 4(e), we can find the reason mentioned before: the phenomenon that the individuals with the best M_MSEte values gather around the knee point is less obvious than for the other four databases. Similarly, in order to show the effect of KP-NSJADE more clearly, we score the performance of the five algorithms at each generation: the one with the best M_MSEte value gets 1 point, while the others get 0. From Table 6, we can see that at every generation KP-NSJADE outperforms the other algorithms, with a total score of 21; NSGA-II gets 6 points, followed by CoDE and SPEA2, while CLPSO performs the worst. Overall, the experiments above reveal that our proposed KP-NSJADE performs much more effectively on TSF problems. The knee point strategy narrows the distribution range of the solutions and helps them converge faster to individuals with better forecasting performance, which ensures the effectiveness of KP-NSJADE compared with several other popular algorithms.

6. Conclusion

In this paper, we investigate the problem of time series forecasting using neural networks. Unlike most former research, where single objective algorithms were applied to the TSF problem, we develop two novel multiobjective evolutionary algorithms for TSF problems. First, a new multiobjective evolutionary algorithm, called nondominated sorting adaptive differential evolution (NSJADE), is proposed to forecast three common time series databases. After applying the algorithm to each of these databases and to five other common databases, we find an interesting phenomenon when plotting the Pareto front of each problem: individuals with better forecasting performance in the

6. Conclusion

In this paper, we investigate the problem of time series forecasting using neural networks. Unlike most earlier research, in which single objective algorithms were applied to the TSF problem, we develop two novel multiobjective evolutionary algorithms for it. First, a new multiobjective evolutionary algorithm, nondominated sorting adaptive differential evolution (NSJADE), is proposed and used to forecast three common time series databases. After applying the algorithm to these databases and five further common ones, we find an interesting phenomenon when plotting the Pareto front of each problem: the individuals with better forecasting performance in the whole population gather around the knee point of the curve, and this newly discovered information drives us to improve the algorithm further. Therefore, a knee point-based nondominated sorting adaptive differential evolution (KP-NSJADE) is proposed, which converges to the neighborhood of the knee point of every problem's curve in the objective space. It is worth mentioning that this is the first time the concept of the knee point has been introduced to time series forecasting problems. By comparison with several popular algorithms, we show that KP-NSJADE is more effective and competitive in forecasting time series databases.

Our research presents two new MOEAs for TSF problems, which ensure both accuracy and stability in forecasting when compared with several traditional single objective algorithms. Furthermore, our research shows that it is very helpful to probe deeper information when utilizing MOEAs for TSF problems, such as the distribution of the ANNs with the best forecasting results. However, only eight time series databases are tested in this research, so we cannot guarantee that other databases will exhibit the phenomenon described above, i.e., that the ANNs with the best forecasting results lie around the "corner" of the Pareto front curves when trained by the proposed NSJADE. It is worth noting that our research merely aims to play a pioneering role in exploiting the underlying characteristics of different time series databases for TSF, no matter which algorithms, single objective or multiobjective, are employed for the optimization during the forecasting process.

In future work, we will test more time series databases to examine the performance of MOEAs. Meanwhile, more information about different databases will be explored during the course of forecasting, such as the distribution of the ANNs with the worst forecasting results. Finally, we will apply MOEAs to forecasting problems in real-world industries, to see whether they are also efficient there.
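As a side note, the knee point referred to above can be located on a two-objective Pareto front with a standard maximum-distance heuristic in the spirit of Das (1999) and Branke, Deb, Dierolf, and Osswald (2004). The Python sketch below is illustrative only and may differ from the exact procedure used by KP-NSJADE; it assumes the front is given as a list of (MSE_tr, MSE_te) pairs with at least two distinct points.

import numpy as np

def knee_point(front):
    """Locate the knee of a 2-D Pareto front as the point farthest from
    the straight line joining the two extreme points of the front (a
    common heuristic; the paper's exact definition may differ)."""
    f = np.asarray(front, dtype=float)   # shape (n, 2): (MSE_tr, MSE_te)
    f = f[np.argsort(f[:, 0])]           # sort by the first objective
    a, b = f[0], f[-1]                   # extreme points of the sorted front
    d = b - a                            # direction of the extreme-point line
    # Perpendicular distance of each point to the line through a and b.
    dist = np.abs(d[0] * (f[:, 1] - a[1]) - d[1] * (f[:, 0] - a[0]))
    dist /= np.linalg.norm(d)
    return f[np.argmax(dist)]

Under this heuristic, KP-NSJADE would then concentrate its search on solutions in the neighborhood of the returned point in the objective space.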

7. Author contributions

Conceived and designed the experiments: WD SYSL. Performed the experiments: WD. Analyzed the data: WD CKK. Wrote the paper: WD SYSL.

Acknowledgement

The authors sincerely thank Dr. Yang Tang for his constructive and valuable comments and suggestions, and The Hong Kong Polytechnic University for the financial support of this research.

References

Bhaduri, K., Stefanski, M. D., & Srivastava, A. N. (2011). Privacy-preserving outlier detection through random nonlinear data distortion. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41, 260–272.
Branke, J., Deb, K., Dierolf, H., & Osswald, M. (2004). Finding knees in multi-objective optimization. In Parallel problem solving from nature (PPSN VIII). Heidelberg, Germany: Springer.
Chiam, S. C., Tan, K. C., & Mamun, A. A. (2007). Multiobjective evolutionary neural networks for time series forecasting. In Evolutionary multi-criterion optimization. Berlin, Heidelberg: Springer.
Computer Science News (2011).
Contreras, J., Espínola, R., Nogales, F. J., & Conejo, A. J. (2003). ARIMA models to predict next-day electricity prices. IEEE Transactions on Power Systems, 18, 1014–1020.
Das, I. (1999). On characterizing the "knee" of the Pareto curve based on normal-boundary intersection. Structural Optimization, 18, 107–115.
Das, S., & Suganthan, P. N. (2011). Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15, 4–31.
DataMarket (2013).
Deb, K. (2002). Multi-objective optimization using evolutionary algorithms. Wiley.
Deb, K., & Gupta, S. (2011). Understanding knee points in bicriteria problems and their implications as preferred solution principles. Engineering Optimization, 43, 1175–1204.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, 182–197.
Du, H., & Zhang, N. (2008). Time series prediction using evolving radial basis function networks with new encoding scheme. Neurocomputing, 71, 1388–1400.
Gardner, E. (2006). Exponential smoothing: The state of the art – Part II. Journal of Forecasting, 22, 637–666.
Goh, C. K., Teoh, E. J., & Tan, K. C. (2008). Hybrid multiobjective evolutionary design for artificial neural networks. IEEE Transactions on Neural Networks, 19, 1531–1548.
Gu, J., Zhu, M., & Jiang, L. (2011). Housing price forecasting based on genetic algorithm and support vector machine. Expert Systems with Applications, 38, 3383–3386.
Hill, T., O'Connor, M., & Remus, W. (1996). Neural network models for time series forecasting. Management Science, 42, 1082–1092.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70, 489–501.
Katagiri, H., Nishizaki, I., Hayashida, T., & Kadoma, T. (2012). Multiobjective evolutionary optimization of training and topology of recurrent neural networks for time-series prediction. The Computer Journal, 55, 325–336.
Leung, F. H., Lam, H. K., Ling, S. H., & Tam, P. K. (2003). Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Transactions on Neural Networks, 14, 79–88.
Leung, S. Y. S., Tang, Y., & Wong, W. K. (2012). A hybrid particle swarm optimization and its application in neural networks. Expert Systems with Applications, 39, 395–405.
Li, C., & Hu, J.-W. (2012). A new ARIMA-based neuro-fuzzy approach and swarm intelligence for time series forecasting. Engineering Applications of Artificial Intelligence, 25, 295–308.
Liang, J. J., Qin, A. K., Suganthan, P. N., & Baskar, S. (2006). Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Transactions on Evolutionary Computation, 10, 281–295.
Mackey, M., & Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197, 287–289.
Ren, G., & Zhou, Z. (2011). Traffic safety forecasting method by particle swarm optimization and support vector machine. Expert Systems with Applications, 38, 10420–10424.


Tang, Y., Gao, H., & Kurths, J. (2014). Distributed robust synchronization of dynamical networks with stochastic coupling. IEEE Transactions on Circuits and Systems I: Regular Papers, 61, 1508–1519.
Tang, Y., Wang, Z., Gao, H., Swift, S., & Kurths, J. (2012). A constrained evolutionary computation method for detecting controlling regions of cortical networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9, 1569–1581.
Tang, Y., & Wong, W. K. (2013). Distributed synchronization of coupled neural networks via randomly occurring control. IEEE Transactions on Neural Networks and Learning Systems, 24, 435–447.
Taylor, J. (2006). Forecasting daily supermarket sales using exponentially weighted quantile regression. European Journal of Operational Research, 178, 154–167.
The Santa Fe Time Series Competition Data (1991).
Time Series Data Library (2013a).
Time Series Data Library (2013b).
Time Series Data Library (2013c).
Wang, Y., Cai, Z., & Zhang, Q. (2011). Differential evolution with composite trial vector generation strategies and control parameters. IEEE Transactions on Evolutionary Computation, 15, 55–66.
Wong, W. K., & Guo, Z. X. (2010). A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. International Journal of Production Economics, 128, 614–624.
Wong, W. K., Leung, S. Y. S., & Guo, Z. X. (2012). Feedback controlled particle swarm optimization and its application in time-series prediction. Expert Systems with Applications, 39, 8557–8572.
Wong, Y. W., Seng, K. P., & Ang, L.-M. (2011). Radial basis function neural network with incremental learning for face recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41, 940–949.
Xu, Y., Cao, X., & Qiao, H. (2011). An efficient tree classifier ensemble-based approach for pedestrian detection. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41, 107–117.
Yan, W. (2012). Toward automatic time-series forecasting using neural networks. IEEE Transactions on Neural Networks and Learning Systems, 23, 1028–1039.
Yeh, W.-C. (2013). New parameter-free simplified swarm optimization for artificial neural network training and its application in the prediction of time series. IEEE Transactions on Neural Networks and Learning Systems, 24, 661–665.
Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35–62.
Zhang, J., & Sanderson, A. C. (2009). JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13, 945–958.
Zhang, W., Cao, Q., & Schniederjans, M. J. (2004). Neural network earnings per share forecasting models: A comparative analysis of alternative methods. Decision Sciences, 35, 205–237.
Zhang, W., Tang, Y., Miao, Q., & Du, W. (2013). Exponential synchronization of coupled switched neural networks with mode-dependent impulsive effects. IEEE Transactions on Neural Networks and Learning Systems, 24, 1316–1326.
Zhou, A., Qu, B.-Y., Li, H., Zhao, S.-Z., Suganthan, P. N., & Zhang, Q. (2011). Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1, 32–49.
Zitzler, E., Laumanns, M., & Thiele, L. (2001). SPEA2: Improving the strength Pareto evolutionary algorithm. TIK-Report 103.