Taiwanese 3G mobile phone demand forecasting by SVR with hybrid evolutionary algorithms


Expert Systems with Applications 37 (2010) 4452–4462

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Wei-Chiang Hong a,*, Yucheng Dong b, Li-Yueh Chen c, Chien-Yuan Lai a

a Department of Information Management, Oriental Institute of Technology, 58 Sec. 2, SiChuan Rd., Panchiao, Taipei 220, Taiwan
b Department of Management Science, School of Management, Xi’an Jiaotong University, Xi’an 710049, PR China
c Department of Hospitality Management, MingDao University, 369 Wen-Hua Rd., Peetow, Changhua, 52345, Taiwan

Article info

Keywords: Demand forecasting; Genetic algorithm–simulated annealing (GA–SA); Support vector regression (SVR); Autoregressive integrated moving average (ARIMA); General regression neural networks (GRNN); Third generation (3G) mobile phone

Abstract

Taiwan is one of the countries with a high mobile phone penetration rate. With the increasing maturity of 3G-related products, the establishment of base stations, and the updating of regulations for 3G mobile phones, 3G mobile phones are gradually replacing 2G phones as the mainstream product. Accurate forecasting of 3G mobile phone demand is therefore desirable and necessary for communications policy makers and enterprises. Owing to complex market competition and varied subscriber demands, 3G mobile phone demand exhibits highly non-linear characteristics. Recently, support vector regression (SVR) has been successfully employed to solve non-linear regression and time-series problems. This investigation employs a genetic algorithm–simulated annealing hybrid algorithm (GA–SA) to choose a suitable parameter combination for a SVR model. 3G mobile phone demand data from Taiwan are then used to illustrate the proposed SVRGA–SA model. The empirical results reveal that the proposed model outperforms two other models, namely the autoregressive integrated moving average (ARIMA) model and the general regression neural networks (GRNN) model. © 2009 Elsevier Ltd. All rights reserved.

* Tel.: +886 2 7738 0145x5316; fax: +886 2 7738 6310. E-mail address: [email protected] (W.-C. Hong).
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.12.066

1. Introduction

1.1. Historical overview of mobile communications

Before explaining why 3G mobile phone demand matters for economic growth and for the development of related business markets, it is worthwhile to take a brief historical overview of mobile communications. The mature features of mobile communications (such as mobility, security, roaming on the Internet, and improved voice/video service) are what most attract current and potential subscribers. First generation (1G) mobile phones were analog, designed primarily for voice communications, and provided localized wireless services. Only basic communication needs could be satisfied. By the late 1990s, second generation (2G) mobile phones had been deployed. 2G mobile phones were digital and offered improved voice capability, short messaging services (SMS) for text delivery, spectrum management, wider coverage, and better mobility. Although the market also experienced the emergence of the Internet during this period, receiving the same service wirelessly was still far from reality. By the end of 2000, wireless voice services had matured. By 2001, 2.5G technologies were introduced. They were also digital and offered circuit-switched and packet-switched data services, such as voicemail, e-mail, location-based services (LBS), and other e-commerce services. 2006 was year zero of the third generation (3G) mobile phone, which could provide content-rich applications independently of user location and offer each subscriber the following main features: always-on connectivity, an all-IP network, global roaming, and value-added services (Selian, 2002). More details and application designs for 3G mobile technologies can be found in Tanguturi and Harmantzis (2006), Gerstheimer and Lupp (2004) and Yoo et al. (2005). 3G mobile phones have therefore been expected to accelerate the development of mobile commerce and services (Yuan et al., 2006). For example, in Korea, within a year of the launch of 3G services in 2001, about 7% of mobile phone subscribers had signed up for 3G services. At the same time, the average revenue per user (ARPU) for these users was nearly three times that of 2G users (Yoo, Lyytinena, & Yang, 2005).

In Taiwan, with the establishment of base stations, the updating of regulations governing 3G mobile phones, and the gradual maturing of 3G products, 3G mobile phones are gradually replacing 2G phones as the market’s mainstream product. Mobile phones are therefore entering the age of high


speed data communications, which combine personal mobile multimedia tools, computers, and television, and can provide good-quality, always-on connectivity for mobile television, videophone, and Internet functions. For telecommunications businesses, it is important to understand the development trend of the 3G market and the growth of the 3G mobile phone penetration rate in order to allocate investments in base stations and launched services. For policy makers, it is even more important to know which factors, such as socio-economic levels, product or function restrictions, economic disturbances, or costs, block the rapid growth of 3G mobile phones. Therefore, an accurate forecast of 3G mobile phone demand, usually measured as the number of 3G mobile phone subscribers, helps researchers, telecommunications companies, and policy makers (or potential investors) to investigate future 3G development trends and to make operational, tactical, and strategic marketing decisions. Operational decisions include business scheduling and on-the-job staff training; tactical decisions relate to the preparation of brochures for 3G value-added services; and strategic decisions concern base station investments. Similarly, government bodies need accurate forecasts of 3G mobile phone demand in order to plan telecommunications regulations. The benefits of accurate forecasting are undisputed.

1.2. Reviews of forecasting approaches

Conventional quantitative forecasting models fall into two categories: regression models and time-series models. Regression models, also known as econometric models, are based on traditional statistical theory and focus on constructing strong relationships between mobile phone subscribers and socio-economic factors, such as income, living expenditure, and same-generation effects. The coefficients of these socio-economic factors are often found to be insignificant, so the factors play a minor role in the fitted model even though they may strongly influence the number of mobile phone subscribers. On the other hand, gathering many redundant variables may sometimes increase explanatory power (higher values of adjusted R^2), but it also introduces collinearity. This is one of the major limitations of econometric models, and more attention should therefore be paid to collecting such variables.

The second category comprises time-series forecasting models. Developed by Box and Jenkins (1976), ARIMA (autoregressive integrated moving average) models have been one of the most popular approaches in time-series forecasting, and are often employed when data are insufficient to build econometric models or when knowledge of the structure of regression models is limited. In some cases, as in short-term forecasting, time-series models are likely to outperform regression models (Witt & Witt, 1992). However, a fundamental limitation of time-series forecasting models is their inability to predict changes that are not evident in historical data, particularly given the non-linearity of mobile phone subscriber patterns.

Recently, owing to significant progress in pattern recognition methodology, it has become possible to employ artificial neural networks (ANNs) to forecast business demand. Many researchers have applied ANN concepts to construct forecasting models, such as original ANN models (Abdel-Aal, 2004; Law, 2000; Valverde Ramírez, de Campos Velho, & Ferreira, 2005; Vlahogianni, Karlaftis, & Golias, 2005; Yao & Tan, 2000; Zhang & Hu, 1998) and general regression neural networks (GRNN) (Kamo & Dagli, 2009; Leung, Chen, & Daouk, 2000; Pai & Hong, 2005b; Wu & Lin, 2009). ANNs are primarily based on emulating the processing of the human neurological system to find related spatial and temporal characteristics in historical data patterns (especially for non-linear and dynamic evolutions); therefore, ANNs are able to approximate any degree of


complexity without prior knowledge of problem solving, and in particular without the assumptions required by traditional statistical/econometric approaches. Because, as mentioned above, the process underlying mobile phone subscribers is too complicated to be captured by a single linear statistical algorithm, ANNs have received much attention and have been considered as alternatives for forecasting mobile phone subscribers. However, the training procedure of ANN models is not only time consuming but may also get trapped in local minima, and the selection of the model architecture is subjective (Suykens, 2001).

Proposed by Vapnik (1995), support vector machines (SVMs) are one of the significant developments in overcoming the shortcomings of ANNs mentioned above. Rather than implementing the empirical risk minimization (ERM) principle to minimize the training error, SVMs apply the structural risk minimization (SRM) principle to minimize an upper bound on the generalization error. SVMs can theoretically guarantee the global optimum, instead of becoming trapped in a local optimum like ANN models. Thus, a non-linear problem in the original lower dimensional input space can find its linear solution in the higher dimensional feature space. For a more detailed introduction to the mechanisms of SVMs, see Cortes and Vapnik (1995) and Vapnik et al. (1996), among others. SVMs have found wide application in pattern recognition, bio-informatics, and other fields related to artificial intelligence. In particular, with the introduction of Vapnik’s ε-insensitive loss function, SVMs have also been extended to solve non-linear regression estimation problems (Drucker, Burges, Kaufman, Smola, & Vapnik, 1997); this extension is called support vector regression (SVR).

SVR has been successfully employed to solve forecasting problems in many fields, such as financial time-series (stock index and exchange rate) forecasting (Cao, 2003; Cao & Gu, 2002; Huang, Nakamori, & Wang, 2005; Pai & Lin, 2005a, 2005b; Tay & Cao, 2001, 2002), engineering and software (production values and reliability) forecasting (Pai & Hong, 2006), atmospheric science forecasting (Hong & Pai, 2007; Mohandes, Halawani, Rehman, & Hussain, 2004; Pai & Hong, 2007; Wang, Xu, & Lu, 2003), electric load forecasting (Pai & Hong, 2005a, 2005b), and so on. The SVR model has also been successfully applied to forecast tourist arrivals (Pai & Hong, 2005c; Pai, Hong, Chang, & Chen, 2006). The empirical results indicated that the selection of the three parameters (C, ε, and σ) in a SVR model influences the forecasting accuracy significantly. Numerous publications have given recommendations on the appropriate setting of SVR parameters (Cherkassky & Ma, 2004); however, those approaches do not simultaneously consider the interaction effects among the three parameters. There is no general consensus, and there are many contradictory opinions; thus, evolutionary algorithms are employed to determine appropriate parameter values.

1.3. GA–SA algorithm in parameter determination of a SVR model

Genetic algorithms (GAs) are auto-adaptive stochastic search techniques (Holland, 1975) that are based on the Darwinian survival-of-the-fittest philosophy and generate new individuals with selection, crossover and mutation operators. GAs start with a coding of the parameter set and can handle all types of objective functions; thus, GAs are able to solve problems that traditional algorithms cannot easily solve. In Pai et al. (2006) and Pai and Hong (2005c), SVR with GAs was superior to other competitive forecasting models (ARIMA and ANNs). GAs are able to retain a few of the best-fitted members of the population for the next generation; however, after some generations GAs might converge prematurely to a local optimum when searching for suitable parameters of a SVR model.


Simulated annealing (SA) is a stochastic general search tool that mimics the annealing process of material physics (Kirkpatrick, Gelatt, & Vecchi, 1983). When the energy of the system in its original state is greater than that of the newly generated state, the new state is automatically accepted. Otherwise, the new state is accepted according to the Metropolis criterion with a probability function. The performance of SA depends on the cooling schedule. Thus, SA has some mechanism for escaping from local minima and reaching the global minimum (Lee & Johnson, 1983). In Pai and Hong (2005b) and Pai and Hong (2006), SVR with SA was also superior to other competitive forecasting models (Weibull Bayes, ARIMA, and GRNN). However, SA costs more computation time. To ensure the efficiency of SA, a proper temperature cooling rate (stop criterion) should be considered.

To overcome these drawbacks of GAs and SA, an effective approach is needed that avoids being misled into a local optimum and searches for the optimum of the objective function efficiently. The genetic algorithm–simulated annealing (GA–SA) hybrid algorithm is a novel attempt to deal with these challenges. GA–SA first employs the superiority of the SA algorithm in escaping from local minima and approaching the global minimum, and second applies the mutation process of GAs to improve the searching ability in the range of values. In addition, Juang (2004) indicated that a hybrid of GAs with existing algorithms can always produce a better algorithm than either the GAs or the existing algorithms alone. The hybrid algorithm has therefore been applied to system design (Shieh & Peralta, 2005), system and network optimization (Ponnambalam & Reddy, 2003; Zhao & Zeng, 2006), queries to information retrieval systems (Cordón, Moya, & Zarco, 2002), continuous-time production planning (Ganesh & Punniyamoorthy, 2005; Wang, Wong, & Rahman, 2004), and the electrical power districting problem (Bergey, Ragsdale, & Hoskote, 2003). However, GA–SA has seldom been applied to the parameter determination of SVR.

The investigation presented in this paper is motivated by a desire to overcome the premature convergence of GAs to a local optimum and the inefficiency of SA mentioned above when determining the three free parameters of the SVR 3G mobile phone demand forecasting model. Therefore, the GA–SA algorithm is employed in the SVR model, namely SVRGA–SA, to provide good forecasting performance in capturing the non-linear changes in 3G mobile phone demand. The rest of the paper is organized as follows. Section 2 presents the fundamental principle and formulation of SVR and the GA–SA algorithm used to select the parameters of the SVR model, and also introduces the alternative forecasting models. Section 3 provides numerical examples that demonstrate the forecasting performance of the proposed model, together with comparisons with the other forecasting models. Conclusions are given in Section 4.

2. Forecasting models

2.1. Support vector regression

The basic ideas of SVMs for the case of regression are introduced here. A non-linear mapping $\varphi(\cdot): R^{n} \to R^{n_h}$ is defined to map the input data (training data set) $\{(x_i, y_i)\}_{i=1}^{N}$ into a so-called high dimensional feature space (which may have infinite dimensions), $R^{n_h}$ (Fig. 1(a) and (b)).
Then, in the high dimensional feature space, there theoretically exists a linear function, f, to formulate the non-linear relationship between input data and output data. Such a linear function, namely the SVR function, is

\[ f(x) = w^{T}\varphi(x) + b \qquad (1) \]

where $f(x)$ denotes the forecasting values; the coefficients $w$ ($w \in R^{n_h}$) and $b$ ($b \in R$) are adjustable. As mentioned above, the SVM method aims at minimizing the empirical risk,

\[ R_{emp}(f) = \frac{1}{N}\sum_{i=1}^{N} \Theta_{\varepsilon}\left(y_i, w^{T}\varphi(x_i) + b\right) \qquad (2) \]

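where $\Theta_{\varepsilon}(y, f(x))$ is the ε-insensitive loss function (the thick line in Fig. 1(c)), defined as Eq. (3),

\[ \Theta_{\varepsilon}(y, f(x)) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (3) \]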
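As a minimal sketch of Eqs. (2) and (3), assuming scalar targets and predictions, the ε-insensitive loss and the resulting empirical risk might be computed as follows. The function names are illustrative and do not come from the paper.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Eq. (3): zero inside the eps-tube, linear in the excess outside it."""
    residual = abs(f_x - y)
    return residual - eps if residual >= eps else 0.0


def empirical_risk(ys, preds, eps):
    """Eq. (2): mean eps-insensitive loss over the N training points."""
    losses = [eps_insensitive_loss(y, p, eps) for y, p in zip(ys, preds)]
    return sum(losses) / len(ys)


# A prediction 0.3 away from the target is not penalized when eps = 0.5,
# while a prediction 1.0 away is penalized by the excess 0.5.
print(eps_insensitive_loss(10.0, 10.3, 0.5))
print(eps_insensitive_loss(10.0, 11.0, 0.5))
```

Deviations smaller than ε cost nothing, which is what makes the resulting regression function depend only on a subset of the training points.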

In addition, $\Theta_{\varepsilon}(y, f(x))$ is employed to find an optimum hyperplane in the high dimensional feature space (Fig. 1(b)) that maximizes the distance separating the training data into two subsets. Thus, the SVR focuses on finding the optimum hyperplane and minimizing the training error between the training data and the ε-insensitive loss function. Then, the SVR minimizes the overall errors,

\[ \min_{w, b, \xi^{*}, \xi} R_{\varepsilon}(w, \xi^{*}, \xi) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} (\xi_i^{*} + \xi_i) \qquad (4) \]

with the constraints

\[
\begin{aligned}
y_i - w^{T}\varphi(x_i) - b &\le \varepsilon + \xi_i^{*}, & i = 1, 2, \ldots, N \\
-y_i + w^{T}\varphi(x_i) + b &\le \varepsilon + \xi_i, & i = 1, 2, \ldots, N \\
\xi_i^{*} &\ge 0, & i = 1, 2, \ldots, N \\
\xi_i &\ge 0, & i = 1, 2, \ldots, N
\end{aligned}
\]

The first term of Eq. (4), which employs the concept of maximizing the distance between the two separated sets of training data, is used to regularize weight sizes, to penalize large weights, and to maintain the flatness of the regression function. The second term penalizes training errors between $f(x)$ and $y$ by using the ε-insensitive loss function. C is a parameter that trades off these two terms. Training errors above $\varepsilon$ are denoted as $\xi_i^{*}$, whereas training errors below $-\varepsilon$ are denoted as $\xi_i$ (Fig. 1(b)). After the quadratic optimization problem with inequality constraints is solved, the parameter vector w in Eq. (1) is obtained,

\[ w = \sum_{i=1}^{N} (\beta_i^{*} - \beta_i)\,\varphi(x_i) \qquad (5) \]

where $\beta_i^{*}$ and $\beta_i$ are the Lagrangian multipliers, obtained by solving a quadratic program. Finally, the SVR regression function is obtained as Eq. (6) in the dual space,

\[ f(x) = \sum_{i=1}^{N} (\beta_i^{*} - \beta_i)\,K(x_i, x) + b \qquad (6) \]

where $K(x_i, x_j)$ is called the kernel function, and the value of the kernel equals the inner product of the two vectors $x_i$ and $x_j$ in the feature space, $\varphi(x_i)$ and $\varphi(x_j)$, respectively; that is, $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. Any function that meets Mercer’s condition (Vapnik, 1995) can be used as the kernel function. There are several types of kernel function. The most used are the Gaussian radial basis function (RBF) kernel with a width of σ, $K(x_i, x_j) = \exp(-0.5\,\|x_i - x_j\|^{2}/\sigma^{2})$, and the polynomial kernel with an order of d and constants $a_1$ and $a_2$, $K(x_i, x_j) = (a_1 x_i x_j + a_2)^{d}$. If the value of σ is very large, the RBF kernel approximates a linear kernel (a polynomial with an order of 1). It remains hard to determine the type of kernel function for specific data patterns (Amari & Wu, 1996). However, the Gaussian RBF kernel is not only easier to implement but is also capable of non-linearly mapping the training data into an infinite dimensional space; thus, it is suitable for non-linear relationship problems. Therefore, the Gaussian RBF kernel function is specified in this study. The forecasting process of a SVR model is illustrated in Fig. 2.

Fig. 1. Transformation process illustration of a SVR model.

The selection of the three positive parameters, C, ε, and σ, of a SVR model is important to the accuracy of the forecasting. However, there is no structured method for, or consensus on, the efficient setting of SVR parameters. The GA–SA algorithm is used in the proposed SVR model to optimize the parameter selection.

2.2. Genetic algorithms–simulated annealing hybrid algorithm (GA–SA)

2.2.1. Implementation structure of GA–SA

To overcome the drawbacks of GAs and SA, this study proposes a hybrid GA–SA algorithm that applies the superiority of SA in escaping from local minima and approaching the global minimum, and uses the mutation process of GAs to improve the searching ability in the range of values. To limit computation time, only the optimal individual of the GAs population is delivered to SA for further improvement. The proposed GA–SA algorithm consists of the GAs part and the SA part. GAs evaluate the initial population and operate on the population using three basic genetic operators to produce a new population (best individual); then, for each generation of GAs, the best individual is delivered to SA for further processing. After SA finishes all of its processes, the modified individual is sent back to GAs for the next generation. These iterations continue until the termination condition of the algorithm is reached. The proposed GA–SA procedure is described below, and the flowchart is shown in Fig. 3. The procedure of the GAs part is as follows:

Step 1: Initialization. Randomly construct the initial population of chromosomes. The three parameters, C, σ, and ε, of a SVR model in the ith generation are encoded in binary format and represented by a chromosome composed of "genes" of binary numbers (Fig. 4). Each chromosome has three genes, which represent the three parameters. Each gene has 40 bits, so a chromosome contains 120 bits. More bits in a gene correspond to a finer partition of the search space.

Fig. 2. The forecasting process of a SVR model.


Fig. 3. The GA–SA algorithm flowchart.

Fig. 4. Binary encoding of a chromosome (three binary genes, labelled C, σ, and ε).

Step 2: Evaluating fitness. Evaluate the fitness of each chromosome. Because forecasting accuracy is the goal, the negative mean absolute percentage error (−MAPE) of the forecasts is used as the fitness function in this paper. The MAPE is given by Eq. (7),

\[ \mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{a_i - f_i}{a_i} \right| \times 100\% \qquad (7) \]

where $a_i$ and $f_i$ represent the actual and forecast values, and N is the number of forecasting periods.

Step 3: Selection operation. Based on the fitness function, chromosomes with higher fitness values are more likely to yield offspring in the next generation. The roulette wheel selection principle (Holland, 1975) is applied to choose chromosomes for reproduction.

Step 4: Crossover operation and mutation operation. Mutations are performed randomly by converting a "1" bit into a "0" bit or a "0" bit into a "1" bit. In the crossover operation, chromosomes are paired randomly. The single-point crossover principle is employed herein. Segments of paired chromosomes between two determined breakpoints are swapped. For simplicity, suppose a gene has four bits, so that a chromosome contains 12 bits (Fig. 5). Before crossover is performed, the values of the three parameters in parent #1 are 1.5, 1.25 and 0.34375, respectively. For parent #2, the three values are 0.625, 8.75 and 0.15625. After crossover, the three values for offspring #1 are 1.625, 3.75 and 0.40625, and the three values for offspring #2 are 0.5, 6.25 and 0.09375. Finally, the three parameters obtained after crossover are decoded into decimal format.

Step 5: Stop condition. If the number of generations equals a given scale, then the best chromosomes are presented as a solution; otherwise, go to Step 1 of the SA part.
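The worked numbers in Step 4 can be reproduced with a short sketch. It rests on two assumptions that the text does not state: the crossover point of 1 shown in Fig. 5 is applied within each 4-bit gene, and each gene is decoded by multiplying its integer value by a fixed scale factor (0.125, 0.625 and 0.03125, inferred from the printed parent values). The function names are illustrative only.

```python
GENE_BITS = 4
# Hypothetical scale factors inferred from the worked numbers in the text
# (for example 1100 -> 12 -> 1.5); the paper does not state its decoding rule.
SCALES = (0.125, 0.625, 0.03125)


def split_genes(chromosome):
    """Cut a bit string into consecutive genes of GENE_BITS bits."""
    return [chromosome[i:i + GENE_BITS]
            for i in range(0, len(chromosome), GENE_BITS)]


def decode(chromosome):
    """Decode each binary gene into a decimal parameter value."""
    return [int(g, 2) * s for g, s in zip(split_genes(chromosome), SCALES)]


def crossover(parent1, parent2, point=1):
    """Single-point crossover applied gene by gene (an assumption)."""
    child1, child2 = [], []
    for g1, g2 in zip(split_genes(parent1), split_genes(parent2)):
        child1.append(g1[:point] + g2[point:])
        child2.append(g2[:point] + g1[point:])
    return "".join(child1), "".join(child2)


def mutate(chromosome, position):
    """Flip one bit: a "1" becomes a "0" and a "0" becomes a "1"."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]


parent1 = "110000101011"   # bit rows of the two parents in Fig. 5
parent2 = "010111100101"
child1, child2 = crossover(parent1, parent2)
print(decode(parent1), decode(parent2))
print(decode(child1), decode(child2))
```

Under these assumptions the decoded parents and offspring match the values quoted in Step 4, which suggests, but does not prove, that the paper's example was built this way.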

In the proposed GA–SA algorithm, GAs deliver their best individual to SA for further processing. After SA has improved this individual, it sends it back to GAs for the next generation. These iterations continue until the termination condition of the algorithm is reached. The procedure of the SA part is as follows:

Step 1: Generate initial current state. Receive the values of the three parameters from GAs. The forecasting error, MAPE, given by Eq. (7), is defined as the system state (E). Here, the initial state (E0) is obtained.

Step 2: Provisional state. Make a random move to change the existing system state to a provisional state. Another set of three positive parameters is generated in this stage.

Step 3: Metropolis criterion tests. The following Metropolis criterion is employed to determine the acceptance or rejection of the provisional state (Metropolis, Rosenbluth, Rosenbluth, & Teller, 1953):

\[
\begin{cases}
\text{Accept the provisional state}, & \text{if } E(s_{new}) > E(s_{old}) \text{ and } p < P(\text{accept } s_{new}),\ 0 \le p \le 1 \\
\text{Accept the provisional state}, & \text{if } E(s_{new}) \le E(s_{old}) \\
\text{Reject the provisional state}, & \text{otherwise}
\end{cases}
\qquad (8)
\]

where p is a random number that determines the acceptance of the provisional state, and $P(\text{accept } s_{new})$, the probability of accepting the new state, is given by the probability function $P(\text{accept } s_{new}) = \exp\left(\frac{E(s_{old}) - E(s_{new})}{kT}\right)$ (T is the thermal equilibrium temperature and k is the Boltzmann constant). If the provisional state is accepted, then set the provisional state as the current state.

Step 4: Incumbent solutions. If the provisional state is not accepted, then return to step 2. Furthermore, if the current state is not superior to the system state, then repeat steps 2 and 3 until the current state is superior to the system state, and set the current state as the new system state. Previous studies (Kirkpatrick et al., 1983) indicated that the maximum number of loops ($N_{sa}$) is 100d to avoid infinitely repeated loops, where d denotes the problem dimension. In this investigation, three parameters (σ, C, and ε) are used to determine the system states. Therefore, $N_{sa}$ is set to 300.

Step 5: Temperature reduction. After the new system state is obtained, reduce the temperature. The new temperature is obtained by Eq. (9):

\[ \text{New temperature} = (\text{Current temperature}) \times \rho, \quad \text{where } 0 < \rho < 1 \qquad (9) \]

ρ is set at 0.9 in this study (Dekkers & Aarts, 1991). If the pre-determined temperature is reached, then stop the algorithm; the latest state is an approximate optimal solution. Otherwise, go to step 2.

2.3. Other alternative forecasting models

In this study, to compare forecasting accuracy with the SVRGA–SA model, two alternative forecasting models, namely the autoregressive integrated moving average (ARIMA) model and the general regression neural network (GRNN) model, were employed to forecast Taiwanese 3G mobile phone demand. These two models are introduced as follows.

2.3.1. ARIMA model

Introduced by Box and Jenkins (1976), the ARIMA model has been one of the most popular approaches in forecasting. In an ARIMA model, the future value of a variable is supposed to be a linear combination of past values and past errors, expressed as follows:

\[ y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \qquad (10) \]

where $y_t$ is the actual value and $\varepsilon_t$ is the random error at time t; $\phi_i$ and $\theta_j$ are the coefficients; p and q are integers and often referred to as autoregressive and moving average polynomials, respectively. In addition, the difference ($\nabla$) is used to solve the non-stationary problem, and defined as follows:

\[ \nabla^{d} y_t = \nabla^{d-1} y_t - \nabla^{d-1} y_{t-1} \qquad (11) \]

Basically, three phases are included in an ARIMA model: model identification, parameter estimation and diagnostic checking. Furthermore, the backward shift operator, B, is defined as follows:

\[ B^{1} y_t = y_{t-1},\ B^{2} y_t = y_{t-2},\ \ldots,\ B^{p} y_t = y_{t-p} \qquad (12) \]

\[ B^{1} \varepsilon_t = \varepsilon_{t-1},\ B^{2} \varepsilon_t = \varepsilon_{t-2},\ \ldots,\ B^{p} \varepsilon_t = \varepsilon_{t-p} \qquad (13) \]

then $\phi_p(B)$ and $\theta_q(B)$ can be written as follows respectively:

\[ \phi_p(B) = 1 - \phi_1 B^{1} - \phi_2 B^{2} - \cdots - \phi_p B^{p} \qquad (14) \]

\[ \theta_q(B) = 1 - \theta_1 B^{1} - \theta_2 B^{2} - \cdots - \theta_q B^{q} \qquad (15) \]

Hence, Eq. (10) can be rewritten as Eq. (16),

\[ \phi_p(B)\,\nabla^{d} y_t = C_0 + \theta_q(B)\,\varepsilon_t \qquad (16) \]

Eq. (16) is denoted as ARIMA (p, d, q) with non-zero constant, $C_0$. For example, the ARIMA (2, 2, 1) model can be represented as Eq. (17).

\[ \phi_2(B)\,\nabla^{2} y_t = C_0 + \theta_1(B)\,\varepsilon_t \qquad (17) \]

In general, the values of p, d, q then need to be estimated by autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series.

Fig. 5. A simplified example of parameter representation (Parent 1 and Parent 2 before crossover; Offspring 1 and Offspring 2 after crossover; crossover point = 1).
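The difference operator of Eq. (11) and the backward shift operator of Eq. (12) in the ARIMA formulation can be illustrated with a small sketch on a plain Python list. The helper names are illustrative, and the sample values are the first five monthly observations of Table 1.

```python
def difference(series, d=1):
    """Apply the difference operator d times, following Eq. (11)."""
    out = list(series)
    for _ in range(d):
        # Each pass computes out[t] - out[t-1], shortening the series by one.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def backshift(series, k=1):
    """Pair each y_t with B^k y_t = y_{t-k} (Eq. (12)); requires k >= 1."""
    return list(zip(series[k:], series[:-k]))


y = [304, 313, 328, 340, 360]    # January to May 2006, from Table 1
print(difference(y, 1))          # first differences
print(difference(y, 2))          # second differences
print(backshift(y, 1))           # (y_t, y_{t-1}) pairs
```

Each round of differencing removes one observation, which matters for a series as short as the 27 monthly values used in this paper.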


2.3.2. GRNN model

The general regression neural network (GRNN) model, proposed by Specht (1991), can approximate any arbitrary function from historical data. The foundation of GRNN operation is the theory of kernel regression. The procedure of the GRNN model can be equivalently represented as follows:

\[ E[N \mid M] = \frac{\int_{-\infty}^{\infty} N f(M, N)\, dN}{\int_{-\infty}^{\infty} f(M, N)\, dN} \qquad (18) \]

where N is the predicted value of GRNN, M is the input vector $(M_1, M_2, \ldots, M_n)$, which consists of n variables, $E[N \mid M]$ is the expected value of the output N given an input vector M, and $f(M, N)$ is the joint probability density function of M and N.

The GRNN primarily has four layers (Fig. 6). Each layer is assigned a specific computational function when the non-linear regression function, Eq. (19), is performed. The first layer of the network receives information. The input neurons then feed the data to the second layer. The primary task of the second layer is to memorize the relationship between the input neuron and its proper response. Therefore, the neurons in the second layer are also called pattern neurons. A multivariate Gaussian function of $h_i$ is given in Eq. (19), and the data from the input neurons are used to compute an output $h_i$ by a typical pattern neuron i,

\[ h_i = \exp\left[ -\frac{(M - U_i)'(M - U_i)}{2\sigma^{2}} \right] \qquad (19) \]

where $U_i$ is a specific training vector represented by pattern neuron i, and σ is the smoothing parameter. In the third layer, the neurons, namely the summation neurons, receive the outputs of the pattern neurons. The outputs from all pattern neurons are augmented. Basically, two summations, the simple summation and the weighted summation, are conducted in the neurons of the third layer. The simple summation and the weighted summation operations can be represented as Eqs. (20) and (21), respectively.

\[ S_s = \sum_{i} h_i \qquad (20) \]

\[ S_w = \sum_{i} w_i h_i \qquad (21) \]

where $w_i$ is the weight connecting pattern neuron i to the third layer. The summations of the neurons in the third layer are then fed into the fourth layer. The GRNN regression output Q is calculated as follows:

\[ Q = \frac{S_w}{S_s} \qquad (22) \]
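The four layers described above can be sketched in a few lines, taking the output as the weighted summation divided by the simple summation. The sketch assumes that the weight $w_i$ of pattern neuron i is the observed target of training vector $U_i$, which is the usual GRNN convention but is not spelled out in this section; the function name and the toy data are hypothetical.

```python
import math


def grnn_predict(m, patterns, weights, sigma):
    """Return the GRNN output Q for input vector m (Eqs. (19)-(22))."""
    h = []
    for u in patterns:
        sq_dist = sum((mj - uj) ** 2 for mj, uj in zip(m, u))
        h.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))   # Eq. (19)
    s_s = sum(h)                                            # Eq. (20)
    s_w = sum(w * hi for w, hi in zip(weights, h))          # Eq. (21)
    # If sigma is tiny relative to the distances, every h underflows to
    # zero and this division fails, so inputs are usually rescaled first.
    return s_w / s_s                                        # Eq. (22)


patterns = [(0.0,), (2.0,)]    # toy training vectors U_i
targets = [10.0, 20.0]         # assumed weights w_i (training targets)
print(grnn_predict((1.0,), patterns, targets, sigma=1.0))
```

A query halfway between the two toy patterns returns the average of their targets, while a small σ makes the output follow the nearest pattern, which is why the smoothing parameter has to be tuned.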

3. Numerical examples

3.1. The data set and index of performance evaluation

3G mobile phone demand data (2006–2008) were obtained from the revenue reports section of the Chunghwa Telecom Co. Ltd. financial information service, which is published monthly (Chunghwa Telecom Co. Ltd., 2008). Table 1 lists the 27 monthly observations used in this example. The data exhibit a steadily increasing tendency since January 2006 and seem to follow 3-month cycles with increasing peaks. This study employs Chunghwa Telecom’s monthly 3G mobile phone demand data to compare the forecasting performance of the proposed SVRGA–SA model with that of the ARIMA model and the GRNN model. To compare the models under the same conditions, the 3G mobile phone demand data are divided into three periods (training, validation, and testing). In particular, Schalkoff (1997) recommends a ratio of validation data to training data of approximately one to four. Therefore, the data set is divided as follows: training (January 2006–May 2007, 17 monthly observations of 3G mobile phone subscribers), validation (June 2007–October 2007, 5 monthly observations), and testing (November 2007–March 2008, 5 monthly observations).

Fig. 6. The architecture of the GRNN model.

Table 1
Total number of 3G mobile phone subscribers of Chunghwa Telecom Co. (2006–2008) (unit: 1000 subscribers).

Month (year)      Subscribers    Month (year)      Subscribers    Month (year)     Subscribers
January 2006      304            January 2007      1035           January 2008     2365
February 2006     313            February 2007     1138           February 2008    2478
March 2006        328            March 2007        1285           March 2008       2588
April 2006        340            April 2007        1415
May 2006          360            May 2007          1520
June 2006         408            June 2007         1685
July 2006         458            July 2007         1805
August 2006       524            August 2007       1909
September 2006    597            September 2007    1993
October 2006      673            October 2007      2111
November 2006     786            November 2007     2195
December 2006     943            December 2007     2291

Source: Chunghwa Telecom Co. Ltd. (2008).

The accuracy of the proposed 3G phone demand forecasting model is measured by the mean absolute percentage error (MAPE), given by Eq. (7). A smaller MAPE indicates smaller deviations between the actual and forecast values.

3.2. Parameters determination of the three forecasting models

The forecasting performance of each of the three models depends on its free parameters. For the ARIMA model, the statistical package identified ARIMA(1, 1, 1) with a constant term as the most suitable model for the training data. The ARIMA(1, 1, 1) model can be expressed as follows:

$$(1 - 0.0647B)\nabla y_t = 98.91 + (1 + 0.3947B)\varepsilon_t \tag{23}$$
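To see how Eq. (23) produces forecasts, it can be unrolled into a recursion on the first differences. The sketch below assumes the usual Box–Jenkins reading, with $B$ the backshift operator, $\nabla = 1 - B$ the first difference, and a minus sign in the autoregressive factor. It also assumes that the most recent residual is zero, which is a placeholder, so the numbers it prints are illustrative and are not the authors' ARIMA forecasts. The function name `arima111_forecast` is hypothetical.

```python
def arima111_forecast(history, steps, phi=0.0647, theta=0.3947, c=98.91,
                      last_residual=0.0):
    """Multi-step forecasts from (1 - phi*B)(1 - B) y_t = c + (1 + theta*B) e_t."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    last = history[-1]
    diff = history[-1] - history[-2]  # most recent first difference
    resid = last_residual             # most recent residual (assumed, not estimated)
    forecasts = []
    for _ in range(steps):
        # Difference form: dy_t = c + phi*dy_{t-1} + theta*e_{t-1}, with E[e_t] = 0.
        diff = c + phi * diff + theta * resid
        last = last + diff
        forecasts.append(last)
        resid = 0.0  # future residuals are replaced by their zero expectation
    return forecasts


# September and October 2007 subscribers from Table 1 (unit: 1000).
print(arima111_forecast([1993, 2111], steps=5))
```

Because the autoregressive coefficient is small, the forecast increments settle quickly near 98.91 / (1 - 0.0647), that is, roughly 106 thousand subscribers per month.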

Fig. 7. Estimated residual ACF of the ARIMA(1, 1, 1) model, with 95% confidence limits for the autocorrelations (horizontal axis: lag; vertical axis: autocorrelation).

After the parameters of the ARIMA model have been determined, it is important to examine how closely the model fits the given time series. The autocorrelation function (ACF) of the residuals was calculated to verify the parameters. Fig. 7 plots the estimated residual ACF and indicates that the residuals are not autocorrelated. The partial autocorrelation function (PACF), displayed in Fig. 8, is also used to check the residuals and likewise indicates that they are not correlated.

For the GRNN model, Fig. 9 shows the MAPE values of the GRNN with various σ. When σ exceeds 0.42, the MAPE increases; therefore, the upper limit of σ is 0.42. In this study, σ was set to 0.04.

For the SVRGA–SA model, a rolling-based forecasting procedure is conducted in the training stage, as illustrated in Fig. 10. The training data are divided into two subsets, the fed-in subset (13 demand data) and the fed-out subset (4 demand data). First, the 13 demand data of the fed-in subset are fed into the proposed model, the structural risk minimization principle is employed to minimize the training error, and a one-step-ahead forecast, namely the 14th forecast demand, is obtained. Second, the next 13 demand data, comprising 12 of the fed-in subset (the 2nd to the 13th) plus the 14th datum from the fed-out subset, are fed into the model in the same way, the training error is again minimized, and the 15th forecast demand is obtained. The rolling-based procedure is repeated until the 17th forecast demand is obtained, which also yields the training error of this stage. Several types of data rolling were applied during the training stage: different numbers of 3G mobile phone demand data in the time series were fed into the SVRGA–SA model to forecast the 3G demand in the next

Fig. 8. Estimated residual PACF of the ARIMA(1, 1, 1) model, with 95% confidence limits for the partial autocorrelations (horizontal axis: lag; vertical axis: partial autocorrelation).

Fig. 9. MAPE with various σ values of the GRNN model (horizontal axis: σ in [0.04, 1]; vertical axis: MAPE).


Fig. 10. The rolling-based forecasting procedure (training stage).
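The rolling-based procedure of Fig. 10 amounts to sliding a fixed-length window over the training series and forecasting one step ahead each time. The sketch below shows only this windowing logic, under stated assumptions: the helper names are hypothetical, and `naive_forecast` is a stand-in for the SVR model (it extrapolates the last increment), not the method used in the paper.

```python
# Training period from Table 1: January 2006 to May 2007 (unit: 1000 subscribers).
training = [304, 313, 328, 340, 360, 408, 458, 524, 597, 673, 786, 943,
            1035, 1138, 1285, 1415, 1520]


def rolling_windows(series, n_fed_in):
    """Yield (fed-in window, next observed value) pairs, rolling one step at a time."""
    for start in range(len(series) - n_fed_in):
        window = series[start:start + n_fed_in]
        target = series[start + n_fed_in]
        yield window, target


def rolling_mape(series, n_fed_in, fit_and_forecast):
    """Training MAPE (in percent) of one-step-ahead rolling forecasts."""
    errors = []
    for window, target in rolling_windows(series, n_fed_in):
        forecast = fit_and_forecast(window)
        errors.append(abs(target - forecast) / target)
    return 100.0 * sum(errors) / len(errors)


def naive_forecast(window):
    """Placeholder model (not SVR): repeat the most recent monthly increment."""
    return window[-1] + (window[-1] - window[-2])


pairs = list(rolling_windows(training, n_fed_in=13))
print([target for _, target in pairs])  # the 14th to 17th observations
print(rolling_mape(training, 13, naive_forecast))
```

With 13 fed-in data the 17 training points give four windows, whose targets are the 14th to 17th observations. Changing `n_fed_in` to 10, 11, 12, or 14 gives the other rolling types compared in Table 2.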

Table 2
Forecasting results of SVRGA and SVRGA–SA models.

SVRGA
Number of fed-in data    σ         C          ε         Testing MAPE (%)
10                       0.7411    3057.6     5.143     2.142
11                       0.7722    1005.90    47.980    3.506
12                       3.9378    842.88     53.250    3.614
13                       0.3959    1492.00    54.855    3.108
14                       3.4412    1431.10    54.642    2.992

SVRGA–SA
Number of fed-in data    σ         C          ε         Testing MAPE (%)
10                       0.8597    4007.1     5.044     1.254
11                       0.6217    3330.9     0.569     1.053
12                       0.5380    5041.1     39.347    1.988
13                       0.9010    7681       21.242    0.720
14                       0.3295    9376.6     87.94     0.882

Table 3
Suitable values of parameters for different models.

Models      Suitable parameter combinations
ARIMA       p = 1, d = 1, q = 1
GRNN        σ = 0.04
SVRGA       σ = 0.7411, C = 3057.6, ε = 5.143
SVRGA–SA    σ = 0.9010, C = 7681, ε = 21.242

Table 4
Forecasting Taiwan 3G demand from November 2007 to March 2008 (unit: 1000 subscribers).

Months (years)    Actual demand    ARIMA     GRNN      SVRGA     SVRGA–SA
November 2007     2195             2226      2111      2143      2207
December 2007     2291             2333      2195      2259      2281
January 2008      2365             2439      2291      2353      2358
February 2008     2478             2544      2365      2426      2449
March 2008        2588             2650      2478      2476      2559
MAPE                               2.287%    3.991%    2.142%    0.720%
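The figures in Table 4 can be checked with a short script. The sketch below recomputes the MAPE of each model from the tabulated forecasts and also computes the Mann-Whitney U statistic of Eqs. (24)–(26) for the SVRGA–SA errors against each alternative, as reported in Table 5. Two points are assumptions rather than statements from the paper: the U test is applied here to the absolute forecast errors (this choice reproduces the U values in Table 5), and tied values receive average ranks. The helper names are hypothetical.

```python
# Testing-period values from Table 4 (unit: 1000 subscribers).
actual = [2195, 2291, 2365, 2478, 2588]
forecasts = {
    "ARIMA": [2226, 2333, 2439, 2544, 2650],
    "GRNN": [2111, 2195, 2291, 2365, 2478],
    "SVRGA": [2143, 2259, 2353, 2426, 2476],
    "SVRGA-SA": [2207, 2281, 2358, 2449, 2559],
}


def mape(actual_values, forecast_values):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(a - f) / a for a, f in zip(actual_values, forecast_values)]
    return 100.0 * sum(ratios) / len(ratios)


def abs_errors(actual_values, forecast_values):
    """Absolute forecast errors (assumed input of the U test)."""
    return [abs(a - f) for a, f in zip(actual_values, forecast_values)]


def mann_whitney_u(series_1, series_2):
    """U = min(U1, U2) from rank sums, Eqs. (24)-(26); ties get average ranks."""
    pooled = sorted(series_1 + series_2)

    def avg_rank(value):
        first = pooled.index(value) + 1  # 1-based rank of the first occurrence
        return first + (pooled.count(value) - 1) / 2.0

    n1, n2 = len(series_1), len(series_2)
    r1 = sum(avg_rank(v) for v in series_1)
    r2 = sum(avg_rank(v) for v in series_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2.0 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2.0 - r2
    return min(u1, u2)


mape_by_model = {name: mape(actual, f) for name, f in forecasts.items()}
proposed_errors = abs_errors(actual, forecasts["SVRGA-SA"])
u_by_model = {
    name: mann_whitney_u(proposed_errors, abs_errors(actual, f))
    for name, f in forecasts.items()
    if name != "SVRGA-SA"
}
for name, value in mape_by_model.items():
    print(f"{name}: MAPE = {value:.3f}%")
for name, value in u_by_model.items():
    print(f"SVRGA-SA vs. {name}: U = {value}")
```

Run on the rounded forecasts of Table 4, this gives MAPE values of about 2.29%, 3.99%, 2.14%, and 0.71%, which agree with the reported 2.287%, 3.991%, 2.142%, and 0.720% up to the rounding of the tabulated forecasts. The U statistics are 0, 0, and 2.5, matching Table 5.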

validation period. Whenever the training error improves, the three parameters σ, C, and ε of the SVRGA–SA model, as adjusted by the GA–SA algorithm, are used to calculate the validation error. The adjusted parameters with the minimum validation error are then selected as the most appropriate parameters. Finally, a five-step-ahead policy is employed to forecast 3G mobile phone demand, and the parameter combination (σ, C, ε) with the smallest testing MAPE is taken as the most suitable model in this example. Table 2 lists the forecasting results for different numbers of fed-in data: the SVRGA model performs best when 10 fed-in data are used, and the SVRGA–SA model performs best when 13 fed-in data are used. Table 3 lists the suitable parameters of the different models. These parameters were used to forecast the Taiwan 3G mobile phone demand in the testing data set.

3.3. Forecasting results

The well-trained models, ARIMA(1, 1, 1), GRNN, SVRGA, and SVRGA–SA, are applied to forecast the 3G mobile phone demand from November 2007 to March 2008. Table 4 shows the actual values and the forecast values obtained using various forecasting

models in this example. The MAPE of each model is calculated over the five testing months so that the proposed model can be compared fairly with the alternatives. On this five-month average basis, the proposed SVRGA–SA model has a smaller MAPE than the ARIMA(1, 1, 1), GRNN (σ = 0.04), and SVRGA models. In addition, the GA–SA algorithm is less prone than GAs to becoming trapped in a local minimum, and the SVRGA–SA model therefore outperforms the SVRGA model. For example, in Table 2, with 13 fed-in data the SVRGA model reaches the local solution (σ, C, ε) = (0.3959, 1492, 54.855) with a MAPE of 3.108%, whereas the GA–SA algorithm moves to a better solution, (σ, C, ε) = (0.9010, 7681, 21.242), with a MAPE of 0.720%. This again indicates that the GA–SA algorithm is more appropriate than GAs for adjusting the parameters of the SVR model to improve forecasting accuracy. Fig. 11 illustrates the 3G mobile phone demand forecast by the ARIMA, GRNN, SVRGA, and SVRGA–SA models.

Fig. 11. Forecasting demands by ARIMA, GRNN, SVRGA and SVRGA–SA models (November 2007 to March 2008; vertical axis: 3G demand).

To verify the significance of the accuracy improvement of the SVRGA–SA model, the Mann-Whitney U test (Mann & Whitney, 1947) is conducted. The Mann-Whitney U test assesses the significance of a difference in central tendency between two data series. The two series of forecasting errors are pooled and ranked from the smallest value to the largest. The test statistic U is given by Eq. (24):

$$U = \min\{U_1, U_2\} \tag{24}$$

where

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{25}$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 \tag{26}$$

Here $n_1$ and $n_2$ are the sizes of data series I and data series II, and $R_1$ and $R_2$ are the corresponding rank sums.

The Mann-Whitney U test is performed as a one-tailed test at the 0.025 and 0.05 significance levels. The test results (Table 5) show that the SVRGA–SA model significantly outperforms the ARIMA and GRNN models at both levels (U = 0, below the critical values of 2 and 4), whereas against the SVRGA model (U = 2.5) the improvement is significant at the 0.05 level but not at the 0.025 level.

Table 5
Mann-Whitney U test.

Mann-Whitney U test               α = 0.025 (U = 2)    α = 0.05 (U = 4)
SVRGA–SA vs. ARIMA(1, 1, 1)       0                    0
SVRGA–SA vs. GRNN (σ = 0.04)      0                    0
SVRGA–SA vs. SVRGA                2.5                  2.5

4. Conclusions

As mentioned above, the 3G-related infrastructure, the regulations governing 3G telecommunication, and the 3G-related products in Taiwan are gradually maturing, and 3G telecommunication is entering the age of high-speed data communications. However, several important issues remain, such as whether the 3G mobile phone penetration rate will grow as quickly as that of 2G did, and which socio-economic factors (such as social levels, inherent product restrictions, economic disturbances, or costs) will hinder the rapid growth of 3G phones. Accurate 3G mobile phone demand forecasting will therefore not only clarify the future development trends of 3G but also provide important guidance for nurturing 3G-related businesses effectively. In addition, it will help telecommunication businesses plan their future marketing strategies. Rapid technological change and innovation in the telecommunication market make 3G mobile phone demand forecasting particularly difficult. Thus, it is worth analyzing where forecasts fail and how forecasting accuracy can be improved.

In this investigation, the SVRGA–SA model is proposed to predict Taiwan 3G phone demand. The numerical example of Chunghwa Telecom Co. Ltd. is used to demonstrate the forecasting accuracy of the proposed model. This study evaluates the feasibility of using the GA–SA algorithm to determine the parameters of the SVR model and thereby improve forecasting accuracy. Tables 2 and 4 show that the SVRGA–SA model gives better results than the other forecasting models; in particular, the SVRGA–SA model avoids being trapped in a local optimum as the SVRGA model (hybrid SVR and GAs) was.
The superior performance of the SVRGA–SA model can be attributed to three factors: (1) the SVR model has non-linear mapping capabilities and can therefore capture the data patterns of 3G mobile phone demand more easily than ARIMA models can; (2) the SVR model applies structural risk minimization rather than minimizing only the training errors, and minimizing an upper bound on the generalization error provides better generalization performance than that of the ARIMA and GRNN models; and (3) the selection of the three parameters of an SVR model heavily influences its forecasting performance, since improper selection leads to either over-fitting or under-fitting. The GA–SA algorithm uses the ability of the SA algorithm to escape from local minima and move toward the global minimum, and then applies the mutation process of GAs to search for a proper parameter combination. The favorable results obtained in this work indicate that the proposed model is a valid alternative for demand forecasting of high-technology products. In the future, other novel hybrid evolutionary algorithms could be applied to search for a more appropriate parameter combination, if one exists, and thus to achieve more accurate demand forecasts for high-technology products. Other socio-economic factors, such as market prices and gross domestic expenditure per person, could also be included in the SVRGA–SA model to forecast demand for 3G-related products.

Acknowledgment

This research was conducted with the support of National Science Council, Taiwan (NSC 97-2410-H-161-001, NSC 98-2410-H-161-001, and NSC 98-2811-H-161-001).


References

Abdel-Aal, R. E. (2004). Short-term hourly load forecasting using abductive networks. IEEE Transactions on Power Systems, 19(1), 164–173.
Amari, S., & Wu, S. (1996). Improving support vector machine classifiers by modifying kernel functions. Neural Networks, 12(6), 783–789.
Bergey, P. K., Ragsdale, C. T., & Hoskote, M. (2003). A simulated annealing genetic algorithm for the electrical power districting problem. Annals of Operations Research, 121(1–4), 33–55.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Cao, L. (2003). Support vector machines experts for time series forecasting. Neurocomputing, 51(1–4), 321–339.
Cao, L., & Gu, Q. (2002). Dynamic support vector machines for non-stationary time series forecasting. Intelligent Data Analysis, 6(1), 67–83.
Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1), 113–126.
Chunghwa Telecom Co. Ltd. (2008). Monthly revenue reports. Financial Information Service. http://www.cht.com.tw/CompanyCat.php?Page=FileDownload&CatID=798.
Cordón, O., Moya, F., & Zarco, C. (2002). A new evolutionary algorithm combining simulated annealing and genetic programming for relevance feedback in fuzzy information retrieval systems. Soft Computing, 6(5), 308–319.
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), 273–297.
Dekkers, A., & Aarts, E. H. L. (1991). Global optimization and simulated annealing. Mathematical Programming, 50(1–3), 367–393.
Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. N. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9(2), 155–161.
Ganesh, K., & Punniyamoorthy, M. (2005). Optimization of continuous-time production planning using hybrid genetic algorithms–simulated annealing. International Journal of Advanced Manufacturing Technology, 26(1–2), 148–154.
Gerstheimer, O., & Lupp, C. (2004). Needs versus technology – the challenge to design third-generation mobile applications. Journal of Business Research, 57(12), 1409–1415.
Holland, J. (1975). Adaptation in natural and artificial system. Ann Arbor: University of Michigan Press.
Hong, W. C., & Pai, P. F. (2007). Potential assessment of the support vector regression technique in rainfall forecasting. Water Resources Management, 21(2), 495–513.
Huang, W., Nakamori, Y., & Wang, S. Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10), 2513–2522.
Juang, C. F. (2004). A hybrid of genetic algorithm and particle swarm optimization for recurrent network design. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 34(2), 997–1006.
Kamo, T., & Dagli, C. (2009). Hybrid approach to the Japanese candlestick method for financial forecasting. Expert Systems with Applications, 36(3 Part), 5023–5030.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680.
Law, R. (2000). Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting. Tourism Management, 21(4), 331–340.
Lee, J., & Johnson, G. E. (1983). Optimal tolerance allotment using a genetic algorithm and truncated Monte Carlo simulation. Computer Aided Design, 25(9), 601–611.
Leung, M. T., Chen, A. S., & Daouk, H. (2000). Forecasting exchange rates using general regression neural networks. Computers and Operations Research, 27(11–12), 1093–1110.
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50–60.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., & Teller, A. H. (1953). Equations of state calculations by fast computing machines. Journal of Chemical Physics, 21(6), 1087–1092.
Mohandes, M. A., Halawani, T. O., Rehman, S., & Hussain, A. A. (2004). Support vector machines for wind speed prediction. Renewable Energy, 29(6), 939–947.
Pai, P. F., & Hong, W. C. (2005a). Forecasting regional electric load based on recurrent support vector machines with genetic algorithms. Electric Power Systems Research, 74(3), 417–425.
Pai, P. F., & Hong, W. C. (2005b). Support vector machines with simulated annealing algorithms in electricity load forecasting. Energy Conversion and Management, 46(17), 2626–2669.
Pai, P. F., & Hong, W. C. (2005c). An improved neural network model in forecasting arrivals. Annals of Tourism Research, 32(4), 1138–1141.
Pai, P. F., & Hong, W. C. (2006). Software reliability forecasting by support vector machines with simulated annealing algorithms. Journal of Systems and Software, 79(6), 747–755.
Pai, P. F., & Hong, W. C. (2007). A recurrent support vector regression model in rainfall forecasting. Hydrological Processes, 21(6), 819–827.
Pai, P. F., Hong, W. C., Chang, P. T., & Chen, C. T. (2006). The application of support vector machines to forecast tourist arrivals in Barbados: An empirical study. International Journal of Management, 23(2), 375–385.
Pai, P. F., & Lin, C. S. (2005a). Using support vector machines in forecasting production values of machinery industry in Taiwan. International Journal of Advanced Manufacturing Technology, 27(1–2), 205–210.
Pai, P. F., & Lin, C. S. (2005b). A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33(6), 497–505.
Ponnambalam, S. G., & Reddy, M. M. (2003). A GA–SA multiobjective hybrid search algorithm for integrating lot sizing and sequencing in flow-line scheduling. International Journal of Advanced Manufacturing Technology, 21(2), 126–137.
Schalkoff, R. J. (1997). Artificial neural networks. New York: McGraw-Hill.
Selian, A. (2002). From GSM to IMT-2000 – A comparative analysis. International Telecommunications Union (ITU).
Shieh, H. J., & Peralta, R. C. (2005). Optimal in situ bioremediation design by hybrid genetic algorithm–simulated annealing. Journal of Water Resources Planning and Management, 131(1), 67–78.
Specht, D. A. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6), 568–576.
Suykens, J. A. K. (2001). Nonlinear modelling and support vector machines. In Proceedings of IEEE instrumentation and measurement technology conference (pp. 287–294).
Tanguturi, V. P., & Harmantzis, F. C. (2006). Migration to 3G wireless broadband internet and real options: The case of an operator in India. Telecommunications Policy, 30(7), 400–419.
Tay, F. E. H., & Cao, L. (2001). Application of support vector machines in financial time series forecasting. Omega, 29(4), 309–317.
Tay, F. E. H., & Cao, L. (2002). Modified support vector machines in financial time series forecasting. Neurocomputing, 48(1–4), 847–861.
Valverde Ramírez, M. C., de Campos Velho, H. F., & Ferreira, N. J. (2005). Artificial neural network technique for rainfall forecasting applied to the São Paulo region. Journal of Hydrology, 301(1–4), 146–162.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Vapnik, V., Golowich, S., & Smola, A. (1996). Support vector machine for function approximation, regression estimation, and signal processing. Advances in Neural Information Processing Systems, 9(4), 281–287.
Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2005). Optimized and metaoptimized neural networks for short-term traffic flow prediction: A genetic approach. Transportation Research Part C, 13(3), 211–234.
Wang, W., Xu, Z., & Lu, J. W. (2003). Three improved neural network models for air quality forecasting. Engineering Computations, 20(2), 192–210.
Wang, Z. G., Wong, Y. S., & Rahman, M. (2004). Optimisation of multi-pass milling using genetic algorithm and genetic simulated annealing. International Journal of Advanced Manufacturing Technology, 24(9–10), 727–732.
Witt, S. F., & Witt, C. A. (1992). Modeling and forecasting demand in tourism. London: Academic Press.
Wu, J. D., & Lin, B. F. (2009). Speaker identification based on the frame linear predictive coding spectrum technique. Expert Systems with Applications, 36(4), 8056–8063.
Yao, J., & Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34(1–4), 79–98.
Yoo, Y., Lyytinen, K., & Yang, H. (2005). The role of standards in innovation and diffusion of broadband mobile services: The case of South Korea. Journal of Strategic Information Systems, 14(3), 323–353.
Yuan, Y., Zheng, W., Wang, Y., Xue, Z., Yang, Q., & Gao, Y. (2006). Xiaolingtong versus 3G in China: Which will be the winner? Telecommunications Policy, 30(5–6), 297–313.
Zhang, G., & Hu, M. Y. (1998). Neural network forecasting of the British Pound/US Dollar exchange rate. Omega, 26(4), 495–506.
Zhao, F., & Zeng, X. (2006). Simulated annealing–genetic algorithm for transit network optimization. Journal of Computing in Civil Engineering, 20(1), 57–68.