SVR mathematical model and methods for sale prediction

Journal of Systems Engineering and Electronics, Vol. 18, No. 4, 2007, pp. 769–773

Yi Yang 1, Rong Fuli 1, Chang Huiyou 2 & Xiao Zhijiao 1
1. Computer Science Dept., Sun Yat-Sen Univ., Guangzhou 510275, P. R. China; 2. Software School, Sun Yat-Sen Univ., Guangzhou 510275, P. R. China
(Received June 13, 2006)

Abstract: Sale prediction plays a significant role in business management. By using support vector regression (ε-SVR), a method for predicting sales is illustrated. It takes historical data and current context data as inputs and presents results, i.e. the future sale tendency and the forecast sales, according to the user's specification of accuracy and time cycles. Practical data experiments and comparative tests with other algorithms show the advantages of the proposed approach in computation time and correctness.

Keywords: regression, support vector regression, sale prediction.

1. Introduction

With the high competition in Chinese and international markets, research on decision making and on forecasting the changing tendency of sales is becoming more and more important. Inquiries from China Tobacco indicate that forecasting sales is essential for the annual marketing plans of most Chinese tobacco enterprises. Accurately and rapidly predicting sales is thus a very important issue.

2. Related works

In recent years, several methods have been applied to predict sales and demand in practical industry[1−4]; ARIMA, regression, time series, and grey theory are the most frequently used, and they all need a great amount of training data. Unfortunately, the existing data is usually insufficient, and a rather long training time is expected with the above approaches. Forecasting the trend of tobacco sales is considered complicated, not only because many factors within the enterprises and in the outer circumstances may influence it, but also because there are no fixed rules governing its operation. We want to propose intelligent recommendations for decision making to tackle the complexity of sale prediction for Chinese tobacco enterprises. In our case, learning and predicting methods based on ε-SVR for intelligent decision support are provided; then, numerical and graphical visualization results describing the tobacco sale prediction are illustrated. Furthermore, the efficiency of ε-SVR is analyzed by comparing it with a neural network.

3. ε-SVR based regression mathematical model

Smola and Scholkopf[5] discussed a very general class of loss functions, and the use of ridge regression in feature space was considered by Scholkopf et al[6]. Lee et al[2] proposed a smooth support vector machine regression. In regression problems, a training data set S = {(x_1, y_1), ..., (x_m, y_m)} ⊂ R^n × R is given, which consists of m points in (n+1)-dimensional space; the inputs x_i ∈ R^n are collected as the rows of the matrix A ∈ R^{m×n} (so A^T ∈ R^{n×m}), and y_i ∈ R is called the observation. The main goal of regression problems is to estimate a function f(x) that can predict the observation value y of a new input data point x by learning from the given training data set S. Learning from a given training data set means finding a linear or nonlinear surface that tolerates a small error in fitting this training data set.

* This project was supported by the National Natural Science Foundation of China (60573159); the Natural Science Foundation of Guangdong Province (05200302).


Disregarding the tiny errors that fall within some tolerance, say ε, may lead to better generalization ability. The slack variable ξ measures the cost of the errors on the training points; it is zero for all points inside the ε-band. In SVM, such a regression algorithm is called ε-support vector regression (ε-SVR). With many reasonable choices of loss function, the solution is characterized as the minimum of a convex function. The idea of representing the solution by means of a small subset of training points has enormous computational advantages. Using the ε-insensitive loss function has that advantage, while still ensuring the existence of a global minimum and the optimization of a reliable generalization bound. To start with, a linear function is defined as follows

    y = f(x) = Aw + b    (1)

The mathematical regression model of the linear function is (LR ε-SVR)

    L(w, b) = Σ_{i=1}^{n} (y_i − A_i w − b)^2    (2)

    s.t.  |Aw + b − y| ≤ ε + ξ,  (w, b) ∈ R^{n+1},  ξ ∈ R^m

Applying ε-SVR, the regression function f(x) is constructed to fit the training data set. The least squares approach prescribes choosing the parameters (w, b) to minimize the sum of the squared deviations of the data. The problem can be formulated as a constrained minimization problem as follows

    min  (1/2) w^T w + C Σ_{i=1}^{n} |ξ_i|    (3)

    s.t.  |Aw + b − y| ≤ ε + ξ,  (w, b) ∈ R^{n+1},  ξ ∈ R^m    (4)
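The ε-insensitive idea behind formulation (3) can be made concrete with a small numerical sketch. The following Python snippet is ours, not the paper's Java implementation; the toy data, parameter values and function names are illustrative assumptions. It evaluates the ε-insensitive error and the primal objective of (3) for a given (w, b).

```python
# A minimal sketch (not the authors' code) of the epsilon-insensitive idea behind
# formulation (3): errors smaller than epsilon cost nothing, larger ones cost linearly.
# The toy data and all names here are illustrative assumptions.
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps), applied element-wise."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

def primal_objective(w, b, A, y, C=100.0, eps=0.1):
    """(1/2) w'w + C * sum of epsilon-insensitive errors, as in formulation (3)."""
    residual_cost = eps_insensitive_loss(y, A @ w + b, eps).sum()
    return 0.5 * w @ w + C * residual_cost

# Toy usage: 5 samples, 2 features.
A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0], [5.0, 4.0]])
y = np.array([3.1, 2.9, 6.2, 5.8, 9.1])
w, b = np.array([1.0, 1.0]), 0.0
print(primal_objective(w, b, A, y))
```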

where the positive constant C is the penalty weight for errors. Formulation (3), which minimizes the ε-insensitive loss function, is a convex quadratic program and can be solved by the Lagrange multiplier method. Introducing Lagrange multipliers α_i, i = 1, ..., n, the Lagrange function is described as follows

    L(w, b, ξ, α) = (1/2) w^T w + C Σ_{i=1}^{n} |ξ_i| − Σ_{i=1}^{n} α_i {y_i (w^T A_i + b) − 1 + ξ_i}    (5)

According to the Kuhn-Tucker theorem in optimization theory, the solution of the optimization is given by the saddle point of the Lagrange function. The training points with nonzero Lagrange coefficients α_i are called support vectors (a point with α_i = C is a boundary support vector), and the α_i can be found from the following convex quadratic programming problem

    max_α  { −(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} y_i y_j (A_i^T A_j) α_i α_j + Σ_{i=1}^{n} α_i }    (6)

    s.t.  Σ_{i=1}^{n} α_i y_i = 0,  0 ≤ α_i ≤ C,  i = 1, ..., n    (7)

    w = Σ_{i=1}^{n} y_i α_i A_i    (8)
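For readers who want to see the structure of Eq. (8) without writing a QP solver, the sketch below uses scikit-learn's SVR, which solves a dual of this form internally; for a linear kernel the weight vector can be recovered as a weighted combination of the support vectors only, mirroring Eq. (8). The toy data, parameter values and the use of scikit-learn are our assumptions, not part of the paper.

```python
# Minimal sketch (not from the paper): recover the linear-kernel weight vector from
# the dual solution as a weighted sum of support vectors, in the spirit of Eq. (8).
import numpy as np
from sklearn.svm import SVR

A = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # toy training inputs
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])             # toy observations

model = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(A, y)

# Only the support vectors (points with nonzero dual coefficients) contribute to w.
w = model.dual_coef_ @ model.support_vectors_       # shape (1, n_features)
b = model.intercept_
print("support vector indices:", model.support_)
print("w =", w, "b =", b)
print("prediction at x = 6:", model.predict([[6.0]]))
```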

In practical applications, a nonlinear surface is much more common. In order to generalize the result from the linear case to the nonlinear case, the kernel technique that has been used extensively in kernel-based learning algorithms is employed. Applying the duality theorem for the convex minimization problem, w can be represented by A^T u for some u ∈ R^n; then formulation (1) is changed to

    y = f(x) = AA^T u + b    (9)

and formulation (3) becomes

    min  (1/2)(u^T u + b^2) + C Σ_{i=1}^{n} |A_i A^T u + b − y_i|    (10)

Replacing AA^T by a kernel matrix K(A, A^T), formulation (10) becomes

    min  (1/2)(u^T u + b^2) + C Σ_{i=1}^{n} |K(A_i, A^T) u + b − y_i|    (11)

and the regression function (9) becomes

    f(x) = K(x^T, A^T) u + b = u^T K(A, x) + b = Σ_{i=1}^{n} u_i K(A_i, x) + b    (12)

So the nonlinear regression function may be regarded as a linear combination of a dictionary of atom functions, and the coefficients u_i and b are determined by solving formula (13). The mathematical regression model of the nonlinear function is then (NLR ε-SVR)

    min_{(u,b) ∈ R^{n+1}}  (1/2)(u^T u + b^2) + C Σ_{i=1}^{n} |K(A_i, A^T) u + b − y_i|    (13)

    s.t.  |Aw + b − y| ≤ ε + ξ,  (w, b) ∈ R^{n+1},  ξ ∈ R^m    (14)


where K(A, A^T) is a kernel map from R^{1×n} × R^{n×m} to R^{1×m}. Through the kernel technique, we can compute the value of the kernel function over the sample set instead of working in the high-dimensional feature space, so the curse of dimensionality can be avoided.
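A small sketch of the kernel substitution in Eqs. (9)-(12): the inner products A_i x are replaced by kernel evaluations K(A_i, x), here with the radial basis kernel used later in the experiments. The coefficients u and b are placeholders standing in for a solution of (13); all names and values below are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the kernel substitution in Eqs. (9)-(12). Coefficients u and b
# are placeholders; in practice they come from solving (13). Values are assumptions.
import numpy as np

def rbf_kernel(a, x, gamma=0.1):
    """K(a, x) = exp(-gamma * ||a - x||^2)."""
    return np.exp(-gamma * np.sum((a - x) ** 2))

def predict(x, A, u, b, gamma=0.1):
    """f(x) = sum_i u_i * K(A_i, x) + b, as in Eq. (12)."""
    return sum(u_i * rbf_kernel(a_i, x, gamma) for a_i, u_i in zip(A, u)) + b

A = np.array([[1.0, 0.5], [2.0, 1.5], [3.0, 2.5]])  # training rows A_i
u = np.array([0.7, -0.2, 0.9])                      # placeholder coefficients
b = 0.3
print(predict(np.array([2.5, 2.0]), A, u, b))
```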

4. ε-SVR steps for tobacco sale prediction

The ε-SVR based regression used to predict the sale, named ε-SVR SP, is described in the following. Tobacco sale prediction is used as an example in the description.

Step 1  Choosing samples and numerical processing of the qualitative parameters. Tobacco sale related factors and data from 1995 to 2004 are provided by China Tobacco and other investigations; part of them are shown in Table 1. It can be noticed that some of the above influencing factors are numerical and can therefore be used directly, such as the actual sale numbers and profits. However, some factors are qualitative, for example national and international policies on tobacco trading and the market-demand situation. In our experiments, we set up five different levels for the qualitative factors: strongly support; support; currently; disadvantage; and against.

Step 2  Data processing of the training and testing sample data in ε-SVR SP. Data processing is necessary before the regression calculation. 22 classes of samples are identified in this application, as described in Table 2. Training samples (training data files) are made from historical data, and current context data constitute the testing (predicting) files.
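A minimal preprocessing sketch for Steps 1 and 2 (our illustration, not the authors' code): qualitative factors are mapped onto the five ordered levels named above (the particular numeric values assigned to each level are assumptions), and numerical factors are min-max scaled to [0, 1] as stated in Table 2.

```python
# Assumed preprocessing sketch for Steps 1-2: five-level encoding of qualitative
# factors (level values are our assumption) and [0, 1] scaling of numerical factors.
import numpy as np

QUALITATIVE_LEVELS = {          # e.g. for x16-x19 in Table 2
    "strongly support": 1.00,
    "support":          0.75,
    "currently":        0.50,   # the middle level as listed in the paper
    "disadvantage":     0.25,
    "against":          0.00,
}

def scale_to_unit(values):
    """Min-max scaling of a numerical sample column to [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)

# Example: yearly sales for China Mainland (Table 1) and one qualitative policy level.
yearly_sale = [2859, 2768, 2826, 2715, 2889, 3068, 3267, 3367, 3514, 3733]
print(scale_to_unit(yearly_sale))
print(QUALITATIVE_LEVELS["support"])
```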

Step 3  Data structure of the sample files in ε-SVR SP. There are three sample data files in ε-SVR SP: the training data file, the testing (predicting) data file and the output data file.

Table 1  The actual sale

Region                  1995   1996   1997   1998   1999   2000   2001   2002   2003   2004
China Mainland         2 859  2 768  2 826  2 715  2 889  3 068  3 267  3 367  3 514  3 733
North of China           350    299    314    314    349    376    422    434    456    484
North East of China      235    237    252    224    233    254    280    291    304    329
East of China            848    825    866    855    890    918    970  1 015  1 057  1 122
Middle South of China    760    762    769    726    773    870    914    934    977  1 044
South West of China      458    441    426    392    441    449    466    475    480    495
North West of China      205    201    197    202    202    199    214    218    240    253

Table 2  Samples in ε-SVR

sample type   description                                 sample#
y             yearly sale                                 10
x1            citizen's salary standard                   10
x2            enterprise's yearly profits                 10
x3            the stocks                                  10
x4            the tobacco's price                         10
x5 − x15      different kinds of raw materials' prices    ≈11*10
x16           national policies on trading                5
x17           international policies on trading           5
x18           national market demands                     5
x19           international market demands                5
x20           weather                                     10
x21           tobacco consuming habits                    10

Sample initialization: originally numerical samples are scaled to [0, 1]; originally qualitative samples are numerically processed as needed.


Step 4  Training and predicting calculation in ε-SVR SP. The algorithm is constructed according to the regression mathematical models (LR ε-SVR and NLR ε-SVR) described in Section 3. From the methodology described, it can be noticed that there are several parameters in the algorithm, such as n and C. In the ε-SVR calculation, ε (the tiny error), the cost C (related to the penalty weight) and the cycle condition g (corresponding to the dimension n) affect the accuracy of the prediction, the cycle of the forecasting graphics and the calculation speed. A good set of values for these parameters is therefore important. Splitting the training data file and doing cross validation can help improve the correctness of the regression. For example, split the training data into 3 sets: train the model with files 1 and 2 first, then predict file 3 to get the accuracy; then train with files 2 and 3 and predict file 1; finally train with files 1 and 3 and predict file 2. Several such parameter-trial processes are built into ε-SVR SP, and the resulting parameter values are used in ε-SVR SP's prediction calculations.
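The three-way rotation described above can be sketched as a small grid search with 3-fold cross validation. The snippet below is an assumed illustration in Python (the paper's system is implemented in Java); the grid values, the use of scikit-learn and the synthetic stand-in data are not from the paper.

```python
# Hedged sketch of Step 4's parameter selection: a grid over (epsilon, C, gamma) is
# scored by the 3-fold rotation described in the text. All values are assumptions.
import numpy as np
from itertools import product
from sklearn.svm import SVR
from sklearn.model_selection import KFold

def select_parameters(X, y):
    grid = product([0.001, 0.01, 0.1],      # epsilon (tiny error)
                   [1.0, 10.0, 100.0],      # C (penalty weight)
                   [0.01, 0.1, 1.0])        # gamma, standing in for g
    best = None
    for eps, C, gamma in grid:
        errors = []
        for train_idx, test_idx in KFold(n_splits=3).split(X):
            model = SVR(kernel="rbf", epsilon=eps, C=C, gamma=gamma)
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            errors.append(np.mean(np.abs(pred - y[test_idx]) / np.abs(y[test_idx])))
        score = np.mean(errors)              # average relative error over the 3 folds
        if best is None or score < best[0]:
            best = (score, eps, C, gamma)
    return best

# Usage with synthetic stand-in data (the real training file is not reproduced here).
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = X.sum(axis=1) + 0.05 * rng.standard_normal(60)
print(select_parameters(X, y))
```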

Step 5  Obtaining the forecasting value from ε-SVR SP.

5. Details about the experiments

As shown in Table 2, the data and information adopted are supplied and presented according to the Chinese geographical division areas. Hence, our simulated experiments output the prediction value for the different division areas and for the whole China Mainland. The radial basis function K(x_i, x_j) = exp(−γ|x_i − x_j|^2) is selected as the kernel function. The ε-SVR, programmed in Java, is run on JBuilder 2005 with a 1.0 GHz CPU and 512 MB of memory. In addition, a neural network is processed to compare the accuracy and efficiency.

Several practical data set tests are carried out to evaluate the ε-SVR's correctness. As can be seen from Table 3, during the practical data set tests the training files consisted of the samples from 1995 to the year prior to the objective year, and the testing files are made of the samples from 1995 to the year whose tobacco sale is to be predicted. The actual tobacco sales for China Mainland were used to compare with the ε-SVR's outputs. It can be seen that the optimal solution percentage of ε-SVR is mostly larger than 90%.
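As a hedged illustration of this setup (not the authors' Java program), the sketch below trains an RBF-kernel ε-SVR with the parameter values quoted in Fig. 1 (ε = 0.01, C = 100; scikit-learn's gamma stands in for g) on the China Mainland series from Table 1, using simple lag features that are our own assumption, and predicts the last year.

```python
# Assumed reproduction sketch of the experimental setup: RBF-kernel epsilon-SVR with
# the Fig. 1 parameters, trained on earlier years, predicting the next year's sale.
# Lag-feature construction and gamma value are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR

# Yearly China Mainland sales from Table 1 (1995-2004), scaled to [0, 1].
sales = np.array([2859, 2768, 2826, 2715, 2889, 3068, 3267, 3367, 3514, 3733], dtype=float)
scaled = (sales - sales.min()) / (sales.max() - sales.min())

# Predict year t from the two preceding years (a simplifying assumption).
X = np.column_stack([scaled[:-2], scaled[1:-1]])
y = scaled[2:]

model = SVR(kernel="rbf", epsilon=0.01, C=100.0, gamma=1.0).fit(X[:-1], y[:-1])
pred_scaled = model.predict(X[-1:])[0]
pred = pred_scaled * (sales.max() - sales.min()) + sales.min()
print(f"predicted 2004 sale: {pred:.0f} (actual {sales[-1]:.0f})")
```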

Fig. 1  A sale prediction curve (ε = 0.01, g = 100 and C = 100)

Table 3  Practical data set tests

training set   testing set   actual sale   ε-SVR   CPU/s   Ave. error
220*5          150*6         3068          3129    2.91    7.01%
500*6          500*6         3267          3390    3.21    6.23%
500*7          500*7         3367          3411    2.99    5.99%
500*8          500*8         3514          3622    3.2     5.98%
900*9          900*9         3733          3800    3.1     3.21%

A NN embedded with the BP learning method is used for comparing the efficiency characteristics with ε-SVR, especially in the training process. We used training data sets of different sizes, considered to be small, medium and large. The training speed, the memory storage requirements and the output results are addressed, as shown in Table 4. It can be noticed that the speed and the memory cost of ε-SVR remain extremely steady for problems of different data sizes, but the CPU time for the NN is proportional to the data size and is much larger for the larger data set experiments; meanwhile, the NN's training process costs much more memory than ε-SVR.

Table 4  Comparison between ε-SVR and NN

data set   algorithm   CPU/s   Ave. error
156*10     ε-SVR       3       11.11%
           NN          100     6.21%
309*15     ε-SVR       3.99    10.76%
           NN          100     6.23%
1000*18    ε-SVR       3.99    7.47%
           NN          150     13.98%
4211*11    ε-SVR       5.91    5.99%
           NN          160     19.67%


6. Discussion, conclusions and outlook

This article proposes an ε-SVR to predict Chinese tobacco sales for a period of time in the future, and a graphical visualization of the sale trend is also presented. The algorithms take historical and current context data into account and present the results according to the user's specification of accuracy and time cycle. Simulated experiments and practical data set tests were carried out to evaluate the correctness, and a neural network was also introduced for comparison. It can be seen that the ε-SVR outperforms the NN in efficiency, using less CPU time and memory. Our future work will concentrate on improvements in training, learning and prediction, including automatic scaling of training and testing data, kernel function selection, and optimization of learning.

References

[1] Frank C, Garg A, Raheja A. Forecasting women's apparel sales using mathematical modeling. International Journal of Clothing Science and Technology, 2003, 15(2): 107–125.
[2] Lee Y J, Hsieh W F, Huang C M. ε-SSVR: a smooth support vector machine for ε-insensitive regression. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(5): 678–687.
[3] Lin C, Hsu P. Forecast of non-alcoholic beverage sales in Taiwan using the Grey theory. Asia Pacific Journal of Marketing and Logistics, 2002, 14(4): 3–12.
[4] Mentzer J T, Kent J L. Forecasting demand in the Longaberger company. Marketing Management, 1999, 8(2): 46–51.
[5] Smola A, Scholkopf B. On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica, 1998, 22: 211–231.
[6] Scholkopf B, Smola A, Williamson R, et al. Shrinking the tube: a new support vector regression algorithm. Advances in Neural Information Processing Systems. MIT Press, 1999.

Yi Yang received the B.S. from the Electrical Engineering Department of Fudan University in 1989, and the M.S. and Ph.D. from the Information Science School of Northeastern University in 1999 and 2002 respectively. She is currently an associate professor in the School of Information Science & Technology of Sun Yat-sen University. Her current research interests include complex system modeling and optimization, scheduling, intelligent algorithms and artificial life computing methods. She is a senior member of CCF and has published more than 50 papers in the fields of computer intelligence based decision support. E-mail: [email protected]