Journal of Systems Engineering and Electronics Vol. 18, No. 4, 2007, pp.769–773
SVR mathematical model and methods for sale prediction

Yi Yang¹, Rong Fuli¹, Chang Huiyou² & Xiao Zhijiao¹
1. Computer Science Dept., Sun Yat-Sen Univ., Guangzhou 510275, P. R. China;
2. Software School, Sun Yat-Sen Univ., Guangzhou 510275, P. R. China
(Received June 13, 2006)
Abstract: Sale prediction plays a significant role in business management. Using support vector regression (ε-SVR), a method for sale prediction is illustrated. It takes historical data and current context data as inputs and presents results, i.e., the future sales tendency and the forecast sales, according to the user's specification of accuracy and time cycles. Experiments on practical data and comparative tests with other algorithms show the advantages of the proposed approach in computation time and correctness.
Keywords: regression, support vector regression, sale prediction.
1. Introduction

With the intensifying competition in Chinese and international markets, research on decision making and on forecasting the changing tendency of sales is becoming more and more important. Inquiries from China Tobacco indicate that forecasting sales is essential for the annual marketing plans of most Chinese tobacco enterprises. Predicting sales accurately and rapidly is thus a very important issue.
2. Related works

Over recent years, several methods have been applied to predict sales and demand in practical industry[1-4]; ARIMA, regression, time series, and grey theory are the most frequently used, and all of them need a great amount of training data. Unfortunately, the existing data are usually insufficient, and a rather long training time is to be expected with the above approaches. Forecasting the trend of tobacco sales is considered complicated, not only because many factors within the enterprises and in the outer circumstances may influence it, but also because there are no fixed rules governing its operation. We want to propose intelligent recommendations for decision making to tackle the complexity of sale prediction for Chinese tobacco enterprises. In our case, learning and predicting methods based on ε-SVR for intelligent decision support are provided; then, numerical and graphical visualization results describing the tobacco sale prediction are illustrated. Furthermore, the efficiency of ε-SVR is analyzed by comparison with a neural network.
3. ε-SVR based regression mathematical model

Smola and Scholkopf[5] discussed a very general class of loss functions, and the use of ridge regression in feature space was considered by Scholkopf et al[6]. Lee et al[2] proposed a smooth support vector machine regression. In regression problems, a training data set S = {(x_1, y_1), ..., (x_m, y_m)} ⊂ R^n × R is given, which consists of m points in n+1 dimensions; the points x_i ∈ R^n are collected as the rows of a matrix A ∈ R^{m×n} (so A^T ∈ R^{n×m}), and y_i ∈ R is called the observation. The main goal of regression problems is to estimate a function f(x) that can predict the observation value, y, of a new input data point, x, by learning from the given training data set, S. Learning from a given training data set means finding a linear or nonlinear surface that tolerates a small
* This project was supported by the National Natural Science Foundation of China (60573159); the Natural Science Foundation of Guangdong Province (05200302).
error in fitting this training data set. Disregarding tiny errors that fall within some tolerance, say ε, may lead to better generalization ability. The slack variable ξ measures the cost of the errors on the training points; it is zero for all points inside the ε-band. In SVM, such a regression algorithm is called ε-support vector regression (ε-SVR). For many reasonable choices of loss function, the solution is characterized as the minimum of a convex function. The idea of representing the solution by means of a small subset of training points has enormous computational advantages. Using the ε-insensitive loss function retains these advantages, while still ensuring the existence of a global minimum and the optimization of a reliable generalization bound. To start with, a linear function is defined as follows

$$y = f(x) = Aw + b \qquad (1)$$
The mathematical regression model of linear functions is (LR ε-SVR)

$$L(w, b) = \sum_{i=1}^{m} (y_i - A_i w - b)^2 \qquad (2)$$

$$\text{s.t.}\quad |Aw + b - y| \leq \varepsilon + \xi, \quad (w, b) \in R^{n+1},\ \xi \in R^{m}$$

Applying ε-SVR, the regression function f(x) is constructed to fit the training data set. The least squares approach prescribes choosing the parameters (w, b) to minimize the sum of the squared deviations of the data. The problem can be formulated as a constrained minimization problem as follows

$$\min \ \frac{1}{2} w^{T} w + C \sum_{i=1}^{m} |\xi_i| \qquad (3)$$

$$\text{s.t.}\quad |Aw + b - y| \leq \varepsilon + \xi, \quad (w, b) \in R^{n+1},\ \xi \in R^{m} \qquad (4)$$
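To make the role of the slack variables concrete, the short Python sketch below (an illustrative stand-in; the paper's own implementation is in Java) evaluates the ε-insensitive loss underlying formulation (3): residuals inside the ε-band cost nothing, and only the excess beyond ε is penalized.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: residuals inside the eps-band cost nothing."""
    residual = np.abs(y_true - y_pred) - eps
    return np.maximum(residual, 0.0)

# Points within +/- eps of the prediction contribute zero slack (xi = 0).
y_true = np.array([1.00, 1.20, 1.50])
y_pred = np.array([1.05, 1.45, 1.50])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))  # [0.   0.15 0.  ]
```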
where the positive constant C is the penalty weight for errors. Formulation (3), which minimizes the ε-insensitive loss function, is a convex quadratic program and can be solved by the Lagrange multiplier method. Introducing Lagrange multipliers α_i, the Lagrange function is described as follows

$$L(w, b, \xi, \alpha) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{m} |\xi_i| - \sum_{i=1}^{m} \alpha_i \{ y_i (w^{T} A_i + b) - 1 + \xi_i \} \qquad (5)$$

According to the Karush-Kuhn-Tucker theorem in optimization theory, the solution of the optimization is given by the saddle point of the Lagrange function. The training points with nonzero Lagrange coefficients α_i are called support vectors (a point is a boundary support vector when α_i = C), and the coefficients can be found by solving the following convex quadratic programming problem

$$\max_{\alpha} \left\{ -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} y_i y_j (A_i^{T} A_j) \alpha_i \alpha_j + \sum_{i=1}^{m} \alpha_i \right\} \qquad (6)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, \cdots, m \qquad (7)$$

$$w = \sum_{i=1}^{m} y_i \alpha_i A_i \qquad (8)$$
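Equation (8) expresses w as a combination of the support vectors weighted by their dual coefficients. As a hedged numerical check, the sketch below uses scikit-learn's SVR rather than the authors' solver; its signed dual coefficients play the role of the α_i y_i weights, and the primal weight vector of a linear-kernel model is recovered exactly from the support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3 + 0.05 * rng.normal(size=40)

model = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(X, y)

# Eq. (8): w is a linear combination of the support vectors,
# weighted by the signed dual coefficients.
w_from_dual = model.dual_coef_ @ model.support_vectors_
print(np.allclose(w_from_dual, model.coef_))  # True
```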
In practical applications, a nonlinear surface is much more popular. In order to generalize the result from the linear case to the nonlinear case, the kernel technique that has been used extensively in kernel-based learning algorithms is employed. Applying the duality theorem of convex minimization, w can be represented by A^T u for some u ∈ R^m; formulation (1) then becomes

$$y = f(x) = AA^{T} u + b \qquad (9)$$

Replacing AA^T by a kernel matrix K(A, A^T), formulation (9) becomes

$$f(x) = K(x^{T}, A^{T}) u + b = \sum_{i=1}^{m} u_i K(A_i, x) + b \qquad (10)$$

So the nonlinear regression function may be regarded as a linear combination of a dictionary of atom functions, with the coefficients u_i and b determined by solving formulation (11). The mathematical regression model of the nonlinear function is (NLR ε-SVR)

$$\min_{(u, b) \in R^{m+1}} \ \frac{1}{2} (u^{T} u + b^{2}) + C \sum_{i=1}^{m} |K(A_i, A^{T}) u + b - y_i| \qquad (11)$$

$$\text{s.t.}\quad |K(A, A^{T}) u + b - y| \leq \varepsilon + \xi, \quad (u, b) \in R^{m+1},\ \xi \in R^{m} \qquad (12)$$
where K(x^T, A^T) is a kernel map from R^{1×n} × R^{n×m} to R^{1×m}. Through the kernel technique, we compute the values of the kernel function over the sample set instead of working in the high-dimensional feature space; thus the curse of dimensionality can be avoided.
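Formulation (10) says the learned function is evaluated purely through kernel values against the training points. The sketch below (again using scikit-learn as a stand-in for the paper's Java code) rebuilds SVR predictions by hand from the support vectors, dual coefficients, and an RBF kernel, matching the expansion in Eq. (10).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
A = rng.uniform(size=(60, 2))            # training points (rows of A)
y = np.sin(4 * A[:, 0]) + A[:, 1] ** 2

gamma = 1.0
model = SVR(kernel="rbf", gamma=gamma, C=10.0, epsilon=0.05).fit(A, y)

# Eq. (10): f(x) = sum_i u_i K(A_i, x) + b, summed over support vectors only.
x_new = rng.uniform(size=(5, 2))
K = rbf_kernel(x_new, model.support_vectors_, gamma=gamma)
f_manual = K @ model.dual_coef_.ravel() + model.intercept_
print(np.allclose(f_manual, model.predict(x_new)))  # True
```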
4. ε-SVR steps for tobacco sale prediction
The regression based on ε-SVR for predicting sales, named ε-SVR SP, is described in the following, with tobacco sale prediction used as an example.

Step 1  Choosing samples and numerically processing the qualitative parameters. Tobacco sale related factors and data from 1995 to 2004 are provided by China Tobacco and other investigations; part of them are shown in Table 1. It can be noticed that some of the influencing factors are numerical and can be used directly, such as the actual sale numbers and profits. However, some factors are qualitative, for example, national and international policies on tobacco trading and the market-demand situation. In our experiments, we set up five different levels for the qualitative factors: strongly support, support, currently, disadvantage, and against.
Table 1  The actual sale

region                  1995   1996   1997   1998   1999   2000   2001   2002   2003   2004
China Mainland         2 859  2 768  2 826  2 715  2 889  3 068  3 267  3 367  3 514  3 733
North of China           350    299    314    314    349    376    422    434    456    484
North East of China      235    237    252    224    233    254    280    291    304    329
East of China            848    825    866    855    890    918    970  1 015  1 057  1 122
Middle South of China    760    762    769    726    773    870    914    934    977  1 044
South West of China      458    441    426    392    441    449    466    475    480    495
North West of China      205    201    197    202    202    199    214    218    240    253

Step 2  Data processing of the training and testing sample data in ε-SVR SP. Data processing is necessary before the regression calculation. 22 classes of samples are identified in this application, as described in Table 2. Training samples (training data files) are made from historical data, and current context data constitute the testing (predicting) files.
Table 2  Samples in ε-SVR SP

sample type   description                                 sample #
y             yearly sale                                 10
x1            citizen's salary standard                   10
x2            enterprise's yearly profits                 10
x3            the stocks                                  10
x4            the tobacco's price                         10
x5-x15        different kinds of raw materials' prices    ≈11*10
x16           national policies on trading                5
x17           international policies on trading           5
x18           national market demands                     5
x19           international market demands                5
x20           weather                                     10
x21           tobacco consuming habits                    10

Sample initialization: originally numerical samples are scaled to [0, 1]; originally qualitative samples are numerically processed as needed.
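As a sketch of the data processing in Steps 1 and 2, the snippet below scales an originally numerical column to [0, 1] and encodes the five qualitative levels numerically. The particular numeric values chosen for the levels are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

# Hypothetical encoding of the five qualitative levels from Step 1;
# the paper does not give the exact numerical values, so an evenly
# spaced mapping on [0, 1] is assumed here.
QUALITATIVE_LEVELS = {
    "strongly support": 1.00,
    "support": 0.75,
    "currently": 0.50,   # i.e. neutral / status quo
    "disadvantage": 0.25,
    "against": 0.00,
}

def min_max_scale(values):
    """Scale an originally numerical sample column to [0, 1] (Table 2)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

# Yearly sales for China Mainland, 1995-2004 (Table 1).
sales = [2859, 2768, 2826, 2715, 2889, 3068, 3267, 3367, 3514, 3733]
print(min_max_scale(sales).round(3))
print(QUALITATIVE_LEVELS["support"])
```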
Step 3  Data structure of the sample files in ε-SVR SP. There are three sample data files in ε-SVR SP: the training data file, the testing (predicting) data file, and the output data file.

Step 4  Training and predicting calculation in ε-SVR SP. The algorithm is constructed according to the regression mathematical models (LR ε-SVR and NLR ε-SVR) described in Section 3. From the methodology described, it can be noticed that there are some parameters in the algorithm. In the ε-SVR calculation, ε (the tolerated error), the cost C (the penalty weight), and the cycle condition g (corresponding to the dimension n) affect the accuracy of prediction, the cycle of the forecasting graphics, and the calculation speed. So a well-chosen set of arguments for these parameters is essential. Splitting the training data file and performing cross validation can help improve the correctness of the regression. For example, split the training data into 3 sets: first train the model with files 1 and 2, then predict file 3 to obtain the accuracy; then train with files 2 and 3 and predict file 1; finally, train with files 1 and 3 and predict file 2. Several such trial-argument processes are built into ε-SVR SP, and the resulting best arguments are used in ε-SVR SP's prediction calculations.
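The rotating three-way split described in Step 4 is ordinary 3-fold cross validation. A minimal sketch, assuming an RBF-kernel SVR from scikit-learn and an illustrative parameter grid (the built-in trial process of ε-SVR SP is not public):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

def three_fold_error(X, y, eps, C, gamma):
    """Rotate the three splits exactly as in Step 4: train on two
    files, predict the third, and average the errors."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=3).split(X):
        model = SVR(kernel="rbf", epsilon=eps, C=C, gamma=gamma)
        model.fit(X[train_idx], y[train_idx])
        errors.append(mean_absolute_error(y[test_idx],
                                          model.predict(X[test_idx])))
    return np.mean(errors)

# Try a small grid of "arguments" and keep the best, as the built-in
# trials in e-SVR SP do (the grid values here are illustrative only).
rng = np.random.default_rng(2)
X = rng.uniform(size=(90, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=90)
best = min(
    ((eps, C, g) for eps in (0.01, 0.1) for C in (10, 100) for g in (0.1, 1.0)),
    key=lambda p: three_fold_error(X, y, *p),
)
print("best (eps, C, gamma):", best)
```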
Step 5  Obtaining the forecasting value from ε-SVR SP.
5. Details about the experiments

As shown in Table 2, the data and information adopted are supplied and presented according to the Chinese geographic division areas. Hence, our simulated experiments output the prediction value for the different division areas and for the whole China Mainland. The radial basis function K(x_i, x_j) = exp(-γ|x_i - x_j|²) is selected as the kernel function. The ε-SVR, programmed in Java, is run on JBuilder 2005 with a 1.0 GHz CPU and 512 MB of memory. In addition, a neural network is processed to compare the accuracy and efficiency.

Several practical data set tests are carried out to evaluate the correctness of ε-SVR. As can be seen from Table 3, during the practical data set tests, the training files consist of the samples from 1995 to the year one year prior to the objective year, and the testing files are made of the samples from 1995 to the year whose tobacco sale is to be predicted. The actual tobacco sales for China Mainland were used for comparison with the ε-SVR's outputs. It can be seen that the optimal solution percentage of ε-SVR is mostly larger than 90%.
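To give a flavor of the setup, the sketch below fits an RBF-kernel SVR with the Fig. 1 parameters (ε = 0.01, C = 100) on the China Mainland series from Table 1. Because the full 22-feature sample files are not reproduced in the paper, the two preceding years' sales stand in as features; this is an assumption for illustration, not the authors' feature set, and the paper's own implementation is in Java rather than Python.

```python
import numpy as np
from sklearn.svm import SVR

# Yearly China Mainland sales, 1995-2004 (Table 1), scaled to [0, 1]
# as prescribed by Step 2.
sales = np.array([2859, 2768, 2826, 2715, 2889, 3068, 3267, 3367, 3514, 3733],
                 dtype=float)
lo, hi = sales.min(), sales.max()
s = (sales - lo) / (hi - lo)

# Minimal stand-in for the real sample files: predict each year's sale
# from the two preceding years (the 22 features of Table 2 are omitted).
X = np.column_stack([s[:-2], s[1:-1]])
y = s[2:]

# Kernel and parameters as in Fig. 1: RBF, eps = 0.01, C = 100.
model = SVR(kernel="rbf", C=100.0, epsilon=0.01).fit(X, y)
pred_2005 = model.predict([[s[-2], s[-1]]])[0] * (hi - lo) + lo
print("one-step forecast for 2005:", round(pred_2005))
```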
Fig. 1  A sale prediction curve (ε = 0.01, g = 100 and C = 100)

Table 3  Practical data set tests

training set   testing set   actual sale   ε-SVR   CPU(s)   Ave. error
220*5          150*6         3068          3129    2.91     7.01%
500*6          500*6         3267          3390    3.21     6.23%
500*7          500*7         3367          3411    2.99     5.99%
500*8          500*8         3514          3622    3.2      5.98%
900*9          900*9         3733          3800    3.1      3.21%
An NN with embedded BP learning is used to compare efficiency characteristics with ε-SVR, especially in the training process. We used training data sets of different sizes, considered small, medium, and large. The training speed, the memory storage requirements, and the output results are addressed, as shown in Table 4. It can be noticed that the speed and the memory cost of ε-SVR remain remarkably steady across problems of different data sizes, whereas the CPU time of the NN grows in proportion to the data size and is much larger in the bigger data set experiments; meanwhile, the NN's training process costs much more memory than ε-SVR.

Table 4  Comparison between ε-SVR and NN

data set   algorithm   CPU/s   Ave. error
156*10     ε-SVR       3       11.11%
           NN          100     6.21%
309*15     ε-SVR       3.99    10.76%
           NN          100     6.23%
1000*18    ε-SVR       3.99    7.47%
           NN          150     13.98%
4211*11    ε-SVR       5.91    5.99%
           NN          160     19.67%
6. Discussion, conclusions and outlook

This article proposes an ε-SVR method to predict Chinese tobacco sales for a period of time in the future, and a graphical visualization of the sales trend is also presented. The algorithms take historical and current context data into account and present the results in light of the user's specification of the degree of accuracy and the time cycle. Simulated experiments and practical data set tests were run to evaluate the correctness. At the same time, a neural network was introduced for comparison. It can be seen that ε-SVR outperforms the NN in efficiency by using less CPU time and memory. Our future work will concentrate on improvements in training, learning and prediction, including automatic scaling of the training and testing data, kernel function selection, and optimization in learning.
References

[1] Frank C, Garg A, Raheja A. Forecasting women's apparel sales using mathematical modeling. International Journal of Clothing Science and Technology, 2003, 15(2): 107-125.
[2] Lee Y J, Hsieh W F, Huang C M. ε-SSVR: A smooth support vector machine for ε-insensitive regression. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(5): 678-687.
[3] Lin C, Hsu P. Forecast of non-alcoholic beverage sales in Taiwan using the Grey theory. Asia Pacific Journal of Marketing and Logistics, 2002, 14(4): 3-12.
[4] Mentzer J T, Kent J L. Forecasting demand in the Longaberger company. Marketing Management, 1999, 8(2): 46-51.
[5] Smola A, Scholkopf B. On a kernel-based method of pattern recognition, regression, approximation and operator inversion. Algorithmica, 1998, 22: 211-231.
[6] Scholkopf B, Smola A, Williamson R, et al. Shrinking the tube: a new support vector regression algorithm. Advances in Neural Information Processing Systems, MIT Press, 1999.

Yi Yang received the B.S. degree from the Electrical Engineering Department of Fudan University in 1989, and the M.S. and Ph.D. degrees from the Information Science School of Northeastern University in 1999 and 2002, respectively. She is currently an associate professor in the School of Information Science & Technology of Sun Yat-sen University. Her current research interests include complex system modeling and optimization, scheduling, intelligent algorithms, and artificial life computing methods. She is a senior member of the CCF, and has published more than 50 papers in the fields of computer-intelligence-based decision and support. E-mail: [email protected]