Computers & Industrial Engineering 53 (2007) 491–498 www.elsevier.com/locate/dsw
Inverse forecasting: A new approach for predictive modeling

Gholam R. Amin a,*, Ali Emrouznejad b

a Department of Computer Sciences, Postgraduate Engineering Center, Islamic Azad University of South Tehran Branch, Tehran, Iran
b Operations & Information Management Group, Aston Business School, Aston University, Birmingham B4 7ET, UK

Received 31 May 2006; received in revised form 26 April 2007; accepted 18 May 2007
Available online 25 May 2007
Abstract

In the last two decades there have been substantial developments in the mathematical theory of inverse optimization problems, and their applications have expanded greatly. In parallel, time series analysis and forecasting have become increasingly important in various fields of research such as data mining, economics, business, engineering, medicine, and politics, among many others. Despite the wide use of linear programming in forecasting models, not a single application of inverse optimization has been reported in the forecasting literature for the case where time series data are available. The goal of this paper is therefore to introduce inverse optimization into the forecasting field and to provide a streamlined approach to time series analysis and forecasting using inverse linear programming. An application is used to demonstrate the inverse forecasting approach developed in this study.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Forecasting; Inverse linear programming; Predictive modeling; Inverse optimization
1. Introduction

In recent years, inverse optimization problems have attracted many operations research specialists, and different kinds of inverse problems have been developed (Ahuja & Orlin, 2001). At the same time, time-series modeling and forecasting continues to be an important area in both academic research and practical application. For example, Inniss (2006) developed a seasonal clustering technique for determining clusters of time series data; the model was applied to weather and aviation data to determine probabilistic distributions of arrival capacity scenarios, which can be used for seasonal forecasting and planning. As a practical application of time-series modeling, Bermúdez, Segura, and Vercher (2006) suggested a nonlinear multi-objective problem for forecasting time series based on soft computing.
In time series analysis, historical observations of the item to be predicted are generally collected and analyzed in order to specify a model that captures the underlying data-generating process, and the model is then used to predict the future. Depending on the theory or assumptions about the relationships in the data, there are two different
* Corresponding author. Tel.: +98 21 66946032.
E-mail address: [email protected] (G.R. Amin).

0360-8352/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cie.2007.05.007
approaches widely used in time series forecasting (Box, Jenkins, & Reinsel, 1994). Traditional approaches such as time-series regression, exponential smoothing, and the autoregressive integrated moving average (ARIMA) model are based on linear models; that is, they assume the future value of a time series is linearly related to past observations (Box et al., 1994). The ARIMA model is representative of linear models and has achieved great popularity since the publication of Box, Jenkins, and Reinsel's classic book (Box et al., 1994). Recently, Mohammadi, Eslami, and Kahawita (2006) introduced an autoregressive moving average (ARMA) model for river flow forecasting using a goal programming methodology for parameter estimation. Two main disadvantages of goal programming are the mathematical expression of goals and constraints and the need to optimize all goals simultaneously (Arikan & Gungor, 2001). Traditionally, linear statistical methods have been widely used in many real-world forecasting situations (Mohammadi et al., 2006), since linear models are easy to develop and implement; they are also simple to understand and interpret (Bazaraa, Jarvis, & Sherali, 2005). More recently, Taylor (2007) forecasted daily supermarket sales using an exponentially weighted quantile regression time-series model.
On the other hand, in the field of operational research many applications have been reported for the inverse linear programming problem, which was first introduced by Zhang and Liu (1996) and further developed by Huang and Liu (1999). The first application of inverse linear programming was the inverse shortest path problem studied by Burton and Toint (1992a, 1992b). Other applications include the inverse shortest arborescence problem (Hu & Liu, 1998), inverse maximum capacity problems (Yang & Zhang, 1998), and the inverse maximum flow and minimum cut problems (Burkard, Klinz, & Zhang, 2001; Yang, Zhang, & Ma, 1997). Many more applications of inverse linear programming are surveyed by Huang and Liu (1999). Despite the wide use of linear programming in forecasting models, no application of inverse linear programming has been reported in the forecasting literature for the case where time series data are available. The purpose of this study is therefore to introduce an inverse linear programming approach to linear time-series modeling and to use the developed model for forecasting.
The rest of this paper is organized as follows: the next section defines the inverse linear programming problem. Section 3 proposes a new method to determine the forecasting parameters using inverse linear programming. Section 4 uses real data to illustrate the use of the new method. Section 5 concludes the paper.
2. Inverse linear programming

Zhang and Liu (1996) formulated the inverse linear programming problem (ILPP) as a new linear program and showed how the solution of this new problem (which is similar to the original problem and its associated dual solutions) can be used to solve the inverse problem. Their approach can be sketched as follows. Let $S$ denote the set of feasible solutions of a linear programming problem, say $P$. Assume that the specified cost vector is $c$ and that $x^0$ is a given feasible solution. The inverse linear programming problem is to perturb the cost vector $c$ to $d$ so that $x^0$ becomes an optimal solution of $P$ with respect to $d$ and $\|d - c\|_p$ is minimized, where $\|\cdot\|_p$ is some selected $L_p$ norm. Consider the following linear program:
\[
\begin{aligned}
\min\ & \sum_{j=1}^{n} c_j x_j \\
\text{s.t.}\ & \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \dots, m \\
& x_j \ge 0, \quad j = 1, \dots, n
\end{aligned}
\tag{1}
\]
Suppose that $x^0$ is a feasible solution of (1); the corresponding inverse problem under the $L_1$ norm is as follows (Ahuja & Orlin, 2001; Zhang & Liu, 1996).
\[
\begin{aligned}
\min\ & \sum_{j=1}^{n} (\alpha_j + \beta_j) \\
\text{s.t.}\ & \sum_{i=1}^{m} a_{ij} p_i - \alpha_j + \beta_j + \gamma_j = c_j, \quad \forall j \in L \\
& \sum_{i=1}^{m} a_{ij} p_i - \alpha_j + \beta_j = c_j, \quad \forall j \in F \\
& \alpha_j \ge 0,\ \beta_j \ge 0, \quad j = 1, \dots, n; \qquad \gamma_j \ge 0, \quad \forall j \in L; \qquad p_i \text{ free}, \quad i = 1, \dots, m
\end{aligned}
\tag{2}
\]
where $L = \{\, j : x^0_j = 0 \,\}$ and $F = \{\, j : x^0_j > 0 \,\}$. To illustrate how the ILPP is formulated, consider the following small linear programming model:
\[
\begin{aligned}
\min\ & 2x_1 - 3x_2 + 0x_3 + 0x_4 \\
\text{s.t.}\ & x_1 - 2x_2 + x_3 = 2 \\
& -x_1 + 3x_2 + x_4 = 3 \\
& x_i \ge 0, \quad i = 1, 2, 3, 4
\end{aligned}
\]
Obviously $x^0 = (x^0_1, x^0_2, x^0_3, x^0_4) = (0, 0, 2, 3)$ is a feasible solution of the above model. Hence the corresponding ILPP, model (2), can be written as follows:
\[
\begin{aligned}
\min\ & \alpha_1 + \beta_1 + \alpha_2 + \beta_2 + \alpha_3 + \beta_3 + \alpha_4 + \beta_4 \\
\text{s.t.}\ & p_1 - p_2 - \alpha_1 + \beta_1 + \gamma_1 = 2 \\
& -2p_1 + 3p_2 - \alpha_2 + \beta_2 + \gamma_2 = -3 \\
& p_1 - \alpha_3 + \beta_3 = 0 \\
& p_2 - \alpha_4 + \beta_4 = 0 \\
& \alpha_j \ge 0,\ \beta_j \ge 0, \quad j = 1, 2, 3, 4; \qquad \gamma_1 \ge 0,\ \gamma_2 \ge 0; \qquad p_1, p_2 \text{ free}
\end{aligned}
\]
An important property of the above pair of LPs is that $x^0 = (0, 0, 2, 3)$ is an optimal solution of the original model if and only if the optimal value of the corresponding ILPP is zero. Next we use the ILPP as a basis for developing a model for forecasting applications.
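As a quick numerical illustration (not part of the original paper), the small ILPP above can be handed to any off-the-shelf LP solver. The sketch below uses SciPy's `linprog` and assumes the variable ordering $(p_1, p_2, \alpha_1, \dots, \alpha_4, \beta_1, \dots, \beta_4, \gamma_1, \gamma_2)$.

```python
# Solve the small ILPP above with SciPy (illustrative sketch only).
# Variable order: [p1, p2, a1..a4, b1..b4, g1, g2] -> 12 variables.
import numpy as np
from scipy.optimize import linprog

c = np.r_[np.zeros(2), np.ones(8), np.zeros(2)]   # minimize sum(alpha_j + beta_j)

A_eq = np.zeros((4, 12))
b_eq = np.array([2.0, -3.0, 0.0, 0.0])
A_eq[0, [0, 1, 2, 6, 10]] = [1, -1, -1, 1, 1]     # j = 1 (in L):  p1 -  p2 - a1 + b1 + g1 =  2
A_eq[1, [0, 1, 3, 7, 11]] = [-2, 3, -1, 1, 1]     # j = 2 (in L): -2p1 + 3p2 - a2 + b2 + g2 = -3
A_eq[2, [0, 4, 8]] = [1, -1, 1]                   # j = 3 (in F):  p1 - a3 + b3 = 0
A_eq[3, [1, 5, 9]] = [1, -1, 1]                   # j = 4 (in F):  p2 - a4 + b4 = 0

bounds = [(None, None)] * 2 + [(0, None)] * 10    # p free; alpha, beta, gamma >= 0
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print("ILPP optimal value:", res.fun)
```

For this instance the optimum is strictly positive, which is consistent with the equivalence above: $x^0 = (0, 0, 2, 3)$ is not optimal for the original LP, since the feasible point $x = (0, 1, 4, 0)$ attains the smaller objective value $-3$.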
3. Inverse forecasting approach

In forecasting models there is usually a set of parameters that must be estimated. Most time-series forecasting models apply the ordinary least squares (OLS) method to estimate the corresponding parameters (Box et al., 1994). In this section we use the inverse linear programming technique to develop an inverse-based methodology as an alternative way of determining the forecasting parameters. The proposed inverse-based model improves on the OLS-based parameter estimates and offers a computationally efficient procedure for estimating the forecasting parameters as accurately as possible. To set up the framework, suppose $(t, y_t)$, $t = 1, \dots, k$, denotes a set of time series observations and
\[
y_t = \sum_{j=1}^{m} a_j f_j(t) + e_t, \quad t = 1, \dots, k
\tag{3}
\]
indicates the selected forecasting model for predicting the future value of $y$, where $e_t$ is the random error, $f_j(t)$ is the $j$th function of time $t$ ($t = 1, \dots, k$), and the $a_j$ are the model parameters ($j = 1, \dots, m$). Note that Eq. (3) denotes a general linear time series forecasting model (Box et al., 1994). Using the OLS method, the parameters $a_j$ are estimated such that the following function is minimized:
\[
\phi(a_1, \dots, a_m) = \sum_{t=1}^{k} \Bigl( y_t - \sum_{j=1}^{m} a_j f_j(t) \Bigr)^2 = \sum_{t=1}^{k} e_t^2
\]
That is,
\[
\min_{a_1, \dots, a_m} \phi(a_1, \dots, a_m) = \phi(\hat{a}_1, \dots, \hat{a}_m)
\]
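For concreteness, the OLS step can be written in a few lines. The sketch below is illustrative only; the linear basis $f_1(t) = 1$, $f_2(t) = t$ and the short toy series are assumptions, not data from the paper.

```python
# OLS estimation of the parameters a_j in model (3) (illustrative sketch).
import numpy as np

def ols_fit(t, y, basis):
    """Minimize phi(a_1,...,a_m) = sum_t (y_t - sum_j a_j f_j(t))^2."""
    X = np.column_stack([f(t) for f in basis])    # k x m design matrix of f_j(t)
    a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a_hat

t = np.arange(1, 7, dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])    # toy series (assumed, not from the paper)
basis = [lambda t: np.ones_like(t), lambda t: t]  # f_1(t) = 1, f_2(t) = t
print(ols_fit(t, y, basis))                       # approximately (intercept, slope)
```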
On the other hand, using the $L_1$ norm, the parameters in model (3) can be estimated such that
\[
\min_{a_1, \dots, a_m} \sum_{t=1}^{k} |e_t| = \min_{a_1, \dots, a_m} \sum_{t=1}^{k} \Bigl| y_t - \sum_{j=1}^{m} a_j f_j(t) \Bigr|
\]
which is equivalent to the following linear program:
\[
\begin{aligned}
\min\ & \sum_{t=1}^{k} (p_t + q_t) \\
\text{s.t.}\ & p_t - q_t + \sum_{j=1}^{m} a_j f_j(t) = y_t, \quad t = 1, \dots, k \\
& p_t \ge 0,\ q_t \ge 0, \quad t = 1, \dots, k; \qquad a_j \text{ free}, \quad j = 1, \dots, m
\end{aligned}
\tag{4}
\]
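Since model (4) is just a least-absolute-deviations fit written as a linear program, it can be solved with a generic LP solver. The following sketch (an illustration using the same toy basis and series as the OLS sketch above, not the paper's data) builds the constraint matrix $[\,I_k,\ -I_k,\ B\,]$, where $B$ is the $k \times m$ matrix with entries $f_j(t)$, and solves it with SciPy.

```python
# Model (4): least-absolute-deviations fitting posed as an LP (illustrative sketch).
import numpy as np
from scipy.optimize import linprog

def l1_fit(t, y, basis):
    """min sum_t (p_t + q_t)  s.t.  p_t - q_t + sum_j a_j f_j(t) = y_t,  p, q >= 0, a free."""
    k, m = len(t), len(basis)
    B = np.column_stack([f(t) for f in basis])      # k x m matrix of f_j(t)
    c = np.r_[np.ones(2 * k), np.zeros(m)]          # variables: [p_1..p_k, q_1..q_k, a_1..a_m]
    A_eq = np.hstack([np.eye(k), -np.eye(k), B])
    bounds = [(0, None)] * (2 * k) + [(None, None)] * m
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[2 * k:], res.fun                   # fitted parameters a_j, total |error|

t = np.arange(1, 7, dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
a_hat, err = l1_fit(t, y, [lambda t: np.ones_like(t), lambda t: t])
print(a_hat, err)
```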
Notice that model (4) is always feasible. Assume that $x^0 = (p^0_1, q^0_1, \dots, p^0_k, q^0_k, a^0_1, \dots, a^0_m)$ is a feasible solution of (4). According to Section 2, the corresponding ILPP is
\[
\begin{aligned}
f^* = \min\ & \sum_{j=1}^{2k+m} (\alpha_j + \beta_j) \\
\text{s.t.}\ & \sum_{i=1}^{k} a_{ij} p_i - \alpha_j + \beta_j + \gamma_j = c_j, \quad \forall j \in L \\
& \sum_{i=1}^{k} a_{ij} p_i - \alpha_j + \beta_j = c_j, \quad \forall j \in F \\
& \alpha_j \ge 0,\ \beta_j \ge 0, \quad j = 1, \dots, 2k+m; \qquad \gamma_j \ge 0, \quad \forall j \in L; \qquad p_i \text{ free}, \quad i = 1, \dots, k
\end{aligned}
\tag{5}
\]
where $a_{ij}$ and $c_j$ are, respectively, the constraint and cost coefficients of model (4) (so $c_j = 1$ for the deviation variables $p_t, q_t$ and $c_j = 0$ for the parameters $a_j$), and $L$ and $F$ are defined from $x^0$ as in Section 2.
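To make the construction of (5) concrete, the sketch below (an illustration, not part of the paper) assembles the ILPP for model (4) and a given feasible $x^0$, classifying indices into $L$ and $F$ by the sign of $x^0_j$ as in Section 2 (here the parameter components of $x^0$ happen to be positive, so the classification is unambiguous). Passing in an optimal solution of (4) illustrates the equivalence stated in Section 2: the ILPP optimum is then (numerically) zero.

```python
# Build and solve the ILPP (5) for model (4) at a given feasible x0 (illustrative sketch).
import numpy as np
from scipy.optimize import linprog

def inverse_lp(A, c, x0, tol=1e-9):
    """ILPP for an LP  min c'x  s.t.  Ax = b, x >= 0,  evaluated at the solution x0."""
    k, n = A.shape
    L = np.where(np.abs(x0) <= tol)[0]                    # indices with x0_j = 0
    F = np.where(x0 > tol)[0]                             # indices with x0_j > 0
    # Variables: [p_1..p_k, alpha_1..alpha_n, beta_1..beta_n, gamma_j for j in L]
    obj = np.r_[np.zeros(k), np.ones(2 * n), np.zeros(len(L))]
    A_eq = np.zeros((n, k + 2 * n + len(L)))
    A_eq[:, :k] = A.T                                     # sum_i a_ij p_i
    A_eq[np.arange(n), k + np.arange(n)] = -1.0           # -alpha_j
    A_eq[np.arange(n), k + n + np.arange(n)] = 1.0        # +beta_j
    A_eq[L, k + 2 * n + np.arange(len(L))] = 1.0          # +gamma_j only for j in L
    bounds = [(None, None)] * k + [(0, None)] * (2 * n + len(L))
    res = linprog(obj, A_eq=A_eq, b_eq=c, bounds=bounds, method="highs")
    return res.fun

# Model (4) for the toy series used above: k = 6 observations, m = 2 parameters.
k, m = 6, 2
t = np.arange(1, 7, dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
B = np.column_stack([np.ones(k), t])                      # f_1(t) = 1, f_2(t) = t
A4 = np.hstack([np.eye(k), -np.eye(k), B])                # columns: p_t, q_t, a_j
c4 = np.r_[np.ones(2 * k), np.zeros(m)]

# Take x0 as an optimal solution of model (4) itself; the ILPP optimum is then zero.
res4 = linprog(c4, A_eq=A4, b_eq=y,
               bounds=[(0, None)] * (2 * k) + [(None, None)] * m, method="highs")
print(inverse_lp(A4, c4, res4.x))                         # approximately 0 (solver tolerance)
```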
The following theorems follow from model (5) and the original model (4).

Theorem 1. If $f^* = 0$ then the corresponding $x^0 = (p^0_1, q^0_1, \dots, p^0_k, q^0_k, a^0_1, \dots, a^0_m)$ solves model (4).

Proof. By the definition of the inverse problem, if $f^* = 0$ then $x^0$ is an optimal solution of model (4). □

The following theorem shows an important advantage of applying the inverse linear programming technique as an alternative method for forecasting compared with other methods in the literature (see, for example, Inniss, 2006; Mohammadi et al., 2006; Bermúdez et al., 2006; Taylor, 2007). The computational efficiency of the new model comes from the fact that no linear program needs to be solved, since an optimal solution can be obtained in closed form.
Theorem 2. An optimal solution of the inverse model (5) can be obtained without the need to solve any linear program.

Proof. Since $c_j \in \{0, 1\}$ for each $j = 1, \dots, 2k+m$, there are only the following two cases:

Case 1: $\bar{F} = \{\, j \in F : c_j = 1 \,\} = \emptyset$. An optimal solution of (5) is given by
\[
\alpha_j = \beta_j = 0 \ \text{ and } \ \gamma_j = c_j, \quad \forall j \in L; \qquad
\alpha_j = \beta_j = 0, \quad \forall j \in F; \qquad
p_i = 0, \quad i = 1, \dots, k
\]
Case 2: $\bar{F} = \{\, j \in F : c_j = 1 \,\} \ne \emptyset$. In this case the optimal value of (5) is zero if and only if the following system has a solution:
\[
\begin{aligned}
& \sum_{i=1}^{k} a_{ij} p_i + \gamma_j = c_j, \quad \forall j \in L \\
& \sum_{i=1}^{k} a_{ij} p_i = c_j, \quad \forall j \in F \\
& \gamma_j \ge 0, \quad \forall j \in L; \qquad p_i \text{ free}, \quad i = 1, \dots, k
\end{aligned}
\tag{6}
\]
Otherwise, if the set of solutions of system (6) is empty, an optimal solution of the inverse model (5) is obtained by
\[
\alpha_j = \beta_j = 0 \ \text{ and } \ \gamma_j = c_j, \quad \forall j \in L; \qquad
\alpha_j = 0 \ \text{ and } \ \beta_j = c_j, \quad \forall j \in F; \qquad
p_i = 0, \quad i = 1, \dots, k
\]
Hence the optimal value is $f^* = \sum_{j \in \bar{F}} \beta_j = |\bar{F}|$. □
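The bookkeeping in Theorem 2 is easy to automate. The sketch below (not from the paper) computes $L$, $F$, and $\bar{F}$ from a feasible solution of model (4) and returns $f^* = |\bar{F}|$, i.e. the value given by Case 1 (when $\bar{F} = \emptyset$) and by Case 2 under the paper's assumption that system (6) has no solution; indices are 0-based, whereas the paper's index sets are 1-based.

```python
# Closed-form evaluation of f* per Theorem 2 (illustrative sketch, 0-based indices).
import numpy as np

def inverse_objective(x0, c, tol=1e-9):
    """Return L, F, F_bar and f* = |F_bar| for model (4) data (c_j in {0, 1})."""
    x0, c = np.asarray(x0, float), np.asarray(c, float)
    L = np.where(np.abs(x0) <= tol)[0]              # indices with x0_j = 0
    F = np.where(x0 > tol)[0]                       # indices with x0_j > 0
    F_bar = [j for j in F if c[j] == 1]             # F_bar = {j in F : c_j = 1}
    return L, F, F_bar, len(F_bar)                  # f* = |F_bar| (0 in Case 1)

# The application of Section 4 (k = 6, m = 3): c has twelve 1's (p_t, q_t) and three 0's.
x0 = [0.085, 0, 0, 0.036, 0, 0.177, 0.026, 0, 0.09, 0, 0.013, 0, 8.155, 0.44, 1.77]
c = [1] * 12 + [0] * 3
L, F, F_bar, f_star = inverse_objective(x0, c)
print(f_star)   # 6, the value reported for this application in Section 4
```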
4. Application of inverse forecasting

In this section we use a real data set to illustrate the use of inverse linear programming as an alternative method for estimating forecasting parameters. We show how the inverse linear programming technique described above can be used to estimate the parameters of a time series forecasting model, and how it can be used alongside the well-known OLS estimation methodology in order to estimate the forecasting parameters more accurately.
The data for this application are real time series observations from the telecommunication sector of Iran. The demand for telephone lines in Iran between 1998 and 2003 is summarized in Table 1, together with GDP per capita and the price of the product.
Table 1
Demand for telephone lines in Iran (1998–2003) a

Year (t)    GDP per capita, million rials (Yt)    Price, rials/minute (Pt)    Demand (Qt)
1998        5.918                                  13.7                        851,531
1999        7.497                                  15.28                       1,016,170
2000        10.145                                 16.18                       1,115,093
2001        11.484                                 15.97                       1,410,312
2002        13.598                                 18.2                        2,037,844
2003        15.532                                 18.77                       2,116,220

a Source: Iran Telecommunication Company.
The question is how to predict the demand for telephone lines in the future. One possibility is to use a time series forecasting model of the Cobb–Douglas type (Box et al., 1994):
\[
Q_t = A\, Y_t^{\alpha} P_t^{\beta}, \quad t = 1, \dots, k
\]
where $A$, $\alpha$, and $\beta$ are parameters to be estimated. Let $q_t = \ln(Q_t)$, $a = \ln(A)$, $y_t = \ln(Y_t)$, and $p_t = \ln(P_t)$. The Cobb–Douglas model can then be transformed into the following linear time series forecasting model:
\[
q_t = a + \alpha y_t + \beta p_t + u_t, \quad t = 1, \dots, k
\]
where $u_t$ denotes the random error component. Approximating the parameters $(a, \alpha, \beta)$ by the OLS method for the observations in Table 1 yields the following system of normal equations:
\[
\begin{aligned}
6\hat{a} + 13.903\hat{\alpha} + 16.732\hat{\beta} &= 84.663 \\
13.903\hat{a} + 32.881\hat{\alpha} + 38.975\hat{\beta} &= 196.833 \\
16.732\hat{a} + 38.975\hat{\alpha} + 46.728\hat{\beta} &= 236.305
\end{aligned}
\]
Note that the vector $(\hat{a}, \hat{\alpha}, \hat{\beta})$ is an approximation of the parameters. The above system has the unique solution $(\hat{a}, \hat{\alpha}, \hat{\beta}) = (8.155,\ 0.44,\ 1.77)$. Model (4) is now given by
\[
\begin{aligned}
\min\ & p_1 + q_1 + \dots + p_6 + q_6 \\
\text{s.t.}\ & p_1 - q_1 + a + 1.78\alpha + 2.62\beta = 13.655 \\
& p_2 - q_2 + a + 2.02\alpha + 2.73\beta = 13.83 \\
& p_3 - q_3 + a + 2.32\alpha + 2.8\beta = 13.93 \\
& p_4 - q_4 + a + 2.44\alpha + 2.77\beta = 14.16 \\
& p_5 - q_5 + a + 2.61\alpha + 2.9\beta = 14.53 \\
& p_6 - q_6 + a + 2.74\alpha + 2.93\beta = 14.56 \\
& p_i \ge 0,\ q_i \ge 0, \quad i = 1, \dots, 6; \qquad a, \alpha, \beta \text{ free}
\end{aligned}
\]
Taking $(\hat{a}, \hat{\alpha}, \hat{\beta}) = (8.155,\ 0.44,\ 1.77)$, we obtain the following feasible solution $x^0$:
\[
\begin{aligned}
x^0 &= (p^0_1, q^0_1, p^0_2, q^0_2, p^0_3, q^0_3, p^0_4, q^0_4, p^0_5, q^0_5, p^0_6, q^0_6, \hat{a}, \hat{\alpha}, \hat{\beta}) \\
&= (0.085,\ 0,\ 0,\ 0.036,\ 0,\ 0.177,\ 0.026,\ 0,\ 0.09,\ 0,\ 0.013,\ 0,\ 8.155,\ 0.44,\ 1.77)
\end{aligned}
\]
Note that
\[
L = \{2, 3, 5, 8, 10, 12\}, \qquad F = \{1, 4, 6, 7, 9, 11, 13, 14, 15\}
\]
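As a side check (not part of the paper), the numeric coefficients appearing in the instance of model (4) above are simply the natural logarithms of the Table 1 columns, which can be verified directly:

```python
# Log-transform of Table 1, reproducing the coefficients of model (4) up to rounding.
import numpy as np

Y = np.array([5.918, 7.497, 10.145, 11.484, 13.598, 15.532])          # GDP per capita (Y_t)
P = np.array([13.7, 15.28, 16.18, 15.97, 18.2, 18.77])                # price (P_t)
Q = np.array([851531, 1016170, 1115093, 1410312, 2037844, 2116220.0]) # demand (Q_t)

print(np.round(np.log(Y), 3))   # e.g. ln(5.918)  ~ 1.778
print(np.round(np.log(P), 3))   # e.g. ln(13.7)   ~ 2.617
print(np.round(np.log(Q), 3))   # e.g. ln(851531) ~ 13.655
```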
Returning to the inverse problem, the inverse linear program corresponding to $x^0$, model (5), is
\[
\begin{aligned}
f^* = \min\ & \alpha_1 + \beta_1 + \dots + \alpha_{15} + \beta_{15} \\
\text{s.t.}\ & -p_1 - \alpha_2 + \beta_2 + \gamma_2 = 1, \qquad p_2 - \alpha_3 + \beta_3 + \gamma_3 = 1 \\
& p_3 - \alpha_5 + \beta_5 + \gamma_5 = 1, \qquad -p_4 - \alpha_8 + \beta_8 + \gamma_8 = 1 \\
& -p_5 - \alpha_{10} + \beta_{10} + \gamma_{10} = 1, \qquad -p_6 - \alpha_{12} + \beta_{12} + \gamma_{12} = 1 \\
& p_1 - \alpha_1 + \beta_1 = 1, \qquad -p_2 - \alpha_4 + \beta_4 = 1 \\
& -p_3 - \alpha_6 + \beta_6 = 1, \qquad p_4 - \alpha_7 + \beta_7 = 1 \\
& p_5 - \alpha_9 + \beta_9 = 1, \qquad p_6 - \alpha_{11} + \beta_{11} = 1 \\
& p_1 + p_2 + p_3 + p_4 + p_5 + p_6 - \alpha_{13} + \beta_{13} = 0 \\
& 1.78p_1 + 2.02p_2 + 2.32p_3 + 2.44p_4 + 2.61p_5 + 2.74p_6 - \alpha_{14} + \beta_{14} = 0 \\
& 2.62p_1 + 2.73p_2 + 2.8p_3 + 2.77p_4 + 2.9p_5 + 2.93p_6 - \alpha_{15} + \beta_{15} = 0 \\
& p_i \text{ free},\ i = 1, \dots, 6; \qquad \alpha_j \ge 0,\ \beta_j \ge 0,\ j = 1, \dots, 15; \qquad \gamma_j \ge 0,\ j \in L
\end{aligned}
\]
Using Theorem 2, and without the need to solve any linear program, the optimal value is obtained as $f^* = \beta_1 + \beta_4 + \beta_6 + \beta_7 + \beta_9 + \beta_{11} = 6$. Since the optimal value of the corresponding model (5) for the observations in Table 1 is positive, $f^* = 6 > 0$, the equivalence stated in Section 2 implies that $x^0$ is not optimal for model (4); that is, the parameters estimated by the OLS method can be improved. To estimate the parameters of the forecasting model more accurately, we therefore need another feasible solution of model (4) for which the optimal value of the corresponding inverse model (5) is less than the current value obtained from the OLS estimates, i.e. $f^* < 6$. Hence, by finding a feasible solution of model (4) whose corresponding inverse model (5) has optimal value $f^* < 6$, an improvement over the OLS-based parameter estimates is possible. One way of obtaining such a feasible solution is to change the objective function of model (4) into $p_1 + q_1 + \dots + p_6 + q_6 + \alpha_1 + \beta_1 + \dots + \alpha_{15} + \beta_{15}$ and to impose the new constraint $\alpha_1 + \beta_1 + \dots + \alpha_{15} + \beta_{15} \le f^*$, which for this example reads $\alpha_1 + \beta_1 + \dots + \alpha_{15} + \beta_{15} \le 6$. In this case, if $f^* = \alpha_1 + \beta_1 + \dots + \alpha_{15} + \beta_{15} < 6$ then an improvement is achieved.

5. Concluding remarks

Forecasting continues to be an important area in both academic research and practical applications. Forecasting models usually contain a set of parameters that must be estimated. Using the inverse linear programming technique, this paper suggested an inverse-based methodology as an alternative method for estimating the forecasting parameters. The proposed model shows how the OLS-based parameter estimates can be improved, and it was shown that the inverse forecasting model can be used to estimate the forecasting parameters as accurately as possible and with computational efficiency. A real time series was used to demonstrate the advantage of the new approach.

Acknowledgments

The authors thank two anonymous referees and the editor of this journal for their valuable suggestions, which greatly improved the quality of this paper.

References

Ahuja, R. K., & Orlin, J. B. (2001). Inverse optimization. Operations Research, 49, 771–783.
Arikan, F., & Gungor, Z. (2001). An application of fuzzy goal programming to a multi-objective project network problem. Fuzzy Sets and Systems, 119, 49–58.
Bazaraa, M. S., Jarvis, J. J., & Sherali, H. D. (2005). Linear programming and network flows (3rd ed.). John Wiley & Sons.
Bermúdez, J. D., Segura, J. V., & Vercher, E. (2006). A decision support system methodology for forecasting of time series based on soft computing. Computational Statistics & Data Analysis, 51, 177–191.
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Burkard, R. E., Klinz, B., & Zhang, J. (2001). Bottleneck capacity expansion problems with general budget constraint. RAIRO Operations Research, 35, 1–20.
Burton, D., & Toint, Ph. L. (1992a). On an instance of the inverse shortest paths problem. Mathematical Programming, 53, 45–61.
Burton, D., & Toint, Ph. L. (1992b). On the use of an inverse shortest paths algorithm for recovering linearly correlated costs. Mathematical Programming, 63, 1–22.
Hu, Z., & Liu, Z. (1998). A strongly polynomial algorithm for the inverse shortest arborescence problem. Discrete Applied Mathematics, 82, 135–154.
Huang, S., & Liu, Z. (1999). On the inverse problem of linear programming and its application to minimum weight perfect k-matching. European Journal of Operational Research, 112, 421–426.
Inniss, T. R. (2006). Seasonal clustering technique for time series data. European Journal of Operational Research, 175, 376–384.
Mohammadi, K., Eslami, H. R., & Kahawita, R. (2006). Parameter estimation of an ARMA model for river flow forecasting using goal programming. Journal of Hydrology, 331, 293–299.
Taylor, J. W. (2007). Forecasting daily supermarket sales using exponentially weighted quantile regression. European Journal of Operational Research, 178, 154–167.
Yang, C., & Zhang, J. (1998). Inverse maximum capacity problems. OR Spectrum, 20, 97–100.
Yang, C., Zhang, J., & Ma, Z. (1997). Inverse maximum flow and minimum cut problems. Optimization, 40, 147–170.
Zhang, J., & Liu, Z. (1996). Calculating some inverse linear programming problems. Journal of Computational and Applied Mathematics, 72, 261–273.