An improved fuzzy forecasting method for seasonal time series

Expert Systems with Applications 37 (2010) 6310–6318

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Hao-Tien Liu *, Mao-Len Wei

Department of Industrial Engineering and Management, I-Shou University, No. 1, Sec. 1, Syuecheng Rd., Dashu Township, Kaohsiung County 840, Taiwan, ROC

Article info

Keywords: Fuzzy time series; Forecasting; Time-variant; Seasonality; Interval length; Window base

Abstract

Several time-variant fuzzy time series models have been developed during the last decade. These models usually focus on forecasting stationary or trend time series, but they are not suitable for forecasting seasonal time series. Furthermore, several factors that affect the forecasting accuracy, such as interval length, interval number, and level of window base, have not been carefully examined. To address these issues, this study develops an improved fuzzy time series forecasting method that can effectively deal with seasonal time series. The proposed method can determine an appropriate interval length. Moreover, a systematic search algorithm is used to find the best window base. The proposed method can provide decision analysts with more precise forecasted values. Two numerical data sets are employed to illustrate the proposed method and to compare its forecasting accuracy with that of four fuzzy time series methods. The results of the comparison indicate that the proposed method produces more accurate forecasted results. © 2010 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +886 7 6577711x5515; fax: +886 7 6578536. E-mail address: [email protected] (H.-T. Liu).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.02.090

1. Introduction

With the fast development of new technologies and the stiff competition among enterprises, the whole business environment has become more dynamic and unstable. It is crucial for enterprises to arrive at accurate and quick-response decisions. Hence, effective forecasting is essential for enterprises to identify future technological trends and customer demands. In general, there are two kinds of quantitative forecasting models: causal models and time series models. Causal models (such as regression analysis) focus on identifying all influential factors and their relationships with the outputs. The resulting input–output relationships (or equations) can be employed to forecast. In contrast, by collecting a series of data observed during a time period, time series models forecast based on the concept that the future can be inferred from patterns of the past. In this study, the time series model is our primary subject of interest.

When the historical data are incomplete, uncertain, or vague, traditional time series models do not usually provide accurate forecasts. To address this problem, Song and Chissom (1993a, 1993b) first proposed the concept of fuzzy time series, since fuzzy set theory can effectively deal with incomplete, uncertain and fuzzy data. Song and Chissom (1993b) divided fuzzy time series into two types, namely time-variant and time-invariant. Their difference depends on whether the same relation exists between any time t and its prior time t − 1. If the relations are all the same, it is a time-invariant fuzzy time series; otherwise, it is time-variant.

In time-invariant fuzzy time series, a number of researchers have developed various forecasting methods (Chen, 1996; Chen, 2002; Chen, Cheng, & Teoh, 2007; Chen & Chung, 2006; Cheng, Chen, Teoh, & Chiang, 2008; Huarng, 2001a; Huarng & Yu, 2005; Lee & Chou, 2004; Lee, Wang, Chen, & Leu, 2006; Li & Cheng, 2007; Own & Yu, 2005; Singh, 2007b; Song & Chissom, 1993b; Tsaur, Yang, & Wang, 2005; Yu, 2005). Their forecasting methods often form fuzzy logical relationships (such as $\tilde{A}_i \to \tilde{A}_m$; $\tilde{A}_i, \ldots, \tilde{A}_k \to \tilde{A}_m$; or other variations based on historical data), and group them as heuristic rules to derive the forecasted values. However, in theory, recent data should be more important than older data, so the relation at time t and at time t − 1 should not be the same. In addition, time-variant series should be more sensitive to fluctuations of data than time-invariant series. Therefore, time-variant fuzzy time series is the primary research subject of this article.

In time-variant fuzzy time series, Song and Chissom (1994) indicated that the difference between time-variant and time-invariant fuzzy time series was that a window base variable w was added to the time-variant forecasting model. As w changes, the fuzzy relation vector acquired also varies. Hwang, Chen, and Lee (1998) used Song and Chissom's (1994) method as a basis to calculate the variations of the historical data, from which the corresponding linguistic variables are constructed. As for the recognition of window base, one level is extended to w levels. Chen and Hwang (2000) further extended Hwang et al.'s (1998) method, and applied it to forecast daily temperature. Two factors that influence temperature are considered: daily average temperature and daily cloud density. Their two-factor forecasting model can accurately predict daily temperature. Chen and Wu (2003) proposed a new approach to optimal forecasting by using fuzzy set theory and soft computing methods. Their method was based on the concepts of fuzzy membership functions as well as the natural selection of evolution theory. Hong (2005) proposed a nonhomogeneous fuzzy time series model using the weakest t-norm based algebraic fuzzy operations. Hong's model used the forward and backward linguistic difference of various orders to compute the forecasted value. Singh (2007a) proposed a simple time-variant method which used the difference operator to develop fuzzy relation rules.

From a review of related studies, two issues can be further investigated. Firstly, the previous time-variant studies usually use the time series data at time periods t − 1, t − 2, ..., t − n (n is often < 5) to forecast the value at time period t. For stationary or trend time series, their models can provide satisfactory results. However, when the time series data have a seasonal pattern, their approaches can generate large forecasting errors. This is because seasonal variations can be quarterly or yearly, whereas the previous studies merely utilize several past data (n < 5) to predict the future. Thus, their models cannot respond to the seasonal variations quickly and accurately. Secondly, the interval length, interval number, or window base are often identified subjectively in previous studies, and thus lack clear standards of judgment. Huarng (2001b) and Huarng and Yu (2006) argued that different lengths and numbers of intervals may affect the forecasting accuracy. Moreover, it is observed from their examples that different window bases can also have an impact on the forecasting precision.
To address these two issues, the present study develops an improved fuzzy time series forecasting method based on Hwang et al.'s (1998) model, since that model provides a better forecasting framework and more accurate forecasted results. The proposed method can effectively deal with time series data with a seasonal pattern. Furthermore, the proposed method can determine appropriate interval length, interval number, and window base to increase the precision of the forecast. To evaluate the performance of the proposed method, two numerical data sets are employed for comparing the forecasting accuracy between the proposed method and four fuzzy time series methods (Chen, 1996; Cheng et al., 2008; Hwang et al., 1998; Lee & Chou, 2004).

The rest of the present paper is organized as follows. Section 2 briefly introduces the definitions of fuzzy time series and reviews Hwang et al.'s method. Section 3 describes the proposed fuzzy time series forecasting method. Section 4 illustrates the research steps of the proposed method using two numerical examples, and then compares the forecasting accuracy between the proposed method and the four fuzzy time series methods. The last section concludes the present research.

2. Fuzzy time series and Hwang et al.'s method

Song and Chissom (1993a, 1993b, 1994) defined their fuzzy time series in terms of discrete fuzzy sets. Let U be the universe of discourse, where $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $\tilde{A}_i$ of U is defined by

$$\tilde{A}_i = \mu_{\tilde{A}_i}(u_1)/u_1 + \mu_{\tilde{A}_i}(u_2)/u_2 + \cdots + \mu_{\tilde{A}_i}(u_n)/u_n, \tag{1}$$

where $\mu_{\tilde{A}_i}$ is the membership function of $\tilde{A}_i$, $\mu_{\tilde{A}_i}: U \to [0, 1]$. $\mu_{\tilde{A}_i}(u_i)$ denotes the membership value of $u_i$ in $\tilde{A}_i$, with $\mu_{\tilde{A}_i}(u_i) \in [0, 1]$ and $1 \le i \le n$. Song and Chissom (1993a, 1993b, 1994) proposed the definitions of the fuzzy time series. The definitions are described as follows:


Definition 1. $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$ is a subset of R. Let $Y(t)$ be the universe of discourse defined by the fuzzy set $\mu_i(t)$. If $F(t)$ consists of $\mu_i(t)$ $(i = 1, 2, \ldots)$, $F(t)$ is called a fuzzy time series on $Y(t)$.

Definition 2. If there exists a fuzzy relationship $R(t-1, t)$ such that $F(t) = F(t-1) \circ R(t-1, t)$, where $\circ$ is an arithmetic operator, then $F(t)$ is said to be caused by $F(t-1)$. The relationship between $F(t)$ and $F(t-1)$ can be denoted by $F(t-1) \to F(t)$.

Definition 3. Suppose $F(t)$ is calculated by $F(t-1)$ only, and $F(t) = F(t-1) \circ R(t-1, t)$. For any t, if $R(t-1, t)$ is independent of t, then $F(t)$ is considered a time-invariant fuzzy time series. Otherwise, $F(t)$ is time-variant.

As mentioned in Section 1, this study uses Hwang et al.'s (1998) model as a basis to develop the proposed method. The steps of their method are briefly introduced as follows:

(1) Compute the variations of the historical data.
(2) Partition the universe of discourse U into several equal-length intervals.
(3) Define fuzzy sets on U.
(4) Fuzzify the variations of the historical data.
(5) Select a window base w and calculate the fuzzy forecasted variations. Here, the operation matrix $O^w(t)$ and the criterion matrix $C(t)$ are used to compute the fuzzy forecasted variation $F(t)$.
(6) Defuzzify the derived fuzzy forecasted variations $F(t)$.
(7) Calculate the forecasted value. The forecasted value at time t is computed by adding the defuzzified forecasted variation to the actual value at time t − 1.

3. Proposed fuzzy time series forecasting method

The objective of the present study is to develop an improved fuzzy time series model for forecasting data with a seasonal pattern. Generally, there are two different approaches for forecasting seasonal time series. The first approach is to directly forecast what the seasonal time series will be in the future. The second approach is to remove the seasonal variation from the time series data, and then to apply ordinary forecasting methods. The present research employs the latter approach since it is computationally simpler than the former. Moreover, the latter approach can be easily built on existing fuzzy time series models. Hence, the present study removes the seasonal variation (seasonal index) from a time series by deseasonalizing the time series data. Here, we employ the ratio-to-moving-average method (Mansfield, 1994) to calculate the seasonal indices and remove them from a time series. After deseasonalization, the time series becomes either a stationary or a trend series.

As described in Section 1, we use Hwang et al.'s (1998) method as a foundation to develop the proposed fuzzy forecasting method. In their method, the interval length and window base are identified subjectively, without clear guidance. However, both the interval length and the window base may greatly affect the forecasted results, so both factors must be determined carefully. To resolve the subjective interval length problem, Huarng (2001b) designed an average-based length method that can effectively determine an appropriate interval length. Hence, the present research employs Huarng's method to determine the interval length. To solve the subjective window base problem, the current study designs a systematic search algorithm to assess the forecasting accuracy of each window base. The window base with the highest forecasting accuracy is taken as the best window base.


In summary, the concept of the proposed method can be described as follows. First, the proposed method removes the seasonal variation from the seasonal time series by using the ratio-to-moving-average method. The average-based length method is then applied to determine the appropriate interval length. The modified Hwang et al.'s method is used to forecast the deseasonalized data. The final forecasted values are obtained by combining the seasonal variations and the deseasonalized forecasted values. The detailed research steps are introduced as follows:

Step 1: Collect the seasonal time series data $Sd_t$, $t = 1, 2, \ldots, n$.

Step 2: Deseasonalize the historical data. Apply the ratio-to-moving-average method to remove the seasonality from the time series data. This method consists of the following steps:
(1) Calculate a k-period moving average from the first datum to the last datum (assume the seasonal time period is k).
(2) If the seasonal time period k is odd, go to step (3). Otherwise, compute the centered moving average as the average of each two consecutive moving averages from step (1).
(3) Divide the actual value of each time period by the corresponding (centered) moving average.
(4) Calculate the median of the ratios for each seasonal period (for monthly data, each calendar month).
(5) Compute the seasonal index $Si_t$ by adjusting each period's median so that the mean value of the k medians is equal to 1.
(6) Obtain the deseasonalized data by dividing the actual value of each time period by its corresponding seasonal index.

Step 3: Calculate the variations of the deseasonalized data. The variation $V_t$ of the deseasonalized data between time t and t − 1 is computed as

$$V_t = Ds_t - Ds_{t-1}, \tag{2}$$

where $Ds_t$ indicates the deseasonalized data at time t, $t = 2, 3, \ldots, n$.

Step 4: Define the universe of discourse. Find the maximum $D_{\max}$ and the minimum $D_{\min}$ among all $V_t$. The universe of discourse U is then defined as

$$U = [D_{\min}, D_{\max}]. \tag{3}$$
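Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the series is assumed to start at the first period of a season (January for monthly data), and the first moving average is assumed to be centered between periods k/2 and k/2 + 1, which is the alignment implied by the ratios reported in Table 2.

```python
from statistics import median

# Monthly production values, January 1998 to December 2001 (Table 2).
PRODUCTION = [
    24708, 25781, 29629, 30452, 29566, 30072, 29843, 28627, 29642, 29447, 28440, 32201,
    26787, 21675, 29192, 31418, 30392, 30490, 31729, 31396, 29488, 31396, 33994, 39439,
    31357, 24747, 33509, 30847, 35893, 36340, 34777, 34825, 36153, 35552, 34718, 37432,
    26343, 28280, 32450, 29055, 30195, 29259, 27391, 28403, 25837, 26665, 26966, 28733,
]


def seasonal_indices(series, k):
    """Ratio-to-moving-average seasonal indices (sub-steps 1 to 5 of Step 2)."""
    n = len(series)
    half = k // 2
    # (1) k-period moving averages; moving[i] averages series[i : i + k].
    moving = [sum(series[i:i + k]) / k for i in range(n - k + 1)]
    # (2) For even k, center by averaging two consecutive moving averages.
    centered = {}
    for t in range(half, n - half):
        if k % 2 == 0:
            centered[t] = (moving[t - half] + moving[t - half + 1]) / 2
        else:
            centered[t] = moving[t - half]
    # (3) Ratio of each actual value to its (centered) moving average,
    #     grouped by position within the season (assumes series[0] is period 0).
    ratios = [[] for _ in range(k)]
    for t, c in centered.items():
        ratios[t % k].append(series[t] / c)
    # (4) Median ratio per seasonal period.
    medians = [median(r) for r in ratios]
    # (5) Rescale so that the k indices average to 1.
    scale = k / sum(medians)
    return [m * scale for m in medians]


def deseasonalize(series, indices):
    """Sub-step 6: divide each value by the index of its seasonal period."""
    k = len(indices)
    return [x / indices[t % k] for t, x in enumerate(series)]


def variations(deseasonalized):
    """Step 3, Eq. (2): V_t = Ds_t - Ds_(t-1)."""
    return [b - a for a, b in zip(deseasonalized, deseasonalized[1:])]


indices = seasonal_indices(PRODUCTION, 12)
ds = deseasonalize(PRODUCTION, indices)
v = variations(ds)
```

Because the sketch keeps full floating-point precision, its results agree with the rounded values printed in Tables 3 and 4 only up to small rounding differences.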

Step 5: Determine the appropriate interval length. Here, we apply the average-based length method to determine the appropriate interval length l. This method can be described in the following steps:
(1) Calculate all the absolute differences between the variations $V_{t-1}$ and $V_t$ as the first differences, and then compute the average of the first differences.
(2) Take one half of the average (in step (1)) as the length.
(3) Find the range in which the length is located and determine the base (Table 1).
(4) According to the determined base, round the length to obtain the appropriate interval length.
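The average-based length rule of Step 5 can be sketched as follows. The helper names are hypothetical, and two choices are assumptions of this sketch: lengths below 0.1 are mapped to base 0.1, and rounding is to the nearest multiple of the base with halves rounded up. One observation on the input: the average of 1762 reported in the worked example of Section 4.1 is reproduced when the first differences are taken over the deseasonalized values of Table 4 (that is, the mean of $|V_t|$), so the sketch is applied to that series.

```python
import math

# Deseasonalized production values, January 1998 to December 2001 (Table 4).
DESEASONALIZED = [
    26766, 34197, 29395, 31404, 28662, 29382, 28902, 28308, 28774, 28796, 27166, 28072,
    29018, 28750, 28962, 32400, 29463, 29791, 30729, 31047, 28624, 30702, 32472, 34382,
    33969, 32825, 33245, 31811, 34796, 35507, 33681, 34437, 35094, 34766, 33163, 32632,
    28537, 37511, 32194, 29963, 29272, 28588, 26528, 28087, 25080, 26075, 25758, 25049,
]


def mean_abs_first_difference(values):
    """Average of |x_t - x_(t-1)| over consecutive observations."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def base_for(length):
    """Base mapping of Table 1 (values below 0.1 use base 0.1 by assumption)."""
    for upper, base in ((1.0, 0.1), (10, 1), (100, 10), (1000, 100)):
        if length <= upper:
            return base
    raise ValueError("length lies outside the ranges listed in Table 1")


def average_based_length(values):
    """Half of the mean absolute first difference, rounded by its base."""
    half_average = mean_abs_first_difference(values) / 2
    base = base_for(half_average)
    # Round to the nearest multiple of the base (halves rounded up).
    return math.floor(half_average / base + 0.5) * base


interval_length = average_based_length(DESEASONALIZED)
```

For the production value data this gives a mean first difference of about 1762, a half-average of about 881, base 100 and an interval length of 900, matching the worked example.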

Table 1. Base mapping table.

| Range | Base |
|---|---|
| 0.1–1.0 | 0.1 |
| 1.1–10 | 1 |
| 11–100 | 10 |
| 101–1000 | 100 |

Step 6: Define fuzzy sets. For easy partitioning of U, two small numbers $D_1$ and $D_2$ are selected so that the universe of discourse is extended to $[D_{\min} - D_1, D_{\max} + D_2]$. The number of intervals (fuzzy sets) is calculated by

$$\text{Number of intervals} = \frac{(D_{\max} + D_2) - (D_{\min} - D_1)}{l}. \tag{4}$$

Furthermore, the fuzzy sets are defined by

$$\tilde{A}_i = \mu_{\tilde{A}_i}(u_1)/u_1 + \mu_{\tilde{A}_i}(u_2)/u_2 + \cdots + \mu_{\tilde{A}_i}(u_m)/u_m, \tag{5}$$

where $\tilde{A}_i$ represents fuzzy set i.

Step 7: Fuzzify the variations. If the variation $V_t$ is within the scope of $u_j$, then it belongs to fuzzy set $\tilde{A}_j$. All of the variations must be classified into the corresponding fuzzy sets.

Step 8: Calculate the fuzzy time series $F(t)$ at window base w. The window base has to be larger than or equal to 2 in order to perform a fuzzy composition operation. Therefore, w is set to 2 initially. Let $C(t)$ be the criterion matrix of $F(t)$ and $O^w(t)$ be the operation matrix at window base w:

$$C(t) = F(t-1) = [\,C_1 \;\; C_2 \;\; \cdots \;\; C_m\,], \tag{6}$$

$$O^w(t) = \begin{bmatrix} F(t-2) \\ F(t-3) \\ \vdots \\ F(t-w) \end{bmatrix} = \begin{bmatrix} O_{11} & O_{12} & \cdots & O_{1m} \\ O_{21} & O_{22} & \cdots & O_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ O_{(w-1)1} & O_{(w-1)2} & \cdots & O_{(w-1)m} \end{bmatrix}, \tag{7}$$

where $C_j$ indicates the membership value at the interval $u_j$ within fuzzy set $\tilde{A}_i$, $j = 1, 2, \ldots, m$. The fuzzy relation matrix $R(t)$ is computed by performing the following fuzzy composition operation, in which each row of $O^w(t)$ is multiplied element by element with $C(t)$:

$$R(t) = O^w(t) \otimes C(t) = \begin{bmatrix} O_{11} \times C_1 & O_{12} \times C_2 & \cdots & O_{1m} \times C_m \\ O_{21} \times C_1 & O_{22} \times C_2 & \cdots & O_{2m} \times C_m \\ \vdots & \vdots & \ddots & \vdots \\ O_{(w-1)1} \times C_1 & O_{(w-1)2} \times C_2 & \cdots & O_{(w-1)m} \times C_m \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ R_{21} & R_{22} & \cdots & R_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ R_{(w-1)1} & R_{(w-1)2} & \cdots & R_{(w-1)m} \end{bmatrix}. \tag{8}$$

To obtain $F(t)$, we calculate the maximum of every column in $R(t)$:

$$F(t) = [\,\max(R_{11}, R_{21}, \ldots, R_{(w-1)1}), \; \ldots, \; \max(R_{1m}, R_{2m}, \ldots, R_{(w-1)m})\,] = [\,f_1, f_2, \ldots, f_m\,]. \tag{9}$$

Step 9: Calculate the forecasted value. In $F(t)$, suppose there are k nonzero values corresponding to intervals $u_1, u_2, \ldots, u_k$, with midpoints $m_1, m_2, \ldots, m_k$, respectively. Unlike Hwang et al.'s defuzzification method, we use the weighted average method to calculate the defuzzified variation $Cv_t$:

$$Cv_t = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k}. \tag{10}$$

The forecasted value $Fv_t$ at time t is computed as follows:

$$Fv_t = (Cv_t + Ds_{t-1}) \times Si_t. \tag{11}$$

Notice that the forecasted value $Fv_t$ is obtained based on the specified window base w.
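Steps 8 and 9 can be sketched as below, using the numbers of the January 2000 forecast from Section 4.1 (window base 3, interval length 900, 16 intervals starting at −5400). The function names are hypothetical. The membership pattern (1 at the set's own interval, 0.5 at its neighbours) follows the fuzzy sets listed in the worked example, and returning a zero variation when all entries of $F(t)$ are zero is an assumption of this sketch, since the text does not cover that case.

```python
def fuzzy_set(i, m):
    """Membership vector of fuzzy set A_i (1-based) over m intervals."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, m + 1)]


def fuzzy_forecast(history, w):
    """Eqs. (6)-(9): history holds F(1), ..., F(t-1), most recent last."""
    criterion = history[-1]                              # C(t) = F(t-1)
    operation = [history[-i] for i in range(2, w + 1)]   # rows F(t-2), ..., F(t-w)
    relation = [[o * c for o, c in zip(row, criterion)] for row in operation]
    return [max(column) for column in zip(*relation)]    # column-wise maximum


def defuzzify(f, midpoints):
    """Eq. (10): weighted average of interval midpoints."""
    total = sum(f)
    if total == 0:
        return 0.0  # assumption: no overlap means no forecasted change
    return sum(fj * mj for fj, mj in zip(f, midpoints)) / total


M, LOW, LENGTH = 16, -5400, 900
midpoints = [LOW + (j + 0.5) * LENGTH for j in range(M)]

# Fuzzified variations of October, November and December 1999 (Table 5).
history = [fuzzy_set(9, M), fuzzy_set(8, M), fuzzy_set(9, M)]

f = fuzzy_forecast(history, w=3)
cv = defuzzify(f, midpoints)
# Eq. (11) with Ds(December 1999) = 34382 and Si(January) = 0.923.
fv = (cv + 34382) * 0.923
```

With these inputs the nonzero entries of `f` are 0.5, 1 and 0.25 at intervals 8, 9 and 10, `cv` is about 2121.4 and `fv` rounds to 33693, in line with the worked example.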

Step 10: Compute the best forecasted value. To obtain the best forecasted value, the proposed method uses the following search algorithm to identify the best window base.

best value = ∞
For w = 2 to n − 2
  $MAD_w = \dfrac{\sum_{t=1}^{n-w-1} |Fv_t - Sd_t|}{n-w-1}$
  IF best value > $MAD_w$ THEN best value = $MAD_w$ AND best window base = w
Next w

Through this algorithm, we can obtain the best window base, which allows us to calculate the best forecasted value at time t.

4. Empirical analysis

The present study demonstrates the application of the proposed method and compares the accuracy of its forecasted results with those obtained by four existing methods (Chen, 1996; Cheng et al., 2008; Hwang et al., 1998; Lee & Chou, 2004). Here, two seasonal time series data sets are employed: the production value of the machinery industry in Taiwan, and the monthly sales of soft drinks.

4.1. Production value of the machinery industry in Taiwan

The machinery industry is a crucial component of the entire manufacturing industry. The machinery industry in Taiwan has made steady progress over the past several decades. It has played a critical supporting role within the manufacturing industry in Taiwan, both in manufacturing and in the export trade. The time series data of the production value of Taiwan's machinery industry from January 1998 to December 2001 (Industrial production statistics monthly Taiwan area, 1998-2001) are drawn in Fig. 1. Fig. 1 shows that the machinery industry has a periodic cycle (seasonal pattern). The sharp drop in each year occurs in January or February, since the plants are shut down for about a week around the Chinese New Year. The following steps illustrate the proposed method for forecasting the production value of Taiwan's machinery industry.

Fig. 1. Production value of the machinery industry in Taiwan (unit: million NT dollars). Horizontal axis: Month/Year, January 1998 to December 2001; vertical axis: production value, 0 to 45000.

Table 2. 12-Month moving average, centered 12-month moving average, and a ratio of actual value.

| Month | Production value | 12-Month moving average | Centered 12-month moving average | A ratio of actual value |
|---|---|---|---|---|
| January 1998 | 24708 | | | |
| February 1998 | 25781 | | | |
| March 1998 | 29629 | | | |
| April 1998 | 30452 | | | |
| May 1998 | 29566 | | | |
| June 1998 | 30072 | 29034.0 | | |
| July 1998 | 29843 | 29207.3 | 29120.6 | 1.025 |
| August 1998 | 28627 | 28865.1 | 29036.2 | 0.986 |
| September 1998 | 29642 | 28828.7 | 28846.9 | 1.028 |
| October 1998 | 29447 | 28909.2 | 28868.9 | 1.020 |
| November 1998 | 28440 | 28978.0 | 28943.6 | 0.983 |
| December 1998 | 32201 | 29012.8 | 28995.4 | 1.111 |
| January 1999 | 26787 | 29170.0 | 29091.4 | 0.921 |
| February 1999 | 21675 | 29400.8 | 29285.4 | 0.740 |
| March 1999 | 29192 | 29387.9 | 29394.3 | 0.993 |
| April 1999 | 31418 | 29550.3 | 29469.1 | 1.066 |
| May 1999 | 30392 | 30013.2 | 29781.8 | 1.020 |
| June 1999 | 30490 | 30616.3 | 30314.8 | 1.006 |
| July 1999 | 31729 | 30997.2 | 30806.8 | 1.030 |
| August 1999 | 31396 | 31253.2 | 31125.2 | 1.009 |
| September 1999 | 29488 | 31612.9 | 31433.0 | 0.938 |
| October 1999 | 31396 | 31565.3 | 31589.1 | 0.994 |
| November 1999 | 33994 | 32023.8 | 31794.5 | 1.069 |
| December 1999 | 39439 | 32511.3 | 32267.5 | 1.222 |
| January 2000 | 31357 | 32765.3 | 32638.3 | 0.961 |
| February 2000 | 24747 | 33051.0 | 32908.1 | 0.752 |
| March 2000 | 33509 | 33606.4 | 33328.7 | 1.005 |
| April 2000 | 30847 | 33952.8 | 33779.6 | 0.913 |
| May 2000 | 35893 | 34013.1 | 33982.9 | 1.056 |
| June 2000 | 36340 | 33845.8 | 33929.5 | 1.071 |
| July 2000 | 34777 | 33428.0 | 33636.9 | 1.034 |
| August 2000 | 34825 | 33722.4 | 33575.2 | 1.037 |
| September 2000 | 36153 | 33634.2 | 33678.3 | 1.073 |
| October 2000 | 35552 | 33484.8 | 33559.5 | 1.059 |
| November 2000 | 34718 | 33010.0 | 33247.4 | 1.044 |
| December 2000 | 37432 | 32419.9 | 32715.0 | 1.144 |
| January 2001 | 26343 | 31804.4 | 32112.2 | 0.820 |
| February 2001 | 28280 | 31269.3 | 31536.8 | 0.897 |
| March 2001 | 32450 | 30409.6 | 30839.4 | 1.052 |
| April 2001 | 29055 | 29669.0 | 30039.3 | 0.967 |
| May 2001 | 30195 | 29023.0 | 29346.0 | 1.029 |
| June 2001 | 29259 | 28298.1 | 28660.5 | 1.021 |
| July 2001 | 27391 | | | |
| August 2001 | 28403 | | | |
| September 2001 | 25837 | | | |
| October 2001 | 26665 | | | |
| November 2001 | 26966 | | | |
| December 2001 | 28733 | | | |


Steps 1–2: Collect the time series data of the production value of the machinery industry in Taiwan, and then apply the ratio-to-moving-average method to deseasonalize the time series data. According to Fig. 1, the seasonal pattern appears to be a 12-month cycle. Hence, we calculate the 12-month moving average, the centered 12-month moving average, and the ratio of the actual value to the centered moving average, as shown in Table 2. By calculating the medians of the ratios and adjusting these medians, we obtain the seasonal index of each month (Table 3). The deseasonalized value of each month is then computed by dividing the actual production value by its corresponding seasonal index.

Table 3. Seasonal index of each month.

| Month | Ratio 1998 | Ratio 1999 | Ratio 2000 | Ratio 2001 | Median | Seasonal index |
|---|---|---|---|---|---|---|
| January | | 0.921 | 0.961 | 0.820 | 0.921 | 0.923 |
| February | | 0.740 | 0.752 | 0.897 | 0.752 | 0.754 |
| March | | 0.993 | 1.005 | 1.052 | 1.005 | 1.008 |
| April | | 1.066 | 0.913 | 0.967 | 0.967 | 0.970 |
| May | | 1.020 | 1.056 | 1.029 | 1.029 | 1.032 |
| June | | 1.006 | 1.071 | 1.021 | 1.021 | 1.023 |
| July | 1.025 | 1.030 | 1.034 | | 1.030 | 1.033 |
| August | 0.986 | 1.009 | 1.037 | | 1.009 | 1.011 |
| September | 1.028 | 0.938 | 1.073 | | 1.028 | 1.030 |
| October | 1.020 | 0.994 | 1.059 | | 1.020 | 1.023 |
| November | 0.983 | 1.069 | 1.044 | | 1.044 | 1.047 |
| December | 1.111 | 1.222 | 1.144 | | 1.144 | 1.147 |
| Sum | | | | | 11.970 | 12.000 |

Steps 3–4: Calculate the variations of the deseasonalized data. The computation results are shown in Table 4. Table 4 shows that the maximum and the minimum of the variations are 8974 ($D_{\max}$) and -5317 ($D_{\min}$), respectively. The universe of discourse U can be defined as follows:

$$U = [-5317, 8974].$$

Table 4. Deseasonalized value and variation of each month.

| Month | Deseasonalized value | Variation |
|---|---|---|
| January 1998 | 26766 | – |
| February 1998 | 34197 | 7431 |
| March 1998 | 29395 | -4802 |
| April 1998 | 31404 | 2009 |
| May 1998 | 28662 | -2742 |
| June 1998 | 29382 | 720 |
| July 1998 | 28902 | -480 |
| August 1998 | 28308 | -594 |
| September 1998 | 28774 | 466 |
| October 1998 | 28796 | 22 |
| November 1998 | 27166 | -1630 |
| December 1998 | 28072 | 906 |
| January 1999 | 29018 | 946 |
| February 1999 | 28750 | -268 |
| March 1999 | 28962 | 212 |
| April 1999 | 32400 | 3438 |
| May 1999 | 29463 | -2937 |
| June 1999 | 29791 | 328 |
| July 1999 | 30729 | 938 |
| August 1999 | 31047 | 318 |
| September 1999 | 28624 | -2423 |
| October 1999 | 30702 | 2078 |
| November 1999 | 32472 | 1770 |
| December 1999 | 34382 | 1910 |
| January 2000 | 33969 | -413 |
| February 2000 | 32825 | -1144 |
| March 2000 | 33245 | 420 |
| April 2000 | 31811 | -1434 |
| May 2000 | 34796 | 2985 |
| June 2000 | 35507 | 711 |
| July 2000 | 33681 | -1826 |
| August 2000 | 34437 | 756 |
| September 2000 | 35094 | 657 |
| October 2000 | 34766 | -328 |
| November 2000 | 33163 | -1603 |
| December 2000 | 32632 | -531 |
| January 2001 | 28537 | -4095 |
| February 2001 | 37511 | 8974 |
| March 2001 | 32194 | -5317 |
| April 2001 | 29963 | -2231 |
| May 2001 | 29272 | -691 |
| June 2001 | 28588 | -684 |
| July 2001 | 26528 | -2060 |
| August 2001 | 28087 | 1559 |
| September 2001 | 25080 | -3007 |
| October 2001 | 26075 | 995 |
| November 2001 | 25758 | -317 |
| December 2001 | 25049 | -709 |

Step 5: Apply the average-based length method to determine the appropriate interval length l. Based on Table 4, calculate the absolute difference between the variations $V_{t-1}$ and $V_t$, and then find the average of all the absolute differences, which is 1762. Take one half of 1762, which is 881, as the length; it is located in the range [101, 1000]. Based on Table 1, the appropriate base is 100. Next, round the length 881 by base 100, which gives 900. Thus, the appropriate interval length is defined as 900.

Step 6: Assume $D_1 = 83$ and $D_2 = 26$, so that the universe of discourse is extended to $[-5400, 9000]$. Use Eq. (4) to calculate the number of intervals (fuzzy sets), which is $(9000 - (-5400))/900 = 16$. Therefore, there are 16 intervals (fuzzy sets), which are $u_1 = [-5400, -4500]$, $u_2 = [-4500, -3600]$, $u_3 = [-3600, -2700]$, $u_4 = [-2700, -1800]$, ..., $u_{15} = [7200, 8100]$, and $u_{16} = [8100, 9000]$. The fuzzy sets can be defined as follows:

$$\begin{aligned}
\tilde{A}_1 &= 1/u_1 + 0.5/u_2 + 0/u_3 + 0/u_4 + 0/u_5 + 0/u_6 + \cdots + 0/u_{16},\\
\tilde{A}_2 &= 0.5/u_1 + 1/u_2 + 0.5/u_3 + 0/u_4 + 0/u_5 + 0/u_6 + \cdots + 0/u_{16},\\
\tilde{A}_3 &= 0/u_1 + 0.5/u_2 + 1/u_3 + 0.5/u_4 + 0/u_5 + 0/u_6 + \cdots + 0/u_{16},\\
\tilde{A}_4 &= 0/u_1 + 0/u_2 + 0.5/u_3 + 1/u_4 + 0.5/u_5 + 0/u_6 + \cdots + 0/u_{16},\\
&\;\;\vdots\\
\tilde{A}_{15} &= 0/u_1 + 0/u_2 + 0/u_3 + \cdots + 0/u_{13} + 0.5/u_{14} + 1/u_{15} + 0.5/u_{16},\\
\tilde{A}_{16} &= 0/u_1 + 0/u_2 + 0/u_3 + \cdots + 0/u_{13} + 0/u_{14} + 0.5/u_{15} + 1/u_{16}.
\end{aligned}$$
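The partition of Step 6 and the fuzzification of Step 7 can be sketched as follows for this example. The function names are hypothetical, and assigning a value that falls exactly on an interval boundary to the upper interval is an assumption of the sketch, since the text does not say how boundary values are treated.

```python
def partition(d_min, d_max, d1, d2, length):
    """Equal-length intervals covering [d_min - d1, d_max + d2]."""
    low, high = d_min - d1, d_max + d2
    count = round((high - low) / length)  # number of intervals (fuzzy sets)
    return [(low + j * length, low + (j + 1) * length) for j in range(count)]


def fuzzify(variation, intervals):
    """Return the 1-based index j of the interval u_j containing the variation."""
    if variation < intervals[0][0] or variation > intervals[-1][1]:
        raise ValueError("variation lies outside the universe of discourse")
    for j, (lower, upper) in enumerate(intervals, start=1):
        if lower <= variation < upper:
            return j
    return len(intervals)  # variation equals the upper end of the last interval


# D_min = -5317, D_max = 8974, D1 = 83, D2 = 26, interval length 900.
intervals = partition(-5317, 8974, 83, 26, 900)

# Variations of February 1998, March 1998 and January 2000 (Table 4).
labels = [fuzzify(v, intervals) for v in (7431, -4802, -413)]
```

For the three variations shown, `labels` is `[15, 1, 6]`, which corresponds to the fuzzy sets $\tilde{A}_{15}$, $\tilde{A}_1$ and $\tilde{A}_6$.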

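Step 7: Fuzzify the variations in Table 4. The corresponding fuzzy sets are shown in Table 5.

Table 5. Corresponding fuzzy set for each month.

| Month | Deseasonalized value | Variation | Fuzzy set |
|---|---|---|---|
| January 1998 | 26766 | – | – |
| February 1998 | 34197 | 7431 | $\tilde{A}_{15}$ |
| March 1998 | 29395 | -4802 | $\tilde{A}_{1}$ |
| April 1998 | 31404 | 2009 | $\tilde{A}_{9}$ |
| May 1998 | 28662 | -2742 | $\tilde{A}_{3}$ |
| June 1998 | 29382 | 720 | $\tilde{A}_{7}$ |
| July 1998 | 28902 | -480 | $\tilde{A}_{6}$ |
| August 1998 | 28308 | -594 | $\tilde{A}_{6}$ |
| September 1998 | 28774 | 466 | $\tilde{A}_{7}$ |
| October 1998 | 28796 | 22 | $\tilde{A}_{7}$ |
| November 1998 | 27166 | -1630 | $\tilde{A}_{5}$ |
| December 1998 | 28072 | 906 | $\tilde{A}_{8}$ |
| January 1999 | 29018 | 946 | $\tilde{A}_{8}$ |
| February 1999 | 28750 | -268 | $\tilde{A}_{6}$ |
| March 1999 | 28962 | 212 | $\tilde{A}_{7}$ |
| April 1999 | 32400 | 3438 | $\tilde{A}_{10}$ |
| May 1999 | 29463 | -2937 | $\tilde{A}_{3}$ |
| June 1999 | 29791 | 328 | $\tilde{A}_{7}$ |
| July 1999 | 30729 | 938 | $\tilde{A}_{8}$ |
| August 1999 | 31047 | 318 | $\tilde{A}_{7}$ |
| September 1999 | 28624 | -2423 | $\tilde{A}_{4}$ |
| October 1999 | 30702 | 2078 | $\tilde{A}_{9}$ |
| November 1999 | 32472 | 1770 | $\tilde{A}_{8}$ |
| December 1999 | 34382 | 1910 | $\tilde{A}_{9}$ |
| January 2000 | 33969 | -413 | $\tilde{A}_{6}$ |
| February 2000 | 32825 | -1144 | $\tilde{A}_{5}$ |
| March 2000 | 33245 | 420 | $\tilde{A}_{7}$ |
| April 2000 | 31811 | -1434 | $\tilde{A}_{5}$ |
| May 2000 | 34796 | 2985 | $\tilde{A}_{10}$ |
| June 2000 | 35507 | 711 | $\tilde{A}_{7}$ |
| July 2000 | 33681 | -1826 | $\tilde{A}_{4}$ |
| August 2000 | 34437 | 756 | $\tilde{A}_{7}$ |
| September 2000 | 35094 | 657 | $\tilde{A}_{7}$ |
| October 2000 | 34766 | -328 | $\tilde{A}_{6}$ |
| November 2000 | 33163 | -1603 | $\tilde{A}_{5}$ |
| December 2000 | 32632 | -531 | $\tilde{A}_{6}$ |
| January 2001 | 28537 | -4095 | $\tilde{A}_{2}$ |
| February 2001 | 37511 | 8974 | $\tilde{A}_{16}$ |
| March 2001 | 32194 | -5317 | $\tilde{A}_{1}$ |
| April 2001 | 29963 | -2231 | $\tilde{A}_{4}$ |
| May 2001 | 29272 | -691 | $\tilde{A}_{6}$ |
| June 2001 | 28588 | -684 | $\tilde{A}_{6}$ |
| July 2001 | 26528 | -2060 | $\tilde{A}_{4}$ |
| August 2001 | 28087 | 1559 | $\tilde{A}_{8}$ |
| September 2001 | 25080 | -3007 | $\tilde{A}_{3}$ |
| October 2001 | 26075 | 995 | $\tilde{A}_{8}$ |
| November 2001 | 25758 | -317 | $\tilde{A}_{6}$ |
| December 2001 | 25049 | -709 | $\tilde{A}_{6}$ |

Step 8: Suppose we want to calculate the forecasted production value in January 2000 ($Fv_{\text{January 2000}}$). Assume the window base is 3. The criterion matrix C(January 2000) is F(December 1999):

$$C(\text{January 2000}) = F(\text{December 1999}) = [\tilde{A}_9] = [\,0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0.5 \;\; 1 \;\; 0.5 \;\; 0 \;\; \cdots \;\; 0\,].$$

Meanwhile, the operation matrix is $O^3$(January 2000), which is composed of F(November 1999) and F(October 1999):

$$O^3(\text{January 2000}) = \begin{bmatrix} F(\text{November 1999}) \\ F(\text{October 1999}) \end{bmatrix} = \begin{bmatrix} \tilde{A}_8 \\ \tilde{A}_9 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 1 & 0.5 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 1 & 0.5 & 0 & \cdots & 0 \end{bmatrix}.$$

Use Eq. (8) to compute R(January 2000):

$$R(\text{January 2000}) = O^3(\text{January 2000}) \otimes C(\text{January 2000}) = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 0.5 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.25 & 1 & 0.25 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$

Find the maximum of each column of R(January 2000), which gives F(January 2000):

$$F(\text{January 2000}) = [\,0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0.5 \;\; 1 \;\; 0.25 \;\; 0 \;\; 0 \;\; \cdots \;\; 0\,].$$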

Step 9: In F(January 2000), there are three nonzero intervals: $u_8$, $u_9$, and $u_{10}$, whose midpoints are 1350, 2250, and 3150, respectively. The computation of $Cv_{\text{January 2000}}$ is shown as follows:

$$Cv_{\text{January 2000}} = \frac{0.5 \times 1350 + 1 \times 2250 + 0.25 \times 3150}{0.5 + 1 + 0.25} = 2121.4.$$

Next, use Eq. (11) to calculate the forecasted value $Fv_{\text{January 2000}}$:

$$Fv_{\text{January 2000}} = (Cv_{\text{January 2000}} + Ds_{\text{December 1999}}) \times Si_{\text{January 2000}} = (2121.4 + 34382) \times 0.923 = 33693.$$

Step 10: By using the systematic search algorithm, we find that the $MAD_w$ value is the smallest when the window base is 2. Based on this window base, we can calculate the best forecasted production value for all the data.

To evaluate the performance of the proposed method, four fuzzy time series methods (Chen, 1996; Cheng et al., 2008; Hwang et al., 1998; Lee & Chou, 2004) are employed to compare their forecasted results with those obtained by the proposed method, since the four methods have provided satisfactory forecasted results in their empirical analyses. In this case, we use the production value from January 2000 to December 2001 as the evaluation period for comparing the forecasted values between the four methods and the proposed method. The forecasted results of the five methods are shown in Table 6. Notice that the w value in Hwang et al.'s method is set between 2 and 4, since their method forecasts better within this interval in their own empirical analysis.

To evaluate the forecasting accuracy of the five methods, two evaluation indices, namely the mean absolute deviation (MAD) and the mean absolute percent error (MAPE), are employed for comparing the forecasted results between the proposed method and the four fuzzy forecasting methods. The formulas of both indices are given in Eq. (12).

Table 6. Forecasted production value of the five methods.

| Month | Production value | Chen's method | Hwang et al.'s method (at w = 2) | Hwang et al.'s method (at w = 3) | Hwang et al.'s method (at w = 4) | Lee and Chou's method | Cheng et al.'s method | The proposed method |
|---|---|---|---|---|---|---|---|---|
| January 2000 | 31357 | 27750 | 43639 | 43639 | 43639 | 30556 | 33876 | 33400 |
| February 2000 | 24747 | 30450 | 31357 | 31357 | 31357 | 30556 | 31358 | 25609 |
| March 2000 | 33509 | 29100 | 16347 | 16347 | 16347 | 29165 | 26530 | 32179 |
| April 2000 | 30847 | 32250 | 33509 | 33509 | 39509 | 33338 | 32429 | 31800 |
| May 2000 | 35893 | 30450 | 30847 | 26047 | 26047 | 30556 | 31103 | 32350 |
| June 2000 | 36340 | 38550 | 36493 | 41893 | 41893 | 31483 | 35672 | 35612 |
| July 2000 | 34777 | 38550 | 38740 | 36940 | 36940 | 31483 | 35895 | 36662 |
| August 2000 | 34825 | 38550 | 33577 | 33577 | 31777 | 31483 | 35460 | 34060 |
| September 2000 | 36153 | 38550 | 33625 | 35425 | 35425 | 31483 | 35484 | 35476 |
| October 2000 | 35552 | 38550 | 36753 | 36753 | 36753 | 31483 | 35802 | 36348 |
| November 2000 | 34718 | 38550 | 36152 | 36152 | 36152 | 31483 | 35848 | 36396 |
| December 2000 | 37432 | 38550 | 35318 | 35318 | 35318 | 33338 | 35431 | 37009 |
| January 2001 | 26343 | 27750 | 39832 | 39832 | 39832 | 29165 | 32873 | 29293 |
| February 2001 | 28280 | 29100 | 26343 | 26343 | 26343 | 29165 | 26817 | 21514 |
| March 2001 | 32450 | 27750 | 28280 | 30680 | 28880 | 27774 | 29306 | 37810 |
| April 2001 | 29055 | 32250 | 34850 | 34850 | 36650 | 33338 | 31900 | 31218 |
| May 2001 | 30195 | 27750 | 29655 | 27855 | 27855 | 27774 | 29694 | 30908 |
| June 2001 | 29259 | 30450 | 28995 | 30795 | 30795 | 30556 | 30021 | 28577 |
| July 2001 | 27391 | 30450 | 29859 | 29859 | 29859 | 30556 | 29796 | 29054 |
| August 2001 | 28403 | 27750 | 26191 | 26191 | 24391 | 27774 | 27852 | 25461 |
| September 2001 | 25837 | 27750 | 27203 | 29003 | 29003 | 27774 | 29368 | 28934 |
| October 2001 | 26665 | 29100 | 24637 | 22837 | 22837 | 29165 | 26564 | 25647 |
| November 2001 | 26966 | 27750 | 25465 | 27265 | 27265 | 27774 | 26978 | 27298 |
| December 2001 | 28733 | 27750 | 27566 | 27566 | 27566 | 27774 | 27640 | 30063 |

Table 7. MAD and MAPE values of the five methods for the production value example.

| Index | Chen’s method | Hwang et al.’s method (w = 2) | Hwang et al.’s method (w = 3) | Hwang et al.’s method (w = 4) | Lee and Chou’s method | Cheng et al.’s method | The proposed method |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MAD | 2675.1 | 3889.2 | 4292.1 | 4842.1 | 3030.1 | 2162.1 | 1862.4 |
| MAPE | 8.60% | 12.84% | 14.10% | 15.88% | 9.64% | 7.27% | 6.19% |

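$$\mathrm{MAD} = \frac{\sum_{t=1}^{n} \left| Fv_t - Sd_t \right|}{n}, \qquad \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| Fv_t - Sd_t \right|}{Sd_t}. \tag{12}$$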
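As a quick check of Eq. (12), the following Python sketch recomputes both indices for the proposed method from the Table 6 columns. It reads $Sd_t$ as the actual production value and $Fv_t$ as the forecast, which is an interpretation of the symbols rather than a definition given in this section. The function names `mad` and `mape` are illustrative.

```python
# Actual production values and the proposed method's forecasts,
# copied from Table 6 (January 2000 to December 2001).
actual = [31357, 24747, 33509, 30847, 35893, 36340, 34777, 34825,
          36153, 35552, 34718, 37432, 26343, 28280, 32450, 29055,
          30195, 29259, 27391, 28403, 25837, 26665, 26966, 28733]
forecast = [33400, 25609, 32179, 31800, 32350, 35612, 36662, 34060,
            35476, 36348, 36396, 37009, 29293, 21514, 37810, 31218,
            30908, 28577, 29054, 25461, 28934, 25647, 27298, 30063]


def mad(forecast, actual):
    """Mean absolute deviation: the first formula in Eq. (12)."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def mape(forecast, actual):
    """Mean absolute percentage error (as a fraction): the second formula in Eq. (12)."""
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)


print(f"MAD  = {mad(forecast, actual):.1f}")
print(f"MAPE = {100 * mape(forecast, actual):.2f}%")
```

The two printed values should agree, up to rounding, with the 1862.4 and 6.19% reported for the proposed method in Table 7.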

The results of the comparison are shown in Table 7. Table 7 indicates that the proposed method has the smallest MAD (1862.4) and MAPE (6.19%) among the five methods. Hence, both indices show that the proposed method forecasts the production value more accurately than the four fuzzy time series methods.

4.2. Sales volume of soft drinks

Fig. 2. Monthly sales volume of 32-oz soft drinks (unit: hundreds of cases). Line chart of monthly sales (vertical axis, 0 to 120) against month/year (horizontal axis, 1/72 to 9/75).

In order to further demonstrate the performance of the proposed method, we employ another seasonal time series data set, taken from

Table 8. Monthly sales volume of soft drinks.

| Month | Monthly sales | Month | Monthly sales |
| --- | --- | --- | --- |
| January 1972 | 28 | January 1974 | 45 |
| February 1972 | 31 | February 1974 | 49 |
| March 1972 | 36 | March 1974 | 57 |
| April 1972 | 43 | April 1974 | 68 |
| May 1972 | 46 | May 1974 | 78 |
| June 1972 | 52 | June 1974 | 80 |
| July 1972 | 55 | July 1974 | 88 |
| August 1972 | 59 | August 1974 | 90 |
| September 1972 | 58 | September 1974 | 84 |
| October 1972 | 55 | October 1974 | 80 |
| November 1972 | 47 | November 1974 | 57 |
| December 1972 | 40 | December 1974 | 60 |
| January 1973 | 35 | January 1975 | 52 |
| February 1973 | 40 | February 1975 | 60 |
| March 1973 | 46 | March 1975 | 66 |
| April 1973 | 55 | April 1975 | 80 |
| May 1973 | 60 | May 1975 | 85 |
| June 1973 | 68 | June 1975 | 95 |
| July 1973 | 72 | July 1975 | 100 |
| August 1973 | 75 | August 1975 | 104 |
| September 1973 | 70 | September 1975 | 101 |
| October 1973 | 66 | October 1975 | 94 |
| November 1973 | 58 | November 1975 | 81 |
| December 1973 | 50 | December 1975 | 70 |

Table 9. Seasonal index, deseasonalized value, and variation of each month.

| Month | Seasonal index | Deseasonalized value | Variation | Month | Seasonal index | Deseasonalized value | Variation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| January 1972 | 0.691 | 40.5 |  | January 1974 | 0.691 | 65.1 | 1.6 |
| February 1972 | 0.746 | 41.5 | 1.0 | February 1974 | 0.746 | 65.6 | 0.5 |
| March 1972 | 0.849 | 42.4 | 0.9 | March 1974 | 0.849 | 67.2 | 1.6 |
| April 1972 | 0.995 | 43.2 | 0.8 | April 1974 | 0.995 | 68.3 | 1.1 |
| May 1972 | 1.059 | 43.4 | 0.2 | May 1974 | 1.059 | 73.6 | 5.3 |
| June 1972 | 1.160 | 44.8 | 1.4 | June 1974 | 1.160 | 69.0 | −4.6 |
| July 1972 | 1.234 | 44.6 | −0.2 | July 1974 | 1.234 | 71.3 | 2.3 |
| August 1972 | 1.268 | 46.5 | 1.9 | August 1974 | 1.268 | 71.0 | −0.3 |
| September 1972 | 1.174 | 49.4 | 2.9 | September 1974 | 1.174 | 71.5 | 0.5 |
| October 1972 | 1.105 | 49.8 | 0.4 | October 1974 | 1.105 | 72.2 | 0.7 |
| November 1972 | 0.932 | 50.4 | 0.6 | November 1974 | 0.932 | 61.2 | −11.0 |
| December 1972 | 0.787 | 50.8 | 0.4 | December 1974 | 0.787 | 76.2 | 15.0 |
| January 1973 | 0.691 | 50.7 | −0.1 | January 1975 | 0.691 | 75.3 | −0.9 |
| February 1973 | 0.746 | 53.6 | 2.9 | February 1975 | 0.746 | 80.4 | 5.1 |
| March 1973 | 0.849 | 54.2 | 0.6 | March 1975 | 0.849 | 77.8 | −2.6 |
| April 1973 | 0.995 | 55.3 | 1.1 | April 1975 | 0.995 | 80.4 | 2.6 |
| May 1973 | 1.059 | 56.6 | 1.3 | May 1975 | 1.059 | 80.2 | −0.2 |
| June 1973 | 1.160 | 58.6 | 2.0 | June 1975 | 1.160 | 81.9 | 1.7 |
| July 1973 | 1.234 | 58.3 | −0.3 | July 1975 | 1.234 | 81.0 | −0.9 |
| August 1973 | 1.268 | 59.1 | 0.8 | August 1975 | 1.268 | 82.0 | 1.0 |
| September 1973 | 1.174 | 59.6 | 0.5 | September 1975 | 1.174 | 86.0 | 4.0 |
| October 1973 | 1.105 | 59.8 | 0.2 | October 1975 | 1.105 | 85.1 | −0.9 |
| November 1973 | 0.932 | 62.3 | 2.5 | November 1975 | 0.932 | 86.9 | 1.8 |
| December 1973 | 0.787 | 63.5 | 1.2 | December 1975 | 0.787 | 88.9 | 2.0 |
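The seasonal indices in Table 9 come from the ratio-to-moving-average method. The sketch below shows one common variant of that method in Python: a centered 12-month moving average, the median of the ratios for each calendar month, and a final normalization so the twelve indices sum to 12. Which variant the authors used is not stated here, so the median and the normalization are assumptions, and the function name `seasonal_indices` is illustrative. Small differences from the tabulated values are to be expected.

```python
from statistics import median

# Monthly sales from Table 8, January 1972 to December 1975.
sales = [28, 31, 36, 43, 46, 52, 55, 59, 58, 55, 47, 40,
         35, 40, 46, 55, 60, 68, 72, 75, 70, 66, 58, 50,
         45, 49, 57, 68, 78, 80, 88, 90, 84, 80, 57, 60,
         52, 60, 66, 80, 85, 95, 100, 104, 101, 94, 81, 70]


def seasonal_indices(series, period=12):
    """Ratio-to-moving-average seasonal indices (median variant, an assumption).

    Assumes the series starts at the first season (here, January).
    """
    half = period // 2
    ratios = [[] for _ in range(period)]
    for t in range(half, len(series) - half):
        # Centered moving average for an even period:
        # the mean of two adjacent period-length averages.
        first = sum(series[t - half:t + half]) / period
        second = sum(series[t - half + 1:t + half + 1]) / period
        ratios[t % period].append(series[t] / ((first + second) / 2))
    raw = [median(r) for r in ratios]
    scale = period / sum(raw)  # normalize so the indices average to 1
    return [x * scale for x in raw]


indices = seasonal_indices(sales)
# Deseasonalized value = observed value divided by its month's seasonal index.
deseasonalized = [s / indices[t % 12] for t, s in enumerate(sales)]

for month, idx in zip(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], indices):
    print(f"{month}: {idx:.3f}")
```

With four years of data, each calendar month contributes three ratios, so the median is simply the middle one.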

Table 10. Forecasted soft drinks of the five methods.

| Month | Monthly sales | Chen’s method | Hwang et al.’s method (w = 2) | Hwang et al.’s method (w = 3) | Hwang et al.’s method (w = 4) | Lee and Chou’s method | Cheng et al.’s method | The proposed method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| January 1974 | 45 | 54.5 | 42.5 | 42.5 | 42.5 | 53.8 | 52.0 | 45.3 |
| February 1974 | 49 | 43.5 | 37.5 | 37.5 | 37.5 | 41.7 | 47.8 | 49.7 |
| March 1974 | 57 | 43.5 | 48.5 | 48.5 | 48.5 | 53.8 | 49.8 | 56.6 |
| April 1974 | 68 | 54.5 | 63.5 | 63.5 | 63.5 | 53.8 | 58.0 | 67.8 |
| May 1974 | 78 | 65.5 | 78.0 | 78.0 | 78.0 | 66.0 | 70.3 | 74.0 |
| June 1974 | 80 | 71.0 | 88.0 | 84.5 | 84.5 | 72.1 | 77.5 | 85.4 |
| July 1974 | 88 | 71.0 | 83.0 | 83.0 | 83.0 | 72.1 | 80.0 | 85.1 |
| August 1974 | 90 | 87.5 | 91.0 | 94.5 | 94.5 | 90.4 | 89.3 | 90.4 |
| September 1974 | 84 | 87.5 | 93.0 | 89.5 | 89.5 | 90.4 | 90.3 | 83.3 |
| October 1974 | 80 | 87.5 | 80.0 | 80.0 | 80.0 | 72.1 | 82.0 | 79.0 |
| November 1974 | 57 | 71.0 | 72.5 | 72.5 | 72.5 | 72.1 | 80.0 | 67.9 |
| December 1974 | 60 | 54.5 | 42.5 | 42.5 | 42.5 | 53.8 | 58.0 | 48.2 |
| January 1975 | 52 | 54.5 | 60.0 | 56.0 | 56.0 | 66.0 | 59.5 | 52.7 |
| February 1975 | 60 | 54.5 | 48.0 | 48.0 | 44.5 | 53.8 | 53.0 | 56.2 |
| March 1975 | 66 | 54.5 | 59.5 | 63.0 | 63.0 | 66.0 | 59.5 | 68.2 |
| April 1975 | 80 | 65.5 | 72.5 | 72.5 | 72.5 | 66.0 | 69.3 | 77.4 |
| May 1975 | 85 | 71.0 | 90.0 | 86.5 | 90.0 | 72.1 | 80.0 | 85.2 |
| June 1975 | 95 | 87.5 | 95.0 | 91.5 | 91.5 | 90.4 | 82.5 | 93.0 |
| July 1975 | 100 | 87.5 | 101.5 | 101.5 | 101.5 | 90.4 | 92.8 | 101.7 |
| August 1975 | 104 | 87.5 | 106.5 | 106.5 | 106.5 | 96.4 | 100.0 | 103.4 |
| September 1975 | 101 | 87.5 | 110.5 | 110.5 | 110.5 | 96.4 | 102.0 | 96.3 |
| October 1975 | 94 | 87.5 | 104.0 | 104.0 | 104.0 | 96.4 | 100.5 | 95.0 |
| November 1975 | 81 | 87.5 | 90.0 | 90.0 | 90.0 | 90.4 | 92.3 | 79.3 |
| December 1975 | 70 | 71.0 | 70.0 | 70.0 | 70.0 | 72.1 | 80.5 | 68.8 |

the monthly sales volume of 32-oz soft drinks (Montgomery, Johnson, & Gardiner, 1990). The time series data from January 1972 to December 1975 are shown in Fig. 2, which shows that the series has strong seasonality and a growth trend. The forecasted monthly sales of soft drinks can be calculated as follows:

Steps 1–3: The time series data of the monthly sales volume are listed in Table 8. According to Table 8, we can apply the ratio-to-moving-average method to compute the seasonal index and the deseasonalized value for each month. The variations of the deseasonalized data are then calculated as shown in Table 9.

Step 4: Table 9 indicates that the maximum and the minimum of the variations are $15$ ($D_{\max}$) and $-11$ ($D_{\min}$), respectively. The universe of discourse $U$ can be defined as

$$U = [-11, 15].$$

Steps 5–6: According to Table 9, we can calculate the appropriate length of interval, which is 1. Assume $D_1 = 0$ and $D_2 = 0$. The number of intervals (fuzzy sets) becomes 26. The intervals are $u_1 = [-11, -10]$, $u_2 = [-10, -9]$, …, $u_{25} = [13, 14]$, and $u_{26} = [14, 15]$. The fuzzy sets can be defined as follows:

$$
\begin{aligned}
\tilde{A}_1 &= 1/u_1 + 0.5/u_2 + 0/u_3 + 0/u_4 + 0/u_5 + 0/u_6 + \cdots + 0/u_{26},\\
\tilde{A}_2 &= 0.5/u_1 + 1/u_2 + 0.5/u_3 + 0/u_4 + 0/u_5 + 0/u_6 + \cdots + 0/u_{26},\\
&\;\;\vdots\\
\tilde{A}_{25} &= 0/u_1 + 0/u_2 + 0/u_3 + \cdots + 0/u_{23} + 0.5/u_{24} + 1/u_{25} + 0.5/u_{26},\\
\tilde{A}_{26} &= 0/u_1 + 0/u_2 + 0/u_3 + \cdots + 0/u_{23} + 0/u_{24} + 0.5/u_{25} + 1/u_{26}.
\end{aligned}
$$

Table 11. MAD and MAPE values of the five methods for the soft drinks problem.

| Index | Chen’s method | Hwang et al.’s method (w = 2) | Hwang et al.’s method (w = 3) | Hwang et al.’s method (w = 4) | Lee and Chou’s method | Cheng et al.’s method | The proposed method |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MAD | 9.40 | 6.44 | 5.98 | 6.27 | 8.02 | 6.98 | 2.55 |
| MAPE | 12.69% | 9.58% | 8.83% | 9.24% | 11.34% | 9.93% | 3.67% |
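Steps 4–6 of the soft drinks example can be traced with a short Python sketch. It takes the deseasonalized values of Table 9 as given, derives the variations as month-to-month differences, and builds the universe of discourse, the intervals, and the membership pattern of the fuzzy sets. The interval length (1) and the margins $D_1 = D_2 = 0$ are taken from the text rather than computed. The helper `membership` is an illustrative name.

```python
# Deseasonalized values from Table 9, January 1972 to December 1975.
deseasonalized = [
    40.5, 41.5, 42.4, 43.2, 43.4, 44.8, 44.6, 46.5, 49.4, 49.8, 50.4, 50.8,
    50.7, 53.6, 54.2, 55.3, 56.6, 58.6, 58.3, 59.1, 59.6, 59.8, 62.3, 63.5,
    65.1, 65.6, 67.2, 68.3, 73.6, 69.0, 71.3, 71.0, 71.5, 72.2, 61.2, 76.2,
    75.3, 80.4, 77.8, 80.4, 80.2, 81.9, 81.0, 82.0, 86.0, 85.1, 86.9, 88.9,
]

# Variation = change in the deseasonalized value from one month to the next,
# rounded to one decimal as in Table 9.
variations = [round(b - a, 1) for a, b in zip(deseasonalized, deseasonalized[1:])]

d_min, d_max = min(variations), max(variations)  # Step 4: D_min and D_max
length = 1   # interval length stated in Steps 5-6 (not computed here)
d1 = d2 = 0  # margins D1 and D2 stated in the text

lower, upper = d_min - d1, d_max + d2  # universe of discourse U = [lower, upper]
n_intervals = round((upper - lower) / length)
intervals = [(lower + i * length, lower + (i + 1) * length)
             for i in range(n_intervals)]


def membership(i, j):
    """Degree of interval u_j in fuzzy set A_i (both 1-based):
    1 for the set's own interval, 0.5 for its neighbours, 0 elsewhere."""
    gap = abs(i - j)
    return 1.0 if gap == 0 else 0.5 if gap == 1 else 0.0


print(f"U = [{lower}, {upper}], {n_intervals} intervals")
print("u1 =", intervals[0], " u26 =", intervals[-1])
print("A1:", [membership(1, j) for j in range(1, 5)], "...")
print("A2:", [membership(2, j) for j in range(1, 5)], "...")
```

Note that the minimum variation only comes out as −11 when the signs of the variations are kept, which is why the differences are computed here instead of copied as magnitudes.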

Steps 7–10: According to Table 9, find the corresponding fuzzy sets, and then use Eqs. (6)–(11) to compute the forecasted monthly sales. By using the proposed search algorithm, we find that the smallest MAD value occurs when the window base is 2. We then apply this best window base (w = 2) to calculate the forecasted monthly sales for all the data.

To compare the forecasting accuracy, the monthly sales from January 1974 to December 1975 are selected as the evaluation period for comparing the proposed method with the four fuzzy forecasting methods. The forecasted results of the five methods are shown in Table 10, and their MAD and MAPE values are shown in Table 11. Table 11 indicates that both the MAD (2.55) and the MAPE (3.67%) of the proposed method are smaller than those obtained from the four methods. Thus, the proposed method provides more accurate forecasts of monthly sales.

5. Conclusion

In this study, we have developed an improved fuzzy time series method based on Hwang et al.’s (1998) method. This method can forecast seasonal patterns in data more accurately. Moreover, the two drawbacks of Hwang et al.’s (1998) method, namely the arbitrary interval length and the subjective window base, are both resolved. All these changes can improve the accuracy of the forecast. To evaluate the performance of the proposed method, two seasonal time series data sets are employed to compare the forecasting accuracy of the proposed method with that of four fuzzy time series methods. The results of the comparison (Tables 7 and 11) indicate that the proposed method yields better forecasts than the four methods in terms of the MAD and MAPE values. Consequently, the proposed method is capable of forecasting seasonal time series more accurately.

One suggestion may be addressed for future research. In the proposed method, we use the average-based length method to determine the appropriate interval length, and then use the systematic search algorithm to find the best window base. A more intelligent search method (e.g., a genetic algorithm) may be developed to consider both factors simultaneously.
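The window base search referred to above can be pictured as a loop that scores each candidate window base by its MAD and keeps the smallest. The Python sketch below assumes an exhaustive search over a small candidate list, which may differ from the authors' systematic search algorithm. The forecaster `toy_forecast` is a deliberately simple stand-in and does not implement Eqs. (6)–(11). The candidate list and all function names are hypothetical.

```python
def mad(forecast, actual):
    """Mean absolute deviation between forecasts and observations."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def toy_forecast(series, t, w):
    """Stand-in forecaster (NOT the paper's Eqs. (6)-(11)): the previous value
    plus the mean of the last w month-to-month changes."""
    diffs = [series[k] - series[k - 1] for k in range(t - w, t)]
    return series[t - 1] + sum(diffs) / w


def search_window_base(series, candidates, forecast_fn=toy_forecast):
    """Score every candidate window base by MAD and return the best one."""
    # Start where even the largest window has enough history, so that all
    # candidates are scored on exactly the same observations.
    start = max(candidates) + 1
    actual = series[start:]
    scores = {}
    for w in candidates:
        forecast = [forecast_fn(series, t, w) for t in range(start, len(series))]
        scores[w] = mad(forecast, actual)
    best = min(scores, key=scores.get)
    return best, scores


# Monthly sales for 1972-1973 from Table 8, used only as sample input.
sales = [28, 31, 36, 43, 46, 52, 55, 59, 58, 55, 47, 40,
         35, 40, 46, 55, 60, 68, 72, 75, 70, 66, 58, 50]

best, scores = search_window_base(sales, candidates=[2, 3, 4, 5])
for w, score in scores.items():
    print(f"w = {w}: MAD = {score:.2f}")
print("best window base:", best)
```

Scoring every candidate on the same set of months keeps the comparison fair. A joint search over interval length and window base, as suggested for future research, would extend the loop to pairs of settings.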

Acknowledgements

This research was partially supported by the National Science Council, Republic of China, under Grant NSC 92-2213-E-214-024.

References

Chen, S. M. (1996). Forecasting enrollments based on fuzzy time series. Fuzzy Sets and Systems, 81(3), 311–319.
Chen, S. M. (2002). Forecasting enrollments based on high-order fuzzy time-series. Cybernetics and Systems, 33(1), 1–16.
Chen, T. L., Cheng, C. H., & Teoh, H. J. (2007). Fuzzy time-series based on Fibonacci sequence for stock price forecasting. Physica A – Statistical Mechanics and Its Applications, 380, 377–390.
Chen, S. M., & Chung, N. Y. (2006). Forecasting enrollments using high-order fuzzy time series and genetic algorithms. International Journal of Intelligent Systems, 21(5), 485–501.
Cheng, C. H., Chen, T. L., Teoh, H. J., & Chiang, C. H. (2008). Fuzzy time-series based on adaptive expectation model for TAIEX forecasting. Expert Systems with Applications, 34(2), 1126–1132.
Chen, S. M., & Hwang, J. R. (2000). Temperature prediction using fuzzy time series. IEEE Transaction on Systems, Man, and Cybernetics, 30(2), 263–275.
Chen, S. R., & Wu, B. (2003). On optimal forecasting with soft computation for nonlinear time series. Fuzzy Optimization and Decision Making, 2, 215–228.
Hong, D. H. (2005). A note on fuzzy time-series model. Fuzzy Sets and Systems, 155, 309–316.
Huarng, K. H. (2001a). Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems, 123(3), 369–386.
Huarng, K. H. (2001b). Effective lengths of intervals to improve forecasting in fuzzy time series. Fuzzy Sets and Systems, 123(3), 387–394.
Huarng, K. H., & Yu, H. K. (2005). A type 2 fuzzy time series model for stock index forecasting. Physica A: Statistical Mechanics and its Applications, 353(1-4), 445–462.
Huarng, K. H., & Yu, T. H. K. (2006). Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems Man and Cybernetics Part B – Cybernetics, 36(2), 328–340.
Hwang, J. R., Chen, S. M., & Lee, C. H. (1998). Handling forecasting problems using fuzzy time series. Fuzzy Sets and Systems, 100(2), 217–228.
Overall planning department of the council for economic planning and development (1998–2001). Industrial Production Statistics Monthly Taiwan Area, Republic of China.
Lee, H. S., & Chou, M. T. (2004). Fuzzy forecasting based on fuzzy time series. International Journal of Computer Mathematics, 81(7), 781–789.
Lee, L. W., Wang, L. H., Chen, S. M., & Leu, Y. H. (2006). Handling forecasting problems based on two-factors high-order fuzzy time series. IEEE Transactions On Fuzzy Systems, 14(3), 468–477.
Li, S. T., & Cheng, Y. C. (2007). Deterministic fuzzy time series model for forecasting enrollments. Computers and Mathematics with Applications, 53(12), 1904–1920.
Mansfield, E. (1994). Statistics for business and economics: Methods and application. NY: W.W. Norton and Company.
Montgomery, D. C., Johnson, L. A., & Gardiner, J. S. (1990). Forecasting and time series analysis. NY: McGraw-Hill.
Own, C. M., & Yu, P. T. (2005). Forecasting fuzzy time series on a heuristic high-order model. Cybernetics and Systems, 36(7), 705–717.
Singh, S. R. (2007a). A simple time variant method for fuzzy time series forecasting. Cybernetics and Systems, 38(3), 305–321.
Singh, S. R. (2007b). A robust method of forecasting based on fuzzy time series. Applied Mathematics and Computation, 188(1), 472–484.
Song, Q., & Chissom, B. S. (1993a). Fuzzy forecasting enrollments with fuzzy time series – Part 1. Fuzzy Sets and Systems, 54(1), 1–9.
Song, Q., & Chissom, B. S. (1993b). Fuzzy time series and its models. Fuzzy Sets and Systems, 54(3), 269–277.
Song, Q., & Chissom, B. S. (1994). Fuzzy forecasting enrollments with fuzzy time series – Part 2. Fuzzy Sets and Systems, 62(1), 1–8.
Tsaur, R. C., Yang, J. C., & Wang, H. F. (2005). Fuzzy relation analysis in fuzzy time series model. Computers and Mathematics with Applications, 49(4), 539–548.
Yu, H. K. (2005). A refined fuzzy time-series model for forecasting. Physica A: Statistical Mechanics and its Applications, 346(3-4), 657–681.