A forecasting method for Chinese civil planes attendance rate based on vague sets


Chaos, Solitons and Fractals 000 (2016) 1–8

Contents lists available at ScienceDirect

Chaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena journal homepage: www.elsevier.com/locate/chaos

Shenghan Zhou, Chen Hu, Xiaoduo Qiao, Wenbing Chang∗

School of Reliability and System Engineering, Beihang University, Beijing, 100191, China

Article info

Article history: Received 30 January 2016 Revised 25 February 2016 Accepted 25 February 2016 Available online xxx

Abstract

This paper investigates the feasibility and efficiency of using vague sets to forecast the attendance rate of civil airplanes. First, methods that use fuzzy theory for forecasting are reviewed and some of their problems are pointed out. The concept of vague sets is then reviewed, and an improved vague forecasting method is presented to overcome the shortcomings of previous studies. Finally, a comparison between different methods is made to verify its effectiveness.

2010 MSC: 00-01 99-00

© 2016 Elsevier Ltd. All rights reserved.

Keywords: Civil planes attendance; Forecasting method; Vague set

∗ Corresponding author. Tel.: +86-1082313000. E-mail address: [email protected] (W. Chang).

1. Introduction

A time series, a concept derived from the theory of random processes and also called a dynamic sequence, is a set of numerical values arranged in time order that records the quantitative changes of a phenomenon as it develops over time. Forecasting means anticipating the course and outcome of a development, using certain methods and rules, on the basis of the information already available. Time series forecasting estimates the future values or the future trend of a series from its current and past observations.

Research on time series can be traced to the AR model proposed by Yule in 1927 [1], and the following 80 years of work laid a solid foundation for time series analysis. Traditional time series forecasting methods include exponential smoothing, seasonal models, and autoregressive moving average models. Exponential smoothing emerged in the 1960s from the work of Brown [2], Holt [3] and Winters [4]. Its subsequent development includes the classification into additive (linear) and multiplicative (non-linear) models put forward by Pegels [5], Muth's [6] demonstration that the method could be used for prediction, and the work of Box and Jenkins [7], Roberts [8], Abraham and Ledolter [9] and others, who placed exponential smoothing within a statistical framework. Building on this research, Gardner [10] extended Pegels's classification and proposed the damped trend; Snyder [11] proved that exponential smoothing arises from an innovations state space model. In 2003, Taylor et al. summarized the work on exponential smoothing into 15 different models, of which the most widely used are simple exponential smoothing (no trend, no seasonality), Holt's linear method (additive trend, no seasonality), the Holt-Winters additive model (additive trend, additive seasonality) and the Holt-Winters multiplicative model (additive trend, multiplicative seasonality). For seasonal models, a large body of research was carried out around X-11 and X-12 [12,13], on which follow-up studies were based.

Following Yule's view that every time series can be seen as the realization of a random process, other scholars proposed many time series models. Slutsky, Walker, Yaglom and Yule [14] were the first to formulate the concepts of autoregressive (AR) and moving average (MA) models. Box and Jenkins [1] integrated the results of previous studies and presented a coherent, general three-step cycle for time series forecasting, consisting of model identification, parameter estimation and model checking, which became central to modern time series analysis and forecasting and promoted the wide use and development of the ARIMA model. As the problems to be solved grew more complex, traditional time series forecasting models could not deal with fuzzy questions and required more complete historical data, which in real life

http://dx.doi.org/10.1016/j.chaos.2016.02.037 0960-0779/© 2016 Elsevier Ltd. All rights reserved.

Please cite this article as: S. Zhou et al., A forecasting method for Chinese civil planes attendance rate based on vague sets, Chaos, Solitons and Fractals (2016), http://dx.doi.org/10.1016/j.chaos.2016.02.037


is a very demanding condition, so the traditional time series forecasting models did not meet practical needs [15].

Fuzzy time series. To address this problem, Zadeh [16], an American professor of control theory, proposed fuzzy set theory and the concept of fuzzy logic in 1965, establishing the first model able to deal with uncertain and fuzzy linguistic variables. Later, in 1993, Gau and Buehrer [17] introduced vague sets, which can describe the proportions of votes in favor, against and abstaining in a voting model at the same time and thus provide more information for decision making. Fuzzy forecasting is a way of predicting and evaluating based on fuzzy mathematics; it resembles ordinary human reasoning and is suited to practical problems whose definitions are vague and non-quantitative. Fuzzy forecasting [18] requires establishing the evaluation factors, the weight set of each factor and the evaluation set made up of the possible evaluation results, and then uses synthetic operators to perform a comprehensive operation. Fuzzy evaluation generally includes selecting the index system, determining the standard values and the target set, standardizing the raw data, distributing the weights over the factors, selecting the composite operator, and calculating the indicator values and degrees of membership.

In 1993, Song and Chissom [19] applied fuzzy sets to time series and found that the approach worked better on practical problems than traditional time series; after forecasting student enrollments at the University of Alabama between 1971 and 1992, they formally put forward the fuzzy time series (FTS) model. After that, weighted average forecasting [20], second-order multi-attribute FTS [21], K-means based FTS [22] and other methods were presented to improve the FTS model, and later Li [23] proposed a vague forecasting method based on the previous FTS work.

By now, FTS has been successfully applied in many fields, such as enrollment forecasting, temperature prediction [24] and stock index forecasting [25]. The fuzzy interval is the foundation of a fuzzy time series model and strongly affects both the calculation process and the forecasting accuracy, so it is the focus of theoretical research on the model. When using a fuzzy time series forecasting model, one needs to consider the characteristics of the data and select an appropriate method for determining the fuzzy intervals so that the best results are obtained. Different fuzzy time series models require different fuzzy relation matrices and different forecasting rules. Song and Chissom went on to propose time-variant fuzzy time series for predicting dynamic phenomena. Hwang et al. [26] constructed a time-variant fuzzy time series forecasting model. Sullivan [27] gave a comprehensive description of the first-order time-variant and first-order time-invariant models and compared them with a first-order Markov model based on a linguistic scale. Fuzzy time series developed from low-order to high-order models in order to overcome the shortcomings of the low-order ones. Lee et al. [28] constructed high-order two-factor fuzzy logical relationships to improve forecast accuracy and applied them to practical prediction problems. Sah [29] proposed a new time-invariant approach that changes the available historical data used as the input of the fuzzy time series. In fact, most later research on fuzzy time series improves on the model proposed by Song et al. Furthermore, vague set theory has been used in many applications, such as satisfaction-based web service discovery and selection [30], approximation sets in rough approximation spaces [31], risk analysis of compressor systems [32,33], and uncertain query processing [34,35].

However, previous studies have neglected several points.
First, the upper and lower limits of the prediction bound tend to be set artificially from the historical data. This limits the values that can be predicted in the future: smaller or larger values cannot appear in the model at all, which is inconsistent with reality.

Second, the prediction subintervals are divided equally in previous studies. The values then fall densely into a few subintervals, so the fuzzy transfer relationships are relatively simple, which is not appropriate for prediction.

Third, some characteristics of the data, such as periodicity, are not taken into consideration, which may harm the forecasting accuracy.

Fourth, the data used to verify model validity are handled inappropriately: earlier approaches validated the model with the same data that were used to build the transfer relationships, so that, once the transfer relationships were determined, the forecasts could only move back and forth among a few values. The transfer relationship was therefore static rather than dynamic, as can be seen in other papers.

In this paper, we take the natural limits of the predicted index as the upper and lower limits of the prediction bound, and divide the subintervals unequally by referring to the evaluation scale of Dai et al. [36]. The characteristics of the index, including trend and periodicity, are then identified, and a comprehensive method is proposed to forecast the index. The main innovation of this paper is an improved vague forecasting method for predicting the attendance rate of Chinese civil planes.

2. Vague sets and vague time series

2.1. Basic definitions

Definition 1. Following [17], a vague set is a set of objects, each of which has a grade of membership whose value is a continuous subinterval of [0, 1]. Such a set is characterized by a truth-membership function and a false-membership function.

Definition 2.1. Let $U = \{u_1, u_2, \dots, u_n\}$ be the universe of discourse, with a generic element of $U$ denoted by $x$. A vague set $A$ in $U$ is characterized by a truth-membership function $t_A$ and a false-membership function $f_A$.
$t_A(x)$ is a lower bound on the degree of support for $x$, and $f_A(x)$ is a lower bound on the degree of opposition to $x$. Both $t_A(x)$ and $f_A(x)$ associate a real number in the interval $[0, 1]$ with each $x$:

$$t_A : U \to [0, 1], \qquad f_A : U \to [0, 1], \qquad 0 \le t_A(x) + f_A(x) \le 1.$$

The grade of membership of $x$ in the vague set $A$ is then bounded by the interval

$$[t_A(x),\ 1 - f_A(x)].$$

$\pi_A(u_i) = 1 - t_A(u_i) - f_A(u_i)$ indicates the degree of uncertainty, i.e. the abstentions in a vote. When the domain is continuous, the vague set $A$ can be written as

$$A = \int_U [t_A(x),\ 1 - f_A(x)]/x, \quad x \in U;$$

when the domain is discrete, it can be written as

$$A = \sum_{i=1}^{n} [t_A(x_i),\ 1 - f_A(x_i)]/x_i, \quad x_i \in U.$$

The larger $\pi_A(u_i)$ is, the more uncertain the information. For example, consider a voting model with ten persons, of whom three vote in favor, three vote against and the others abstain. The vague value is then $[t_A(x), 1 - f_A(x)] = [0.3, 0.7]$, in which $t_A(x) = 0.3$ means three in favor, $f_A(x) = 0.3$ means three against, and $1 - t_A(x) - f_A(x) = 0.4$ means four abstentions. As this shows, a vague set is better suited to expressing support and opposition together than the fuzzy set proposed by Zadeh.
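The voting interpretation can be made concrete with a small sketch. The class name `VagueValue` and the helper `from_votes` are hypothetical names introduced only for this illustration; they encode the constraint $0 \le t_A + f_A \le 1$ and the hesitation degree $\pi_A$ as defined above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VagueValue:
    """Vague value [t, 1 - f] with truth-membership t and false-membership f."""
    t: float  # lower bound on support
    f: float  # lower bound on opposition

    def __post_init__(self):
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("t and f must lie in [0, 1]")
        if self.t + self.f > 1.0 + 1e-12:
            raise ValueError("t + f must not exceed 1")

    @property
    def interval(self):
        """The subinterval [t, 1 - f] of [0, 1]."""
        return (self.t, 1.0 - self.f)

    @property
    def hesitation(self):
        """Degree of uncertainty pi = 1 - t - f (the abstentions)."""
        return 1.0 - self.t - self.f


def from_votes(in_favor, against, abstain):
    """Build a vague value from vote counts (hypothetical helper)."""
    total = in_favor + against + abstain
    return VagueValue(in_favor / total, against / total)


# The ten-person vote from the text: 3 in favor, 3 against, 4 abstentions.
v = from_votes(3, 3, 4)
print(v.interval, v.hesitation)
```

Keeping `t` and `f` as the stored fields, rather than the interval end points, makes the constraint $t_A + f_A \le 1$ easy to check at construction time.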


Definition 2.2. A vague set is empty if and only if its truth-membership and false-membership functions are identically zero on $U$.

Definition 2.3. The complement of a vague set $A$ is denoted by $\bar{A}$ and is defined by

$$t_{\bar{A}}(x) = f_A(x), \qquad 1 - f_{\bar{A}}(x) = 1 - t_A(x).$$

Definition 2.4. A vague set $A$ is contained in another vague set $B$, written $A \subseteq B$, if and only if

$$t_A \le t_B, \qquad 1 - f_A \le 1 - f_B.$$

A voting model illustrates this definition. Take $A = [0.3, 0.8]$ and $B = [0.4, 0.8]$. $[0.3, 0.8]$ can be interpreted as "the vote for a resolution is 3 in favor, 2 against, and 5 abstentions", and $[0.4, 0.8]$ as "the vote for a resolution is 4 in favor, 2 against, and 4 abstentions". The vote described by $B$ is clearly more favorable than the one described by $A$, which agrees with the two conditions above and with intuition.

Definition 2.5. Two vague sets $A$ and $B$ are equal, written $A = B$, if and only if $A \subseteq B$ and $B \subseteq A$; that is,

$$t_A = t_B, \qquad 1 - f_A = 1 - f_B.$$

Definition 2.6. The union of two vague sets $A$ and $B$ with respective truth-membership and false-membership functions $t_A$, $f_A$, $t_B$ and $f_B$ is a vague set $C$, written $C = A \cup B$, whose truth-membership and false-membership functions are related to those of $A$ and $B$ by

$$t_C = \max(t_A, t_B), \qquad 1 - f_C = \max(1 - f_A, 1 - f_B) = 1 - \min(f_A, f_B).$$

A time series is a sequence of values of a variable, observed in the natural or social sciences, arranged in time order at equal time intervals. For example, let $t$ be a time variable; the observations at a series of successive time points $t_1, t_2, \dots, t_n$ form $X(t) = \{x(t_1), x(t_2), \dots, x(t_n)\}$. We call $X(t)$ a time series, in which $x(t_i)$ is the value observed at time $t_i$. Time series are of two kinds, deterministic and statistical. If the future values of a time series can be described by a mathematical function, such as $s = vt$, it is a deterministic time series. If its future values can only be described by probability distributions, it is a statistical time series.

In 1993, Song and Chissom applied fuzzy sets to time series and found that the approach worked better on practical problems than traditional time series; after forecasting student enrollments at the University of Alabama between 1971 and 1992, they formally put forward the fuzzy time series model. Gau then proposed the vague set, which uses truth-membership, false-membership and hesitation degrees to express the fuzziness of information. The main distinction between vague time series and traditional time series is that the values of the former are vague sets while the values of the latter are real numbers. With the theoretical development of fuzzy sets and vague sets, vague sets have been found to have more advantages than fuzzy sets in expressing fuzzy information.

Definition 2.7. Vague sets are made up of objects that share a common kind of membership degree. Let $U(t) = \{u_1, u_2, \dots, u_n\}$ be the universe of discourse and let $y_i$ be a linguistic value of $Y$. The vague set $A_i$ can then be described as

$$A_i = \mu_{A_i}(u_1)/u_1 + \mu_{A_i}(u_2)/u_2 + \cdots + \mu_{A_i}(u_n)/u_n,$$

where $\mu_{A_i}$ is the membership function of the vague set $A_i$.

Definition 2.8. Let $Y(t) \subseteq R$ $(t = 1, 2, \dots, n)$ be the universe of discourse on which the vague sets $v_i(t)$ $(i = 1, 2, \dots, n)$ are defined, and let $V(t)$ be the collection of the $v_i(t)$. Then $V(t)$ is called a vague time series on $Y(t)$ $(t = 1, 2, \dots, n)$. If $V(t)$ is influenced by $V(t-1)$, we write $V(t-1) \to V(t)$.

Definition 2.9. Let $V(t)$ be a vague time series. $V(t)$ and $V(t-1)$ can be related by a vague relation $R(t, t-1)$ such that $V(t) = V(t-1) \circ R(t, t-1)$. If the vague relation $R(t, t-1)$ is independent of $t$, that is, $R(t_1, t_1 - 1) = R(t_2, t_2 - 1)$ for any $t_1$ and $t_2$, then $V(t)$ is called a time-invariant vague time series.

Definition 2.10. If $V(t)$ is correlated with the sequence of vague sets $V(t-n), V(t-n+1), \dots, V(t-1)$, then $V(t)$ is called an $n$th-order vague time series.

2.2. Vague time series forecasting

The method proposed by Song and Chissom uses the following model for forecasting the enrollments of the University of Alabama:

$$A_i = A_{i-1} \circ R,$$

where "$\circ$" is the max-min composition operator, $A_{i-1}$ is the known vague set of year $i-1$, $A_i$ is the vague set of year $i$ to be predicted from $A_{i-1}$, and $R$ is the vague relation, usually described as a vague matrix. $A(t)$ is said to be caused by $A(t-1)$:

$$A(t-1) \to A(t).$$

In fact, deriving the fuzzy relation $R$ is tedious work. If $R$ is very large, the max-min composition takes a large amount of time to compute. The fuzzy logical relationships are as follows:

$$A_1 \to A_1,\ A_1 \to A_2,\ A_2 \to A_3,\ A_3 \to A_3,\ A_3 \to A_4,$$
$$A_4 \to A_4,\ A_4 \to A_3,\ A_4 \to A_6,\ A_6 \to A_6,\ A_6 \to A_7.$$

Song and Chissom defined an operator "$\times$" on two vectors. Let $D$ and $B$ be row vectors of dimension $m$ and let $C = (c_{ij}) = D^T \times B$; then the element $c_{ij}$ of matrix $C$ at row $i$ and column $j$ is defined as

$$c_{ij} = \min(D_i, B_j), \quad i, j = 1, 2, \dots, m,$$

where $D_i$ and $B_j$ are the $i$th and the $j$th elements of $D$ and $B$, respectively, and $D^T$ is the transpose of $D$. Based on the above fuzzy logical relationships and the "$\times$" operator, the following relations are obtained:

$$R_1 = A_1^T \times A_1, \quad R_2 = A_1^T \times A_2, \quad R_3 = A_2^T \times A_3, \quad R_4 = A_3^T \times A_3, \quad R_5 = A_3^T \times A_4,$$
$$R_6 = A_4^T \times A_4, \quad R_7 = A_4^T \times A_3, \quad R_8 = A_4^T \times A_6, \quad R_9 = A_6^T \times A_6, \quad R_{10} = A_6^T \times A_7.$$
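The "×" operator and the relations built from it can be sketched in a few lines. The sketch also merges the individual relations by element-wise maximum (the usual union of fuzzy relations) and applies the max-min composition to obtain a forecast vector. The function names and the three-element membership vectors are hypothetical and chosen only for illustration; they are not the enrollment data.

```python
import numpy as np


def outer_min(d, b):
    """Song-Chissom 'x' operator: C[i, j] = min(d[i], b[j])."""
    return np.minimum.outer(np.asarray(d, dtype=float), np.asarray(b, dtype=float))


def union(relations):
    """Union of fuzzy relations, taken as the element-wise maximum."""
    return np.maximum.reduce([np.asarray(r, dtype=float) for r in relations])


def max_min_compose(a, r):
    """Max-min composition: out[j] = max_i min(a[i], r[i, j])."""
    a = np.asarray(a, dtype=float)
    return np.max(np.minimum(a[:, None], np.asarray(r, dtype=float)), axis=0)


# Hypothetical membership vectors over three intervals.
A1 = [1.0, 0.5, 0.0]
A2 = [0.5, 1.0, 0.5]
A3 = [0.0, 0.5, 1.0]

# Relations for the logical relationships A1 -> A1, A1 -> A2, A2 -> A3.
R = union([outer_min(A1, A1), outer_min(A1, A2), outer_min(A2, A3)])

# Forecast for the next period when the current state is A1.
forecast = max_min_compose(A1, R)
print(forecast)
```

Because every step is a minimum or a maximum, the result never leaves the set of membership grades already present in the inputs, which is one reason the composition is cheap but coarse.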


Finally, the fuzzy relation $R$ is obtained by performing the following operation:

$$R = \bigcup_{i=1}^{10} R_i,$$

where "$\bigcup$" is the union operator.

2.3. Forecasting model based on vague time series

The forecasting method based on vague time series is as follows:

Step 1: Partition the universe of discourse $U = [D_{\min} - D_1, D_{\max} + D_2]$ into intervals $u_1, u_2, \dots, u_m$ of equal length, where $D_{\min}$ and $D_{\max}$ are the minimum and maximum of the known historical data, and $D_1$ and $D_2$ are two proper positive numbers.

Step 2: Let $A_1, A_2, \dots, A_k$ be fuzzy sets which are linguistic values of the linguistic variable. Define the fuzzy sets $A_1, A_2, \dots, A_k$ on the universe of discourse $U$ as follows:

$$A_1 = a_{11}/u_1 + a_{12}/u_2 + \cdots + a_{1m}/u_m$$
$$A_2 = a_{21}/u_1 + a_{22}/u_2 + \cdots + a_{2m}/u_m$$
$$\vdots$$
$$A_k = a_{k1}/u_1 + a_{k2}/u_2 + \cdots + a_{km}/u_m$$

where $a_{ij} \in [0, 1]$, $1 \le i \le k$, $1 \le j \le m$. The value of $a_{ij}$ indicates the grade of membership of $u_j$ in the fuzzy set $A_i$.

Step 3: Divide the derived fuzzy logical relationships into groups based on the current state of the fuzzy logical relationships, which determines the vague matrix $E$. The specific relation matrix $A$ is obtained by counting the frequencies of the transitions between linguistic values. First, scale each row of $A$ by its row sum so that the entries of every row sum to 1, which gives a unit relation matrix $A'$. This does not yet take hesitation into account, so let $k$ ($0 \le k \le 1$) be the share assigned to hesitation: multiply the non-zero elements by $1 - k$, and distribute $k$ equally over the zero elements of the row to replace them. This yields a new unit hesitation matrix $A''$ in which the entries of every row still sum to 1.

Step 4: Calculate the forecast. First, take the actual linguistic value $A_n$ of year $i - 1$. Then find the elements $a_{n1}, a_{n2}, \dots, a_{nm}$ in the row of $A''$ that corresponds to $A_n$, and for each value $A_j$ compute $M[A_j]$, the midpoint of the interval $u_j$ corresponding to $A_j$. The forecast $D_i$ is then

$$D_i = a_{n1} \times M[A_1] + a_{n2} \times M[A_2] + \cdots + a_{nm} \times M[A_m].$$

Step 5: Test the result. The forecasting error is

$$FE = \frac{|F - AC|}{AC} \times 100,$$

and the average forecasting error is

$$AFE = \frac{\sum_{i=1}^{n} FE_i}{n},$$

where $FE$ is the forecasting error, $F$ is the forecast, $AC$ is the actual value, $AFE$ is the average forecasting error, and $n$ is the number of errors.

In Li's paper, the universe of discourse $U$ is determined differently: it is built from the amounts of increase or decrease of the historical data. The other steps are the same as above. This method gives better forecasting accuracy than the one proposed by Chen. However, Li's method still has shortcomings: it is not suitable when only a small amount of data and few intervals are available.

In this paper, we establish the forecasting model based on vague time series according to the following steps:

Step 1: Trend and periodicity identification. The characteristics of trend and periodicity are identified separately with different methods.

Step 1.1: Trend identification. The Kendall test is used to identify the trend [37].

$$M = \tau / \sigma_\tau,$$

where $\tau = 4S/(N(N-1)) - 1$ and $\sigma_\tau^2 = 2(2N+5)/(9N(N-1))$. $N$ is the length of the time series $X = x_1, x_2, \dots, x_N$, and $S$ is the number of pairs with $R_i < R_j$ ($1 \le i < j \le N$), where $R_i$, $R_j$ are the paired observations $x_i$, $x_j$. If $|M| > M_{\alpha/2} = 1.96$ and $M > 0$, the series has an ascending trend; if $|M| > M_{\alpha/2} = 1.96$ and $M < 0$, it has a descending trend. $\alpha$ is the significance level, here $\alpha = 0.05$.

Step 1.2: Periodicity identification. SPSS spectral analysis is used to identify the periodicity of the variable.

Step 2: Define the universe of discourse $U = [D_{\min}, D_{\max}]$. In previous studies, $D_{\min}$ and $D_{\max}$ are set by referring to the historical data, which obviously limits the predicted values. We instead determine the universe of discourse according to the practical situation.

Step 3: Define the linguistic terms $V_1, V_2, \dots, V_m$ to represent the intervals $u_1, u_2, \dots, u_m$, respectively:

$$V_i \to u_i.$$

Step 4: Match the real data with the linguistic terms: a value $x$ that falls in $u_i$ is assigned the term $V_i$, which gives the fuzzified data.

Step 5: Construct the fuzzy logical relationship matrix $E' = (e'_{ij})_{m \times m}$ from the fuzzified data obtained in Step 4, where $e'_{ij}$ denotes the number of transitions $V_i \to V_j$ between adjacent fuzzified historical data. Each row of the matrix is then normalized to give a new matrix $E = (e_{ij})_{m \times m}$, where

$$e_{ij} = \begin{cases} e'_{ij} \Big/ \sum_{j=1}^{m} e'_{ij}, & \text{if } \sum_{t=1}^{m} e'_{it} \ne 0, \\ 1/m, & \text{if } \sum_{t=1}^{m} e'_{it} = 0. \end{cases}$$
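Steps 1.1, 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Kendall statistic uses the usual centered form $\tau = 4S/(N(N-1)) - 1$ so that $M$ can take either sign, and a value lying exactly on an interior interval boundary is assigned to the upper interval.

```python
import math
from bisect import bisect_right


def kendall_statistic(series):
    """Step 1.1: standardized Kendall statistic M = tau / sigma_tau."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    # S counts the pairs (i, j) with i < j whose later value is larger.
    s = sum(1 for i in range(n) for j in range(i + 1, n) if series[i] < series[j])
    tau = 4.0 * s / (n * (n - 1)) - 1.0
    sigma = math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    return tau / sigma


def fuzzify(values, edges):
    """Step 4: index of the interval [edges[i], edges[i + 1]) holding each value."""
    m = len(edges) - 1
    states = []
    for x in values:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the universe of discourse")
        # The right end point of the universe belongs to the last interval.
        states.append(min(bisect_right(edges, x) - 1, m - 1))
    return states


def transition_matrix(states, m):
    """Step 5: row-normalized transition counts between adjacent states."""
    counts = [[0] * m for _ in range(m)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets the uniform row 1/m.
        matrix.append([c / total for c in row] if total else [1.0 / m] * m)
    return matrix


# Hypothetical toy data, not taken from the paper.
values = [0.5, 1.5, 1.2, 2.5, 0.1]
edges = [0.0, 1.0, 2.0, 3.0]
states = fuzzify(values, edges)            # [0, 1, 1, 2, 0]
E = transition_matrix(states, m=3)
M = kendall_statistic(list(range(10)))     # strictly increasing series
print(states, E, M > 1.96)
```

The pair count in `kendall_statistic` is quadratic in the series length, which is acceptable for monthly series of a few hundred points.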

Each entry $e_{ij}$ can thus be read as the transition probability of $V_i \to V_j$.

Step 6: Data vaguefication. An abstention factor $k$ ($0 \le k \le 1$) is introduced to express the uncertainty of the transition probabilities, which gives the vaguefied matrix $W' = (w'_{ij})_{m \times m}$:

$$w'_{ij} = \begin{cases} [0,\ k/N_1], & \sum_{t=1}^{m} e_{it} \ne 0 \text{ and } e_{ij} = 0, \\ [(1 - k)e_{ij},\ e_{ij}], & \sum_{t=1}^{m} e_{it} \ne 0 \text{ and } e_{ij} \ne 0, \end{cases} \tag{1}$$

where $N_1$ denotes the number of entries with $e_{ij} = 0$ in row $i$ of matrix $E$.

Step 7: Data forecasting. Suppose the linguistic term of the $(n-1)$th datum is $V_i$; the predicted value of the $n$th datum is obtained from

$$x_n = \sum_{j=1}^{m} w_{ij} \bar{x}_j, \tag{2}$$

where $w_{ij}$ is the weight obtained from $w'_{ij}$ after normalization and $\bar{x}_j = (x_j^{\max} + x_j^{\min})/2$ is the midpoint of the interval $u_j$. Different values of $k$ give different results, and the one with the least error is chosen.

Step 8: Error analysis. The error of a single predicted value is given by Eq. (3), and the average error by Eq. (4):

$$SE_i = \frac{|FV - AV|}{AV} \times 100\%, \tag{3}$$

$$AE = \frac{\sum_{i=1}^{n} SE_i}{n} \times 100\%. \tag{4}$$
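Steps 6 to 8 can be sketched as below. The function names and the toy matrix are hypothetical. One point is an assumption rather than something the text fixes: to obtain point weights for Eq. (2), the sketch takes the lower bound $(1-k)e_{ij}$ for non-zero entries and the upper bound $k/N_1$ for zero entries. Errors are returned as fractions rather than percentages.

```python
def vaguefy(e, k):
    """Step 6, Eq. (1): replace each probability by an interval (lower, upper)."""
    w_prime = []
    for row in e:
        zeros = sum(1 for p in row if p == 0)
        w_prime.append(
            [(0.0, k / zeros) if p == 0 else ((1 - k) * p, p) for p in row]
        )
    return w_prime


def point_weights(e, k):
    """ASSUMPTION: lower bound for non-zero entries, upper bound for zeros."""
    weights = []
    for row in e:
        zeros = sum(1 for p in row if p == 0)
        weights.append([k / zeros if p == 0 else (1 - k) * p for p in row])
    return weights


def forecast(weights, state, intervals):
    """Step 7, Eq. (2): weighted sum of the interval midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    return sum(w * mid for w, mid in zip(weights[state], midpoints))


def single_error(fv, av):
    """Step 8, Eq. (3), as a fraction."""
    return abs(fv - av) / av


def average_error(fvs, avs):
    """Step 8, Eq. (4), as a fraction."""
    return sum(single_error(f, a) for f, a in zip(fvs, avs)) / len(avs)


# Hypothetical row-normalized transition matrix and intervals.
E = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]
intervals = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

k = 0.1
W_prime = vaguefy(E, k)
W = point_weights(E, k)
x_next = forecast(W, state=0, intervals=intervals)
print(W_prime[0], W[0], x_next)
```

With these weights, a row that contains at least one zero entry sums to 1, while a row without zeros sums to $1 - k$; trying several values of $k$ and keeping the one with the smallest `average_error` mirrors the selection described in Step 7.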


Table 1
Attendance rate statistics of Chinese airlines.

Time    Rate      Time    Rate      Time    Rate
200801  75.00%    200911  78.20%    201109  82.10%
200802  75.10%    200912  75.10%    201110  82.60%
200803  76.10%    201001  76.80%    201111  80.20%
200804  76.00%    201002  79.30%    201112  78.60%
200805  71.40%    201003  80.60%    201201  81.30%
200806  72.50%    201004  79.20%    201202  79.90%
200807  75.10%    201005  77.00%    201203  75.20%
200808  74.20%    201006  80.80%    201204  78.40%
200809  75.50%    201007  83.40%    201205  79.00%
200810  77.10%    201008  84.20%    201206  81.70%
200811  74.70%    201009  81.10%    201207  84.20%
200812  72.20%    201010  83.00%    201208  84.50%
200901  75.20%    201011  77.60%    201209  81.70%
200902  77.50%    201012  79.50%    201210  80.70%
200903  75.00%    201101  79.30%    201211  79.60%
200904  76.40%    201102  81.10%    201212  79.40%
200905  72.20%    201103  81.00%    201301  78.20%
200906  73.80%    201104  83.20%    201302  84.20%
200907  76.60%    201105  81.30%    201303  83.60%
200908  79.90%    201106  81.80%    201304  81.20%
200909  74.20%    201107  85.60%    201305  79.70%
200910  79.70%    201108  85.00%

Data source: CAAC official website: http://www.caac.gov.cn.

In Eqs. (3) and (4), FV denotes the forecast value and AV denotes the actual value.

Step 9: Depending on the practical situation, Steps 1 to 7 are repeated to obtain another result $FV_P$. After comparing the errors, the two results are given different linear weights, so the final result is obtained as their linear combination.

3. Attendance rate forecasting

In this section, the main steps of the vague-based forecasting model are applied.

Step 1: Trend and periodicity identification. The characteristics of trend and periodicity are identified separately with different methods. A part of the original data is given in Table 1; in fact, the data from 200601 to 201305 are used for forecasting.

Step 1.1: Trend identification. For the attendance rate data of Chinese airlines, the test results are shown in Fig. 1. The figure shows that after 2010, $|M| > M_{\alpha/2} = 1.96$, which indicates a significant trend in the attendance rate of Chinese civil airlines.

Fig. 1. Trend identification of attendance rate.

Step 1.2: Periodicity identification. SPSS [38] spectral analysis is used to identify the periodicity of the attendance rate of Chinese civil airlines. The result is shown in Fig. 2. The peak lies between 0.05 and 0.1, so the period is between 10 and 20 months, which agrees with the actual 12-month period of the attendance rate. The attendance rate is therefore affected by both trend and periodicity, and forecasting from the trend alone, as previous studies do, is far from enough.

Step 2: Define the universe of discourse $U = [D_{\min}, D_{\max}]$. In previous studies, $D_{\min}$ and $D_{\max}$ are set by referring to the historical data, which obviously limits the predicted values. Here, the GM(1, 1) model is applied to define the intervals. For GM(1, 1), the original data series $X^{(0)}(i)$, $i = 1, 2, \dots, n$, is accumulated into a new series $X^{(1)}(j)$, $j = 1, 2, \dots, n$, as follows:

$$X^{(1)}(j) = \sum_{i=1}^{j} X^{(0)}(i), \quad j = 1, 2, \dots, n.$$

The estimated response function is established on the GM(1, 1) model equation

$$dX^{(1)}/dt + aX^{(1)} = b,$$

where $a$ and $b$ are the parameters to be estimated, obtained from

$$(a, b)^T = (B^T B)^{-1} B^T Y_n,$$

with the matrix $B$ and the vector $Y_n$ given by

$$B = \begin{bmatrix} -\left(X^{(1)}(1) + X^{(1)}(2)\right)/2 & 1 \\ -\left(X^{(1)}(2) + X^{(1)}(3)\right)/2 & 1 \\ \vdots & \vdots \\ -\left(X^{(1)}(n-1) + X^{(1)}(n)\right)/2 & 1 \end{bmatrix}, \qquad Y_n = \left[X^{(0)}(2), X^{(0)}(3), \dots, X^{(0)}(n)\right]^T.$$

The solution of the equation is

$$X^{(1)}(t + 1) = \left(X^{(0)}(1) - b/a\right)e^{-at} + b/a,$$

which is the accumulated value of the predicted data series. The actual forecast is then

$$\bar{X}^{(0)}(t + 1) = X^{(1)}(t + 1) - X^{(1)}(t).$$

The parameter $v$ is set as the ratio of the actual value to the forecast:

$$v = X^{(0)}(t) / \bar{X}^{(0)}(t), \quad t = 1, 2, \dots, n.$$

From the real data, we obtain the parameters $a$ and $b$ and the fitted equation:

$$a = -0.0014, \quad b = 0.731, \quad X^{(1)}(t + 1) = 522.835e^{0.0014t} - 522.143.$$

The interval of the parameter $v$ is $[0.91, 1.09]$. On this basis, we divide the interval as

$$u_1 = [0.91, 0.955], \quad u_2 = [0.955, 0.985], \quad u_3 = [0.985, 0.99],$$
$$u_4 = [0.99, 1.005], \quad u_5 = [1.005, 1.035], \quad u_6 = [1.035, 1.09].$$
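The GM(1, 1) fit used in Step 2 can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the parameters are estimated with `numpy.linalg.lstsq`, which returns the same least-squares solution as $(B^T B)^{-1} B^T Y_n$, and the short input series is simply the 2008 part of Table 1 rather than the full 200601 to 201305 sample, so the fitted values will differ from $a = -0.0014$ and $b = 0.731$.

```python
import numpy as np


def fit_gm11(x0):
    """Least-squares estimate of (a, b) in dX1/dt + a*X1 = b."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                       # accumulated series X^(1)
    z = 0.5 * (x1[:-1] + x1[1:])             # means of neighbouring X^(1) values
    B = np.column_stack([-z, np.ones_like(z)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return float(a), float(b)


def predict_gm11(x0, a, b, steps=0):
    """Fitted values for the known points plus `steps` values ahead."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0) + steps
    t = np.arange(n)
    x1_hat = (x0[0] - b / a) * np.exp(-a * t) + b / a
    x0_hat = np.empty(n)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)             # de-accumulate to get X^(0)
    return x0_hat


# Attendance rates for 200801 to 200812 from Table 1, as fractions.
rates_2008 = [0.750, 0.751, 0.761, 0.760, 0.714, 0.725,
              0.751, 0.742, 0.755, 0.771, 0.747, 0.722]

a, b = fit_gm11(rates_2008)
fitted = predict_gm11(rates_2008, a, b)
v = np.asarray(rates_2008) / fitted          # ratio of actual value to forecast
print(a, b, v.min(), v.max())
```

The range of `v` over the fitting sample is what determines the universe of discourse for the ratio, which is then split into the unequal intervals $u_1, \dots, u_6$.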


Step 3: As described above, define the linguistic terms $V_1, V_2, \dots, V_m$ (here $m = 6$) to represent the intervals $u_1, u_2, \dots, u_m$, respectively:

$$V_i \to u_i.$$

Step 4: Match the real data with the linguistic terms: a value that falls in $u_i$ is assigned the term $V_i$, which gives the fuzzified version of the data in Table 1.

Step 5: Following the rules above, the transition matrix for the attendance rate forecasting is

$$E = \begin{bmatrix}
3/10 & 3/10 & 1/10 & 1/10 & 2/10 & 0 \\
2/20 & 7/20 & 2/20 & 5/20 & 3/20 & 1/20 \\
3/6 & 0 & 0 & 0 & 1/6 & 2/6 \\
1/15 & 3/15 & 2/15 & 5/15 & 3/15 & 1/15 \\
1/23 & 5/23 & 1/23 & 3/23 & 9/23 & 4/23 \\
0 & 3/13 & 0 & 0 & 5/13 & 5/13
\end{bmatrix}$$

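Fig. 2. Periodicity identification of attendance rate.

Step 6: For this example, the vaguefied matrix $W' = (w'_{ij})_{6 \times 6}$ is

$$W' = \begin{bmatrix}
\left[\tfrac{3(1-k)}{10}, \tfrac{3}{10}\right] & \left[\tfrac{3(1-k)}{10}, \tfrac{3}{10}\right] & \left[\tfrac{1-k}{10}, \tfrac{1}{10}\right] & \left[\tfrac{1-k}{10}, \tfrac{1}{10}\right] & \left[\tfrac{2(1-k)}{10}, \tfrac{2}{10}\right] & [0, k] \\
\left[\tfrac{2(1-k)}{20}, \tfrac{2}{20}\right] & \left[\tfrac{7(1-k)}{20}, \tfrac{7}{20}\right] & \left[\tfrac{2(1-k)}{20}, \tfrac{2}{20}\right] & \left[\tfrac{5(1-k)}{20}, \tfrac{5}{20}\right] & \left[\tfrac{3(1-k)}{20}, \tfrac{3}{20}\right] & \left[\tfrac{1-k}{20}, \tfrac{1}{20}\right] \\
\left[\tfrac{3(1-k)}{6}, \tfrac{3}{6}\right] & \left[0, \tfrac{k}{3}\right] & \left[0, \tfrac{k}{3}\right] & \left[0, \tfrac{k}{3}\right] & \left[\tfrac{1-k}{6}, \tfrac{1}{6}\right] & \left[\tfrac{2(1-k)}{6}, \tfrac{2}{6}\right] \\
\left[\tfrac{1-k}{15}, \tfrac{1}{15}\right] & \left[\tfrac{3(1-k)}{15}, \tfrac{3}{15}\right] & \left[\tfrac{2(1-k)}{15}, \tfrac{2}{15}\right] & \left[\tfrac{5(1-k)}{15}, \tfrac{5}{15}\right] & \left[\tfrac{3(1-k)}{15}, \tfrac{3}{15}\right] & \left[\tfrac{1-k}{15}, \tfrac{1}{15}\right] \\
\left[\tfrac{1-k}{23}, \tfrac{1}{23}\right] & \left[\tfrac{5(1-k)}{23}, \tfrac{5}{23}\right] & \left[\tfrac{1-k}{23}, \tfrac{1}{23}\right] & \left[\tfrac{3(1-k)}{23}, \tfrac{3}{23}\right] & \left[\tfrac{9(1-k)}{23}, \tfrac{9}{23}\right] & \left[\tfrac{4(1-k)}{23}, \tfrac{4}{23}\right] \\
\left[0, \tfrac{k}{3}\right] & \left[\tfrac{3(1-k)}{13}, \tfrac{3}{13}\right] & \left[0, \tfrac{k}{3}\right] & \left[0, \tfrac{k}{3}\right] & \left[\tfrac{5(1-k)}{13}, \tfrac{5}{13}\right] & \left[\tfrac{5(1-k)}{13}, \tfrac{5}{13}\right]
\end{bmatrix}$$

Then, after data normalization, we have the adjusted transition matrix $W = (w_{ij})_{6 \times 6}$:

$$W = \begin{bmatrix}
\tfrac{3(1-k)}{10} & \tfrac{3(1-k)}{10} & \tfrac{1-k}{10} & \tfrac{1-k}{10} & \tfrac{2(1-k)}{10} & k \\
\tfrac{2(1-k)}{20} & \tfrac{7(1-k)}{20} & \tfrac{2(1-k)}{20} & \tfrac{5(1-k)}{20} & \tfrac{3(1-k)}{20} & \tfrac{1-k}{20} \\
\tfrac{3(1-k)}{6} & \tfrac{k}{3} & \tfrac{k}{3} & \tfrac{k}{3} & \tfrac{1-k}{6} & \tfrac{2(1-k)}{6} \\
\tfrac{1-k}{15} & \tfrac{3(1-k)}{15} & \tfrac{2(1-k)}{15} & \tfrac{5(1-k)}{15} & \tfrac{3(1-k)}{15} & \tfrac{1-k}{15} \\
\tfrac{1-k}{23} & \tfrac{5(1-k)}{23} & \tfrac{1-k}{23} & \tfrac{3(1-k)}{23} & \tfrac{9(1-k)}{23} & \tfrac{4(1-k)}{23} \\
\tfrac{k}{3} & \tfrac{3(1-k)}{13} & \tfrac{k}{3} & \tfrac{k}{3} & \tfrac{5(1-k)}{13} & \tfrac{5(1-k)}{13}
\end{bmatrix}$$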
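As a numerical check of the adjusted matrix, the sketch below rebuilds $W$ from transition counts with exact fractions and evaluates Eq. (2) on the interval midpoints. Several things here are assumptions and not statements from the paper: the names are hypothetical, the count in the first row, fifth column is taken as 2 so that the row total is 10 (as the entry $2(1-k)/10$ of $W$ suggests), and the lower edge of $u_6$ is taken as 1.035 so that the intervals are contiguous.

```python
from fractions import Fraction

# Transition counts behind E (first row, fifth column assumed to be 2).
COUNTS = [
    [3, 3, 1, 1, 2, 0],
    [2, 7, 2, 5, 3, 1],
    [3, 0, 0, 0, 1, 2],
    [1, 3, 2, 5, 3, 1],
    [1, 5, 1, 3, 9, 4],
    [0, 3, 0, 0, 5, 5],
]

# Interval edges for the ratio v (lower edge of u6 assumed to be 1.035).
EDGES = [Fraction(s) for s in
         ("0.91", "0.955", "0.985", "0.99", "1.005", "1.035", "1.09")]
MIDPOINTS = [(lo + hi) / 2 for lo, hi in zip(EDGES, EDGES[1:])]


def adjusted_matrix(counts, k):
    """Non-zero entries become (1 - k) * e_ij, zero entries become k / N1."""
    rows = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # No observed transition out of this state: uniform row, as in Step 5.
            rows.append([Fraction(1, len(row))] * len(row))
            continue
        zeros = row.count(0)
        rows.append(
            [k / zeros if c == 0 else (1 - k) * Fraction(c, total) for c in row]
        )
    return rows


k = Fraction(1, 100)
W = adjusted_matrix(COUNTS, k)
row_sums = [sum(row) for row in W]

# Eq. (2): expected ratio v for each current linguistic term V_1, ..., V_6.
expected_v = [sum(w * mid for w, mid in zip(row, MIDPOINTS)) for row in W]
print(row_sums)
print([float(x) for x in expected_v])
```

The row sums show a property of the construction: rows with at least one zero entry sum to 1, whereas rows without zeros (the second, fourth and fifth) sum to $1 - k$.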

Following Step 7 and Step 8, we obtain the error of each single predicted value and the average error.

Step 9: The forecast obtained so far, $FV_T$, mainly reflects the trend of the data, but the attendance rate is also affected by periodicity. To capture it, another historical series is extracted from the original series by taking one value every $T$ data points ($T$ being the period of the time series); repeating Steps 1 to 7 on it gives another result, $FV_P$. After comparing the errors, the two results are given different linear weights, and the final result is obtained as their linear combination.

4. Result analysis

Table 2. Attendance Rate Forecasting (FVT).

| Time | AV | FVT (k = 0.1) | FVT (k = 0.05) | FVT (k = 0.01) |
| --- | --- | --- | --- | --- |
| 201306 | 82.10% | 0.7364 | 0.7773 | 0.8081 |
| 201307 | 82% | 0.7434 | 0.7847 | 0.8176 |
| 201308 | 84.90% | 0.8226 | 0.8465 | 0.8477 |
| 201309 | 81.60% | 0.7551 | 0.797 | 0.8301 |
| 201310 | 81.00% | 0.74 | 0.7811 | 0.8139 |
| 201311 | 78.10% | 0.7408 | 0.7811 | 0.8149 |
| 201312 | 80.10% | 0.821 | 0.8174 | 0.8144 |
| 201401 | 81.30% | 0.7435 | 0.7848 | 0.8179 |
| 201402 | 85.60% | 0.7444 | 0.7858 | 0.8188 |
| 201403 | 81.00% | 0.7614 | 0.8037 | 0.8371 |

With the above model, we forecast the data between 201306 and 201403; the results are shown in Table 2.
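The comparison of the three values of k can be reproduced from Table 2 with a few lines of Python. This is a minimal sketch, assuming that the single-value error is the relative error |FV − AV| / AV and that the average error is its mean over the ten periods; the names `average_error`, `avg_errors` and `best_k` are illustrative. Under this assumption the k = 0.01 column gives about 0.0185, which agrees with the value of AET reported in the text.

```python
# Actual values (AV) and trend forecasts (FVT) for 201306..201403, from Table 2.
AV = [0.8210, 0.82, 0.8490, 0.8160, 0.8100, 0.7810, 0.8010, 0.8130, 0.8560, 0.8100]
FVT = {
    0.1: [0.7364, 0.7434, 0.8226, 0.7551, 0.74, 0.7408, 0.821, 0.7435, 0.7444, 0.7614],
    0.05: [0.7773, 0.7847, 0.8465, 0.797, 0.7811, 0.7811, 0.8174, 0.7848, 0.7858, 0.8037],
    0.01: [0.8081, 0.8176, 0.8477, 0.8301, 0.8139, 0.8149, 0.8144, 0.8179, 0.8188, 0.8371],
}


def average_error(forecast, actual):
    """Mean of the relative errors |FV - AV| / AV over all periods."""
    errors = [abs(f - a) / a for f, a in zip(forecast, actual)]
    return sum(errors) / len(errors)


# Average error for each candidate k, and the k with the smallest error.
avg_errors = {k: average_error(fv, AV) for k, fv in FVT.items()}
best_k = min(avg_errors, key=avg_errors.get)
```

On these data the average error decreases as k decreases, so `best_k` is 0.01.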

After error calculation, we find that the error is smallest when k = 0.01, so the periodicity-based results for this value are given in Table 3.

Table 3. Attendance Rate Forecasting (FVP).

| Time | AV | FVP (k = 0.01) |
| --- | --- | --- |
| 201306 | 82.10% | 0.8141 |
| 201307 | 82% | 0.8332 |
| 201308 | 84.90% | 0.8882 |
| 201309 | 81.60% | 0.832 |
| 201310 | 81.00% | 0.8279 |
| 201311 | 78.10% | 0.8189 |
| 201312 | 80.10% | 0.819 |
| 201401 | 81.30% | 0.81576 |
| 201402 | 85.60% | 0.846 |
| 201403 | 81.00% | 0.8159 |

The average errors are AET = 0.0185 for the trend forecast and AEP = 0.0206 for the periodicity forecast, from which the final result can be given. A comparison between different methods is made in Table 4.

Table 4. Attendance Rate Forecasting Comparison between methods.

| Period | AV | S and C: FV | S and C: SE (%) | Li: FV | Li: SE (%) |
| --- | --- | --- | --- | --- | --- |
| 201306 | 82.10% | 0.797606 | 2.85 | 0.797947 | 2.81 |
| 201307 | 82% | 0.797706 | 2.72 | 0.798148 | 2.66 |
| 201308 | 84.90% | 0.797989 | 6.01 | 0.798333 | 5.97 |
| 201309 | 81.60% | 0.798158 | 2.19 | 0.798504 | 2.14 |
| 201310 | 81.00% | 0.798215 | 1.45 | 0.798663 | 1.40 |
| 201311 | 78.10% | 0.798460 | 2.24 | 0.798810 | 2.28 |
| 201312 | 80.10% | 0.798596 | 0.30 | 0.798946 | 0.26 |
| 201401 | 81.30% | 0.798722 | 1.76 | 0.799074 | 1.71 |
| 201402 | 85.60% | 0.798841 | 6.68 | 0.799194 | 6.64 |
| 201403 | 81.00% | 0.798951 | 1.36 | 0.799306 | 1.32 |
| AE (%) | | | 2.76 | | 2.72 |

| Period | AV | Local method: FVT | Local method: FVP | Local method: FV | Local method: SE (%) |
| --- | --- | --- | --- | --- | --- |
| 201306 | 82.10% | 0.8081 | 0.8141 | 0.81104 | 1.21 |
| 201307 | 82% | 0.8176 | 0.8332 | 0.825244 | 0.94 |
| 201308 | 84.90% | 0.8477 | 0.8882 | 0.854948 | 3.82 |
| 201309 | 81.60% | 0.8301 | 0.832 | 0.831031 | 1.84 |
| 201310 | 81.00% | 0.8139 | 0.8279 | 0.82076 | 1.33 |
| 201311 | 78.10% | 0.8149 | 0.8189 | 0.81686 | 4.59 |
| 201312 | 80.10% | 0.8144 | 0.819 | 0.816654 | 1.95 |
| 201401 | 81.30% | 0.8179 | 0.81576 | 0.8168514 | 0.47 |
| 201402 | 85.60% | 0.8188 | 0.846 | 0.832128 | 2.79 |
| 201403 | 81.00% | 0.8371 | 0.8159 | 0.826712 | 2.06 |
| AE (%) | | | | | 2.10 |

The comparison shows that the accuracy of the forecasting has been improved: the average error of the local method is 2.10, against 2.76 for S and C and 2.72 for Li. Comparing the errors of trend and periodicity identification, we find that the periodicity error is larger than the trend error (AEP = 0.0206 against AET = 0.0185). The periodicity forecast does not perform as expected because the historical data chosen is too small; for example, to forecast the value for 201306, only the data of 200806, 200906, 201006, 201106 and 201206 are used. Based on these results, the disadvantages of the model are as follows:

1. The model depends heavily on the historical data. If there is not enough data, the transference between linguistic values does not appear effectively when the relation is established.
2. The variation tendency of the data has a major impact on forecasting accuracy.
3. The model cannot deal with data that change suddenly.

Taking all the above experimental results together, it can be observed that the proposed algorithm forecasts the Chinese civil planes attendance rate with high accuracy, which we attribute to the use of vague sets theory in this research.

5. Conclusion

This paper proposes a vague sets-based method to forecast the attendance rate of Chinese civil airlines. Compared with some of the methods proposed before, the method improves the accuracy. However, further research is still needed on how to describe the transferring relationship so as to avoid missing information. Although the method achieves high accuracy for the civil planes attendance rate, some extensions will be made in the future: (1) experimental datasets collected from other countries should be used to test the effectiveness of the proposed method; (2) we will try to utilize continuous vague sets in our work.

Acknowledgment

The paper is supported by the Aviation Science Foundation of China (Grant No. 2014ZG51075) and the Technical Research Foundation. The study is also sponsored by the National Natural Science Foundation of China (Grant No. 71501007).

References

[1] Box GE, Jenkins GM. Time series analysis: forecasting and control, revised ed. San Francisco: Holden-Day; 1976.
[2] Brown RG. Statistical forecasting for inventory control; 1959.
[3] Holt CC. Forecasting seasonals and trends by exponentially weighted moving averages. Int J Forecast 2004;20(1):5–10.
[4] Winters PR. Forecasting sales by exponentially weighted moving averages. Manage Sci 1960;6(3):324–42.
[5] Pegels CC. Exponential smoothing: some new variations. Manage Sci 1969;15(5):311–20.
[6] Muth JF. Optimal properties of exponentially weighted forecasts. J Am Stat Assoc 1960;55(290):299–306.
[7] Box GEP, Jenkins GM. Time series analysis: forecasting and control. San Francisco: Holden-Day; 1970.
[8] Roberts S. A general class of Holt-Winters type forecasting models. Manage Sci 1982;28(7):808–20.
[9] Abraham B, Ledolter J. Statistical methods for forecasting, 234. New York: John Wiley & Sons; 2009.
[10] Gardner ES. Exponential smoothing: the state of the art. J Forecast 1985;4(1):1–28.
[11] Snyder R. Recursive estimation of dynamic linear models. J R Stat Soc Ser B (Method) 1985;47(2):272–6.
[12] Findley DF, Monsell BC, Bell WR, Otto MC, Chen B-C. New capabilities and methods of the X-12-ARIMA seasonal-adjustment program. J Bus Econ Stat 1998;16(2):127–52.
[13] Findley DF, Wills KC, Monsell BC. Seasonal adjustment perspectives on damping seasonal factors: shrinkage estimators for the X-12-ARIMA program. Int J Forecast 2004;20(4):551–6.
[14] Wang L, Le T, Cai W. The development and use of time series forecasting. Ordnance Ind Autom 2015a;2:63–8.
[15] Qiu W, Liu X. The summary of fuzzy time series forecasting model research. Fuzzy Syst Math 2014;3:173–81.
[16] Zadeh LA. Fuzzy sets. Inf Contr 1965;8(3):338–53.
[17] Gau W-L, Buehrer DJ. Vague sets. IEEE Trans Syst Man Cybern 1993;23(2):610–14.
[18] Zhang Z. Fuzzy forecasting of share price. Ph.D. thesis. Jilin University; 2008.
[19] Song Q, Chissom BS. Fuzzy time series and its models. Fuzzy Set Syst 1993;54(3):269–77.
[20] Sun B, Xiong S, Wu B. Fuzzy time series models for lnszzs forecasting. J Convergence Inf Technol 2012;7(19).
[21] Chatterjee S, Nigam S, Singh J, Upadhyaya L. Application of fuzzy time series in prediction of time between failures and faults in software reliability assessment. Fuzzy Inf Eng 2011;3(3):293–309.
[22] C K, F F, C W. A novel forecasting model of fuzzy time series based on k-means clustering. In: Second International Workshop on Education Technology and Computer Science; 2010. p. 223–5.
[23] R LL. Intelligent forecast and decision-making based on vague set theory. Ph.D. thesis. Guangxi University; 2013.
[24] Wang N-Y, Chen S-M. Temperature prediction and TAIFEX forecasting based on automatic clustering techniques and two-factors high-order fuzzy time series. Expert Syst Appl 2009;36(2):2143–54.
[25] Chen S-M, Chen C-D. TAIEX forecasting based on fuzzy time series and fuzzy variation groups. IEEE Trans Fuzzy Syst 2011;19(1):1–12.
[26] Hwang J-R, Chen S-M, Lee C-H. Handling forecasting problems using fuzzy time series. Fuzzy Set Syst 1998;100(1):217–28.
[27] Sullivan J, Woodall WH. A comparison of fuzzy forecasting and Markov modeling. Fuzzy Set Syst 1994;64(3):279–93.


[28] Lee L-W, Wang L-H, Chen S-M, Leu Y-H. Handling forecasting problems based on two-factors high-order fuzzy time series. IEEE Trans Fuzzy Syst 2006;14(3):468–77.
[29] Sah M, Konstantin Y, et al. Forecasting enrollment model based on first-order fuzzy time series. World Acad Sci Eng Technol 2005;1:375–8.
[30] Wang P, Chao K-M, Lo C-C. Satisfaction-based web service discovery and selection scheme utilizing vague sets theory. Inf Syst Front 2015b;17(4):827–44.
[31] Zhang Q, Wang J, Wang G, Yu H. The approximation set of a vague set in rough approximation space. Inf Sci 2015;300:1–19.
[32] Singh P, Verma M, Kumar A. A novel method for ranking of vague sets for handling the risk analysis of compressor system. Appl Soft Comput 2015;26:202–12.

[33] Llibre J. Centers: their integrability and relations with the divergence. Appl Math Nonlinear Sci 2016;1(1):74–81.
[34] Mishra J, Ghosh S. Uncertain query processing using vague set or fuzzy set: which one is better? Int J Comput Commun Contr 2014;9(6):730–40.
[35] Yun G, Mohammad F, Reza, Wei G. Ontology optimization tactics via distance calculating. Appl Math Nonlinear Sci 2016;1(1):154–69.
[36] Dai Y-Q, Xu Z-S, Li Y, Da Q-L. New evaluation scale of linguistic information and its application. Chinese J Manage Sci 2008;2:024.
[37] Changming L, Hongxing Z. Trend analysis of hydrological components in the Yellow River basin. J Natur Resour 2003;18(2):129–35.
[38] C T, T QW. SPSS PASW Statistics. Wuhan: Wuhan University Press; 2014.
