8th Vienna International Conference on Mathematical Modelling
February 18 - 20, 2015. Vienna University of Technology, Vienna, Austria
Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-1 (2015) 850–855
A Reliability Measure for Time Series Forecasting Predictor

K.M. George, N. Park, & Zhuxi Yang
Computer Science Department
Oklahoma State University
Stillwater, OK 74078
{kmg, npark, zhuxi}@cs.okstate.edu

ABSTRACT

Several time series forecasting methods can be found in the literature. Most methods depend on a predictor of some kind to estimate parameters and the forecast values. However, no measures are available in the literature for the reliability of the predictor. This paper proposes that forecasts be viewed as functions in a Statistical Metric Space (SMS) or in a Statistical Semi-metric Space (SSMS) and suggests a method to estimate a reliability measure of the predictor. A statistical metric/semi-metric space is constructed from the time-series data. A method is proposed to construct the distribution function of the SMS/SSMS in a natural way that can quantify the reliability of the forecast. The method presented in this paper is easy to implement as a computer program.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Time Series, Forecasting, Concordance, Statistical Metric Spaces, Kendall’s τ, Gini’s mean difference.
Peer review under responsibility of International Federation of Automatic Control.
ISSN 2405-8963. DOI: 10.1016/j.ifacol.2015.05.078
MATHMOD 2015 February 18 - 20, 2015. Vienna, Austria
K.M. George et al. / IFAC-PapersOnLine 48-1 (2015) 850–855

1 INTRODUCTION

A time series is a sequence of observations measured at successive times [2]. One of the most common examples of a time series is the closing price of a stock market index such as the S&P 500 index or the Dow Jones index. Time series analysis refers to methods used to develop models for analyzing the data and to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to forecast future values, based on known past data values, before they are measured. Basically, a time series is a set of observations $x_t$ recorded at time t. Since future observations are unpredictable, it is natural to suppose that each observation $x_t$ is a sample of a random variable $X_t$.

Methods used to study time series can be broadly classified as parametric methods and non-parametric methods [4]. Several parametric and non-parametric modeling and forecasting methods can be found in the literature. Existing models include the Autoregressive Integrated Moving Average (ARIMA) model, the Box-Jenkins model, and neural network models. The time series data may be stationary or non-stationary as well as seasonal or non-seasonal. The ARIMA and Box-Jenkins models are parametric models based on autocorrelation. Autocorrelation and partial autocorrelation with a lagged series are used to determine a pair of values (p, q), which determine the degrees of a predictor function whose parameters are estimated using non-linear optimization methods. The lagged segment that determines the values (p, q) may be viewed as a predictor. In [6], a concordance-based method is outlined as an alternative to the correlation-based approach in ARIMA. The drawback of this forecasting methodology and others is that no reliability measure is associated with the predictor.

In this paper, we propose a method for associating a measure of reliability with a predictor based on statistical metric space theory. We illustrate it using measures of concordance. To accomplish this, we assume a forecasting approach that determines a predictor and constructs forecasting functions that map the predictor to a “current segment” (target) and then extend the map to obtain forecast values. There are several advantages to the assumed approach over previous methods. First, it is computationally straightforward. Second, it provides a method to associate a probability measure with the predictor. Third, no data transforms (such as those in the Box-Jenkins model) need to be performed on the original series to eliminate seasonal and non-stationary factors. Fourth, the predicted values are not biased by previously generated values. The approach proposed in this paper generates patterns from segments (the current segment will be the segment
containing the most recent data item available) of the time series and uses pattern matching on the generated patterns to identify a ‘best predictor’. The process of deciding the lag based on autocorrelation in the Box-Jenkins method is not computationally straightforward. The proposed method specifies an easy-to-implement scheme for developing algorithms that determine a predictor. It also provides a way to develop a measure of expected confidence in the forecast by associating a probability distribution with the predictor. Also, in our view, the framework outlined in this paper avoids the computational inefficiencies of other forecasting methods. A concrete version of this approach was first proposed in [8], where three concordance measures were combined to determine a predictor. This paper formalizes the approach and identifies the concepts of SMS on which the forecasting method is based. We also provide empirical results that support our approach. In the next section, we list the definitions used in the proposed method.

2 RELATED DEFINITIONS

The goal of this paper is to develop a time series forecasting framework/methodology, based on sound fundamentals, to construct a forecasting function and a measure of reliability of the forecast outcome by associating a reliability measure with the predictor. To develop the methodology, we adopt ideas from statistical metric spaces, concordance, and known techniques for programs to estimate functions. We start with the definition of Statistical Metric Space (SMS) [11, 12, 13, 14] and define other concepts related to our proposed method. We adopt the definitions given in [14].

Definition 1: (Statistical Metric Space) Let S be a set of points, and $\mathcal{F}(x; p, q)$ be a probability distribution function of the variable x associated to two points p and q such that $p, q \in S$. The function $\mathcal{F}(x; p, q)$ satisfies the following conditions:

1. $\mathcal{F}(0; p, p) = 1$   (1)
2. If $p \neq q$, then $\mathcal{F}(0; p, q) < 1$   (2)
3. $\mathcal{F}(x; p, q) = \mathcal{F}(x; q, p)$   (3)
4. $T(\mathcal{F}(x; p, q), \mathcal{F}(y; q, r)) \le \mathcal{F}(x + y; p, r)$,   (4)

where $T(\alpha, \beta)$ is a function of two variables defined for $0 \le \alpha \le 1$ and $0 \le \beta \le 1$ such that

a) $0 \le T(\alpha, \beta) \le 1$   (5)
b) T is non-decreasing in either variable $\alpha$ or $\beta$
c) $T(\alpha, \beta) = T(\beta, \alpha)$   (6)
d) $T(1, 1) = 1$   (7)
e) If $\alpha > 0$, then $T(\alpha, 1) > 0$   (8)

The function T provides a statistical generalization of the triangular inequality in metric spaces. The pair $(S, \mathcal{F})$ is called a statistical metric space. If $\mathcal{F}$ satisfies conditions 1-3, we call $(S, \mathcal{F})$ a statistical semi-metric space (SSMS). The function $\mathcal{F}$ is interpreted as the probability that the distance between p and q is less than x. In Proposition 2, we define an SSMS that is utilized in the development of the reliability measure for the forecasting method. The following proposition defines a set of properties of functions that could be used to develop SSMSs.

Proposition 1: Let S be any set and $\lambda$ be a positive valued function defined on $S \times S$ such that $\lambda(p, p) = 1$, $\lambda(p, q) < 1$ if $p \neq q$, and $\lambda(p, q) = \lambda(q, p)$. Define $\mathcal{F}(x, p, q)$ as follows:

$$\mathcal{F}(x, p, q) = \begin{cases} 0 & \text{if } x < 0, \\ (1 + x)\,\lambda(p, q) & \text{if } 0 \le x \le 1/\lambda(p, q) - 1, \\ 1 & \text{if } x > 1/\lambda(p, q) - 1. \end{cases}$$

Then $(S, \mathcal{F})$ is a statistical semi-metric space.
Proof: $\mathcal{F}(x; p, q)$ satisfies SSMS conditions (1) – (3) of Definition 1.

Now, we define an SSMS that will be used in section 3. Let $P_n$ denote the set of n-tuples whose elements are in {-1, 0, 1}. We can define a statistical semi-metric space on the set $P_n$.

Definition 2: We call the set $P_n$ described above a pattern space of dimension n. Let $p = (x_1, x_2, \ldots, x_n)$ and $q = (y_1, y_2, \ldots, y_n)$ be two n-tuples in $P_n$ and let $\lambda_{pq}$ be defined as:

$$\lambda_{pq} = \sum_{i=1}^{n} z_i / n, \quad \text{where } z_i = \begin{cases} 1 & \text{if } x_i = y_i, \\ 0 & \text{otherwise.} \end{cases} \qquad (9)$$
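As a minimal sketch of Proposition 1 and equation (9), the Python below computes the match fraction of two patterns and the piecewise distribution built from it. The function names `match_fraction` and `semi_metric_cdf` are illustrative and do not come from the paper, and the sketch assumes the similarity value is strictly positive, as Proposition 1 requires.

```python
from typing import Sequence


def match_fraction(p: Sequence[int], q: Sequence[int]) -> float:
    """Fraction of positions at which two patterns agree (equation (9))."""
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("patterns must be non-empty and of equal length")
    return sum(1 for x, y in zip(p, q) if x == y) / len(p)


def semi_metric_cdf(x: float, similarity: float) -> float:
    """Piecewise distribution of Proposition 1 for a similarity in (0, 1]."""
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must lie in (0, 1]")
    if x < 0:
        return 0.0
    # (1 + x) * similarity reaches 1 exactly at x = 1/similarity - 1,
    # so capping at 1 reproduces the third branch of the definition.
    return min(1.0, (1.0 + x) * similarity)


# Hypothetical patterns over {-1, 0, 1}, chosen only for illustration.
p = (1, 0, -1, 1)
q = (1, 1, -1, 1)
lam = match_fraction(p, q)  # three of the four positions agree
print(lam, semi_metric_cdf(0.0, lam), semi_metric_cdf(1.0, lam))
```

Under this reading, a larger match fraction pushes the distribution toward 1 at smaller x, which is what lets it act as a confidence value for a candidate pattern.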
It is obvious that $0 \le \lambda_{pq} \le 1$ and $\lambda_{pq} = \lambda_{qp}$. Also, $\lambda_{pp} = 1$. Therefore, by Proposition 1 we have the following:

Proposition 2: Let $\mathcal{F}(x, p, q)$ be a function defined as follows:

$$\mathcal{F}(x, p, q) = \begin{cases} 0 & \text{if } x < 0, \\ (1 + x)\,\lambda_{pq} & \text{if } 0 \le x \le 1/\lambda_{pq} - 1, \\ 1 & \text{if } x > 1/\lambda_{pq} - 1. \end{cases}$$

Then $(P_n, \mathcal{F})$ is a statistical semi-metric space. Following [2], $\mathcal{F}$ can be viewed as the probability that the distance between p and q is less than x.

Definition 3: Now, we define a statistical metric space as shown in [13]. Given a function

$$\varepsilon_a(x) = \begin{cases} 0, & x \le a \\ 1, & x > a \end{cases}$$

a metric space (S, d) with metric d, and a distribution function G(x) such that G(0) = 0 and G(x) > 0 for x > 0, we can define a family of distribution functions, $F_{pq}$, associated to pairs of points p, q in S as follows:

$$F_{pq} = \varepsilon_0 \ \text{ if } p = q, \qquad F_{pq}(x) = G(x / d(p, q)) \ \text{ if } p \neq q \qquad (10)$$

Then (S, F) is a statistical metric space. This definition can be used to construct SMSs from a set S, if a metric and a distribution function are known.

The next two definitions are useful to define SMS structure on sets of tuples. The measures defined can be interpreted as measures of closeness when tuples are viewed as points in a space. We could use Kendall’s τ to define a distribution for constructing an SMS.

Definition 4: For n observations $x_1, x_2, \ldots, x_n$ of a quantitative variable X, Gini’s mean difference [1] can be defined as follows:

$$g(X) = \sum_{1 \le i < j \le n} |x_i - x_j| \, / \, n^2.$$

We can interpret g as a measure of dispersion of the observations [3]. In [1], it is shown that g(X) can be interpreted as a measure of concordance for two sets of observations. In this paper, for two n-tuples $p = (x_1, x_2, \ldots, x_n)$ and $q = (y_1, y_2, \ldots, y_n)$, we define $|g(p) - g(q)|$ as a measure of closeness of dispersion in $x_i$ and $y_i$ as i varies from 1 to n.

Definition 5: Kendall’s τ is a measure of concordance used in [5]. Let $p = (x_1, x_2, \ldots, x_n)$ and $q = (y_1, y_2, \ldots, y_n)$. Also, $x_i$ and $y_i$ are non-negative. We define Kendall’s τ, $\tau_{pq}$, as

$$\tau_{pq} = \frac{2 \sum_{i<j} \operatorname{sign}[(x_j - x_i)(y_j - y_i)]}{n(n-1)}, \qquad -1 \le \tau_{pq} \le 1. \qquad (11)$$

Kendall’s τ subtracts the number of discordant pairs from the concordant pairs. If we count only the concordant pairs, then the ratio in (11) will vary between 0 and 1. So, define $\tilde{\tau}_{pq}$ as a variation of τ that varies from 0 to 1 as follows:

$$\tilde{\tau}_{pq} = \frac{2 \sum_{i<j} t_{ij}}{n(n-1)}, \quad \text{where } t_{ij} = \begin{cases} 1; & \text{if } (x_i - x_j)(y_i - y_j) > 0 \\ 0; & \text{otherwise.} \end{cases} \qquad (12)$$

It is obvious that $\tau_{pq}$ and $\tilde{\tau}_{pq}$ increase together. In the next section we show that $\tilde{\tau}_{pq}$ can be used to define a distribution function associated to p and q. If $\tilde{\tau}_{pq} = 1$, then p and q have perfect concordance, and their slopes are both positive/negative when plotted in a graph. GMD, g, can be interpreted as the degree of dispersion or volatility in the segment [15]. So, by combining the two measures g and τ, a good predictor can be chosen for time series forecasting.

3 THE FORECASTING METHODOLOGY

In this section, we describe our forecasting framework/method for predicting future values of a time series. The definitions of the previous section are used in predictor construction and in defining an associated reliability measure. The approach taken has an algorithmic slant: we interpret a time series T as a one-directional stream of numbers being printed, one at a time, at uniform time intervals. Then, the forecasting problem is to estimate a set of (say m) numbers to be printed next. From the known stream, we construct a set of n-tuples of the form $p = \langle p_1, p_2, \ldots, p_n \rangle$. Each tuple is a contiguous segment of the already printed stream and is a subsequence of the stream. Let $S_n$ denote the set of all n-tuples constructed from T. Let t be the member of $S_n$ consisting of the most recently printed n numbers. We call t the target sequence. We call $S_n - \{t\}$ a predictor set. The basic idea is to identify a member p in $S_n - \{t\}$ similar to the target t that is the best predictor candidate (predictor tuple). Then, find a mapping $h : p \to t$ (prediction/forecasting map) which is extended to calculate the future values of T as the forecast. We define an SMS structure on $S_n$ with a distribution $\mathcal{F}(x; p, q)$ where $p, q \in S_n$. Then $\mathcal{F}(x; p, t)$ is a measure of reliability of the forecast. The steps of the methodology and an implementation are given in the sections to follow.
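The two closeness measures and the segment set $S_n$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample series are hypothetical, Gini's mean difference is normalised by $n^2$ as in Definition 4, and both concordance sums run over pairs with i < j.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def gini_mean_difference(xs: Sequence[float]) -> float:
    """Gini's mean difference with the n^2 normalisation of Definition 4."""
    n = len(xs)
    return sum(abs(a - b) for a, b in combinations(xs, 2)) / n ** 2


def kendall_tau(p: Sequence[float], q: Sequence[float]) -> float:
    """Equation (11): concordant minus discordant pairs, scaled to [-1, 1]."""
    n = len(p)
    total = 0
    for i, j in combinations(range(n), 2):
        prod = (p[j] - p[i]) * (q[j] - q[i])
        total += (prod > 0) - (prod < 0)  # sign of the product
    return 2 * total / (n * (n - 1))


def concordant_fraction(p: Sequence[float], q: Sequence[float]) -> float:
    """Equation (12): share of concordant pairs only, in [0, 1]."""
    n = len(p)
    hits = sum(1 for i, j in combinations(range(n), 2)
               if (p[i] - p[j]) * (q[i] - q[j]) > 0)
    return 2 * hits / (n * (n - 1))


def segments(series: Sequence[float], n: int) -> List[Tuple[float, ...]]:
    """All contiguous n-tuples of the series, oldest first; the last is the target t."""
    return [tuple(series[k:k + n]) for k in range(len(series) - n + 1)]


# Hypothetical series used only to exercise the functions.
series = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
s_n = segments(series, 3)
target = s_n[-1]
print(target, gini_mean_difference(target))
print(kendall_tau(s_n[0], target), concordant_fraction(s_n[0], target))
```

Each function loops over all pairs, so the cost per comparison is quadratic in the segment length n.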
3.1 A Proposed Implementation Approach

The method described above outlines the methodology in general. An approach to implement the methodology is given below. Let T be a time series. We construct an SMS from T. Construct $S_n$ as described previously. Let t be the tuple consisting of the most recent n elements of the time series T and $p \in S_n$ be a predictor. We interpret a forecast as an extension of a mapping $h : p \to t$, where $p = \langle x_1, x_2, \ldots, x_n \rangle$, to the elements $x_{n+1}, \ldots, x_{n+m}$ that follow p. Specific steps are given below.

First, a pattern space of dimension n(n-1)/2 is constructed from $S_n$. This could be a many-one mapping M, from $S_n$ to $P_{n(n-1)/2}$. (This essentially partitions $S_n$ into clusters.) Choose a predictor p such that the distance between M(t) and M(p) is a minimum (assuming a suitable metric). Once a predictor p is determined, construct a mapping $h : p \to t$ and a distribution $\mathcal{F}(x; p, t)$ which imposes an SMS structure on $S_n$. The distribution will be interpreted as a confidence measure. In short, the methodology consists of four steps, namely predictor set construction, determination of the optimal predictor, construction of the prediction map, and construction of a distribution function. A set of algorithms is proposed below to implement the method. The time complexities of Algorithm 1 and Algorithm 2 are $O(n^2)$, where n is the length of the target sequence. The complexity of Algorithm 3 depends on the method used to construct the mapping function. So, the complete process will have a complexity of at least $O(n^2)$.

Algorithm 1: Construction of pattern space
Let $p = \langle x_1, x_2, \ldots, x_n \rangle$ be an element of $S_n$. Let $z_k$ be the kth element of M(p). Construct $z_k$ as follows:

    k ← 0;
    for i = 1 to n-1
        for j = i+1 to n
            k ← k+1
            z_k ← -1 if x_i > x_j
            z_k ← 0 if x_i = x_j
            z_k ← 1 if x_i < x_j
        end;
    end;

The pattern space is $M(S_n)$. Members of $M(S_n)$ are tuples of length n(n-1)/2 whose elements are in {-1, 0, 1}. It is possible to define several distance measures on $M(S_n)$ so that $M(S_n)$ is a metric space. In this paper we define the metric $d(p, q) = \sum_{i=1}^{n} |x_i - y_i|$. It is obvious that d is a metric and $(M(S_n), d)$ is a metric space.

Algorithm 2: Determination of predictor p
Let P = {q | q ≠ t and M(q) = M(t)} /* q and t are in the same cluster */

    Case 1: P ≠ { }; i.e. $\tilde{\tau}_{tq} = 1$
        Let $t = \langle t_1, t_2, \ldots, t_n \rangle$ and $q = \langle x_1, x_2, \ldots, x_n \rangle \in P$
        Let q be such that |g(q) – g(t)| is minimum
        Set p = q
    Case 2: P = { }
        j = 1;
        while (P = { }) do
            Add to P all elements q of M(S_n) such that d(t, q) = j
            j = j + 1;
        end;
        P consists of all elements M(q) such that $\lambda_{M(q)M(t)}$ as defined in Definition 2 is maximum.
        Let q be such that |g(q) – g(t)| is minimum
        Set p = q

Algorithm 2 chooses a predictor p for t that has the highest concordance as defined by τ and the lowest difference in volatility as defined by GMD. Having determined a predictor, the next step is to construct a mapping from the predictor to the target.

Algorithm 3: Mapping construction
Step 1. Let $p = \langle x_1, x_2, \ldots, x_n \rangle \in S_n$ be a predictor for t.
Step 2. Construct a mapping $h : S_n \to S_n$ such that $\sum_{i=1}^{n} |t_i - h(x_i)|$ is a minimum.
Step 3. Let the forecast of p for m time periods be $h(x_{n+1}), \ldots, h(x_{n+m})$.
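Algorithms 1 and 2 might be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the names `pattern`, `l1_distance`, `gmd` and `choose_predictor` are hypothetical, the search takes the candidates whose patterns are nearest to the target's pattern (which covers both cases of Algorithm 2 when distances are integers), and the additional match-fraction filter of Case 2 is omitted.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Segment = Tuple[float, ...]


def pattern(p: Sequence[float]) -> Tuple[int, ...]:
    """Algorithm 1: pairwise order pattern of length n(n-1)/2 over {-1, 0, 1}."""
    # For every i < j: -1 if x_i > x_j, 0 if equal, 1 if x_i < x_j.
    return tuple((a < b) - (a > b) for a, b in combinations(p, 2))


def l1_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Metric d on patterns: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def gmd(xs: Sequence[float]) -> float:
    """Gini's mean difference with the n^2 normalisation of Definition 4."""
    return sum(abs(a - b) for a, b in combinations(xs, 2)) / len(xs) ** 2


def choose_predictor(candidates: List[Segment], target: Segment) -> Segment:
    """Simplified Algorithm 2: nearest pattern first, then closest dispersion."""
    if not candidates:
        raise ValueError("predictor set is empty")
    m_t = pattern(target)
    best = min(l1_distance(pattern(q), m_t) for q in candidates)
    nearest = [q for q in candidates if l1_distance(pattern(q), m_t) == best]
    return min(nearest, key=lambda q: abs(gmd(q) - gmd(target)))


# Hypothetical series; the last segment is the target, the rest form the predictor set.
series = [1.0, 3.0, 2.0, 2.5, 6.0, 4.0, 5.0]
n = 3
s_n = [tuple(series[k:k + n]) for k in range(len(series) - n + 1)]
target, predictor_set = s_n[-1], s_n[:-1]
print(pattern(target), choose_predictor(predictor_set, target))
```

Constructing the mapping h of Algorithm 3 is left out here because the paper allows several techniques for it (genetic programming in [8], for instance).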
There are several possible approaches available for the construction of a mapping h. In [8], a predictor was determined using a weighted average of the concordance measures τ, Spearman’s ρ, and GMD. Weights were determined by trying different values. A genetic programming approach was used for mapping construction. According to the authors, forecasting results have shown very good performance in trend prediction. Several other approaches, such as neural networks or transfer functions, can also be followed to construct the function h. In section 4, we provide experimental results based on the approach followed in [8]. Experimental results provide support to the proposed approach. Based on the definitions given in section 2, we propose to construct distribution functions $\mathcal{F}(x, p, t)$ that can be interpreted as a measure of reliability of the predictor p.

3.2 Probability distributions

In this section, we outline two methods to construct probability distributions depending on the predictor that can be used as a confidence measure of the forecast. One method builds an SSMS from $S_n$ and the second method builds an SMS from $M(S_n)$.

Method 1:
Define $\mathcal{F}(x, p, q)$ as

$$\mathcal{F}(x, p, q) = \begin{cases} 0 & \text{if } x < 0, \\ (1 + x)\,\tilde{\tau}_{pq} & \text{if } 0 \le x \le 1/\tilde{\tau}_{pq} - 1, \\ 1 & \text{if } x > 1/\tilde{\tau}_{pq} - 1, \end{cases}$$

where $\tilde{\tau}_{pq}$ is defined by Equation (12). By Proposition 2, $(S_n, \mathcal{F})$ is an SSMS. Then $\mathcal{F}(x, p, t)$ is the probability that the distance between p and t is less than x.

Method 2:
With the metric $d(p, q) = \sum_{i=1}^{n} |x_i - y_i|$, $(M(S_n), d)$ is a metric space. Following Definition 3, we can define an SMS $(M(S_n), F_{pq})$. Then $F_{pq}(x)$ is the probability that the distance between p and q is less than x.

4 EXPERIMENTAL RESULTS

As described in the previous sections, Kendall’s τ (due to its association with $\tilde{\tau}_{pq}$) can be associated with a distribution that defines an SSMS on $S_n$. So, it can be interpreted as a measure of reliability of the predictor. Therefore, we have conducted an empirical analysis to determine the effectiveness of τ as a measure of predictor reliability and its association with the forecast produced. The prior forecasting algorithm presented in [8] is consistent with the methodology outlined in this paper. So, we use it for our analysis. The time series used is the S&P index closing price. Analysis is performed on several time periods of the S&P closing price. Two cases are chosen as representative results. They are shown in Figures 1 and 2 respectively. The results for a one year period in 2008 are given in Figure 1 and those for a one year period in 2009 are given in Figure 2. These periods are chosen as representative of declining and increasing index periods. The graphs (Figures 1 and 2) show the behavior of τ and the forecast error. To measure the forecast error, we computed the Mean Absolute Percent Error (MAPE), a commonly used measure of forecast error. As can be observed from the figures, the error decreases when τ increases most of the time. In the 2008 data (Figure 1), the correlation between τ and MAPE is -0.11 and in the 2009 data (Figure 2), the correlation is -0.36. This provides empirical evidence that the concordance measure used to determine the predictor can be interpreted as an estimate of the reliability of the forecast.

[Figure 1 plot: "Tau vs MAPE"; series Tau and Error; horizontal axis dates from 1/4/2008 to 12/4/2008; vertical axis 0 to 0.8]
Figure 1. Relation between τ and MAPE based on one year forecast data in 2008.

[Figure 2 plot: "Comparison of Tau (Series 1) and MAPE (Series 2). Forecasting Period: 10 day periods in 2009"; vertical axis 0.0000 to 1.0000]
Figure 2. Relation between τ and MAPE based on one year forecast data in 2009.
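The error measure and the correlation reported in this section could be computed along the following lines. This is a hedged sketch: the function names are illustrative, MAPE is returned as a fraction rather than a percentage (an assumption about the paper's scaling), and the τ and error values are made-up placeholders, not the S&P results.

```python
from math import sqrt
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percent error as a fraction (0.05 means 5%)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values: one tau and one MAPE per forecasting period.
taus = [0.2, 0.4, 0.6, 0.8]
errors = [0.08, 0.06, 0.05, 0.02]
print(pearson(taus, errors))                   # negative when error falls as tau rises
print(mape([100.0, 200.0], [110.0, 190.0]))
```

A negative coefficient corresponds to the pattern described in the text, where higher concordance goes with lower forecast error.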
5 CONCLUSION

In this paper we present a new method for forecasting time series data. The proposed method consists of a predictor and a forecasting function. We propose an approach for stock index data that identifies a predictor series and a forecasting function. Based on the theory of statistical metric spaces, we develop a confidence measure for the predictor. Previous methods do not attempt to provide a reliability measure for the predictor. We propose algorithms for predictor and confidence measure construction. The major advantage of the proposed method is that it is straightforward to implement as a computer program. Experimental results support the reliability measure in relation to forecast error. As future work, we plan to analyze several distributions for the construction of SMS from $M(S_n)$ and their predictive capabilities. We also plan to investigate other related variables that could strengthen the confidence measure.

6 REFERENCES

[1] Borroni, Claudio Giovanni and Zenga, Michele, A test of concordance based on Gini’s mean difference, Stat. Meth. & Appl. (2007) 16:289–308.
[2] Box, George E.P., Jenkins, Gwilym M., and Reinsel, Gregory C., Time Series Analysis: Forecasting and Control, 3rd edition, Prentice-Hall, 1994.
[3] Cerone, P. and Dragomir, S.S., Bounds for the Gini mean difference of an empirical distribution, Applied Mathematics Letters 19 (2006) 283–293.
[4] Chen, G., Abraham, B., Bennett, G. W. (1997). Parametric and Non-Parametric Modelling of Time Series - An Empirical Study. EnvironMetrics, 8, 63-74.
[5] Genest, Christian, Quessy, Jean-François, and Rémillard, Bruno, Tests of serial independence based on Kendall’s process, The Canadian Journal of Statistics, Vol. 30, No. 3, 2002, Pages 441-461.
[6] Ferguson, Thomas S., Genest, Christian, and Hallin, Marc, Kendall’s tau for Serial Dependence, The Canadian Journal of Statistics, Vol. 28, No. 3, 2000, Pages 587-604.
[7] Kendall, M. (1938). A New Measure of Rank Correlation. Biometrika, 30, 81-89.
[8] Khadka, M. S., Popp, B., George, K. M., Park, N. (2010). A New Approach for Time Series Forecasting Based on Genetic Algorithm. In proceedings of the 23rd International Conference on Computer Applications in Industry and Engineering. Las Vegas, USA.
[9] Menger, K., Statistical Metrics (1942), Proceedings of the National Academy of Sciences of the United States of America, 28, 535-537.
[10] Scarsini, M. (1984). On measures of concordance, Stochastica VIII, 201-218.
[11] Schweizer, B. and Sklar, A., Statistical Metric Spaces, Pacific Journal of Mathematics, 03/1960, Volume 10, Issue 1, 313-334.
[12] Schweizer, B. and Sklar, A., Statistical Metric Spaces Arising From Sets of Random Variables in Euclidean n-Space (1962), Theory of Probability and its Application, Vol. 7, Issue 4, 447-456.
[13] Schweizer, B., Sklar, A. (1983). Probabilistic Metric Spaces, North-Holland, New York.
[14] Wald, Abraham, On a Statistical Generalization of Metric Spaces (1943), Proceedings of the National Academy of Sciences of the United States of America, Vol. 29, No. 6, 196-197.
[15] Yitzhaki, S. and Schechtman, E. (2013), The Gini Methodology: A Primer on a Statistical Methodology, Springer New York Heidelberg Dordrecht London.