Neurocomputing 175 (2016) 924–934
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom
Time series shape association measures and local trend association patterns
Ildar Batyrshin a,*, Valery Solovyev b, Vladimir Ivanov b
a Centro de Investigación en Computación, Instituto Politécnico Nacional, México
b Kazan Federal University, Russia
Article info
Article history: Received 2 December 2014; Received in revised form 5 May 2015; Accepted 11 May 2015; Communicated by Chennai Guest Editor; Available online 7 November 2015
Keywords: Time series shape association measure; positive and negative associations; local trend association; Google Finance; exchange rates; Pairs Trading
Abstract
The paper gives a new definition of non-statistical time series shape association measures that can measure positive and negative shape associations between time series. Local trend association measures based on linear regressions in a sliding window are considered. Methods for extracting and presenting positive and negative local trend association patterns from pairs of time series are described. Examples of applying these methods to the analysis of associations between securities data from Google Finance and between exchange rates are discussed. It is shown on a benchmark example and in the analysis of real time series that the correlation coefficient, in spite of its fundamental role in statistics, is not useful here and can cause confusion in the analysis of time series shape similarity and shape associations. © 2015 Elsevier B.V. All rights reserved.
1. Introduction
Many time series similarity measures have been introduced in time series data mining during the last two decades [1–11]. These measures are usually used in time series clustering and in similarity search in time series databases. The following examples of similarity queries over sequence databases have been mentioned in [1]:
Identify companies with similar patterns of growth;
Determine products with similar selling patterns;
Discover stocks with similar movement in stock prices.
In [12,13], the need was pointed out for association measures that, in addition to similarity between time series, can measure inverse relationships between them. For the examples above, such measures could be used to find rival companies or products with inverse movements in their time series, when the rising patterns of one time series correspond to falling patterns of another. Many economic, financial, industrial and ecological systems contain elements or characteristics changing in time such that an increase in the values
* Corresponding author. E-mail addresses:
[email protected] (I. Batyrshin),
[email protected] (V. Solovyev),
[email protected] (V. Ivanov). http://dx.doi.org/10.1016/j.neucom.2015.05.127 0925-2312/© 2015 Elsevier B.V. All rights reserved.
of one of them happens together with a decrease in the values of another: prices and sales, the sales volumes of rival companies, wind velocity and air pollution concentration, etc. In [12,17,18,23], measures of local trend associations (LTA) based on the Moving Approximation Transform (MAT) were introduced, and examples of their application to the analysis of possible relationships between elements of economic, financial and ecological systems were considered. In [14], general methods for constructing time series shape association measures were introduced. These methods can generate the sample Pearson's correlation coefficient as a particular case. The concept of association measure was generalized to a set with an involution operation in [15]. The axioms of association measures were analyzed in [13,16]. This paper reconsiders the previous definition of the time series shape association measure [13,14]. The new definition includes in explicit form the subset on which such a measure is defined. In the paper such subsets are determined for several shape association measures. The methods of construction of associated patterns in time series studied in [17,18] are extended here and included in the general framework of local trend association measures. Examples of application of these methods in finance are considered. The methods are demonstrated on time series of end-of-day prices of securities downloaded from Google Finance [19] and on time series of exchange rates from [33]. The paper is organized as follows. The new definition of the time series shape association measure is given in Section 2. The sample
I. Batyrshin et al. / Neurocomputing 175 (2016) 924–934
Pearson's correlation coefficient is considered in Section 3. The local trend association measure and its domain are studied in Section 4. Up–Down trend association measure is described in Section 5. The methods of construction of Up–Down shape association patterns are studied in Section 6. Sections 7 and 8 consider examples of applications of the discussed methods and measures. The last section contains discussions and conclusions.
2. Definition of the time series shape association measure

A time series of length $n$ ($n > 1$) is a sequence of real values $x = (x_1, \ldots, x_n)$ given at time points $t = (1, \ldots, n)$. Denote by $T$ the set of all time series of length $n$. Suppose $p$, $q$ are real values and $p \neq 0$. For all $x$, $y$ from $T$ define $x + y = (x_1 + y_1, \ldots, x_n + y_n)$ and $py + q = (py_1 + q, \ldots, py_n + q)$. Denote by $q_{(n)}$ the constant time series of length $n$ with all elements equal to $q$. We will write $x = const$ if $x = q_{(n)}$ for some $q$, and $x \neq const$ if $x_i \neq x_j$ for some $i \neq j$ from $\{1, \ldots, n\}$. From the definitions above it follows that $px + q = px + q_{(n)}$. Denote by $T_C$ the set of all constant time series from $T$ and $T_{NC} = T \setminus T_C$.

Definition 1. Suppose $V$ is a subset of $T$ such that:

$$\text{from } x \in V \text{ it follows } -x \in V \text{ and } x + q \in V \text{ for all real } q. \tag{1}$$

A function $A: V \times V \to [-1, 1]$ satisfying on $V$ the properties:

$$A(x, y) = A(y, x), \qquad \text{(symmetry)}$$
$$A(x, x) = 1, \qquad \text{(reflexivity)}$$
$$A(-x, y) = -A(x, y), \qquad \text{(inverse relationship)}$$
$$A(x + q, y) = A(x, y) \text{ for all real } q, \qquad \text{(translation invariance)}$$

is called an association measure on $V$. If for $V$ it is fulfilled:

$$\text{from } x \in V \text{ it follows } px \in V \text{ for all } p > 0 \tag{2}$$

and $A$ satisfies on $V$ the property:

$$A(px, y) = A(x, y) \text{ for all } p > 0, \qquad \text{(scale invariance)}$$

then $A$ is called a scale invariant association measure.

The first two properties of the association measure in Definition 1 are often considered as the properties of a similarity measure. The inverse relationship property relates negative associations to positive associations (or similarities) of time series. The translation invariance property is considered in [13] as a necessary property of any reasonable time series shape association measure, because the shape of the time series $x$ does not change if a constant value $q$ is added to all values of $x$, shifting them up or down by the same value $q$. The scale invariance of the time series shape association measure is a less evident property, because the shape of a time series is generally deformed when all its values are multiplied by some positive constant $p$. But this property is very useful in many applications if we want the results of the time series analysis not to depend on a change of units (for example from thousands to millions of dollars), on a normalization of time series values, or on a transformation of the different scales used for different time dependent variables into a unified scale. Discussions of the axioms of association measures can also be found in [13,16].

In comparison with the previous definitions of the time series shape association measure considered in [13,14], we define it here on some set of time series $V$ satisfying (1) (and possibly (2)). As will be shown further, the set $V$ depends both on the properties of the time series shape association measure considered in Definition 1 and on the properties of the function $A$ proposed as an association measure.

Proposition 1. If $A$ is an association measure on $V$ then $V \subseteq T_{NC}$, i.e. $V$ does not contain constant time series.

Proof. Suppose Proposition 1 is not true, i.e. $A$ is an association measure on $V$ and $V$ contains a constant time series $x = (x_1, \ldots, x_n) = (s, \ldots, s)$, where $s$ is some real value. For $q = -2s$ we have $x + q = x - 2s = (x_1 - 2s, \ldots, x_n - 2s) = (s - 2s, \ldots, s - 2s) = (-s, \ldots, -s) = -x$. From (1), translation invariance, $x + q = -x$, symmetry, inverse relationship and reflexivity of $A$ we obtain: $A(x, x) = A(x + q, x) = A(-x, x) = -A(x, x) = -1$, which contradicts the reflexivity of $A$. The obtained contradiction proves the Proposition. □

In this paper we will consider only non-constant time series. If necessary, the definition of a specific time series shape association measure can be extended to constant time series in a suitable way, but in this case some axioms of the association measure will not be fulfilled for constant time series.

From the inverse relationship and reflexivity of the association measure it follows for all $x \in V$:

$$A(x, -x) = -1. \qquad \text{(inverse reflexivity)}$$

From the reflexivity, translation invariance and scale invariance it follows for all $x \in V$ and all real $q$, $p$:

$$A(x + q, x) = 1, \qquad A(px, x) = 1, \text{ if } p > 0.$$

These two properties, together with the properties of the association measure given in Definition 1, describe equivalence classes of time series with respect to the scale invariant shape association measure $A$, which can be written as:

$$x \sim px + q, \text{ if } p > 0.$$

It means that in the calculation of time series shape association measures we can replace any time series $x$ by the time series $px + q$ if $p > 0$. Generally, from Definition 1 it follows that for all real $q_1$, $q_2$, $p_1$, $p_2$ such that $p_1, p_2 \neq 0$ a scale invariant shape association measure satisfies the property:

$$A(p_1 x + q_1, p_2 y + q_2) = sign(p_1) \cdot sign(p_2) \cdot A(x, y),$$

where

$$sign(s) = \begin{cases} 1, & \text{if } s > 0 \\ 0, & \text{if } s = 0 \\ -1, & \text{if } s < 0. \end{cases}$$

In the following sections we will consider some examples of time series shape association measures. The general methods of construction of time series shape association measures can be found in [14].
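The properties of Definition 1 can be spot-checked numerically for a candidate function. The following is a minimal sketch, not part of the original method: the helper names `pearson` and `check_axioms`, the sample data and the constants `q`, `p` are assumptions chosen for illustration, and passing the check on one pair of series is of course not a proof.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation, used here only as one candidate measure A."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den  # raises ZeroDivisionError for a constant series

def check_axioms(A, x, y, q=3.5, p=2.0, tol=1e-9):
    """Spot-check the properties of Definition 1 for one pair (x, y)."""
    neg_x = [-v for v in x]
    return {
        "symmetry": abs(A(x, y) - A(y, x)) < tol,
        "reflexivity": abs(A(x, x) - 1.0) < tol,
        "inverse relationship": abs(A(neg_x, y) + A(x, y)) < tol,
        "translation invariance": abs(A([v + q for v in x], y) - A(x, y)) < tol,
        "scale invariance": abs(A([p * v for v in x], y) - A(x, y)) < tol,
    }

# Hypothetical non-constant series, chosen only for illustration
x = [1.0, 3.0, 2.0, 5.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
results = check_axioms(pearson, x, y)

# General property: A(p1*x + q1, p2*y + q2) = sign(p1) * sign(p2) * A(x, y)
lhs = pearson([-2.0 * v + 1.0 for v in x], [3.0 * v - 4.0 for v in y])
rhs = (-1) * (+1) * pearson(x, y)
print(results, abs(lhs - rhs) < 1e-9)
```

Calling `pearson` on a constant series fails with a division by zero, which is in line with Proposition 1: constant time series lie outside the set on which an association measure can be defined.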
3. The sample Pearson's correlation coefficient

Consider the sample Pearson's correlation coefficient [20,21] applied to time series $x, y \in T$:

$$corr(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{3}$$

where $\bar{x}$ and $\bar{y}$ denote the sample means of $x$ and $y$.

It is easy to show that the function $A(x, y) = corr(x, y)$ is a translation and scale invariant time series shape association measure on the set of all non-constant time series $V = T_{NC}$. Note that for constant time series the denominator in (3) equals 0. The correlation coefficient is a measure of the strength of the linear relationship between variables [21]. A positive value of $corr(x, y)$ implies that $x$ and $y$ are positively related and $corr(x, y) < 0$
implies that $x$ and $y$ are negatively related. But the correlation coefficient is generally not suitable for measuring nonlinear relationships: [21] gives an example in which two variables have a perfect nonlinear relationship yet the correlation between them equals zero. Table 1 and Fig. 1 present an example showing that the correlation coefficient is also poorly suited to measuring time series shape similarity and associations. The time series $x$ and $y$ have similar shapes but $corr(x, y) = 0$. The time series $x$ and $-y$ are negatively associated, i.e. when $x$ is increasing $-y$ is decreasing and vice versa, but $corr(x, -y) = 0$. In the following sections we introduce time series shape association measures that appear to be better shape association measures than the correlation coefficient.
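The benchmark can be reproduced in a few lines. This is a small sketch using the values of x and y from Table 1; the function name `pearson` and the flag `same_direction` are illustrative names, not taken from the paper.

```python
import math

# Series x and y from Table 1
x = [100, 80, 50, 60, 90, 150, 200, 250, 180, 140]
y = [200, 120, 10, 20, 40, 50, 60, 70, 40, 20]

def pearson(x, y):
    """Sample Pearson correlation coefficient, formula (3)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

r_xy = pearson(x, y)
r_x_neg_y = pearson(x, [-v for v in y])

# True if x and y move up or down together at every step
same_direction = all(
    (x[i + 1] - x[i]) * (y[i + 1] - y[i]) > 0 for i in range(len(x) - 1)
)
print(r_xy, r_x_neg_y, same_direction)
```

Both correlations vanish although the two series rise and fall together at every step, which is the mismatch between correlation and shape similarity that the example is meant to expose.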
Table 1
Example of three synthetic time series with corr(x,y) = corr(x,-y) = 0.

i     1     2     3     4     5     6     7     8     9     10
x     100   80    50    60    90    150   200   250   180   140
y     200   120   10    20    40    50    60    70    40    20
-y    -200  -120  -10   -20   -40   -50   -60   -70   -40   -20

Fig. 1. Example of synthetic time series from Table 1 with corr(x,y) = corr(x,-y) = 0.

4. The local trend association measure

Consider the local trend association measure based on the moving approximation transform studied in [12]. For a time series $x$ of length $n$ denote by $W_i = (i, i+1, \ldots, i+k-1)$, $i \in \{1, \ldots, m\}$, $m = n - k + 1$, the sliding window of size $k \in \{2, \ldots, n\}$, and by $x_{W_i} = (x_i, x_{i+1}, \ldots, x_{i+k-1})$ the values of the time series $x$ in this window. Similarly to the moving average [20], consider the moving approximation of a time series in sliding windows $W_i$ defined by the least squares regression line $f_i = a_i + b_i t$, ($i = 1, \ldots, m$), with parameters $\{a_i, b_i\}$ minimizing the criterion

$$Q(f_i, x_{W_i}) = \sum_{j=i}^{i+k-1} (a_i + j b_i - x_j)^2.$$

A sequence $MAT_k(x) = (b_1, \ldots, b_m)$ of slope values of moving approximations of the time series $x$ in the sliding windows $W_i$ of size $k$ is called a moving approximation (MAP) transform (or simply MAT) of the time series $x$. The slope values $b_i$ are called local trends. Suppose $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$ are two time series and $MAT_k(x) = (b_{x1}, \ldots, b_{xm})$, $MAT_k(y) = (b_{y1}, \ldots, b_{ym})$, $k \in \{2, \ldots, n\}$, $m = n - k + 1$, are the MAP transforms of $x$ and $y$. The values of $MAT_k(x)$ and the coefficients of the least squares regression lines can be calculated as follows [12]:

$$b_{xi} = \frac{6 \sum_{j=0}^{k-1} (2j - k + 1) x_{i+j}}{k(k^2 - 1)}, \quad i \in \{1, \ldots, m\}, \tag{4}$$

$$a_{xi} = \bar{x}_i - b_{xi} \bar{t}_i,$$

where $\bar{t}_i = \frac{1}{k} \sum_{j=i}^{i+k-1} j$, $\bar{x}_i = \frac{1}{k} \sum_{j=i}^{i+k-1} x_j$.

The following function is called a measure of local trend associations:

$$lta_k(x, y) = \cos(MAT_k(x), MAT_k(y)) = \frac{\sum_{i=1}^{m} b_{xi} \cdot b_{yi}}{\sqrt{\sum_{i=1}^{m} b_{xi}^2 \cdot \sum_{i=1}^{m} b_{yi}^2}}. \tag{5}$$

Denote by $TM_{k0}$ the set of all time series $x$ such that $MAT_k(x) = 0_{(m)}$, i.e. $b_{xi} = 0$ for all $i = 1, \ldots, m$. Let us show that $lta_k$ is an association measure on the set of time series $x$ such that $MAT_k(x) \neq 0_{(m)}$.
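Before turning to the formal properties, formulas (4) and (5) can be sketched in code. This is an illustrative implementation under the assumption of equally spaced time points; the function names `mat` and `lta` are chosen here and are not from the paper. Applied to the series of Table 1 it should give values close to those reported for the local trend association measure, positive for small windows and negative for large ones.

```python
import math

def mat(x, k):
    """Local trends b_i: least-squares slopes in each sliding window of size k, formula (4)."""
    denom = k * (k * k - 1)
    return [
        6.0 * sum((2 * j - k + 1) * x[i + j] for j in range(k)) / denom
        for i in range(len(x) - k + 1)
    ]

def lta(x, y, k):
    """Cosine of the angle between MAT_k(x) and MAT_k(y), formula (5)."""
    bx, by = mat(x, k), mat(y, k)
    num = sum(a * b for a, b in zip(bx, by))
    den = math.sqrt(sum(a * a for a in bx) * sum(b * b for b in by))
    if den == 0:
        # All local trends of x or of y are zero: the measure is not defined there
        raise ValueError("lta_k is undefined when MAT_k of a series is all zeros")
    return num / den

# Series x and y from Table 1
x = [100, 80, 50, 60, 90, 150, 200, 250, 180, 140]
y = [200, 120, 10, 20, 40, 50, 60, 70, 40, 20]
neg_y = [-v for v in y]

for k in range(2, 11):
    print(k, round(lta(x, y, k), 3), round(lta(x, neg_y, k), 3))
```

The explicit check for a zero denominator mirrors the restriction of the measure to series whose MAP transform is not identically zero.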
Proposition 2. The function $A(x, y) = lta_k(x, y)$ is a scale invariant shape association measure on $V = T \setminus TM_{k0}$.

Proof. Let us show the fulfillment of (1) and (2) for $V$. Suppose $x \in V$, i.e. $MAT_k(x) \neq 0_{(m)}$, and $b_{xi} \neq 0$ for some $i \in \{1, \ldots, m\}$. From (4) for $y = -x$ it follows $b_{yi} = -b_{xi} \neq 0$ and hence $y \in V$. Let us show that for any real $q$, for the time series $y = x + q$ it is fulfilled that $b_{yi} \neq 0$ and hence $y \in V$. From (4) we have

$$b_{yi} = \frac{6 \sum_{j=0}^{k-1} (2j - k + 1)(x_{i+j} + q)}{k(k^2 - 1)} = \frac{6 \sum_{j=0}^{k-1} (2j - k + 1) x_{i+j}}{k(k^2 - 1)} + \frac{6 \sum_{j=0}^{k-1} (2j - k + 1) q}{k(k^2 - 1)} = b_{xi} + \frac{6q \sum_{j=0}^{k-1} (2j - k + 1)}{k(k^2 - 1)}.$$

Let us show that $\sum_{j=0}^{k-1} (2j - k + 1) = 0$ and hence $b_{yi} = b_{xi} \neq 0$ and $y = x + q \in V$:

$$\sum_{j=0}^{k-1} (2j - k + 1) = 2 \sum_{j=0}^{k-1} j - (k - 1)k = 2 \cdot \frac{(k - 1)k}{2} - (k - 1)k = 0.$$

The fulfillment of (1) is proved. For $y = px$, ($p > 0$), from (4) it follows: $b_{yi} = p b_{xi} \neq 0$ and hence (2) is fulfilled. Symmetry and reflexivity of $A(x, y) = lta_k(x, y)$ follow from (5); the inverse relationship follows from (4) and (5). Translation invariance of $A(x, y) = lta_k(x, y)$ follows from $b_{yi} = b_{xi}$ for $y = x + q$, proved above. Scale invariance follows from (4) and (5). □

Note that $V$ does not contain constant time series, which follows both from Proposition 1 and from (4) (as shown in the proof). The local trend association measure evaluates associations between local trends and depends on the size of the sliding window. For the benchmark example from Table 1 the values of the local trend association measure between time series for different sizes of the sliding window are given in Table 2. As expected, for small window sizes this measure shows positive associations between time series $x$ and $y$ and negative associations between $x$ and $-y$, because small local trends follow the shapes of the time series. Note that the time series $x$ and $y$ have opposite global trends: "generally" $x$ is increasing and $y$ is decreasing. For this reason, for large window sizes the local trend association measure shows negative associations between $x$ and $y$. An analysis of the performance of the local trend association measure on different examples, and an application of this measure to the analysis of associations between elements of financial and economic systems, can be found in [12].

Table 2
Local trend associations for time series from Table 1 for all window sizes k.

k             2       3       4       5       6       7       8       9       10
lta_k(x,y)    0.534   0.517   0.452   0.340   0.162   -0.094  -0.505  -0.895  -1
lta_k(x,-y)   -0.534  -0.517  -0.452  -0.340  -0.162  0.094   0.505   0.895   1

5. Qualitative trend association measure

This method is based on the idea that the numerical values of the local trends of MAT considered in the previous section are replaced by the numerical values 1, 0, -1 corresponding to the qualitative evaluations "increasing", "constant" or "decreasing". Consider a new shape association measure obtained by replacing, in the MAP transform and in the local trend association measure (5), the local trend values $b_{xi}$ given in (4) by their sign values $B_{xi}$:

$$AUD_k(x, y) = \frac{\sum_{i=1}^{m} B_{xi} \cdot B_{yi}}{\sqrt{\sum_{i=1}^{m} B_{xi}^2 \cdot \sum_{i=1}^{m} B_{yi}^2}}, \tag{6}$$

where

$$B_{xi} = sign(b_{xi}) = sign\left( \sum_{j=0}^{k-1} (2j - k + 1) x_{i+j} \right). \tag{7}$$

These values will be called Up–Down trends, or UD-trends for short [17], and $AUD_k(x, y)$ will be referred to as the UD-trend association measure. Note that all $B_{xi}$ take values in $\{-1, 0, 1\}$ and $B_{xi} = 0$ if and only if $b_{xi} = 0$. Taking this into account, from Proposition 2 we obtain

Proposition 3. The function $A(x, y) = AUD_k(x, y)$ is a scale invariant shape association measure on $V = T \setminus TM_{k0}$.

For the benchmark example from Table 1 the values of the UD-trend association measure for different sizes of the sliding window are given in Table 3. The association values equal 1 for the small window sizes $k = 2, 3$, which corresponds to our perception that the time series $x$ and $y$ are highly positively associated because both of them synchronously move up and down. For the time series $x$ and $-y$ the association values for small windows equal -1, which reflects the inverse dynamics of these time series.

Table 3
UD-trend associations for time series from Table 1 for all window sizes k.

k             2    3    4       5       6       7    8       9    10
AUD_k(x,y)    1    1    0.429   0.667   -0.200  0    -0.333  -1   -1
AUD_k(x,-y)   -1   -1   -0.429  -0.667  0.200   0    0.333   1    1

Below we consider the UD-trend association measure for window size $k = 2$, which takes into account only qualitative information about the difference between neighboring values. For $k = 2$ from (4), (6) and (7) it follows:

$$b_{xi} = x_{i+1} - x_i, \tag{8}$$

$$B_{xi} = sign(x_{i+1} - x_i), \tag{9}$$

$$AUD_2(x, y) = \frac{\sum_{i=1}^{m} B_{xi} \cdot B_{yi}}{\sqrt{\sum_{i=1}^{m} B_{xi}^2 \cdot \sum_{i=1}^{m} B_{yi}^2}}. \tag{10}$$

Suppose $g$ is a strictly increasing function on the set of real numbers, i.e. from $u < v$ it follows $g(u) < g(v)$. Define a transformation $G$ on the set of time series $T$ as follows:

$$G(x) = (g(x_1), \ldots, g(x_n)). \tag{11}$$
Proposition 4. The association measure $AUD_2(x, y)$ is invariant under strictly increasing transformations (11), i.e. $AUD_2(G(x), y) = AUD_2(x, y)$.

Proof. From (9) and the definition of sign it follows that $B_{G(x)i} = sign(g(x_{i+1}) - g(x_i)) = sign(x_{i+1} - x_i) = B_{xi}$, hence the association measure (10) does not change its value when the transformation $G$ is applied to $x$. □

As follows from Proposition 4, the association measure $AUD_2$ can be used for measuring associations between time series when only qualitative information about time series values or slopes is available.
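The UD-trend association measure and the invariance stated in Proposition 4 can be illustrated with a short sketch. The function names `sign`, `ud_trends` and `aud` are chosen here for illustration, and the cubic map is just one example of a strictly increasing function g; the data are the series x and y of Table 1.

```python
import math

def sign(s):
    """Return 1, 0 or -1 according to the sign of s."""
    return (s > 0) - (s < 0)

def ud_trends(x, k):
    """Up-Down trends B_i, formula (7); the positive factor 6/(k(k^2-1)) does not affect the sign."""
    return [
        sign(sum((2 * j - k + 1) * x[i + j] for j in range(k)))
        for i in range(len(x) - k + 1)
    ]

def aud(x, y, k):
    """UD-trend association measure, formula (6)."""
    bx, by = ud_trends(x, k), ud_trends(y, k)
    num = sum(a * b for a, b in zip(bx, by))
    den = math.sqrt(sum(a * a for a in bx) * sum(b * b for b in by))
    if den == 0:
        raise ValueError("AUD_k is undefined when all UD-trends of a series are zero")
    return num / den

# Series x and y from Table 1
x = [100, 80, 50, 60, 90, 150, 200, 250, 180, 140]
y = [200, 120, 10, 20, 40, 50, 60, 70, 40, 20]

# An example of a strictly increasing transformation G (assumption: g(v) = v**3)
gx = [v ** 3 for v in x]

print(aud(x, y, 2), aud(gx, y, 2))
print([round(aud(x, y, k), 3) for k in range(2, 11)])
```

Because only the signs of the differences enter for k = 2, any strictly increasing rescaling of the values leaves the result unchanged, whereas for larger windows the signs of the regression slopes can change under a nonlinear transformation.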
6. Up–Down shape association patterns

In this section we introduce a method of construction of positively and negatively associated patterns based on the analysis of UD-trends. Denote by $MATUD_k(x) = (B_{x1}, \ldots, B_{xm})$ and $MATUD_k(y) = (B_{y1}, \ldots, B_{ym})$ the sequences of Up–Down trends (7) obtained from the MAP transforms for window size $k$. Define a time series shape association string as the result of the element-wise multiplication of UD-trends:

$$SAS_k(x, y) = (B_{x1} B_{y1}, \ldots, B_{xm} B_{ym}) = (C_{xy1}, \ldots, C_{xym}), \tag{12}$$

where $C_{xyi} = B_{xi} B_{yi}$ for all $i = 1, \ldots, m$. Note that $C_{xyi} = 1$ if $B_{xi}$ and $B_{yi}$ have the same sign, i.e. $B_{xi} = B_{yi} = 1$ or $B_{xi} = B_{yi} = -1$. In such cases we will say that $B_{xi}$ and $B_{yi}$ are positively associated. We have $C_{xyi} = -1$ if $B_{xi}$ and $B_{yi}$ have opposite signs. In this case we will say that $B_{xi}$ and $B_{yi}$ are negatively associated. For completeness we can introduce a 0-association if $B_{xi} = 0$ or $B_{yi} = 0$. Depending on the application, 0-associations can be considered as "positive" associations, as "negative" associations, or ignored. We will consider only positive and negative associations.

The sequence of indexes $L_{jp} = (j, \ldots, j+p)$, where $j, p \geq 1$ and $j + p \leq n$, will be called a positive sequence in $SAS_k(x, y)$ if $C_{xyi} = 1$ for all $i = j, \ldots, j+p$ and $C_{xyi} < 1$ if $i = j - 1$ or $i = j + p + 1$. In other words, the positive sequences in $SAS_k$ are maximal subsequences $(j, \ldots, j+p)$ in $(1, \ldots, m)$ such that $C_{xyi} = 1$ for all $i = j, \ldots, j+p$. For such sequences the number $p + 1$ is called the length of the positive sequence. Note that positive sequences of small length can appear due to random fluctuations in time series data. For this reason it is reasonable to suppose that positive sequences characterize positive local associations between time series only if they are sufficiently long. We will say that $L_{jp}$ defines positively associated patterns in $x$ and $y$ if $p + 1 \geq Q$, where $Q$ is some "large" constant such that $1 < Q \leq m$. Of course, the value of $Q$ depends on the properties of the time series considered in applications. Similarly we define a negative sequence in $SAS_k$ as a maximal subsequence $L_{jp} = (j, \ldots, j+p)$ in $(1, \ldots, m)$ such that $C_{xyi} = -1$ for all $i = j, \ldots, j+p$, and we will say that $L_{jp}$ defines negatively associated patterns in $x$ and $y$ if $p + 1 \geq Q$. These patterns are given by sequences of least squares regression segments $f_{xi} = a_{xi} + b_{xi} t$, $f_{yi} = a_{yi} + b_{yi} t$, defined on the windows $W_i = (i, i+1, \ldots, i+k-1)$, $i \in \{j, \ldots, j+p\}$. These least squares regression segments will be called moving approximations.

The sequence of initial points of the moving approximations coincides with $L_{jp}$. For $k = 2$ the moving approximations simply connect neighboring points in $(j, \ldots, j+p+1)$. For the example of time series $x$ and $y$ given in Fig. 1 it is clear that for $k = 2$ the slope values in the sequences $MATUD_2(x) = (B_{x1}, \ldots, B_{x9})$ and $MATUD_2(y) = (B_{y1}, \ldots, B_{y9})$ have the same signs, $L_{jp} = (1, \ldots, 9)$ is a positive sequence in $SAS_2(x, y) = (B_{x1} B_{y1}, \ldots, B_{x9} B_{y9})$, and it defines positively associated patterns in $x$ and $y$ (for $Q \leq 9$) that are given by sequences of least squares regression segments connecting all neighboring points, as shown in the upper part of Fig. 1. In this case the positively associated patterns in $x$ and $y$ are represented by a piecewise linear curve passing through all points. Similarly, the lower part of Fig. 1 presents negatively associated patterns. Another example of associated patterns for window size 2, applied to the analysis of associations between well production data, was considered in [17]. For $k > 2$ the associated patterns do not have as simple a graphic representation as for $k = 2$: they are given by a sequence of moving approximations. Examples will be considered in the following section.
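The extraction of positive and negative sequences from a shape association string can be sketched as follows. This is an illustrative implementation, not the authors' code: the names `association_string` and `associated_patterns` and the parameter `min_len` (playing the role of Q) are assumptions, and the data are the series x and y of Table 1.

```python
def sign(s):
    """Return 1, 0 or -1 according to the sign of s."""
    return (s > 0) - (s < 0)

def ud_trends(x, k):
    """Up-Down trends B_i, formula (7)."""
    return [
        sign(sum((2 * j - k + 1) * x[i + j] for j in range(k)))
        for i in range(len(x) - k + 1)
    ]

def association_string(x, y, k):
    """Shape association string SAS_k(x, y), formula (12)."""
    return [a * b for a, b in zip(ud_trends(x, k), ud_trends(y, k))]

def associated_patterns(sas, value, min_len):
    """Maximal runs of `value` (+1 or -1) of length >= min_len.

    Each run is reported as a 1-based inclusive pair (j, j + p) of SAS indexes.
    """
    runs, start = [], None
    for i, c in enumerate(sas + [None]):  # the sentinel closes a trailing run
        if c == value and start is None:
            start = i
        elif c != value and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs

# Series x and y from Table 1
x = [100, 80, 50, 60, 90, 150, 200, 250, 180, 140]
y = [200, 120, 10, 20, 40, 50, 60, 70, 40, 20]

sas2 = association_string(x, y, 2)
sas5 = association_string(x, y, 5)
print(associated_patterns(sas2, 1, 3))   # positive sequences for k = 2
print(associated_patterns(sas5, 1, 2))   # positive sequences for k = 5
print(associated_patterns(sas5, -1, 2))  # negative sequences for k = 5
```

The threshold `min_len` filters out short runs that may be caused by random fluctuations; how large it should be depends on the data.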
7. Analysis of positively and negatively associated local trend patterns in financial time series

Applying the method of recognition and representation of associated patterns discussed in the previous sections to highly oscillating data, for example to time series of daily prices of securities in the stock market, requires smoothing the time series data and/or considering local trends for windows larger than k = 2, which also smooths data fluctuations. Here we propose a method of visualization of positively and negatively associated patterns in time series when MAT is applied for windows of size greater than 2. This method uses separate representations of positively and negatively associated patterns in time series. Fig. 2 presents an example of several normalized time series of security prices from Google Finance [19] after smoothing by moving average (w = 5). Fig. 3 presents a comparative analysis of the BBRY (BlackBerry) and AAPL (Apple) time series, where the least squares regression segments corresponding to window size k = 30 are shown. The positively associated trend segments are presented at the top of Fig. 3 and the negatively associated trend segments of these time series are presented at the bottom of Fig. 3. It is interesting to note that the Up–Down association measure (6) for these local trends in the considered time period equals zero: AUDk(x,y) = 0, because exactly one half of the local trends are positively associated (Npos = 111) and the other half are negatively associated (Nneg = 111). The positively associated local trends are located mostly on the left side of the domain (see the two upper charts in Fig. 3). The negatively associated local trends are located mostly on the right side of the domain (see the two lower charts in Fig. 3). The right parts of these charts are also presented in Figs. 4 and 5. From Fig. 4 one can see that only 10 local trends (11%) in the considered period are positively associated and 79 local trends (89%) are negatively associated. Fig. 5 depicts the initial points of the positively (two upper charts) and negatively (two charts on the bottom) associated local trends presented in Fig. 4 and defined by positive and negative sequences Ljp. One can see long negative sequences with lengths 50 and 29 in the two charts at the bottom of Fig. 5. An analysis of Figs. 3–5 supports the hypothesis that BBRY and AAPL are rival companies, due to the large number of negatively associated local trends of the considered time series over a long period. Note that changes in security prices are influenced by many events. Some of them are the launches of new products. Apple launched a new 16 GB iPod Touch on May 30, 2013 (near point 71 on the time scale in Figs. 2 and 3). One can see from Fig. 2 that shortly afterwards the price of AAPL started to increase. BlackBerry launched the Z30 in the USA on November 14 (approximately point 189 on the time scale in Figs. 2–5). Around this date the negatively associated local trends of the two companies change to positively associated trends (see the two charts at the top of Figs. 4 and 5) and later again to negatively associated trends, but now with increasing trends for BBRY and decreasing trends for AAPL (see the two charts at the bottom of Figs. 4 and 5). Similar relationships between events such as launches of new products and changes of trends, or changes of the type of associations between trend patterns of
Fig. 2. Example of normalized Google Finance data after smoothing by moving average (w = 5).
Fig. 3. Positively (two charts at the top) and negatively (two charts on the bottom) associated moving approximations of BBRY and AAPL data in sliding window of size k = 30.
security prices of rival companies can give useful information for portfolio optimization and market forecasting [24]. Fig. 6 depicts another example, similar to Fig. 3, of a comparative analysis of the ERIC and AAPL time series. As we can see, these time series also have a large number of negatively associated local trends in the second part of the whole time period. This suggests that these two companies are rival companies. Such information can be useful in portfolio optimization tasks.

8. Comparative analysis of exchange rates

Let us compare our method with the method of local correlations considered in [28], which is similar to our approach based on local
trend associations. The authors address the problem of capturing and tracking local correlations among time series generalizing the notion of linear cross-correlation. The method proposed produces a general similarity score, which evolves over time, and reflects the changing relationships. The authors consider examples of capturing of local patterns or trends. To illustrate the approach, the authors of [28] used ExRates data from the UCR Time Series Data Mining Archive [33]. There are 2567 (work-) daily spot prices (foreign currency in dollars) for 12 currencies, over the period of about 10 years from 10/9/86 to 8/9/96. The time series of currencies are presented in Fig. 7, where the following codes of currencies are used:
Fig. 4. The right part of Fig. 3 with high number of negatively associated moving approximations. Two negatively associated patterns in lower charts are separated by positively associated patterns shown in upper charts.
Fig. 5. Initial points of moving approximations from positively (two upper charts) and negatively (two charts on the bottom) associated patterns shown in Fig. 4.
1-AUD Australian Dollar; 2-BEF Belgian Franc; 3-CAD Canadian Dollar; 4-FRF French Franc; 5-DEM German Mark; 6-JPY Japanese Yen; 7-NLG Dutch Guilder; 8-NZD New Zealand Dollar; 9-ESP Spanish Peseta; 10-SEK Swedish Krone; 11-CHF Swiss Franc; 12-GBP UK Pound.
The paper [28] demonstrated its method on two time series: 4-FRF and 9-ESP. The global cross-correlation coefficient of these time series is 0.30, which is statistically significant (exceeding the 95% confidence interval of ±0.04) [28]. But as we mentioned in Section 3, the correlation coefficient is not effective in measuring shape associations and shape similarity. For example, the correlation between the French Franc (4-FRF) and the Japanese Yen (6-JPY) equals 0.62. It is clear that the shape of the curve of the French Franc is more similar to the shape of the curve of the Spanish Peseta than to the curve of the Japanese Yen, but the correlation coefficient says the opposite. The local trend association measure lta20 for window size 20 gives the following association values for these time series: lta20(4-FRF, 9-ESP) = 0.88 and lta20(4-FRF, 6-JPY) = 0.54. These values correspond better to the visual perception of the similarity of the corresponding time series.
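Comparisons such as lta20(4-FRF, 9-ESP) versus lta20(4-FRF, 6-JPY) extend naturally to a table of pairwise association values over a set of series. The sketch below is illustrative only: the function `lta_matrix`, the series names and the synthetic values are assumptions, not the exchange rate data used in the paper.

```python
import math

def mat(x, k):
    """Local trends: least-squares slopes in sliding windows of size k, formula (4)."""
    denom = k * (k * k - 1)
    return [
        6.0 * sum((2 * j - k + 1) * x[i + j] for j in range(k)) / denom
        for i in range(len(x) - k + 1)
    ]

def lta(x, y, k):
    """Local trend association measure, formula (5)."""
    bx, by = mat(x, k), mat(y, k)
    num = sum(a * b for a, b in zip(bx, by))
    den = math.sqrt(sum(a * a for a in bx) * sum(b * b for b in by))
    return num / den

def lta_matrix(series, k):
    """Pairwise lta_k values for a dict {name: values}; keys are (name1, name2)."""
    names = list(series)
    return {(u, v): lta(series[u], series[v], k) for u in names for v in names}

# Synthetic, hypothetical series (not the ExRates data)
series = {
    "a": [1, 2, 4, 3, 5, 7, 6, 8, 9, 11],
    "b": [2, 3, 6, 5, 8, 11, 10, 13, 14, 17],  # moves together with "a"
    "c": [10, 9, 7, 8, 6, 4, 5, 3, 2, 0],      # equals 11 - a, so it mirrors "a"
}
table = lta_matrix(series, k=3)
for (u, v), value in sorted(table.items()):
    print(u, v, round(value, 3))
```

Such a table could serve as input to a clustering procedure; the choice of the procedure and of the association thresholds is left open here.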
Generally, the following class of currencies with high association lta20 of the corresponding time series has been obtained: {2-BEF, 4-FRF, 5-DEM, 7-NLG, 11-CHF}, with mutual positive associations lta20 greater than 0.92, and 9-ESP has positive associations with them greater than 0.77. The other two European currencies, 10-SEK and 12-GBP, join the first 6 European currencies at association level 0.74. Japan joins the European countries at association levels 0.53–0.58. Another cluster contains 1-AUD and 8-NZD, positively associated at level 0.47. The 3-CAD has a maximal association value of 0.29, with 1-AUD. As we can see, these association values correspond to our perception of the similarity of the shapes of the time series, and the clustering obtained from the local trend association measure has a natural interpretation because it reflects the geographical location of the countries. Note that the correlation coefficient 0.30 between 4-FRF and 9-ESP, in spite of its high statistical significance "exceeding the 95% confidence interval of ±0.04", is not useful for the analysis of shape associations between these time series. The authors of [28] introduced a local correlation measure called LoCo. They applied this measure with sliding window size 20 to the comparative analysis of the time series 4-FRF and 9-ESP.
Fig. 6. Positively (two charts at the top) and negatively (two charts on the bottom) associated moving approximations of ERIC and AAPL data in sliding window of size k = 30.
Fig. 7. The spot prices (foreign currency in dollars) for daily exchange rates of 12 currencies, over the period of about 10 years from 10/9/86 to 8/9/96.
The authors announced that “LoCo score successfully and accurately tracks evolving local correlations, even when the series are widely different in nature.” Based on correlation 0.30 between the corresponding time series it is possible to conclude that they are “widely different in nature” but as it was shown above they have very similar shapes and local trend association value lta20(4-FRF, 9-ESP) ¼0.88 also confirms the high positive association between them. In [28] it was announced also that “most major European Monetary Union (EMU) events are closely accompanied by a break
in the correlation as measured by LoCo, and vice versa.” For us it was difficult to judge, based on the figures from [28], how close this relationship is. Indeed, it is a very interesting problem to establish associations between events (underlying financial news) and changes in financial time series [24]. Here, instead of the correlation breaks considered in [28], we propose to describe local associations between time series by moving approximations, i.e. linear regressions calculated in a sliding window. Fig. 8 depicts positive (at the top) and negative (on the bottom) associations between time series 4-FRF and 9-ESP calculated for
Fig. 8. The positively associated 2272 (90%) (at the top) and negatively associated 276 (10%) (on the bottom) moving approximations (shown by bold) in sliding window of size 20 of time series 4-FRF and 9-ESP.
Fig. 9. The initial points of positively associated (at the top) and negatively associated (on the bottom) moving approximations (shown by bold) in sliding window of size 20 of time series 4-FRF and 9-ESP.
window size 20. Linear regressions are shown by bold black lines. Of the total of 2548 linear regressions, only 276 are negatively associated; hence only about 10% of all local trend associations are negative and 90% are positive. In Fig. 9 (on the bottom) one can see the initial points of the negatively associated moving linear regressions of the time series 4-FRF and 9-ESP. These points are often located near the points where the local trends change sign. For this reason they can be associated with business events related to such changes. The change in the signs of local trends obtained with the local trend association measure can replace a break in the correlation as measured by LoCo. Note that the change of local trends from positive to negative and vice versa, which can be observed in our method, has a natural and simple interpretation and can be calculated efficiently.
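The counting of positively and negatively associated moving approximations described above can be sketched in code. The following Python fragment is a minimal illustration, not the authors' implementation. It assumes equally spaced observations, takes the local trend in each window of size k to be the least-squares slope, and assumes (following the moving approximation transform of [12], whose definition is not restated in this section) that lta_k is the cosine between the two sequences of slopes. The function names window_slopes, lta and split_windows, and the synthetic sine series, are hypothetical and serve only as an example.

```python
import numpy as np

def window_slopes(x, k):
    """Least-squares slope of x in every sliding window of k points.

    Assumes equally spaced observations (unit time step).
    """
    x = np.asarray(x, dtype=float)
    # Centered time index: its sum is zero, so the regression slope
    # reduces to sum(t * x_window) / sum(t * t).
    t = np.arange(k) - (k - 1) / 2.0
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    return windows @ t / (t @ t)

def lta(x, y, k):
    """Assumed form of the local trend association: cosine of slope vectors."""
    a, b = window_slopes(x, k), window_slopes(y, k)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("undefined when all local slopes of a series are zero")
    return float(a @ b / denom)

def split_windows(x, y, k):
    """Start indices of windows whose slopes have equal / opposite signs."""
    prod = window_slopes(x, k) * window_slopes(y, k)
    return np.flatnonzero(prod > 0), np.flatnonzero(prod < 0)

# Synthetic example (not data from the paper): two shifted sine waves.
t = np.arange(200)
x = np.sin(t / 15.0)
y = np.sin((t + 10) / 15.0)
k = 20

pos, neg = split_windows(x, y, k)
total = len(x) - k + 1  # number of sliding windows
print("windows:", total, "positive:", len(pos), "negative:", len(neg))
print("lta:", round(lta(x, y, k), 3))
```

For a series of n points there are n − k + 1 windows, so the two counts can be compared with the total in the same way as the 2272 positively and 276 negatively associated regressions reported for 4-FRF and 9-ESP. In this sketch the start indices in neg play the role of the initial points shown at the bottom of Fig. 9.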
9. Discussion and conclusion
The main contributions of the paper are the following. A new definition of the class of time series shape association measures suitable for measuring positive and negative associations between time series is proposed. The novelty of this definition in comparison with our previous works consists in the explicit consideration of the subset V of the set of time series on which the association measure can be defined. The properties of this subset have been studied for each association measure considered in this paper. The importance of considering such a set V follows from the fact that, generally, the properties of reflexivity and inverse reflexivity are contradictory for some x, for example, when x = 0, when x is a sample with equal values, or, in terms of time series, when x is a constant time series. Usually such
situations are avoided in applications, but from the mathematical point of view, when we define a class of functions satisfying some properties, we need to describe the domain of these functions explicitly to avoid contradictions like the one above. Definition 1 gives the new definition of a time series shape association measure on a subset V of time series. Proposition 1 studies the general properties of this set V. Proposition 2 studies the properties of this set for the local trend association measure ltak(x,y) considered in [12] and also used in this paper. Proposition 3 studies the properties of this set for the qualitative local trend association measure, which measures only the signs of local trends. Proposition 4 proves the monotone invariance of this qualitative measure. This property of monotone invariance can generally be added to the axioms of a shape association measure to define a class of qualitative shape association measures. We hope to consider such a class of association measures in the future.
Another contribution of the paper consists in introducing methods for the extraction and visualization of positively and negatively associated patterns in pairs of time series in a sliding window of size greater than or equal to 2. These methods can be used both for the extraction of similar (positively associated) patterns, as proposed in many works on time series data mining, and for the extraction of negatively associated patterns. The discovery of such patterns can be useful, for example, in the petroleum industry in the analysis of communications between wells [17], in the analysis of associations between meteorological variables and atmospheric pollutants [23], in finance in the comparative analysis of rival companies, and in portfolio optimization in Pairs Trading. Examples are considered in Sections 7 and 8 of the paper. The similarity between time series or time series patterns is exploited in motif discovery and in outlier detection [25,26].
For this reason some results of our work can be used in these tasks. Mutual associations can be applied in choosing pairs of stocks that historically move together for Pairs Trading [31,32]. The measures and methods discussed in our paper give systematic methods for finding such pairs of stocks. Using our approach, the concept of a “pair of stocks” can be extended from stocks with a small distance between them to stocks with similar (or positively associated) shapes that move synchronously and to stocks of rival companies with inverse (opposite) movements. The considered methods of local trend associations are partly similar to the works on local correlations [27–30]. In Section 8 we compared our results with those of the paper [28] on the dataset from [33] used in that work. We show that our methodology of analysis of time series based on local trend association measures outperforms the methodology of the compared approach based on local correlations. The benchmark example considered in Section 3 and the analysis of real data in Section 8 show that, although the correlation coefficient satisfies the axioms of a time series shape association measure and plays a fundamental role in statistics, it is not useful for, and can cause confusion in, the analysis of time series shape similarity and shape associations. Discovered associated patterns can be related to events [24] such as the launch of a new product. Monitoring time series to detect not only specific patterns but also associated patterns in different time series describing the behavior of a complex system can help to understand existing relationships between the elements of the analyzed system [22], to forecast the system behavior, and to make decisions related to portfolio optimization, etc.
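Two points from this discussion can be checked numerically: a constant time series has to be excluded from the domain V, and a measure that uses only the signs of local trends is invariant under strictly increasing transformations of the series. The sketch below is an illustration under stated assumptions rather than the qualitative measure defined in the paper. It takes window size 2, so that the local trend is the first difference, and uses the cosine between the sign vectors of these differences. The name sign_trend_association is hypothetical. For a constant series all local trends are zero, so the series and its inverse have the same zero trend vector and no single value could equal both 1 and −1; the function therefore raises an error for such input.

```python
import numpy as np

def sign_trend_association(x, y):
    """Cosine between sign vectors of first differences (window size 2).

    Illustrative assumption only; not the definition used in the paper.
    """
    sx = np.sign(np.diff(np.asarray(x, dtype=float)))
    sy = np.sign(np.diff(np.asarray(y, dtype=float)))
    denom = np.linalg.norm(sx) * np.linalg.norm(sy)
    if denom == 0:
        # A series with no local trend (e.g. a constant series) cannot
        # satisfy reflexivity and inverse reflexivity simultaneously.
        raise ValueError("series with no local trend is outside the set V")
    return float(sx @ sy / denom)

# Synthetic random walks (not data from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=60).cumsum()
y = rng.normal(size=60).cumsum()

base = sign_trend_association(x, y)
# exp and cube are strictly increasing, so signs of differences are kept.
transformed = sign_trend_association(np.exp(x), y ** 3)

same = sign_trend_association(x, x)       # reflexivity
inverse = sign_trend_association(x, -x)   # inverse reflexivity

try:
    sign_trend_association(np.ones(60), x)
    constant_rejected = False
except ValueError:
    constant_rejected = True

print(base, transformed, same, inverse, constant_rejected)
```

With window size 2 the invariance is immediate, because a strictly increasing transformation preserves the sign of every difference between consecutive values. For larger windows the sign of a regression slope need not be preserved, which is why this sketch is restricted to first differences.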
Acknowledgments
This work was partially supported by IPN project SIP 20151589, by RFBR project 15-01-06456 and by the Russian Government Program of competitive growth of Kazan Federal University.
References
[1] R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases, In: Proceedings of the Fourth International Conference Foundations of Data Organization, Springer, 1993, pp. 69–84.
[2] R. Agrawal, K.-I. Lin, H.S. Sawhney, K. Shim, Fast similarity search in the presence of noise, scaling, and translation in time-series databases, In: Proceedings of the 21st International Conference Very Large Databases, Morgan Kaufmann, 1995, pp. 490–501.
[3] K. Buza, A. Nanopoulos, L. Schmidt-Thieme, Fusion of similarity measures for time series classification, Hybrid Artificial Intelligent Systems, Springer (2011), pp. 253–261.
[4] G. Das, D. Gunopulos, Time series similarity and indexing, Handbook on Data Mining, Lawrence Erlbaum Associates (2003), pp. 279–304.
[5] T.-C. Fu, A review on time series data mining, Eng. Appl. Artif. Intell. 24 (2011) 164–181.
[6] D.Q. Goldin, P.C. Kanellakis, On similarity queries for time-series data: constraint specification and implementation, In: Proceedings of the 1995 International Conference on the Principles and Practice of Constraint Programming, Springer, 1995, pp. 137–153.
[7] J. Kacprzyk, A. Wilbik, S. Zadrozny, Linguistic summarization of trends: a fuzzy logic based approach, In: Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU’06, 2006, pp. 2166–2172.
[8] M. Last, A. Kandel, H. Bunke, Data Mining in Time Series Databases, Machine Perception and Artificial Intelligence, vol. 57, World Scientific, 2004.
[9] T.W. Liao, Clustering of time series data – a survey, Pattern Recognit. 38 (2005) 1857–1874.
[10] C.S. Möller-Levet, F. Klawonn, K.H. Cho, O. Wolkenhauer, Fuzzy clustering of short time-series and unevenly distributed sampling points, In: Proceedings of the 5th International Symposium on Intelligent Data Analysis, IDA’03, 2003, pp. 330–340.
[11] D. Rafiei, A.O. Mendelzon, Querying time series data based on similarity, IEEE Trans. Knowl. Data Eng. 12 (2000) 675–693.
[12] I. Batyrshin, R. Herrera-Avelar, L. Sheremetov, A. Panova, Moving approximation transform and local trend associations in time series data bases, In: Perception-based Data Mining and Decision Making in Economics and Finance. Studies in Computational Intelligence, vol. 36, Springer, pp. 55–83, 2007.
[13] I. Batyrshin, L. Sheremetov, J.X. Velasco-Hernandez, On axiomatic definition of time series shape association measures, In: Proceedings of the Workshop on Operations Research and Data Mining, ORADM 2012, 2012, pp. 117–127.
[14] I. Batyrshin, Constructing time series shape association measures: Minkowski distance and data standardization, In: Proceedings of BRICS CCI 2013 〈http://arxiv.org/pdf/1311.1958v3〉, 2013.
[15] I. Batyrshin, Association measures and aggregation functions, In: Proceedings of Advances in Soft Computing and its Applications, LNCS, vol. 8266, Springer, pp. 194–203, 2013.
[16] I. Batyrshin, Association measures on sets with involution and similarity measure, In: Proceedings of the 4th World Conference Soft Computing, Berkeley, California, 25–27 May 2014.
[17] I. Batyrshin, Up and down trend associations in analysis of time series shape association patterns, In: Proceedings of MCPR 2012, LNCS, vol. 7329, Springer, pp. 246–254, 2012.
[18] I. Batyrshin, V. Solovyev, Positive and negative local trend association patterns in analysis of associations between time series, In: Proceedings of MCPR 2014, LNCS, vol. 8495, Springer, pp. 92–101, 2014.
[19] Google Finance, Historical prices, Dates: Feb 19, 2013–Feb 14, 2014 〈http://www.google.com/finance〉.
[20] C. Chatfield, The Analysis of Time Series: An Introduction, Chapman and Hall, 1984.
[21] S. Chatterjee, A.S. Hadi, Regression Analysis by Example, John Wiley & Sons, 2013.
[22] I. Batyrshin, I. Bulgakov, A.-L. Hernandez, C. Huitron, M. Chi, A. Raimundo, A. Cosultchi, VMD-Petro: visualization and data mining tool for oilfields, In: Proceedings of Workshop on Operations Research and Data Mining, ORADM 2012, pp. 140–148, 2012.
[23] V. Almanza, I. Batyrshin, On trend association analysis of time series of atmospheric pollutants and meteorological variables in Mexico city metropolitan area, LNCS, vol. 6718, Springer (2011), pp. 95–102.
[24] V. Lavrenko, M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, J. Allan, Mining of concurrent text and time series, In: Proceedings of KDD-2000 Workshop on Text Mining, pp. 37–44, 2000.
[25] F. Martínez-Álvarez, A. Troncoso, J.C. Riquelme, J.S. Aguilar-Ruiz, Discovery of motifs to forecast outlier occurrence in time series, Pattern Recognit. Lett. 32 (12) (2011) 1652–1665.
[26] A. Mueen, E.J. Keogh, Q. Zhu, S. Cash, M.B. Westover, Exact discovery of time series motifs, In: Proceedings of SDM, pp. 473–484, 2009.
[27] A. Mueen, S. Nath, J. Liu, Fast approximate correlation for massive time-series data, In: Proceedings of 2010 ACM SIGMOD International Conference Management of Data, ACM, pp. 171–182, 2010.
[28] S. Papadimitriou, J. Sun, P.S. Yu, Local correlation tracking in time series, In: Proceedings of the IEEE Sixth International Conference Data Mining, ICDM'06, pp. 456–465, 2006.
[29] Y. Sakurai, S. Papadimitriou, C. Faloutsos, BRAID: stream mining through group lag correlations, In: Proceedings of SIGMOD, 2005.
[30] Y. Zhu, D. Shasha, StatStream: Statistical monitoring of thousands of data streams in real time, In: Proceedings of VLDB, 2002.
[31] G. Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis, vol. 217, John Wiley & Sons, 2004.
[32] B. Do, R. Faff, K. Hamza, A new approach to modeling and estimation for pairs trading, In: Proceedings of the 2006 Financial Management Association European Conference, 2006.
[33] E. Keogh, T. Folias, The UCR Time Series Data Mining Archive, University of California-Computer Science & Engineering Department, Riverside, CA, 2002.
Ildar Batyrshin received a Diploma from the Moscow Physical-Technical Institute in 1975, a Ph.D. from the Moscow Power Engineering Institute in 1983, and a Dr. Sci. (Habilitation) from the Higher Attestation Committee of the Russian Federation in 1996. Since 1975 he has been with the Department of Informatics and Applied Mathematics of Kazan State Technological University, Russia (as Department head in 1997–2003). Since 1999 he has also been with the Institute of Problems of Informatics of the Academy of Sciences of the Republic of Tatarstan, Russia, as a leading researcher. From 2003 to 2014 he was with the Mexican Petroleum Institute as a leading researcher and head of the project. He is now with the Center of Computing Research of the National Polytechnic Institute of Mexico. He is a Past President of the Russian Association for Fuzzy Systems and Soft Computing, member of the International Fuzzy Systems Association (IFSA) Council, member of the Council of the Mexican Society of Artificial Intelligence, Senior member of IEEE, Honorary Professor of Budapest Tech and Honorary Researcher of the Republic of Tatarstan, Russia. His areas of research activity are Fuzzy Logic, Expert Systems, Decision Making, Cluster Analysis, Time Series Data Mining, Measures of Association, and Social Network Analysis.
Valery Solovyev did his research with the Higher School of Information Technologies and Information Systems, University of Kazan, Russia. He graduated with a Doctor of Science in Computer Science degree at the Russian Academy of Sciences in 1995. He is currently working as a Senior Researcher. His research interests include data mining, computational linguistics, cognitive science.
Vladimir Ivanov is head of the Intelligent Search Technologies Department at Kazan Federal University, Russia. He graduated from Kazan Federal University in 2005 and received his Ph.D. degree in 2009. His fields of interest include research and development of information extraction, data mining and knowledge engineering systems.