Physica A 287 (2000) 429–439
www.elsevier.com/locate/physa
The entropy as a tool for analysing statistical dependences in financial time series

Georges A. Darbellay^{a,b,∗}, Diethelm Wuertz^{c}
^a Laboratoire de traitement des signaux, École polytechnique fédérale (EPFL), CH-1015 Lausanne, Switzerland
^b Ústav teorie informace a automatizace, AV ČR, Pod vodárenskou věží 4, CZ-182 08 Prague, Czech Republic
^c Institut für theoretische Physik, Eidgenössische Technische Hochschule (ETH), CH-8093 Zürich, Switzerland

Received 8 May 2000; received in revised form 9 June 2000
Abstract

The entropy is a concept which may serve to define quantities such as the conditional entropy and the mutual information. Using a novel algorithm for the estimation of the mutual information from data, we analyse several financial time series and demonstrate the usefulness of this new approach. The issues of long-range dependence and non-stationarity are discussed. © 2000 Elsevier Science B.V. All rights reserved.
1. Introduction

The entropy was introduced into thermodynamics by Clausius in 1865. Later, around 1900, within the framework of statistical physics established by Boltzmann and Gibbs, it came to be understood as a statistical concept. Around the middle of the 20th century, it found its way into engineering and mathematics, most notably through the work of Shannon in communication engineering and of Kolmogorov in probability theory and the theory of dynamical systems. Concepts based on the entropy, such as the conditional entropy or the mutual information, are well suited for studying statistical dependences in time series [1–3]. By statistical dependences we mean any kind of statistical correlations, not only linear correlations. Traditionally, nonlinear correlations have been studied with higher-order moments (or cumulants).
∗ Correspondence address: Laboratoire de traitement des signaux, École polytechnique fédérale (EPFL), CH-1015 Lausanne, Switzerland.
0378-4371/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0378-4371(00)00382-4
The entropy, with its ability to capture a stochastic relationship as a whole, and not necessarily as a sum of moment contributions, opens another route.

Quite often, statistical studies which use the entropy assume that the variables of interest are discrete, or may be discretised in some straightforward manner (e.g. Ref. [4] for the case of financial data). The statistical estimation is obviously far easier when dealing with data taking discrete values, but this restricts the range of problems to which the approach may be applied. In many situations it is desirable to address the continuous case, and this is what we will do in this contribution. This means that we will work with random variables or processes taking continuous values. These random variables may be vector-valued. The estimator we will use for our data analysis in Section 3 is described in detail in Refs. [2,5–7].

In mathematical finance the use of continuous processes is indeed widespread. The random walk and its continuous-time analogue, Brownian motion, have become cornerstones of financial modelling. Such models are based on the assumptions that the increments of the prices, namely the returns, are independent and that they obey a Gaussian distribution. Such Gaussian processes make it possible to construct an elegant theory, which is today widely used, despite the fact that it is only a first approximation [8]. Many empirical studies have shown that financial returns do not follow a Gaussian process, and the pioneering work of Benoit Mandelbrot in this area is particularly well known (e.g. Ref. [9]).

One way of showing that financial returns cannot be governed by a Gaussian process is to look at their scaling law. For a Gaussian process, the average of the fluctuations within a given time interval is proportional to the square root of that time interval. If $p(t)$ is the price at time $t$, and $r_{\Delta t}(t) = \log p(t+\Delta t) - \log p(t)$ the log-return at time $t$ over some interval $\Delta t$, then one investigates the scaling law

$$\langle |r_{\Delta t}(t)|^{q} \rangle \sim \Delta t^{\,\zeta(q)} \tag{1}$$
with $q = 1$ and where $\langle\cdot\rangle$ denotes the time average. For financial returns, the exponent $\zeta(1)$ is, however, often different from $1/2$. The case of the exchange rate between the US Dollar (USD) and the German Mark (DEM) is shown in Fig. 1. To make matters yet more complicated, the exponent is not stable over the years. This is illustrated in Fig. 2. The data are quotes averaged over 30 min periods between October 1992 and May 1997. The volatility time, denoted as the upsilon time $\upsilon$, is a monotonic transformation of the physical time: the period is shorter than 30 min when the volatility is high and longer than 30 min when the volatility is low (e.g. during the weekends) [10]. This kind of intrinsic time, to which we come back in Section 3, is also referred to as the operational time, since heavy trading results in higher activity [11,12]. As we can see, in both time scales the exponent is not $1/2$.

The exponent in the scaling law depends on the power $q > 0$ to which the absolute returns have been raised. For certain processes it is possible to derive the relation between $\zeta$ and $q$ [13]. For Lévy processes of index $0 < \alpha \le 2$, we have $\zeta(q) = q/\alpha$ if $q < \alpha$ and $\zeta(q) = 1$ if $q \ge \alpha$. Gaussian processes correspond to the case $\alpha = 2$.
Fig. 1. Double-logarithmic plot of the mean absolute return versus the time interval $\Delta t$ over which the return was calculated (see Eq. (1)).
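In practice, the exponent $\zeta(q)$ in Eq. (1) is obtained by a linear regression in double-logarithmic coordinates, as in Fig. 1. The following is a minimal sketch of such an estimate (our own illustration, not the authors' code; the function name and the synthetic random-walk check are ours):

```python
import numpy as np

def scaling_exponent(log_prices, q=1.0, dts=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate zeta(q) in <|r_dt(t)|^q> ~ dt**zeta(q), cf. Eq. (1)."""
    mean_abs_q = []
    for dt in dts:
        r = log_prices[dt:] - log_prices[:-dt]   # log-returns over interval dt
        mean_abs_q.append(np.mean(np.abs(r) ** q))
    # slope of log <|r|^q> versus log dt
    slope, _ = np.polyfit(np.log(dts), np.log(mean_abs_q), 1)
    return slope

# Sanity check on a Gaussian random walk, where zeta(q) = q/2 exactly:
rng = np.random.default_rng(0)
log_p = np.cumsum(rng.normal(size=100_000))
print(scaling_exponent(log_p, q=1.0))   # should be close to 0.5
```

Applied to financial returns, the same regression yields the exponents of Figs. 1 and 2, which deviate from $1/2$.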
Thus, a given data-generating process can only be a Lévy process if the relation between $\zeta(q)$ and $q$ given above holds for all values of $q$.

The scaling analysis, as outlined above, leads to the conclusion that the central limit theorem (CLT) does not apply to financial returns. This conclusion may also be reached by looking at histograms and realising that the returns follow heavy-tailed distributions. To understand why the CLT is inapplicable, it is important to check whether the returns are sequentially independent or not. Studies of the scaling behaviour, however, do not provide much of an answer about the (in)dependence of the increments. Their lack of independence is usually studied by looking at the linear correlations of some power $q$ of the absolute (log-)returns. In virtually all such investigations, with the exception of Ref. [14], only the powers $q = 1, 2$ have been analysed (e.g. Refs. [12,15]). Again, such analyses remain incomplete unless a whole range of $q$-values is investigated. The approach based on the entropy proceeds along a more direct avenue: it does not require considering all powers $q$ separately, since the mutual information captures the dependence as a whole. In the next section, we provide some theoretical background on our approach; the word prediction appears there because dependence means predictability, with linear prediction theory as the limiting case of purely linear dependences. Two financial time series are analysed in Section 3, and our conclusions are summarised in Section 4.
Fig. 2. Values of the scaling exponent $\zeta(1) = 1/E$ calculated over a rolling 12-month window, which is shifted by 2 weeks for each new value.
2. Entropy and statistical dependences

2.1. Entropy and information

The Boltzmann–Gibbs–Shannon entropy $h(X)$ of a continuous random variable $X$ with probability density function $f_X$, taking its values in $\mathbb{R}^d$, is

$$h(X) = -\int f_X(x)\, \log f_X(x)\, dx . \tag{2}$$

It has the advantage, over some other types of entropies, of satisfying many desirable properties, which were found to be important for a great variety of physical and engineering systems. Before proceeding further, we note that the integration in (2) is carried out over the whole support of the function $f_X$, and that $\log$ denotes the logarithm, which will be taken to be the natural logarithm. As a result the entropy will be measured in nats. If instead of the base $e$ one uses the base 2, then the entropy will be expressed in bits. It is straightforward to transform one into the other: $h[\text{nats}] = \ln(2)\, h[\text{bits}]$, where $\ln = \log_e$. The entropy (2) is called the differential entropy; it shares many properties with the entropy of a discrete random variable, but not all of them [16,17]. In particular, it is not necessarily positive and it is not invariant under a one-to-one transformation of the random variable $X$.
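As a concrete illustration of definition (2) and of the units (a standard closed-form result, not derived in the text): for a one-dimensional Gaussian variable with variance $\sigma^2$, the integral can be evaluated explicitly,

$$ h(X) = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^{2}\right)\ \text{nats} = \tfrac{1}{2}\log_{2}\!\left(2\pi e\,\sigma^{2}\right)\ \text{bits} . $$

For $\sigma = 1$ this gives approximately 1.42 nats, i.e. about 2.05 bits, while for $\sigma^2 < 1/(2\pi e)$ the differential entropy is negative, which illustrates the remark above that it need not be positive.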
Consider now a second random variable $Y$ with probability density $f_Y$. The joint probability density of $X$ and $Y$ will be $f_{X,Y}$, and the joint entropy of $X$ and $Y$ is defined as

$$h(X,Y) = -\iint f_{X,Y}(x,y)\, \log f_{X,Y}(x,y)\, dx\, dy . \tag{3}$$

Again the integration is done over the whole support of the function $f_{X,Y}$. The difference

$$h(Y|X) = h(X,Y) - h(X) = -\iint f_{X,Y}(x,y)\, \log \frac{f_{X,Y}(x,y)}{f_X(x)}\, dx\, dy \tag{4}$$

defines the conditional entropy. Similarly, $h(X|Y) = h(X,Y) - h(Y)$. The difference

$$I(X;Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X,Y) = \iint f_{X,Y}(x,y)\, \log \frac{f_{X,Y}(x,y)}{f_X(x)\, f_Y(y)}\, dx\, dy \tag{5}$$

is the mutual information between $X$ and $Y$. It is a measure of the dependence between $X$ and $Y$. It satisfies

$$I(X;Y) \ge 0 \tag{6}$$
with equality if and only if $X$ and $Y$ are independent. This follows from the inequality $\log u \le u - 1$ for $u \in \mathbb{R}_{>0}$. Equality in the previous inequality and in (6) is achieved if and only if, respectively, $u = 1$ or $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ for all $x, y$. If $X$ and $Y$ are independent, then from (5) and (6) we have $h(Y|X) = h(Y)$. Note that, in the (in)equations above, $X$ and $Y$ may be understood as random vectors, i.e., vectors of random variables.

2.2. Estimation from data

The difficulty in calculating the mutual information from empirical data lies in the fact that the relevant probability density functions are unknown. One standard way is to approximate the densities by means of histograms. However, imposing some arbitrary histogram will not do: in general, it will lead to gross underestimation or overestimation, depending on the particular distribution governing the data set. What we need is an adaptive histogram, i.e., a histogram that is able to adapt itself to any (joint) probability density as well as possible. Fortunately, there is a general definition of the mutual information based on partitions, and this provides a way of building an adaptive histogram. A (finite) partition of $\mathbb{R}^d$ is any finite system of non-intersecting subsets of $\mathbb{R}^d$ whose union is the whole of $\mathbb{R}^d$. The subsets are often called the cells of the partition. For obvious reasons, in practice one works with rectangular cells; in other words, the cells are hyperrectangles in $\mathbb{R}^d$.
We will denote them as $C_k = A_k \times B_k$, where $A_k$ is the orthogonal projection of $C_k$ onto the space where $X$ takes its values, and $B_k$ the projection of $C_k$ onto the space where $Y$ takes its values. It can be shown that the mutual information is a supremum over partitions [18]:

$$I(X;Y) = \sup_{\{C_k\}} \sum_k P_{X,Y}(C_k)\, \ln \frac{P_{X,Y}(C_k)}{P_X(A_k)\, P_Y(B_k)} . \tag{7}$$
Here, $\{C_k\}$ denotes a partition made of cells $C_k$, $P_{X,Y}(C_k)$ is the probability that the pair $(X,Y)$ takes its values in the cell $C_k$, $P_X(A_k)$ the probability that $X$ takes its values in $A_k$, and $P_Y(B_k)$ the probability that $Y$ takes its values in $B_k$. It can also be shown that, by constructing a sequence of finer and finer partitions, the corresponding sequence of "mutual informations" will increase monotonically. It stops increasing when conditional independence is achieved on all cells of the partition. Thus, by testing for independence we can decide when to stop a (recursive) partitioning scheme. As a criterion one may use any independence test, e.g. the $\chi^2$ statistic. Full details, including the computer code, are to be found in Refs. [2,5–7].
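To make the recursive scheme concrete, the following is a minimal sketch of an adaptive-partitioning estimator for two scalar variables. It is a simplified illustration written for this text, not the published code of Refs. [2,5–7]; in particular, the splitting rule (four quadrants at the within-cell medians) and the fixed $\chi^2$ threshold are our simplifying assumptions.

```python
import numpy as np

def mutual_information(x, y, min_points=32, chi2_threshold=3.84):
    """Adaptive-partitioning estimate of I(X;Y) in nats (simplified sketch).

    Each cell is split at the within-cell medians of x and y; the split is
    kept only if a chi-square test on the resulting 2x2 contingency table
    rejects independence inside the cell (3.84 ~ 5% level, 1 d.o.f.).
    The final cells then act as the bins of the adaptive histogram in Eq. (7).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)

    def recurse(idx, xlo, xhi, ylo, yhi):
        if len(idx) >= min_points:
            mx, my = np.median(x[idx]), np.median(y[idx])
            left, low = x[idx] <= mx, y[idx] <= my
            obs = np.array([[np.sum(left & low), np.sum(left & ~low)],
                            [np.sum(~left & low), np.sum(~left & ~low)]],
                           dtype=float)
            exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / len(idx)
            if exp.min() > 0 and np.sum((obs - exp) ** 2 / exp) > chi2_threshold:
                # dependence inside the cell: refine the partition
                return (recurse(idx[left & low], xlo, mx, ylo, my)
                        + recurse(idx[left & ~low], xlo, mx, my, yhi)
                        + recurse(idx[~left & low], mx, xhi, ylo, my)
                        + recurse(idx[~left & ~low], mx, xhi, my, yhi))
        if len(idx) == 0:
            return 0.0
        # leaf cell C_k = A_k x B_k: one term of the sum in Eq. (7)
        p_xy = len(idx) / n
        p_x = np.sum((x > xlo) & (x <= xhi)) / n   # estimate of P_X(A_k)
        p_y = np.sum((y > ylo) & (y <= yhi)) / n   # estimate of P_Y(B_k)
        return p_xy * np.log(p_xy / (p_x * p_y))

    return recurse(np.arange(n), -np.inf, np.inf, -np.inf, np.inf)

# Test on a bivariate Gaussian, where I(X;Y) = -0.5*ln(1 - rho**2) is a
# standard closed-form result:
rng = np.random.default_rng(1)
rho = 0.6
u = rng.standard_normal(20_000)
v = rho * u + np.sqrt(1 - rho ** 2) * rng.standard_normal(20_000)
print(mutual_information(u, v), -0.5 * np.log(1 - rho ** 2))  # both near 0.22
```

The conditional-independence stopping rule is what makes the histogram adaptive: regions where the joint density is structureless are covered by large cells, while cells are refined only where dependence persists.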
3. Application to financial time series

For a long time, most economists considered financial returns to be independent, at least for all practical purposes. Then, during the course of the 1980s, it became widely known that the (linear) autocorrelations of their squares or their absolute values show some kind of "long-range dependence" effect (for a review, see e.g. Ref. [19]). Finding an explanation for this effect is not easy, and we will return to it below. On the other hand, the (linear) autocorrelations of the returns themselves are not statistically different from zero, except possibly at very short time lags. Most observers, though not all of them, regard this putative dependence as insufficient for making riskless profits, and would thus not try to beat the market by using such autocorrelations.

We will now demonstrate that the mutual information provides a concise approach to the question of dependences in the returns and in their volatilities. Two time series will be considered. The first one is formed by the log-returns of the exchange rate between the US Dollar (USD) and the German Mark (DEM). This time series, which contains more than 81 788 points recorded at intervals of approximately 30 min, extends from October 1992 to May 1997. The second time series is made of the log-returns of the Dow Jones (DJ) industrial stock index from the New York Stock Exchange. These 24 277 daily data records cover a period from 1901 to 1998. For the exchange rate time series we have used, as in Section 1, the $\upsilon$-time, whose clock runs faster during periods of high volatility [10]: highly volatile market periods are stretched and less volatile market periods are shortened. This type of transformation, whose aim is to reduce market seasonalities, was introduced by the Olsen group [11], who used a yearly averaged time scale to define the so-called $\vartheta$-time. The closely related $\upsilon$-time is defined by means of a weekly averaged time scale instead, and its calculation is simpler [10].
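The idea behind such an intrinsic time scale can be sketched as follows. This is a heavily simplified illustration of our own, assuming a regularly sampled series that starts at a week boundary; the actual construction of the $\upsilon$-time is described in Ref. [10].

```python
import numpy as np

def upsilon_time(r, slots_per_week=336):
    """Heavily simplified intrinsic-time sketch: the clock runs fast where
    the average weekly activity (here, mean |return| per week-slot) is high.

    Returns the cumulative intrinsic time at each observation, normalised
    so that one average week keeps its physical length.
    """
    r = np.asarray(r, dtype=float)
    slot = np.arange(len(r)) % slots_per_week       # position within the week
    profile = np.array([np.abs(r[slot == s]).mean()
                        for s in range(slots_per_week)])
    increments = profile[slot] / profile.mean()     # weekend slots shrink to ~0
    return np.cumsum(increments)

# The series would then be resampled on a regular grid in this new time scale.
```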
Fig. 3. Mutual information functions $\langle I(r(t); r(t-\tau))\rangle$ (full line) and $\langle I(|r(t)|; |r(t-\tau)|)\rangle$ (dashed line) as a function of the time lag $\tau$. $r(t)$ denotes the returns of the USD–DEM foreign exchange rate and $\langle\cdot\rangle$ the averaging over $t$. The 336 lags cover one week. The two functions are virtually indistinguishable.
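Curves such as those in Fig. 3 can be computed by pairing the series with a lagged copy of itself and applying a mutual information estimator at each lag. A minimal sketch, reusing the mutual_information function sketched in Section 2.2 (our illustration, not the authors' code):

```python
import numpy as np

def autoinformation(r, max_lag):
    """Lagged mutual information <I(r(t); r(t - tau))> for tau = 1..max_lag."""
    return np.array([mutual_information(r[tau:], r[:-tau])
                     for tau in range(1, max_lag + 1)])

# For half-hourly returns, 336 lags cover one week of upsilon-time:
# signed = autoinformation(r, 336)
# unsigned = autoinformation(np.abs(r), 336)  # nearly identical, cf. Eq. (8)
```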
Since we have a single series of measurements across time, the mutual information is calculated as a time average. The interpretation of such statistical estimates depends on whether or not the time series is stationary and/or ergodic. Fig. 3 shows two autoinformation curves for the USD–DEM returns. The first curve (full line) is the mutual information $\langle I(r(t); r(t-\tau))\rangle$, where $\tau$ is the time lag and $r(t)$ the return at time $t$. The second curve (dashed line), $\langle I(|r(t)|; |r(t-\tau)|)\rangle$, is virtually identical to the first curve, except maybe for the lag 1. The periodicity is caused by the remaining seasonality: there are 10 daily peaks during the 2 weeks (the weekends have been washed out, as there is virtually no market activity during Saturdays and Sundays). We recall that the mutual information is invariant with respect to bijective transformations (unlike the linear correlation, which is invariant with respect to linear transformations only). As a result,

$$\langle I(|r(t)|^{q}; |r(t-\tau)|^{q})\rangle = \langle I(|r(t)|; |r(t-\tau)|)\rangle \quad \text{for all } q > 0 . \tag{8}$$
We can thus conclude that, except maybe for the first lag, all the information about the present returns contained in some past returns is carried by the amplitudes of the returns and not by their signs. The same conclusion applies to the Dow Jones returns, except for a small difference between $\langle I(r(t); r(t-\tau))\rangle$ and $\langle I(|r(t)|; |r(t-\tau)|)\rangle$ for the first five days ($\tau = 1, \ldots, 5$).

Are the values of $\langle I(r(t); r(t-\tau))\rangle$ and $\langle I(|r(t)|; |r(t-\tau)|)\rangle$ stable through time (windows)? To check this, we recalculated these time averages over a rolling window, both for the USD–DEM and for the Dow Jones time series. If this is done for a fixed lag $\tau$, one obtains a graph which is similar in spirit to Fig. 2 or Fig. 7. In other words, the mutual information is not stable through time. This fact supports the view that these time series are not stationary. This could also explain the long-range dependence that can be seen in Fig. 3 or Fig. 4.
Fig. 4. Mutual information $\langle I(s(t); s(t-\tau))\rangle$ as a function of the time lag $\tau$, where $s(t)$ is the volatility of the Dow Jones stock index returns.
The decline of the autoinformation functions is not exponential but hyperbolic. One can construct stationary processes with long-range dependence by means of the so-called fractional processes. However, non-stationarity may also cause long-range dependence. As a simple illustration, consider a stationary time series with a low dispersion (which could be measured by the variance) and another stationary time series with a high dispersion. Let each series have a length $N/2$, and assume each obeys one of the numerous processes with an exponential decline in the (linear) autocorrelation function or the autoinformation function (e.g. some autoregressive process). Now append the two series and consider them as a single process. Obviously, provided the lag separating two data points is sufficiently smaller than $N$, the data points belonging to the part with low variance will often be paired together, and likewise those belonging to the part with high variance. This will induce long-range dependence.

The volatility is a measure of the variability of a random variable through time. Any power $q$ of the absolute returns could do; in our case one value suffices, see (8), and we will use $q = 1$. We thus consider

$$s_m(t) = \frac{1}{m} \sum_{i=t-m+1}^{t} |r(i)| , \tag{9}$$
where $m$ is the length of the window for the calculation of the volatility. The mutual information function displays a very clear hyperbolic decline. The case of the Dow Jones stock index is shown in Fig. 4. A curve with a similar shape was obtained for the USD–DEM exchange rate. However, as for the returns, the mutual information function is not stable through time windows.

At this stage one may ask whether the dependence in the volatility could be of any help in predicting the returns. To this end, we considered the mutual information between, on the one hand, the return at time $t$ and, on the other hand, the return at time $t-1$ together with the volatility at time $t-1$.
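The inputs for these quantities can be assembled as follows: the rolling-mean volatility of Eq. (9) and, as the conditioning variable, the two-dimensional vector $(r(t-1), s_m(t-1))$. Estimating the mutual information with a vector-valued argument requires the multidimensional version of the partitioning estimator [2,5–7]; in this sketch it is only assumed to exist under the hypothetical name mutual_information_nd.

```python
import numpy as np

def volatility(r, m):
    """Rolling-mean volatility s_m(t) of Eq. (9); entry j corresponds to t = j + m - 1."""
    return np.convolve(np.abs(r), np.ones(m) / m, mode="valid")

def prediction_inputs(r, m):
    """Target r(t) and conditioning vector (r(t-1), s_m(t-1)) for t = m, ..., N-1."""
    s = volatility(r, m)
    target = r[m:]
    past = np.column_stack((r[m - 1:-1], s[:-1]))
    return target, past

# target, past = prediction_inputs(r, m=12)
# info = mutual_information_nd(past, target)   # hypothetical vector-input estimator
```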
Fig. 5. Mutual informations $\langle I(r(t); r(t-1), s_m(t-1))\rangle$ (full line), $\langle I(|r(t)|; |r(t-1)|, s_m(t-1))\rangle$ (dashed line), $\langle I(r(t); s_m(t-1))\rangle$ (dash-dotted line) and $\langle I(|r(t)|; s_m(t-1))\rangle$ (dotted line) for the USD–DEM foreign exchange rate. They are displayed as a function of the length $m$ of the window used for calculating the volatility.
Fig. 6. Same as Fig. 5 but for the Dow Jones stock index.
In Figs. 5 and 6 we show $\langle I(r(t); r(t-1), s_m(t-1))\rangle$ (full line) and $\langle I(|r(t)|; |r(t-1)|, s_m(t-1))\rangle$ (dashed line) as functions of the length $m$ of the volatility window. It can be seen that the two curves track each other quite well. So, there is again very little information that could be exploited for predicting the sign of the return. The quantities $\langle I(r(t); s_m(t-1))\rangle$ (dash-dotted line) and $\langle I(|r(t)|; s_m(t-1))\rangle$ (dotted line) are also shown. For the USD–DEM time series, the inclusion of the volatility as a second input does improve the predictability of the absolute value of the returns. For the Dow Jones time series, this effect is very small.
Fig. 7. Evolution of the mutual information $\langle I(r(t); r(t-1), s_m(t-1))\rangle$, with $m = 12$, over a one-year rolling window, for the USD–DEM time series.
In Fig. 7 the behaviour of $\langle I(r(t); r(t-1), s_m(t-1))\rangle$, with $m = 12$, over a one-year rolling window is displayed. A similar picture was obtained for the Dow Jones time series. The mutual information is not stable through time; it may even change quite drastically over a fairly short period.
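The rolling-window calculations behind Figs. 2 and 7 follow the same pattern; a sketch (again with the hypothetical mutual_information_nd, and with window and step sizes expressed in numbers of data points, which depend on the sampling of the particular series):

```python
import numpy as np

def rolling_mutual_information(target, past, window, step):
    """Mutual information recomputed over a rolling window, as in Fig. 7."""
    starts = range(0, len(target) - window + 1, step)
    return np.array([mutual_information_nd(past[i:i + window],
                                           target[i:i + window])
                     for i in starts])

# e.g. roughly a one-year window over the half-hourly USD-DEM series,
# shifted by about two weeks per step:
# curve = rolling_mutual_information(target, past, window=17_520, step=480)
```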
4. Conclusions

We demonstrated that ideas revolving around the entropy may be sensibly applied to financial time series. The decisive advantage of this approach resides in its ability to account for nonlinear dependences. Above, we have illustrated the method by answering the following questions:
• Are the returns statistically independent? – No. Is there any information in the signs of the returns? – No, all the information is contained in their absolute values, except possibly for very short time lags.
• Are the volatilities statistically dependent? – Yes. Can the volatilities help predict the returns? – No, except possibly for very short time lags, but the effect is so small that it is probably useless.
• Are the (long-range) dependences in the absolute values of the returns or in the volatilities stable over time windows? – No, and a partial explanation of this seems to be that financial time series show some kind of non-stationarity.

Concepts and methods of statistical physics are increasingly being applied to economics, to the point that a new word was coined: econophysics [20]. The process need not be one-directional. It is worth noting that power-law scaling and Brownian motion were in fact known in economics before they appeared in physics.
References

[1] H.-P. Bernhard, A tight upper bound on the gain of linear and nonlinear predictors for stationary stochastic processes, IEEE Trans. Signal Process. 46 (1998) 2909–2917.
[2] G.A. Darbellay, Predictability: an information-theoretic perspective, in: A. Procházka, J. Uhlíř, P.J.W. Rayner, N.G. Kingsbury (Eds.), Signal Analysis and Prediction, Birkhäuser, Boston, 1998, pp. 249–262.
[3] W. Ebeling, J. Freund, F. Schweitzer, Komplexe Strukturen: Entropie und Information, Teubner, Stuttgart, 1998.
[4] L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, Eur. Phys. J. B 15 (2000) 733.
[5] G.A. Darbellay, An estimator of the mutual information based on a criterion for independence, Comput. Statist. Data Anal. 32 (1999) 1–17.
[6] G.A. Darbellay, J. Franek, http://siprint.utia.cas.cz/timeseries/.
[7] G.A. Darbellay, I. Vajda, Estimation of the information by an adaptive partitioning of the observation space, IEEE Trans. Inform. Theory 45 (1999) 1315–1321.
[8] J.-P. Bouchaud, M. Potters, Théorie des Risques Financiers, Aléa, Saclay, 1997.
[9] B. Mandelbrot, A multifractal walk down Wall Street, Sci. Am. (February 1999) 50–53.
[10] R. Schnidrig, D. Wuertz, Investigation of the volatility and autocorrelation function of the exchange rate on operational time scales, ETH research report No. 95-04 (1995).
[11] M.M. Dacorogna, U.A. Mueller, R.J. Nagler, R.B. Olsen, O.V. Pictet, A geographical model for the daily and weekly seasonal volatilities in the FX market, J. Int. Money Finance 12 (1993) 413–438.
[12] S. Galluccio, G. Caldarelli, M. Marsili, Y.C. Zhang, Scaling in currency exchange, Physica A 245 (1997) 423–436.
[13] F. Schmitt, D. Schertzer, S. Lovejoy, Multifractal analysis of foreign exchange data, Appl. Stochastic Models Data Anal. 15 (1999) 29–53.
[14] Z. Ding, C.W.J. Granger, R.F. Engle, A long memory property of stock market returns and a new model, J. Empirical Finance 1 (1993) 83–106.
[15] Y. Liu, P. Cizeau, M. Meyer, C.-K. Peng, H.E. Stanley, Correlations in economic time series, Physica A 245 (1997) 437–440.
[16] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[17] N.S. Jayant, P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[18] R.L. Dobrushin, General formulation of Shannon's main theorem in information theory, Uspekhi Mat. Nauk 14 (1959) 3–104 (in Russian). Translated in Am. Math. Soc. Transl. 33 (1959) 323–438.
[19] J.Y. Campbell, A.W. Lo, A. Craig MacKinlay, The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, 1997.
[20] J.D. Farmer, Physicists attempt to scale the ivory towers of finance, Comput. Sci. Eng. (November/December 1999) 26–39.