Multidimensional minimal spanning tree: The Dow Jones case


Physica A 387 (2008) 5205–5210

Contents lists available at ScienceDirect

Physica A journal homepage: www.elsevier.com/locate/physa

Juan Gabriel Brida a,1, Wiston Adrián Risso b,∗

a School of Economics and Management, Free University of Bolzano, Italy
b Department of Economics, University of Siena, Italy

Article info

Article history: Received 22 December 2007; received in revised form 11 March 2008; available online 13 May 2008

PACS: 89.65.Gh; 02.10.Ox

Keywords: Symbolic time series analysis; Financial asset returns; Hierarchical tree

Abstract

This paper introduces a new methodology for constructing Minimal Spanning Trees (MST) and Hierarchical Trees (HT) using the information provided by more than one variable. The Symbolic Time Series Analysis (STSA) approach is applied to the Dow Jones companies using information not only from asset returns but also from trading volume. The US stock market structure is obtained, showing eight clusters of companies and General Electric as a central node in the tree. We use different partitions, showing that the results do not depend on the particular partition. In addition, we apply Monte Carlo simulations, which suggest that the tree is not the result of random connections. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

Recently, Refs. [15,2,3,19] proposed studying the structure of the financial stock market with the Minimal Spanning Tree (MST) and the Hierarchical Tree (HT). These tools have proved useful for detecting clusters and taxonomic relations in a set of elements. To construct the trees, the distance defined in Ref. [9], which is based on the Pearson correlation coefficient, is normally used. Other distances can be applied, however. Ref. [4] introduces a symbolic method that makes the methodology more flexible by using symbolic analysis. A useful feature of this method is that different scenarios can be analyzed by defining different symbolizations: the authors obtained the structure of the stock market in both a normal and an extreme situation, and the latter is not possible with the traditional distance based on the Pearson correlation. Moreover, the traditional distance proposed in Ref. [9] uses only one variable (asset returns) to obtain the structure and taxonomy of the stock market, and so cannot incorporate information from other variables such as trading volume.

As will be explained later, many works show that there is a relationship between returns and trading volume. Two things are well known in the Wall Street tradition. First, it takes volume to move prices, which means that there is a positive correlation between trading volume and the absolute value of returns. Second, volume seems to be heavy in “bull markets” and light in “bear markets”, which suggests a positive correlation between returns and trading volume. The present paper extends Ref. [4] by introducing a methodology that incorporates information not only from returns but also from trading volume.

The next section sets up a method for constructing multidimensional minimal spanning trees (MMST), obtaining a structure that recovers information from more than one variable. Section 3 shows the importance of considering not only asset returns but also trading volume. In Section 4 a symbolization for stock markets is proposed, together with an empirical application to the main US companies.

Our research was supported by the Free University of Bolzano (project “Dynamical Regimes in Economics: Modelling and statistical tools”).

∗ Corresponding author. Tel.: +39 0577 235058; fax: +39 0577 232661. E-mail addresses: [email protected] (J.G. Brida), [email protected] (W.A. Risso).

1 Tel.: +39 0471 013492; fax: +39 0471 013 009.

0378-4371/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.physa.2008.05.009


J.G. Brida, W.A. Risso / Physica A 387 (2008) 5205–5210

In Section 5 we show that the tree is stable, in the sense that it does not depend on a particular partition. In addition, we use Monte Carlo simulations to show that the result obtained is not a tree of random connections. Finally, Section 6 draws some conclusions.

2. Multidimensional symbolic based minimal spanning tree

The Minimal Spanning Tree (MST) and the Hierarchical Tree (HT) introduced by Mantegna (see Ref. [15]) are a way of detecting clusters with an economic meaning. To obtain this taxonomic representation, a metric distance is necessary; i.e., a function d defined for each pair of time series that takes values in $\mathbb{R}$ such that:

1. $d(i, j) \geq 0 \quad \forall i, j$
2. $d(i, j) = 0$ if and only if $i = j$
3. $d(i, j) = d(j, i) \quad \forall i, j$
4. $d(i, j) \leq d(i, k) + d(k, j) \quad \forall i, j, k$.

Computing all the distances between elements permits the construction of the distance matrix D. This symmetric matrix determines the minimal spanning tree connecting the n elements of a set, showing the most relevant connections. The methodology proposed in Ref. [15] takes as its fundamental input the Pearson correlation coefficient of the asset returns, together with the distance function proposed by Gower (see Ref. [9]). However, as will be explained later, there is evidence that not only returns but also trading volume is relevant when studying the structure and dynamics of stock markets. It is therefore important to find a method that takes this multidimensionality into account when deriving the structure and taxonomy with the Minimal Spanning Tree (MST) and the Hierarchical Tree (HT). Suppose that, in constructing a certain structure, the following multidimensional time series is relevant for each element i:

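\[
\{X_i\}_{t=1}^{t=T} = \left\{
\begin{pmatrix} x_{i1} \\ y_{i1} \\ \vdots \\ z_{i1} \end{pmatrix},
\begin{pmatrix} x_{i2} \\ y_{i2} \\ \vdots \\ z_{i2} \end{pmatrix},
\ldots,
\begin{pmatrix} x_{iT} \\ y_{iT} \\ \vdots \\ z_{iT} \end{pmatrix}
\right\}.
\]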

We can pass to a one-dimensional symbolic space S by defining a partition of the multidimensional space $\mathbb{R}^n$, obtaining the following symbolic time series for each element i:

\[
\{s_{i1}, s_{i2}, \ldots, s_{iT}\}.
\]

The key step in applying symbolization to time series measurements is transforming the original values into a sequence of symbols (see Ref. [8]). This reduces the noise in highly contaminated time series, and asset returns are no exception. We therefore have to select the partition that defines the regions, assigning a symbol to each measurement according to the region into which it falls. The choice of partition sometimes depends on the context of the problem or on the underlying economic interpretation. In other words, if we start with a given set of measurements $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$ made up of vectors $x_t \in D \subset \mathbb{R}^q$, for $t = 1, 2, \ldots, T$, and the state space $\mathbb{R}^q$ is endowed with a suitable partition, then we transform the sequence of data $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$ into the sequence of symbols $s_1 s_2 \ldots s_t \ldots s_T$, where $s_t = s$ if and only if $x_t$ belongs to the region labeled by s. This converts the original signal into a symbolic sequence, from which the symbol sequence statistics can be estimated. Once the symbolic time series has been obtained for each element, the procedure introduced in Ref. [4] is applied to derive the Minimal Spanning Tree and the Hierarchical Tree. That is, after symbolization a simple distance can be defined as follows:

\[
d_0(s_i, s_j) = \sqrt{\sum_{t=1}^{t=T} (s_{it} - s_{jt})^2}. \tag{1}
\]
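Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbols are taken to be coded as numbers (here 0, 1, 2), and the function names and the toy sequences are hypothetical, not taken from the paper.

```python
import math

def symbolic_distance(s_i, s_j):
    """Distance of Eq. (1): Euclidean distance between two symbol sequences."""
    if len(s_i) != len(s_j):
        raise ValueError("symbolic sequences must have the same length T")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_j)))

def distance_matrix(sequences):
    """Symmetric matrix D as a dict of dicts, keyed by element name."""
    names = list(sequences)
    return {a: {b: symbolic_distance(sequences[a], sequences[b]) for b in names}
            for a in names}

# Toy symbolic series (hypothetical data, symbols coded 0, 1, 2).
toy = {
    "A": [0, 1, 2, 1],
    "B": [0, 1, 2, 1],
    "C": [2, 1, 0, 1],
}
D = distance_matrix(toy)
```

In this toy case A and B are identical, so their distance is zero, while A and C differ by two symbols in two positions, giving a distance of the square root of 8.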


Note that $\{s_{it}\}_{t=1}^{t=T}$ and $\{s_{jt}\}_{t=1}^{t=T}$ are the symbolic sequences for companies i and j respectively. Once the distances are computed, the distance matrix D is constructed. It is a symmetric matrix containing all the distances among the elements of the set. From this distance matrix we construct the Minimal Spanning Tree (MST). The method of constructing an MST linking a set of n objects is direct and is known as Kruskal's algorithm; the resulting clustering is also called single linkage (nearest neighbor). Note that the MST of a set of n elements is a graph with n − 1 links. We briefly describe the method. The MST is built up progressively by linking all the elements of the set in a graph characterized by a minimal total distance between stocks. One starts with the pair of elements with the shortest distance. At this stage the only part of the MST already identified, which we refer to as the growing MST, is composed of these two companies. We then consider the next smallest distance and add the corresponding link only if the two elements are not already connected in the tree, which means that the MST does not allow loops. We continue linking the remaining companies in this way until all the given


Fig. 1. Relationship between returns and volume. The dotted line represents the positive relation and the dashed line the symmetric relation between volume and absolute price change. Finally, the solid line is the asymmetric relationship proposed by Karpoff.

companies are connected in a unique tree. The attractive feature of the MST is that it provides an arrangement of the assets that selects the most relevant connections of each element of the set. The MST also permits us to obtain the subdominant ultrametric distance matrix, whose entries are the ultrametric distances $d^<(i, j)$ (see Refs. [15,16]). The subdominant ultrametric distance $d^<(i, j)$ between i and j is the maximum value of any Euclidean distance $d_k(l, m)$ detected by moving in single steps from i to j through the shortest path connecting i and j in the MST. The ultrametric distance $d^<$ is used to construct the Hierarchical Tree (HT).
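The construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the distance matrix is stored as a dict of dicts, uses a union-find structure to reject links that would close a loop, and then reads the subdominant ultrametric distance as the largest link on the unique MST path between two elements. The names and the four-element toy matrix are hypothetical.

```python
from itertools import combinations

def kruskal_mst(names, dist):
    """Kruskal's algorithm: scan links by increasing distance, skip loops."""
    parent = {n: n for n in names}

    def find(x):
        # Root of the component containing x (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist[a][b], a, b) for a, b in combinations(names, 2))
    tree = []
    for d, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # a and b are not yet connected
            parent[root_a] = root_b
            tree.append((d, a, b))
            if len(tree) == len(names) - 1:
                break                 # an MST on n elements has n - 1 links
    return tree

def subdominant_ultrametric(names, tree):
    """d<(i, j): largest link on the unique MST path between i and j."""
    adjacency = {n: [] for n in names}
    for d, a, b in tree:
        adjacency[a].append((b, d))
        adjacency[b].append((a, d))
    ultra = {}
    for source in names:
        largest = {source: 0.0}
        stack = [source]
        while stack:                  # depth-first walk over the tree
            u = stack.pop()
            for v, d in adjacency[u]:
                if v not in largest:
                    largest[v] = max(largest[u], d)
                    stack.append(v)
        ultra[source] = largest
    return ultra

# Toy distance matrix (hypothetical values).
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 1.0, ("A", "C"): 3.0, ("A", "D"): 5.0,
       ("B", "C"): 2.0, ("B", "D"): 6.0, ("C", "D"): 4.0}
dist = {n: {} for n in names}
for (a, b), d in raw.items():
    dist[a][b] = d
    dist[b][a] = d

tree = kruskal_mst(names, dist)
ultra = subdominant_ultrametric(names, tree)
```

In the toy example the link A–C (distance 3) is rejected because A and C are already connected through B, so the tree consists of the links A–B, B–C and C–D.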

3. The importance of volume and price change

A survey by Karpoff (see Ref. [14]) reviews previous research on the relationship between price changes and trading volume in financial markets. In general, the article documents two empirical relations. First, in line with the old Wall Street adage “It takes volume to make prices move”, Ref. [14] asserts that numerous empirical findings support a positive correlation between volume and absolute price change (see [7,5,18,21,6,11], among others). Fig. 1 shows this correlation between volume (V) and absolute price change (|∆p|) as dashed lines. Karpoff asserts that this stylized fact is present in both futures and equity markets. On the other hand, according to Ref. [14], although this positive correlation is almost universally found, some tests indicate that it is weak. Second, another familiar adage says that “Volume is relatively heavy in bull markets and light in bear markets”. In this respect Ref. [14] remarks that in equity markets there is evidence of a positive relation between volume and price change (see [13,20,18,12], among others). A correlation such as the dotted line in Fig. 1 has been found only in equity markets. Karpoff concludes that the apparent contradiction may be explained by an asymmetric volume–price change relation, shown by the solid lines in Fig. 1, which indicates that the relationship is fundamentally different for positive and negative price changes. This asymmetric relationship explains the two empirical findings reported in the survey [14]. Given that there is some evidence of a relationship between trading volume and price changes, it seems appropriate to consider both variables when constructing the stock market topology.

4. Defining a bidimensional partition for the stock markets

The previous section suggested the importance of considering not only price changes but also trading volume in financial markets. This section defines a simple partition that recovers the information carried by volume and price changes. To define the thresholds we consider a kind of global return for each company i, given by the product of returns and trading volume at time t: $R_i(t) = r_i(t) \cdot V_i(t)$. We construct the empirical distribution $f^*(R_i)$ for the series of size T and, following the maximum entropy principle, obtain the two thresholds at which the empirical cumulative distribution reaches 1/3 and 2/3 of the probability. Once the two thresholds have been obtained we move to the space of returns and trading volume, where each pair (return, trading volume) receives a unique symbol according to the region in which it lies. Fig. 2 shows the different regions in the bidimensional space. Note that, although the global returns are defined only to obtain the thresholds, the symbolization is obtained from a bidimensional space in which each region has probability 1/3. We selected three symbols because, according to [17], for statistical reasons one would like to work with small partitions, which lead to a small alphabet. However, using only two symbols would ignore the fact that the dynamics can be different for large negative and large positive returns than for normal returns. For this reason [17] suggested a partition with three pieces. As they suggest, we apply the maximum entropy principle, meaning that three equally probable bidimensional regions are defined. Once the symbolization is complete we can construct the distance matrix D as explained in Section 2.
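The symbolization step can be sketched as follows. Two points are assumptions of this sketch rather than statements of the paper: the three regions of the (return, volume) plane are taken to be separated by the level curves of the product r · V at the two thresholds, and the symbols are coded 0, 1, 2. The thresholds are estimated with `statistics.quantiles` from the Python standard library; the function name and the toy data are hypothetical.

```python
import statistics

def symbolize_returns_volume(returns, volumes):
    """Assign each (return, volume) pair one of three equiprobable symbols."""
    # Global return R(t) = r(t) * V(t), used only to locate the thresholds.
    global_returns = [r * v for r, v in zip(returns, volumes)]
    # Two cut points splitting the empirical distribution into thirds.
    low, high = statistics.quantiles(global_returns, n=3)
    symbols = [0 if g <= low else 1 if g <= high else 2
               for g in global_returns]
    return symbols, (low, high)

# Toy daily data for one stock (hypothetical values).
returns = [-0.02, 0.01, 0.03, -0.01, 0.00, 0.02]
volumes = [100, 200, 150, 300, 120, 80]
symbols, thresholds = symbolize_returns_volume(returns, volumes)
```

With these six observations each symbol occurs twice, which is the equal-probability property the maximum entropy principle asks for. Other threshold pairs, such as those used in the sensitivity analysis of Section 5, would need a different quantile computation.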


Fig. 2. Three region partition for a given stock: Normal (full line) and extreme (dotted line) cases.

Fig. 3. The MST using the distance based on the Pearson correlation coefficient.

5. US stock market structure

To study the performance of the methodology we carry out an empirical study of the US stock market. We use a dataset of the companies included in the Dow Jones Industrial Average.2 The returns are obtained from the stock prices of the 30 companies composing the Dow Jones Industrial Average. First, we compute the MST and the HT using the methodology suggested by Mantegna [15]. Fig. 3 shows the MST according to Mantegna (using only the correlation of the returns). Analyzing the MST and HT we can check that companies working in the same branch of production tend to cluster. The shortest distance is between Verizon (VZ) and AT&T (T), two telecommunication companies. There are clear groups of companies: retailers, hard industry, defense and aerospace, and consumer goods. In addition, General Electric (GE) is the most linked company. Next, we consider the information from both returns and trading volume, as suggested in Section 4. The resulting MMST is shown in Fig. 4. We identify eight groups of companies that make sense from an economic point of view and, once more, GE takes the central position in the market, linking most of the groups. Some groups, such as those composed of financial companies, pharmaceutical companies, or informatics and technology companies, need no explanation.
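For the baseline tree of Fig. 3, the correlation-based distance can be sketched as below. The paper does not write the formula out; the expression used here, $d(i, j) = \sqrt{2(1 - \rho_{ij})}$ with $\rho_{ij}$ the Pearson correlation of the two return series, is the form commonly attributed to Ref. [15] and should be read as an assumption of this sketch. The function names and the toy return series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """d = sqrt(2 (1 - rho)); the max() guards against rounding below zero."""
    rho = pearson(x, y)
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

# Toy return series (hypothetical values).
r1 = [0.01, -0.02, 0.03, 0.00]
r2 = [2 * v for v in r1]      # perfectly correlated with r1
r3 = [-v for v in r1]         # perfectly anti-correlated with r1
d_same = correlation_distance(r1, r2)
d_opposite = correlation_distance(r1, r3)
```

The distance ranges from 0 for perfectly correlated series to 2 for perfectly anti-correlated ones, so it depends only on the returns and carries no information about trading volume.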

2 Data were obtained from a database available online (http://finance.yahoo.com) and cover the daily data from July 10th, 1986 to January 26th, 2007.


Fig. 4. MST for the US stock market in a normal situation considering returns and volume trading.

Table 1
Confidence intervals at 5% and 95% for random links.

Link      (5%–95%)         Link      (5%–95%)         Link      (5%–95%)
Link 1    (80.76–81.57)    Link 11   (81.67–82.05)    Link 21   (81.96–82.29)
Link 2    (81.08–81.68)    Link 12   (81.70–82.08)    Link 22   (81.98–82.32)
Link 3    (81.25–81.76)    Link 13   (81.73–82.11)    Link 23   (82.01–82.34)
Link 4    (81.35–81.82)    Link 14   (81.76–82.13)    Link 24   (82.05–82.38)
Link 5    (81.41–81.86)    Link 15   (81.79–82.14)    Link 25   (82.08–82.43)
Link 6    (81.47–81.90)    Link 16   (81.82–82.17)    Link 26   (82.12–82.43)
Link 7    (81.52–81.95)    Link 17   (81.85–82.19)    Link 27   (82.16–82.56)
Link 8    (81.56–81.97)    Link 18   (81.87–82.22)    Link 28   (82.21–82.66)
Link 9    (81.60–82.01)    Link 19   (81.90–82.24)    Link 29   (82.31–82.85)
Link 10   (81.64–82.03)    Link 20   (81.93–82.26)

Based on 1000 random simulations of 30 random companies for 5184 days.

However, the group composed of Alcoa (AA), Du Pont (DD) and Caterpillar (CAT), which we call hard industry, presents interesting relationships. It could be an example of social embeddedness (see Ref. [10]). Note that DD has a director in common with CAT (John T. Dillon) and with AA (Alain J.P. Belda), and James W. Owens is a director of both AA and CAT. In fact, Ahrne (see Ref. [1]) explains that members of an organization interact with individuals from the same company and from other companies, creating social networks both inside and between organizations. On the other hand, the HT shows that the furthest companies are MMM, AIG, MO, and XOM, while T and VZ are the closest. Hence, to reduce risk, it is worth combining companies such as T and VZ in a portfolio with one of the former group, such as XOM.

One question is whether the symbolic MST is sensitive to the partition. In the analysis we selected an equally probable partition (probability 1/3 in each bidimensional region) with thresholds at 33.33% and 66.66%. However, when we select different partitions, the main results do not change. We conducted a sensitivity analysis by changing the partition and constructing the MST each time. We defined 9 different partitions with thresholds at: 5%–95%, 10%–90%, 15%–85%, 20%–80%, 25%–75%, 30%–70%, 35%–65%, 40%–60%, and 45%–55%. In all these cases the fundamental structure did not change: GE remains the central node and the eight clusters are present in the tree.

A second question is whether our measure is significant or the links in the MMST are random. To study this we conducted 1000 Monte Carlo simulations of 30 random companies over a period of T = 5184 days (the size of our dataset). We then computed the 1000 random MSTs, obtaining the simulated distribution of the distances (or links) belonging to the MST.
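A simulation of this kind can be sketched as follows. The paper does not say how the random companies were generated, so this sketch assumes independent symbols drawn uniformly from {0, 1, 2}; it builds each random tree with Prim's algorithm (any MST algorithm gives the same set of link lengths) and uses a simple nearest-rank percentile. All names are hypothetical, and the example run uses a much smaller configuration than the 1000 simulations of 30 companies over 5184 days reported in the text.

```python
import math
import random

def distance(a, b):
    """Distance of Eq. (1) between two symbolic sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mst_link_lengths(series):
    """Prim's algorithm; returns the n - 1 MST link lengths, sorted."""
    n = len(series)
    in_tree = [False] * n
    in_tree[0] = True
    # best[v]: shortest known link from the growing tree to element v.
    best = [distance(series[0], s) for s in series]
    links = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        links.append(best[u])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], distance(series[u], series[v]))
    return sorted(links)

def percentile(sorted_values, p):
    """Nearest-rank style percentile of an already sorted list."""
    k = round(p * (len(sorted_values) - 1))
    return sorted_values[k]

def simulate_bands(n_sim, n_companies, T, seed=0):
    """5% and 95% bands for the k-th shortest link of a random MST."""
    rng = random.Random(seed)
    by_rank = [[] for _ in range(n_companies - 1)]
    for _ in range(n_sim):
        # Assumption: i.i.d. uniform symbols 0, 1, 2 for every company.
        series = [[rng.randrange(3) for _ in range(T)]
                  for _ in range(n_companies)]
        for rank, d in enumerate(mst_link_lengths(series)):
            by_rank[rank].append(d)
    return [(percentile(sorted(ds), 0.05), percentile(sorted(ds), 0.95))
            for ds in by_rank]

# Small illustrative run (hypothetical sizes, chosen to execute quickly).
bands = simulate_bands(n_sim=50, n_companies=6, T=200, seed=1)
```

Under the uniform-symbol assumption the expected squared difference between two symbols is 4/3, so a typical random distance is about $\sqrt{4T/3} \approx 83.1$ for T = 5184, which is of the same order as the intervals in Table 1.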
We define the confidence intervals at 5% and 95% within which a random link should fall, for a sample size T = 5184 with 30 companies (29 links). The results are shown in Table 1. Note that if the smallest distance lies between 80.76


Table 2
Links of the main US companies in the MST.

Link   Firms       Dist.     Link   Firms       Dist.     Link   Firms       Dist.
1      VZ-T        60.83     11     PG-KO       65.71     21     AA-DD       68.67
2      MRK-PFE     62.32     12     WMT-GE      66.16     22     CAT-DD      69.06
3      INTC-MSFT   62.38     13     INTC-IBM    66.30     23     BA-UTX      69.06
4      C-JPM       62.61     14     MMM-DD      66.41     24     GE-MCD      69.43
5      MRK-JNJ     62.89     15     GE-KO       67.01     25     GM-GE       69.43
6      C-AXP       64.16     16     PG-JNJ      67.38     26     XOM-DD      70.06
7      HD-WMT      64.40     17     HON-UTX     67.41     27     VZ-GE       70.40
8      C-GE        65.16     18     DIS-GE      67.75     28     PG-MO       70.89
9      INTC-HPQ    65.25     19     UTX-GE      67.87     29     AIG-UTX     80.22
10     MMM-GE      65.33     20     GE-IBM      68.60

Based on the results obtained for a partition at 1/3 and 2/3.

and 81.57, we cannot reject the hypothesis that this link is random; likewise, for link 29 the interval runs from 82.31 to 82.85 (see Table 1). However, our tree is far from having random links, which shows the high significance of the links (see Table 2). Note that the smallest distance is d(VZ, T) = 60.83 and link 29 has distance d(AIG, UTX) = 80.22, both below the corresponding random intervals.

6. Conclusions

The Minimal Spanning Tree is a clustering method that yields a market structure with economic meaning, using asset returns as input. The present work introduces a symbolic methodology that addresses a multidimensional problem, generalizing the method proposed in Ref. [15]. Since there is evidence of the importance not only of asset returns but also of trading volume, it seems relevant to develop a method that considers both variables. A bidimensional approach is applied to the main US companies, considering both returns and trading volume, and a Multidimensional Minimal Spanning Tree (MMST) is thereby constructed. The resulting market structure has eight clusters, with GE taking the central position in the market. The results are compared with those of the methodology proposed in Ref. [15] and seem to be similar; however, the method introduced here uses information not only from returns but also from trading volume. Further analysis shows that the structure is stable, in the sense that it is not sensitive to a particular partition. We selected nine different partitions and the results were always the same: eight clusters, with GE taking the central position. In addition, we simulated 1000 random stock markets (of 30 companies over periods of 5184 days) to obtain the simulated distribution of the proposed distance measure under a random process. We compared the US market tree with the random trees and found that the links are significant, since they do not fall within the confidence intervals for random trees.
Appendix

American International Group (AIG); Alcoa (AA); Boeing (BA); DuPont (DD); United Technologies (UTX); Honeywell (HON); Caterpillar (CAT); General Motors (GM); IBM (IBM); Hewlett-Packard (HPQ); Microsoft (MSFT); Intel (INTC); Coca Cola Co. (KO); Disney (DIS); McDonald's (MCD); Wal Mart (WMT); Home Depot (HD); Procter and Gamble (PG); Altria (MO); Johnson and Johnson (JNJ); Merck (MRK); Pfizer (PFE); AT&T (T); Verizon (VZ); General Electric (GE); 3M (MMM); ExxonMobil (XOM); American Express (AXP); Citigroup (C); J.P. Morgan (JPM).

References

[1] G. Ahrne, Social Organizations. Interaction Inside, Outside and Between Organizations, Sage Publications, London, 1994.
[2] G. Bonanno, F. Lillo, R.N. Mantegna, Level of complexity in financial markets, Physica A 299 (2001) 16–27.
[3] G. Bonanno, G. Caldarelli, F. Lillo, S. Micciché, N. Vandewalle, R.N. Mantegna, Networks of equities in financial markets, The European Physical Journal B 38 (2004) 363–371.
[4] J. Brida, W. Risso, Dynamic and structure of the main Italian companies, International Journal of Modern Physics C 18 (11) (2007) 1–11.
[5] P. Clark, A subordinated stochastic process model with finite variance for speculative prices, Econometrica 41 (1973) 135–155.
[6] B. Cornell, The relationship between volume and price variability in futures markets, The Journal of Futures Markets 1 (1981) 303–316.
[7] R. Crouch, The volume of transactions and price changes on the New York Stock Exchange, Financial Analysts Journal 26 (1970) 104–109.
[8] C. Daw, C. Finney, E. Tracy, A review of symbolic analysis of experimental data, Review of Scientific Instruments 74 (2003) 916–930.
[9] J. Gower, Some distance properties of latent root and vector methods used in multivariate analysis, Biometrika 53 (3–4) (1966) 325–338.
[10] A. Halinen, J. Tornroos, The role of embeddedness in the evolution of business networks, Scandinavian Journal of Management 14 (3) (1998) 187–205.
[11] L. Harris, The joint distribution of speculative prices and of daily trading volume, Working paper, Univ. of Southern CA, 1983.
[12] L. Harris, Cross-security tests of the mixture of distribution hypothesis, Journal of Financial and Quantitative Analysis 21 (1986) 39–46.
[13] P. Jain, G. Joh, The dependence between hourly prices and trading volume, Working paper, The Wharton School, Univ. of PA, 1986.
[14] J. Karpoff, The relation between price change and trading volume: A survey, The Journal of Financial and Quantitative Analysis 22 (1) (1987) 109–126.
[15] R. Mantegna, Hierarchical structure in financial markets, The European Physical Journal B 11 (1999) 193–197.
[16] R. Mantegna, H. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance, Cambridge University Press, UK, 2000.
[17] L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, The European Physical Journal B 15 (2000) 733–737.
[18] I. Morgan, Stock prices and heteroskedasticity, Journal of Business 49 (1976) 496–508.
[19] J. Onnela, Taxonomy of financial assets, Thesis for the degree of Master of Science in Engineering, Dep. of Electrical and Communications Engineering, Helsinki University of Technology, 2002.
[20] R. Rogalski, The dependence of price and volume, The Review of Economics and Statistics 36 (1978) 268–274.
[21] R. Westerfield, The distribution of common stock price changes: An application of transactions time and subordinated stochastic models, Journal of Financial and Quantitative Analysis 12 (1977) 743–765.