Gold price analysis based on ensemble empirical model decomposition and independent component analysis


Physica A 454 (2016) 11–23


Lu Xian a, Kaijian He b,c,∗, Kin Keung Lai a,d

a International Business School, Shaanxi Normal University, Xi’an, 710119, China
b School of Economics and Management, Beijing University of Chemical Technology, Beijing, 100029, China
c Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
d Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong

Highlights

• We propose a new improved model based on EEMD and ICA.
• We decompose gold price into statistically independent components (ICs).
• Regression analysis is used to analyze the economic meanings of different ICs.
• The proposed model is effective in the identification of main influencing factors for gold price movement.

Article info

Article history: Received 28 September 2015; received in revised form 21 January 2016; available online 23 February 2016

Keywords: Ensemble empirical model decomposition; Independent component analysis; Gold price

Abstract

In recent years, the increasing volatility of the gold price has attracted growing attention from academia and industry alike. Owing to the complexity and significant fluctuations observed in the gold market, however, most current approaches have failed to produce robust and consistent modeling and forecasting results. Ensemble Empirical Model Decomposition (EEMD) and Independent Component Analysis (ICA) are novel data analysis methods that can deal with nonlinear and non-stationary time series. This study introduces a new methodology that combines the two methods and applies it to gold price analysis. The methodology comprises three steps. First, the original gold price series is decomposed into several Intrinsic Mode Functions (IMFs) by EEMD. Second, the IMFs are further processed, with unimportant ones regrouped, to reconstruct a new set of data called Virtual Intrinsic Mode Functions (VIMFs). Finally, ICA is used to decompose the VIMFs into statistically Independent Components (ICs). The decomposition results reveal that the gold price series can be represented by a linear combination of ICs. Furthermore, the economic meanings of the ICs are analyzed and discussed in detail, according to their change trends and transformation coefficients. The analyses not only identify the inner driving factors and their impacts but also examine in depth how these factors affect the gold price. Regression analysis has also been conducted to verify our analysis. Results from the empirical studies in the gold markets show that EEMD–ICA serves as an effective technique for gold price analysis from a new perspective. © 2016 Elsevier B.V. All rights reserved.

∗ Corresponding author at: School of Economics and Management, Beijing University of Chemical Technology, Beijing, 100029, China. Tel.: +86 010 64438793; fax: +86 010 64438793. E-mail addresses: [email protected] (L. Xian), [email protected] (K. He), [email protected] (K.K. Lai). http://dx.doi.org/10.1016/j.physa.2016.02.055 0378-4371/© 2016 Elsevier B.V. All rights reserved.


1. Introduction

As one of the most important industrial inputs and a medium of exchange, gold plays an important role in global commodity and financial markets. In recent years, the gold market has become very active because of its profit-making prospects and remarkable risk-avoidance features. As a result, the gold price has received increasing attention and research interest. In contrast to many other financial assets, the price of gold follows unusual paths. There are numerous studies in the literature on the statistical characteristics of the gold price and its movement. For instance, both Tschoegl [1] and Solt [2] found that fluctuations in the gold price are time-dependent. Cheung and Lai [3] argued that the long-term memory observed in the gold market after 1979 was attributable to several factors, and that this long-term memory would vanish as these factors gradually disappeared. Mills [4] statistically investigated the behavior of the gold price on the London market from 1971 to 2002 and found a non-normal distribution with significantly sharp peaks and heavy tails. Tully and Lucey [5] investigated macroeconomic influences on gold using the asymmetric power GARCH model, and the results suggested that this model provided a very good description of the gold price. Parisi et al. [6] analyzed recursive and rolling neural networks to forecast one-step-ahead sign variations in the gold price and tried to handle the sharp peaks and heavy tails. Sjaastad [7] examined the theoretical and empirical relationships between the major exchange rates and the price of gold using forecast error data. Batten and Lucey [8] examined the gold futures contract on the Chicago Board of Trade (CBOT) using intraday data from 1999 to 2005 by means of a GARCH model and a nonparametric estimation method. They explored the characteristics of price volatility and concluded that the effect of trading volume on the price of gold futures was almost negligible.
Shafiee and Topal [9] applied a modified econometric version of the long-term trend reverting jump and dip diffusion model to forecast gold price. The proposed models or methods can help understand characteristics of gold price and recognize each possible factor’s influence on gold price. However, some very important topics have not been discussed in the existing literature. For example, how do these factors affect gold price exactly? How can we identify and describe each factor’s influence? Can we quantify the importance of these factors? These questions are of crucial practical significance in further analysis of the features of gold price. Independent Component Analysis (ICA) can be used to identify the underlying factors. ICA was for the first time introduced in early 1980s in the context of neural network modeling [10,11]. ICA can identify some independent and hidden sources from the mixtures without any prior knowledge of the mixing mechanism. The hidden information is called the Independent Components (ICs) which provide insights into the structure of the observable data set. Due to its generality, the ICA model has been applied in many different areas, such as signal processing, face recognition, feature extraction and quality control [12–15]. The ICA model has been used in financial data as well. There are many situations in which financial time series are closely related. Typical examples may include currency exchange rates, daily returns of stocks, crude oil price and so on. These data may have some underlying factors in common. Back and Weigend [16] applied ICA to extract the features of daily returns of stocks. Their results showed that the dominant ICs can reveal more details of the underlying structure and information of stock prices than Principal Components Analysis (PCA). Kiviluoto and Oja [17] tried to find the common fundamental factors among 40 stores under the same retail chain by ICA. 
They found that the cash flow of the retail stores was mainly affected by holidays, seasons and competitors’ strategies. Oja et al. [18] applied ICA to foreign exchange rate time series prediction. Lu et al. [19] constructed a two-stage modeling approach using ICA and support vector regression for financial time series forecasting and found that this method could alleviate the influence of noise effectively. Nevertheless, ICA has rarely been applied in gold price analysis. These models have mostly been proposed for multivariate financial data analysis. For univariate data analysis, some initial efforts have recently appeared, mostly combining wavelet analysis and ICA. For example, a method combining wavelet and ICA (called WICA) was proposed by Lin and Zhang [20] for fault feature separation. However, the wavelet transform imposes the strict assumption of a particular wavelet basis and encounters serious limitations in modeling practical data. These limitations include interference terms, border distortion and energy leakage [21]. The wavelet transformation process can produce many small undesired spikes across the frequency scales, making the results confusing and difficult to interpret. Moreover, the choice of mother wavelet and decomposition level has a significant impact on the decomposition results [22]. By contrast, EMD is an empirical, intuitive, direct and self-adaptive data processing method: it decomposes a signal without prior knowledge about the signal of interest embedded in the data series [23]. In addition, the decomposition results of EMD can reflect the physical properties of the original system more accurately, are not restricted by a particular basis, and offer much better temporal and frequency resolutions [24].
In the literature, the EMD model is commonly perceived to be more effective in analyzing nonlinear and nonstationary data. Based on this, Mijović et al. [25] introduced a technique combining EMD and ICA (called EMD–ICA) and compared it with WICA; the results showed that EMD–ICA outperforms WICA, especially for high noise-to-signal ratios. In this paper, we propose a modified EEMD–ICA model that goes beyond that of Mijović et al. [25] for gold price analysis. The rest of the paper is organized as follows. Section 2 gives a brief introduction to ICA and EEMD. The improved EEMD–ICA model is proposed and described in Section 3. Section 4 presents the descriptive statistics of the gold price from 1969 to 2013, together with detailed analyses based on the data decomposed by the EEMD–ICA model. Section 5 concludes.


2. Research methodology

2.1. Independent component analysis

ICA is a statistical and computational technique for identifying hidden factors that underlie sets of random variables, measurements or signals. It goes beyond Principal Components Analysis (PCA) by modeling the independence of the underlying factors or components, and it is not restricted to the common elliptical distributions [10]. We assume that the set of random samples $x_i$ of size $n \times 1$, $i = 1, 2, \ldots, m$, $m \le n$, is generated by a linear mixture of unknown factors, which we denote as $s_i$ of size $n \times 1$. In matrix notation, we obtain (1) [26]:

$$X = AS = \sum_{i=1}^{m} a_i s_i^{T}. \qquad (1)$$

Here $X = [x_1, x_2, \ldots, x_m]^T$, $A$ is the assumed $m \times m$ static mixing matrix, $a_i$ is the $i$th column of $A$ and $S = [s_1, s_2, \ldots, s_m]^T$ is of size $m \times n$. The statistical model in the above equation is called independent component analysis, or the ICA model. The goal of the ICA model is to estimate both $A$ and $S$ using the observed data $X$. Generally, there are two assumptions: the components $s_i$ are statistically independent, and at most one of the independent components has a Gaussian distribution. In fact, we can find an $m \times m$ matrix $W$ such that

$$Y = [y_i] = WX \qquad (2)$$

where $y_i$ is the $i$th row of the matrix $Y$, $i = 1, 2, \ldots, m$, and the vectors $y_i$ are as statistically independent as possible. If the unmixing is done perfectly, the de-mixing matrix $W$ is the inverse of the mixing matrix $A$, i.e. $W = A^{-1}$, and we obtain the estimate of the latent source signals $s_i$, that is, $\hat{s}_i = y_i$. The ICA model is formulated as an optimization problem that uses a measure of the independence of the ICs as the objective function and solves for the de-mixing matrix $W$ with an optimization method. Several algorithms for solving the ICA model have been proposed in the literature [27,26]. The de-mixing matrix $W$ can be determined by an unsupervised learning algorithm with the objective of maximizing the statistical independence of the ICs. ICs with non-Gaussian distributions imply statistical independence, and the non-Gaussianity of the ICs can be measured by the negentropy as in (3):

$$J(y) = H(y_{gauss}) - H(y) \qquad (3)$$

where $y_{gauss}$ is a Gaussian random vector having the same covariance matrix as $y$, and $H$ is the differential entropy of a random vector $y$ with density $f(y)$, defined as $H(y) = -\int f(y) \log f(y)\,dy$. A fundamental result of information theory is that a Gaussian variable has the largest entropy among all random variables with equal variance. Owing to this property, the negentropy is always non-negative and is zero if and only if $y$ has a Gaussian distribution. The advantage of negentropy as a measure of non-Gaussianity is that it is well justified by statistical theory; however, it is computationally very difficult to evaluate. Therefore, a simpler approximation of negentropy is used, as in (4) [26]:

$$J(y) \propto \left[E\{G(y)\} - E\{G(\nu)\}\right]^2 \qquad (4)$$

where $\nu$ is a Gaussian variable of zero mean and unit variance (i.e. standardized), $y$ is a random variable with zero mean and unit variance, and $G$ is a non-quadratic function. In our study, $G(y) = y \cdot \exp(-y^2/2)$. The FastICA algorithm proposed by Hyvärinen and Oja [26] is adopted in this paper to estimate the de-mixing matrix $W$.

2.2. Empirical model decomposition and ensemble empirical model decomposition

EMD is a method developed by Huang et al. [28] for decomposing a nonlinear, non-stationary time series into several components referred to as Intrinsic Mode Functions (IMFs). An IMF is defined by two criteria: (1) the number of zero-crossings and the number of extrema are equal or differ at most by one; (2) the IMF is symmetric with respect to the local zero mean. An iterative ‘‘sifting process’’ [28] is employed to extract the IMFs and a residual series from the data. The total number of IMFs is limited to $\log_2$ of the length of the original data series $x(t)$. If we denote by $c_i(t)$, $i = 1, 2, \ldots, N$, the resultant set of IMFs and by $r(t)$ the residual series, the original time series can be expressed as in (5):

$$x(t) = \sum_{i=1}^{N} c_i(t) + r(t). \qquad (5)$$
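The first IMF criterion of Section 2.2 lends itself to a direct check. The following Python sketch counts zero-crossings and strict local extrema of a sampled series and tests whether the two counts differ by at most one. The function names and the test signals are illustrative assumptions, not part of the paper, and the second (symmetry) criterion is not checked here.

```python
import math
from typing import Sequence


def count_zero_crossings(x: Sequence[float]) -> int:
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def count_extrema(x: Sequence[float]) -> int:
    """Count strict local maxima and minima among interior samples."""
    return sum(
        1
        for prev, cur, nxt in zip(x, x[1:], x[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


def meets_first_imf_criterion(x: Sequence[float]) -> bool:
    """Criterion (1): zero-crossings and extrema differ by at most one."""
    return abs(count_zero_crossings(x) - count_extrema(x)) <= 1


# Hypothetical test signals (not data from the paper).
# A pure sine oscillates around zero and passes the check.
wave = [math.sin(2 * math.pi * t / 20 + 0.1) for t in range(100)]
# Shifting it upward removes every zero-crossing, so the check fails.
shifted = [v + 2.0 for v in wave]

print(meets_first_imf_criterion(wave), meets_first_imf_criterion(shifted))
```

A full sifting implementation would apply such a check repeatedly while subtracting envelope means; that part is omitted here.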

Table 1 The IMF recombination process.

    Set VIMFs = ∅; a = 0
    For k = 1 to N + 1
        if CCI_k ≥ λ
            put the kth IMF into VIMFs
        else
            a = a + the kth IMF
        end if
    end
    VIMFs = VIMFs ∪ a

The advantages of EMD are clear and have been demonstrated by many researchers [24,29]. However, the original EMD has a drawback: the frequent appearance of mode mixing, defined as a single IMF consisting of signals of widely disparate scales, or a signal of a similar scale residing in different IMF components. Wu and Huang [30] proposed EEMD to solve this problem. The basic idea of EEMD is that observed data are amalgamations of the true time series and noise. Thus, if data are collected by separate observations, each with a different noise level, the ensemble mean is close to the true time series. Therefore, an additional step is taken: white noise is added to help extract the true signal in the data. The effect of the added white noise can be controlled by the well-established statistical rule in (6):

$$\varepsilon_n = \frac{\varepsilon}{\sqrt{N}} \qquad (6)$$

where $N$ is the number of ensemble members, $\varepsilon$ is the amplitude of the added noise and $\varepsilon_n$ is the final standard deviation of error, defined as the difference between the input signal and the corresponding IMFs. In practice, the number of ensemble members is often set to 100 and the standard deviation of the white noise series is set to 0.1 or 0.2.

3. An improved EEMD–ICA methodology

The EMD–ICA model was proposed by Mijović et al. [25] for single-channel signal processing. In general, financial time series are signals generated by the financial system and can be processed using similar techniques. Following a similar approach, we therefore assume that a financial time series consists of several mutually independent source signals mixed in a certain manner [10]. Under this assumption, the gold price series is first decomposed to obtain the underlying source signals using EMD–ICA. In this paper we propose an improved EEMD–ICA model that replaces the original EMD with EEMD to decompose the gold price series for better performance, and that adds a recombination procedure.

The numerical procedure of recombination is as follows. Let $x(t)$, $t = 1, \ldots, T$, be the original data series, and let $c_i(t)$, $i = 1, \ldots, N$, and $r(t)$ be the resultant set of IMFs and the residual series obtained by EEMD. For convenience, we denote $r(t) = c_{N+1}(t)$, so that the original time series can be expressed as $x(t) = \sum_{i=1}^{N+1} c_i(t)$. That is, the original time series can be reconstructed from the IMFs. The starting point is that an important IMF has a strong influence on the quality of the reconstruction, whereas an unimportant IMF has little influence. Thus, we can evaluate the contribution of an IMF by measuring the similarity between the original time series and the series reconstructed without that IMF. In this paper, a transformed relative Hamming distance (RHD) is applied.
The contribution coefficient of the $k$th IMF ($CCI_k$) is defined as in (7):

$$CCI_k = 1 - \frac{1}{T-1} \sum_{t=1}^{T-1} R(t) \qquad (7)$$

where $R(t) = 1$ if $(x(t+1) - x(t))(\hat{x}(t+1) - \hat{x}(t)) \ge 0$ and $R(t) = 0$ otherwise, with $\hat{x}(t) = \sum_{i=1, i \ne k}^{N+1} c_i(t)$. After obtaining all contribution coefficients $CCI_k$ ($k = 1, \ldots, N+1$), we compare them with a hard threshold $\lambda$. A new set of series called virtual intrinsic mode functions (VIMFs) is obtained by regrouping the IMFs. The IMF recombination process is shown in Table 1. Normally, the hard threshold $\lambda$ can be set to a fixed small value (such as 0.2 or 0.3). In our study, $\lambda = 0.3$ is used. As can be seen in Fig. 1, the improved EEMD–ICA methodology comprises the following three steps:

1. The gold price series $x(t)$, $t = 1, \ldots, T$, is decomposed into $N$ IMFs, $c_i(t)$, $i = 1, \ldots, N$, and the residual series $r(t)$.
2. The IMF recombination process is adopted to select and regroup the IMFs and the residue, yielding a new set of series called VIMFs, $v_j(t)$, $j = 1, \ldots, M$, with $M \le N + 1$.
3. ICA is applied to the VIMFs to obtain statistically independent components $s_k(t)$, $k = 1, \ldots, L$, with $L \le M \le N + 1$.

Furthermore, through a linear transformation, we can reconstruct the gold price series in terms of the estimated ICs as

$$x(t) = \sum_{i=1}^{N} c_i(t) + r(t) = \sum_{j=1}^{M} v_j(t) = \sum_{k=1}^{L} b_k s_k(t) \qquad (8)$$

where $b_k$ is the sum of the $k$th column of the mixing matrix $A$, and it is called the transformation coefficient of the $k$th IC.
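The contribution coefficient in (7) and the regrouping rule of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the components are passed as plain lists (IMFs followed by the residue), the toy series is invented for the example, and the merged series is appended only when at least one component falls below the threshold, a small deviation from Table 1 that avoids adding an all-zero series.

```python
from typing import List, Sequence


def contribution_coefficient(
    x: Sequence[float], components: Sequence[Sequence[float]], k: int
) -> float:
    """CCI_k as in Eq. (7): one minus the share of steps at which the
    series rebuilt without component k moves in the same direction as x."""
    T = len(x)
    # Leave-one-out reconstruction: sum of all components except the kth.
    x_hat = [sum(c[t] for i, c in enumerate(components) if i != k) for t in range(T)]
    agree = sum(
        1
        for t in range(T - 1)
        if (x[t + 1] - x[t]) * (x_hat[t + 1] - x_hat[t]) >= 0
    )
    return 1.0 - agree / (T - 1)


def recombine(
    x: Sequence[float], components: Sequence[Sequence[float]], lam: float = 0.3
) -> List[List[float]]:
    """Regroup components into VIMFs following the logic of Table 1."""
    vimfs: List[List[float]] = []
    merged = [0.0] * len(x)
    merged_any = False
    for k, c in enumerate(components):
        if contribution_coefficient(x, components, k) >= lam:
            vimfs.append(list(c))          # important: keep as its own VIMF
        else:
            merged = [u + v for u, v in zip(merged, c)]  # unimportant: accumulate
            merged_any = True
    if merged_any:                         # assumption: skip an all-zero series
        vimfs.append(merged)
    return vimfs


# Hypothetical toy data: a trend plus two small wiggles (not gold prices).
trend = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w1 = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
w2 = [0.05, 0.05, -0.05, -0.05, 0.05, 0.05]
components = [trend, w1, w2]
x = [a + b + c for a, b, c in zip(trend, w1, w2)]

vimfs = recombine(x, components, lam=0.3)
print(len(vimfs))
```

In this toy case the trend is kept as its own VIMF and the two wiggles are merged into a single series, so the VIMFs still sum to the original series.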


Fig. 1. The flow chart of single series separation process using improved EEMD–ICA methodology.

4. Empirical analysis

In this section, we report the results of the empirical studies and analyze them in depth. In our experiment, the gold price series is decomposed into a set of statistically independent components by EEMD–ICA. The analysis of these components helps in understanding the variability and underlying factors of the gold price from a new perspective.

4.1. Data description

In this study, the KITCO gold price is chosen as the sample data since it is widely used as the benchmark price by many financial institutions. The data set is publicly available on the publisher's website (http://www.kitco.com/charts/). Prices are quoted in USD per troy ounce. Since the fluctuations of the gold price have increased significantly from 1969 onwards, this study focuses on the historical trend of the monthly gold price from 1969 to 2013. Monthly data are adopted for two reasons: (1) monthly data are smoother and less distorted, which suits the analysis; (2) data for the majority of the underlying influencing factors are available at monthly or even lower frequency. The series from December 1969 to January 2013 contains 518 data points in total and is illustrated in Fig. 2.

4.2. IMFs, VIMFs and ICs

The underlying factors of the gold price data series are derived in three steps. During the process, the model produces three different new data sets: IMFs, VIMFs and ICs. As illustrated in Fig. 3, the IMFs and the residue are derived first by applying EEMD to the data set. Since the number of IMFs is restricted to log2 N, where N is the number of samples, the data series is decomposed into 8 IMFs and one residue. From these series, we can identify some interesting stylized facts of gold price volatility.
All the components can be divided into 4 categories based on their respective frequencies and amplitudes. The first category is the trend, which is shown by the residual. It changes slowly and smoothly in the long term, and its continuously increasing trend is consistent with the rise of the gold price. Components in the second category, IMF6 to IMF8, display low frequencies. These components show no violent swings, but there are significant cyclical changes.

Fig. 2. The monthly gold price from Dec. 1969 to Jan. 2013. (Vertical axis: $US/Ounce; horizontal axis: Year.)

Fig. 3. The IMFs and residue for the gold price from Dec. 1969 to Jan. 2013 by EEMD. (Panels: signal, imf1 to imf8, res.; horizontal axis: Year.)

In other words, there should be some recurring or mild factors that affect the gold price, such as gold supply and demand, the US Dollar and seasonal fluctuations. Components in the third category, IMF4 and IMF5, show moderate frequencies and amplitudes. From their changes in trend, we observe that these two components provide a good description of the effects of significant events (such as the Afghanistan war in 1979, the Iran–Iraq war in 1980 and the Subprime Lending Crisis in 2007). The fourth category (IMF1 to IMF3) has high frequency and small amplitude. This part can be attributed to many sudden market factors, such as bad weather, labor strikes and investment behaviors. These effects do not last long and are categorized as high-frequency events whose effects are captured by the high-frequency components. These components have small amplitudes, so they show no serious impact on the gold price; their effect is generally within $100. The analysis above shows a one-to-one correspondence between the changes in the gold price and the IMFs as well as the residue. To reduce the influence of unimportant IMFs, the second step, called recombination, is adopted. Table 2 shows the contribution coefficients of the IMFs and the residue for the original data. We can see that the first three IMFs have less influence than the other IMFs and the residue, which means that sudden market factors are less important than other kinds of factors. In other words, sudden market factors have little influence on the gold price from a long-term perspective. Then, IMF1, IMF2 and IMF3 are combined to form

Table 2 The contribution coefficients of the IMFs and residue for the original data.

IMF1    IMF2    IMF3    IMF4    IMF5    IMF6    IMF7    IMF8    Res.
0.1973  0.2128  0.2573  0.3269  0.3675  0.3849  0.4778  0.4855  0.3926

Fig. 4. The VIMFs for the gold price from Dec. 1969 to Jan. 2013 through recombination. (Panels: vimf1 to vimf7; horizontal axis: Year.)

Fig. 5. The ICs for the gold price from Dec. 1969 to Jan. 2013, by the improved EEMD–ICA. (Panels: IC1 to IC7; horizontal axis: Year.)

a new series named VIMF7; Fig. 4 shows the results of the recombination. The contribution coefficient of VIMF7 is 0.3075. Fig. 5 shows the final ICs obtained by applying ICA to these VIMFs. The mixing matrix $A$ and its inverse are also obtained. Following Eq. (8), the reconstruction of the gold price can be written as

$$x(t) = \sum_{i=1}^{7} b_i s_i(t) = -68.0461 s_1 - 316.1037 s_2 - 62.2679 s_3 + 96.1160 s_4 + 13.0191 s_5 + 40.8315 s_6 + 76.0350 s_7. \qquad (9)$$

The simple and clear mathematical expression above is very attractive. It is reasonable to say that gold price can be shown by several independent components. Therefore, it is particularly important to explore the characteristics and essential meanings of the ICs.
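To make the role of the transformation coefficients concrete, the Python sketch below builds a tiny two-source example of the linear mixing model $X = AS$, sums the mixed series, and confirms that the same series is obtained as $\sum_k b_k s_k(t)$ with $b_k$ the column sums of $A$. The matrices are hypothetical values chosen for easy arithmetic, not estimates from the paper, and the exact inverse of $A$ stands in for the de-mixing matrix that FastICA would estimate in practice.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(a: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


# Hypothetical sources (rows s_1, s_2) and mixing matrix; not values from the paper.
S = [[1.0, 0.0, -1.0], [2.0, 2.0, 2.0]]
A = [[1.0, 2.0], [3.0, 4.0]]

V = matmul(A, S)                     # rows play the role of the VIMFs v_j(t)
x = [sum(col) for col in zip(*V)]    # x(t) = sum_j v_j(t)

# Transformation coefficients: b_k is the sum of the kth column of A.
b = [sum(row[k] for row in A) for k in range(len(A[0]))]
x_from_ics = [sum(b[k] * S[k][t] for k in range(len(S))) for t in range(len(x))]

W = inverse_2x2(A)                   # ideal de-mixing matrix W = A^{-1}
Y = matmul(W, V)                     # recovers the sources when W is exact

print(b, x, x_from_ics)
```

The identity holds because summing the rows of $AS$ is the same as weighting each source by the corresponding column sum of $A$.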


Table 3 Descriptive statistics of the estimated ICs.

       b_i         Mean      Skewness   Kurtosis   ρ_i              H-E      J–B             ADF
IC1    −68.0461    −0.2559   −1.2606     6.5751    −0.1906(0.00)    0.7068    413.1(0.00)    −1.4445(0.14)
IC2    −316.1037   −0.6849   −2.6786    10.2628    −0.8938(0.00)    0.9484   1757.9(0.00)     0.3297(0.77)
IC3    −62.2679    −0.1192   −1.6138    14.0980    −0.1712(0.00)    0.3755   2883.2(0.00)   −11.6763(0.00)
IC4     96.1160     2.4757   −2.0295     9.0327     0.2729(0.00)    0.9801   1141.1(0.00)     0.4007(0.80)
IC5     13.0191     0.5859    0.1228     1.7532     0.0368(0.40)    0.9862     34.9(0.00)    −0.2830(0.55)
IC6     40.8315    −0.1334   −0.6267     4.2489     0.1152(0.00)    0.8475     67.6(0.00)    −0.8233(0.35)
IC7     76.0350    −0.5936   −0.0014     1.4671     0.2146(0.00)    1.0060     50.7(0.00)     0.3542(0.78)

Note: p-values in parentheses; b_i is the transformation coefficient of the ith IC; ρ_i is the Pearson correlation between the ith IC and the original data series; H-E is the Hurst exponent; J–B is the Jarque–Bera statistic; ADF is the augmented Dickey Fuller statistic.
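Several of the Table 3 columns follow from simple moment formulas. The Python sketch below computes skewness, kurtosis and the Pearson correlation from raw samples, and evaluates the standard Jarque–Bera statistic $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$ from a given skewness $S$ and kurtosis $K$. The paper does not state its exact estimators, so the population-moment definitions and the non-excess kurtosis convention used here are assumptions; with $n = 518$ and the IC1 values from Table 3, the formula gives a value close to the reported 413.1.

```python
import math
from typing import Sequence


def central_moment(x: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x: Sequence[float]) -> float:
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x: Sequence[float]) -> float:
    """Non-excess kurtosis: 3 for a Gaussian, above 3 supergaussian, below 3 subgaussian."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def jarque_bera(n: int, skew: float, kurt: float) -> float:
    """Standard Jarque-Bera statistic from sample size, skewness and kurtosis."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# n = 518 monthly observations (Section 4.1); skewness and kurtosis of IC1 from Table 3.
jb_ic1 = jarque_bera(518, -1.2606, 6.5751)
print(round(jb_ic1, 1))
```

Under the same convention, a kurtosis above 3 marks an IC as supergaussian and a kurtosis below 3 as subgaussian, which is how the ICs are classified in Section 4.3.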

4.3. Statistical characteristics of ICs

The estimated ICs are analyzed in terms of the mean, skewness, kurtosis and Hurst exponent of each IC, the correlation between each IC and the original data series, the Jarque–Bera (JB) test for normality and the Augmented Dickey Fuller (ADF) test for stationarity. Table 3 shows the related statistical information about the ICs. The transformation coefficients of the ICs are also listed. Since both S and A are unknown, we cannot determine the variances of the independent components in the ICA model [10]; the matrix A is therefore adapted so that each IC has unit variance. Consequently, our analysis is based on each IC’s changing trend rather than its amplitude. All ICs are distinctly non-Gaussian, as the results of the JB test are all significant at the 5% significance level. Their kurtoses also show that they are far from a normal distribution. From Table 3, we can see that five ICs (IC1, IC2, IC3, IC4 and IC6) have supergaussian distributions with sharp peaks and long tails, while IC5 and IC7 are subgaussian, with distributions that tend to be flatter than the Gaussian one, or multimodal. This shows that our result fits the assumptions of the ICA model. The correlation coefficients between the ICs are zero since they are independent of each other. Table 3 shows that IC2 has the highest Pearson correlation with the original data, with an absolute value of 0.8938. Interestingly, if we ignore the signs, the order of the Pearson correlation coefficients is the same as that of the ICs’ transformation coefficients. This means that IC2 is the most dominant mode for the observed data, followed by IC4, IC7, IC1, IC3 and IC6, while IC5 has relatively low influence on the gold price. The Hurst exponents of the ICs are also calculated, and the results show that IC3 switches between high and low values in the long term while the other ICs have long-term memory.
The ADF test statistics show that IC3 is stationary and the others are non-stationary. In other words, the other ICs have changing trends while IC3 displays a random characteristic. This result is consistent with the discussion above.

4.4. Relationship between the ICs and some important factors

As one of the most important commodities, gold has both monetary and financial attributes. It is generally viewed as a conservative investment instrument for hedging against inflation. Factors that influence the gold price can be divided into two groups: internal factors, which include supply and demand, and external factors, which include global economic cycles, international financial markets, inflation, geopolitical circumstances, and so on. In the short term, gold price fluctuations are relatively intense owing to numerous international political events, short-term market behavioral factors and international speculative funds. In the medium term, the price of gold is mainly influenced by the monetary and exchange rate policies of the world’s major developed countries. In the long term, the price of gold is mainly influenced by its supply and demand. In this section, we discuss the economic implications of each IC in two steps. First, the trajectories of the ICs and the economic variables are analyzed and compared carefully to identify potential correspondences. Second, regression analysis is conducted to establish the statistical significance of their correlations.

4.4.1. Trend comparison

Despite time differences at some points, the trajectory of the first independent component is strikingly consistent with that of the American CPI, with the data obtained from the Bureau of Labor Statistics (BLS, http://www.bls.gov/). Their trends are shown in Fig. 6.
In practical studies, since the CPI is normally considered an index of inflation, IC1 reflects the mechanism by which inflation affects the gold price, which can be used to explain the effect on the gold price in detail. For example, between 1980 and 1981, the American CPI reached an all-time high of 14.6, exerting the greatest influence on the gold price. From the value and transformation coefficient of IC1, we can conclude that the effect on the gold price was around USD 221.64 per troy ounce, while the gold price was nearly USD 600. The second independent component is relatively stable. For convenience, we reverse the sign of IC2, which has a negative coefficient, and illustrate it in Fig. 7. It can be seen that IC2 increases steadily with small changes, which, according to our analysis, represents how global economic development affects the gold price. To demonstrate this in more detail, we took into consideration the


Fig. 6. The comparison between CPI (in US) and IC1 from Dec. 1969 to Jan. 2013.

Fig. 7. The upside down IC2 from Dec. 1969 to Jan. 2013.

Fig. 8. The annual percentage growth rate of world’s GDP from 1969 to 2013.

growth rate of global GDP from 1969 to 2013, with data obtained from United Nations Conference on Trade and Development (UNCTAD), and further illustrated it in Fig. 8. It can be seen from Fig. 8 that the GDP growth rate dropped down to 0.54% in 1982 and also became negative in 2009. These represent the fluctuations in IC2, which suggests that weak global economy causes the gold price to fluctuate. Fluctuations of IC2 indicate that when the global economy gets healthier, the effect of changing GDP growth rate on gold price becomes smaller, meaning that the confidence in economy is significantly boosted. In the long run, gold price still rises along with the unremitting growth of the world economy. Among all the independent components, IC3 has the highest change frequency and is the most stochastic in nature. Furthermore, gold market reacts most rapidly to this component, causing sharp fluctuations in price. We can associate this component with factors such as strikes, bad weather, psychology and speculation. Despite their insignificance in the


Fig. 9. The gold production minus consumption from 1995 to 2013.

Fig. 10. IC7 from Dec. 1969 to Jan. 2013 (horizontal axis: Year).

long run, these factors affect the short-term gold price. Compared with yearly data, "short-term" here refers to data collected on a monthly basis. With economic development, the gold market itself plays a greater and more frequent role in causing fluctuations in gold price. The contribution of active gold speculators to gold price has been especially pronounced in recent years. While changing slowly, IC4 remains at a high level. Meanwhile, the transformation coefficient of IC4 is 96.1162, indicating that IC4 has an important long-term influence on gold price. Gold, as a commodity, is affected by supply and demand. Fig. 9 describes the trajectory of total gold production minus consumption worldwide from 1995 to 2013 (source: The World Gold Council, WGC (http://www.gold.org/)), which shows a movement pattern very similar to that of IC4. This means that the difference between gold production and consumption decreased substantially, mainly because soaring gold production costs and the difficulty of discovering new gold mines led to a gradual reduction of gold yields, while demand for gold grew, driven by emerging industries and investment worldwide. At the same time, IC4’s slowly changing trend also leads to another conclusion: as the basis of gold price, gold supply and demand act as a long-term fundamental mechanism rather than reflecting short-term violent fluctuations of gold price. IC5’s trend features cyclicality caused by several possible factors, including long cycles, such as world economic development cycles (global GDP development cycles of around 10 years, as shown in Fig. 8), and short cycles, such as seasonal consumption, normally seen in China, India and other emerging economies, where gold price is weak in spring and summer but starts to grow in autumn. Certainly, the long cycles of world economic development play a dominant role.
One of the important factors affecting gold price is international geopolitics (including wars, global emergencies and major strategies of global powers such as the US), which can be seen from the trend of IC6. Events such as the Fourth Middle East War in 1973, the Afghanistan War between 1978 and 1980, the Gulf War in 1991 and the 911 terrorist attacks in 2001 all correspond to changes in IC6, indicating that such events cause gold price to fluctuate to different degrees. From the trend of IC6, it can be concluded that the mechanism by which international geopolitics affects gold price has two features: firstly, despite their violent effects, such major events are followed by gold price returning to its original level; secondly, with the international order becoming multipolar, world stability is enhanced, and local wars and terrorist events have smaller effects on gold price. Therefore, IC6 oscillates with a smaller amplitude than before, but the possibility of certain political factors causing violent fluctuations of gold price still cannot be ruled out. Compared with the previous 6 independent components, IC7 changes more moderately. The analysis and comparison of the trend shown in Fig. 10 indicates that this component basically reflects the effect of the US dollar on gold price. Gold price rose significantly in 1975 and 1980 but remained consistently weak between 1988 and 1999, which is opposite to the IC7 trend in the three time periods. This is consistent with the common research finding that the US dollar and gold price move in opposite directions [31,5].


Table 4
Significant political events from 1969 to 2013.

Year    Significant political events
1971    The Bretton Woods System collapse
1973    The fourth Middle East war
1979    The war in Afghanistan
1980    The Iran–Iraq war
1982    The fifth Middle East war
1989    The violent change of the Eastern Europe
1990    Gulf War
1991    Soviet Union’s disorganization
2001    911 terrorist attacks, the war in Afghanistan
2003    The second Gulf War
2006    Iran nuclear crisis
2007    Subprime Lending Crisis

4.4.2. Regression analysis

In this section we investigate the correlation between economic factors and the extracted independent components to test and confirm the derived economic meaning of each independent component. For further statistical evidence and a clearer explanation of the relationships between the ICs and the economic variables, we regress each IC (except IC3, as it represents the accidental factors, which are supposed to be abrupt, stochastic and unpredictable) on the relevant economic variables. Previous studies have shown that gold price is subject to such factors as world economic development, gold supply and demand, US dollar, inflation (CPI), market emergencies, international geopolitics and cyclicality. The economic variables used in our analysis include: CPI in the US, the world’s annual GDP, gold supply and demand, the economic cycle, political factors, and the US dollar index. Some of these variables, such as the economic cycle and political factors, are not directly measurable, so it is necessary to use proxy variables. For the economic cycle, the interest rate serves as a good indicator. In fact, many research results show that the interest rate cycle and the economic cycle are consistent in turning point and duration [32–36]. In our analysis, the interest rate in the US is used because of its typicality. The political factors are treated as dummy variables. Thirteen significant political events are selected and they are shown in Table 4. These dummy variables are coded in the following way: the years with significant political events are coded 1 while the remaining years are coded 0. To obtain a better fit and to reflect the actual situation, the world’s annual GDP is used as an auxiliary variable when IC6 is regressed on political factors. We also decomposed gold price by wavelet transform and WICA for comparison.
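As an illustration of the dummy coding described above, the following minimal Python sketch marks the event years of Table 4 with 1 and all other years with 0. The names EVENT_YEARS and political_dummy are hypothetical, and the annual range 1969 to 2013 is assumed from the sample period; note that Table 4 lists twelve distinct years because two of the selected events fall in 2001.

```python
# Event years transcribed from Table 4 (two events share the year 2001).
EVENT_YEARS = [1971, 1973, 1979, 1980, 1982, 1989,
               1990, 1991, 2001, 2003, 2006, 2007]


def political_dummy(start_year=1969, end_year=2013, event_years=EVENT_YEARS):
    """Return {year: 1 if a significant political event occurred, else 0}."""
    events = set(event_years)
    return {year: 1 if year in events else 0
            for year in range(start_year, end_year + 1)}


dummy = political_dummy()
print(sum(dummy.values()), "event years out of", len(dummy))
```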
The symmlet wavelet (symmlet6) was used as the mother wavelet since it is an orthogonal symmetric wavelet filter with a linear phase and is well suited to avoiding undesirable artifacts [37]. The number of decomposition levels is set to 6, to be consistent with the number of ICs in the EEMD–ICA decomposition. These two decompositions yield the Ws and WICs, respectively. To facilitate comparison, the relevant economic variables are the same as those used for the ICs. Because ordinary least squares (OLS) is sensitive to outliers [38,39], we use robust regression instead of simple OLS in our analysis. In the regression analysis, the data frequency should be the same for both the independent and dependent variables. Since data for some economic variables, such as GDP, are only available at yearly frequency, the data for their corresponding ICs are converted to annual frequency. Thus IC2, IC4, IC5 and IC6 are converted to annual data. The same method is used to process the Ws and WICs. The coefficients and R2 for the robust regression of the estimated ICs, Ws and WICs on the relevant economic variables are reported in Table 5. IC3, W3 and WIC3 are absent, for they are all classified as accidental factors. The results in Table 5 show that the EEMD–ICA model is better than the wavelet transform and WICA based models in terms of the effectiveness of the independent components extracted from practical data. The regression equations of the ICs and related economic variables from the EEMD–ICA based approach are statistically significant according to the F test. Meanwhile, the regression coefficients are statistically significant at the 5% significance level, except for the coefficient of the ING variable. This is because there are too few data points, owing to the annual and infrequent nature of the variable.
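The motivation for using robust regression can be illustrated with a small sketch. The specific robust estimator is not restated in this section, so the code below swaps in the Theil–Sen estimator (the median of pairwise slopes) purely as an example of an outlier-resistant fit with a single regressor, and compares it with OLS on synthetic data. All function names and data values are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def theil_sen_fit(x, y):
    """Theil-Sen fit: slope is the median pairwise slope, intercept the median offset."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    b = median(slopes)
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b


# Synthetic data (illustration only): y = 1 + 2x with one gross outlier.
x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[-1] = 100

ols_a, ols_b = ols_fit(x, y)
ts_a, ts_b = theil_sen_fit(x, y)
print(f"OLS:       intercept={ols_a:.3f}, slope={ols_b:.3f}")
print(f"Theil-Sen: intercept={ts_a:.3f}, slope={ts_b:.3f}")
```

In this toy setting the single outlier pulls the OLS slope far from the underlying value, whereas the median-based fit is unaffected, which is the behaviour that motivates a robust estimator when the economic series contain extreme observations.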
These results provide statistical evidence of the linear relationship between the ICs and the related economic variables and also verify the trend comparison results in the previous section. In comparison, the p-values of the tests for the regression coefficients based on the wavelet transform and WICA approaches are mostly insignificant. This result suggests that the decomposition using wavelet transform and WICA is ineffective: the relationship between the extracted components and the economic factors is statistically insignificant. The better performance of the improved EEMD–ICA model is attributed to the introduction of EEMD as the data separation tool. The EEMD decomposition extracts data characteristics more effectively in actual data analysis. Its basis functions are obtained adaptively from the data, so they are better suited to the nonlinear and time-varying characteristics of actual data and are less restrictive than a fixed wavelet basis. Also, the ICA is critical and helps identify the underlying independent factors: world economic development, gold supply and demand, US dollar, inflation (CPI), market emergencies, international geopolitics and cyclicality. These factors are sorted according to their respective degrees of importance, and the ranking criterion is each IC’s transformation coefficient. Through each independent component and its transformation coefficient, gold price can be better recombined, which also provides a new solution to the prediction and risk analysis of gold price.
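The recombination mentioned above can be sketched as a weighted sum. Assuming the price is the linear combination of the ICs weighted by their transformation coefficients (ignoring any mean offset), the contribution of each IC at a given time is its value multiplied by its coefficient. The function names and the toy numbers below are hypothetical and are not taken from the decomposition results.

```python
def contributions(ics, coefficients, t):
    """Contribution of each IC to the price at time index t: coefficient * IC value."""
    return [c * ic[t] for c, ic in zip(coefficients, ics)]


def recombine(ics, coefficients):
    """Rebuild the series as price[t] = sum_k coefficients[k] * ics[k][t]."""
    n = len(ics[0])
    return [sum(contributions(ics, coefficients, t)) for t in range(n)]


# Toy example with two components and two time points (illustrative values only).
toy_ics = [[1.0, 2.0], [0.5, -0.5]]
toy_coefficients = [10.0, 4.0]

price = recombine(toy_ics, toy_coefficients)
print(price)
print(contributions(toy_ics, toy_coefficients, 0))
```

Sorting the components by the magnitude of their coefficients then gives the importance ranking referred to in the text.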


Table 5
Robust regression of the estimated ICs, Ws and WICs on the relevant economic variables.

Regression      Intercept    Coefficient               R2       Adjusted R2   F-stat
IC1 on CPI      −0.2054**    0.0375**                  0.188    0.187         120**
IC2 on GDP      −0.6187**    9.5355e−06*               0.604    0.595         65.5**
IC4 on GPMC     2.7638**     −0.00051**                0.412    0.377         11.9**
IC5 on USR      1.5663**     −0.2504**                 0.354    0.339         23.6**
IC6 on ING      0.4215       −1.153e−05/0.0566         0.145    0.104         3.56*
IC7 on USDX     −3.005**     0.03101**                 0.895    0.895         4090**
W1 on CPI       −1.0948*     0.2351*                   0.0102   0.0083        5.3*
W2 on GDP       0.3421       −1.823e−05                0.0324   0.0099        1.44
W4 on GPMC      3.2316       −0.005                    0.0196   −0.0381       0.339
W5 on USR       −7.9493      1.8874                    0.0249   0.0022        1.1
W6 on ING       −14.059      0.00045/3.6992            0.0275   −0.0189       0.593
W7 on USDX      261.71**     1.1827**                  0.442    0.441         379**
WIC1 on CPI     −0.2971**    0.0204*                   0.0173   0.0154        9.08**
WIC2 on GDP     0.017        −1.6069e−06               0.0201   −0.0027       0.883
WIC4 on GPMC    −0.4582      0.00042                   0.115    0.0625        2.2
WIC5 on USR     −0.0503      −0.0229                   0.0041   −0.0191       0.176
WIC6 on ING     0.1577       −3.1397e−05**/−0.0312     0.537    0.514         24.3**
WIC7 on USDX    −0.2209**    −0.0011                   0.0303   0.0283        15**

Note: Ws are the decomposition signals using wavelet transform; WICs are the series decomposed by WICA; CPI is the consumer price index in US; GDP is the world’s annual gross domestic product; GPMC is the gold production minus consumption; USR is the real interest rate in US; ING has two variables, the first is the world’s annual GDP, and the second is a dummy variable for the international geopolitics; USDX is US dollar index.
* p < 0.05.
** p < 0.01.
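As a rough consistency check on Table 5, the adjusted R2 and the F statistic can be recomputed from R2 with the standard least-squares identities. This is only a sketch: the sample size n = 45 (annual observations from 1969 to 2013) is an assumption rather than a value stated in the table, the function names are hypothetical, and the statistics of a robust fit need not follow these identities exactly.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic implied by R2 for n observations and k regressors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


N_ANNUAL = 45  # assumed number of annual observations, 1969 to 2013

# IC2 on GDP: one regressor, reported R2 = 0.604.
ic2_adj = adjusted_r2(0.604, N_ANNUAL, 1)
ic2_f = f_statistic(0.604, N_ANNUAL, 1)

# IC6 on ING: two regressors (GDP and the dummy), reported R2 = 0.145.
ic6_adj = adjusted_r2(0.145, N_ANNUAL, 2)
ic6_f = f_statistic(0.145, N_ANNUAL, 2)

print(f"IC2 on GDP: adjusted R2={ic2_adj:.3f}, F={ic2_f:.2f}")
print(f"IC6 on ING: adjusted R2={ic6_adj:.3f}, F={ic6_f:.2f}")
```

Under these assumptions, the recomputed values for the regressions of IC2 on GDP and IC6 on ING are close to the adjusted R2 and F statistics reported in Table 5.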

5. Conclusions

By modifying the existing EMD–ICA model and applying it to the decomposition of gold price between December 1969 and January 2013, this study obtains a series of statistically independent sequences, which reveal a lot of meaningful information. In this paper, we improve the original method in two respects: EEMD is used instead of EMD to decompose the gold price series, and a recombination procedure is added to the proposed method. In the course of the decomposition, three groups of series, called IMFs, VIMFs and ICs, are obtained. We not only completely recombine the gold price series through the ICs, but also interpret the economic meanings of the sequences obtained through decomposition. By identifying the IMFs obtained through EEMD decomposition, we are able to divide the factors affecting gold price into 4 categories, and by further decomposition we obtain 7 ICs with rather clear meanings, which represent the economic factors of world economic development, gold supply and demand, US dollar, inflation (CPI), market emergencies, international geopolitics and cyclicality (these factors are sorted according to their degree of importance). Furthermore, combining each IC’s trend with its transformation coefficient, we explain their specific effects on gold price in detail, and our discussion is verified by regression analysis. The results obtained from the decomposition of gold price can be applied in other areas, including gold price prediction and risk analysis of gold investment. As for gold price prediction, given that each IC has its own features and meaning, we can use different analytic methods (such as regression analysis, support vector regression and neural network models) together with the recombination method mentioned above. For risk analysis, we can estimate the approximate conditional density of each IC using methods such as maximum likelihood estimation, Bayesian estimation, and mixture density networks.
The random samples are drawn from the derived joint conditional cumulative distribution and the α-quantile is calculated as a portfolio VaR estimate. We leave these interesting research topics to future work.

Acknowledgments

The authors would like to express their sincere appreciation to the editor and the anonymous referees for their valuable comments and suggestions, which helped improve the quality of the paper tremendously. This work is supported by the National Natural Science Foundation of China (NSFC No. 71201054, No. 91224001, No. 71433001), the Strategic Research Grant of City University of Hong Kong (7004574), and the Fundamental Research Funds for the Central Universities in BUCT.

References

[1] A.E. Tschoegl, Efficiency in the gold market—a note, J. Banking Finance 4 (4) (1980) 371–379. URL: http://www.sciencedirect.com/science/article/pii/0378426680900151.
[2] M.E. Solt, P.J. Swanson, On the efficiency of the markets for gold and silver, J. Bus. 54 (3) (1981) 453–478. URL: http://www.jstor.org/stable/2352348.
[3] Y.-W. Cheung, K.S. Lai, Do gold market returns have long memory? Financ. Rev. 28 (2) (1993) 181–202. URL: http://dx.doi.org/10.1111/j.1540-6288.1993.tb01344.x.
[4] T.C. Mills, Statistical analysis of daily gold price data, Physica A 338 (3–4) (2004) 559–566. URL: http://www.sciencedirect.com/science/article/pii/S0378437104002651.
[5] E. Tully, B.M. Lucey, A power GARCH examination of the gold market, Res. Int. Bus. Finance 21 (2) (2007) 316–325. URL: http://www.sciencedirect.com/science/article/pii/S0275531906000353.
[6] A. Parisi, F. Parisi, D. Díaz, Forecasting gold price changes: Rolling and recursive neural network models, J. Multinat. Financ. Manag. 18 (5) (2008) 477–487. URL: http://www.sciencedirect.com/science/article/pii/S1042444X08000030.
[7] L.A. Sjaastad, The price of gold and the exchange rates: Once again, Resour. Policy 33 (2) (2008) 118–124. Economic Aspects of Gold, Iron Ore and Currencies. URL: http://www.sciencedirect.com/science/article/pii/S0301420708000202.
[8] J.A. Batten, B.M. Lucey, Volatility in the gold futures market, Appl. Econ. Lett. 17 (2) (2010) 187–190. URL: http://dx.doi.org/10.1080/13504850701719991.
[9] S. Shafiee, E. Topal, An overview of global gold market and gold price forecasting, Resour. Policy 35 (3) (2010) 178–189. URL: http://www.sciencedirect.com/science/article/pii/S0301420710000243.
[10] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley and Sons, 2001.
[11] T.-W. Lee, Independent Component Analysis, Springer, 1998.
[12] C. Beckmann, S. Smith, Probabilistic independent component analysis for functional magnetic resonance imaging, IEEE Trans. Med. Imaging 23 (2) (2004) 137–152. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1263605.
[13] O. Déniz, M. Castrillón, M. Hernández, Face recognition using independent component analysis and support vector machines, Pattern Recognit. Lett. 24 (13) (2003) 2153–2157. Audio- and Video-based Biometric Person Authentication (AVBPA 2001). URL: http://www.sciencedirect.com/science/article/pii/S0167865503000813.
[14] C. James, O. Gibson, Temporally constrained ICA: an application to artifact rejection in electromagnetic brain signal analysis, IEEE Trans. Biomed. Eng. 50 (9) (2003) 1108–1116. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1220217.
[15] D.-M. Tsai, P.-C. Lin, C.-J. Lu, An independent component analysis-based filter design for defect detection in low-contrast surface images, Pattern Recognit. 39 (9) (2006) 1679–1694. URL: http://www.sciencedirect.com/science/article/pii/S0031320306001178.
[16] A.D. Back, A.S. Weigend, A first application of independent component analysis to extracting structure from stock returns, Int. J. Neural Syst. 8 (04) (1997) 473–484.
[17] K. Kiviluoto, E. Oja, Independent component analysis for parallel financial time series, in: The Fifth International Conference on Neural Information Processing, ICONIP’98, Kitakyushu, Japan, October 21–23, 1998, Proceedings, 1998, pp. 895–898.
[18] E. Oja, K. Kiviluoto, S. Malaroiu, Independent component analysis for financial time series, in: Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000, 2000, pp. 111–116. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=882456.
[19] C.-J. Lu, T.-S. Lee, C.-C. Chiu, Financial time series forecasting using independent component analysis and support vector regression, Decis. Support Syst. 47 (2) (2009) 115–125. URL: http://www.sciencedirect.com/science/article/pii/S0167923609000323.
[20] J. Lin, A. Zhang, Fault feature separation using wavelet-ICA filter, NDT & E Int. 38 (6) (2005) 421–427. URL: http://www.sciencedirect.com/science/article/pii/S0963869504001525.
[21] Z. Peng, F. Chu, Y. He, Vibration signal analysis and feature extraction based on reassigned wavelet scalogram, J. Sound Vib. 253 (5) (2002) 1087–1100. URL: http://www.sciencedirect.com/science/article/pii/S0022460X01940854.
[22] A. Tewfik, D. Sinha, P. Jorgensen, On the optimal choice of a wavelet for signal representation, IEEE Trans. Inform. Theory 38 (2) (1992) 747–765. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=119734.
[23] P. Flandrin, P. Gonçalvés, Empirical mode decompositions as data-driven wavelet-like expansions, Int. J. Wavelets Multiresolut. Inf. Process. 02 (04) (2004) 477–496. URL: http://www.worldscientific.com/doi/abs/10.1142/S0219691304000561.
[24] N.E. Huang, M.-L. Wu, W. Qu, S.R. Long, S.S.P. Shen, Applications of Hilbert–Huang transform to non-stationary financial time series analysis, Appl. Stoch. Models Bus. Ind. 19 (3) (2003) 245–268. URL: http://dx.doi.org/10.1002/asmb.501.
[25] B. Mijović, M. De Vos, I. Gligorijević, J. Taelman, S. Van Huffel, Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis, IEEE Trans. Biomed. Eng. 57 (9) (2010) 2188–2196. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5483220.
[26] A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications, Neural Netw. 13 (4–5) (2000) 411–430. URL: http://www.sciencedirect.com/science/article/pii/S0893608000000265.
[27] A. Bell, T. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput. 7 (6) (1995) 1129–1159. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6796129.
[28] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.-C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc.: Math. Phys. Eng. Sci. 454 (1971) (1998) 903–995. URL: http://www.jstor.org/stable/53161.
[29] X. Zhang, K. Lai, S.-Y. Wang, A new approach for crude oil price analysis based on empirical mode decomposition, Energy Econ. 30 (3) (2008) 905–918. URL: http://www.sciencedirect.com/science/article/pii/S0140988307000436.
[30] Z. Wu, N.E. Huang, Ensemble empirical mode decomposition: a noise-assisted data analysis method, Adv. Adapt. Data Anal. 1 (01) (2009) 1–41.
[31] F. Capie, T.C. Mills, G. Wood, Gold as a hedge against the dollar, J. Int. Financ. Mark. Inst. Money 15 (4) (2005) 343–352. URL: http://www.sciencedirect.com/science/article/pii/S1042443104000794.
[32] P. Cagan, Changes in the cyclical behavior of interest rates, Rev. Econ. Stat. 52 (4) (1971) 1519–1542.
[33] K. Sill, The cyclical volatility of interest rates, Bus. Rev. (1996) 15–29.
[34] P.A. Neumeyer, F. Perri, Business cycles in emerging economies: the role of interest rates, J. Monetary Econ. 52 (2) (2005) 345–380. URL: http://www.sciencedirect.com/science/article/pii/S0304393205000036.
[35] M. Uribe, V.Z. Yue, Country spreads and emerging countries: Who drives whom? J. Int. Econ. 69 (1) (2006) 6–36. Emerging Markets and macroeconomic volatility: Lessons from a decade of financial debacles, a symposium for the Journal of International Economics. URL: http://www.sciencedirect.com/science/article/pii/S0022199605000620.
[36] J.L. Cendejas, J.E. Castañeda, F.-F. Muñoz, Business cycle, interest rate and money in the euro area: A common factor model, Econ. Model. 43 (2014) 136–141. URL: http://www.sciencedirect.com/science/article/pii/S0264999314003046.
[37] F. Benhmad, Dynamic cyclical comovements between oil prices and US GDP: A wavelet perspective, Energy Policy 57 (2013) 141–151. URL: http://www.sciencedirect.com/science/article/pii/S0301421513000244.
[38] M.J. Hinich, P.P. Talwar, A simple method for robust regression, J. Amer. Statist. Assoc. 70 (349) (1975) 113–119. URL: http://www.jstor.org/stable/2285386.
[39] A. Preminger, R. Franck, Forecasting exchange rates: A robust regression approach, Int. J. Forecast. 23 (1) (2007) 71–84. URL: http://www.sciencedirect.com/science/article/pii/S0169207006000549.