Journal of Multivariate Analysis 123 (2014) 214–227
Deflation-based separation of uncorrelated stationary time series

Jari Miettinen a,∗, Klaus Nordhausen b, Hannu Oja c, Sara Taskinen a

a Department of Mathematics and Statistics, 40014 University of Jyväskylä, Finland
b School of Information Sciences, 33014 University of Tampere, Finland
c Department of Mathematics and Statistics, 20014 University of Turku, Finland
∗ Corresponding author. E-mail address: [email protected] (J. Miettinen).

Article history: Received 17 October 2012; available online 27 September 2013.
AMS subject classifications: 62H10, 62H12, 62M10, 60G10.
Keywords: Asymptotic normality; Autocovariance; Blind source separation; MA(∞) processes; Minimum distance index; SOBI.

Abstract

In this paper we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then to find an estimate of an unmixing matrix that transforms the observed time series back to uncorrelated time series. The so-called SOBI (Second Order Blind Identification) estimate aims at a joint diagonalization of the covariance matrix and several autocovariance matrices with varying lags. In this paper, we propose a novel procedure that extracts the latent time series one by one. The limiting distribution of this deflation-based SOBI is found under general conditions, and we show how the results can be used for the comparison of estimates. The exact formula for the limiting covariance matrix of the deflation-based SOBI estimate is given for general multivariate MA(∞) processes. Finally, a whole family of estimates is proposed with the deflation-based SOBI as a special case, and the limiting properties of these estimates are found as well. The theory is widely illustrated by simulation studies.

1. Introduction

The blind source separation (BSS) model is a semiparametric model where the components of the observed p-variate vector x are assumed to be linear combinations of the components of an unobserved p-variate source vector z. The BSS model can then simply be written as x = Ω z, where Ω is an unknown full-rank p × p mixing matrix, and the aim is, based on the observations x_1, ..., x_T, to find an estimate of the mixing matrix Ω (or its inverse). Notice that in independent component analysis (ICA), see for example [9], which is perhaps the most popular BSS approach, it is further assumed that the components of z are mutually independent and that at most one of them is Gaussian.

It is often assumed in BSS applications that the observation vectors x_1, ..., x_T are independent and identically distributed (iid) random vectors. In this paper, however, x_1, ..., x_T are observations of a time series. We assume that the p-variate observations x_1, ..., x_T obey a BSS model such that

$$x_t = \Omega z_t, \quad t = 0, \pm 1, \pm 2, \ldots, \tag{1}$$

where Ω is a full-rank p × p mixing matrix, and z = (z_t)_{t = 0, ±1, ±2, ...} is a p-variate time series that satisfies

(A1) E(z_t) = 0,
(A2) E(z_t z_t') = I_p, and


(A3) E(z_t z_{t+k}') = E(z_{t+k} z_t') = D_k is diagonal for all k = 1, 2, ....

The p time series in z are thus weakly stationary and uncorrelated. Note that an ICA time series model is obtained if (A3) is replaced by the stronger assumption that the p component series of z_t are mutually independent. No parametric assumptions on the distribution of z_t or on the autocovariance structures are made. For parametric maximum likelihood estimates of the unmixing matrix for Gaussian autoregressive sources with known and unknown covariance structures, see e.g. [7,21].

The aim in model (1) is to find, given (x_1, ..., x_T), an estimate Γ̂ of an unmixing matrix Γ such that Γ x has uncorrelated components. Γ = Ω^{−1} is naturally one possible unmixing matrix. Notice that Ω and z in the definition are confounded in the sense that the signs and order of the components of z (and the signs and order of the columns of Ω, respectively) are not uniquely defined. In the signal processing community, model (1) is often described as a model with components that are temporally correlated but spatially uncorrelated, or as a ''colored'' data model as opposed to the ''white'' iid model [5,6]. Contrary to the iid ICA model, multiple Gaussian sources are not excluded under the BSS model (1) assumptions. In ICA, higher-order moments are often used to recover the underlying sources, whereas in model (1) the use of second-order statistics is adequate, and the separation is based on the information coming from the serial dependence. Due to this property, this blind source separation approach is also called the second-order source separation (SOS) approach. BSS time series models have been widely used in engineering, financial, risk management and brain imaging applications, for example. For some recent work with different applications, see e.g. [4,8,11].

Many ICA and SOS methods first whiten the data and then search for an orthogonal matrix to obtain the final solution. This is done by optimizing some objective function, which in some cases is equivalent to jointly diagonalizing certain matrices (see for example [18] for an overview of BSS methods based on joint diagonalization). There are then basically two general ways in which the orthogonal matrix is obtained: either by finding its rows one after another (deflation-based methods) or simultaneously (symmetric methods). Perhaps the most popular ICA procedure, the so-called fastICA, for example, has both a deflation-based and a symmetric version [9].

The outline of this paper is as follows. In Section 2.1 we first discuss the so-called AMUSE estimator [19], which was the first estimator of the unmixing matrix developed for this problem and is based on the simultaneous diagonalization of two autocovariance matrices. The behavior of this estimate depends heavily on the chosen lag, and as a solution to this dilemma the so-called SOBI estimator [1], which jointly diagonalizes K > 2 autocovariance matrices, is considered in Section 2.2, where a new deflation-based estimation procedure for the joint diagonalization of autocovariance matrices is proposed. In Section 3 the asymptotic properties of the new SOBI estimate are derived. In Section 4, a large family of unmixing matrix estimates is proposed with the deflation-based SOBI as a special case, and the limiting properties of these estimates are found as well. The limiting and finite-sample behavior of the estimates is illustrated in simulation studies in Section 5. The paper is concluded with some final remarks in Section 6.

2. BSS functionals based on autocovariance matrices

2.1. Functionals based on two autocovariance matrices

Let us first recall the statistical functional corresponding to the AMUSE (Algorithm for Multiple Unknown Signals Extraction) estimator introduced by Tong et al. [19]. Assume that x follows the BSS model (1) such that, for some lag k > 0, the diagonal elements of the autocovariance matrix E(z_t z_{t+k}') = D_k are distinct, and write

$$S_k = E(x_t x_{t+k}') = \Omega D_k \Omega', \quad k = 0, 1, 2, \ldots,$$

for the autocovariance matrices. The unmixing matrix functional Γ_k is then defined as the p × p matrix that satisfies

$$\Gamma_k S_0 \Gamma_k' = I_p \quad \text{and} \quad \Gamma_k S_k \Gamma_k' = \Lambda_k,$$

where Λ_k is a diagonal matrix with the diagonal elements in decreasing order. Notice that Γ_k is affine equivariant, that is, if Γ_k and Γ_k^∗ are the values of the functional in the BSS model (1) at x and x^∗ = Ax for some non-singular p × p matrix A, then Γ_k^∗ = Γ_k A^{−1} and further Γ_k x = Γ_k^∗ x^∗ (up to sign changes of the components).

The statistical properties of the AMUSE estimator were studied recently in [12]. The exact formula for the limiting covariance matrix was derived for MA(∞) processes, and the asymptotic as well as finite-sample behavior was investigated. It was shown that the behavior of the AMUSE estimate depends crucially on the choice of the lag k; the p time series can, for example, be separated consistently only if the eigenvalues in Λ_k are distinct. Without additional information on the time series, it is not possible to decide which lag k should be used in the estimation. To circumvent this problem, [1] proposed the SOBI (Second Order Blind Identification) algorithm, which seeks an unmixing matrix that makes several autocovariance matrices S_0, S_1, ..., S_K as diagonal as possible. In [1], an algorithm based on iterative Jacobi rotations is used to (approximately) jointly diagonalize the autocovariance matrices. In this paper, we propose a new approach in which the latent uncorrelated time series are found in a deflation-based manner (one by one).
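To make the construction concrete, the following is a minimal NumPy sketch of the sample AMUSE estimate: whiten with a symmetric square root of the sample covariance matrix and eigendecompose the whitened lag-k autocovariance (symmetrized, as is done for the sample matrices in Section 3.1), ordering the eigenvalues decreasingly as in Λ_k. The function names are ours, not from [19]; the sketch assumes a positive definite sample covariance.

```python
import numpy as np

def autocov(x, k):
    """Symmetrized sample autocovariance matrix of lag k for a (T, p) array."""
    T = x.shape[0]
    s = x[:T - k].T @ x[k:] / (T - k)
    return (s + s.T) / 2

def amuse(x, k=1):
    """AMUSE unmixing estimate: simultaneously diagonalize S0 and the lag-k
    autocovariance matrix."""
    x = x - x.mean(axis=0)
    # Symmetric whitening matrix S0^{-1/2}
    d, v = np.linalg.eigh(autocov(x, 0))
    s0_isqrt = v @ np.diag(d ** -0.5) @ v.T
    # Eigenvectors of the whitened lag-k autocovariance, eigenvalues decreasing
    lam, u = np.linalg.eigh(s0_isqrt @ autocov(x, k) @ s0_isqrt)
    order = np.argsort(lam)[::-1]
    gamma = u[:, order].T @ s0_isqrt   # rows of the unmixing matrix
    return gamma                       # Gamma S0 Gamma' = I_p, Gamma Sk Gamma' diagonal
```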


2.2. Functionals based on several autocovariance matrices

Let S_1, ..., S_K be autocovariance matrices with lags 1, ..., K. The p × p unmixing matrix functional Γ = (γ_1, ..., γ_p)' is the matrix that minimizes

$$\sum_{k=1}^{K} \|\mathrm{off}(\Gamma S_k \Gamma')\|^2$$

under the constraint Γ S_0 Γ' = I_p, or, equivalently, maximizes

$$\sum_{k=1}^{K} \|\mathrm{diag}(\Gamma S_k \Gamma')\|^2 = \sum_{j=1}^{p} \sum_{k=1}^{K} (\gamma_j' S_k \gamma_j)^2 \tag{2}$$

under the same constraint. Here diag(S) is the p × p diagonal matrix with the diagonal elements as in S, and off(S) = S − diag(S). In this paper, the rows of an unmixing matrix functional Γ are found one by one so that γ_j, j = 1, ..., p − 1, maximizes

$$\sum_{k=1}^{K} (\gamma_j' S_k \gamma_j)^2 \tag{3}$$

under the constraints γ_r' S_0 γ_j = δ_{rj}, r = 1, ..., j, where δ_{rj} is the Kronecker delta. It is not generally true that the solution found using (3) also maximizes (2).

Remark 1. In the literature, there are several algorithms available for the simultaneous diagonalization of K autocovariance matrices, that is, for the maximization problem (2). Perhaps the most widely used algorithm, based on the Jacobi rotation technique, was introduced in [3]. As far as the authors know, this paper provides the first solution to the maximization problem (3).

We first consider the existence and uniqueness of Γ = (γ_1, ..., γ_p)', the solution of the maximization problem (3). Clearly, if the solution exists, Γ = U S_0^{−1/2} with some orthogonal U, and we thus have to consider the existence and uniqueness of U = (u_1, ..., u_p)'. The uniqueness of U can naturally hold only up to the signs of the rows. We fix the signs by requiring, for example, that u_i' 1_p ≥ 0, i = 1, ..., p, where 1_p is a p-vector full of ones. We also write

$$R_k = S_0^{-1/2} S_k S_0^{-1/2}, \quad k = 1, \ldots, K,$$

where S_0^{−1/2} is chosen to be symmetric. To establish the existence and uniqueness of the solution, we then proceed as follows.

1. For u_1, write first

$$\Delta_1(u) = \sum_{k=1}^{K} (u' R_k u)^2, \quad u \in \mathbb{R}^p.$$

As Δ_1(u) is a continuous function, there exists u_1 such that ∥u_1∥ = 1, u_1' 1_p ≥ 0 and

$$\Delta_1(u_1) = \sup\{\Delta_1(u) : \|u\| \le 1\}.$$

2. For finding u_2, first choose any p × (p − 1) matrix U_1 such that (u_1, U_1) is orthogonal. Next write

$$\Delta_2(u) = \Delta_1(U_1 u) = \sum_{k=1}^{K} \left(u'(U_1' R_k U_1)u\right)^2, \quad u \in \mathbb{R}^{p-1}.$$

As Δ_2(u) is a continuous function, there exists u_2^∗ ∈ R^{p−1} such that ∥u_2^∗∥ = 1, (u_2^∗)' U_1' 1_p ≥ 0 and

$$\Delta_2(u_2^\ast) = \sup\{\Delta_2(u) : \|u\| \le 1\}.$$

Then u_2 = U_1 u_2^∗.

3. For u_3, choose any p × (p − 2) matrix U_2 such that (u_1, u_2, U_2) is orthogonal and continue as in stage 2.

Continuing in this way yields u_3, ..., u_{p−1}, which also determine u_p. This proves the existence of a solution. For uniqueness, further assumptions are needed; see Assumption 1 in Section 3 for the uniqueness condition on the population unmixing matrix. The sample matrix is unique with probability one.

Now using (3) one can see that the vector γ_j optimizes the Lagrangian function

$$L(\gamma_j, \lambda_j) = \sum_{k=1}^{K} (\gamma_j' S_k \gamma_j)^2 - \lambda_{jj}(\gamma_j' S_0 \gamma_j - 1) - \sum_{r=1}^{j-1} \lambda_{jr}\, \gamma_r' S_0 \gamma_j,$$


where λ_j = (λ_{j1}, ..., λ_{jj})' is the vector of the Lagrangian multipliers for γ_j. Write

$$T(\gamma) = \sum_{k=1}^{K} (\gamma' S_k \gamma) S_k \gamma.$$

By solving the above optimization problem one can easily show that the unmixing matrix functional Γ satisfies the following p − 1 estimating equations.

Definition 1. The unmixing matrix functional Γ = (γ_1, ..., γ_p)' based on S_0 and S_1, ..., S_K solves the estimating equations

$$T(\gamma_j) = S_0 \left( \sum_{r=1}^{j} \gamma_r \gamma_r' \right) T(\gamma_j), \quad j = 1, \ldots, p - 1.$$

For a unique solution of the original problem (3), Γ is affine equivariant, that is, if Γ and Γ^∗ are the values of the functional computed at x and x^∗ = Ax (for some non-singular p × p matrix A), then Γ x = Γ^∗ x^∗ (up to sign changes of the components). Note, however, that for population autocovariance matrices S_0, S_1, ..., S_K with the same eigenvectors, the estimating equations alone do not fix the order in which the rows of Γ are obtained. For example, the estimating equation for the first row of Γ, T(γ) = S_0 (γγ') T(γ), holds true for the value γ_1 giving the global maximum, but also for the p − 1 other critical points γ_2, ..., γ_p, where γ_p gives a minimum. Similarly, T(γ) = S_0 (γ_1 γ_1' + γγ') T(γ) holds true for the value γ_2 giving the global maximum, as well as for the p − 2 critical points γ_3, ..., γ_p, and so on. Therefore, for each γ_j, j = 1, ..., p − 1, it is important to check that the true (restricted) maximum is obtained.

3. Statistical properties of the SOBI estimator

3.1. Deflation-based SOBI estimator

Assume now that (x_1, ..., x_T) follows the BSS model (1). We also assume (without loss of generality) that the source vectors are centered at zero. The (population) autocovariance matrices S_k can naturally be estimated by the sample autocovariance matrices

$$\hat S_k = \frac{1}{T-k} \sum_{t=1}^{T-k} x_t x_{t+k}', \quad k = 0, 1, 2, \ldots.$$

As the population autocovariance matrices are symmetric in our model, we estimate S_k with the symmetrized version

$$\hat S_k^S = \frac{1}{2}(\hat S_k + \hat S_k').$$

The function T(γ) is naturally estimated by the sample function

$$\hat T(\gamma) = \sum_{k=1}^{K} (\gamma' \hat S_k^S \gamma)\, \hat S_k^S \gamma.$$

Applying Definition 1 to observed data, a natural unmixing matrix estimate is obtained as follows.

Definition 2. The unmixing matrix estimate Γ̂ = (γ̂_1, ..., γ̂_p)' based on Ŝ_0 and Ŝ_1^S, ..., Ŝ_K^S solves the estimating equations

$$\hat T(\hat\gamma_j) = \hat S_0 \left( \sum_{r=1}^{j} \hat\gamma_r \hat\gamma_r' \right) \hat T(\hat\gamma_j), \quad j = 1, \ldots, p - 1. \tag{4}$$

The estimating equations suggest simple (fixed-point and gradient) algorithms for computing the rows of the unmixing matrix estimate Γ̂ one by one. However, the estimating equations alone do not guarantee that the rows are found in the correct order; see Appendix A.

For the asymptotic results on our estimate, we of course need consistency and joint asymptotic normality of the autocovariance matrices (autocovariances and cross-autocovariances of the p observed series). Results on consistency and joint limiting normality of the autocovariances of univariate stationary time series are well known; see e.g. Section 7 in [2]. The limiting variances and covariances of the autocovariances are then given by the celebrated Bartlett formula. Recent extensions to general multivariate MA(∞) processes, for example, can be found in [16,12]. See also Section 3.4.


3.2. Consistency of the SOBI estimator

As the estimator is affine equivariant, for studying the statistical behavior of the estimate we may make, without loss of generality, the following assumption.

Assumption 1. The observed time series (x_1, ..., x_T) follows the BSS model (1) with Ω = I_p, S_0 = I_p, and the population autocovariance matrices

$$S_k = \Lambda_k = \mathrm{diag}(\lambda_{k1}, \ldots, \lambda_{kp}), \quad k = 1, 2, \ldots,$$

where

$$\sum_{k=1}^{K} \lambda_{k1}^2 > \sum_{k=1}^{K} \lambda_{k2}^2 > \cdots > \sum_{k=1}^{K} \lambda_{kp}^2.$$

Note that this assumption guarantees the uniqueness of the population value Γ = I_p. For stationary time series, the sample autocovariance matrices converge to the population autocovariance matrices under mild conditions. Under Assumption 1 and the consistency of the sample autocovariance matrices, we can easily prove the consistency of Γ̂.

Theorem 1. Under Assumption 1, if Ŝ_0 →_P I_p and Ŝ_k^S →_P Λ_k, k = 1, ..., K, then Γ̂ →_P I_p.

For the proof, see Appendix B.

3.3. Limiting multivariate normality of the SOBI estimator

The limiting distribution of Γ̂ under Assumption 1 is also easily obtained using the following result.

Theorem 2. Assume that

$$\sqrt{T}(\hat S_0 - I_p) = O_p(1) \quad \text{and} \quad \sqrt{T}(\hat S_k - \Lambda_k) = O_p(1), \quad k = 1, \ldots, K.$$

Then, under Assumption 1, for j = 1, ..., p,

$$\sqrt{T}\,\hat\gamma_{ji} = -\sqrt{T}\,\hat\gamma_{ij} - \sqrt{T}(\hat S_0)_{ij} + o_p(1), \quad i < j,$$

$$\sqrt{T}(\hat\gamma_{jj} - 1) = -\frac{1}{2}\sqrt{T}\big((\hat S_0)_{jj} - 1\big) + o_p(1), \quad i = j,$$

$$\sqrt{T}\,\hat\gamma_{ji} = \frac{\sum_k \lambda_{kj}\,\sqrt{T}(\hat S_k^S)_{ji} - \sum_k \lambda_{kj}^2\,\sqrt{T}(\hat S_0)_{ji}}{\sum_k \big(\lambda_{kj}^2 - \lambda_{kj}\lambda_{ki}\big)} + o_p(1), \quad i > j.$$

For the proof, see the proof of the slightly more general Theorem 4 in Appendix B. Under joint limiting multinormality of √T(Ŝ_0 − I_p) and √T(Ŝ_k − Λ_k), k = 1, ..., K, Theorem 2 implies the following result.

Corollary 1. Assume also that the joint limiting distribution of

$$\sqrt{T}\left(\mathrm{vec}(\hat S_0, \hat S_1^S, \ldots, \hat S_K^S) - \mathrm{vec}(I_p, \Lambda_1, \ldots, \Lambda_K)\right)$$

is a (singular) (K + 1)p²-variate normal distribution with mean value zero. Then, under Assumption 1, the limiting distribution of √T vec(Γ̂ − I_p) is a singular p²-variate normal distribution.

It is interesting that the theorem and the corollary seem to hold true also if the rows of Γ̂ are forced to be found in a wrong order, so that Γ̂ converges in probability to some permutation matrix P. This was seen in our simulations, where the order was manipulated by suitable choices of initial values in the algorithm. The joint limiting distribution of γ̂_1, ..., γ̂_p then depends on the order in which they are found.

Remark 2. Throughout this section, we have assumed that the true value of Ω is I_p. Due to the affine equivariance of Γ̂, for any full-rank Ω, the limiting distribution of √T(Γ̂ Ω − I_p) is the same as that of √T(Γ̂ − I_p) for Ω = I_p. For exact calculations of the limiting covariance matrices, for example, it is then helpful to note that vec(Γ̂ Ω) = (Ω' ⊗ I_p) vec(Γ̂).


3.4. MA(∞) processes

To illustrate the theory with some exact calculations of the limiting variances and covariances of the components of Γ̂, we now assume that the components of z_t in the BSS model (1) are uncorrelated multivariate MA(∞) processes, that is,

$$z_t = \sum_{j=-\infty}^{\infty} \Psi_j \epsilon_{t-j}, \tag{5}$$

where Ψ_j, j = 0, ±1, ±2, ..., are diagonal matrices satisfying Σ_{j=−∞}^{∞} Ψ_j² = I_p, and the ϵ_t are iid p-variate random vectors with E(ϵ_t) = 0 and Cov(ϵ_t) = I_p. Hence x_t = Ω z_t is also a multivariate MA(∞) process. Notice that every second-order stationary process is either a linear process (MA(∞)) or can be transformed into one using Wold's decomposition. Notice also that causal ARMA(p, q) processes are MA(∞) processes; see Chapter 3 in [2]. We further assume that

(A4) the components of ϵ_t have finite fourth-order moments, and
(A5) the components of ϵ_t are exchangeable and marginally symmetric.

Assumption (A5) implies that all third moments of ϵ_t are zero. In the following we write β_ij = E(ϵ_ti² ϵ_tj²), i, j = 1, ..., p. Notice that if (A5) is replaced by the assumption that the components of ϵ_t are mutually independent, then, in this independent component model case, β_ij = 1 for i ≠ j. If one further assumes that the innovations ϵ_t are iid from N_p(0, I_p), then β_ii = 3 and β_ij = 1 for all i ≠ j.

Miettinen et al. [12] found the joint limiting multivariate normal distribution of the sample autocovariance matrices Ŝ_k^S, k = 1, ..., K, in the above MA(∞) model case and showed that the limiting distribution depends on the Ψ_t through the autocovariances of the parameter vector series,

$$F_k = \sum_{t=-\infty}^{\infty} \psi_t \psi_{t+k}', \quad k = 0, 1, 2, \ldots,$$

where ψ_t = (ψ_{t1}, ..., ψ_{tp})' is the vector of the diagonal elements of Ψ_t. For the limiting covariance matrix of Γ̂ we then need the p × p matrices D_{lm}, l, m = 0, ..., K, with elements given by

$$(D_{lm})_{ii} = (\beta_{ii} - 3)(F_l)_{ii}(F_m)_{ii} + \sum_{k=-\infty}^{\infty} \big((F_{k+l})_{ii}(F_{k+m})_{ii} + (F_{k+l})_{ii}(F_{k-m})_{ii}\big),$$

$$(D_{lm})_{ij} = \frac{1}{2}\sum_{k=-\infty}^{\infty} \big((F_{k+l-m})_{ii}(F_k)_{jj} + (F_k)_{ii}(F_{k+l-m})_{jj}\big) + \frac{1}{4}(\beta_{ij} - 1)(F_l + F_l')_{ij}(F_m + F_m')_{ij}, \quad i \ne j.$$

The limiting distributions of the rows of Γ̂ under Assumption 1 in the MA(∞) model are given in the following theorem. As will be seen later, only the limiting variances of the off-diagonal elements of Γ̂ Ω are used when comparing different estimates.

Theorem 3. Under Assumption 1 in the MA(∞) model, the limiting distribution of √T(γ̂_j − e_j), j = 1, ..., p, is a p-variate normal distribution with mean zero and covariance matrix

$$ASV(\hat\gamma_j) = \sum_{i=1}^{j-1} ASV(\hat\gamma_{ji})\, e_i e_i' + ASV(\hat\gamma_{jj})\, e_j e_j' + \sum_{i=j+1}^{p} ASV(\hat\gamma_{ji})\, e_i e_i',$$

where

$$ASV(\hat\gamma_{ji}) = \frac{\sum_{l,m} \lambda_{li}\lambda_{mi}(D_{lm})_{ji} - 2\mu_{ij}\sum_{k} \lambda_{ki}(D_{k0})_{ji} + \mu_{ij}^2 (D_{00})_{ji}}{(\mu_{ij} - \mu_{ii})^2}$$

for i = 1, ..., j − 1, ASV(γ̂_{jj}) = 4^{−1}(D_{00})_{jj}, and

$$ASV(\hat\gamma_{ji}) = \frac{\sum_{l,m} \lambda_{lj}\lambda_{mj}(D_{lm})_{ji} - 2\mu_{jj}\sum_{k} \lambda_{kj}(D_{k0})_{ji} + \mu_{jj}^2 (D_{00})_{ji}}{(\mu_{jj} - \mu_{ji})^2}$$

for i = j + 1, ..., p. Above, e_j is the p-vector with the jth element one and the others zero, and μ_ij = Σ_k λ_{ki} λ_{kj}, i, j = 1, ..., p.


4. A new family of deflation-based estimates

The criterion function for the deflation-based SOBI estimator considered so far is based on fourth moments. We next consider a modification of the criterion in (3) that is obtained if the second power in Σ_{k=1}^{K} (γ_j' S_k γ_j)² is replaced by G(γ_j' S_k γ_j) for other choices of G. Different choices of G then yield estimates with different robustness and efficiency properties. Notice, however, that the estimate still depends on the observations through the second moments, and finding an estimate with a bounded influence function, for example, does not seem possible for any choice of G. To obtain an estimate with a high breakdown point and a bounded influence function, the covariance matrix and the autocovariance matrices should be replaced by a robust scatter matrix and robust autocovariance matrices, respectively. The results here are only preliminary, and further research is needed.

The rows of Γ are again found one by one so that γ_j, j = 1, ..., p − 1, maximizes

$$\sum_{k=1}^{K} G(\gamma_j' S_k \gamma_j) \tag{6}$$

under the constraints γ_i' S_0 γ_j = δ_{ij}, i = 1, ..., j. In Section 2.2 we used G(x) = x², but now we let G be any increasing, twice continuously differentiable function, and write g = G'. Arguments similar to those in Section 2.2 may be used here to show the existence and uniqueness of the solution of the maximization problem. We modify our general assumptions in the following way.

Assumption 2. The observed time series (x_1, ..., x_T) follows the BSS model (1) with Ω = I_p, S_0 = I_p, and the population autocovariance matrices S_k = Λ_k = diag(λ_{k1}, ..., λ_{kp}), k = 1, 2, ..., where

$$\sum_{k=1}^{K} G(\lambda_{ki}) > \sum_{k=1}^{K} G(\lambda_{kj}) \quad \text{and} \quad \sum_{k=1}^{K} g(\lambda_{ki})\lambda_{kj} \ne \sum_{k=1}^{K} g(\lambda_{ki})\lambda_{ki}, \quad 1 \le i < j \le p.$$

As before, the functional Γ = (γ_1, ..., γ_p)' based on the functionals S_0 and S_1, ..., S_K can be shown to solve the estimating equations

$$T(\gamma_j) = S_0 \left( \sum_{r=1}^{j} \gamma_r \gamma_r' \right) T(\gamma_j), \quad j = 1, \ldots, p - 1, \tag{7}$$

where now

$$T(\gamma) = \sum_{k=1}^{K} g(\gamma' S_k \gamma)\, S_k \gamma.$$

The corresponding unmixing matrix estimate Γ̂ = (γ̂_1, ..., γ̂_p)' is obtained by replacing the autocovariance matrices with the symmetrized sample autocovariance matrices in the estimating equations. Further, the limiting distribution of Γ̂ can be derived using the following result.

Theorem 4. Assume that

$$\sqrt{T}(\hat S_0 - I_p) = O_p(1) \quad \text{and} \quad \sqrt{T}(\hat S_k - \Lambda_k) = O_p(1), \quad k = 1, \ldots, K.$$

Then, under Assumption 2, Γ̂ →_P I_p, and, for j = 1, ..., p,

$$\sqrt{T}\,\hat\gamma_{ji} = -\sqrt{T}\,\hat\gamma_{ij} - \sqrt{T}(\hat S_0)_{ij} + o_p(1), \quad i < j,$$

$$\sqrt{T}(\hat\gamma_{jj} - 1) = -\frac{1}{2}\sqrt{T}\big((\hat S_0)_{jj} - 1\big) + o_p(1), \quad i = j,$$

$$\sqrt{T}\,\hat\gamma_{ji} = \frac{\sum_k g(\lambda_{kj})\,\sqrt{T}(\hat S_k^S)_{ji} - \sum_k g(\lambda_{kj})\lambda_{kj}\,\sqrt{T}(\hat S_0)_{ji}}{\sum_k g(\lambda_{kj})\lambda_{kj} - \sum_k g(\lambda_{kj})\lambda_{ki}} + o_p(1), \quad i > j.$$


The results in Corollary 1 and Theorem 3 also hold for any twice continuously differentiable function G with slight modifications. For MA(∞) processes, the asymptotic variances of the components of γ̂_j, j = 1, ..., p, are then given by

$$ASV(\hat\gamma_{ji}) = \frac{\sum_{l,m} g(\lambda_{li})g(\lambda_{mi})(D_{lm})_{ji} - 2\mu_{ij}\sum_{k} g(\lambda_{ki})(D_{k0})_{ji} + \mu_{ij}^2 (D_{00})_{ji}}{(\mu_{ij} - \mu_{ii})^2}$$

for i = 1, ..., j − 1, ASV(γ̂_{jj}) = 4^{−1}(D_{00})_{jj}, and

$$ASV(\hat\gamma_{ji}) = \frac{\sum_{l,m} g(\lambda_{lj})g(\lambda_{mj})(D_{lm})_{ji} - 2\mu_{jj}\sum_{k} g(\lambda_{kj})(D_{k0})_{ji} + \mu_{jj}^2 (D_{00})_{ji}}{(\mu_{jj} - \mu_{ji})^2}$$

for i = j + 1, ..., p. Here μ_ij = Σ_k g(λ_{ki})λ_{kj}, i, j = 1, ..., p, and the matrices D_{lm}, l, m = 0, ..., K, are as given in Section 3.4.
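Computationally, the modification only changes the weight g(γ' S_k γ) in T(γ); the deflation-based algorithm of Appendix A carries over unchanged. A minimal NumPy sketch (function names are ours; the three g-functions are those used later in the simulations of Section 5.3, up to scale):

```python
import numpy as np

# Weight functions g = G' considered in Section 5.3 (up to a constant factor):
def g1(x): return x                   # G(x) = x^2: the deflation-based SOBI of Section 2.2
def g2(x): return x ** 3              # G(x) = x^4
def g3(x): return np.arctan(50 * x)   # a bounded weight function

def T_hat(gamma, S, g):
    """Sample analogue of T(gamma) = sum_k g(gamma' S_k gamma) S_k gamma,
    where S is a list of symmetrized sample autocovariance matrices S_1, ..., S_K."""
    return sum(g(gamma @ Sk @ gamma) * (Sk @ gamma) for Sk in S)
```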

5. Simulation studies

5.1. Minimum distance index

There are several possible performance indices to measure the accuracy of an unmixing matrix in BSS problems (for an overview, see for example [15]). Most of the indices are functions of the so-called gain matrix Ĝ = Γ̂ Ω. For a good estimate Γ̂, Ĝ is then close to some p × p matrix C in

$$\mathcal{C} = \{C : \text{each row and each column of } C \text{ has exactly one non-zero element}\}.$$

In this paper, the estimates are compared using the minimum distance index [10]

$$\hat D = D(\hat\Gamma \Omega) = \frac{1}{\sqrt{p-1}} \inf_{C \in \mathcal{C}} \|C \hat\Gamma \Omega - I_p\| \tag{8}$$

with the matrix (Frobenius) norm ∥·∥. It is shown in [10] that the minimum distance index is affine invariant and 0 ≤ D̂ ≤ 1. Moreover, if Γ̂ → Ω^{−1} and √T vec(Γ̂ Ω − I_p) → N_{p²}(0, Σ), which is true for our estimate, then the limiting distribution of T(p − 1)D̂² is that of a weighted sum of independent chi-squared variables with expected value

$$\mathrm{tr}\big((I_{p^2} - D_{p,p})\,\Sigma\,(I_{p^2} - D_{p,p})\big), \tag{9}$$

where D_{p,p} = Σ_{i=1}^{p} (e_i e_i') ⊗ (e_i e_i'). Notice that tr((I_{p²} − D_{p,p}) Σ (I_{p²} − D_{p,p})) is the sum of the limiting variances of the off-diagonal elements of √T vec(Γ̂ Ω − I_p) and therefore provides a global measure of the variation of the unmixing matrix estimate Γ̂.
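The infimum in (8) can be computed exactly: for a fixed assignment of rows of C to components, the optimal non-zero entry is a least-squares projection, and minimizing over assignments is a linear sum assignment problem. The following sketch assumes NumPy and SciPy; the function name and this particular reduction are ours, not taken from [10].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def md_index(gain):
    """Minimum distance index D = inf_C ||C G - I_p|| / sqrt(p - 1), where G is
    the gain matrix and C has exactly one non-zero entry per row and column."""
    p = gain.shape[0]
    row_norms2 = np.sum(gain ** 2, axis=1)           # ||G_k.||^2 for each row k
    # Residual of row i after optimally scaling row k of G towards e_i:
    cost = 1.0 - gain ** 2 / row_norms2[:, None]
    rows, cols = linear_sum_assignment(cost)          # optimal row-to-component match
    return np.sqrt(cost[rows, cols].sum() / (p - 1))
```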

5.2. Simulation setups

In our simulations, trivariate time series z_1, ..., z_T were generated from the following five models (A)–(E) with different choices of T and with 10 000 repetitions. In all simulation settings, the mixing matrix was Ω = I_p. Notice that, due to the affine equivariance of the estimates, the performance comparisons do not depend on this choice.

(A) AR(3), AR(6) and AR(5) processes with coefficient vectors (0.4, −0.5, 0.2), (0, 0.2, 0, −0.4, 0, 0.3) and (−0.1, 0, 0, 0, −0.1), respectively, and spherical trivariate t_5-distributed innovations.
(B) Three independent MA(50) processes with normal innovations and 50 randomly chosen coefficients.
(C) Three independent AR(3) processes with coefficient vectors (0, 0.5, −0.2), (0, 0.3, 0.3) and (0.5, 0, −0.1), respectively, and normal innovations.
(D) Independent AR(5), MA(6) and AR(2) processes with coefficient vectors (0.2, 0.5, 0.3, 0.1, −0.5), (0.4, 0.2, 0.1, −0.6, −0.2, −0.4) and (0.1, 0.5), respectively, and normal innovations.
(E) Independent AR(1), AR(4) and MA(5) processes with coefficient vectors (0.5), (0.1, −0.3, −0.1, 0.1) and (0, −0.1, −0.1, 0.4, 0.1), respectively, and normal innovations.

The averages of T(p − 1)D̂² over the 10 000 repetitions were used to compare the finite-sample efficiencies of different estimates. We also calculated the expected values of the limiting distributions of T(p − 1)D̂² in each case to compare the asymptotic efficiencies.
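As an illustration, sources as in model (C) can be generated with a burn-in period; a minimal NumPy sketch (function names ours; the other models are analogous, e.g. with t_5 innovations in model (A)):

```python
import numpy as np

def simulate_ar(coeffs, T, burn_in=1000, rng=None):
    """Simulate a univariate AR(len(coeffs)) process with N(0, 1) innovations."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.asarray(coeffs)
    z = np.zeros(T + burn_in)
    eps = rng.standard_normal(T + burn_in)
    for t in range(len(phi), T + burn_in):
        # z_t = phi_1 z_{t-1} + ... + phi_q z_{t-q} + eps_t
        z[t] = phi @ z[t - len(phi):t][::-1] + eps[t]
    return z[burn_in:]

# Model (C): three independent AR(3) sources; Omega = I_p, so x = z.
T = 1000
rng = np.random.default_rng(1)
x = np.column_stack([
    simulate_ar((0.0, 0.5, -0.2), T, rng=rng),
    simulate_ar((0.0, 0.3, 0.3), T, rng=rng),
    simulate_ar((0.5, 0.0, -0.1), T, rng=rng),
])
```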


Fig. 1. The averages of T(p − 1)D̂² for the SOBI estimate (K = 10) and the AMUSE estimates with lags 1 and 4 over 10 000 repetitions of observed time series with length T from model (A). The horizontal lines give the expected values of the limiting distributions of T(p − 1)D̂².

Table 1
The expected values of the limiting distributions of T(p − 1)D̂² for SOBI estimates with K = 10. All six different extraction orders were considered.

Model   Order: 123   132    213    231    312    321
(A)           12.6   47.7   14.2   15.7   49.2   50.8
(B)            9.8   11.4   10.2   12.5   13.7   14.1
(C)           35.6   22.9   44.6   46.9   25.3   34.3

5.3. Simulation results

The main results from our simulations are as follows.

(i) The SOBI estimate based on several autocovariance matrices seems to outperform the AMUSE estimates, which are based on the covariance matrix and one autocovariance matrix with a chosen lag k. Fig. 1 illustrates the finite-sample behavior of the AMUSE estimates with lags 1 and 4 as compared to the deflation-based SOBI estimate with ten autocovariance matrices in model (A). The deflation-based SOBI estimate clearly outperforms both AMUSE estimates. Notice also that, in practice, the best lag for AMUSE is not known.

(ii) The number of autocovariance matrices used in SOBI affects the performance. In Fig. 2, estimates using different numbers of autocovariance matrices are compared in model (B). Since the source components are MA(50) processes, the autocovariances are zero for lags larger than 50. Hence, by Theorem 3, the asymptotic variances of the SOBI estimates are equal for all K ≥ 50. For small sample sizes, the use of too many matrices seems to have a small deteriorating effect.

(iii) The algorithm for computing the solution of (3) based on the estimating equations alone may fail if the initial value is too far from the true value. It is then interesting that one can, by manipulating the initial value, often force the algorithm to yield the rows of the unmixing matrix in a wrong order. As mentioned before, this also affects the limiting distribution of the estimate. This is illustrated in Fig. 3, where the results are as predicted by Theorem 3. In Table 1, the expected values of the limiting distributions of T(p − 1)D̂² for all six different extraction orders are listed for models (A)–(C). In all models, order 123 is the true order for our estimator. In models (A) and (B) the true order is also the optimal one, but in model (C) it would be better to extract the third source component before the second one. Finding an easy data-based rule for the choice of the initial value so as to obtain the optimal order (as was done for deflation-based fastICA in [14], for example) seems quite useless here, as it is difficult to find a model where the optimal order is substantially better than the true one.

(iv) The choice of G affects the efficiency. In Fig. 4 and Table 2 we compare the estimates corresponding to different choices of G (and its derivative g), using the g-functions g1(x) = x, g2(x) = x³ and g3(x) = arctan(50x). In models (A)–(C) the estimate based on g1 yields the best results of the three alternatives, but in models (D) and (E) functions g3 and g2, respectively, are better.


Fig. 2. The averages of T(p − 1)D̂² for the SOBI estimates using 5, 10, 20, 30, 40, 50, 60, 80 and 100 autocovariance matrices over 10 000 repetitions of observed time series with length T from model (B).

Fig. 3. The averages of T(p − 1)D̂² for three SOBI estimates (K = 10) with different initial values over 10 000 repetitions of observed time series with length T from model (C). The horizontal lines give the expected values of the limiting distributions of T(p − 1)D̂².


6. Concluding remarks

In this paper we presented a thorough analysis of a novel deflation-based SOBI estimate by rigorously developing its asymptotic distribution. For large sample sizes, the asymptotic results then provide good approximations for the finite-sample average behavior of the estimates.


Fig. 4. The averages of T(p − 1)D̂² for three SOBI estimates (K = 10) with different choices of the function G from 10 000 repetitions of observed time series with length T from models (D) (right panel) and (E) (left panel). The horizontal lines give the expected values of the limiting distributions of T(p − 1)D̂².

Table 2
The expected values of the limiting distributions of T(p − 1)D̂² for SOBI estimates with K = 10 and selected g-functions.

Model   g(x) = 2x   g(x) = 4x³   g(x) = arctan(50x)
(A)          12.6        14.3         14.8
(B)           9.8        14.1         11.3
(C)           7.9         8.5         12.1
(D)          23.7        42.5         19.1
(E)          95.4        49.9        173.8

It is not difficult to argue that SOBI is a safer choice in practical applications than AMUSE. Nevertheless, the choice of the number of lags, and of which lags to use, is still an open problem for SOBI. The results in this paper show that for large sample sizes it may be wise to use as many autocovariance matrices as possible, but that for small sample sizes one must be more careful. There are no clear guidelines in the literature so far. Belouchrani et al. [1], for example, considered (symmetric) estimates based on up to 10 lags, whereas Tang et al. [17] investigated much larger lag sets in the analysis of EEG data. Further research is still needed here.

In this paper the autocovariance matrices were jointly diagonalized using a deflation-based algorithm. Symmetric algorithms are available in the literature, but the asymptotic properties of the resulting symmetric SOBI estimates have not yet been reported. Finally, it is worth mentioning that, as a compromise between AMUSE and SOBI, Ziehe & Müller [22] suggested a BSS method that simultaneously diagonalizes the covariance matrix and a weighted sum of several autocovariance matrices. They also suggested a joint diagonalization of autocovariance matrices together with higher-order cross-moment matrices in the case of possibly identical marginal autocovariances. Some other procedures, such as WASOBI [20], try to optimize the performance of SOBI by weighting the autocovariance matrices and also by removing the special role of the covariance matrix.

Acknowledgments

We wish to thank the referees and the associate editor for their comments, which helped us to improve this paper considerably. This research was supported by the Academy of Finland.

Appendix A. Algorithm for deflation-based SOBI

The algorithm for computing the unmixing matrix estimate Γ̂ can easily be derived using the estimating equations (4); see also [13]. As Γ̂ = Û Ŝ_0^{−1/2}, we first use Ŝ_0 to whiten the data and then find the rows of an orthogonal matrix Û = (û_1, ..., û_p)' one by one so that the K sample autocovariance matrices of the transformed data, Û R̂_1 Û', ..., Û R̂_K Û', become as diagonal as possible.


Write first

$$\hat T(u) = \sum_{k=1}^{K} (u' \hat R_k u)\, \hat R_k u.$$

The estimating equations for Û are then

$$\hat T(\hat u_j) = \left( \sum_{i=1}^{j} \hat u_i \hat u_i' \right) \hat T(\hat u_j),$$

which implies that

$$\hat u_j \propto \left( I_p - \sum_{i=1}^{j-1} \hat u_i \hat u_i' \right) \hat T(\hat u_j).$$

This suggests, after finding û_1, ..., û_{j−1}, an algorithm for û_j with two steps:

Step 1. û_j ← (I_p − Σ_{i=1}^{j−1} û_i û_i') T̂(û_j).
Step 2. û_j ← ∥û_j∥^{−1} û_j.

Notice that the algorithm naturally needs an initial value for Û, and different initial values may give the rows of the final estimate in a different order. To find the rows of Û in the true order, we use the following strategy for picking the initial value for û_j.

(1) Choose any p × (p − j + 1) matrix Û_{j−1} such that (û_1, ..., û_{j−1}, Û_{j−1}) is orthogonal.
(2) Choose a random set of unit vectors V = {v_1, ..., v_M} in R^{p−j+1}.
(3) Find v̂ ∈ V that maximizes

$$\hat\Delta_j(v) = \sum_{k=1}^{K} \left( v' \hat U_{j-1}' \hat R_k \hat U_{j-1} v \right)^2.$$

(4) The initial value for û_j is then Û_{j−1} v̂.

Since 2T̂(u) is the gradient of Σ_{k=1}^{K} (u' R̂_k u)², a gradient-type algorithm replaces Step 1 (of the fixed-point algorithm) by

Step 1*. û_j ← (I_p − Σ_{i=1}^{j−1} û_i û_i') (û_j + ρ T̂(û_j)).

The positive constant ρ controls the step size and may depend on the number of iterations so far. We recommend the use of the gradient method with varying step sizes instead of the fixed-point algorithm.
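The following is a minimal NumPy sketch of the fixed-point variant (Steps 1 and 2) with deflation. The function names are ours; the simple random initialization stands in for the multi-start strategy (1)–(4) above, and K = 10 lags are used by default as in the simulations of Section 5.

```python
import numpy as np

def autocov(x, k):
    """Symmetrized sample autocovariance matrix of lag k for a (T, p) array."""
    T = x.shape[0]
    s = x[:T - k].T @ x[k:] / (T - k)
    return (s + s.T) / 2

def deflation_sobi(x, lags=range(1, 11), n_iter=200, tol=1e-10, rng=None):
    """Deflation-based SOBI: whiten, then extract rows of U one by one."""
    rng = np.random.default_rng() if rng is None else rng
    x = x - x.mean(axis=0)
    p = x.shape[1]
    # Symmetric whitening matrix S0^{-1/2}
    d, v = np.linalg.eigh(autocov(x, 0))
    s0_isqrt = v @ np.diag(d ** -0.5) @ v.T
    y = x @ s0_isqrt
    R = [autocov(y, k) for k in lags]
    U = np.zeros((p, p))
    for j in range(p - 1):
        proj = np.eye(p) - U[:j].T @ U[:j]     # project out the earlier rows
        u = proj @ rng.standard_normal(p)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            Tu = sum((u @ Rk @ u) * (Rk @ u) for Rk in R)  # T-hat(u)
            u_new = proj @ Tu                               # Step 1
            u_new /= np.linalg.norm(u_new)                  # Step 2
            done = min(np.linalg.norm(u_new - u),
                       np.linalg.norm(u_new + u)) < tol     # sign-invariant check
            u = u_new
            if done:
                break
        U[j] = u
    # The last row is determined by orthogonality to the first p - 1 rows.
    U[p - 1] = np.linalg.svd(U[:p - 1])[2][-1]
    return U @ s0_isqrt    # the unmixing matrix estimate
```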

Appendix B. Proofs of the results

Proof of Theorem 1. Write

$$R_k = S_0^{-1/2} S_k S_0^{-1/2} \quad \text{and} \quad \hat R_k = \hat S_0^{-1/2} \hat S_k \hat S_0^{-1/2}, \quad k = 1, \ldots, K.$$

Then clearly R̂_k →_P Λ_k, k = 1, ..., K. Write Γ̂ = Û Ŝ_0^{−1/2} for the estimate. As Ŝ_0 →_P I_p, we have to show that Û = (û_1, ..., û_p)' →_P I_p = (e_1, ..., e_p)'.

1. Consider first the convergence of û_1. For that, write

$$\Delta_1(u) = \sum_{k=1}^{K} (u' R_k u)^2 \quad \text{and} \quad \hat\Delta_1(u) = \sum_{k=1}^{K} (u' \hat R_k u)^2.$$

First,

$$M_1 = \sup\{|\Delta_1(u) - \hat\Delta_1(u)| : \|u\| \le 1\} \le \sum_{k=1}^{K} \|\hat R_k - R_k\|^2 \to_P 0.$$

Second, Δ_1(u) attains its maximum at e_1 and the solution is unique (up to its sign). Then, for all ϵ, define

$$H(\epsilon) = \left\{ u : \|u\| \le 1,\ u' 1_p \ge 0,\ \|u - e_1\| \ge \epsilon \right\}$$

and

$$\delta_1(\epsilon) = \Delta_1(e_1) - \sup\{\Delta_1(u) : u \in H(\epsilon)\}.$$

Clearly, δ_1(ϵ) > 0 and δ_1(ϵ) → 0 as ϵ → 0. But then

$$P(\|\hat u_1 - e_1\| < \epsilon) \ge P\left(\hat\Delta_1(e_1) \ge \sup\{\hat\Delta_1(u) : u \in H(\epsilon)\}\right) \ge P\left(M_1 \le \frac{\delta_1(\epsilon)}{3}\right) \to 1, \quad \forall \epsilon,$$

and the convergence follows.

2. The convergence of û_2 is seen as follows. First choose the p × (p − 1) matrix Û_1 closest to (e_2, ..., e_p) (measured by the matrix norm) such that (û_1, Û_1) is orthogonal. Then Û_1 →_P (e_2, ..., e_p). For u ∈ R^{p−1}, write

$$\Delta_2(u) = \sum_{k=1}^{K} \left(u'(e_2, \ldots, e_p)' R_k (e_2, \ldots, e_p)u\right)^2 \quad \text{and} \quad \hat\Delta_2(u) = \sum_{k=1}^{K} (u' \hat U_1' \hat R_k \hat U_1 u)^2.$$

Clearly,

$$\hat U_1' \hat R_k \hat U_1 \to_P (e_2, \ldots, e_p)' R_k (e_2, \ldots, e_p), \qquad M_2 = \sup\{|\Delta_2(u) - \hat\Delta_2(u)| : \|u\| \le 1\} \to_P 0,$$

and Δ_2(u) attains its unique maximum at e_1 ∈ R^{p−1} (up to its sign). As in the previous case, the maximizer û_2^∗ of Δ̂_2(u) converges in probability to the (p − 1)-vector e_1, and therefore û_2 = Û_1 û_2^∗ →_P e_2 in R^p. Finally, the stochastic convergence of û_3, ..., û_{p−1} to e_3, ..., e_{p−1} can be proven in a similar way, and the convergence of û_p then also follows. □

Proof of Theorems 2 and 4. Note that Theorem 4 implies Theorem 2; therefore, we prove only Theorem 4. The limiting distributions of the symmetrized sample autocovariance matrices are given in [12]. Notice then that, as γ̂_j →_P e_j, we have T̂(γ̂_j) →_P μ_{jj} e_j, where μ_{jj} = Σ_k g(λ_{kj})λ_{kj}. The estimating equation (4) then yields

$$\sqrt{T}(\hat T(\hat\gamma_j) - \mu_{jj} e_j) = \sqrt{T}(\hat S_0 \hat U_j \hat T(\hat\gamma_j) - \mu_{jj} e_j), \tag{10}$$

where Û_j = Σ_{r=1}^{j} γ̂_r γ̂_r'. Now, as Û_j →_P U_j = Σ_{r=1}^{j} e_r e_r', the right-hand side of the equation can be written as

$$\sqrt{T}(\hat S_0 \hat U_j \hat T(\hat\gamma_j) - \mu_{jj} e_j) = \sqrt{T}(\hat S_0 - I_p)\hat U_j \hat T(\hat\gamma_j) + \sqrt{T}(\hat U_j - U_j)\hat T(\hat\gamma_j) + \sqrt{T}\,U_j(\hat T(\hat\gamma_j) - \mu_{jj} e_j).$$

Thus (10) becomes

$$(I_p - U_j)\sqrt{T}(\hat T(\hat\gamma_j) - \mu_{jj} e_j) = \sqrt{T}(\hat S_0 - I_p)\hat U_j \hat T(\hat\gamma_j) + \sqrt{T}(\hat U_j - U_j)\hat T(\hat\gamma_j) = \sqrt{T}(\hat S_0 - I_p)\mu_{jj} e_j + \sqrt{T}(\hat U_j - U_j)\mu_{jj} e_j + o_p(1). \tag{11}$$

Now

$$\sqrt{T}(\hat U_j - U_j)\mu_{jj} e_j = \sum_{r=1}^{j} \sqrt{T}(\hat\gamma_r \hat\gamma_r' - e_r e_r')\mu_{jj} e_j = \sum_{r=1}^{j} \sqrt{T}\big((\hat\gamma_r - e_r)\hat\gamma_r' + e_r(\hat\gamma_r - e_r)'\big)\mu_{jj} e_j$$
$$= \mu_{jj}\left( \sqrt{T}(\hat\gamma_j - e_j) + \sum_{r=1}^{j} \big(\sqrt{T}(\hat\gamma_r - e_r)' e_j\big)\, e_r \right) + o_p(1). \tag{12}$$

For the left-hand side of equation (11), note that

$$(I_p - U_j)\sqrt{T}(\hat T(\hat\gamma_j) - \mu_{jj} e_j) = \sum_{k=1}^{K} \sqrt{T}\big(g(\hat\gamma_j' \hat S_k \hat\gamma_j) - g(\lambda_{kj})\big)(I_p - U_j)\hat S_k \hat\gamma_j + \sum_{k=1}^{K} g(\lambda_{kj})(I_p - U_j)\sqrt{T}(\hat S_k \hat\gamma_j - \lambda_{kj} e_j)$$
$$= \sum_{k=1}^{K} g(\lambda_{kj})(I_p - U_j)\sqrt{T}(\hat S_k - \Lambda_k)e_j + \sum_{k=1}^{K} g(\lambda_{kj})(I_p - U_j)\Lambda_k \sqrt{T}(\hat\gamma_j - e_j) + o_p(1). \tag{13}$$

Collecting the results from (11)–(13) gives us

$$\sum_{i=j+1}^{p}\sum_{k=1}^{K} g(\lambda_{kj})\sqrt{T}(\hat S_k)_{ji}\, e_i + \sum_{i=j+1}^{p}\sum_{k=1}^{K} g(\lambda_{kj})\lambda_{ki}\,\sqrt{T}\hat\gamma_{ji}\, e_i$$
$$= \mu_{jj}\left( \sum_{i=1}^{j-1}\sqrt{T}(\hat S_0)_{ji}\, e_i + \sqrt{T}\big((\hat S_0)_{jj} - 1\big)e_j + \sum_{i=j+1}^{p}\sqrt{T}(\hat S_0)_{ji}\, e_i \right) + \mu_{jj}\left( \sqrt{T}(\hat\gamma_j - e_j) + \sum_{i=1}^{j}\big(\sqrt{T}(\hat\gamma_i - e_i)' e_j\big)\, e_i \right) + o_p(1). \tag{14}$$

The result then follows by solving the p equations in (14) one by one. □

Proof of Theorem 3. The limiting variances and covariances of the symmetrized sample autocovariance matrices are obtained in [12]. As

$$\sqrt{T}(\hat\gamma_j - e_j) = \sum_{i=1}^{j-1}\sqrt{T}\hat\gamma_{ji}\, e_i + \sqrt{T}(\hat\gamma_{jj} - 1)e_j + \sum_{i=j+1}^{p}\sqrt{T}\hat\gamma_{ji}\, e_i$$

and the asymptotic covariances of the components of γ̂_j are zero, we obtain that the asymptotic covariance matrix of √T(γ̂_j − e_j) is

$$\sum_{i=1}^{j-1} ASV(\hat\gamma_{ji})\, e_i e_i' + ASV(\hat\gamma_{jj})\, e_j e_j' + \sum_{i=j+1}^{p} ASV(\hat\gamma_{ji})\, e_i e_i'.$$

The asymptotic variances can be derived using Theorem 2. □

References

[1] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, E. Moulines, A blind source separation technique using second-order statistics, IEEE Transactions on Signal Processing 45 (1997) 434–444.
[2] P.J. Brockwell, R.A. Davis, Time Series: Theory and Methods, second ed., Springer-Verlag, New York, 1991.
[3] J.-F. Cardoso, A. Souloumiac, Jacobi angles for simultaneous diagonalization, SIAM Journal on Matrix Analysis and Applications 17 (1996) 161–164.
[4] Y. Chen, W. Härdle, V. Spokoiny, Portfolio value at risk based on independent component analysis, Journal of Computational and Applied Mathematics 205 (2007) 594–607.
[5] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, Chichester, 2002.
[6] P. Comon, C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, Academic Press, Amsterdam, 2010.
[7] S. Degerine, A. Zaidi, Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach, IEEE Transactions on Signal Processing 52 (2004) 1499–1512.
[8] A. García-Ferrer, E. González-Prieto, D. Peña, A conditionally heteroskedastic independent factor model with an application to financial stock returns, International Journal of Forecasting 28 (2012) 70–93.
[9] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
[10] P. Ilmonen, K. Nordhausen, H. Oja, E. Ollila, A new performance index for ICA: properties, computation and asymptotic analysis, in: V. Vigneron, V. Zarzoso, E. Moreau, R. Gribonval, E. Vincent (Eds.), Latent Variable Analysis and Signal Separation, Springer, Heidelberg, 2010, pp. 229–236.
[11] S. Lee, H. Shen, Y. Truong, M. Lewis, X. Huang, Independent component analysis involving autocorrelated sources with an application to functional magnetic resonance imaging, Journal of the American Statistical Association 106 (2010) 1009–1024.
[12] J. Miettinen, K. Nordhausen, H. Oja, S. Taskinen, Statistical properties of a blind source separation estimator for stationary time series, Statistics & Probability Letters 82 (2012) 1865–1873.
[13] K. Nordhausen, H.W. Gutch, H. Oja, F.J. Theis, Joint diagonalization of several scatter matrices for ICA, in: F. Theis, A. Cichocki, A. Yeredor, M. Zibulevsky (Eds.), Latent Variable Analysis and Signal Separation, LNCS, vol. 7191, Springer, Heidelberg, 2012, pp. 172–179.
[14] K. Nordhausen, P. Ilmonen, A. Mandal, H. Oja, E. Ollila, Deflation-based FastICA reloaded, in: Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011), 2011, pp. 1854–1858.
[15] K. Nordhausen, E. Ollila, H. Oja, On the performance indices of ICA and blind source separation, in: Proceedings of the 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2011), 2011, pp. 486–490.
[16] N. Su, R. Lund, Multivariate versions of Bartlett's formula, Journal of Multivariate Analysis 105 (2012) 18–31.
[17] A.C. Tang, J.-Y. Liu, M.T. Sutherland, Recovery of correlated neuronal sources from EEG: the good and bad ways of using SOBI, NeuroImage 7 (2005) 507–519.
[18] F.J. Theis, Y. Inouye, On the use of joint diagonalization in blind signal processing, in: Proceedings of the IEEE International Symposium on Circuits and Systems, 2006, pp. 3589–3592.
[19] L. Tong, V.C. Soon, Y.F. Huang, R. Liu, AMUSE: a new blind identification algorithm, in: Proceedings of the IEEE International Symposium on Circuits and Systems 1990, 1990, pp. 1784–1787.
[20] A. Yeredor, Blind separation of Gaussian sources via second-order statistics with asymptotically optimal weighting, IEEE Signal Processing Letters 7 (2000) 197–200.
[21] A. Yeredor, Blind separation of Gaussian sources with general covariance structures: bounds and optimal estimation, IEEE Transactions on Signal Processing 58 (2010) 5057–5068.
[22] A. Ziehe, K.-R. Müller, TDSEP — an efficient algorithm for blind separation using time structure, in: Proceedings of ICANN, 1998, pp. 675–680.