Minimum distance index for complex valued ICA

Minimum distance index for complex valued ICA

Statistics and Probability Letters 118 (2016) 100–106 Contents lists available at ScienceDirect Statistics and Probability Letters journal homepage:...

466KB Sizes 0 Downloads 30 Views

Statistics and Probability Letters 118 (2016) 100–106

Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro

Minimum distance index for complex valued ICA Niko Lietzén a,∗ , Klaus Nordhausen b,c , Pauliina Ilmonen a a

Aalto University, School of Science, Department of Mathematics and Systems Analysis, P.O. Box 11100, 00076 Aalto, Finland

b

University of Turku, Department of Mathematics and Statistics, 20014 Turku, Finland

c

University of Tampere, School of Health Sciences, 33014 Tampere, Finland

article

info

Article history: Received 20 April 2016 Received in revised form 22 June 2016 Accepted 22 June 2016 Available online 28 June 2016

abstract We generalize the Minimum Distance (MD) index to be applicable in complex valued ICA. To illustrate the use of the MD index, we present a complex version of AMUSE and compare it to complex FOBI in a simulation study. © 2016 Elsevier B.V. All rights reserved.

MSC: 62M10 62G05 62H12 62H20 62H35 Keywords: Independent component analysis Complex valued variables Performance indices AMUSE Time series

1. Introduction In the complex valued independent component (IC) model, the elements of a p-variate random vector are assumed to be linear combinations of the elements of an unobservable complex valued p-variate vector with mutually independent components. In complex valued independent component analysis (ICA) the aim is to recover the independent components by estimating a complex valued unmixing matrix that transforms the observed p-variate vector to mutually independent components. The IC model is not in general uniquely defined as after permutation, scaling and phase shifts, independent components remain independent. Thus, different ICA estimates can estimate different population quantities. ICA is applied in various fields of science, e.g. biomedical image data applications, signal processing and economics, see Hyvärinen et al. (2001) and Comon and Jutten (2010). Complex random signals play an increasingly important role in the field of ICA. The complex IC model is used for example in magnetic resonance imaging and antenna array signal processing for wireless communications and radar applications. See for example Hyvärinen et al. (2001) and Ollila et al. (2008). The main contribution of this paper is that we extend the minimum distance (MD) index for complex valued ICA. The MD index was presented in the case of real valued ICA in Ilmonen et al. (2010). The index is independent of the standardization of the model and thus treats different ICA estimators fairly even if they estimate different quantities in terms of order, scale



Corresponding author. E-mail address: [email protected] (N. Lietzén).

http://dx.doi.org/10.1016/j.spl.2016.06.019 0167-7152/© 2016 Elsevier B.V. All rights reserved.

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

101

and phase. We present a generalization of the MD index for complex valued ICA. We also consider some properties of the complex MD index. Additionally, we present a complex valued version of the well-known AMUSE functional. The AMUSE transformation, for real valued time series ICA, is based on simultaneous use of the real valued covariance matrix and a real valued autocovariance matrix with lag τ ̸= 0. The method was presented in Tong et al. (1990), and its asymptotic behavior was examined in Miettinen et al. (2012). In this article, we consider the AMUSE functional in the case of complex valued time series. We conduct a simulation study to illustrate the use of the generalized MD index by comparing the complex AMUSE and complex FOBI (Cardoso, 1989; Ilmonen, 2013) estimators in different settings. The paper is organized as follows. In Section 2 we generalize the MD index for complex valued ICA. In Section 3 we present the complex version of the AMUSE functional. Simulations are presented in Section 4 and some final remarks are given in Section 5. The proofs of the theorems are presented in the Appendix. 2. Minimum distance index for complex valued ICA The basic complex valued independent component (IC) model has the form, xt = Ω zt + µ,

t = 1, . . . , n,

(1)

where Ω is a full-rank p × p complex valued mixing matrix, zt is an unobservable centered complex-valued p-vector with mutually independent components and µ is a p-variate location vector. The location µ is usually a nuisance parameter that is in the following assumed to be zero for simplicity. Based on the observed p-vector xt the goal is to find an unmixing matrix Γ such that Γ xt has again mutually independent components. Clearly the model is ill-defined as order, scale and phase of the components of zt are not well-defined. The literature, especially in the real case, suggests many ICA estimators, Γˆ , and their performance is usually compared using performance indices. For a review in the real case about different indices see Nordhausen et al. (2011). Most of the indices presented in the literature assume that the IC model is standardized in some particular way, see Ilmonen et al. (2010). For example, the popular AMARI index (Amari et al., 1996) that can be used for complex ICA as well, see Ollila et al. (2008), suffers from this problem as it assumes that the methods assign the components the same scale, Ilmonen et al. (2010). This can be problematic as different estimators may estimate different scales. The MD index, introduced in Ilmonen et al. (2010) for real valued ICA, was designed to solve this problem. We next generalize the MD index for complex valued ICA. Let C denote the set of matrices of the form DP, where P is a p × p permutation matrix and D is a complex valued p × p diagonal matrix. The sets {CA : C ∈ C } partition the set of complex valued p × p matrices into equivalence classes. If B ∈ {CA : C ∈ C }, notation A ∼ B is used. The shortest squared distance between the set {CA : C ∈ C } of matrices that are equivalent to A and Ip is given by D2 (A) =

1 p−1

2 inf CA − Ip F ,





(2)

C ∈C

where ∥ · ∥F is the Frobenius norm. Remark 2.1. Note that D2 (A) = D2 (CA) for all C ∈ C . Theorem 2.1. Let A be any complex valued p × p full rank matrix. The shortest squared distance D2 (A) fulfills the following four conditions given below: 1. 2. 3. 4.

1 ≥ D2 (A) ≥ 0, D2 (A) = 0 iff A ∼ Ip , D2 (A) = 1 iff A ∼ 1p aT for some complex valued p-variate vector a, and     the function c → D2 Ip + c off (A) is increasing in c ∈ [0, 1] for all matrices A such that Aij  ≤ 1, i ̸= j.

ˆ Consider the complex valued IC model with mixing matrix Ω and   an unmixing matrix estimate Γ . The shortest distance ˆ : C ∈ C equivalent to the gain matrix Gˆ = Γˆ Ω is given in the between the identity matrix and the set of matrices C G following definition. Definition 2.1 (Minimum Distance Index). The minimum distance index for Γˆ is

ˆ = D(Γˆ Ω ) = √ D

1

 

 

inf C Γˆ Ω − Ip  .

p − 1 C ∈C

F

ˆ ≥ 0, and Dˆ = 0 only if Γˆ ∼ Ω −1 . Furthermore, Dˆ = 1 is obtained in the It follows from Theorem 2.1 that 1 ≥ D pathological case when all the row vectors of Γˆ Ω have the same direction. The value of the minimum distance index is now easy to interpret. Values close to 0 are associated with excellent separation, and large values indicate poor performance.

102

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

Note that D(Γˆ Ω ) = D(C Γˆ Ω ) for all C ∈ C . Additionally, if the unmixing matrix estimator is affine equivariant up to the phase, that is Γ (AX ) = H Γ (X ) A−1 with some phase-shift matrix H, we have that D (H Γ (AX ) AΩ ) = D (H Γ (X )Ω ) = D (Γ (X )Ω ) .

(3)

The minimization over all choices C ∈ C is done by two steps. 2

ˆ = Γˆ Ω and let G˜ ij = |Gˆ |ij / Theorem 2.2. Let P denote the set of all p × p permutation matrices. Let G minimum distance index can be written as 

1

ˆ = D(Gˆ ) = D

p−1

 

1/2

˜ p − max tr P G P ∈P

p

k=1

2 |Gˆ|ik . Now the

.

The maximization problem



˜) max tr(P G



(4)

P

over all permutation matrices P can be seen as a linear sum assignment problem (LSPA). We solve the LSPA using the Hungarian method, see Papadimitriou and Steiglitz (1982). Remark 2.2. Note that the use of generalized MD index is not restricted to IC models. It can be applied in assessing the performance of mixing matrix estimators also in general blind source separation (BSS) models. 3. Complex valued AMUSE Let x = (xt )t =0,±1,±2,... be a p-variate stochastic process and let X = [x1 . . . xn ], where x1 , . . . , xn is a corresponding observed time series. Autocovariance matrix functionals, with different lags τ = 1, 2, . . . , are defined in the following way Sτ (xt ) = E (xt − E(xt ))(xt +τ − E(xt ))∗ ,





where A∗ denotes the conjugate transpose of A. Note that for τ = 0, the regular covariance matrix is obtained. Hermite symmetrized autocovariance matrix functionals, with different lags τ = 1, 2, . . . , are defined as follows Sτ ,symm (xt ) =

1 2

(Sτ (xt ) + Sτ (xt )∗ ).

The corresponding sample estimates are obtained as follows. mx =

n 1

n t =1

Sˆτ (X ) =

xt ,

Cov(X ) =

n 1

n t =1

(xt − mx )(xt − mx )∗ ,

n−τ    (xt − mx )(xt +τ − mx )∗

1

(n − τ ) t =1  1 Sˆτ ,symm (X ) = Sˆτ + Sˆτ∗ .

and

2

Considering now ICA for complex-valued time series, the sources are assumed to be independent complex valued stationary time series following model (1). For the following method we also make the additional assumptions: for all t = 1, . . . , n, E(zt ) = 0, E(zt zt∗ ) = I and Sτ (zt ) = Sτ ,symm (zt ) = D is a diagonal matrix with diagonal elements d1 ≥ · · · ≥ dp > 0. Hereby, the order and the scale of the sources are fixed. Remark 3.1. Note that the assumption above implies second order weak stationarity and uncorrelatedness of the components of zt . The aim is to find an estimate of an unmixing matrix Γ such that Γ xt gives the sources (up to the order and constant multiplications). The well known AMUSE algorithm for complex valued time series ICA can then be given as follows. Let a p × p matrix valued functional Γ = Γ (xt ), and a p × p diagonal matrix valued functional Λ = diag(λ1 , . . . , λp ) = Λ(xt ) be defined such that, if zt = Γ (xt )xt , then Cov(zt ) = Ip , where λ1 ≥ · · · ≥ λp .

and

Sτ (zt ) = Sτ ,symm (zt ) = Λ,

(5)

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

103

The matrices Γ ∗ and Λ above are the eigenvector matrix and the corresponding eigenvalue matrix of the product

(Cov(xt ))−1 Sτ ,symm (xt ). The eigenvector matrix Γ ∗ is standardized such that Cov(zt ) = I and Sτ ,symm (zt ) is a diagonal matrix having diagonal elements in decreasing order. Assuming that the diagonal elements of Λ are distinct, the matrix Γ (xt ) provides a solution for the ICA problem. Moreover, the solution is affine equivariant up to phases in the sense that Γ (Axt + b) = H Γ (xt )A−1 , where H =   diag eθ1 i , . . . , eθp i . The proof of this is almost identical to the proofs of Theorem 4.1 and Remark 4.1 of Ilmonen (2013). Finite sample estimates are obtained by replacing the population quantities by the corresponding estimates. Note that for a finite sample it is recommended to work with a symmetrized autocovariance matrix as the non-symmetrized might not be symmetric for finite data. Complex valued AMUSE algorithm, for a chosen τ , can be conducted using the following four steps. 1. Whitening the time series: yˆ t = Cov (X )−1/2 (xt − x¯ ), where A1/2 is the conjugate symmetric square root of A. 2. Computing the symmetrized autocovariance matrix:





Sˆτ ,symm (Yˆ ) = Sˆτ (Yˆ ) + Sˆτ (Yˆ )∗ /2,

Yˆ = yˆ 1 . . . yˆ n .





ˆ Uˆ ∗ , where Dˆ is a diagonal matrix and Uˆ is orthogonal. 3. Solving eigenvalue—eigenvector decomposition of Sˆτ ,symm (Yˆ ) = Uˆ D 4. Estimating the independent time series zˆt = Uˆ ∗ yˆ t and unmixing matrix Γˆ = Γˆ (X ) = Uˆ ∗ Cov(X )−1/2 .

Note that AMUSE is a special case of SOBI (Belouchrani et al., 1997) which jointly diagonalizes several autocovariance matrices. The theoretical properties of AMUSE in the real case are discussed in Miettinen et al. (2012). 4. Simulation study We conducted a simulation study to demonstrate the use of the generalized MD index. We also wanted to demonstrate the importance of choosing the right ICA estimator for time series data and for i.i.d. observations. Furthermore, we wanted to assess the effect of the sample size, and of the parameter τ on the performance of the complex AMUSE estimator. The FOBI functional is, instead of the simultaneous use of the covariance matrix and the autocovariance matrix, based on simultaneous use of the covariance matrix and the scatter matrix based on fourth moments, see Cardoso (1989) and Ilmonen (2013). Let Fx be a cdf of a random variable x. The scatter matrix based on fourth moments is defined as follows, Cov4 (x) =

1 p+2

E (x − E(x)) (x − E(x))∗ Cov(x)−1 (x − E(x)) (x − E(x))∗ .





ICA unmixing matrix functional ΓFOBI (x) is obtained by replacing the symmetrized autocovariance matrix by Cov4 in Eq. (5). FOBI is designed for IC models with i.i.d. observations and does not take the time structure into account. The FOBI estimator is obtained by replacing the covariance matrix and Cov4 by the corresponding sample estimates. We compared complex AMUSE with complex FOBI in several different three-variate simulation settings. Since both, AMUSE and FOBI, are affine equivariant up to phase, the mixing matrix has no effect on the performance of the estimators. Thus wlog the sources were mixed with the identity matrix. In four simulation settings FOBI estimate and AMUSE estimates with parameters τ = 1, 2, 10 were calculated. All the sample sizes were n = 5000 and the number of repetitions was 2000. The four settings are: Setting 1: Both real and imaginary parts were independently sampled from t (10), beta(0.5, 0.5) and unif(0, 1). Setting 2: The real parts are three independent AR(2) processes with parameters (0.25, 0.6), (0.15, 0.4) and (0.1, 0.2) and imaginary parts are independent from the real parts and are AR(2) processes with parameters (−0.5, −0.6), (0.7, −0.1) and (−0. 4, 0.3). The innovations for all the parts have the same distributions as in Setting 1. Setting 3: The real parts have the same serial dependence as in Setting 2 but now with N(0, 0.4) innovations. The imaginary parts were formed by multiplying the real parts by −4.2, 2.0 and 0.9, respectively. Standard normal innovations were added after multiplication. Setting 4: The real parts have the same serial dependence as in Setting 2 but now with t (9) innovations. The imaginary parts are now formed by multiplying the real parts by −4.2, 2.0 and 0.9, respectively and adding i.i.d. random variables sampled from t (10), beta(0.5, 0.5) and unif(0, 1). Hence, Setting 1 is an i.i.d. setup where AMUSE should not work but FOBI should, since the components have distinct kurtosis values. Settings 2–4 all have serial dependence. While in Setting 2 the real parts are independent from the imaginary parts, there is dependence between the parts in Settings 3 and 4. In Setting 3 all innovations are normally distributed, in Setting 2 all innovations have different kurtosis values and in Setting 4 only the imaginary parts have innovations with different kurtosis values. In Settings 2–4 all the serial informations should be contained in the lower lags. Figs. 1–4 give the obtained MD index values in all four settings. Fig. 1 shows clearly that AMUSE does not work at all while FOBI does well, which is as expected. For Setting 2–4 the figures clearly show that for AMUSE with τ = 10 does not contain any information about the serial dependence anymore

104

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

Fig. 1. MD index calculated for the first simulation setting.

Fig. 2. MD index calculated for the second simulation setting.

Fig. 3. MD index calculated for the third simulation setting.

Fig. 4. MD index calculated for the fourth simulation setting.

and τ = 1, 2 are much better. This is expected as the serial dependence in these settings is short (in time). However, the best lag seems to depend on the setting. A bit unexpectedly FOBI seems not to work at all in any of the settings with serial dependence for the given sample size. To evaluate the effect of the sample size, we considered Setting 2 for FOBI and AMUSE with τ = 1 for sample sizes n = 1000, 5000, 50 000, 100 000. The corresponding MD index values based on 2000 repetitions are given in Fig. 5. The figure shows that FOBI seems to need much more observations to work here, whereas AMUSE needs considerably less observations for good estimates. 5. Conclusions In this paper we extended the MD index which is increasingly popular in the real case to complex-valued ICA. Complex values occur frequently in ICA applications which require therefore complex-valued ICA methods and appropriate performance indices. In this paper we gave the key properties of the index which is surprisingly easy to compute and demonstrate that in simulations the results are as one expects from the theory. For i.i.d. settings FOBI obtained better MD index values than AMUSE. For time series settings AMUSE outperformed FOBI. In the case of real valued ICA, the limiting

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

105

Fig. 5. MD index calculated for FOBI and AMUSE with τ = 1 in Setting 2 for different sample sizes.

distribution of the MD index is known. One of the future goals is to consider the limiting distribution of the MD index also in the case of complex valued ICA. Acknowledgments The authors wish to thank the anonymous referee for her/his insightful comments that helped to improve the paper. The work of Klaus Nordhausen was supported by the Academy of Finland (grant 268703). Appendix Proof of Theorem 2.1. The proof is almost identical to the corresponding proof for the real valued MD index presented in Ilmonen et al. (2012), and thus it is omitted here. Proof of Theorem 2.2. Let A = aij = vij + wij i ∈ Cp×p , where vij and wij are real valued and i is the imaginary unit. Assume that there is at leastone nonzero element in each row of A. Let L denote the set of all nonsingular p × p diagonal  matrices and let L = lij = xij + yij i ∈ L, where xij and yij are real valued. Now

 

∥LA − Ip ∥2F =

p 

|aii lii − 1|2 +

i =1

=





p  p    aij lij 2 i =1 j =1 i̸=j

p  p   2

xij vij2 + wij2 + y2ij vij2 + wij2





i =1 j =1

  p    2  ∂ 2 2 ∥LA − Ip ∥F = 2 xii vij + wij − 2vii ∂ xii j =1 and

  p    2  ∂ 2 2 ∥LA − Ip ∥F = 2 yii vij + wij + 2wii . ∂ yii j=1 The derivatives are zero when

vii Re(aii ) = p    2   2 vij + wij2 aij p

j =1

−2

p  i=1

Then

xii =



j=1

(vii xii − wii yii ) + p.

106

N. Lietzén et al. / Statistics and Probability Letters 118 (2016) 100–106

and yii = −

wii p



=

vij2 + wij

 2

−Im(aii ) . p     2 aij

j =1

j =1

The value of ∥LA − Ip ∥2F is then p−

p  |aii |2 . p    i =1 aij 2 j =1 2

ˆ = Γˆ Ω and G˜ij = |Gˆ |ij / Let P denote the set of all p × p permutation matrices. Now denote G can then write the minimum distance index as ˆ = D(Gˆ ) = √ 1 D p−1



 

1/2

˜ p − max tr P G P ∈P

p

k =1

2

|Gˆ |ik , i, j = 1, . . . , p. We

.

Note that the Frobenius norm has the following property,

  n   m  ∥A∥F =  |aij |2 = tr(AA∗ ), i=1 j=1

where A∗ is the conjugate transpose of A. References Amari, S., Cichocki, A., Yang, H.H., et al., 1996. A new learning algorithm for blind signal separation. Adv. Neural Inf. Process. Syst. 8, 757–763. Belouchrani, A., Abed-Meraim, K., Cardoso, J.F., Moulines, E., 1997. A blind source separation technique using second-order statistics. IEEE Trans. Signal Process. 45, 434–444. Cardoso, J.F., 1989. Source separation using higher order moments. In: 1989 International Conference on Acoustics, Speech, and Signal Processing, 1989. ICASSP-89. IEEE, pp. 2109–2112. Comon, P., Jutten, C., 2010. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic press. Hyvärinen, A., Karhunen, J., Oja, E., 2001. Independent Component Analysis. John Wiley & Sons, New York. Ilmonen, P., 2013. On asymptotic properties of the scatter matrix based estimates for complex valued independent component analysis. Statist. Probab. Lett. 83, 1219–1226. Ilmonen, P., Nordhausen, K., Oja, H., Ollila, E., 2010. A new performance index for ICA: properties, computation and asymptotic analysis. In: Latent Variable Analysis and Signal Separation. Springer, pp. 229–236. Ilmonen, P., Nordhausen, K., Oja, H., Ollila, E., 2012. On asymptotics of ICA estimators and their performance indices. arXiv preprint arXiv:1212.3953. Miettinen, J., Nordhausen, K., Oja, H., Taskinen, S., 2012. Statistical properties of a blind source separation estimator for stationary time series. Statist. Probab. Lett. 82, 1865–1873. Nordhausen, K., Ollila, E., Oja, H., 2011. On the performance indices of ICA and blind source separation. In: 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications. (SPAWC). IEEE, pp. 486–490. Ollila, E., Oja, H., Koivunen, V., 2008. Complex-valued ICA based on a pair of generalized covariance matrices. Comput. Statist. Data Anal. 52, 3789–3805. Papadimitriou, C.H., Steiglitz, K., 1982. Combinatorial optimization: Algorithms and Complexity. Courier Corporation. Tong, L., Soon, V., Huang, Y., Liu, R., 1990. AMUSE: a new blind identification algorithm. In: IEEE International Symposium on Circuits and Systems, 1990. IEEE, pp. 1784–1787.