Journal of the Franklin Institute 339 (2002) 161–187
Subspace design of low-rank estimators for higher-order statistics

Ivan Bradaric^a,*, Athina P. Petropulu^a,1, Konstantinos I. Diamantaras^b
^a Electrical and Computer Engineering Department, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
^b Department of Informatics, Technological Education Institute of Thessaloniki, 54101 Sindos, Greece
Abstract

Higher-order statistics (HOS) are well known for their robustness to additive Gaussian noise and their ability to preserve phase. HOS estimates, on the other hand, have been criticized for high complexity and for requiring long data records in order to maintain small variance. Since rank reduction offers a general principle for reducing estimator variance and complexity, we consider the problem of designing low-rank estimators for HOS. We propose three methods for choosing the transformation matrix that reduces the mean-square error (MSE) associated with the low-rank HOS estimates. We also demonstrate the advantages of using low-rank third-order moment estimates for blind system estimation. Results indicate that the full-rank MSE corresponding to some data length N can be attained by a low-rank estimator corresponding to a length significantly smaller than N. © 2002 Published by Elsevier Science Ltd. on behalf of The Franklin Institute.

Keywords: Higher-order statistics; Low-rank estimates; Variance reduction; System identification
$ Portions of this paper have appeared in "Low rank approach in system identification using higher-order statistics", 9th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, Portland, Oregon, September 1998, and "Design of low rank estimators for higher-order statistics based on the second-order statistics", IEEE Signal Processing Workshop on Higher-Order Statistics, Caesarea, Israel, June 1999. This work was supported by NSF under grant MIP-9553227 and The Whitaker Foundation.
*Corresponding author. Tel.: +1-215-895-2358; fax: +1-215-895-2066.
E-mail addresses:
[email protected] (I. Bradaric),
[email protected] (A.P. Petropulu),
[email protected] (K.I. Diamantaras). 1 Also correspondence to: Prof. Athina P. Petropulu, Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA 19104.
0016-0032/02/$22.00 © 2002 Published by Elsevier Science Ltd. on behalf of The Franklin Institute. PII: S0016-0032(02)00019-4
1. Introduction

Higher-order statistics (HOS), as opposed to second-order statistics, exhibit many desirable properties, among which are robustness to additive Gaussian noise, detection and characterization of nonlinearities, and the ability to preserve phase. It is well established that reconstruction of a nonminimum phase system, driven by non-Gaussian noise, from the system output is only possible using HOS of the output [1]. Despite the large number of applications that could be formulated as a system reconstruction problem, the applicability of HOS has been hampered by issues such as high complexity and the need for long data records to keep the variance of HOS estimates low.

Several efforts at reducing variance and complexity are underway. In [2,3] a system reconstruction method based on preselected higher-order spectrum slices was proposed, which yields low-variance system estimates by avoiding low signal-to-noise ratio regions of the higher-order spectrum. The variance reduction enables the use of smaller data lengths, and the need for slices only, instead of the entire higher-order spectrum, leads to complexity reduction.

It is well known that rank reduction methods offer a general principle for reducing estimator variance. Some recent works in the area of multilinear algebra [4–6] offered a mathematical framework for treating the transformations associated with low-rank estimators. In [7], low-rank estimation of HOS was proposed as a means of variance and complexity reduction. The signal is transformed into a coordinate system (subspace) in which only a few signal coefficients are significant. Applying HOS to the transformed signal can not only lead to more robust HOS estimators based on shorter data lengths, but also reduce complexity. Also in [7], the authors gave an approximation of the variance reduction for the case of white data, and presented several ideas for subspace design.
However, no procedure was provided for selecting in advance the transformation matrix that would yield the reduced mean-square error (MSE). In this paper, we propose three different methods for choosing the transformation matrix that reduces the MSE of HOS estimates. All three methods require either partial or no prior information about the system. We also apply the low-rank approach to system reconstruction from the system output. Simulation results of system reconstruction based on low-rank HOS estimates of the system output indicate a significant reduction in variance, when compared to the corresponding full-rank result. We also demonstrate that the full-rank MSE corresponding to some data length N can be attained by a low-rank estimator corresponding to a length significantly smaller than N.

The paper is organized as follows. In Section 2, we present the mathematical framework for treating the transformations associated with subspaces and low-rank estimators. In Section 3, we propose three subspace design methods for choosing the transformation matrix that optimizes the tradeoff between bias and variance. To translate the HOS variance reduction into a reduction of the system error in the system identification problem, we adopt the identification method of Pozidis and Petropulu [2,3]. The method is briefly summarized and its statistical analysis is presented in
Section 4. In Section 5, we present simulation results of the proposed methods and their application to the system identification problem. Finally, in Section 6 we give some concluding remarks.
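The noise-robustness property mentioned above can be illustrated with a short numerical sketch (illustrative only, not from the paper; the skewed source and the noise level are arbitrary choices). Odd-order cumulants of zero-mean Gaussian noise vanish, so the third-order moment of the noisy output stays close to that of the clean signal, while the variance is inflated by the noise power:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
# Non-Gaussian (skewed) signal: squared-and-centered Gaussian
s = rng.standard_normal(N) ** 2 - 1.0
w = 2.0 * rng.standard_normal(N)        # strong additive Gaussian noise
y = s + w

# Third-order moment E{y^3}: Gaussian noise contributes nothing in
# expectation, so the sample value stays close to E{s^3}
m3_s = np.mean(s ** 3)
m3_y = np.mean(y ** 3)
print(m3_s, m3_y)                       # close to each other

# Second-order statistics are NOT noise-robust: variance is inflated
print(np.var(s), np.var(y))
```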
2. Mathematical background

Let x be an (M × 1) data vector that consists of samples of a zero-mean stationary process (throughout this paper, boldface upper- and lower-case symbols are used to represent matrix and vector quantities, respectively). The nth-order moment of x can be expressed as

  $m_{n,x} = E\{x^{(n)}\}$, where $x^{(n)} = x \otimes x \otimes \cdots \otimes x$ (n times),   (1)

or in matrix form as

  $M_{n,x} = E\{x^{(n-1)} \otimes x^T\}$,   (2)

where $\otimes$ denotes the Kronecker (tensor) product of vectors or matrices.

Let T be an M × Q (Q ≤ M) transformation matrix whose Q orthonormal columns span a Q-dimensional approximation subspace, and let z be the subspace data vector given by

  $z = T^T x$.   (3)

The idea is to use z in the HOS-based processing algorithm instead of x. If we denote by $m_{n,z}$ the nth-order moments of z, then the following expressions hold [7]:

  $m_{n,z} = E\{z^{(n)}\} = (T^{(n)})^T m_{n,x}$,   (4)

  $M_{n,z} = E\{z^{(n-1)} \otimes z^T\} = (T^{(n-1)})^T M_{n,x} T$,   (5)

where $T^{(n)}$ denotes the n-fold Kronecker product of T with itself.

While analyzing and comparing full- and low-rank estimators, it is useful to relate the subspace data z to the original full-dimensional space. To this end, we define the low-rank data vector

  $\hat{x} = Px = Tz$,   (6)

which is equal to the projection of x onto the subspace, with $P = TT^T$ being the subspace matrix. In all three proposed methods for choosing the transformation matrix, T is set to have orthonormal columns. Therefore, $T^T T = I$ and P is symmetric and idempotent. The low-rank moments, or equivalently, the moments of the low-rank data $\hat{x}$, are given by

  $m_{n,\hat{x}} = E\{\hat{x}^{(n)}\} = T^{(n)} m_{n,z} = P^{(n)} m_{n,x}$,   (7)
  $M_{n,\hat{x}} = E\{\hat{x}^{(n-1)} \otimes \hat{x}^T\} = T^{(n-1)} M_{n,z} T^T = P^{(n-1)} M_{n,x} P$.   (8)

Moment estimates can be obtained as

  $\hat{m}_{n,x} = \frac{1}{N} \sum_{i=1}^{N} x_i^{(n)}$,   (9)

where $x_1, x_2, \ldots, x_N$ are independent random vectors. In practice, the $x_i$'s could be segments of x.

Starting with the MSE of the low-rank estimate $P^{(n)} \hat{m}_{n,x}$, we obtain [7]

  MSE $= E\{\|m_{n,x} - P^{(n)} \hat{m}_{n,x}\|^2\}$
  $= E\{\|m_{n,x} - P^{(n)} E\{\hat{m}_{n,x}\} + P^{(n)} E\{\hat{m}_{n,x}\} - P^{(n)} \hat{m}_{n,x}\|^2\}$
  $= \|m_{n,x} - P^{(n)} E\{\hat{m}_{n,x}\}\|^2 + E\{\|P^{(n)} \hat{m}_{n,x} - P^{(n)} E\{\hat{m}_{n,x}\}\|^2\}$.   (10)

The first term in Eq. (10) is the squared bias of the estimator and the second term represents its variance. The latter term can be simplified to

  $\mathrm{var}(P^{(n)} \hat{m}_{n,x}) = \frac{1}{N}\left(E\{(x^T P x)^{(n)}\} - m_{n,x}^T P^{(n)} m_{n,x}\right)$.   (11)

Let us now consider the following system reconstruction scenario:

  $y(k) = r^T h + w(k) = x(k) + w(k)$,   (12)

where y(k) is the output, $h = [h(0)\ h(1) \cdots h(M-1)]^T$ is the unknown FIR channel of length M, $r = [r(k)\ r(k-1) \cdots r(k-M+1)]^T$, r(k) is an i.i.d., stationary, zero-mean and nonsymmetrically distributed random process, and w(k) is zero-mean Gaussian noise, independent of r(k).

At this point we consider third-order statistics. Let us denote by $c_{3,y}$ the real third-order cumulant of y(k), and by $\hat{c}_{3,y}$ the corresponding sample estimate. Then,

  $\mathrm{Bias}^2(P^{(3)} \hat{c}_{3,y}) = \|c_{3,y} - P^{(3)} E\{\hat{c}_{3,y}\}\|^2 = \|c_{3,y} - P^{(3)} c_{3,y}\|^2$,   (13)

  $\mathrm{var}(P^{(3)} \hat{c}_{3,y}) = \frac{1}{N}\left(E\{(y^T P y)^{(3)}\} - c_{3,y}^T P^{(3)} c_{3,y}\right)$.   (14)

The problem is now focused on determining the transformation matrix $P = TT^T$ that minimizes the MSE associated with the low-rank estimator $P^{(3)} \hat{c}_{3,y}$. In the next section, we propose three different methods for selecting P; in all three, $P = TT^T$, where T has orthonormal columns. We next show that using the low-rank data $\hat{y} = Py$ leads to a cumulant vector whose covariance matrix has a reduced Frobenius norm compared with the full-rank case. In the full-rank case, the covariance matrix is
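The Kronecker-product relations above can be checked numerically. The following sketch (dimensions chosen arbitrarily) verifies, for a single realization, the per-sample identity behind Eq. (4), namely $z^{(3)} = (T^{(3)})^T x^{(3)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, Q = 6, 3
# T with Q orthonormal columns, obtained via a QR factorization
T, _ = np.linalg.qr(rng.standard_normal((M, Q)))
x = rng.standard_normal(M)
z = T.T @ x                              # subspace data, Eq. (3)

# Third Kronecker powers
x3 = np.kron(x, np.kron(x, x))           # x^(3), length M^3
T3 = np.kron(T, np.kron(T, T))           # T^(3), shape (M^3, Q^3)
z3 = np.kron(z, np.kron(z, z))           # z^(3), length Q^3

# Mixed-product property: (T (x) T (x) T)^T (x (x) x (x) x) = z (x) z (x) z
assert np.allclose(z3, T3.T @ x3)
```

Averaging both sides over realizations gives Eq. (4) itself, since expectation commutes with the fixed linear map $(T^{(3)})^T$.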
defined as

  $\Sigma_{c_{3,y}} = E\{c_{3,y} c_{3,y}^T\} - E\{c_{3,y}\} E\{c_{3,y}^T\}$.   (15)

When the transformation matrix P is applied, we have

  $\Sigma_{c_{3,\hat{y}}} = E\{c_{3,\hat{y}} c_{3,\hat{y}}^T\} - E\{c_{3,\hat{y}}\} E\{c_{3,\hat{y}}^T\}$
  $= P^{(3)} E\{c_{3,y} c_{3,y}^T\} P^{(3)T} - P^{(3)} E\{c_{3,y}\} E\{c_{3,y}^T\} P^{(3)T}$
  $= P^{(3)} \Sigma_{c_{3,y}} P^{(3)T}$.   (16)

It is easy to show that the matrices $P^{(3)}$ and $T^{(3)}$ have the same properties as P and T; namely, $T^{(3)T} T^{(3)} = I$ and $P^{(3)}$ is symmetric and idempotent. Then we can write

  $\|\Sigma_{c_{3,\hat{y}}}\|_F^2 = \|P^{(3)} \Sigma_{c_{3,y}} P^{(3)T}\|_F^2$
  $= \mathrm{tr}(P^{(3)} \Sigma_{c_{3,y}} P^{(3)T} P^{(3)} \Sigma_{c_{3,y}} P^{(3)T})$
  $= \mathrm{tr}(T^{(3)} T^{(3)T} \Sigma_{c_{3,y}} T^{(3)} T^{(3)T} T^{(3)} T^{(3)T} \Sigma_{c_{3,y}} T^{(3)} T^{(3)T})$
  $= \mathrm{tr}(T^{(3)T} \Sigma_{c_{3,y}} T^{(3)} T^{(3)T} \Sigma_{c_{3,y}} T^{(3)})$
  $= \|T^{(3)T} \Sigma_{c_{3,y}} T^{(3)}\|_F^2$
  $= \sum_{i=1}^{Q^3} \sum_{j=1}^{Q^3} (\tilde{t}_i^T \Sigma_{c_{3,y}} \tilde{t}_j)^2$,   (17)

where $\tilde{t}_i$ denotes the ith column of $T^{(3)}$. Since $T^{(3)}$ is an $M^3 \times Q^3$ (Q ≤ M) matrix with orthonormal columns, completing these columns to an orthonormal basis of $\mathbb{R}^{M^3}$ gives

  $\|\Sigma_{c_{3,y}}\|_F^2 = \sum_{i=1}^{M^3} \sum_{j=1}^{M^3} (\tilde{t}_i^T \Sigma_{c_{3,y}} \tilde{t}_j)^2 \ge \sum_{i=1}^{Q^3} \sum_{j=1}^{Q^3} (\tilde{t}_i^T \Sigma_{c_{3,y}} \tilde{t}_j)^2 = \|\Sigma_{c_{3,\hat{y}}}\|_F^2$.   (18)
Therefore, the smaller the rank of the transformation matrix, the smaller the Frobenius norm of the covariance matrix of the moment vector. In Section 4, we will relate this result to the statistics of the channel estimates. Note that this analysis ensures only the reduction in the variance of the third-order cumulant estimates when the transformation matrix with reduced rank is applied. In the next section we will propose three methods for the MSE reduction that exploit this result.
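The shrinking of the Frobenius norm in Eqs. (16)–(18) can be observed in a small Monte Carlo sketch (toy dimensions and a skewed synthetic source, not the paper's setup): the covariance of full-rank third-order moment estimates is formed empirically, and projecting it with $P^{(3)}$ only reduces its Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(2)
M, Q = 4, 2
T, _ = np.linalg.qr(rng.standard_normal((M, Q)))
P = T @ T.T                          # rank-Q projection, P = T T^T
P3 = np.kron(P, np.kron(P, P))       # P^(3), acting on moment vectors

def sample_m3(n_seg):
    """One sample estimate m^_{3,x} = (1/N) sum_i x_i^(3), Eq. (9)."""
    X = rng.standard_normal((n_seg, M)) ** 2 - 1.0   # skewed i.i.d. entries
    return np.mean([np.kron(x, np.kron(x, x)) for x in X], axis=0)

# Monte Carlo covariance of the full-rank moment estimates
trials = np.array([sample_m3(50) for _ in range(300)])
S = np.cov(trials.T)                 # (M^3 x M^3) covariance matrix

norm_full = np.linalg.norm(S, "fro")
norm_low = np.linalg.norm(P3 @ S @ P3, "fro")
print(norm_full, norm_low)           # low-rank norm is never larger
```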
3. Design of low-rank estimators

In general, the "optimal" transformation matrix that minimizes the MSE of the HOS estimates can be obtained only when the channel h, the statistics of the input
signal and the SNR are known. Since this is not a practical case, we focus on obtaining a "suboptimal" solution that will lead to MSE reduction. We propose three different methods. The first requires partial information about the system, the second requires only the approximate bandwidth of the channel, while the third requires no system information. Since the advantages of rank reduction are more pronounced if the system is bandlimited, the following methods refer to bandlimited systems.

3.1. Method I

In [7] it was proposed to select T so that its columns span the frequency band of interest. This can be accomplished by selecting T such that

  $\min_T \int_{\Omega_h} \|d(\omega) - TT^T d(\omega)\|^2 \, d\omega$,   (19)

where $d(\omega) = [1\ e^{-j\omega}\ e^{-j2\omega} \cdots e^{-j(M-1)\omega}]^H$, $T^T T = I$ and $\Omega_h$ is the normalized frequency band that corresponds to the bandwidth of the system ($\Omega_h \in [0, 1]$). The solution of this minimization problem is obtained by taking T to have as columns the eigenvectors of the matrix

  $K(\Omega_h) = \int_{\Omega_h} d(\omega) d(\omega)^H \, d\omega$   (20)

associated with the $Q_h$ significant eigenvalues. For a low-pass system with passband $[0, \Omega_h]$, the eigenvectors of $K(\Omega_h)$ are the discrete prolate spheroidal wave functions [8], and the number of significant eigenvalues can in this case be approximated as $Q_h \approx M\Omega_h + 1$. In the sequel, we consider the case of a low-pass system with passband $[0, \Omega_h]$; the case of a band-pass or high-pass filter can be treated in an analogous way.

In [7], the authors assumed that the transformation matrix T is defined based on $\Omega_h$ and $Q_h$. In general, however, such a T does not yield the minimum MSE. This conclusion is based on the following facts. To decrease the variance, one needs to decrease the number of eigenvectors of $K(\Omega)$ used in forming T, i.e., Q. On the other hand, the bias increases as Q decreases, at a rate that is higher when $\Omega$ is high. This is because the number of significant eigenvalues of $K(\Omega)$ increases as $\Omega$ increases. This point is illustrated in Fig. 1, where we assumed M = 16, computed the eigenvalues of $K(\Omega)$ and arranged them in increasing order. The x-axis of Fig. 1 denotes the eigenvalue index and the y-axis the normalized eigenvalues; the graph is repeated for values of $\Omega$ from 0.05 to 1. As a result, the MSE is a convex function of Q and $\Omega$, with a minimum in the neighborhood of $(\Omega_h, Q_h)$, but not necessarily at $(\Omega_h, Q_h)$. Furthermore, the location of the minimum depends on the SNR and the statistics of the input signal.

Of the features of h(n) that determine the degree of rank reduction that can be achieved, the major role is played by the bandwidth [8,9]. This fact served as a
Fig. 1. Normalized eigenvalues of $K(\Omega)$ for different values of $\Omega$.
motivation for the following ad hoc scheme. Let us consider the following oversimplified scenario:

  $\tilde{y} = \sigma_r^2 \tilde{h} + w = \tilde{x} + w$,   (21)

where $w = [w(k)\ w(k-1) \cdots w(k-M+1)]^T$ is the Gaussian noise of Eq. (12), $\sigma_r^2$ is the variance of r(k), and $\tilde{h}(n)$ is the ideal low-pass equivalent of h(n), with Fourier transform defined as

  $\tilde{H}(\omega) = 1$ for $|\omega| \le \Omega_h$, and $0$ otherwise,   (22)

where $\Omega_h$ is the bandwidth of h(n). The model of Eq. (21) can be viewed as a power-spectrum-domain equivalent of Eq. (12) that retains the bandwidth characteristics of Eq. (12). For Eq. (21) there exists a closed-form MSE, given by Eqs. (10), (13) and (14), where [10]

  $E\{(\tilde{y}^T P \tilde{y})^{(3)}\} = 8\,\mathrm{tr}[(P R_{\tilde{y}})^3] + 6\,\mathrm{tr}[(P R_{\tilde{y}})^2]\,\mathrm{tr}(P R_{\tilde{y}}) + [\mathrm{tr}(P R_{\tilde{y}})]^3$,   (23)

  $R_{\tilde{y}} = R_{\tilde{x}} + R_w = \tilde{H}^T R_r \tilde{H}^* + R_w = \tilde{H}^T \mathrm{diag}(\sigma_r^2) \tilde{H}^* + \mathrm{diag}(\sigma_w^2)$,   (24)
where $R_{\tilde{y}}$, $R_{\tilde{x}}$, $R_w$ and $R_r$ denote the covariance matrices of $\tilde{y}(k)$, $\tilde{x}(k)$, w(k) and r(k), respectively, $\sigma_w^2$ denotes the variance of w(k), and the matrix $\tilde{H}$ is the convolution matrix corresponding to $\tilde{h}$. Given $\tilde{h}$, $\sigma_r^2$, $\sigma_w^2$ and the matrix P, the MSE can be computed via Eqs. (10), (13), (14), (23) and (24) and searched for a minimum.

After these remarks, the selection of T can be carried out as follows. Assuming knowledge of the system bandwidth, i.e., $\Omega_h$, we form the convolution matrix $\tilde{H}$. For different values of $\Omega$ we compute $K(\Omega)$, and then, for Q = 1, ..., M, form $T(\Omega, Q)$ based on the Q significant eigenvectors of $K(\Omega)$. We subsequently compute the MSE corresponding to $\tilde{H}$ and $T(\Omega, Q)$ via Eqs. (10), (13) and (14). Finally, the transformation matrix T is formed based on the $Q_{min}$ eigenvectors of the matrix $K(\Omega_{min})$, where $(\Omega_{min}, Q_{min})$ is the location of the MSE minimum.

This method requires prior knowledge of $\sigma_r^2$, $\sigma_w^2$ and the cut-off frequency of the signal. If these parameters are known, the pair $(\Omega_{min}, Q_{min})$ can be determined in advance, and thus low-rank estimation can be performed "off-line". In the case of white noise, the bandwidth can be obtained from the power spectrum of the output signal.

3.2. Method Ia

Although the model of Eq. (21) does not require knowledge of the channel, it still needs the powers of the input and of the noise, $\sigma_r^2$ and $\sigma_w^2$, respectively. To bypass the need for these two parameters, we can operate on the bias part of the MSE only. The idea is to choose the smallest possible rank that keeps the bias below a certain level; as the rank of the transformation matrix P decreases, the variance decreases too. Therefore, in cases where $\sigma_r^2$ and $\sigma_w^2$ are unknown, Method I can be modified as follows. Based on $\tilde{h}(n)$ we form the corresponding convolution matrix $\tilde{H}$. For different values of $\Omega$, we compute $K(\Omega)$, and then, for Q = 1, ..., M, form $T(\Omega, Q)$ based on the Q significant eigenvectors of $K(\Omega)$. We subsequently compute the squared bias corresponding to $\tilde{H}$ and $T(\Omega, Q)$ via Eq. (13), i.e., $B^2(\Omega, Q)$. Let $\epsilon$ be a level of squared bias that can be considered very small, and let us define the rank $Q_b$ as the minimal Q for which the normalized squared bias $B^2(\Omega, Q) < \epsilon$, where

  $B^2(\Omega, Q) \triangleq \frac{\|c_{3,y} - P^{(3)}(\Omega, Q)\, c_{3,y}\|^2}{\|c_{3,y}\|^2}$.   (25)

The transformation matrix T is now formed based on the $Q_b$ eigenvectors of the matrix $K(\Omega_b)$, where $\Omega_b$ is the value of $\Omega$ that corresponds to $Q_b$.

3.3. Method II

The true cumulant matrix

  $C_{3,y} = M_{3,y} = E\{y^{(2)} \otimes y^T\}$   (26)
has the singular value decomposition

  $C_{3,y} = W \Lambda V^T$,   (27)

where $v_1, \ldots, v_M$ are the orthonormal right singular vectors. It was shown in [7] that if T is chosen so that its columns are the Q right singular vectors $v_1, \ldots, v_Q$, the transformation matrix $P_t = TT^T$ is near optimal. However, the true cumulant matrix is unknown, and the only way one can apply this idea is by using third-order cumulant estimates obtained as

  $\hat{c}_{3,y} = \frac{1}{N} \sum_{i=1}^{N} y_i^{(3)}$.   (28)
Unfortunately, as was also reported in [7], experiments suggest that it is very difficult to achieve improvement using third-order estimates, since the transformation matrix T obtained this way is not deterministic and has high variance. In the sequel we propose to compute T based on estimates of the second-order statistics of the data. T will again be a nondeterministic matrix, but its variance will be smaller than that of a T based on third-order statistics estimates.

Similar to Method Ia, our goal is to find the transformation matrix with the smallest possible rank that keeps the bias below a certain level. One way of achieving this goal is a low-rank approximation of the random vector y. In this approach, the bias that we want to control is the bias introduced by modeling the data, rather than the bias of the third-order cumulant estimates as in Method Ia.

Let y be the (M × 1) output data vector described in the previous section and let R be the correlation matrix of y:

  $R = E\{y y^T\} = U \Lambda U^T$,   (29)

where U is the unitary matrix of the eigenvectors of R and $\Lambda$ is the diagonal matrix of the eigenvalues of R:

  $U = [u_1\ u_2 \cdots u_M]$,   (30)

  $\Lambda = \mathrm{diag}[\lambda_1^2\ \lambda_2^2 \cdots \lambda_M^2]$.   (31)
Let us order the eigenvalues and the corresponding eigenvectors as

  $\lambda_{(1)}^2 \ge \lambda_{(2)}^2 \ge \cdots \ge \lambda_{(M)}^2$, with eigenvectors $u_{(1)}, u_{(2)}, \ldots, u_{(M)}$.   (32)

Let P be the rank-Q transformation matrix with the singular value decomposition

  $P = T P_Q T^T = \sum_{i=1}^{Q} s_i t_i t_i^T$,  $T^T T = I$.   (33)
Then the MSE introduced by approximating the random vector y with $\hat{y} = Py$ is given by

  MSE $= E\{\|y - \hat{y}\|^2\} \triangleq \mathrm{tr}\, E\{(y - \hat{y})(y - \hat{y})^T\}$
  $= \mathrm{tr}[(I - P) R (I - P)^T]$
  $= \mathrm{tr}[T(I - P_Q) T^T R T (I - P_Q) T^T]$
  $= \mathrm{tr}[T^T R T (I - P_Q)^2]$
  $= \sum_{i=1}^{Q} t_i^T R t_i (1 - s_i)^2 + \sum_{i=Q+1}^{M} t_i^T R t_i$,   (34)

where the columns of T are completed to an orthonormal basis $\{t_i\}_{i=1}^{M}$ and $s_i = 0$ for i > Q. To minimize the last expression, we set $s_i = 1$, i = 1, 2, ..., Q, and T = U. Then, the minimal MSE is given by

  $\mathrm{MSE}_{min} = \mathrm{Bias}^2 + \mathrm{Var} = \sum_{i=Q+1}^{M} \lambda_{(i)}^2$.   (35)

Therefore, the algorithm can be performed as follows. First, we compute an estimate of the second-order statistics as

  $\hat{R} = \frac{1}{N} \sum_{i=1}^{N} y_i y_i^T$   (36)

and perform its eigenvalue decomposition. Then we find the smallest Q for which $\tilde{\epsilon} \ge \mathrm{MSE}_{min} = \sum_{i=Q+1}^{M} \lambda_{(i)}^2 \ge \mathrm{Bias}^2$, where $\tilde{\epsilon}$ is some small number defined in advance, and form the transformation matrix T based on the Q dominant eigenvectors of the matrix $\hat{R}$. In practice, it is better to define a relative threshold $\epsilon$. In our experiments we look for the smallest Q such that

  $\epsilon \ge \frac{\sum_{i=Q+1}^{M} \lambda_{(i)}^2}{\sum_{i=1}^{M} \lambda_{(i)}^2}$.   (37)

Although we are using estimates of the second-order statistics, it will be shown that this algorithm gives good results. This method does not require any prior knowledge about the statistics of the output signal. It can be seen as a Karhunen–Loeve expansion of the output signal that uses the significant eigenvectors only.
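The rank-selection step of Method II can be sketched in a few lines. The mixing matrix and its geometrically decaying gain profile below are hypothetical, chosen only so that the correlation matrix has a fast-decaying spectrum; $\epsilon$ = 0.025 matches the value used in the experiments of Section 5:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, eps = 16, 1600, 0.025

# Hypothetical output vectors y_i whose energy concentrates in a few
# directions: orthonormal mixing with geometrically decaying gains
gains = np.sqrt(4.0 * 0.5 ** np.arange(M))
A = np.linalg.qr(rng.standard_normal((M, M)))[0] * gains
Y = A @ rng.standard_normal((M, N))      # columns are the y_i

# Eq. (36): sample correlation matrix
R_hat = Y @ Y.T / N

# Eigendecomposition, eigenvalues sorted in decreasing order
lam, U = np.linalg.eigh(R_hat)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# Eq. (37): smallest Q whose eigenvalue tail is at most eps of the total
tails = np.append(np.cumsum(lam[::-1])[::-1], 0.0)  # tails[Q] = tail sum
Q = int(np.argmax(tails <= eps * lam.sum()))
T = U[:, :Q]                             # Q dominant eigenvectors
print(Q)
```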
4. Application in system identification

It is reasonable to expect that by using the low-rank cumulants $c_{3,\hat{y}} = P^{(3)} c_{3,y}$ instead of $c_{3,y}$, any algorithm that estimates h based on the third-order moments of y(k) will lead to a lower-variance estimate of h. Let us consider the recently proposed method of system reconstruction based on selected regions of the discretized HOS presented in [3]. This method was selected because simulations indicated that it can lead to
improved system estimates as compared to previously proposed methods. We will use the proposed rank selection approach to obtain low-rank third-order moment estimates, and illustrate the efficiency of rank reduction in channel estimation.

In [3] it was shown that unique identification of the system impulse response can be performed based on two horizontal slices of the discretized third-order spectrum of the system output, as long as the distance between the slices satisfies a simple condition. The system h(n) can be reconstructed based on the bispectrum slices $B_3^y(\cdot, l)$ and $B_3^y(\cdot, l+r)$ as follows. Let

  $H = [\log H(1), \ldots, \log H(N-1)]^T$  ((N−1) × 1),   (38)

where H(k) is the discrete Fourier transform of h(n). Then H can be obtained as the solution of

  $AH = d$,   (39)

where

  $d(k) = \log B_3^h(k - r - l, l) - \log B_3^h(k - r - l, l + r) + b_{l,r}$, k = 0, ..., N − 1,

  $b_{l,r} = \frac{1}{N} \sum_{k=0}^{N-1} [\log B_3^y(k, l + r) - \log B_3^y(k, l)]$,   (40)

and A is an ((N−1) × (N−1)) sparse matrix; its first row contains a 1 at the rth column and 0's elsewhere, and the kth row contains a 1 at column (k − 1) mod N and a −1 at column (k + r − 1) mod N. The bispectrum of the system output is the only quantity required in Eq. (39). If computed as the Fourier transform of the "best" rank moment estimate, it is expected to produce a reduced-MSE system estimate. We will justify this expectation by performing a statistical analysis of the method proposed in [3], using the results obtained in the previous section.

Let $\hat{h}$ be the estimate of the system impulse response obtained using the algorithm proposed in [3] and let $\Sigma_{\hat{h}}$ be its asymptotic covariance matrix. Using a truncated Taylor series, it was shown in [3] that this covariance matrix approximately equals

  $\Sigma_{\hat{h}} \approx (e J_H A^{-1} L)\, \Sigma_{\hat{b}_{3,y}}\, (e J_H A^{-1} L)^T$,   (41)

where $J_H = \mathrm{diag}[H(1), \ldots, H(N-1)]$, $(e)_{kn} = (1/N) \exp\{j(2\pi/N)kn\}$, and A and L are constant matrices consisting of 0's, 1's and −1's. $\Sigma_{\hat{b}_{3,y}}$ is the covariance matrix of the ((2N−2) × 1) random vector $\hat{b}_{3,y}$ defined as

  $\hat{b}_{3,y} = [\log \hat{B}_3^y(r-l, l), \ldots, \log \hat{B}_3^y(N+2-r-l, l),\ \log \hat{B}_3^y(r-l, l+r), \ldots, \log \hat{B}_3^y(N+2-r-l, l+r)]^T$.   (42)
The covariance matrix $\Sigma_{\hat{b}_{3,y}}$ can be further approximated (again using a truncated Taylor series) as

  $\Sigma_{\hat{b}_{3,y}} \approx \mathrm{diag}\!\left[\frac{1}{B_3^y(m_1)}, \ldots, \frac{1}{B_3^y(m_{2N-2})}\right] \Sigma_{\hat{B}_{3,y}}\, \mathrm{diag}\!\left[\frac{1}{B_3^y(m_1)}, \ldots, \frac{1}{B_3^y(m_{2N-2})}\right]^T$,   (43)

where the $m_i$'s are pairs of discrete frequencies along the slices l and l + r, and $\Sigma_{\hat{B}_{3,y}}$ is the covariance matrix of the bispectrum estimates along these slices.

Let us now consider the estimates of the entire bispectrum. They are obtained as a linear combination of the cumulant estimates:

  $\hat{B}_3^y(k_1, k_2) = \sum_{p_1=-N}^{N} \sum_{p_2=-N}^{N} \hat{C}_3^y(p_1, p_2) \exp\!\left(-j \frac{2\pi}{2N+1}(k_1 p_1 + k_2 p_2)\right)$.   (44)

Let $\Sigma_{\tilde{B}_{3,y}}$ be the covariance matrix of the entire bispectrum estimates and let $\Sigma_{\hat{C}_{3,y}}$ be the covariance matrix of the cumulant estimates. Then

  $\Sigma_{\tilde{B}_{3,y}} = f\, \Sigma_{\hat{C}_{3,y}}\, f^T$,   (45)

where $(f)_{k_1 k_2, p_1 p_2} = \exp(-j(2\pi/(2N+1))(k_1 p_1 + k_2 p_2))$.

We have shown in Section 2 that all three proposed techniques for rank reduction guarantee a reduction of the Frobenius norm of the covariance matrix associated with the cumulant estimates. It is, in general, very difficult to relate this result to the channel estimates. In order to simplify the analysis, we will assume that this reduction is uniformly distributed, that is,

  $\Sigma_{\hat{C}_{3,\hat{y}}}(i,j) = a\, \Sigma_{\hat{C}_{3,y}}(i,j)$ for all i, j,   (46)

where $0 \le a \le 1$. Our simulation results, as will be illustrated in the next section, indicate that this simplified assumption yields a valid estimate of the reduction of the channel MSE. Then, it is obvious from Eq. (45) that

  $\Sigma_{\tilde{B}_{3,\hat{y}}}(i,j) = a\, \Sigma_{\tilde{B}_{3,y}}(i,j)$ for all i, j,   (47)

and consequently, from Eqs. (41) and (43),

  $\Sigma_{\hat{b}_{3,\hat{y}}}(i,j) = a\, \Sigma_{\hat{b}_{3,y}}(i,j)$ for all i, j,   (48)

  $\Sigma_{\hat{h}_{\hat{y}}}(i,j) = a\, \Sigma_{\hat{h}_y}(i,j)$ for all i, j.   (49)
Therefore, although the algorithm of [3] is highly nonlinear, under the simplification of Eq. (46) the reduction in the variance of the cumulant estimates has approximately the same effect on the variance of the channel estimates. In the next section we will apply the proposed low-rank techniques for third-order spectrum estimation to this algorithm and compare the MSEs of the channel estimates.
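The propagation of the uniform-reduction assumption of Eq. (46) through the linear map of Eq. (45) can be sanity-checked directly. In the sketch below, the matrix f is a random stand-in for the fixed DFT-like matrix of Eq. (45) (its actual entries do not matter for this scaling argument):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
# Stand-in for the fixed linear map f of Eq. (45)
f = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
Sigma_C = S @ S.T                 # cumulant-estimate covariance (PSD)
a = 0.3                           # uniform variance-reduction factor, Eq. (46)

Sigma_B = f @ Sigma_C @ f.T       # Eq. (45)
Sigma_B_low = f @ (a * Sigma_C) @ f.T

# Eq. (47): uniform scaling of Sigma_C scales Sigma_B by the same factor
assert np.allclose(Sigma_B_low, a * Sigma_B)
```

Since Eqs. (41) and (43) are likewise congruence transforms of the form $G \Sigma G^T$, the same factor a carries through to Eqs. (48) and (49).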
5. Simulation results

The claims of this paper are tested for two different channels. Both channels were selected to be bandlimited systems in order to exploit the advantages of rank reduction.

Example 1. Let us consider the case where h is an unknown two-ray communications channel with cut-off frequency approximately 0.5, as shown in Fig. 2 (by the cut-off frequency we mean the frequency at which the normalized gain of the channel is −6 dB). The process r(k) used as input to the system was generated as a two-level i.i.d., zero-mean and nonsymmetrically distributed random process.

(1a) Methods I and Ia: We first provide results to support the claim that the simplification of Eq. (21) results in an MSE very similar to the one corresponding to the scenario given by Eq. (12). Assuming knowledge of the bandwidth of the real channel, we simplify the scenario by setting $\tilde{h}$ to be an FIR low-pass filter (Hamming window) with normalized cut-off frequency 0.5 and length 16. The MSE (see Eqs. (13), (14), (23) and (24)), obtained for different values of $\Omega$ and Q, corresponding to data length 1600 and SNR = 6 dB, is shown in Fig. 3(a). Next, we consider the case where h is the unknown channel shown in Fig. 2. The corresponding MSE, obtained by performing 100 Monte Carlo runs under the same conditions as those of Fig. 3(a), is shown in Fig. 3(b). It can be seen that Figs. 3(a) and (b) exhibit very similar trends as far as the location of the minimum is concerned.

The same scenario is assumed next but with a different SNR (SNR = 3 dB and data length 1600). The results are illustrated in Fig. 4(a) (theoretical values computed
Fig. 2. Two-ray communication channel and its frequency response.
Fig. 3. MSE for SNR = 6 dB and data length L = 1600 when: (a) h is approximated with its low-pass equivalent (theoretical values), and (b) h is the two-ray communication channel (simulation results).
using $\tilde{h}$ as the low-pass equivalent of h) and Fig. 4(b) (simulation results over 100 Monte Carlo runs with h the true channel). Again, the MSEs exhibit very similar trends. It can also be seen that in the lower-SNR case the minimum has moved towards a lower-Q region. This was expected, since the lower the SNR, the lower the rank that should be picked to maximize the benefits of rank reduction.

If $\sigma_r^2$ and $\sigma_w^2$ are unknown, we can operate on the bias part only. Contour graphs representing the squared bias for different values of Q and $\Omega$ are shown in
Fig. 4. MSE for SNR = 3 dB and data length L = 1600 when: (a) h is approximated with its low-pass equivalent (theoretical values), and (b) h is the two-ray communication channel (simulation results).
Fig. 5(a) (theoretical values based on $\tilde{h}$) and Fig. 5(b) (simulation results with h the true channel). The gray area denotes the region where the normalized squared bias is less than $\epsilon$ = 0.005 (0.5%). As can be noticed, the graphs have very similar shapes, and both suggest that the best values are Q = 10 and $\Omega$ = 0.5 (as explained in Section 3.2, we look for the smallest possible rank and the corresponding $\Omega$ that keep the normalized squared bias below $\epsilon$).
Fig. 5. Normalized squared bias when: (a) h is approximated with its low-pass equivalent, and (b) h is the two-ray communication channel.
Next, we present the performance of the proposed Methods I and Ia on channel estimation based on low-rank moment estimates. We performed 100 Monte Carlo runs of the proposed method of system reconstruction (see Section 4) using full-rank moment estimators. The output length was 1600 and the SNR was 6 dB. The estimation results are shown in Fig. 8(a), where the true system is shown in a dash-dot line, the average estimate in a solid line, and the gray area depicts the standard
Fig. 6. Normalized sum $\sum_{i=Q+1}^{M} \lambda_{(i)}^2$ for different ranks.
Fig. 7. Normalized variance of the third-order statistics for different ranks, for SNR = 6 dB and data length L = 1600.
deviation. The corresponding low-rank result, obtained using Q = 7 and $\Omega$ = 0.7, as suggested by Fig. 3(a), is shown in Fig. 8(b). Finally, for the case where $\sigma_r^2$ and $\sigma_w^2$ are unknown, the selected values are Q = 10 and $\Omega$ = 0.5 (see Fig. 5). The
Fig. 8. System reconstruction results for the bandpass channel for SNR = 6 dB and data length L = 1600 using: (a) full rank, (b) Method I, (c) Method Ia (assuming that $\sigma_r^2$ and $\sigma_w^2$ are unknown) and (d) Method II.
corresponding low-rank estimation results are shown in Fig. 8(c). Figs. 8(b) and (c) indicate that both low-rank techniques resulted in reduced variance of the channel estimate as compared to the full-rank case.
(1b) Method II: We estimated matrix R̂ using Eq. (36) and subsequently performed the eigenvalue decomposition to determine the transformation matrix P. Fig. 6 illustrates the normalized sum Σ λ²(i), i = Q+1, …, M, obtained by averaging over 100 independent input realizations, for different values of Q. We next constructed matrix P from the Q dominant eigenvectors, for Q in the range where the normalized squared bias is less than ε (we selected ε = 0.025), to model the third-order statistics. The variance of the third-order statistics estimator (see Eq. (14)) is shown in Fig. 7. It can easily be verified that the squared bias defined in Eq. (13) is negligible in the observed range of Q; therefore, Fig. 7 approximately represents the MSE of the estimator. The minimal MSE for Q ∈ [9, 16] occurs for the smallest Q. This, however, is not the overall minimum, since the optimal rank depends on the SNR (the lower the SNR, the lower the optimal Q, as mentioned before). System estimation was again performed with Q = 10, as suggested by Fig. 6, and the result is shown in Fig. 8(d). As can be seen from both Fig. 7 and Fig. 8(d), the low-rank result has reduced variance compared to the full-rank one.
Example 2. In this example we considered h to be an unknown bandpass channel with effective passband 0.15 ≤ Ω ≤ 0.45, as shown in Fig. 9. The input process was again a two-level i.i.d., zero-mean, non-symmetrically distributed random process.
(2a) Methods I and Ia: Again, the channel was approximated by its bandpass equivalent (an FIR bandpass filter of length 16). This time the integration in Eq. (20) was performed over the region [−Ωh, −Ωl] ∪ [Ωl, Ωh], for different values of Ωl and Ωh (0 ≤ Ωl < Ωh ≤ 1). We again assumed SNR = 6 dB and data length L = 1600. Under these conditions, the variance is the dominant part of the MSE and the
Fig. 9. Bandpass communication channel and its frequency response.
minimal MSE occurs for the lowest rank, Q = 1, as can easily be verified using Eqs. (10), (13), (14), (23) and (24). Such a rank yields the minimal MSE but with unacceptably high bias. Thus, in this case the minimal MSE is not a good criterion for selecting the transformation matrix, and we will not apply Method I. To determine when Method I can be applied, an additional constraint can be imposed on the normalized squared bias of the cumulants of the approximated model. In that way, whenever the bias corresponding to the solution obtained with Method I fails to meet that requirement, the end user knows that Method I should not be used.
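The subspace design of Method II described earlier (estimate R̂, eigendecompose, keep the dominant eigenvectors whose normalized squared-eigenvalue tail sum falls below the bias threshold ε) can be sketched as follows. The helper name and the frame-based autocorrelation estimate are assumptions on our part, since Eq. (36) is not part of this excerpt:

```python
import numpy as np

def design_subspace(x, M, eps=0.025):
    """Sketch of Method II: pick the smallest rank Q whose normalized
    eigenvalue tail  sum_{i>Q} lam_i^2 / sum_i lam_i^2  drops below eps,
    mirroring the selection rule illustrated in Fig. 6."""
    N = len(x)
    # Sample M x M autocorrelation matrix from sliding frames of length M.
    frames = np.array([x[i:i + M] for i in range(N - M + 1)])
    R = frames.T @ frames / frames.shape[0]
    lam, V = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]      # reorder to descending
    # tail[Q] = normalized sum of lam_i^2 over i >= Q (0-based index)
    tail = np.cumsum((lam ** 2)[::-1])[::-1] / np.sum(lam ** 2)
    Q = int(np.argmax(tail < eps))      # smallest Q meeting the bias bound
    P = V[:, :Q]                        # Q dominant eigenvectors form P
    return P, Q, lam
```

The columns of P are orthonormal, so projecting the data onto range(P) never increases the signal energy; the threshold ε plays the role of the bias bound used in the experiments above.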
Fig. 10. System reconstruction results for the bandpass channel for SNR = 6 dB and data length L = 1600 using: (a) full rank, (b) Method Ia (assuming that σr² and σw² are unknown) and (c) Method II.
For Method Ia, the best values of Ωl, Ωh and Q that keep the squared bias under ε = 0.005 were found to be Ωl = 0.1, Ωh = 0.5 and Q = 9. The performance of the method for channel estimation is demonstrated in Fig. 10 ((a) full rank and (b) using Method Ia). Again, the low-rank result has significantly smaller variance than the full-rank one.
(2b) Method II: The eigenvalue decomposition of the estimated matrix R̂ was performed in order to determine the transformation matrix P. Assuming ε = 0.025, the best rank was found to be Q = 11. The results of channel estimation corresponding to L = 1600 output samples and SNR = 6 dB are shown in Fig. 10(c).
5.1. Robustness to short data lengths
Next, we demonstrate the performance of the low-rank estimators corresponding to all three methods for various data lengths. The SNR was taken to be 6 dB. Fig. 11 shows the normalized MSE of the cumulant estimates corresponding to Methods I, Ia and II, and Fig. 12 illustrates the normalized MSE of the corresponding channel estimates. All results are based on 100 Monte Carlo runs. Note that the graphs of Fig. 12 exhibit a trend similar to those of Fig. 11, a conclusion also suggested by the approximate analysis given in Section 4.
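The normalized MSE curves of Figs. 11-14 compare sample statistics against their true values over Monte Carlo runs. A minimal sketch, assuming the diagonal third-order slice c3(τ) = E[x(n) x(n+τ)²] and the normalization E‖ĉ − c‖² / ‖c‖² (the paper's exact definitions follow Section 4, which is not part of this excerpt):

```python
import numpy as np

def third_order_diag_slice(x, maxlag):
    """Sample estimate of the diagonal third-order cumulant slice
    c3(tau) = E[x(n) x(n+tau)^2], tau = 0..maxlag, for zero-mean x
    (for zero-mean data the third cumulant equals the third moment)."""
    x = x - x.mean()
    N = len(x)
    return np.array([np.mean(x[:N - t] * x[t:] ** 2)
                     for t in range(maxlag + 1)])

def normalized_mse(estimates, truth):
    """Normalized MSE over Monte Carlo runs:
    mean over runs of ||c_hat - c||^2, divided by ||c||^2."""
    err = estimates - truth                    # shape: runs x lags
    return np.mean(np.sum(err ** 2, axis=1)) / np.sum(truth ** 2)
```

For a skewed i.i.d. sequence the slice is concentrated at lag zero, which makes the estimator easy to sanity-check before applying it to filtered (channel-output) data.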
Fig. 11. MSE of the cumulant estimates for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
Fig. 12. MSE of the channel estimates for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
Fig. 13. MSE of the cumulant estimates for full-rank and low-rank methods for the bandpass channel, SNR = 6 dB and different data lengths.
Fig. 14. MSE of the bandpass channel estimates for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
The same simulation was performed for the bandpass channel of Example 2, and the results are shown in Figs. 13 and 14. It is evident that the full-rank MSE of the channel estimates corresponding to some data length N can be attained by a low-rank estimator with a data length significantly smaller than N.
5.2. Application to other HOS-based algorithms
As mentioned before, it is reasonable to expect that using low-rank cumulant estimates with reduced variance will lead to channel estimates with reduced variance in any algorithm that uses higher-order statistics. We therefore applied the low-rank approach to two other algorithms. The first, implemented as a standard MATLAB routine for the estimation of MA parameters, is based on both the correlation and the third-order diagonal-slice cumulant; it is known as the GM method [11]. The only difference was the cumulant estimation: instead of using the MATLAB subroutine, we applied the low-rank technique. The results are shown in Fig. 15 (for the two-ray channel) and Fig. 16 (for the bandpass channel). The second method was proposed in [12] (the RG method) and is based on log-bispectra. The results for this method are shown in Fig. 17 (for the two-ray channel) and Fig. 18 (for the bandpass channel). Note that the graphs for both methods exhibit a trend similar to that of Figs. 12 and 14.
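For intuition, the GM family of estimators recovers MA(q) coefficients in closed form from a third-order cumulant slice. The sketch below implements the classic C(q,k) relation b(k) = c3(q,k)/c3(q,0), with b(0) = 1; this is a simplified relative of the full GM method [11], which additionally uses correlations, and the function names are our own:

```python
import numpy as np

def c3(y, t1, t2):
    """Sample third-order cumulant c3(t1,t2) = E[y(n) y(n+t1) y(n+t2)]
    for zero-mean y (equal to the third moment in that case)."""
    y = y - y.mean()
    n = len(y) - max(t1, t2)
    return np.mean(y[:n] * y[t1:t1 + n] * y[t2:t2 + n])

def ma_coeffs_cqk(y, q):
    """Closed-form MA(q) estimate via the C(q,k) formula:
    for y(n) = sum_k b(k) w(n-k) with skewed i.i.d. w and b(0) = 1,
    c3(q,k) = gamma3 * b(q) * b(k), hence b(k) = c3(q,k) / c3(q,0)."""
    denom = c3(y, q, 0)
    return np.array([c3(y, q, k) / denom for k in range(q + 1)])
```

Because the formula is a ratio of two noisy cumulant estimates, its variance grows quickly at short data lengths, which is exactly the regime where the low-rank cumulant estimates of this paper help.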
Fig. 15. MSE of the low-pass channel estimates using the MATLAB routine for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
Fig. 16. MSE of the bandpass channel estimates using the MATLAB routine for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
Fig. 17. MSE of the low-pass channel estimates using the RG method for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
Fig. 18. MSE of the bandpass channel estimates using the RG method for full-rank and low-rank methods for SNR = 6 dB and different data lengths.
6. Conclusions
The rank-reduction approach to HOS estimation offers a general framework for reducing variance in exchange for a slight increase in bias. Three different methods for the subspace design of low-rank HOS estimators have been proposed. The first method requires prior knowledge of the input and noise powers; it guarantees the best performance with respect to the MSE of the cumulant estimates. However, for some applications, the bias that corresponds to the minimal MSE could be high. Since the method yields approximate values for both variance and bias, a threshold can be used to determine whether or not to use the minimal MSE as a criterion for subspace design. The second method requires only information about the bandwidth of the system. For bandlimited channels, this method can decrease the rank of the transformation matrix and consequently reduce the variance of the cumulant estimates without affecting the bias. Finally, the third method requires no prior information about the system; it also guarantees a reduction in the variance of the cumulant estimates.
Another advantage of the low-rank estimates is reduced complexity. Obtaining estimates in the subspace rather than from the raw data requires significantly fewer computations. However, the exact computational savings depend on how the transformation matrix is obtained: while for Methods I and Ia the transformation matrix can be computed in advance, Method II requires estimation of the autocorrelation matrix, which reduces the savings.
In addition, the low-rank approach has been applied to various methods for channel identification. It was shown that, using low-rank estimates, it is possible to significantly improve system identification algorithms that are based on HOS.
References
[1] C.L. Nikias, A.P. Petropulu, Higher-Order Spectra Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[2] H. Pozidis, A.P. Petropulu, Low-complexity estimation of bandlimited channels based on trispectrum slices, 32nd Annual Conference on Information Sciences and Systems, Princeton University, Princeton, NJ, 18-20 March, 1998.
[3] H. Pozidis, A.P. Petropulu, System reconstruction based on selected regions of discretized higher order spectra, IEEE Trans. Signal Process. 46 (12) (1998) 3360-3377.
[4] P. Comon, Independent component analysis, a new concept? Special issue on higher order statistics, Signal Process. 36 (3) (1994) 287-314.
[5] P. Comon, B. Mourrain, Decomposition of quantics in sums of powers of linear forms, Special issue on higher order statistics, Signal Process. 53 (2-3) (1996) 93-108.
[6] L. De Lathauwer, Mathematics in Signal Processing IV, IMA Conference Series, Oxford University Press, Oxford, 1997.
[7] T.F. Andre, R.D. Nowak, B.D. Van Veen, Low rank estimation of higher order statistics, IEEE Trans. Signal Process. 45 (3) (1997) 673-685.
[8] D. Slepian, Prolate spheroidal wave functions, Fourier analysis, and uncertainty, V: the discrete case, Bell Syst. Tech. J. 57 (5) (1978) 1371-1429.
[9] L.L. Scharf, D.W. Tufts, Rank reduction for modeling stationary signals, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (3) (1987) 350-354.
[10] A.M. Mathai, S.B. Provost, Quadratic Forms in Random Variables, Marcel Dekker Inc., New York, 1992.
[11] G.B. Giannakis, J.M. Mendel, Identification of non-minimum phase systems using higher-order statistics, IEEE Trans. Acoust. Speech Signal Process. 37 (3) (1989) 360-377.
[12] M. Rangoussi, G.B. Giannakis, FIR modeling using log-bispectra: weighted least-squares algorithms and performance analysis, IEEE Trans. Signal Process. 38 (3) (1991) 281-296.