Signal Processing 73 (1999) 169-183

Source separation: A TITO system identification approach

Holger Broman*, Ulf Lindgren, Henrik Sahlin, Petre Stoica

Department of Signals and Systems, Chalmers University of Technology, S-412 96 Gothenburg, Sweden
Systems and Control Group, Uppsala University, P.O. Box 27, S-751 03 Uppsala, Sweden

Received 17 July 1997; received in revised form 10 August 1998
Abstract

The blind source separation problem is studied in this paper using a system identification approach. Two unmeasurable source signals are mixed by a channel system, resulting in two measurable output signals. The measurable signals can be used in a two-input two-output (TITO) system identification approach in order to extract the sources. The system to be identified can be described in two parts, namely a model of the source-generating filters and a model of the mixing channels. Both the source-generating filters and the mixing channels are modeled as ARMA filters. It is shown that the system, under various weak conditions, is identifiable using second-order statistics only. These conditions are presented along with identifiability results. Furthermore, with some minor additional restrictions, the sources can be recovered. One method to perform this system identification is the prediction error method (PEM). PEM is a general method which is statistically efficient under the Gaussian hypothesis. To be able to perform the separation in an on-line manner, a recursive version of PEM is used. Simulation results using this approach are presented, both with computer-generated and real-world signals. © 1999 Elsevier Science B.V. All rights reserved.
* Corresponding author. Tel.: +46 31 772 1845; fax: +46 31 772 1782; e-mail: [email protected].

This work was financially supported by the Swedish National Board for Industrial and Technical Development (NUTEK) and the Swedish Foundation for Strategic Research (SSF).

0165-1684/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0165-1684(98)00191-1
Keywords: Convolutive mixtures; Blind signal separation; Blind source separation; Two-input two-output; Second-order statistics; RPEM; Identifiability
1. Introduction

Source separation has recently become an intense area of research. An important paper was published in 1991 by Jutten and Hérault [7], who presented an algorithm for the separation of static mixtures of sources using higher-order moments (i.e. moments of order higher than two) of the mixtures. Others have followed this problem formulation and advanced the methodology of separating static mixtures [2-4,20,21]. A basic assumption in this approach is that the sources to be recovered are independent, and a key result is that the sources cannot be uniquely identified from second-order statistics only. It follows that independent Gaussian sources cannot be uniquely identified. In parallel to the research on static mixtures, work has been carried out on the problem of separating dynamic mixtures of independent sources [8,12,22-24]. As has been pointed out, this model is more realistic in applications such as personal communication, enhancement of speech in the presence of background noise [16], foetal ECG (electrocardiogram) extraction [1], image reconstruction [15] and passive sonar. The results that have been published thus far are based on more or less ad hoc applications of various criteria and algorithms, and the search continues for a general, statistically sound criterion that yields the desired separation. Furthermore, rigorous analyses of the algorithms suggested for source separation are still to be carried out. In particular, the parameter identifiability and signal recoverability questions, relating to the dynamic mixture case of the source separation problem, have not been addressed properly in the literature. Tong et al. [21] proved identifiability in a wider context, but only for instantaneous mixtures. Gorokhov and Loubaton [6] present some results, but only for scenarios with fewer sources than sensors. Also, Moulines et al. [11] present some results for single-input multiple-output systems. Thus it appears that identifiability and convergence results have been neglected in the literature.

In the present paper we reformulate the separation problem, for the case of two sources, as a two-input two-output (TITO) system identification problem. We show that the system is parameter identifiable, in several cases, using second-order statistics only. This reduces the demand of source independence to source uncorrelatedness. In addition, the paper demonstrates that under natural, practical conditions, the signals can be recovered from the identified system. It is shown that well-known methods from the system identification literature can be applied, and, in particular, that minimization of the prediction errors is an approach which leads to the desired source separation.
We should note here that source separation is an instance of an interesting class of estimation problems for which passing from the static case to the dynamic one generally weakens the parameter identifiability conditions. Indeed, as mentioned above, the second-order information in the observed data is not enough to solve the static channel case of the source separation problem, whereas a dynamic mixture can in most cases be resolved by using second-order statistics only. The intuitive explanation for this somewhat counterintuitive feature relies on the following observation. While increasing the ‘dynamic degree’ of the mixture or of the source-generating mechanism increases the number of unknown parameters, it also leads to an increase, typically at a faster rate, in the number of equations that can be derived from the second-order moments of the data.
2. Problem formulation

Fig. 1 depicts the scenario under consideration. A TITO system is used to specify the source separation problem. It is assumed that the source signals are ARMA processes and that the channel filters have rational transfer functions. The resulting equations can be put in matrix form as

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 1 & B_1/A_1 \\ B_2/A_2 & 1 \end{bmatrix}
\begin{bmatrix} G_1/F_1 & 0 \\ 0 & G_2/F_2 \end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \end{bmatrix},
\qquad (2.1)
$$

where G_1, G_2, etc., are polynomials in the unit delay operator z, and the signals y_1, y_2, etc., are functions of the discrete time variable t = 1, 2, ... . (To simplify notation we omit the dependence on z and t whenever possible.) Without any restriction we assume that F_1(z), F_2(z), A_1(z) and A_2(z) are monic and stable, and that G_1(z) and G_2(z) are minimum phase. Also, the two driving sequences m_1(t) and m_2(t) are assumed to be zero-mean, unit-variance, mutually uncorrelated white noises. The system identification problem can now be stated as the problem of estimating all polynomials in Eq. (2.1). Note that the source separation problem requires only the identification of the channel, so that the source signals x_1 and x_2 can be recovered (see Section 4). However, channel estimation only is in general suboptimal from the statistical performance standpoint. It can attain the statistical performance achieved by full system identification only if the Cramér-Rao bound matrix for all parameter estimates is block diagonal, with no correlation between the estimates of the source parameters and the estimates of the channel parameters, see [17]. In this paper we focus on the system identification approach to the source separation problem.
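To make the data-generating model of Eq. (2.1) concrete, the following sketch simulates it with scipy.signal.lfilter. The specific polynomial coefficients are illustrative placeholders chosen for this sketch, not values taken from the paper.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
N = 10000

# Illustrative (hypothetical) polynomials in the delay operator z,
# listed as [1, c1, c2, ...]; F, A monic and stable, G minimum phase.
F1, G1 = [1, -1.3, 0.7], [1,  0.2]   # source 1: ARMA(2,1)
F2, G2 = [1,  1.3, 0.7], [1, -0.2]   # source 2: ARMA(2,1)
A1, B1 = [1, -0.6], [0.7, -0.2]      # channel filter B1/A1
A2, B2 = [1,  0.5], [0.7,  0.3]      # channel filter B2/A2

# Zero-mean, unit-variance, mutually uncorrelated white driving noises.
m1, m2 = rng.standard_normal(N), rng.standard_normal(N)

# Sources x_i = (G_i/F_i) m_i, then the TITO channel of Eq. (2.1).
x1 = lfilter(G1, F1, m1)
x2 = lfilter(G2, F2, m2)
y1 = x1 + lfilter(B1, A1, x2)
y2 = lfilter(B2, A2, x1) + x2
```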
Fig. 1. The data generating system.

3. Parameter identifiability

This section deals with the parameter identifiability of the system presented in Fig. 1 and Eq. (2.1), using second-order statistics only. It follows from Eq. (2.1) that the spectral density matrix, W, of the output vector is

$$
W = \begin{bmatrix} N_{11}/D_{11} & N_{12}/D_{12} \\ N_{21}/D_{21} & N_{22}/D_{22} \end{bmatrix},
\qquad (3.1)
$$

where

$$
\begin{aligned}
N_{11} &= G_1\bar G_1 F_2\bar F_2 A_1\bar A_1 + G_2\bar G_2 F_1\bar F_1 B_1\bar B_1, &(3.2)\\
N_{12} &= G_1\bar G_1 F_2\bar F_2 A_1\bar B_2 + G_2\bar G_2 F_1\bar F_1 \bar A_2 B_1, &(3.3)\\
N_{21} &= \bar N_{12}, &(3.4)\\
N_{22} &= G_1\bar G_1 F_2\bar F_2 B_2\bar B_2 + G_2\bar G_2 F_1\bar F_1 A_2\bar A_2, &(3.5)\\
D_{11} &= F_1\bar F_1 F_2\bar F_2 A_1\bar A_1, &(3.6)\\
D_{12} &= F_1\bar F_1 F_2\bar F_2 A_1\bar A_2, &(3.7)\\
D_{21} &= \bar D_{12}, &(3.8)\\
D_{22} &= F_1\bar F_1 F_2\bar F_2 A_2\bar A_2. &(3.9)
\end{aligned}
$$

Hereafter, the overbar denotes a polynomial in z^{-1}, i.e. \bar p(z) equals p(z^{-1}). The parameter identifiability (PI) problem can be formulated as follows. Given the spectral density matrix above, can we uniquely derive the coefficients of the polynomials involved? A convenient source of information for the PI analysis turns out to be the determinant of the matrix W, the numerator and denominator of which are given by

$$
N_{\det} = G_1\bar G_1 G_2\bar G_2 (A_1A_2 - B_1B_2)(\bar A_1\bar A_2 - \bar B_1\bar B_2)
\qquad (3.10)
$$

and

$$
D_{\det} = F_1\bar F_1 F_2\bar F_2 A_1\bar A_1 A_2\bar A_2,
\qquad (3.11)
$$

respectively.
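As a numerical illustration of Eqs. (3.1)-(3.11), the sketch below evaluates the output spectral matrix on the unit circle directly from the model polynomials; for real coefficients, the barred polynomial evaluates to the complex conjugate of the unbarred one there. The polynomial values are the illustrative ones from the simulation sketch above, not results from the paper.

```python
import numpy as np

def pol(c, w):
    """Evaluate p(z) = c[0] + c[1]*z + ... at z = exp(-1j*w)."""
    z = np.exp(-1j * w)
    return sum(ck * z**k for k, ck in enumerate(c))

w = np.linspace(0, np.pi, 512)
F1, G1, F2, G2 = [1, -1.3, 0.7], [1, 0.2], [1, 1.3, 0.7], [1, -0.2]
A1, B1, A2, B2 = [1, -0.6], [0.7, -0.2], [1, 0.5], [0.7, 0.3]

P = np.abs(pol(G1, w))**2 * np.abs(pol(F2, w))**2   # G1*G1bar*F2*F2bar
Q = np.abs(pol(G2, w))**2 * np.abs(pol(F1, w))**2   # G2*G2bar*F1*F1bar

# Eqs. (3.2)/(3.6): first diagonal element of W
N11 = P * np.abs(pol(A1, w))**2 + Q * np.abs(pol(B1, w))**2
D11 = np.abs(pol(F1, w))**2 * np.abs(pol(F2, w))**2 * np.abs(pol(A1, w))**2
W11 = N11 / D11

# Eqs. (3.10)/(3.11): numerator and denominator of det W
Ndet = (np.abs(pol(G1, w))**2 * np.abs(pol(G2, w))**2 *
        np.abs(pol(A1, w) * pol(A2, w) - pol(B1, w) * pol(B2, w))**2)
Ddet = np.abs(pol(F1, w) * pol(F2, w) * pol(A1, w) * pol(A2, w))**2
detW = Ndet / Ddet
```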
The following generically valid condition is typical of an identifiability analysis.

Assumption C1. No cancellations occur either in the elements of the spectral matrix or in its determinant. Furthermore, the determinant of the spectral matrix W is assumed not to be identically zero.

Note that according to Assumption C1, the polynomial B_1B_2 has no roots in common with A_1A_2; otherwise cancellations would occur in the ratio of Eqs. (3.10) and (3.11). We start the analysis with the following general observation. Due to the stability assumption along with Assumption C1 above, we can from Eqs. (3.6), (3.9) and (3.11) uniquely identify the polynomials A_1, A_2 and the product F_1F_2. We now proceed with a lemma.

Lemma 1. Under Assumption C1, the system given by Eq. (2.1) is PI if the polynomial B_1 or B_2 is known.

Proof. Due to the symmetry of the problem, it suffices to show Lemma 1 for the case of known B_1.
The following polynomial relation follows from Eqs. (3.2), (3.3), (3.4) and (3.5):

$$
B_2 = A_1\,\frac{\bar A_2 N_{21} - \bar B_1 N_{22}}{\bar A_2 N_{11} - \bar B_1 N_{12}},
\qquad (3.12)
$$

so that B_2 is PI. Now, as

$$
N_{\det} = G_1\bar G_1 G_2\bar G_2 (A_1A_2 - B_1B_2)(\bar A_1\bar A_2 - \bar B_1\bar B_2)
\qquad (3.13)
$$

and G_1 and G_2 are minimum phase by assumption, it follows that the product G_1G_2 is PI. In addition, as

$$
\frac{B_2 N_{12} - A_1 N_{22}}{\bar A_2 (B_1B_2 - A_1A_2)} = G_2\bar G_2 F_1\bar F_1,
\qquad (3.14)
$$

it follows that the product G_2F_1 is PI. As F_1F_2, G_1G_2 and G_2F_1 are PI, it follows that F_1, F_2, G_1 and G_2 are PI, which concludes the proof of Lemma 1. □

By using Lemma 1, we can demonstrate the following propositions, along a line of arguments inspired by [18].

Proposition 1. Under Assumption C1, the system given by Eq. (2.1) is PI if the filters of both channels have more poles than zeros.

Proof. Let z_i be a root of A_1 with multiplicity k. The polynomial relation (3.12) can be rearranged as

$$
A_1(\bar B_1 N_{22} - \bar A_2 N_{21}) = B_2(\bar B_1 N_{12} - \bar A_2 N_{11}).
\qquad (3.15)
$$

Since the root z_i of A_1 has multiplicity k, we know that

$$
0 = \frac{d^n}{dz^n}\bigl[A_1(\bar B_1 N_{22} - \bar A_2 N_{21})\bigr]\Big|_{z=z_i}
\quad \text{for } n = 0, 1, \ldots, k-1.
\qquad (3.16)
$$

Next, we show by induction that

$$
0 = (\bar B_1 N_{12} - \bar A_2 N_{11})^{(j)}\big|_{z=z_i}
\quad \text{for } j = 0, 1, \ldots, m \text{ where } m \le k-1.
\qquad (3.17)
$$

1. m = 0 is obvious, since B_2(z_i) in Eq. (3.15) differs from zero according to Assumption C1.
2. Assume it is true for m = M < k-1. Then for M+1 we have

$$
0 = \frac{d^{M+1}}{dz^{M+1}}\bigl[B_2(\bar B_1 N_{12} - \bar A_2 N_{11})\bigr]\Big|_{z=z_i}
\qquad (3.18)
$$

$$
= \sum_{j=0}^{M+1}\binom{M+1}{j}\, B_2^{(M+1-j)}\,(\bar B_1 N_{12} - \bar A_2 N_{11})^{(j)}\Big|_{z=z_i}
\qquad (3.19)
$$

$$
= B_2\,(\bar B_1 N_{12} - \bar A_2 N_{11})^{(M+1)}\Big|_{z=z_i}.
\qquad (3.20)
$$

As B_2(z_i) ≠ 0, the second part of the induction is demonstrated. From Eq. (3.17) it follows that at z = z_i

$$
(\bar A_2 N_{11})^{(j)} = (\bar B_1 N_{12})^{(j)}
\quad \text{for } j = 0, 1, \ldots, m \text{ where } m \le k-1.
\qquad (3.21)
$$

By rewriting Eq. (3.21) we obtain

$$
(\bar A_2 N_{11})^{(j)}\big|_{z=z_i}
= \sum_{l=0}^{j}\binom{j}{l}\,\bar B_1^{(j-l)} N_{12}^{(l)}\Big|_{z=z_i}
\quad \text{for } j = 0, 1, \ldots, m \text{ where } m \le k-1.
\qquad (3.22)
$$

Eq. (3.22) can be used to form a system of linear equations. Each equation in that system corresponds to a j in Eq. (3.22). This can be stated as a matrix equation having the following form:

$$
l = Mb,
\qquad (3.23)
$$

where M is a lower left triangular matrix with N_{12}(z_i) on the diagonal, l a vector in which the jth row is the left-hand side of Eq. (3.22), and b the vector of \bar B_1 and its derivatives at z = z_i. As N_{12}(z_i) differs from zero, M is non-singular. Eq. (3.23) is a matrix formulation of Eq. (3.22). For each root z_i of A_1, a value of \bar B_1 for that same z = z_i is specified by the uppermost row of Eq. (3.23). If A_1 has multiple roots, values of the derivatives of \bar B_1 for z = z_i are specified by the consecutive rows of Eq. (3.23). Thus, it follows from Eq. (3.23) that we can obtain values of \bar B_1 in as many points as we have unique roots of A_1, and sufficiently many derivatives at each multiple root of A_1. This is sufficient in order to specify B_1 completely. Thus, B_1 is PI under the assumption of this proposition, and it follows from Lemma 1 that the system is PI. □

Proposition 2. Under Assumption C1, the system given by Eq. (2.1) is generically parameter identifiable (GPI) if one of the filters of the channel has more poles than zeros (and we do not necessarily know which one).

(For a definition of GPI see [19]. Roughly speaking, GPI means that the PI property fails to hold only on a zero measure set in the set of all possible systems.)

Proof. Assume that the degree of A_1 is less than or equal to the degree of B_1. Then by using the argument of Proposition 1, we would estimate a polynomial \hat B_1 which differs from the true polynomial B_1. However, we will generically be able to detect such an error by making use of the equation

$$
B_2 = A_1\,\frac{\bar A_2 N_{21} - \bar B_1 N_{22}}{\bar A_2 N_{11} - \bar B_1 N_{12}},
\qquad (3.24)
$$

which can simply be derived from Eqs. (3.2)-(3.11). Using Eq. (3.24), but inserting \hat B_1 rather than B_1, yields an expression that in the generic case does not evaluate to a polynomial. We will thus be directed towards using the filter B_2/A_2, which will yield the correct identification of the system as shown by Proposition 1. □

Proposition 3. Under Assumption C1, the system given by Eq. (2.1) is PI if the sources are purely autoregressive, and the filter A_1A_2 - B_1B_2 is minimum phase and of degree larger than those of B_1 and B_2.

Proof. From Eq. (3.10), it follows that the roots of A_1A_2 - B_1B_2 can be determined, as G_1 and G_2 are
scalars. Let us study the polynomial ratios

$$
R_1 = A_1\frac{N_{21}}{N_{11}}
= A_1\,\frac{G_1\bar G_1 F_2\bar F_2 \bar A_1 B_2 + G_2\bar G_2 F_1\bar F_1 A_2\bar B_1}
{G_1\bar G_1 F_2\bar F_2 A_1\bar A_1 + G_2\bar G_2 F_1\bar F_1 B_1\bar B_1}
\qquad (3.25)
$$

and

$$
R_2 = A_2\frac{N_{12}}{N_{22}}
= A_2\,\frac{G_1\bar G_1 F_2\bar F_2 A_1\bar B_2 + G_2\bar G_2 F_1\bar F_1 \bar A_2 B_1}
{G_1\bar G_1 F_2\bar F_2 B_2\bar B_2 + G_2\bar G_2 F_1\bar F_1 A_2\bar A_2}.
\qquad (3.26)
$$

If z_i is a root of A_1A_2 - B_1B_2 then, by inserting A_1A_2 = B_1B_2 in Eq. (3.25), a simple calculation yields

$$
R_1(z_i) = B_2(z_i),
\qquad (3.27)
$$

and similarly, from Eq. (3.26),

$$
R_2(z_i) = B_1(z_i).
\qquad (3.28)
$$

By using the same argument as in Proposition 1 to cover the case of multiple roots of A_1A_2 - B_1B_2, we conclude that B_1 and B_2 can be identified, and by using Lemma 1 we can then identify G_1, G_2, F_1 and F_2. Thus, the system is PI. □

Remark 1. The minimum phase requirement on A_1A_2 - B_1B_2 is elaborated upon in the next section, where it is shown to be a necessary and sufficient condition for signal recoverability.

Compared to Proposition 2, the previous Proposition 3 is important in that it includes the case of purely moving average channels, of particular interest in communication applications. In such applications the sources are speech signals which can be well approximated by AR models (cf. the assumption made in Proposition 3).

An assumption common to Propositions 1-3 is that the channel is not static. The following proposition demonstrates the necessity of such an assumption for the parameter identifiability of the system. This proposition was also stated in [21], but it is included since it is a somewhat different way to study identifiability for distinct power spectra.

Proposition 4. The system given by Eq. (2.1), with a static channel, is only locally parameter identifiable (LPI), with exactly two solutions, if the sources are colored differently (i.e. G_1/F_1 is not proportional to G_2/F_2) and the product of the channel gains differs from unity.

Proof. It follows from the expression for the numerator of the determinant and the minimum-phase assumption on G_1 and G_2 that the roots, if any, of the product G_1G_2 are identifiable. Recall that the product F_1F_2 is also (always) identifiable. Let z_i be a root of G_1G_2F_1F_2 (which has been assumed to have at least one root). Consider the polynomial ratios

$$
R_1 = \frac{N_{11}}{N_{12}}
= \frac{G_1\bar G_1 F_2\bar F_2 + G_2\bar G_2 F_1\bar F_1\, b_1^2}
{G_1\bar G_1 F_2\bar F_2\, b_2 + G_2\bar G_2 F_1\bar F_1\, b_1}
\qquad (3.29)
$$

and

$$
R_2(k) = \frac{N_{12} - kN_{22}}{N_{11} - kN_{12}},
\qquad (3.30)
$$

where b_1 and b_2 are the channel gains. Due to the assumption made, it is possible to find a root z_i where N_{12} differs from zero. The following two possibilities exist for the value of Eq. (3.29) at the root z_i:

$$
R_1(z_i) = b_1 \quad \text{if } z_i \text{ is a root of } F_2G_1,
\qquad (3.31a)
$$

$$
R_1(z_i) = \frac{1}{b_2} \quad \text{if } z_i \text{ is a root of } F_1G_2.
\qquad (3.31b)
$$

Assume that the root belongs to F_2G_1; then Eq. (3.29) results in the scalar b_1. Eq. (3.30) evaluates to b_2 when inserting k = b_1. Thus, one solution is the pair (b_1, b_2). Now, if the root z_i belongs to F_1G_2 then Eq. (3.29) results in the scalar 1/b_2. Inserting k = 1/b_2 in Eq. (3.30) results in 1/b_1. Thus, a second solution is the pair (1/b_2, 1/b_1). Then it follows after a straightforward calculation that
there are exactly two possible solutions to the identification problem:

{F_1, F_2, G_1, G_2, b_1, b_2}  (case a),
{F_2, F_1, G_2 b_1, G_1 b_2, 1/b_2, 1/b_1}  (case b).

This concludes the proof of Proposition 4. □

Remark 2. Assumption C1 is not used to prove Proposition 4, which makes this proposition more general.

Remark 3. Using the channel gains of the two solutions found in the proof of Proposition 4 we find

(case a) x̂_1 = x_1 and x̂_2 = x_2,
(case b) x̂_1 = b_1 x_2 and x̂_2 = b_2 x_1.

In case a the estimates equal the source signals in a logical order. However, in case b the estimated signals have swapped the source signals. Thus, case b specifies the meaning of the ‘channel flop’ solution.

Next we show that the assumption of sources with different colors is not only sufficient for the system with static channels to be LPI, but also necessary. This result was given in [21] but not explicitly proven.

Proposition 5. In the case of G_1/F_1 proportional to G_2/F_2 and static channels, the system given by Eq. (2.1) is not even locally identifiable.

Proof. Let

$$
\frac{G_1}{F_1} = k_1 R
\qquad (3.32a)
$$

and

$$
\frac{G_2}{F_2} = k_2 R,
\qquad (3.32b)
$$

where R has a monic polynomial also in the numerator. It follows that the spectral matrix (3.1) can be written as

$$
W = R\bar R
\begin{bmatrix}
k_1^2 + k_2^2 b_1^2 & k_1^2 b_2 + k_2^2 b_1 \\
k_1^2 b_2 + k_2^2 b_1 & k_1^2 b_2^2 + k_2^2
\end{bmatrix},
\qquad (3.33)
$$

so that R is PI. Identifiability is thus determined by the solution set of the following equations:

$$
w_{11} = k_1^2 + k_2^2 b_1^2,
\qquad (3.34a)
$$
$$
w_{12} = w_{21} = k_1^2 b_2 + k_2^2 b_1,
\qquad (3.34b)
$$
$$
w_{22} = k_1^2 b_2^2 + k_2^2,
\qquad (3.34c)
$$

where w_{11}, w_{12} and w_{22} can be identified from the spectral matrix W and the rational transfer function R. The solution to Eqs. (3.34a)-(3.34c) can be written as

$$
b_2 = \frac{w_{12} - w_{22} b_1}{w_{11} - w_{12} b_1},
\qquad (3.35a)
$$
$$
k_1^2 = \frac{w_{11} - w_{12} b_1}{1 - b_1 b_2}
\qquad (3.35b)
$$

and

$$
k_2^2 = w_{22} - k_1^2 b_2^2,
\qquad (3.35c)
$$

where b_1 is free to vary, with the restriction that the divisions in Eqs. (3.35a) and (3.35b) are valid and that k_1^2 and k_2^2 must be non-negative. Thus, for cases where k_1^2 ≥ ε, k_2^2 ≥ ε, ε > 0, there exists a neighborhood of the point {b_1, b_2, k_1^2, k_2^2} where Eqs. (3.35a), (3.35b) and (3.35c) have a one-dimensional infinite solution set. This shows that the problem is not locally identifiable. □

A summary of the previously derived system identifiability results is presented in Table 1.

Table 1. Sufficient conditions for parameter identifiability from second-order statistics.

Type of channel  | Identifiability result | Exact statement
Static channel   | The system is LPI iff the sources have different color. | Propositions 4 and 5
Dynamic channel  | The system is PI if both channel filters have poles in excess of zeros. | Proposition 1
Dynamic channel  | It is still PI even if only one channel filter has poles in excess of zeros, provided we know which one; if we do not, then the system is GPI. | Propositions 1 and 2
Dynamic channel  | It is PI if the sources are AR and A_1A_2 - B_1B_2 is minimum phase. | Proposition 3
4. Signal recoverability

In applications, the goal is often to recover the source signals, i.e. x_1 and x_2 in Fig. 1. It immediately follows from the figure that

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 1 & B_1/A_1 \\ B_2/A_2 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
\qquad (4.1)
$$

so that x_1 and x_2 are recoverable iff the transfer matrix in Eq. (4.1) is causally invertible. This condition is equivalent to the condition that the filter

$$
1 - \frac{B_1B_2}{A_1A_2}
\qquad (4.2)
$$

has a stable inverse, which it has if and only if A_1A_2 - B_1B_2 is minimum phase. It is of interest to note that the Rouché theorem [14] gives a sufficient condition for A_1A_2 - B_1B_2 to be minimum phase. For the reader's convenience, we first state this theorem.

Theorem (Rouché). If two functions f(z) and g(z) are analytic on and inside a simple, closed path C, and |f(z)| > |g(z)| on C, then f(z) and f(z) + g(z) have the same number of zeros inside C.

Now, let C be the unit circle and

$$
f(z) = A_1(z)A_2(z)
\qquad (4.3)
$$

and

$$
g(z) = -B_1(z)B_2(z).
\qquad (4.4)
$$

As A_1(z)A_2(z) has no zeros inside C, then neither has A_1(z)A_2(z) - B_1(z)B_2(z) if

$$
\left|\frac{B_1(z)B_2(z)}{A_1(z)A_2(z)}\right|_{z = e^{j\omega}} < 1
\quad \text{for } \omega \in [0, 2\pi].
\qquad (4.5)
$$

We have thus proven the following proposition.

Proposition 6. If the channel is such that the product of the channel frequency gains is less than unity for all frequencies, then the source signals are recoverable.

The previous condition (4.5) on the channel dynamic gains is a very reasonable one in most applications. Note that this condition is in fact only sufficient for recoverability, i.e. A_1A_2 - B_1B_2 may be minimum phase even if |A_1A_2| ≤ |B_1B_2| somewhere on the unit circle (as shown by some simple examples, which we omit in the interest of brevity).
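As a quick numerical check of signal recoverability, one can inspect the zeros of A_1A_2 - B_1B_2 and the sufficient condition (4.5) on a frequency grid. A minimal sketch follows; the channel coefficients are again illustrative placeholders, not values from the paper.

```python
import numpy as np

# Channel polynomials in the delay operator, [1, a1, ...] and [b0, b1, ...].
A1, B1 = [1, -0.6], [0.7, -0.2]
A2, B2 = [1,  0.5], [0.7,  0.3]

AA = np.convolve(A1, A2)                  # coefficients of A1*A2
BB = np.convolve(B1, B2)                  # coefficients of B1*B2
BB = np.pad(BB, (0, len(AA) - len(BB)))   # align lengths before subtracting

# A stable inverse of 1 - B1*B2/(A1*A2) requires A1*A2 - B1*B2, read as a
# transfer-function denominator [c0, c1, ...], to have all np.roots strictly
# inside the unit circle.
recoverable = np.all(np.abs(np.roots(AA - BB)) < 1)

# Sufficient condition (4.5): |B1*B2 / (A1*A2)| < 1 on the unit circle.
w = np.linspace(0, 2 * np.pi, 1024, endpoint=False)
z = np.exp(-1j * w)
val = lambda c: np.polyval(c[::-1], z)    # evaluate sum_k c[k] * z**k
cond45 = np.all(np.abs(val(BB) / val(AA)) < 1)
print(recoverable, cond45)
```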
5. Parameter estimation

The observed output vector, [y_1(t) y_2(t)]^T, of the TITO system under study is a bivariate ARMA signal, the unknown parameters of which can be estimated by several methods (see, e.g. [9,19]). From the class of available parameter estimation methods we choose the prediction error method (PEM), for two reasons:
1. PEM is a most general approach which, under weak conditions, guarantees the convergence of the parameter estimates, as the sample length increases, to the set of points for which the model spectral density matrix equals the true spectral density. According to the previous analysis of identifiability, the abovementioned set consists of a single point (corresponding to the true parameter values), under reasonable conditions. This means that the PEM parameter estimates are consistent (i.e. the true parameter values minimize the asymptotic PEM cost function) [9,19].
2. Under a Gaussian hypothesis, the PEM estimates are also statistically efficient. If the Gaussian hypothesis is relaxed, the PEM is still the
most accurate method in the fairly large class of estimates whose asymptotic properties do not depend on the data distribution [9,19].

The use of PEM for the estimation of the unknown parameters in Eq. (2.1) is straightforward, owing to the fact that the model (2.1) requires only a minor modification to become a PE generating equation, as described in the next proposition.

Proposition 7. Let, for k = 1, 2,

$$
p_k = G_k(0), \qquad b_k = B_k(0),
$$

and introduce the following scaled versions of {G_k} and {m_k}:

$$
G_k := G_k/p_k, \qquad \tilde m_k = p_k m_k.
$$

(For convenience we use the same notation for the scaled and un-scaled {G_k} polynomials.) Rewrite Eq. (2.1) in the following equivalent form (assuming b_1b_2 ≠ 1):

$$
y(t) = H(z)\,e(t), \qquad
H(z) = \begin{bmatrix} \dfrac{G_1}{F_1} & \dfrac{B_1}{A_1}\dfrac{G_2}{F_2} \\[4pt] \dfrac{B_2}{A_2}\dfrac{G_1}{F_1} & \dfrac{G_2}{F_2} \end{bmatrix}
\begin{bmatrix} 1 & b_1 \\ b_2 & 1 \end{bmatrix}^{-1},
\qquad
e(t) = \begin{bmatrix} 1 & b_1 \\ b_2 & 1 \end{bmatrix}
\begin{bmatrix} \tilde m_1(t) \\ \tilde m_2(t) \end{bmatrix}.
\qquad (5.1)
$$

Assume that A_1A_2 - B_1B_2 is minimum phase. Then Eq. (5.1) is a PE model of the data y(t), which means that it satisfies [9,19]:

e(t) is white noise, (5.2)
H(0) = I, (5.3)
H is stable, (5.4)
H^{-1} is stable. (5.5)

Proof. Conditions (5.2) and (5.3) obviously hold true. The rational matrix H is stable by assumption. Finally, the condition on A_1A_2 - B_1B_2 guarantees that H^{-1} is also stable. □

It is readily verified that the covariance matrix of the prediction errors {e} is given by

$$
K = E(ee^T) =
\begin{bmatrix} p_1^2 + b_1^2 p_2^2 & b_2 p_1^2 + b_1 p_2^2 \\ b_2 p_1^2 + b_1 p_2^2 & p_2^2 + b_2^2 p_1^2 \end{bmatrix}.
\qquad (5.6)
$$

The prediction errors can (asymptotically) be obtained from Eq. (5.1) as

$$
e(t) = H^{-1}(z)\,y(t) = G(z)\,y(t),
\qquad (5.7)
$$

where

$$
G(z) = \begin{bmatrix} 1 & b_1 \\ b_2 & 1 \end{bmatrix}
\begin{bmatrix} F_1/G_1 & 0 \\ 0 & F_2/G_2 \end{bmatrix}
\Gamma(z)
\qquad (5.8)
$$

and where

$$
\Gamma(z) = \frac{1}{A_1A_2 - B_1B_2}
\begin{bmatrix} A_1A_2 & -A_2B_1 \\ -A_1B_2 & A_1A_2 \end{bmatrix}.
\qquad (5.9)
$$

We parameterize the PE model (5.7) (or, equivalently, Eq. (5.1)) by the elements of the matrix K and the vector θ, which contains the coefficients of

$$
\{A_1, A_2, B_1, B_2, F_1, F_2, G_1, G_2\}.
\qquad (5.10)
$$

Note that we use three parameters to describe K, whereas it depends only on two independent variables in addition to θ, namely p_1^2 and p_2^2 (cf. Eq. (5.6)). It turns out that this way of parameterizing K is computationally advantageous (see [9,19] and below). The price to be paid for that advantage is a (most likely, minor) degradation in the statistical performance of the PEM, by the parsimony principle.

The PEM obtains parameter estimates by minimizing the following function with respect to θ:

$$
\sum_{s=1}^{t} \lambda^{t-s}\, e^T(s,\theta)\, \hat K_s^{-1}\, e(s,\theta),
\qquad (5.11)
$$

where now e is to be seen as a function of the indeterminate θ. In Eq. (5.11), λ ∈ (0,1] is a forgetting factor (the more slowly time varying the system under study is, the closer to 1 is λ selected); and \hat K_s is
an estimate of the covariance matrix K, obtained from s samples, for example

$$
\hat K_s = (1-\lambda)\sum_{k=1}^{s} \lambda^{s-k}\, e(k,\theta)\,e^T(k,\theta)
\qquad (5.12)
$$

(for λ = 1, the factor (1-λ) above is to be replaced by 1/s). The forgetting factors in Eqs. (5.11) and (5.12) need not be the same.

As in most applications the source separation problem must be solved in ‘real time’, we focus on the recursive PEM (RPEM) estimation of the unknown parameter vector θ. The recursive (in t) minimizer of the criterion in Eq. (5.11) can be obtained by the following set of Gauss-Newton recursions (see [9,19]):

$$
\begin{aligned}
\hat\theta(t) &= \hat\theta(t-1) + K(t)\,e(t),\\
K(t) &= P(t-1)\psi(t)\,[\lambda\hat K_t + \psi^T(t)P(t-1)\psi(t)]^{-1},\\
P(t) &= \{P(t-1) - P(t-1)\psi(t)\,[\lambda\hat K_t + \psi^T(t)P(t-1)\psi(t)]^{-1}\psi^T(t)P(t-1)\}/\lambda,
\end{aligned}
\qquad (5.13)
$$

where \hat\theta(t) denotes the parameter estimate at time instant t, e(t) is an on-line approximation of e(t, \hat\theta(t-1)), and ψ(t) is an on-line approximation of the Jacobi matrix

$$
-\frac{\partial e^T(t,\theta)}{\partial\theta}\bigg|_{\theta = \hat\theta(t-1)}.
\qquad (5.14)
$$

The matrix P(t), obtained as a by-product of the calculation in Eq. (5.13), can be shown to equal the covariance matrix of the estimation error in \hat\theta(t) (for large values of t). The estimate of the noise covariance matrix, appearing in Eq. (5.13), is easily updated as follows:

$$
\hat K_t = \lambda\hat K_{t-1} + (1-\lambda)\,e(t)\,e^T(t).
\qquad (5.15)
$$

A simple way to initialize the recursive computations in Eq. (5.13) is as follows (see [9,19]): \hat\theta(0) = an initial estimate of θ (e.g. \hat\theta(0) = 0), and P(0) = cI, where c > 0 (the more uncertain \hat\theta(0), the larger c).
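A minimal sketch of one Gauss-Newton step of Eqs. (5.13) and (5.15), assuming the prediction error eps (2-vector) and the Jacobian approximation psi (n x 2) have already been computed for the current sample; the variable names are illustrative, not taken from the paper.

```python
import numpy as np

def rpem_step(theta, P, Khat, eps, psi, lam):
    """One recursion of Eq. (5.13) plus the covariance update (5.15).

    theta : (n,)  current parameter estimate
    P     : (n,n) estimation-error covariance
    Khat  : (2,2) prediction-error covariance estimate
    eps   : (2,)  prediction error e(t)
    psi   : (n,2) minus the gradient of e^T(t, theta) w.r.t. theta
    lam   : forgetting factor in (0, 1]
    """
    S = lam * Khat + psi.T @ P @ psi                     # 2x2 matrix
    K = P @ psi @ np.linalg.inv(S)                       # gain K(t), (n,2)
    theta = theta + K @ eps                              # parameter update
    P = (P - K @ psi.T @ P) / lam                        # covariance update
    Khat = lam * Khat + (1 - lam) * np.outer(eps, eps)   # Eq. (5.15)
    return theta, P, Khat

# Initialization as suggested above: theta(0) = 0, P(0) = c*I with c > 0.
n = 12
theta, P, Khat = np.zeros(n), 100.0 * np.eye(n), np.eye(2)
```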
The on-line computation of e(t, \hat\theta) and of Eq. (5.14) can be done as outlined in the next section (also see [9,19]).

To complete the description of the RPEM algorithm, we must derive equations for the elements of the Jacobi matrix (5.14), which is done in the remaining part of the present section. Let a denote a generic element of θ. As, from Eq. (5.7),

$$
\frac{\partial e(t,\theta)}{\partial a} = \frac{\partial G(z,\theta)}{\partial a}\, y(t),
\qquad (5.16)
$$

it follows that we need expressions for ∂G(z,θ)/∂a. To illustrate the derivation of these expressions, we let a be the ith coefficient of A_1, a = a_{1,i}. A direct differentiation of Eqs. (5.8) and (5.9) shows that this derivative factors as the delay z^i times a rational matrix that does not depend on i; we write

$$
\frac{\partial G(z,\theta)}{\partial a} \triangleq -z^{i}\, G_1(z,\theta).
\qquad (5.17)
$$

Let

$$
e_1(t,\theta) = -G_1(z,\theta)\, y(t).
\qquad (5.18)
$$

Then, it follows from Eq. (5.17) that

$$
\frac{\partial e(t,\theta)}{\partial a} = e_1(t-i,\theta), \quad i = 1, 2, \ldots .
\qquad (5.19)
$$

The important conclusion is that it suffices to compute e_1(t) in order to obtain the derivatives of e(t,θ) with respect to all coefficients in A_1. Similar expressions and conclusions hold true for the derivatives with respect to the other elements of θ.

The RPEM algorithm (5.13) should be supplemented with a method for ensuring that the ‘signal recoverability’ condition holds true during the recursive computation of the parameter estimates (otherwise the computation of e(t) and ψ(t) becomes unstable). The stability of {G_k} must also be ensured, see Eqs. (5.8) and (5.17), etc. While this ‘stability monitoring’ can be done, at the price of slightly increasing the computational burden (see, e.g. [9,19]), it appears to be necessary only in boundary cases, that is in instances where the true system is such that either A_1A_2 - B_1B_2 or {G_k}, or both, have roots close to the unit circle.
6. Signal estimation

Under the ‘signal recoverability’ condition, (4.2), the source signals are obtained from the observed outputs of the TITO system as
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \frac{1}{A_1A_2 - B_1B_2}
\begin{bmatrix} A_1A_2 & -A_2B_1 \\ -A_1B_2 & A_1A_2 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.
\qquad (6.1)
$$

Estimates of the signals, x_1 and x_2, can hence be derived by using the estimated filters, provided by the RPEM, instead of the unknown parameters in Eq. (6.1):

$$
\begin{bmatrix} \hat x_1 \\ \hat x_2 \end{bmatrix}
= \frac{1}{\hat A_1\hat A_2 - \hat B_1\hat B_2}
\begin{bmatrix} \hat A_1\hat A_2 & -\hat A_2\hat B_1 \\ -\hat A_1\hat B_2 & \hat A_1\hat A_2 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.
\qquad (6.2)
$$

The filtering in Eq. (6.2) is usually initialized with zeros, since the values of y_1 and y_2 prior to the observation interval are typically unknown. (Note that the closer the zeros of A_1A_2 - B_1B_2 are to the unit circle, the more signal samples are affected by an arbitrary initialization of the filtering in Eq. (6.1).) The filtering is to be performed in an on-line manner. To illustrate how this can be done (see [9,19] for more details), let

$$
z = \frac{B}{A}\, u,
\qquad (6.3)
$$

or, in difference equation form,

$$
z(t) + a_1 z(t-1) + \cdots + a_n z(t-n)
= b_0 u(t) + b_1 u(t-1) + \cdots + b_m u(t-m).
\qquad (6.4)
$$

A simple on-line approximation of z(t), using the current estimates \hat a(t-1), \hat b(t-1), can be obtained as follows:

$$
z(t) = -\hat a_1(t-1)z(t-1) - \cdots - \hat a_n(t-1)z(t-n)
+ \hat b_0(t-1)u(t) + \hat b_1(t-1)u(t-1) + \cdots + \hat b_m(t-1)u(t-m),
\qquad (6.5)
$$

where z(t-1), z(t-2), etc., are values previously computed by using \hat a(t-2), \hat b(t-2), etc.
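The on-line filtering of Eq. (6.5) with time-varying coefficient estimates can be sketched as follows; the class name and the way the estimates are supplied at each step are illustrative assumptions.

```python
import numpy as np
from collections import deque

class OnlineFilter:
    """Time-varying difference equation, Eq. (6.5): past outputs are kept
    as they were computed with the older estimates a_hat(t-2), etc."""

    def __init__(self, n, m):
        self.z_past = deque([0.0] * n, maxlen=n)            # z(t-1), ..., z(t-n)
        self.u_past = deque([0.0] * (m + 1), maxlen=m + 1)  # u(t), ..., u(t-m)

    def step(self, u_t, a_hat, b_hat):
        # a_hat = [a1, ..., an], b_hat = [b0, ..., bm], estimated at time t-1
        self.u_past.appendleft(u_t)
        z_t = -np.dot(a_hat, list(self.z_past)) + np.dot(b_hat, list(self.u_past))
        self.z_past.appendleft(z_t)
        return z_t
```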
7. Simulations

In this section we present simulation results to illustrate the theoretical developments in the previous sections. The source signals used in the simulations of the first two examples have been generated as two mutually independent realizations of zero-mean, unit-variance, white Gaussian noise, filtered through the ARMA source filters G_1/F_1 and G_2/F_2, respectively. These two source signals have been mixed according to Eq. (2.1) using zero initial conditions in all filters. To ensure the stationarity of the data sequences we have excluded the transient part of the output sequences y_1 and y_2. In Example 3 real-world signals have been used. These are signals which are acoustically mixed and recorded by two microphones.

The recursions for the parameters (5.13) and the covariance (5.12) employ a forgetting factor λ. This λ is time varying and increases towards 1 as λ(t) = 0.99^t · 0.95 + (1 - 0.99^t) in the first two examples. In Example 3, λ varies as λ(t) = 0.999^t · 0.995 + (1 - 0.999^t) · 0.999. Initial values for all parameters to be estimated were set to zero in every simulation. The prediction error covariance matrix K, Eq. (5.15), was initialized as the identity matrix, and the estimation error covariance matrix P(t), Eq. (5.13), was initialized by the identity matrix times 0.01 in the first two examples and by 0.001 times the identity matrix in Example 3.

Simulations have shown that the algorithm sometimes estimates a {G_k} which has roots on or outside the unit circle. When this occurs, the step size of the Gauss-Newton recursion (Eq. (5.13)) is repeatedly halved until the new parameter estimate \hat\theta(t) produces a stable {G_k} (see [10]).

The results from the simulations are presented as estimated filter coefficient values as functions of time (sample number). In the first two examples, 10 independent realizations are presented in each graph in a superimposed manner in order to visualize the behavior of the algorithm. The true values of the source and channel parameters are indicated as dotted lines. The figures also present the root mean square (rms) values of the difference between the true and the estimated source signals. These rms values are averaged over 10 realizations, smoothed over 100 samples using a Bartlett window, and normalized using the true source signals.
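The time-varying forgetting factor and the stability monitoring of {G_k} described above can be written compactly as follows. This is a sketch only: the exact step-halving loop is not spelled out in the paper, and extract_g_polys is a hypothetical helper that picks the G-filter coefficients out of the parameter vector.

```python
import numpy as np

def forgetting_factor(t, lam0=0.95, lam_inf=1.0, rho=0.99):
    """lambda(t) = rho**t * lam0 + (1 - rho**t) * lam_inf,
    i.e. it grows from lam0 towards lam_inf as t increases."""
    return rho**t * lam0 + (1 - rho**t) * lam_inf

def g_is_stable(g_coeffs):
    """True if the monic polynomial [1, g1, ...], read as transfer-function
    coefficients, has all of its np.roots strictly inside the unit circle."""
    return np.all(np.abs(np.roots(g_coeffs)) < 1)

def safe_update(theta_old, delta, extract_g_polys, max_halvings=20):
    """Halve the Gauss-Newton step until the estimated {G_k} are stable."""
    for _ in range(max_halvings):
        theta_new = theta_old + delta
        if all(g_is_stable(g) for g in extract_g_polys(theta_new)):
            return theta_new
        delta = delta / 2.0
    return theta_old
```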
7.1. Example 1

The system used in this section is PI according to Propositions 1 and 2. The signals used in this example were generated using the following parameter values:

A_1 = 1 + 0.9z + 0.4z², A_2 = 1 - 0.6z,
B_1 = 0.7 - 0.2z, B_2 = 0.7 + 0.3z,
F_1 = 1 - 1.3z + 0.7z², F_2 = 1 + 1.3z + 0.7z²,
G_1 = 1 + 0.2z, G_2 = 1 - 0.2z.

It is seen that the filter B_1/A_1 contains more poles than zeros while B_2/A_2 has equally many. In the RPEM algorithm both the source filters and the channel were modeled by using the true number of parameters. The simulation results are presented in Fig. 2.

Fig. 2. Parameter estimates and normalized rms values of both the residuals for Example 1. The dotted lines show the true parameter values.

7.2. Example 2

The system used in this section is PI according to Proposition 3. The signals were generated using the following parameter values:

A_1 = A_2 = 1, B_1 = 0.5 - 0.4z, B_2 = 0.5,
G_1 = G_2 = 1,
F_1 = 1 - 1.3z + 0.7z², F_2 = 1 + 1.3z + 0.7z².

Once again, both the source filters and the channel were modeled by using the true number of parameters. The simulation results are presented in Fig. 3.
Fig. 3. Parameter estimates and normalized rms values of both the residuals for Example 2. The dotted lines show the true parameter values.
7.3. Example 3

In this section real-world signals were used. The signals were acoustically mixed in an anechoic room and recorded by a modified mobile unit. This modified mobile unit has two microphones, one mounted at the lower end and one mounted at the upper end of the unit. The source signal corresponding to x_1 in Fig. 1 was a speech sequence from the TIMIT database (Texas Instruments, Inc., Massachusetts Institute of Technology) of a total length of 90 s. This sequence was stored on a DAT recorder used for stimulating an artificial mouth. The modified mobile unit was placed near the artificial mouth. The other source signal was noise, recorded inside a car, coming from a loudspeaker placed behind the mobile unit. This arrangement implies that the speech signal arrives at the first microphone some three samples before arriving at the second microphone (sampling rate 8000 Hz). In the same way the noise arrives at the second microphone before arriving at the first one. Thus, the channel can be modeled with causal filters. Other arrangements of the positions of the speech and noise sources might lead to the need for noncausal filters [16] in the model of the channel. See [16] for more details about similar real-data experiments.

The signals used are non-stationary, in conflict with the assumptions made in the analysis in the preceding sections. Also, speech signals typically require a very large number of parameters to be ‘correctly modeled’. Several different models of the source filters have been tested, but as one source signal contains speech it seems reasonable to model the sources as AR processes (see, e.g. [13]). In the present example, an AR(3) model was used for both sources. The results are presented in Fig. 4. It should be noted that for recovering the source signals it is only the values of the channel parameters that are important. The RPEM has also been run with larger degrees of the source filter models. However, this increase in the number of parameters appeared to result in an increased variance of the estimated parameters and a decrease in performance (larger rms error).

In Fig. 4 the rms error is presented for the separated signal s_1. The true source signal, x_1, has been taken as the speech sequence recorded by the modified mobile unit in the anechoic room without any noise present, see [16]. The reduction in the rms error between the observed signal y_1 and the separated signal s_1, after the transient part of the parameter estimates, varies between 0 and -10 dB according to Fig. 4, with a mean of -4.9 dB. The instances where the reduction in the rms error is close to zero are when the speech signal is very strong.
Fig. 4. Parameter estimates and normalized rms values of one channel for real-world signals (Example 3).
8. Concluding remarks

In this paper we have shown that, under reasonable conditions, a TITO system identification approach to the source separation problem results in parameter identifiability from second-order statistics only. Previous work on PI has been published in [21]. In that paper, LPI is shown using second-order statistics and accepting a channel flop, but only for static channels. Those results are covered by Proposition 4 of this paper. The present paper shows PI for several scenarios of sources and channels. PI has not yet been shown for the general case of ARMA sources and ARMA channel filters. This is an area for further research.

One of the novelties presented, in a signal separation context, is the estimation of the filter coefficients for both the sources and the channel. To perform the signal separation, only estimation of the channel parameters is needed. In a paper by Feder et al. [5], the estimation of the spectral parameters of the source signals is also considered. In that paper, however, only FIR channel filters were allowed and the sources were assumed to be AR processes. These constraints are removed in the present paper and, with specified restrictions, ARMA filters are allowed for both the sources and the channel.

In the setup of the present paper, additive noise on the signals y_1 and y_2 is not considered. This is a limitation and it might be considered in the future.

The RPEM algorithm seems to perform well when all filter coefficients to be estimated are initialized at zero. Other initializations have occasionally caused convergence problems. One solution to these problems is to restart the algorithm with other initial values of the parameters.

The simulation of special interest is the one with real-world signals in Section 7.3. It appears that the RPEM algorithm can improve the signal quality, also in the case of underparameterization of the source filters. Extensive simulations have been performed with underparameterized sources for known source filters and channel. Good estimates of the channel were produced as long as the model of the sources was able to approximate the true source filters reasonably well. Hence, it may be possible that under mild conditions the channel is PI even though the sources are not exactly modeled.
References

[1] E. Bacharakis, A.K. Nandi, V. Zarzoso, Foetal ECG extraction using blind source separation methods, in: Proc. EUSIPCO'96, Vol. I, Trieste, Italy, 1996, pp. 395-398.
[2] A. Belouchrani, K. Abed Meraim, J.-F. Cardoso, É. Moulines, A blind source separation technique based on second-order statistics, IEEE Trans. Signal Process. 45 (2) (February 1997) 434-444.
[3] P. Comon, Independent component analysis, a new concept?, Signal Processing 36 (June 1994) 287-314.
[4] N. Delfosse, P. Loubaton, Adaptive separation of independent sources: A deflation approach, in: Proc. ICASSP'94, Vol. IV, Adelaide, Australia, 1994, pp. 41-44.
[5] M. Feder, A.V. Oppenheim, E. Weinstein, Maximum likelihood noise cancellation using the EM algorithm, IEEE Trans. Acoust. Speech Signal Process. ASSP-37 (February 1989) 204-216.
[6] A. Gorokhov, P. Loubaton, Multiple input multiple output ARMA systems: second-order blind identification for signal extraction, in: Proc. 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, Corfu, Greece, 1996, pp. 348-351.
[7] C. Jutten, J. Hérault, Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture, Signal Processing 24 (June 1991) 1-10.
[8] U. Lindgren, H. Broman, Source separation: Using a criterion based on second-order statistics, IEEE Trans. Signal Process. 46 (7) (July 1998) 1837-1850.
[9] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[10] L. Ljung, T. Söderström, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.
[11] E. Moulines, P. Duhamel, J.-F. Cardoso, S. Mayrargue, Subspace methods for the blind identification of multichannel FIR filters, IEEE Trans. Signal Process. 43 (2) (February 1995) 516-525.
[12] H.-L. Nguyen Thi, C. Jutten, Blind source separation for convolutive mixtures, Signal Processing 45 (1995) 209-229.
[13] L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ, 1978.
[14] W. Rudin, Real and Complex Analysis, McGraw-Hill, New York, 1966.
[15] H. Sahlin, H. Broman, Blind separation of images, in: Proc. 30th Asilomar Conf. on Signals, Systems, and Computers, 1996, pp. 109-113.
[16] H. Sahlin, H. Broman, Separation of real-world signals, Signal Processing 64 (1) (January 1998) 103-113.
[17] H. Sahlin, U. Lindgren, The asymptotic Cramér-Rao lower bound for blind signal separation, in: Proc. 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, Corfu, Greece, 1996, pp. 328-331.
[18] T. Söderström, Spectral decomposition with application to identification, Tech. Rep. UPTEC 7942R, Institute of Technology, Uppsala University, Sweden, 1979.
[19] T. Söderström, P. Stoica, System Identification, Prentice-Hall, London, UK, 1989.
[20] L. Tong, Y. Inouye, R. Liu, Waveform-preserving blind estimation of multiple independent sources, IEEE Trans. Signal Process. 41 (July 1993) 2461-2470.
[21] L. Tong, Y.R. Liu, V. Soon, Y. Huang, Indeterminacy and identifiability of blind identification, IEEE Trans. Circuits and Systems 38 (May 1991) 499-509.
[22] S. van Gerven, D. van Compernolle, Signal separation by symmetric adaptive decorrelation: stability, convergence and uniqueness, IEEE Trans. Signal Process. 43 (1995) 1602-1612.
[23] E. Weinstein, M. Feder, A.V. Oppenheim, Multi-channel signal separation by decorrelation, IEEE Trans. Speech Audio Process. 1 (October 1993) 405-413.
[24] D. Yellin, E. Weinstein, Criteria for multichannel signal separation, IEEE Trans. Signal Process. 42 (August 1994) 2158-2168.