Statistics and Probability Letters 78 (2008) 3075–3081
Contents lists available at ScienceDirect
Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro
Strong consistency of a family of model order selection rules for estimating 2D sinusoids in noise Mark Kliger a , Joseph M. Francos b,∗ a
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
b
Department of Electrical and Computer Engineering, Ben-Gurion University, Israel
article
info
Article history: Received 15 November 2005 Received in revised form 4 September 2007 Accepted 1 May 2008 Available online 18 May 2008
a b s t r a c t We consider the problem of jointly estimating the number as well as the parameters of 2D sinusoidal signals, observed in the presence of an additive white Gaussian noise field. In this paper we prove the strong consistency of a large family of model order selection rules. © 2008 Elsevier B.V. All rights reserved.
1. Introduction We consider the problem of jointly estimating the number as well as the parameters of 2D sinusoidal signals, observed in the presence of an additive white Gaussian noise field. This problem is, in fact, a special case of a much more general problem: From the 2D Wold-like decomposition we have that any 2D regular and homogeneous discrete random field can be represented as a sum of two mutually orthogonal components: a purely-indeterministic field and a deterministic one. In this correspondence we consider the special case where the deterministic component consists of a finite (unknown) number of sinusoidal components, while the purely-indeterministic component is assumed to be a white noise field. Many algorithms have been devised to estimate the parameters of sinusoids observed in additive white Gaussian noise. Most of these assume the number of sinusoids is a priori known. However this assumption does not always hold in practice. Over the past three decades the problem of model order selection for 1D signals has received considerable attention. In general, model order selection rules are based (directly or indirectly) on three popular criteria: Akaike information criterion (AIC) (Akaike, 1974), the minimum description length (MDL) (Rissanen, 1978), and the maximum a posteriori probability (MAP-BIC) criterion (Schwarz, 1978). All these criteria have a common form composed of two terms: a data term and a penalty term, where the data term is the log-likelihood function evaluated for the assumed model. Most of the papers that address the problem of model order selection are concerned with various models of 1D signals, while the problem of modeling multidimensional fields has received considerably less attention. Stoica et al. (1986) proposed a cross-validation selection rule and demonstrated its asymptotic equivalence to the Generalized Akaike Information Criterion (GAIC). The penalty term is given by kK (N ) where k is the number of model parameters, N is the length of the observed data vector, and K (N ) is some penalty term which is a function of N. Li and Stoica (1996) used this criterion to detect the number of sinusoids in 1D and 2D signals, using the same penalty term in both cases. This penalty parameter is chosen as K (N ) = c log log N where c > 2. They arrived at this choice of K (N ), by using consistency arguments based on Hannan (1980, 1981). However, Hannan (1980, 1981) proved the consistency of an order selection criterion for ARMA processes. These processes have an absolutely continuous spectral distribution. However, the model considered by Li and Stoica (1996) is that of sinusoids in noise, and hence has a discontinuous distribution. Moreover, for the data model of 1D
∗ Corresponding address: Department of Electrical and Computer Engineering, Ben-Gurion University, Beer-Sheva 84105, Israel. Tel.: +972 8 6461842; fax: +972 8 6472949. E-mail address:
[email protected] (J.M. Francos). 0167-7152/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2008.05.011
3076
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081
sinusoids observed in white noise, Quinn (1989) derived conditions for the strong consistency of any model order selection criterion. The penalty term of the criterion of Li and Stoica (1996), does not satisfy Quinn’s consistency conditions even for the 1D problem. Kliger and Francos (2005a) derived a maximum a posteriori (MAP) model order selection criterion for jointly estimating the number and the parameters of 2D sinusoids observed in the presence of an additive white Gaussian noise field. In this correspondence, we establish the strong consistency of a large family of model order selection rules, which includes the MAP based rule as a special case. 2. Notations and definitions Let {y(s, t )}, (s, t ) ∈ Ψ (S , T ) where Ψ (S , T ) = {(i, j)|0 ≤ i ≤ S − 1, 0 ≤ j ≤ T − 1}, be the observed 2D real valued random field such that y(s, t ) = h(s, t ) + u(s, t ).
(1)
The field {u(s, t )} is a 2D zero mean, white Gaussian field with finite variance σ 2 . The field {h(s, t )} is the harmonic random field. Assuming there are k sinusoidal components in the harmonic field we have h(s, t ) =
k X
Ci cos(sωi + t νi ) + Gi sin(sωi + t νi )
(2)
i=1
where (ωi , νi ) are the spatial frequencies of the ith harmonic. The Ci ’s and Gi ’s are the unknown amplitudes of the sinusoidal components in the observed realization. Let us define the following matrix notations: y = [y(0, 0) . . . y(0, T − 1) y(1, 0) . . . y(1, T − 1) . . . y(S − 1, T − 1)]T .
(3)
The vectors u and h are similarly defined. Rewriting (1) we have y = h + u. Also define ak = [C1 G1 C2 G2 . . . Ck Gk ]T .
(4)
Let ei = ej[0ωi +0νi ] ej[0ωi +1νi ] . . . ej[0ωi +(T −1)νi ] . . . ej[(S −1)ωi +(T −1)νi ]
T
(5)
and let us define the following ST × 2k matrix Dk = [Re(e1 ) Im(e1 ) Re(e2 )Im(e2 ) . . . Re(ek ) Im(ek )] .
(6)
Using the foregoing notations we have that y = Dk ak + u.
(7)
Let {Ψi } be a sequence of rectangles such that Ψi = {(s, t ) ∈ Z | 0 ≤ s ≤ Si − 1, 0 ≤ t ≤ Ti − 1}. 2
Definition 1. The sequence of subsets {Ψi } is said to tend to infinity (we adopt the notation Ψi → ∞) as i → ∞ if limi→∞ min(Si , Ti ) = ∞ and 0 < limi→∞ (Si /Ti ) < ∞. In the following, to simplify notations, we shall omit the subscript i. Thus, the notation Ψ (S , T ) → ∞ implies that both S and T tend to infinity as functions of i, and at roughly the same rate. Let θ k ∈ Θk denote the parameter vector of the harmonic field, i.e.,
θ k = [C1 G1 ω1 ν1 . . . Ck Gk ωk νk ]T
(8)
where for all l, Cl and Gl are real and bounded. Assume further that (ωl , νl ) ∈ ([0, 2π ) × [0, 2π )) \ (0, 0) where min(|ωl − ωj |) ≥ δ or min(|νl − νj |) ≥ δ for l 6= j, so that no two regressors coincide. Hence, the parameter space, Θk , is a subset of the 4k-dimensional Euclidian space. By the above assumption we further conclude that Dk has rank 2k, and that the corresponding 2k × 2k Gram matrix DTk Dk is of rank 2k as well. Let m denote the actual number of sinusoidal components in the observed field and let Ai =
q
Ci2 + G2i denote the
amplitude of the ith sinusoid. It is assumed that the amplitudes are strictly positive and bounded. For convenience, and without loss of generality, it is further assumed that the sinusoidal components are indexed according to a descending order of their amplitudes where A1 ≥ A2 ≥ · · · ≥ Am > 0. Let P⊥ k denote the projection matrix defined by T −1 T P⊥ Dk k = I2k − Dk (Dk Dk )
(9)
where I2k is a 2k × 2k identity matrix. Under the normality assumptions of the noise field, the maximum likelihood estimate (MLE) θˆ k of θ k is the same as the nonlinear least square estimate (LSE) and is obtained through minimization of the quadratic ⊥
⊥ ˆ ˆ form yT P⊥ k y. Let Pk denote the matrix Pk , with θ k substituted by θ k . Let p(k) denote the a priori probability of the kth model, where k denotes the unknown number of sinusoidal components in the data model given by (1) and (2).
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081
3077
It is assumed that there are Q competing models, where Q > m, and that each model is equiprobable. That is 1
p(k) =
Q
,
k ∈ ZQ
(10)
where ZQ = {0, 1, 2, . . . , Q − 1}. Following the MDL-MAP template, define the statistic ⊥
χξ (k) = ST log(yT Pˆ k y) + ξ k log ST ,
(11)
where ξ > 8 is a finite constant and k ∈ ZQ . The number of 2D sinusoids m is estimated by minimizing χξ (k) over k ∈ ZQ ⊥
ˆ = arg min{χξ (k)} = arg min{ST log(yT Pˆ k y) + ξ k log ST }. m
(12)
k∈ZQ
k∈ZQ
We finally note that the choice of ξ = 10 in (12) yields the MAP model order selection rule derived by Kliger and Francos (2005a), while the choice of ξ = 4 in (12) corresponds to a ‘‘naive’’ application of a model order selection rule based on the MDL principle (e.g. (Rissanen, 1978)). 3. Consistency of the criterion The objective of this section is to prove the almost sure consistency of the model order selection procedure in (12).
ˆ be given by (12) with ξ > 8. Then as Theorem 1. Let m denote the correct number of sinusoids in the field, and let m Ψ (S , T ) → ∞ ˆ →m m
a.s.
(13)
Proof. Let ⊥
σˆ k2 =
ˆk y yT P
(14)
ST
denote the variance of the residual field obtained after the removal of the k most dominant estimated sinusoidal components of the observed field. In the following all equalities and inequalities are in the almost sure sense as Ψ (S , T ) → ∞. Let Aˆ i denote the ML estimate of Ai . Hence, for k ≤ m we have as Ψ (S , T ) → ∞
σˆ k2 = σ 2 +
m 1X
2 i =1
A2i −
k 1X
2 i =1
Aˆ 2i + o(1) a.s.
(15)
(See Appendix A for the detailed derivation of (15)). Since under the normality assumption of the noise field, the maximum likelihood estimate θˆ k of θ k is the same as the nonlinear least square estimate, we conclude using Theorem 1, Kliger and Francos (2005b), that as Ψ (S , T ) → ∞, Aˆ i → Ai
a.s. i = 1 . . . k.
(16)
Hence, as Ψ (S , T ) → ∞
σˆ k2 → σ 2 +
m 1 X
2 i=k+1
A2i
a.s.
(17)
a.s.
(18)
and similarly
σˆ k2−1 → σ 2 +
m 1X
2 i=k
A2i
Using (14) we can write the criterion function in the next form
χξ (k) = ST log(ST σˆ k2 ) + ξ k log ST = ST log ST + ST log σˆ k2 + ξ k log ST .
(19)
Therefore, for k ≤ m,
χξ (k − 1) − χξ (k) = ST log ST + ST log σˆ k2−1 + ξ (k − 1) log ST − ST log ST − ST log σˆ k2 − ξ k log ST ! σˆ k2−1 = ST log − ξ log ST . σˆ k2
(20)
3078
Since
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081 log ST ST
tends to zero, as Ψ (S , T ) → ∞, then as Ψ (S , T ) → ∞
(ST )−1 (χξ (k − 1) − χξ (k)) → log 1 +
A2k m
2σ +
P
2
A2i
a.s.
(21)
i=k+1
A2k Pm 2 2σ + i=k+1 A2i
Since log 1 +
is strictly positive, we have χξ (k − 1) > χξ (k). Hence, for k ≤ m, the function χξ (k) is
monotonically decreasing with k. We next consider the case where k = m + l for any integer l ≥ 1. Let
2 u(s, t )e−j[sω+t ν] t =0
S −1 T −1 2 X X
Iu (ω, ν) =
ST s=0
(22)
denote the periodogram of the noise field (scaled by a factor of 2). Based on Theorem 2.2 in He (1995), we have that if {u(s, t )} is a 2D i.i.d. zero-mean real valued field with variance σ 2 such that E [u(0, 0)2 log |u(0, 0)|] < ∞ then sup Iu (ω, ν) ω,ν
lim sup
σ 2 log(ST )
Ψ (S ,T )→∞
≤ 8 a.s.
(23)
It is easy to check that indeed when {u(s, t )} is a Gaussian white noise field the condition E [u(0, 0)2 log |u(0, 0)|] < ∞ is satisfied. Indeed, E [u2 log |u|] = E [u2 log |u|1{|u|≥1} ] + E [u2 log |u|1{|u|<1} ], where E [u2 log |u|1{|u|≥1} ] ≤ E [u4 1{|u|≥1} ] ≤ 1
E [u4 ] = 3σ 4 , while for |u| < 1 the function u2 log |u| is bounded and has a minimum at |u| = exp− 2 and this minimum is −0.5e−1 . Thus, 0 ≥ E [u2 log |u|1{|u|<1} ] ≥ −0.5e−1 P ({|u| < 1}) > −∞. Since the noise field is Gaussian, the MLE θˆ k of θ k and the LSE of θ k are identical. Hence, we have that a.s. as Ψ (S , T ) → ∞
σˆ
2 m+l
= σˆ − 2 m
Ul ST
+o
log ST
(24)
ST
where Ul =
l X
Iu (ωi , νi )
(25)
i=1
is the sum of the l largest elements of the periodogram of the noise field {u(s, t )} (see Appendix B for a detailed derivation of (24)). Clearly Ul ≤ l sup Iu (ω, ν).
(26)
ω,ν
From Theorem 2, Rao et al. (1994), as Ψ (S , T ) → ∞
σˆ m2 → σ 2 a.s.
(27)
Using (11) similarly to (19) and (20), a.s. as Ψ (S , T ) → ∞,
χξ (m + l) − χξ (m) = ST log ST + ST log σˆ m2 +l + ξ (m + l) log ST − ST log ST − ST log σˆ m2 − ξ m log ST Ul log ST Ul = ξ l log ST + ST log 1 − + o = ξ l log ST − + o ( log ST ) (1 + o(1)) ST σ 2 ST σ2 l sup Iu (ω, ν) Ul ω,ν = (log ST ) ξ l − 2 + o(1) ≥ (log ST ) ξ l − + o(1) σ log ST σ 2 log ST = l(log ST ) ξ −
sup Iu (ω, ν) ω,ν
σ 2 log ST
+ o(1)
(28)
where the second equality is obtained by substituting σˆ m2 +l and σˆ m2 using the equalities (24) and (27), respectively. The third l equality is due to the property that for x → 0, log(1 + x) = x(1 + o(1)), where the observation that the term σ 2 ST tends to zero a.s. as Ψ (S , T ) → ∞ is due to (23).
U
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081
3079
Substituting (23) into (28) we conclude that
χξ (m + l) − χξ (m) > 0
(29)
for any integer l ≥ 1. Therefore, a.s. as Ψ (S , T ) → ∞, the function χξ (k) has a global minimum for k = m.
We have thus established the strong consistency of a family of model order selection rules for the problem of estimating 2D sinusoidal signals, observed in the presence of an additive white Gaussian noise field. The MAP model order selection rule derived by Kliger and Francos (2005a) belongs to this family of strongly consistent model order selection criteria. The consistency of the model order selection procedure in (12) for ξ ≤ 8 (that includes the MDL criterion) and the performance of the proposed rule in cases where the noise field is non-Gaussian or colored remain open problems. Appendix A
σˆ k2 =
yT y ST
ˆ ky yT P
−
=
ST
uT u ST
+
uT Dm am ST
+
aTm DTm u ST
+
aTm DTm Dm am ST
T
−
T
ˆ k Dˆ k aˆ k aˆ k D ST
(30)
where T
T
ˆ k Dˆ k )−1 Dˆ k y. aˆ k = (D
(31)
By the SLLN, a.s. as Ψ (S , T ) → ∞, uT u
→ σ 2.
ST
(32)
From Lemma 3, Rao et al. (1994) we have that a.s. as Ψ (S , T ) → ∞ uT Dm am ST
= o(1).
(33)
Recall that for ω ∈ (0, 2π ), s=0 exp(jωs) = O(1). Therefore, for each θ m ∈ Θm , as Ψ (S , T ) → ∞,
PS −1
[Ci Gi ] [Re(ei ) Im(ei )]T [Re(el ) Im(el )] [Cl Gl ]T ST
=
A2i
+ o(1), i = l o2(1), i 6= l
(34)
and then aTm DTm DTm am ST
=
m X A2 i
2
i =1
+ o(1).
(35)
Similarly, since θˆ k is the maximum likelihood estimate of θ k and θ k ∈ Θk T
T
T
ˆ k Dˆ aˆ k aˆ k D ST
=
k X Aˆ 2 i
i =1
2
+ o(1).
(36)
Substituting (32), (33), (35) and (36) into (30) we have (15). Appendix B Let (ω ˆ m+1 , υˆ m+1 ) denote the largest element of the periodogram of the noise field {u(s, t )}, i.e.,
(ωˆ m+1 , υˆ m+1 ) = arg max Iu (ω, υ)
(37)
(ω,υ)∈(0,2π)2
and let Aˆ m+1 =
S −1 T −1 2 XX
ST s=0 t =0
u(s, t )e−j(sωˆ m+1 +t υˆ m+1 ) .
(38)
ˆ m+1 = =(Aˆ m+1 ). Let eˆ i denote the vector ei defined in (5), with (ωi , νi ) substituted by (ωˆ i , νˆ i ). Let Cˆ m+1 = R(Aˆ m+1 ) and G
3080
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081
Using a straightforward extension of Theorem 2 in Kliger and Francos (2005b),1 we have that in the case where the model order is over-estimated the ML estimate contains a subvector that converges a.s. as Ψ (S , T ) → ∞ to the correct parameters of the sinusoidal signals, while the frequencies of the sinusoids that result from the over-estimated order assumption are assigned to the spatial frequencies that maximize the noise periodogram. Hence,
σˆ
2 m+1
yT y
=
ST
−
ˆ m+1 y yT P ST
=
yT y ST
T
−
T
ˆ m+1 Dˆ m+1 aˆ m+1 aˆ m+1 D ST
=
T
yT y
−
ST
T
ˆ m Dˆ m aˆ m aˆ m D ST
− M1 − M2
= σˆ m2 − M1 − M2
(39)
where M1 = 2
T [Cˆ m+1 Gˆ m+1 ] Re(ˆem+1 ) Im(ˆem+1 ) Dˆ m aˆ m
(40)
ST
and M2 =
T [Cˆ m+1 Gˆ m+1 ] Re(ˆem+1 ) Im(ˆem+1 ) Re(ˆem+1 ) Im(ˆem+1 ) [Cˆ m+1 Gˆ m+1 ]T ST
.
(41)
From (3), for ω ∈ (0, 2π ), as S → ∞, we have S −1 X
1
(S log S )
1 2
exp(jωs) = o(1)
(42)
s =0
and consequently S −1 1X
S s =0 Since
|Aˆ m+1 |2 2
exp(jωs) = o
=
log S
21
S
Iu ( ω ˆ m+1 ,ˆνm+1 ) , ST
.
(43)
by (23)
1 log ST 2 ˆ |Am+1 | = O .
(44)
ST
ˆ m+1 have the same order. Obviously the amplitudes Cˆ m+1 and G Due to the definition of the parameter space Θk , for each 1 ≤ i ≤ m, (ω ˆ m+1 , νˆ m+1 ) 6= (ωˆ i , νˆ i ). Therefore, using (43) and (44) we conclude that for each 1 ≤ i ≤ m T T [Cˆ m+1 Gˆ m+1 ] Re(ˆem+1 ) Im(ˆem+1 ) Re(ˆei )Im(ˆei ) [Cˆ i Gˆ i ]T ST
=o
log ST
(45)
ST
and consequently M1 = 2
T [Cˆ m+1 Gˆ m+1 ] Re(ˆem+1 ) Im(ˆem+1 ) Dˆ m aˆ m ST
=o
log ST ST
.
(46)
By a simple algebra, M2 =
|Aˆ m+1 |2 2
+
S −1 X T −1 ˆ m+1 X Cˆ m+1 G
ST
sin(2sω ˆ m+1 + 2t υˆ m+1 ) +
o
32
2ST
s=0 t =0
ˆ m+1 are both of an order O Since Cˆ m+1 and G log ST ST
S −1 X T −1 2 ˆ2 X Cˆ m +1 − Gm+1
log ST ST
12
cos(2sω ˆ m+1 + 2t υˆ m+1 ).
(47)
s=0 t =0
, we have using (43) that the sum of the last two terms of M2 is of order
. Therefore,
M2 =
|Aˆ m+1 |2 2
+o
log ST ST
32 =
Iu (ω ˆ m+1 , υˆ m+1 ) ST
+o
log ST ST
23 (48)
1 By Assumption 2 of Kliger and Francos (2005b), the frequency of the ith exponential satisfies (ω0 , ν 0 ) ∈ ((0, 2π) × (0, 2π)). However, this assumption i i is over-restrictive, as it can be shown that it is sufficient that (ωi0 , νi0 ) ∈ ([0, 2π) × [0, 2π)) \ (0, 0) for the results of Kliger and Francos (2005b) to hold.
M. Kliger, J.M. Francos / Statistics and Probability Letters 78 (2008) 3075–3081
where the first equality holds since |Aˆ m+1 |2 = O
log ST ST
, while the sum of the remaining two terms is o
decays to zero much faster than |Aˆ m+1 |2 . The last equality holds because Substituting (46) and (48) into (39), we have
σˆ m2 +1 = σˆ m2 −
Iu (ω ˆ m+1 , υˆ m+1 ) ST
+o
log ST ST
|Aˆ m+1 |2 2
=
Iu (ω ˆ m+1 ,ˆνm+1 ) . ST
3081 log ST ST
3 2
and hence
(49)
and similarly
σˆ
2 m+l
= σˆ − 2 m
Ul ST
+o
log ST ST
.
(50)
References Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Automat. Control 19, 716–723. Hannan, E.J., 1980. The estimation of the order an ARMA process. Ann. Statist. 8, 1071–1081. Hannan, E.J., 1981. Estimating the dimension of a linear system. J. Multivariate Anal. 11, 459–472. He, S., 1995. Uniform convergency for weighted periodogram of stationary linear random fields. Chinese Ann. Math. 16B, 331–340. Kliger, M., Francos, J.M., 2005a. MAP model order selection rule for 2-D sinusoids in white noise. IEEE Trans. Signal Process. 53, 2563–2575. Kliger, M., Francos, J.M., 2005b. Strong consistency of the over- and under-determined LSE of 2-D exponentials in white noise. IEEE Trans. Inform. Theory 51, 3314–3321. Li, J., Stoica, P., 1996. Efficient mixed-spectrum estimation with application to target feature extraction. IEEE Trans. Signal Process. 44, 281–295. Quinn, B.G., 1989. Estimating the number of terms in a sinusoidal regression. J. Time Series Anal. 10, 71–75. Rao, C.R., Zhao, L.C., Zhou, B., 1994. Maximum likelihood estimation of 2-D superimposed exponential signals. IEEE Trans. Signal Process. 42, 1795–1802. Rissanen, J., 1978. Modeling by shortest data description. Automatica 14, 468–478. Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6, 461–464. Stoica, P., Eykhoff, P., Janssen, P., Soderstrom, T., 1986. Model-structure selection by cross-validation. Int. J. Control 43, 1841–1878.