Training sequence selection for frequency offset estimation in frequency selective channels

Digital Signal Processing 13 (2003) 106–127 www.elsevier.com/locate/dsp

Olivier Besson (Department of Avionics and Systems, ENSICA, Toulouse, France) and Petre Stoica (Department of Systems and Control, Uppsala University, Uppsala, Sweden)

Abstract

We consider the problem of optimal training sequence selection for frequency offset estimation in frequency-selective channels. Since it is desired that the optimal training sequence does not depend on a particular estimation method, we examine the Cramér–Rao bound (CRB) for the problem at hand. For a fairly large class of training sequences, an expression for the asymptotic CRB is derived which depends in a simple way on the channel impulse response and the training sequence correlation. Based on the asymptotic CRB, two methods are presented to select an optimal training sequence. First, we consider a minmax problem which consists in minimizing the worst-case asymptotic CRB and whose solution is shown to be a white training sequence. Next, an expression for the training sequence that minimizes the asymptotic CRB is derived. Numerical simulations illustrate the estimation performance obtained with these training sequences. © 2002 Elsevier Science (USA). All rights reserved.

Keywords: Frequency estimation; Frequency selective channels; Cramér–Rao bound; Training sequence design

1. Introduction

Many digital communication systems employ a training sequence in order to mitigate the effects of frequency-selective channels and provide the receiver with information about the channel state [1]. This information is in turn used for equalization and symbol detection. However, prior to that, any frequency offset between the carrier and the local reference at the receiver must be properly compensated for. Otherwise, deleterious effects are observed, especially for large-constellation alphabets. For instance, in multicarrier systems such as
OFDM, it is known that frequency offsets give rise to self-interference between the carriers. Therefore, considerable attention has been paid recently to estimating these frequency offsets in a data-aided context, either for general frequency-selective channels, see, e.g., [2,3], or within the framework of OFDM systems, see [4–6] for recent publications.

Despite this significant attention, the problem of selecting the training sequence for accurate frequency offset estimation appears to be open. Indeed, optimal training sequence selection has been studied essentially only for channel estimation purposes. In [7], the training sequence was chosen to minimize the mean-square error of a least-squares type channel estimator. A frequency-domain approach was suggested in [8], whereas a comparison between the methods in [7] and [8] can be found in [9]. Finally, the problem of optimal training sequences and pilot tones for OFDM systems was recently addressed in [10]. Note that the methods of [7–9] try to optimize the training sequence for the channel estimation problem and do not consider the frequency offset estimation problem. Moreover, the “optimal sequences” obtained are really optimal only for a given estimation method.

In this paper, we focus on the problem of finding an optimal sequence for frequency offset estimation in frequency selective channels. Restricting attention to frequency offset only enables us to understand the problem and pave the way for attacking the more general and practical problem of training selection for both frequency offset and channel estimation, which we leave for future research. Also, observe that the channel estimate is usually based on the frequency offset estimate [2] and hence trying to minimize the variance of the latter is a sensible thing to do. Since we do not want the training sequence to be tied to any method, we consider the Cramér–Rao bound (CRB) for the problem at hand. The CRB provides a lower bound on the achievable accuracy of any unbiased estimator [11,12]. Therefore, it is independent of the estimation technique. It only depends on the channel and the training sequence.

In this paper, we show that asymptotically, i.e., as the number of samples in the training sequence grows large, the CRB tends to a value that depends in a simple manner on the channel and the training sequence. This result holds for a fairly large class of training signals, as detailed below. The simple asymptotic expression of the CRB enables us to explicitly derive optimal training sequences. More precisely, two approaches are taken, which consist either in minimizing the worst-case asymptotic CRB or in minimizing the asymptotic CRB. In the latter case, the optimal sequence is shown to depend on the channel whereas the former yields a white training sequence as the solution.

2. Data model

In this section, we briefly present the data model to be used in the rest of the paper. The framework is essentially that of [2]. Let us consider a frequency selective channel with a slow evolution in time compared to the signaling interval. Then, the received signal sampled at the symbol rate is given by

x(n) = s(n) e^{i 2\pi n f_0} + w(n), \quad n = 0, \ldots, N-1,   (1)

where f_0 is the normalized frequency offset and w(n) is a noise term; s(n) is the response of the channel to the training sequence and can be written as

s(n) = \sum_{k=0}^{L-1} h(k) a_{n-k},   (2)

where \{a_n\}_{n=-L+1}^{N-1} is the training sequence. It is assumed here that the channel has a finite impulse response given by h = [h(0) \; h(1) \; \ldots \; h(L-1)]^T. Rewriting (1) in vector form results in the following model

x = \Gamma(f_0) A h + w,   (3)

with x = [x(0) \; \ldots \; x(N-1)]^T, w = [w(0) \; \ldots \; w(N-1)]^T, and

\Gamma(f_0) = \mathrm{diag}\{1, e^{i 2\pi f_0}, \ldots, e^{i 2\pi (N-1) f_0}\},   (4)

A(k, \ell) = a_{k-\ell}, \quad 1 \le k \le N, \; 1 \le \ell \le L.   (5)

We assume that w is a zero-mean circularly symmetric complex-valued Gaussian vector with covariance matrix

C_w = E\{w w^H\} = \sigma_w^2 I.   (6)
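To make the model concrete, here is a small simulation sketch of (1)–(6) (our own illustration; the helper names `convolution_matrix` and `make_received_signal` are not from the paper):

```python
import numpy as np

def convolution_matrix(a, N, L):
    # A(k, l) = a_{k-l}, 1 <= k <= N, 1 <= l <= L; the array `a` stores
    # a_{-L+1}, ..., a_{N-1}, so a_n lives at index n + L - 1.
    return np.array([[a[k - l + L - 1] for l in range(1, L + 1)]
                     for k in range(1, N + 1)])

def make_received_signal(a, h, f0, sigma_w, rng):
    # x = Gamma(f0) A h + w, with circular complex Gaussian noise of power sigma_w^2
    N, L = len(a) - len(h) + 1, len(h)
    A = convolution_matrix(a, N, L)
    gamma = np.exp(2j * np.pi * f0 * np.arange(N))       # diagonal of Gamma(f0)
    w = sigma_w / np.sqrt(2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return gamma * (A @ h) + w

rng = np.random.default_rng(0)
L, N, f0 = 4, 64, 0.08
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
a = (rng.standard_normal(N + L - 1) + 1j * rng.standard_normal(N + L - 1)) / np.sqrt(2)
x = make_received_signal(a, h, f0, 0.1, rng)
assert x.shape == (N,)
```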

3. Asymptotic Cramér–Rao bound

As already said in the previous section, we do not wish to find training sequences that are optimal only for a certain estimation method. Therefore, we resort to the analysis of the Cramér–Rao bound. The CRB for the estimation of f_0 was derived in [2] and is given by

CRB = \frac{\sigma_w^2}{8\pi^2} \left[ h^H A^H D^H \Pi_A^\perp D A h \right]^{-1},   (7)

where D = \mathrm{diag}(0, \ldots, N-1) and \Pi_A^\perp = I - A (A^H A)^{-1} A^H is the orthogonal projector onto the orthogonal complement of the range space of A. The first approach that comes to mind when searching for an optimal sequence is to minimize the CRB with respect to the training sequence \{a_k\}. However, this turns out to be a hard problem. Instead, a minmax approach can be taken which consists in solving the problem

\min_{\{a_k\}} \max_{h} CRB.

Doing so, one first looks for the worst channel that results in the highest CRB. Then the training sequence is selected to minimize the worst-case CRB. It can readily be established that

\frac{8\pi^2}{\sigma_w^2} \max_{h, \|h\|=1} CRB = \left[ \min_{h, \|h\|=1} h^H A^H D^H \Pi_A^\perp D A h \right]^{-1} = \lambda_{\min}^{-1}\{ A^H D^H \Pi_A^\perp D A \},

where \lambda_{\min}\{\cdot\} stands for the minimum eigenvalue of the matrix between brackets. Hence, the minmax approach amounts to solving the following maximization problem

\max_{\{a_k\}} \lambda_{\min}\{ A^H D^H \Pi_A^\perp D A \}.   (8)

This also requires a computationally intensive method. Since the exact CRB is a complicated function of h and \{a_k\}, our idea is to use instead the asymptotic CRB, which hopefully has a simpler expression. Doing so, it could be expected that finding an optimal sequence would become easier. In fact, it turns out that for a wide class of training sequences, the asymptotic CRB exhibits a very simple dependence on the channel impulse response and the training sequence, as stated in the proposition below.

Proposition 1. Let \{a_n\} be a zero-mean circularly symmetric Gaussian distributed process with covariance sequence r_a(k) = E\{a^*(n) a(n+k)\} and assume that for any integer q \ge 0

\frac{1}{N^{2(q+1)}} \sum_{k=1}^{N} \sum_{\ell=1}^{N} k^q \ell^q |r_a(k-\ell)|^2 = O(1/N).   (9)

Then

\lim_{N\to\infty} N^3 \, CRB = \frac{3\sigma_w^2}{2\pi^2} \frac{1}{h^H R h},   (10)

where R(i,j) = r_a(i-j) denotes the covariance matrix of the training sequence.

Proof. See Appendix A. ✷

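Proposition 1 is easy to probe numerically. The sketch below is our own illustration (the helper `crb_exact`, implementing (7), is not from the paper): it draws a white Gaussian training sequence, so that R = I, and checks that N^3 CRB approaches 3\sigma_w^2 / (2\pi^2 h^H R h) as N grows.

```python
import numpy as np

def crb_exact(A, h, sigma_w2):
    # finite-sample CRB (7): sigma_w^2/(8 pi^2) [h^H A^H D Pi_A^perp D A h]^{-1}
    N = A.shape[0]
    D = np.diag(np.arange(N, dtype=float))
    P = np.eye(N) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)  # Pi_A^perp
    g = D @ A @ h
    return sigma_w2 / (8 * np.pi**2) / np.real(g.conj() @ P @ g)

rng = np.random.default_rng(1)
L, sigma_w2 = 3, 1.0
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
R = np.eye(L)                                   # white, unit-power training sequence
as_crb_limit = 3 * sigma_w2 / (2 * np.pi**2 * np.real(h.conj() @ R @ h))

ratios = []
for N in (200, 800):
    a = (rng.standard_normal(N + L - 1) + 1j * rng.standard_normal(N + L - 1)) / np.sqrt(2)
    A = np.array([[a[k - l + L - 1] for l in range(1, L + 1)] for k in range(1, N + 1)])
    ratios.append(N**3 * crb_exact(A, h, sigma_w2) / as_crb_limit)
print(ratios)  # both ratios approach 1 as N grows
```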
Remark 1. The condition in (9), which may appear somewhat technical, is an essentially necessary and sufficient condition for (10) to hold; see the derivations in Appendix A. It should be pointed out that (9) is not a restrictive condition. Indeed, as shown in Appendix A, it is satisfied by a large class of signals including any sequence with finite-length correlation (e.g., an MA process) and, more generally, any signal whose correlation decreases exponentially to zero, such as ARMA processes. Hence the result of Proposition 1 is fairly general and applies to many training sequences.

Remark 2. Note that the asymptotic CRB depends on the training sequence only through its covariance matrix R. More precisely, it only depends on the first L covariance lags \{r_a(m)\}_{m=0}^{L-1}.

Remark 3. The expression (10) has a simple interpretation. Indeed, it can be rewritten as

asCRB = \frac{3}{2\pi^2 N^3} \frac{1}{SNR_\infty}   (11)

with

SNR_\infty \triangleq \frac{h^H R h}{\sigma_w^2} = \lim_{N\to\infty} \frac{1}{N} \frac{h^H A^H A h}{\sigma_w^2},   (12)

where the second, fairly natural equality is proved in Appendix A. It is worth pointing out that (11) coincides with the expression of the CRB for the estimation of the frequency of a constant-amplitude exponential signal in additive noise with a signal-to-noise ratio given by SNR_\infty; see, e.g., [12].

Remark 4. As shown in Appendix A, the formula (10) also holds true if \{a_n\} is a sum of pilot tones, i.e.,

a_n = \sum_{k=1}^{d} A_k e^{i n \gamma_k},   (13)

despite the fact that a training sequence given by a sum of exponential signals does not satisfy the condition (9) of Proposition 1. For channel estimation purposes, this choice may not be fully adequate as only a few frequencies of the channel are excited and the channel response at those frequencies may be a weak signal. However, the corresponding frequency offset estimation problem is quite simple, which is an advantage. First, note that the output of the channel corresponding to (13) is given by

s(n) = \sum_{k=0}^{L-1} h(k) a_{n-k} = \sum_{k=0}^{L-1} h(k) \sum_{\ell=1}^{d} A_\ell e^{i(n-k)\gamma_\ell} = \sum_{\ell=1}^{d} A_\ell \left( \sum_{k=0}^{L-1} h(k) e^{-ik\gamma_\ell} \right) e^{i n \gamma_\ell} = \sum_{\ell=1}^{d} \tilde{A}_\ell e^{i n \gamma_\ell},

which is also a sum of exponentials with the same frequencies. Therefore, the received signal, in the absence of additive noise, will be a sum of d exponential signals. Furthermore, the frequency of each mode corresponds to a fixed displacement (given by the frequency offset f_0) from a known frequency. This property, along with the numerous computationally simple frequency estimators that can be found in the literature [11,12], would presumably lead to simple and accurate frequency offset estimators. However, pursuing this idea in detail is beyond the scope of the present paper.

In contrast to the finite-sample CRB expression, (10) provides a simple expression of the asymptotic CRB as a function of the channel and the training sequence. This can be exploited to advantage when deriving optimal sequences; see the next section. Of course, the asymptotic CRB holds only in the asymptotic regime and training sequences derived from it may not be fully optimal in the finite-sample regime. Nevertheless, this seems the only tractable way to proceed since derivation of an optimal sequence from the exact CRB is a hard problem. Of course, a natural question is how many samples are required to reach the asymptotic regime. This question can hardly be answered theoretically and therefore one has to examine numerically the behavior of the CRB as the length of the training sequence is varied; see Section 5.
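This pilot-tone property is easy to verify numerically. The following sketch (tap values and pilot frequencies are our own illustrative choices; frequencies are in cycles per sample) checks that the noiseless received signal carries its energy at the shifted frequencies \gamma_k + f_0 rather than at the known pilots \gamma_k:

```python
import numpy as np

L, N, f0 = 4, 256, 0.05
h = np.array([1.0, 0.3, 0.2j, 0.1])           # illustrative channel taps (our choice)
gammas = np.array([0.10, 0.27, 0.40])         # pilot frequencies, cycles/sample
amps = np.array([1.0, 0.8, 1.3])

n = np.arange(-L + 1, N)                      # a_n for n = -L+1, ..., N-1
a = (amps * np.exp(2j * np.pi * np.outer(n, gammas))).sum(axis=1)

# channel output s(n) = sum_k h(k) a_{n-k}, then the frequency-offset rotation
s = np.array([sum(h[k] * a[m - k + L - 1] for k in range(L)) for m in range(N)])
x = s * np.exp(2j * np.pi * f0 * np.arange(N))

# each pilot reappears at the known frequency gamma_k shifted by f0
t = np.arange(N)
for g in gammas:
    at_shifted = abs(np.vdot(np.exp(2j * np.pi * (g + f0) * t), x)) / N
    at_original = abs(np.vdot(np.exp(2j * np.pi * g * t), x)) / N
    assert at_shifted > 3 * at_original       # energy sits at gamma_k + f0
```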

4. Optimal training sequence design

In this section, the asymptotic expression (10) is used to derive an optimal sequence. Two approaches are considered, one using the minmax principle while the other seeks to minimize the asymptotic CRB directly.

4.1. Minmax approach

The minmax approach consists in solving

\min_{R} \max_{h, \|h\|=1} asCRB.   (14)

The constraint \|h\| = 1 is set here to prevent the trivial worst channel h = 0. It is a usual way to proceed in minmax problems and it only affects the asymptotic CRB by a scaling factor. Using the expression (10), we get

\frac{2\pi^2 N^3}{3\sigma_w^2} \max_{h, \|h\|=1} asCRB = \left[ \min_{h, \|h\|=1} h^H R h \right]^{-1} = \lambda_{\min}^{-1}\{R\}.

Hence, the minmax approach amounts to finding the training sequence that maximizes \lambda_{\min}\{R\}. Obviously, this maximization should be carried out using a proper constraint. Herein, we constrain the transmitted power to be less than a prescribed level. In other words, the training sequence is obtained as the solution to the following problem

\max_{R} \lambda_{\min}\{R\} \quad \text{subject to} \quad \mathrm{Tr}\{R\} \le \alpha,   (15)

where \mathrm{Tr}\{\cdot\} stands for the trace of a matrix. The solution is given by [13, Lemma A.35]

R_{minmax} = \frac{\alpha}{L} I,   (16)

which corresponds to a white training sequence. Hence, a white training sequence is minmax optimal, at least for N \gg 1. Observe that, as expected, the obtained sequence is independent of the channel. Also, note that a white sequence is a “usual” choice and can be used also for channel estimation purposes. With the choice (16), the asymptotic CRB becomes

asCRB_{minmax} = \frac{3\sigma_w^2}{2\pi^2 N^3} \frac{L}{\alpha \|h\|^2}.   (17)
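A quick numerical sanity check of (16) (our own sketch, not from the paper): for any covariance matrix with \mathrm{Tr}\{R\} = \alpha, the smallest eigenvalue cannot exceed the average eigenvalue \alpha/L, which (\alpha/L)I attains.

```python
import numpy as np

rng = np.random.default_rng(3)
L, alpha = 8, 8.0

best = alpha / L  # lambda_min of R_minmax = (alpha/L) I
for _ in range(200):
    G = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
    R = G @ G.conj().T                        # random positive semidefinite matrix
    R *= alpha / np.real(np.trace(R))         # enforce Tr{R} = alpha
    lam = np.linalg.eigvalsh(R)[0]            # smallest eigenvalue
    assert lam <= best + 1e-9                 # lambda_min <= Tr{R}/L = alpha/L
```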

4.2. Minimization of the asymptotic CRB

The second approach consists in minimizing the asymptotic CRB directly with respect to the training sequence. As is evident from inspection of (10), the asymptotic CRB depends on \{a_k\} only via the covariance matrix R whose (i,j)th element is r_a(i-j). Using the Carathéodory parameterization of a covariance matrix [12, Chap. 4], there exist p \le L-1, \{P_k, \omega_k\}_{k=1}^{p}, and \sigma^2 such that R can be written as

R = \sum_{k=1}^{p} P_k e(\omega_k) e^H(\omega_k) + \sigma^2 I,   (18)

where

e(\omega_k) = \left[ 1 \; e^{i\omega_k} \; \ldots \; e^{i(L-1)\omega_k} \right]^T.   (19)

In addition to being a general representation, the parameterization in (18) enables us to easily synthesize the training sequence as a sum of exponential modes in white noise. Alternatively, we can always synthesize the covariance matrix R using an AR sequence of order L [12]. Similarly to the minmax approach, minimization of the asymptotic CRB should be done under a proper constraint. As before, we impose a power constraint on the training sequence and the problem can be formulated as

\min_{R} asCRB \quad \text{subject to} \quad \mathrm{Tr}\{R\} \le \alpha   (20)

with R given by (18). Note that

\arg\min_{R} asCRB = \arg\max_{p, P_k, \omega_k, \sigma^2} h^H R h = \arg\max_{p, P_k, \omega_k, \sigma^2} \left[ \sum_{k=1}^{p} P_k |H(\omega_k)|^2 + \sigma^2 \|h\|^2 \right],   (21)

where

H(\omega) \triangleq \sum_{n=0}^{L-1} h(n) e^{-i n \omega}   (22)

is the channel frequency response. As an aside, note that the proposed approach consists in selecting the training sequence that maximizes the power at the output of the filter h, which is a sensible thing to do. Let

\omega_{max} = \arg\max_{\omega} |H(\omega)|^2.   (23)

Then, the solution to the problem in (21) is

R = \begin{cases} (\alpha/L) I & \text{if } |H(\omega_{max})|^2 \le \|h\|^2, \\ (\alpha/L) e(\omega_{max}) e^H(\omega_{max}) & \text{if } |H(\omega_{max})|^2 > \|h\|^2. \end{cases}   (24)

The proof of (24) is straightforward. First assume that |H(\omega_{max})|^2 \le \|h\|^2. Then

\sum_{k=1}^{p} P_k |H(\omega_k)|^2 + \sigma^2 \|h\|^2 \le \left[ \sum_{k=1}^{p} P_k + \sigma^2 \right] \|h\|^2 \le \frac{\alpha}{L} \|h\|^2 = h^H \left[ \frac{\alpha}{L} I \right] h,

where we used the fact that \mathrm{Tr}\{R\} = L \left[ \sum_{k=1}^{p} P_k + \sigma^2 \right] \le \alpha. Next assume that |H(\omega_{max})|^2 > \|h\|^2. Then

\sum_{k=1}^{p} P_k |H(\omega_k)|^2 + \sigma^2 \|h\|^2 \le \left[ \sum_{k=1}^{p} P_k + \sigma^2 \right] |H(\omega_{max})|^2 \le \frac{\alpha}{L} |H(\omega_{max})|^2 = h^H \left[ \frac{\alpha}{L} e(\omega_{max}) e^H(\omega_{max}) \right] h,

which concludes the proof. However, for |H(\omega_{max})|^2 > \|h\|^2 this results in a rank-one covariance matrix R which is not acceptable since R must be invertible for the asymptotic expression (10) to hold. Hence, some additional constraint must be enforced to ensure that R is full-rank. Thus, we propose to look for the training sequence that solves the following problem

\max_{R} h^H R h \quad \text{subject to} \quad \mathrm{Tr}\{R\} \le \alpha, \quad \lambda_{\min}\{R\} \ge \beta > 0,   (25)

where R still obeys (18). Since L \lambda_{\min}\{R\} \le \mathrm{Tr}\{R\} it follows that we must choose L\beta \le \alpha; otherwise there is no solution to the problem. The solution to (25) is given by

R_{opt} = \begin{cases} (\alpha/L) I & \text{if } |H(\omega_{max})|^2 \le \|h\|^2, \\ (\alpha/L - \beta) e(\omega_{max}) e^H(\omega_{max}) + \beta I & \text{if } |H(\omega_{max})|^2 > \|h\|^2. \end{cases}   (26)

The proof of the first part is obtained by noting that \lambda_{\min}\{(\alpha/L) I\} = \alpha/L \ge \beta. Hence for |H(\omega_{max})|^2 \le \|h\|^2, the solution of (25) is the same as that of problem (20). Let us now consider the second part, for which |H(\omega_{max})|^2 > \|h\|^2. Clearly, the solution in (26) satisfies the constraints in (25), and

h^H R_{opt} h = \left( \frac{\alpha}{L} - \beta \right) |H(\omega_{max})|^2 + \beta \|h\|^2.   (27)

Since

\sum_{k=1}^{p} P_k |H(\omega_k)|^2 + \sigma^2 \|h\|^2 \le \sum_{k=1}^{p} P_k |H(\omega_{max})|^2 + \sigma^2 \|h\|^2
\le \left( \frac{\alpha}{L} - \sigma^2 \right) |H(\omega_{max})|^2 + \sigma^2 \|h\|^2
= \left( \frac{\alpha}{L} - \beta \right) |H(\omega_{max})|^2 + \beta \|h\|^2 + (\sigma^2 - \beta) \left[ \|h\|^2 - |H(\omega_{max})|^2 \right]
\le \left( \frac{\alpha}{L} - \beta \right) |H(\omega_{max})|^2 + \beta \|h\|^2,

the proof is concluded (to obtain the last inequality we use the fact that \sigma^2 = \lambda_{\min}\{R\} \ge \beta).

Observe that the optimal solution depends on the channel. With the choice (26), the asymptotic CRB becomes

\frac{2\pi^2 N^3}{3\sigma_w^2} \, asCRB_{opt} = \begin{cases} \left[ (\alpha/L) \|h\|^2 \right]^{-1} & \text{if } |H(\omega_{max})|^2 \le \|h\|^2, \\ \left[ (\alpha/L - \beta) |H(\omega_{max})|^2 + \beta \|h\|^2 \right]^{-1} & \text{if } |H(\omega_{max})|^2 > \|h\|^2. \end{cases}   (28)

For |H(\omega_{max})|^2 \le \|h\|^2, the minmax and optimal solutions provide the same asymptotic CRB. For |H(\omega_{max})|^2 > \|h\|^2, using the fact that \alpha \ge L\beta, we have that

\frac{asCRB_{opt}}{asCRB_{minmax}} = \left[ \left( 1 - \frac{L\beta}{\alpha} \right) \frac{|H(\omega_{max})|^2}{\|h\|^2} + \frac{L\beta}{\alpha} \right]^{-1},   (29)
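The design (26) is easy to implement once the channel is known; a minimal sketch follows (the function name `optimal_R` and the FFT-grid search for \omega_{max} are our own choices):

```python
import numpy as np

def optimal_R(h, alpha, beta, grid=4096):
    # Design (26): white if the channel peak gain is below ||h||^2,
    # else one exponential mode at omega_max on top of a beta*I floor.
    L = len(h)
    omega = 2 * np.pi * np.arange(grid) / grid
    Hmag2 = np.abs(np.exp(-1j * np.outer(omega, np.arange(L))) @ h) ** 2  # |H(omega)|^2
    wmax = omega[np.argmax(Hmag2)]
    if Hmag2.max() <= np.linalg.norm(h) ** 2:
        return (alpha / L) * np.eye(L)
    e = np.exp(1j * wmax * np.arange(L))      # e(omega_max)
    return (alpha / L - beta) * np.outer(e, e.conj()) + beta * np.eye(L)

h = np.array([1.0, 0.9, 0.8, 0.7])            # smooth channel: |H(0)|^2 = 11.56 > ||h||^2
R = optimal_R(h, alpha=4.0, beta=0.1)
assert np.isclose(np.real(np.trace(R)), 4.0)  # power constraint met with equality
assert np.linalg.eigvalsh(R)[0] >= 0.1 - 1e-9 # lambda_min >= beta, so R is invertible
```

Note that the second branch has eigenvalues \beta (multiplicity L-1) and \beta + (\alpha/L - \beta)L, so R_{opt} is full rank with \lambda_{\min} = \beta, as required for (10).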

5. Numerical examples

In this section, we illustrate the performance that can be achieved when using the training sequences derived above. Similarly to [2], the channel response consists of six paths and is given by

h(k) = \sum_{n=0}^{5} A_n \, g(k T_s - \tau_n - t_0),   (30)

where A_n and \tau_n are the attenuations and delays of each path. The A_n's are i.i.d. zero-mean Gaussian random variables with variances \{-3, 0, -2, -6, -8, -10\} in dB. The normalized delays \tau_n/T_s are selected as \{0, 0.054, 0.135, 0.432, 0.621, 1.351\} and t_0 = 3T_s; g(t) is a raised cosine roll-off filter with a roll-off of 0.5. Under these assumptions, the impulse response takes significant values only for k \le 7 and therefore L = 8. The frequency offset is set to f_0 = 0.08 and the noise power is selected such that 10\log_{10}(\alpha/L\sigma_w^2) = 10 dB. We choose \alpha = L and, for the optimal training sequence generation, the parameter \beta appearing in (26) was chosen as \beta = 0.1.
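Under our reading of this setup, the channel draw (30) can be sketched as follows (helper names are ours; the raised-cosine singular points at t = \pm 1/(2 \cdot \text{roll-off}) are handled explicitly):

```python
import numpy as np

def raised_cosine(t, rolloff=0.5):
    # raised cosine pulse g(t), t in symbol periods
    den = 1 - (2 * rolloff * t) ** 2
    near = np.abs(den) < 1e-10
    g = np.sinc(t) * np.cos(np.pi * rolloff * t) / np.where(near, 1.0, den)
    # at t = +-1/(2*rolloff) the limiting value is (pi/4) * sinc(1/(2*rolloff))
    return np.where(near, np.pi / 4 * np.sinc(1 / (2 * rolloff)), g)

def draw_channel(rng, L=8):
    var_db = np.array([-3.0, 0.0, -2.0, -6.0, -8.0, -10.0])
    std = np.sqrt(10 ** (var_db / 10))
    tau = np.array([0.0, 0.054, 0.135, 0.432, 0.621, 1.351])  # tau_n / Ts
    t0 = 3.0                                                  # t0 / Ts
    A = std * rng.standard_normal(6)                          # i.i.d. zero-mean Gaussian
    k = np.arange(L)
    # h(k) = sum_n A_n g(k - tau_n - t0), all times in units of Ts
    return (A[None, :] * raised_cosine(k[:, None] - tau[None, :] - t0)).sum(axis=1)

h = draw_channel(np.random.default_rng(4))
assert h.shape == (8,)
```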


Fig. 1. Frequency response of channel 1.

Fig. 2. Frequency response of channel 2.

In a first series of simulations, two channels drawn from (30) were used. We refer to them as channel 1 and channel 2. Figures 1 and 2 display the frequency responses of these channels. It can be seen that channel 1 has a higher dynamic range than channel 2. Two thousand Monte Carlo trials were run to investigate the influence of the training sequences on the CRB, the asymptotic CRB, and the performance of the maximum likelihood estimator (MLE) [2]. For each trial, a different noise and different training sequences were generated. Three kinds of training sequences were used:

TS1 \{a_n\} is a white training sequence.

TS2 \{a_n\} is generated (as an exponential signal in white noise) such that it has a covariance matrix given by (26), where the true channel response is used. However, in practical situations, the latter is unknown; hence this optimal training sequence obtained with the true h may only be used as a reference.

TS3 To remedy this problem, another solution is proposed which consists in designing \{a_n\} using (26) with the channel estimated using the white training sequence. More precisely, (26) is used with \omega_{max} and h replaced by \hat{\omega}_{max} and \hat{h}, where the latter are computed from the ML estimate of the channel obtained with the training sequence TS1. The rationale behind this approach is that, in a practical application, one can use a white sequence in the first burst and then in all other bursts use an optimal sequence designed based on \hat{h} obtained in the previous burst.

In Figs. 3–8, we plot the mean value of the CRB obtained in the 2000 trials, the asymptotic CRB, and the mean-square error (MSE) of the MLE versus N. Figures 3 and 6 consider the case of a white training sequence. The results for an optimal training sequence using the true channel are plotted in Figs. 4 and 7. Finally, the results for the scheme that

Fig. 3. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 1. White training sequence.


Fig. 4. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 1. Optimal training sequence using true channel.

Fig. 5. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 1. Optimal training sequence using channel estimate.


Fig. 6. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 2. White training sequence.

Fig. 7. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 2. Optimal training sequence using true channel.


uses an estimated channel to derive the optimal sequence are given in Figs. 5 and 8. The following observations can be made:

• As the number of samples in the training sequence increases, the CRB and the asymptotic CRB come very close to each other, for both kinds of training sequences. This validates the theoretical result given in (10). Moreover, observe that the two CRBs are very close to each other for a reasonably low number of samples, typically N \ge 20. Hence, the asymptotic formula applies for training sequence lengths encountered in practice. Note also that the asymptotic theory holds true for smaller N when the optimal design is used than when a white sequence is used.

• The MLE has a performance very close to the CRB. Additionally, the MSE obtained with an optimal training sequence is smaller (by approximately a factor of 4 for channel 1 and 2 for channel 2) than that obtained with a white training sequence. Hence, the optimal sequence design enables us to get more accurate frequency offset estimates. Alternatively, for a given statistical accuracy, fewer training samples are required with the optimal design.

• The performance obtained when the optimal training sequence is based on a channel estimate is as good as that obtained with the true channel. This is an interesting result as it provides an effective practical method to design a quasi-optimal sequence.

Since the previous curves deal with two specific channels, in a second series of simulations, 2000 Monte Carlo trials corresponding to 2000 different channel impulse responses are run. In each trial, a different white training sequence and optimal training sequences using either the true channel or an estimated channel are generated, and the CRB and asymptotic CRB are computed. The results are given in Figs. 9, 10, and 11, where we plot the minimum/maximum values of the CRB and the asymptotic CRB together with the MSE of the MLE. Again, the asymptotic CRB comes very close to the finite-sample CRB for N larger than 20. Some improvement is observed when using an optimal training sequence as compared to a white sequence. For instance, the min/max values of both the finite-sample and the asymptotic CRBs for a white sequence are approximately 1.1/2.5 times larger than those for the optimal sequences. Hence, more accurate estimates are obtained with the optimal training sequences, whatever the channel characteristics. Therefore, even if the optimal training sequence was primarily obtained by minimizing the asymptotic CRB, it appears to be a good choice for the finite-sample CRB as well. Also, the estimation performance will be less dependent on the channel when the design (26) is used since the variations in both the CRB and the asymptotic CRB are smaller. Finally, we stress the fact that using an estimate of the channel in lieu of the true channel for the optimal design does not penalize the performance.

Fig. 8. CRB, asymptotic CRB, and mean-square error of the MLE versus N for channel 2. Optimal training sequence using channel estimate.

Fig. 9. CRB, asymptotic CRB, and mean-square error of the MLE versus N. White training sequence.

Fig. 10. CRB, asymptotic CRB, and mean-square error of the MLE versus N. Optimal training sequence using true channel.

Fig. 11. CRB, asymptotic CRB, and mean-square error of the MLE versus N. Optimal training sequence using channel estimate.

6. Conclusions

The problem of selecting an optimal training sequence for data-aided frequency offset estimation in frequency-selective channels was addressed. The Cramér–Rao bound was studied as it does not depend on any particular estimation method. It was shown that direct minimization of the CRB is hardly feasible. A minmax approach was suggested but it results in a complicated maximization problem. Therefore, the asymptotic CRB was examined. It was shown that, for a large class of training sequences, the asymptotic CRB depends in a simple way on the channel and the training sequence. This fact was exploited both to minimize the worst-case asymptotic CRB and to find the training sequence that results in a minimum asymptotic CRB. Since the latter depends on the channel, which is usually unknown, we proposed a scheme where the optimal sequence design is based on a channel estimate. It was shown through numerical simulations that the optimal training sequence selection (using either the true or estimated channel) results in considerable improvements in terms of statistical accuracy and sensitivity to variations in the channel characteristics. Future research will deal with training sequence design for joint channel and frequency offset estimation.

Acknowledgment The work of P. Stoica was supported in part by the Swedish Foundation for Strategic Research.

Appendix A. Derivation of the asymptotic Cramér–Rao bound

In this Appendix, we prove that the Cramér–Rao bound, as given by (7), tends to (10) as N tends to infinity. Note that the CRB in (7) is conditioned on h. The proof will be given for two classes of signals.

Class 1. The training sequence \{a_n\} is a zero-mean circularly symmetric Gaussian distributed process with correlation r_a(m) = E\{a_n^* a_{n+m}\} such that for any integer q \ge 0

\frac{1}{N^{2(q+1)}} \sum_{k=1}^{N} \sum_{\ell=1}^{N} k^q \ell^q |r_a(k-\ell)|^2 = O(1/N).   (A.1)

Herein, \zeta = O(1/N) means that N\zeta tends to a constant when N tends to infinity. This property is verified by any process with finite-length correlation or if r_a(m) decreases exponentially with m; see Result 2 below. Note that the Gaussian assumption is made for convenience; it could be relaxed provided that the fourth-order cumulants also vanish exponentially. Hence, this class of signals covers a wide range of possible training sequences.

Class 2. The training sequence is a sum of exponential modes; i.e.,

a_n = \sum_{k=1}^{d} A_k e^{i n \gamma_k}.   (A.2)

As explained in Remark 4, this type of signal may be of interest for estimating frequency offsets by computationally simple methods.

The main step in the proof of (10) lies in showing that (1/N^{q+1}) \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} converges to a certain constant as N \to \infty. In the following, the convergence will be defined in the mean-square sense; i.e.,

\lim_{N\to\infty} \zeta_N = \zeta \iff \lim_{N\to\infty} E\{ |\zeta_N - \zeta|^2 \} = 0.

Result 1. If a_n is a zero-mean circularly symmetric Gaussian distributed process such that (A.1) holds for any integer q \ge 0, then

\lim_{N\to\infty} \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} = \frac{r_a(m-n)}{q+1}.   (A.3)
Proof. Let

\zeta_N = \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} - \frac{r_a(m-n)}{q+1}.

Then,

E\{ |\zeta_N|^2 \} = \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q E\{ a_{k-m}^* a_{k-n} a_{\ell-m} a_{\ell-n}^* \} + \frac{|r_a(m-n)|^2}{(q+1)^2}
- \frac{r_a(m-n)}{(q+1) N^{q+1}} \sum_{k=1}^{N} k^q E\{ a_{k-m} a_{k-n}^* \} - \frac{r_a^*(m-n)}{(q+1) N^{q+1}} \sum_{\ell=1}^{N} \ell^q E\{ a_{\ell-m}^* a_{\ell-n} \}
= \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q |r_a(k-\ell)|^2 + |r_a(m-n)|^2 \left[ S^2(q,N) - \frac{2 S(q,N)}{q+1} + \frac{1}{(q+1)^2} \right]   (A.4)

with

S(q,N) = \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q.

However, it is known [14] that

S(q,N) = \frac{1}{q+1} + O(1/N).   (A.5)

Using (A.4) and (A.5) along with the assumption (A.1), it follows that E\{|\zeta_N|^2\} = O(1/N). Therefore

\lim_{N\to\infty} \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} = \frac{r_a(m-n)}{q+1},   (A.6)

which proves (A.3). ✷

Result 2. If r_a(m) is such that r_a(m) = 0 for |m| > M or if r_a(m) decreases exponentially to zero (i.e., |r_a(m)| \le P e^{-\alpha|m|}, \alpha > 0), then a_n satisfies the condition (A.1). As a corollary, (A.3) holds true in these cases.

Proof. The proof is tedious but straightforward and we omit it. It is possible to show that

|r_a(m)| \le P e^{-\alpha|m|} \;\Rightarrow\; \frac{1}{N^{2(q+1)}} \sum_{k=1}^{N} \sum_{\ell=1}^{N} k^q \ell^q |r_a(k-\ell)|^2 \le \frac{\mathrm{const.}}{N} + O(1/N^2).

As an example, for a white training sequence, i.e., r_a(m) = \sigma^2 \delta(m), one can prove that

\frac{1}{N^{2(q+1)}} \sum_{k=1}^{N} \sum_{\ell=1}^{N} k^q \ell^q |r_a(k-\ell)|^2 = \frac{\sigma^4}{(2q+1)N} + O(1/N^2). ✷
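The white-sequence computation above is easy to reproduce numerically (our own check): with r_a(m) = \sigma^2 \delta(m), only the k = \ell terms survive in (A.1), and multiplying the double sum by N(2q+1) should tend to \sigma^4 (here \sigma^2 = 1).

```python
import numpy as np

def lhs(N, q, sigma2=1.0):
    # (1/N^{2(q+1)}) sum_{k,l} k^q l^q |r_a(k-l)|^2 for white r_a(m) = sigma2*delta(m):
    # only k = l survives, leaving sigma2^2 * sum_k k^{2q} / N^{2(q+1)}
    k = np.arange(1, N + 1, dtype=float)
    return sigma2**2 * np.sum(k ** (2 * q)) / N ** (2 * (q + 1))

for q in (0, 1, 2):
    for N in (1000, 4000):
        ratio = lhs(N, q) * N * (2 * q + 1)   # should tend to 1, confirming O(1/N)
        assert abs(ratio - 1) < 0.01
```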

Result 3. If a_n = \sum_{k=1}^{d} A_k e^{i n \gamma_k}, then (A.3) holds true:

\lim_{N\to\infty} \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} = \frac{r_a(m-n)}{q+1},   (A.7)

where

r_a(m) \triangleq \sum_{k=1}^{d} |A_k|^2 e^{i m \gamma_k}.

Proof. The proof is an extension to any integer q \ge 0 of the proof given in [12, Chap. 4; 15] for q = 0. Let

\varepsilon_N = \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m}^* a_{k-n} - \frac{r_a(m-n)}{q+1}.

Then,

|\varepsilon_N|^2 = \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q a_{k-m}^* a_{k-n} a_{\ell-m} a_{\ell-n}^* + \frac{|r_a(m-n)|^2}{(q+1)^2}
- \frac{r_a(m-n)}{(q+1) N^{q+1}} \sum_{k=1}^{N} k^q a_{k-m} a_{k-n}^* - \frac{r_a^*(m-n)}{(q+1) N^{q+1}} \sum_{\ell=1}^{N} \ell^q a_{\ell-m}^* a_{\ell-n}.

In the following, we will make use of the following result [15]:

\frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q e^{ik\omega} = \begin{cases} 1/(q+1) + O(1/N), & \omega = 0, \\ O(1/N), & \omega \ne 0. \end{cases}

It follows that

$$ \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a^*_{k-m} a_{k-n} = \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q \sum_{r,s=1}^{d} A_r^* A_s e^{i[-(k-m)\gamma_r + (k-n)\gamma_s]} = \sum_{r,s=1}^{d} A_r^* A_s e^{i(m\gamma_r - n\gamma_s)} \times \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q e^{ik(\gamma_s - \gamma_r)} $$

and therefore

$$ \lim_{N\to\infty} \frac{1}{N^{q+1}} \sum_{k=1}^{N} k^q a^*_{k-m} a_{k-n} = \frac{1}{q+1} r_a(m-n). $$
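To illustrate, the limit just derived can be checked numerically for a two-sinusoid training sequence; the amplitudes, frequencies, and lag values below are arbitrary choices made for this check, not taken from the paper:

```python
import numpy as np

# Two-sinusoid training sequence a_n = sum_k A_k e^{i n gamma_k}
A = np.array([1.0 + 0.5j, 0.8 - 0.2j])
gamma = np.array([0.9, 2.1])

def a(n):
    """a_n, vectorized over an array of indices n."""
    n = np.asarray(n, dtype=float)
    return np.sum(A * np.exp(1j * n[..., None] * gamma), axis=-1)

def r_a(m):
    """r_a(m) = sum_k |A_k|^2 e^{i m gamma_k}."""
    return np.sum(np.abs(A) ** 2 * np.exp(1j * m * gamma))

N, q, m, n = 5000, 1, 0, 2
k = np.arange(1, N + 1, dtype=float)
lhs = np.sum(k**q * np.conj(a(k - m)) * a(k - n)) / N ** (q + 1)
rhs = r_a(m - n) / (q + 1)
assert abs(lhs - rhs) < 0.01  # residual terms are O(1/N)
```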

Similarly,

$$ \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q a^*_{k-m} a_{k-n} a_{\ell-m} a^*_{\ell-n} = \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q \sum_{r,s,t,u=1}^{d} A_r^* A_s A_t A_u^* e^{i[-(k-m)\gamma_r + (k-n)\gamma_s + (\ell-m)\gamma_t - (\ell-n)\gamma_u]} = \sum_{r,s,t,u=1}^{d} A_r^* A_s A_t A_u^* e^{i[m(\gamma_r - \gamma_t) - n(\gamma_s - \gamma_u)]} \times \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q e^{ik(\gamma_s - \gamma_r)} e^{i\ell(\gamma_t - \gamma_u)} $$

so that

$$ \lim_{N\to\infty} \frac{1}{N^{2(q+1)}} \sum_{k,\ell=1}^{N} k^q \ell^q a^*_{k-m} a_{k-n} a_{\ell-m} a^*_{\ell-n} = \frac{|r_a(m-n)|^2}{(q+1)^2}. $$

Consequently,

$$ \lim_{N\to\infty} |\varepsilon_N|^2 = 0 $$

which concludes the proof. ✷

Equipped with these results, we are now in a position to prove Proposition 1. Toward this end, note that the CRB is given by

$$ \mathrm{CRB} = \mathrm{const} \left[ h^H A^H D^H \Pi_A^\perp D A h \right]^{-1}. $$

However,

$$ h^H A^H D^H \Pi_A^\perp D A h = \frac{N^3}{3} h^H \left[ \frac{A^H D^2 A}{N^3/3} \right] h - \frac{N^3}{4} h^H \left[ \frac{A^H D A}{N^2/2} \right] \left[ \frac{A^H A}{N} \right]^{-1} \left[ \frac{A^H D A}{N^2/2} \right]^H h. $$

One can readily verify that

$$ \left[ \frac{A^H A}{N} \right]_{m,n} = \frac{1}{N} \sum_{k=1}^{N} a^*_{k-m} a_{k-n}, \qquad \left[ \frac{A^H D A}{N^2/2} \right]_{m,n} = \frac{2}{N^2} \sum_{k=1}^{N} (k-1) a^*_{k-m} a_{k-n}, \qquad \left[ \frac{A^H D^2 A}{N^3/3} \right]_{m,n} = \frac{3}{N^3} \sum_{k=1}^{N} (k-1)^2 a^*_{k-m} a_{k-n}. $$

Let $R$ be the $L \times L$ matrix whose $(m,n)$th element is $R(m,n) \triangleq r_a(m-n)$. Applying Results 1 and 3 yields

$$ \lim_{N\to\infty} \frac{A^H A}{N} = R, \qquad \lim_{N\to\infty} \frac{A^H D A}{N^2/2} = R, \qquad \lim_{N\to\infty} \frac{A^H D^2 A}{N^3/3} = R. $$

In the following we assume that $R$ is invertible. Note that, for the CRB expression (7) to hold, $A$ must be full rank, which implies that $A^H A$ is invertible. Since $R = \lim_{N\to\infty} N^{-1} A^H A$ and $R$ is a covariance matrix, this is a mild assumption. Using standard results on the convergence of (random) variables [13; 16, Chap. 6], it follows that the asymptotic CRB for the two classes of signals considered is given by

$$ \lim_{N\to\infty} N^3 \, \mathrm{CRB} = \frac{\sigma_w^2}{8\pi^2} \left[ \frac{1}{3} h^H R h - \frac{1}{4} h^H R h \right]^{-1} = \frac{3\sigma_w^2}{2\pi^2} \left( h^H R h \right)^{-1} $$

which proves (10).
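The three matrix limits and the resulting behavior of the quadratic form can be checked by simulation. The sketch below uses a white complex Gaussian training sequence (our choice, for which $R = \sigma^2 I$); all variable names are ours, and the tolerances are loose empirical bounds for the finite $N$ used:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 4, 20000
sigma2 = 1.0                         # white training: r_a(m) = sigma^2 * delta(m)

# White complex Gaussian training sequence, padded so a_{k-m} exists for k - m <= 0.
seq = np.sqrt(sigma2 / 2) * (rng.standard_normal(N + L) + 1j * rng.standard_normal(N + L))

# [A]_{k,m} = a_{k-m} for k = 1..N, m = 0..L-1 (columns are delayed copies).
k = np.arange(1, N + 1)
A = np.stack([seq[(k - m) + L - 1] for m in range(L)], axis=1)
d = np.arange(N, dtype=float)        # diagonal of D = diag(0, 1, ..., N-1)

R = sigma2 * np.eye(L)               # R(m, n) = r_a(m - n) for a white sequence
assert np.linalg.norm(A.conj().T @ A / N - R) < 0.15
assert np.linalg.norm(A.conj().T @ (d[:, None] * A) / (N**2 / 2) - R) < 0.15
assert np.linalg.norm(A.conj().T @ (d[:, None] ** 2 * A) / (N**3 / 3) - R) < 0.15

# h^H A^H D^H Pi_A^perp D A h should behave as (1/3 - 1/4) N^3 h^H R h = (N^3/12) h^H R h.
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
Dv = d * (A @ h)
g = A.conj().T @ Dv
quad = np.vdot(Dv, Dv).real - np.vdot(g, np.linalg.solve(A.conj().T @ A, g)).real
target = np.vdot(h, R @ h).real
assert abs(quad / (N**3 / 12) - target) / target < 0.3
```

The projector is never formed explicitly: $h^H A^H D \Pi_A^\perp D A h = \|DAh\|^2 - (A^H D A h)^H (A^H A)^{-1} (A^H D A h)$, which keeps the check cheap for large $N$.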

References

[1] J. Proakis, Digital Communications, 3rd ed., McGraw–Hill, New York, 1995.
[2] M. Morelli, U. Mengali, Carrier-frequency estimation for transmissions over selective channels, IEEE Trans. Comm. 48 (2000) 1580–1589.
[3] H. Viswanathan, R. Krishnamoorthy, A frequency offset estimation technique for frequency-selective fading channels, IEEE Comm. Lett. 5 (2001) 166–168.
[4] M. Morelli, U. Mengali, An improved frequency offset estimator for OFDM applications, IEEE Comm. Lett. 3 (1999) 75–77.
[5] Y. Choi, P.J. Voltz, F.A. Cassara, ML estimation of carrier frequency offset for multicarrier signals in Rayleigh fading channels, IEEE Trans. Vehicular Technol. 50 (2001) 644–655.
[6] J. Li, G. Liu, G.B. Giannakis, Carrier frequency offset estimation for OFDM-based WLANs, IEEE Signal Process. Lett. 8 (2001) 80–82.
[7] S.N. Crozier, D.D. Falconer, S.A. Mahmoud, Least sum of squared errors (LSSE) channel estimation, IEE Proc. F 138 (1991) 371–378.
[8] C. Tellambura, M.G. Parker, Y. Jay Guo, S.J. Shepherd, S.K. Barton, Optimal sequences for channel estimation using discrete Fourier transform techniques, IEEE Trans. Comm. 47 (1999) 230–238.
[9] W. Chen, U. Mitra, Training sequence optimization: Comparisons and an alternative criterion, IEEE Trans. Comm. 48 (2000) 1987–1991.
[10] J.H. Manton, Optimal training sequences and pilot tones for OFDM systems, IEEE Comm. Lett. 5 (2001) 151–153.
[11] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, Englewood Cliffs, NJ, 1993.
[12] P. Stoica, R. Moses, Introduction to Spectral Analysis, Prentice Hall, Upper Saddle River, NJ, 1997.
[13] T. Söderström, P. Stoica, System Identification, Prentice Hall International, London, 1989.
[14] I.S. Gradshteyn, I.M. Ryzhik, in: A. Jeffrey (Ed.), Table of Integrals, Series and Products, 5th ed., Academic Press, San Diego, 1994.
[15] P. Stoica, T. Söderström, F.N. Ti, Asymptotic properties of the high-order Yule–Walker estimates of sinusoidal frequencies, IEEE Trans. Acoustics Speech Signal Process. 37 (1989) 1721–1734.
[16] P.J. Brockwell, R.A. Davis, Time Series: Theory and Methods, 2nd ed., Springer Series in Statistics, Springer-Verlag, Berlin, 1991.