Timing synchronizer and its architecture for OFDM-based high-throughput millimeter wave systems


Accepted Manuscript (Short communication)

Authors: Trong Nghia Le, Yi-Ting Hsieh, Wen-Long Chin

PII: S1434-8411(17)31319-5
DOI: http://dx.doi.org/10.1016/j.aeue.2017.05.043
Reference: AEUE 51914

To appear in: International Journal of Electronics and Communications

Received Date: 24 May 2016
Accepted Date: 29 May 2017

Please cite this article as: T.N. Le, Y-T. Hsieh, W-L. Chin, Timing Synchronizer and Its Architecture for OFDM-Based High-Throughput Millimeter Wave Systems, International Journal of Electronics and Communications (2017), doi: http://dx.doi.org/10.1016/j.aeue.2017.05.043

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Trong Nghia Le, Yi-Ting Hsieh, and Wen-Long Chin∗

Department of Engineering Science, National Cheng Kung University, No. 1 University Road, Tainan City, Taiwan.

Abstract

This work introduces a new symbol time synchronization algorithm and its architecture for high-throughput millimeter wave systems based on orthogonal frequency-division multiplexing (OFDM). Complementary Golay sequences, which have good signal properties, are popular training sequences for the preamble of multi-gigabit communication systems, a promising technology for future 5G communications. The basic idea of our contribution is to obtain a time estimate based on the aperiodic autocorrelation function (ACF) of complementary Golay sequences. In addition, achieving low power consumption and small chip area remains a challenge for high-throughput millimeter wave systems. To meet the throughput requirement of over 2.64 GSamples/s, the proposed algorithm is designed to suit the parallel architecture of very-high-throughput receivers. Moreover, the complexity is further reduced by exploiting the correlation characteristic of the ACF of complementary Golay sequences and the regularity of the algorithm. Simulations confirm the advantages of the proposed synchronizer.

Keywords: Millimeter wave system; orthogonal frequency-division multiplexing (OFDM); receiver; synchronization.

∗ Corresponding author. Email address: [email protected] (Trong Nghia Le, Yi-Ting Hsieh, and Wen-Long Chin)

Preprint, June 2, 2017

1. Introduction

Short-range wireless communications with high data throughput have attracted considerable attention in recent years. Millimeter wave communication systems play a key role in contemporary gigabit wireless communications, as reflected in the millimeter-wave industrial standards from IEEE [1]. The IEEE 802.11ad standard [1] uses the Golay sequence as the preamble signal for the 60 GHz wireless orthogonal frequency-division multiplexing (OFDM) system, which is a promising technology for future 5G communications. However, synchronization impairments can significantly degrade the performance of OFDM systems [2]. To cope with synchronization problems, several algorithms have been proposed for OFDM. A synchronizer has three major tasks: packet detection, carrier synchronization [3], and time synchronization [4]–[5]. Carrier synchronization and packet detection are beyond the scope of this work. Besides the three major tasks, phase noise [6], [7] is another impairment that should be eliminated. Timing synchronization methods can be divided into two categories: non-data-aided methods [4]–[8] and data-aided methods [9]–[5]. An interesting work [10] studied the security issues of OFDM synchronization, i.e., the impact of jamming attacks on synchronization. Another study [11] improves the throughput of 60-GHz wireless personal area networks (WPAN) by an effective time slot allocation scheme. Complementary Golay sequences [12], [13] were proposed as the training sequences of the IEEE 802.11ad standard [1]. However, conventional data-aided synchronization techniques perform poorly for preambles based on complementary Golay sequences.

This work studies a robust synchronization method based on the good correlation properties of the Golay sequence. It exploits inherent properties of the Golay sequence and utilizes the advantages of auto- and cross-correlation at the same time. Based on the proposed synchronization method, this work further modifies the algorithm to reduce the complexity of the hardware design based on the proposed aperiodic autocorrelation function (ACF) of complementary Golay sequences. To the best of our knowledge, this is the first work that adopts the ACF for synchronization. In addition, achieving low power consumption and small chip area remains a challenge for high-throughput OFDM systems [14], [15]. To meet the throughput requirement of over 2.64 GSamples/s [1], the proposed algorithm is especially suitable for the parallel architecture of very-high-throughput receivers. Moreover, the complexity is greatly reduced by exploiting the correlation characteristic of the ACF of complementary Golay sequences and the regularity of the proposed algorithm. Simulations validate the advantages of the proposed design.

The rest of this paper is organized as follows. Section 2 introduces the receiver block diagram and signal model. Section 3 presents the proposed synchronization algorithm, and Section 4 describes the simulation analysis and the complexity reduction of the proposed method. Next, Section 5 gives the architecture design of the proposed algorithm. Conclusions are finally drawn in Section 6.

Figure 1: Block diagram of the millimeter wave transmitter (TX) and receiver (RX) based on the OFDM.

2. Transmitter and Receiver Block Diagram and Signal Model

Figure 1 displays the block diagram of the millimeter wave transmitter and receiver, where, for simplicity, the forward error correction (FEC) decoder is not shown. The receiver performs the reverse of the operations in the transmitter. On the transmitter side, the preamble composed of the Golay sequence $a(n)$ is inserted at the beginning of each frame. After synchronization and cyclic prefix (CP) removal, the received signal is sent to the fast Fourier transform (FFT) and the demapper. For synchronization purposes, the preamble has a repetitive structure with $M$ segments. Each segment consists of the Golay sequence $a(n)$ of length $N$. The sequences $a(n)$ and $b(n)$ are assumed to be a pair of complementary Golay sequences [12]. Notably, the maximum permitted transmitter power varies by country, but in general +10 dBm can be taken as a practical limit. The minimum receiver sensitivity is -68 dBm.

Let $h(l)$, $l = 0, 1, \ldots, L$, denote the impulse response of a multipath channel with $(L + 1)$ taps. The channel taps are assumed to be uncorrelated and quasi-stationary over a symbol. The $m$-th segment of the whole transmitted training signal $x(n')$ is denoted by $x_m(n) \equiv x(n' = mN + n)$, where $n \in [0, N)$ and $x_m(n) = a(n)$, $\forall m$. After passing through the multipath fading channel, the received training signal can be written as

$$y(n') = \sum_{l=0}^{L} h(l)\, x(n' - l - \theta) + w(n') = \sum_{m=0}^{M-1} y_m(n - \theta) \tag{1}$$

where $y_m(n - \theta) \equiv y(n' = mN + n - \theta) = \sum_{l=0}^{L} h(l)\, x_m(n - l - \theta) + w_m(n)$ represents the $m$-th segment of the received signal, $w_m(n) \equiv w(mN + n)$, $w(n)$ is the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_w^2$, and $\theta$ is the unknown timing offset to be estimated.
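The signal model (1) can be illustrated with a minimal Python sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the Golay pair is built with the standard doubling construction (it is not the 802.11ad sequence itself), and the channel taps, offset, noise level, and the helper names `golay_pair` and `received_preamble` are hypothetical.

```python
import numpy as np

def golay_pair(n_doublings):
    """Binary complementary Golay pair of length 2**n_doublings.

    Uses the standard doubling construction a' = [a, b], b' = [a, -b].
    This is an illustrative pair, not the sequence defined in IEEE 802.11ad.
    """
    a = np.array([1.0])
    b = np.array([1.0])
    for _ in range(n_doublings):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def received_preamble(a, M, h, theta, noise_std, rng):
    """Eq. (1): M repetitions of a(n), delayed by theta, filtered by h, plus AWGN."""
    x = np.tile(a, M)                                  # training signal x(n')
    x_delayed = np.concatenate([np.zeros(theta), x])   # x(n' - theta)
    y = np.convolve(x_delayed, h)                      # sum_l h(l) x(n' - l - theta)
    w = noise_std * (rng.standard_normal(y.size)
                     + 1j * rng.standard_normal(y.size)) / np.sqrt(2)
    return y + w

rng = np.random.default_rng(0)
N, M, theta = 128, 16, 5
a, b = golay_pair(7)                    # N = 2**7 = 128
h = np.array([1.0, 0.5j, 0.25])         # hypothetical 3-tap channel (L = 2)
y = received_preamble(a, M, h, theta, noise_std=0.1, rng=rng)
```

The returned vector has `N*M + theta + L` samples because the linear convolution appends the channel tail to the delayed preamble.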

Table 1: Values of Ra(k) and Rb(k).

k        1    2    3    4    5    ...   125   126   127
Ra(k)   -1    0    3    0   -1    ...    -3     0     1
Rb(k)    1    0   -3    0    1    ...     3     0    -1

3. Time-Shifted ACF of Received Signals and Timing Synchronization

Inspired by the properties of the complementary Golay sequence and the multipath channel, the time-shifted ACF of the $m$-th segment of the Golay sequence of received signals is proposed as

$$R^S_{y_m}(k, \Delta\theta) = \frac{1}{V_h} \sum_{n=0}^{N-k-1} y_m(n + \Delta\theta) \cdot y_m^*(n + \Delta\theta + k) \tag{2}$$

where $\cdot$ denotes multiplication, $(\cdot)^*$ denotes the complex conjugate, $k = 0, 1, \ldots, N - 1$, $(\cdot)^S$ denotes the time-shifted version of the involved quantity, $\Delta\theta \equiv \tilde{\theta} - \theta$ denotes the time shift relative to the ACF of its complementary sequence $b(n)$, $\tilde{\theta}$ denotes the hypothesis of the timing offset $\theta$, and $V_h \equiv \sum_l \sigma^2_{h(l)} = \sum_l E\left[|h(l)|^2\right]$ denotes the total channel variance, where $E[\cdot]$ denotes the expectation. As shown in the Appendix, the total channel variance $V_h$ can be estimated by the packet detection and then compensated for in the proposed synchronization method. For clarity, in the following discussion, we assume $V_h = 1$.

Since there are $M$ repetitive segments of the Golay sequence, to fully utilize all received Golay sequences and reduce the complexity, one can define the time-shifted ACF of the sample average of received samples as

$$R^S_{\bar{y}}(k, \Delta\theta) = \sum_{n=0}^{N-k-1} \bar{y}(n + \Delta\theta)\, \bar{y}^*(n + \Delta\theta + k) \tag{3}$$

where $\bar{y}(n) = \frac{1}{M} \sum_{m=0}^{M-1} y_m(n)$.

To fully utilize all periods $(N - k)$, for $\forall k \neq 0$, of the ACF, the timing metric is defined as

$$T(\Delta\theta) = \sum_{k \neq 0} \left| R^S_{\bar{y}}(k, \Delta\theta) + R_b(k) \right| \tag{4}$$

where $R_b(k)$ is the ACF of the complementary Golay sequence $b(\cdot)$ and $|\cdot|$ denotes the absolute value. Notably, $R_b(k)$ is constant and can be precalculated and stored in memory. When $\Delta\theta = 0$, $x_m(n) = a(n)$, $\forall m$. As such, when $M \to \infty$, $R^S_{\bar{y}}(k, \Delta\theta) \to R_a(k)$ because $h(\cdot)$ is uncorrelated. Therefore, according to the complementary property of the Golay sequence [12], i.e., $R_a(k) + R_b(k) = 0$ for $\forall k \neq 0$, the proposed timing metric has its minimum value when $\tilde{\theta} = \theta$. The values of $R_a(k)$ and $R_b(k)$ for $N = 128$ are displayed in Table 1, which validates that $R_a(k) + R_b(k) = 0$, $\forall k \neq 0$. The absolute value is taken because $R^S_{\bar{y}}(k, \Delta\theta) + R_b(k)$ is generally a complex number. The proposed timing estimate is finally given by

$$\hat{\theta} = \arg\min_{\tilde{\theta}} T(\Delta\theta). \tag{5}$$

The central idea of the presented method is that the complementary property of the Golay sequence is utilized. First, owing to the properties of the uncorrelated multipath channel, the ACF (2) is defined. Next, at the correct symbol time, the metric (4) has its minimum value because of the complementary property of the Golay sequence.
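The estimator defined by (3)-(5) can be sketched in Python as follows. This is only an illustration under stated assumptions: the Golay pair comes from the standard doubling construction rather than from the 802.11ad tables, the averaged sequence is indexed circularly over its $N$ samples, and the test signal is noiseless with a single unit-gain tap so that $V_h = 1$. The function names are hypothetical.

```python
import numpy as np

def golay_pair(n_doublings):
    """Illustrative complementary pair via a' = [a, b], b' = [a, -b] (assumption)."""
    a = np.array([1.0])
    b = np.array([1.0])
    for _ in range(n_doublings):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def aperiodic_acf(s):
    """R_s(k) = sum_{n=0}^{N-k-1} s(n) s*(n+k) for k = 0, ..., N-1."""
    N = s.size
    return np.array([np.sum(s[:N - k] * np.conj(s[k:])) for k in range(N)])

def timing_metric(y_bar, r_b):
    """Metric (4) for every hypothesis; y_bar is indexed circularly (assumption)."""
    N = y_bar.size
    T = np.empty(N)
    for shift in range(N):
        r_y = aperiodic_acf(np.roll(y_bar, -shift))    # uses y_bar(n + shift)
        T[shift] = np.sum(np.abs(r_y[1:] + r_b[1:]))   # sum over k != 0
    return T

N, M, theta = 128, 16, 9
a, b = golay_pair(7)
r_b = aperiodic_acf(b)                                 # precalculated, as in the text

# Noiseless single-tap channel: every segment is a(n - theta), taken circularly.
segments = np.tile(np.roll(a, theta), (M, 1))
y_bar = segments.mean(axis=0)                          # sample average over M segments

T = timing_metric(y_bar, r_b)
theta_hat = int(np.argmin(T))                          # Eq. (5)
```

In this noiseless setting `T[theta]` evaluates to zero, so the true offset attains the minimum of the nonnegative metric, which is the behavior the complementary property predicts.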

4. Performance Evaluation

Monte Carlo simulations are conducted to evaluate the performance of the proposed estimator. The number of subcarriers is 512; the modulation scheme is BPSK; the carrier frequency is 60 GHz; the bandwidth is 2640 MHz; the subcarrier spacing is 4.125 MHz; the length of the CP is 128 samples; the sampling period is 0.379 ns; $N = 128$, and $M = 16$. The performance is measured in terms of the bit error rate (BER) and mean-squared error (MSE), which are averaged over 20000 trials.

Figure 2: BERs as a function of SNR over the residential channel based on the symbol times derived by the proposed, Schmidl's, and cross-correlation estimators. (Plot: BER on a logarithmic axis versus SNR from 0 to 14 dB; curves: Schmidl's Method, CrossCorr Method, Proposed Method.)

Figure 3: MSE of the symbol time estimate (relative to the ideal one) of the proposed estimator as a function of SNR under various ω over the residential channel. (Plot: MSE on a logarithmic axis versus SNR from 0 to 14 dB; curves: Proposed Method with ω = 1, 2, 3, 4.)

Figure 2 plots the BERs as a function of signal-to-noise ratio (SNR) over the residential channel model based on the symbol times derived by the proposed, auto-correlation (Schmidl's method), and cross-correlation (CrossCorr) estimators. The residential channel impulse response is generated by the CM1 residential channel model [16]. The timing error is randomly generated by the uniform distribution over $[0, N/4]$. As shown, the BERs of Schmidl's and the cross-correlation estimators exhibit an error floor owing to the reduction of the signal-to-interference-and-noise ratio (SINR) caused by the synchronization error.

When $\Delta\theta = 0$, the metric (4) sums over $1 \le k \le N - 1$ and ideally has $N - 1$ zero terms; hence, we propose to perform decimation to reduce the complexity. Assuming the metric is decimated by $\omega$, the metric (4) becomes

$$T(\Delta\theta) = \sum_{i=0}^{\left\lfloor \frac{N-2}{\omega} \right\rfloor} \left| R^S_{\bar{y}}(1 + i\omega, \Delta\theta) + R_b(1 + i\omega) \right| \tag{6}$$

where $\lfloor \cdot \rfloor$ denotes the greatest integer function.

Figure 3 plots the MSE of the symbol time estimate relative to the ideal one as a function of SNR under various $\omega$ over the residential channel model. As presented, the performance degrades as $\omega$ increases because the sampling rate is reduced. Moreover, the computational complexity is inversely proportional to $\omega$. Therefore, to trade performance for computational complexity without too much performance loss, we propose to use $\omega = 2$ because, as shown in Table 1, $R_a(k)$ and $R_b(k)$ exactly equal 0 when $k$ is an even number, so the even lags contribute less to the metric than the odd lags.

5. Architecture Design

The overall architecture of the synchronizer is displayed in Fig. 4, where clk_16x denotes the clock with sampling frequency. Besides the clock divider by 16 (DIV), there are 3 main blocks, AVG, RY, and THETA_EST, used for calculating the sample average, the time-shifted ACF, and the theta estimate, respectively.

5.1. Sample Average Block (AVG)

To achieve 2.64 GSamples/s, the AVG, used for the overlap-and-add operation, is designed with sixteen parallel signal paths operating at 1/16 of the sampling frequency. Figure 5 plots the datapath of AVG, where s(7,4f) and Q denote a 7-bit signed number with 4 fractional bits and the quantization function, respectively.
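The overlap-and-add averaging of the AVG block can be modeled behaviorally as in the following Python sketch. It is not the hardware datapath of Fig. 5: the placement of the quantizer after the division by $M$, the round-to-nearest rule with saturation, and the names `quantize` and `sample_average` are assumptions chosen for illustration. Only the sixteen-lane organization and the s(7,4f) format come from the text.

```python
import numpy as np

def quantize(x, total_bits=7, frac_bits=4):
    """Round to a signed fixed-point grid s(total_bits, frac_bits f), with saturation."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1))          # most negative integer code
    hi = 2 ** (total_bits - 1) - 1         # most positive integer code
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

def sample_average(y, N, M, lanes=16):
    """Overlap-and-add M segments of length N, taking `lanes` samples per slow clock.

    Assumes N and N*M are multiples of `lanes`, as with N = 128, M = 16.
    """
    acc = np.zeros(N, dtype=complex)
    for start in range(0, N * M, lanes):          # one iteration per slow-clock cycle
        idx = (start + np.arange(lanes)) % N      # position of each lane in a segment
        acc[idx] += y[start:start + lanes]
    avg = acc / M
    return quantize(avg.real) + 1j * quantize(avg.imag)

N, M = 128, 16
segment = np.exp(2j * np.pi * np.arange(N) / N)   # hypothetical test segment
y = np.tile(segment, M)
y_bar = sample_average(y, N, M)
```

With 4 fractional bits the quantization step is 1/16, so each of the real and imaginary parts of `y_bar` differs from the unquantized average by at most 1/32 as long as no saturation occurs.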

Figure 4: Architecture of the synchronizer. (Blocks shown: DIV2, SYNC, AVG, RY, THETA_EST; signals shown: clk_2x, clk_1x, rst, input_valid, received_Re, received_Im, y_bar_Re, y_bar_Im, y_bar_out_valid, Ry_valid_1,3,5,7, Ry_re_1_1,2 to Ry_im_8_1,2, done, theta_hat.)

Figure 5: Datapath of the block, AVG.

5.2. Time-Shifted ACF Block (RY)

For illustration purposes, we assume $\omega = 1$ and $\theta = 0$ (i.e., $\Delta\theta = \tilde{\theta} - \theta = \tilde{\theta}$). In this case, the ACF is given by

$$\begin{aligned} R^S_{\bar{y}}(k, \tilde{\theta}) &= \sum_{n=0}^{N-k-1} \bar{y}(n + \tilde{\theta})\, \bar{y}^*(n + \tilde{\theta} + k) \\ &= R^S_{\bar{y}}(k, \tilde{\theta} - 1) - \bar{y}(\tilde{\theta} - 1)\, \bar{y}^*(\tilde{\theta} - 1 + k) + \bar{y}(\tilde{\theta} + N - k - 1)\, \bar{y}^*(\tilde{\theta} + N - 1). \end{aligned} \tag{7}$$

Hence, the current ACF, $R^S_{\bar{y}}(k, \tilde{\theta})$, can be obtained from the previous ACF, $R^S_{\bar{y}}(k, \tilde{\theta} - 1)$. In addition, we propose to use 8 parallel signal paths to reduce the clock rate, as shown in Fig. 6.

Figure 7 presents the pipeline operation of the ACF and its output order for various $k$ and $\tilde{\theta}$. Furthermore, by expanding (7), Fig. 8 displays the terms involved in determining $R^S_{\bar{y}}(1, 0)$, $R^S_{\bar{y}}(127, 0)$, $R^S_{\bar{y}}(1, 1)$, and $R^S_{\bar{y}}(127, 1)$. Generally, under a given $\tilde{\theta}$, it can be observed that, for $k_1 + k_2 = N$, $R^S_{\bar{y}}(k_1, \tilde{\theta})$ and $R^S_{\bar{y}}(k_2, \tilde{\theta})$ can share a complex multiplier, which achieves a constant processing duration and reduces the number of required multipliers by a factor of 2. Notably, for $\omega = 2$, only RY 1, RY 3, RY 5, and RY 7 are required and the area is reduced by half.

Figure 6: Datapath of the block, RY.

Figure 7: Pipeline operation of the ACF and its output order for various $k$ and $\tilde{\theta}$.

Figure 8: Expanded terms used for calculating $R^S_{\bar{y}}(1, 0)$, $R^S_{\bar{y}}(127, 0)$, $R^S_{\bar{y}}(1, 1)$, and $R^S_{\bar{y}}(127, 1)$.

5.3. Theta Estimation Block (THETA_EST)

After obtaining the ACF, the metric (6) is calculated, as shown in Fig. 9. The ACF of the complementary Golay sequence, $b(n)$, is added to the ACF of received signals, $R^S_{\bar{y}}(k, \tilde{\theta})$. The square root operation is implemented using the square-root approximation (SRA) [17]. The index of the metric with the minimum value is output as the symbol time estimate.
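The recursion (7) and the decimated metric (6) can be combined into a short behavioral sketch in Python. Several points are assumptions rather than the paper's design: the averaged sequence is indexed circularly, the magnitude is computed with the alpha-max-plus-beta-min rule (alpha = 1, beta = 1/2) as a simple stand-in for the SRA of [17], and all function names are hypothetical. The sketch is sequential and does not model the 8 parallel paths or the multiplier sharing.

```python
import numpy as np

def acf_direct(y_bar, k, shift):
    """First line of Eq. (7), with circular indexing of y_bar (assumption)."""
    N = y_bar.size
    n = np.arange(N - k)
    return np.sum(y_bar[(n + shift) % N] * np.conj(y_bar[(n + shift + k) % N]))

def acf_update(prev, y_bar, k, shift):
    """Second line of Eq. (7): obtain R(k, shift) from R(k, shift - 1)."""
    N = y_bar.size
    drop = y_bar[(shift - 1) % N] * np.conj(y_bar[(shift - 1 + k) % N])
    add = y_bar[(shift + N - k - 1) % N] * np.conj(y_bar[(shift + N - 1) % N])
    return prev - drop + add

def approx_abs(z):
    """Alpha-max-plus-beta-min magnitude estimate; a stand-in, not the SRA of [17]."""
    re, im = abs(z.real), abs(z.imag)
    return max(re, im) + 0.5 * min(re, im)

def theta_estimate(y_bar, r_b, omega=2):
    """Decimated metric (6) for each hypothesis, then the index of its minimum."""
    N = y_bar.size
    lags = [1 + i * omega for i in range((N - 2) // omega + 1)]
    acf = {k: acf_direct(y_bar, k, 0) for k in lags}     # initial hypothesis
    T = np.empty(N)
    for shift in range(N):
        if shift > 0:                                    # one update per new hypothesis
            acf = {k: acf_update(acf[k], y_bar, k, shift) for k in lags}
        T[shift] = sum(approx_abs(acf[k] + r_b[k]) for k in lags)
    return int(np.argmin(T)), T
```

Each new hypothesis costs two complex multiplications per lag instead of $N - k$, which is the saving that (7) provides. With this choice of alpha and beta, the magnitude estimate never falls below the true magnitude and exceeds it by at most about 12%.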

Figure 9: Datapath of the block, THETA_EST. (Elements shown: SRA 1_1 to SRA 8_2, rb(k) adders, SUM_1 to SUM_4, COMPARATOR, COUNTER; output: theta_hat.)

Figure 10: Fixed-point BER of the proposed estimator as a function of SNR over the residential channel, ω = 1 and ω = 2. (Plot: BER on a logarithmic axis versus SNR from 0 to 14 dB; curves: Floating Point ω = 1, Fixed Point ω = 1, Fixed Point ω = 2.)

5.4. Fixed-Point Bit Error Rate (BER) Performance

Figure 10 presents the fixed-point BER simulation of the proposed estimator as a function of SNR over the residential channel for ω = 1 and ω = 2. The floating-point BER for ω = 1 is also displayed as the benchmark. Comparing the fixed-point simulation for ω = 2 with the benchmark, the proposed design has less than approximately 1 dB degradation and can efficiently enhance the system performance.

Table 2: Implementation Summary, ω = 1 and ω = 2.

             ω = 1                       ω = 2
Block        Gate Count   Power (mW)     Gate Count   Power (mW)
AVG          7.4K         2.3            7.4K         2.3
RY           56.9K        17.1           36.2K        10.1
THETA_EST    18.8K        6.1            16.2K        4.8
Total        83.1K        25.5           59.8K        17.2

5.5. Experimental Result

Table 2 lists the implementation summary for ω = 1 and ω = 2 designed in TSMC 90 nm CMOS technology. An equivalent gate is counted as the size of a two-input NAND gate. The power consumption is based on the post-layout simulation. As shown, with a performance degradation of less than 1 dB, the ω = 2 design saves about 28% of the gate count and 32.5% of the power consumption compared with the ω = 1 design.
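The quoted savings follow directly from the totals in Table 2:

$$\frac{83.1\text{K} - 59.8\text{K}}{83.1\text{K}} = \frac{23.3}{83.1} \approx 28.0\%, \qquad \frac{25.5 - 17.2}{25.5} = \frac{8.3}{25.5} \approx 32.5\%.$$

The column totals are also consistent with the per-block entries: $7.4 + 56.9 + 18.8 = 83.1$ and $7.4 + 36.2 + 16.2 = 59.8$ for the gate counts (in thousands), and $2.3 + 17.1 + 6.1 = 25.5$ and $2.3 + 10.1 + 4.8 = 17.2$ for the power in mW.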

6. Conclusions

This study proposes a new symbol time synchronization algorithm and its architecture, both based on the good correlation property of the complementary Golay sequence. The proposed method can be utilized for training sequences with a repeated structure. Furthermore, by adopting a parallel design methodology, the proposed architecture can meet the stringent high-throughput specification of promising millimeter wave communication systems.

Acknowledgements

This work is supported in part by the grant MOST 105-2221-E-006-019-MY2, Taiwan.

Appendix A. Derivation of the Total Channel Variance

To estimate the total channel variance $V_h$, it can be shown that the sample average of the received power is $\frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} |y_m(n)|^2$, with mean value

$$E\left[ \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} |y_m(n)|^2 \right] = \begin{cases} \sum_{l=0}^{L} \sigma^2_{h(l)} + \sigma_w^2 = V_h + \sigma_w^2, & H_1 \\ \sigma_w^2, & H_0 \end{cases} \tag{A.1}$$

where $H_1$ and $H_0$ denote the hypotheses that the packet is present and not present, respectively. The presence of signals is typically determined by the packet detection implemented using the energy detector. Therefore, the estimate of $V_h$ can be determined by

$$\hat{V}_h = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left| y_m(n)|_{H_1} \right|^2 - \hat{\sigma}_w^2 \tag{A.2}$$

where $y_m(n)|_{H_1}$ denotes the received signal $y_m(n)$ under the $H_1$ hypothesis, $\hat{\sigma}_w^2 = \frac{1}{N_1} \sum_{n=0}^{N_1-1} \left| y(n)|_{H_0} \right|^2$, and $N_1$ denotes the duration of $H_0$.
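The estimator (A.2) amounts to subtracting a noise-power measurement from a signal-plus-noise power measurement. The Python sketch below illustrates this under assumptions that are not from the paper: a random ±1 sequence stands in for the Golay training sequence, the two channel taps and the noise level are hypothetical, and the names `awgn` and `estimate_channel_variance` are chosen here for illustration.

```python
import numpy as np

def awgn(n, sigma, rng):
    """Complex white Gaussian noise with total variance sigma**2."""
    return sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def estimate_channel_variance(y_signal, y_noise):
    """Eq. (A.2): mean power under H1 minus the noise power measured under H0."""
    noise_var = np.mean(np.abs(y_noise) ** 2)        # sigma_w^2 estimate over N1 samples
    return np.mean(np.abs(y_signal) ** 2) - noise_var

rng = np.random.default_rng(0)
N, M, N1 = 128, 16, 1024
sigma_w = 0.3
h = np.array([0.8, 0.6j])                            # hypothetical taps, sum |h(l)|^2 = 1
a = rng.choice([-1.0, 1.0], size=N)                  # stand-in unit-power training sequence

y_signal = np.convolve(np.tile(a, M), h)[:N * M] + awgn(N * M, sigma_w, rng)   # under H1
y_noise = awgn(N1, sigma_w, rng)                                               # under H0
v_h_hat = estimate_channel_variance(y_signal, y_noise)
```

For these hypothetical taps the true total channel variance is 1, so `v_h_hat` should land close to 1, and the received samples can then be scaled by the estimate before the ACF in (2) is formed.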

References

[1] IEEE, Standard for information technology–telecommunications and information exchange between systems–local and metropolitan area networks–specific requirements–part 11: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications amendment 3: Enhancements for very high throughput in the 60 GHz band, IEEE Std. 802.11ad (2011) 1–372. doi:10.1109/IEEESTD.2011.6018236.

[2] H. Steendam, M. Moeneclaey, Analysis and optimization of the performance of OFDM on frequency-selective time-selective fading channels, IEEE Trans. Commun. 47 (12) (1999) 1811–1819. doi:10.1109/26.809701.

[3] W. J. Shin, D. H. Kim, Y. H. You, Block-wise frequency offset estimation scheme in MIMO-OFDM systems, AEU-Int. J. Electron. Commun. 66 (12) (2012) 979–984. doi:10.1016/j.aeue.2012.04.007.

[4] H. T. Hsieh, W. R. Wu, Maximum likelihood timing and carrier frequency offset estimation for OFDM systems with periodic preambles, IEEE Trans. Vehicular Tech. 58 (8) (2009) 4224–4237. doi:10.1109/TVT.2009.2019820.

[5] W. L. Chin, Maximization of effective signal power in DCT window for symbol time synchronization in optical fast OFDM, IEEE/OSA J. Lightw. Technol. 31 (5) (2013) 740–748. doi:10.1109/JLT.2012.2232642.

[6] A. Ishaque, G. Ascheid, Efficient MAP-based estimation and compensation of phase noise in MIMO-OFDM receivers, AEU-Int. J. Electron. Commun. 67 (12) (2013) 1096–1106. doi:10.1016/j.aeue.2013.08.016.

[7] J. Fang, E. P. Simon, M. Berbineau, M. Lienard, Joint channel and phase noise estimation in OFDM systems at very high speeds, AEU-Int. J. Electron. Commun. 67 (4) (2013) 295–300. doi:10.1016/j.aeue.2012.09.002.

[8] W. L. Chin, Blind symbol synchronization for OFDM systems using cyclic prefix in time-variant and long-echo fading channels, IEEE Trans. Vehicular Tech. 61 (1) (2012) 185–195. doi:10.1109/TVT.2011.2177502.

[9] T. M. Schmidl, D. C. Cox, Robust frequency and timing synchronization for OFDM, IEEE Trans. Commun. 45 (12) (1997) 1613–1621. doi:10.1109/26.650240.

[10] M. J. L. Pan, T. C. Clancy, R. W. McGwier, Physical layer orthogonal frequency-division multiplexing acquisition and timing synchronization security, Wiley Wirel. Commun. Mob. Comput. 16 (2) (2016) 177–191. doi:10.1002/wcm.2500.

[11] W. Zou, Y. Hu, B. Li, Z. Zhou, Z. Cui, An improved exclusive region scheduling algorithm-based timeslot allocation scheme for mmWave WPANs, Wiley Wirel. Commun. Mob. Comput. 14 (13) (2014) 1276–1286. doi:10.1002/wcm.2231.

[12] M. J. E. Golay, Complementary series, IRE Trans. Inf. Theory IT-7 (2) (1961) 82–87. doi:10.1109/TIT.1961.1057620.

[13] Y. Li, W. B. Chu, More Golay sequences, IEEE Trans. Inf. Theory 51 (3) (2005) 1141–1145. doi:10.1109/TIT.2004.842775.

[14] H. Y. Liu, C. Y. Lee, A low-complexity synchronizer for OFDM-based UWB system, IEEE Trans. Circuits Syst. II, Exp. Briefs 53 (11) (2006) 1269–1273. doi:10.1109/TCSII.2006.882804.

[15] J. Y. Yu, C. C. Chung, C. Y. Lee, A symbol-rate timing synchronization method for low power wireless OFDM systems, IEEE Trans. Circuits Syst. II, Exp. Briefs 55 (9) (2008) 922–926. doi:10.1109/TCSII.2008.923405.

[16] A. Sadri, 802.15.3c usage model document (UMD), IEEE Draft (2006) 1–32. doi:10.1109/TAP.2009.2030524.

[17] M. Allie, R. Lyons, A root of less evil [digital signal processing], IEEE Signal Process. Mag. 22 (2) (2005) 93–96. doi:10.1109/MSP.2005.1406500.