Accepted Manuscript (Short communication)
Timing Synchronizer and Its Architecture for OFDM-Based High-Throughput Millimeter Wave Systems
Trong Nghia Le, Yi-Ting Hsieh, Wen-Long Chin
PII: S1434-8411(17)31319-5
DOI: http://dx.doi.org/10.1016/j.aeue.2017.05.043
Reference: AEUE 51914
To appear in: International Journal of Electronics and Communications
Received Date: 24 May 2016
Accepted Date: 29 May 2017
Please cite this article as: T.N. Le, Y-T. Hsieh, W-L. Chin, Timing Synchronizer and Its Architecture for OFDM-Based High-Throughput Millimeter Wave Systems, International Journal of Electronics and Communications (2017), doi: http://dx.doi.org/10.1016/j.aeue.2017.05.043
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Trong Nghia Le, Yi-Ting Hsieh, and Wen-Long Chin∗
Department of Engineering Science, National Cheng Kung University, No. 1 University Road, Tainan City, Taiwan.
Abstract
In this work, a new symbol time synchronization method and its architecture design for high-throughput millimeter wave systems based on orthogonal frequency-division multiplexing (OFDM) are introduced. Complementary Golay sequences, which have good correlation properties, are popular training sequences for the preamble design of multi-gigabit communication systems, a promising technology for future 5G communications. The basic idea of our contribution is to obtain a time estimate based on the aperiodic autocorrelation function (ACF) of complementary Golay sequences. Besides, achieving low power consumption and small chip area remains a challenge for high-throughput millimeter wave systems. To meet the throughput requirement of over 2.64 GSamples/s, the proposed algorithm is especially suitable for the parallel architecture of very-high-throughput receivers. Moreover, the complexity is further reduced by exploiting the correlation characteristics of the ACF of complementary Golay sequences and the regularity of the algorithm. Simulations confirm the advantages of the proposed synchronizer.
Keywords: Millimeter wave system; orthogonal frequency-division multiplexing (OFDM); receiver; synchronization.
∗Corresponding author.
Email address: [email protected] (Trong Nghia Le, Yi-Ting Hsieh, and Wen-Long Chin)
Preprint submitted to Journal of LaTeX Templates, June 2, 2017.
1. Introduction
In recent years, considerable attention has been paid to short-range wireless communications with high-throughput data rates. Millimeter wave communication systems play key roles in contemporary gigabit wireless communications, as reflected in the millimeter-wave industrial standards from IEEE [1]. The IEEE 802.11ad standard [1] proposed to use the Golay sequence as the preamble signal for the 60 GHz wireless orthogonal frequency-division multiplexing (OFDM) system, which is a promising technology for future 5G communications. However, synchronization impairments can significantly deteriorate the performance of OFDM systems [2]. To cope with synchronization problems, several algorithms have been proposed for OFDM. A synchronizer performs three major tasks: packet detection, carrier synchronization [3], and time synchronization [4], [5]; carrier synchronization and packet detection are beyond the scope of this work. Besides these three tasks, phase noise [6], [7] is another impairment that should be compensated. Timing synchronization methods can be divided into two categories: non-data-aided methods [4]–[8] and data-aided methods [5], [9]. An interesting work [10] studied the security issues of OFDM synchronization, i.e., the impact of jamming attacks on synchronization. Another study [11] improves the throughput of 60-GHz wireless personal area networks (WPANs) by an effective time-slot allocation scheme. Complementary Golay sequences [12], [13] were proposed as the training sequences of the IEEE 802.11ad standard [1]. However, conventional data-aided synchronization techniques perform poorly for preambles based on complementary Golay sequences.
This work studies a robust synchronization method based on the good correlation properties of the Golay sequence. It exploits inherent properties of the Golay sequence and utilizes the advantages of auto- and cross-correlations simultaneously. Building on the proposed synchronization method, this work further modifies the algorithm to reduce the complexity of the hardware design based on the aperiodic autocorrelation function (ACF) of complementary Golay sequences. To the best of our knowledge, this is the first work that adopts the ACF for synchronization. Besides, achieving low power consumption and small chip area remains a challenge for high-throughput OFDM systems [14], [15]. To meet the throughput requirement of over 2.64 GSamples/s [1], the proposed algorithm is especially suitable for the parallel architecture of very-high-throughput receivers. Moreover, the complexity is greatly reduced by exploiting the correlation characteristics of the ACF of complementary Golay sequences and the regularity of the proposed algorithm. Simulations validate the advantages of the proposed design.

Figure 1: Block diagram of the millimeter wave transmitter (TX) and receiver (RX) based on OFDM.
The rest of this paper is organized as follows. Section 2 introduces the receiver block diagram and signal model. Section 3 presents the proposed synchronization algorithm, and Section 4 evaluates it by simulation and describes the complexity reduction of the proposed method. Next, Section 5 gives the architecture design of the proposed algorithm. Finally, conclusions are drawn in Section 6.
2. Transmitter and Receiver Block Diagram and Signal Model
Figure 1 displays the block diagram of the millimeter wave transmitter and receiver, where, for simplicity, the forward error correction (FEC) decoder is not shown. The receiver performs the reverse operations of the transmitter.
On the transmitter side, the preamble composed of the Golay sequence a(n) is inserted at the beginning of each frame. On the receiver side, after synchronization and cyclic prefix (CP) removal, the received signal is sent to the fast Fourier transform (FFT) and the demapper. For synchronization purposes, the preamble has a repetitive structure with M segments, each composed of the Golay sequence a(n) of length N. The sequences a(n) and b(n) are assumed to be a pair of complementary Golay sequences [12]. Notably, the maximum permitted transmitter power varies by country, but in general +10 dBm can be taken as a practical limit. The minimum receiver sensitivity is −68 dBm.

Let h(l), l = 0, 1, ..., L, denote the impulse response of a multipath channel with (L + 1) taps. The channel taps are assumed to be uncorrelated and quasi-stationary over a symbol. The m-th segment of the whole transmitted training signal $x(n')$ is denoted by $x_m(n) \equiv x(n' = mN + n)$, where $n \in [0, N)$ and $x_m(n) = a(n)$, $\forall m$. After passing through the multipath fading channel, the received training signal can be written as

$$y(n') = \sum_{l=0}^{L} h(l)\,x(n'-l-\theta) + w(n') = \sum_{m=0}^{M-1} y_m(n-\theta) \qquad (1)$$

where $y_m(n-\theta) \equiv y(n' = mN + n - \theta) = \sum_{l=0}^{L} h(l)\,x_m(n-l-\theta) + w_m(n)$ represents the m-th segment of the received signal, $w_m(n) \equiv w(mN + n)$, $w(n)$ is the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_w^2$, and θ is the unknown timing offset to be estimated.
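The signal model in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulator: the helper names `golay_pair` and `received_preamble` are hypothetical, the complementary pair is built with Golay's interleaving recursion rather than taken from the IEEE 802.11ad sequences, and the timing offset θ is modeled by prepending θ empty samples before the preamble.

```python
import math
import random


def golay_pair(n_log2):
    """Hypothetical helper: binary complementary pair (a, b) of length 2**n_log2.

    Interleaving recursion a' = (a0, b0, a1, b1, ...), b' = (a0, -b0, a1, -b1, ...),
    started from a = b = [1]. This is NOT the IEEE 802.11ad Ga/Gb construction.
    """
    a, b = [1], [1]
    for _ in range(n_log2):
        a, b = ([v for p, q in zip(a, b) for v in (p, q)],
                [v for p, q in zip(a, b) for v in (p, -q)])
    return a, b


def received_preamble(a, M, h, theta, noise_var, rng):
    """Sketch of (1): M copies of a(n), multipath taps h(l), offset theta, AWGN."""
    x = [complex(v) for v in a] * M        # x_m(n) = a(n) for every segment m
    tx = [0j] * theta + x                  # delay by the unknown offset theta
    s = math.sqrt(noise_var / 2)           # per-component std of circular AWGN
    y = []
    for n in range(len(tx)):
        # sum_l h(l) x(n' - l - theta), skipping taps that reach before time 0
        acc = sum(h[l] * tx[n - l] for l in range(len(h)) if n - l >= 0)
        y.append(acc + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)))
    return y


rng = random.Random(1)
a, b = golay_pair(7)                       # N = 128
y = received_preamble(a, M=16, h=[1.0], theta=5, noise_var=0.0, rng=rng)
```

Setting `noise_var` above zero or passing more taps in `h` gives the noisy multipath case; the noiseless single-tap call simply places the repeated a(n) after θ = 5 empty samples.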
Table 1: Values of $R_a(k)$ and $R_b(k)$.
k         1    2    3    4    5   ...   125   126   127
R_a(k)   -1    0    3    0   -1   ...    -3     0     1
R_b(k)    1    0   -3    0    1   ...     3     0    -1
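The complementary property summarized in Table 1 is easy to check numerically. The sketch below uses hypothetical helpers and an interleaving-built pair rather than the 802.11ad sequences, so the individual values differ from Table 1; it computes the aperiodic ACFs and verifies that $R_a(k) + R_b(k) = 0$ for every k ≠ 0. For this particular construction, $R_a(k)$ and $R_b(k)$ also vanish at every even k ≠ 0, mirroring the pattern in Table 1.

```python
def golay_pair(n_log2):
    """Interleaving-built binary complementary pair of length 2**n_log2 (illustrative)."""
    a, b = [1], [1]
    for _ in range(n_log2):
        a, b = ([v for p, q in zip(a, b) for v in (p, q)],
                [v for p, q in zip(a, b) for v in (p, -q)])
    return a, b


def aperiodic_acf(seq, k):
    """R(k) = sum_{n=0}^{N-k-1} seq(n) * conj(seq(n+k))."""
    return sum(seq[n] * seq[n + k].conjugate() for n in range(len(seq) - k))


a, b = golay_pair(7)                      # N = 128
N = len(a)
Ra = [aperiodic_acf(a, k) for k in range(N)]
Rb = [aperiodic_acf(b, k) for k in range(N)]

# Complementary property: all sidelobes cancel, leaving 2N at k = 0.
assert Ra[0] + Rb[0] == 2 * N
assert all(Ra[k] + Rb[k] == 0 for k in range(1, N))
print("k = 1..5:", Ra[1:6], Rb[1:6])
```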
3. Time-Shifted ACF of Received Signals and Timing Synchronization
Inspired by the properties of the complementary Golay sequence and the multipath channel, the time-shifted ACF of the m-th segment of the received Golay sequence is proposed as

$$R^S_{y_m}(k,\Delta\theta) = \frac{1}{V_h}\sum_{n=0}^{N-k-1} y_m(n+\Delta\theta)\cdot y_m^*(n+\Delta\theta+k) \qquad (2)$$
where · denotes multiplication, k = 0, 1, ..., N − 1, $(\cdot)^S$ denotes the time-shifted version of the involved quantity, $\Delta\theta \equiv \tilde\theta - \theta$ denotes the time shift between the timing hypothesis and the true timing offset, $\tilde\theta$ denotes the hypothesis of the timing offset θ, and $V_h \equiv \sum_l \sigma^2_{h(l)} = \sum_l E\left[|h(l)|^2\right]$ denotes the total channel variance, where $E[\cdot]$ denotes expectation. As shown in Appendix A, the total channel variance $V_h$ can be estimated during packet detection and then compensated for in the proposed synchronization method. For clarity, we assume $V_h = 1$ in the following discussion. Since there are M repetitive segments of the Golay sequence, to fully utilize all received Golay sequences and reduce the complexity, one can define the time-shifted ACF of the sample average of the received samples as

$$R^S_{\bar y}(k,\Delta\theta) = \sum_{n=0}^{N-k-1} \bar y(n+\Delta\theta)\,\bar y^*(n+\Delta\theta+k) \qquad (3)$$

where $\bar y(n) = \frac{1}{M}\sum_{m=0}^{M-1} y_m(n)$.
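The sample average $\bar y(n)$ and the time-shifted ACF in (3) can be sketched as follows. Assumptions: the helper names are hypothetical, $V_h = 1$ as in the text, and the indices of $\bar y$ wrap modulo N, treating $\bar y$ as one period of the periodic preamble (this mirrors the y_bar[127]·y_bar[0] product listed in Fig. 8, but the exact indexing convention is our assumption). In the noiseless single-path toy example, evaluating (3) at the true offset reproduces the aperiodic ACF of the transmitted sequence.

```python
def sample_average(y, N, M, start=0):
    """bar-y(n) = (1/M) * sum_{m=0}^{M-1} y(start + m*N + n) for n = 0..N-1."""
    return [sum(y[start + m * N + n] for m in range(M)) / M for n in range(N)]


def time_shifted_acf(ybar, k, shift):
    """Eq. (3): sum_{n=0}^{N-k-1} ybar(n + shift) * conj(ybar(n + shift + k)).

    Indices wrap modulo N (assumed periodic-preamble convention).
    """
    N = len(ybar)
    return sum(ybar[(n + shift) % N] * ybar[(n + shift + k) % N].conjugate()
               for n in range(N - k))


# Toy example: a length-8 +/-1 sequence (one half of a Golay pair), repeated M
# times and delayed by theta samples; noiseless, single-tap channel.
s = [1, 1, 1, -1, 1, 1, -1, 1]
N, M, theta = len(s), 6, 3
y = [0.0] * theta + s * M
# Skip the first segment so every averaged sample lies inside the preamble.
ybar = sample_average(y, N, M - 1, start=N)
direct = [sum(s[n] * s[n + k] for n in range(N - k)) for k in range(N)]
print([time_shifted_acf(ybar, k, theta) for k in range(1, N)])
```

Skipping the first segment is a simplification of the sketch; it guarantees that each average covers only preamble samples, so $\bar y(n)$ is exactly a cyclic shift of s.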
To fully utilize the ACF at all nonzero lags k (the lag-k term comprising N − k products), the timing metric is defined as

$$T(\Delta\theta) = \sum_{k\neq 0} \left| R^S_{\bar y}(k,\Delta\theta) + R_b(k) \right| \qquad (4)$$
where $R_b(k)$ is the ACF of the complementary Golay sequence b(·) and |·| denotes the absolute value. Notably, $R_b(k)$ is constant and can be precalculated and stored in memory. When Δθ = 0, each aligned segment satisfies $x_m(n) = a(n)$, $\forall m$. As such, when M → ∞, $R^S_{\bar y}(k,\Delta\theta) \to R_a(k)$ because the channel taps h(·) are uncorrelated. Therefore, according to the complementary property of the Golay sequence [12], i.e., $R_a(k) + R_b(k) = 0$ for all k ≠ 0, the proposed timing metric attains its minimum value when $\tilde\theta = \theta$. The values of $R_a(k)$ and $R_b(k)$ for N = 128 are displayed in Table 1, which confirms that $R_a(k) + R_b(k) = 0$ for all k ≠ 0. The absolute value is taken because $R^S_{\bar y}(k,\Delta\theta) + R_b(k)$ is generally a complex number. The proposed timing estimate is finally given by

$$\hat\theta = \arg\min_{\tilde\theta}\, T(\Delta\theta). \qquad (5)$$
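Putting (3)–(5) together gives a compact, unoptimized reference model of the estimator. This is a sketch under assumptions: the helper names are hypothetical, $V_h = 1$, the channel is noiseless and single-tap, the complementary pair is interleaving-built rather than the 802.11ad sequences, the indices of $\bar y$ wrap modulo N, and the search covers the hypotheses 0 ≤ θ̃ < N/4 (matching the range of timing errors used in Section 4). At the true offset the metric is exactly zero; if several hypotheses tie, `min` returns the first one.

```python
def golay_pair(n_log2):
    """Interleaving-built binary complementary pair of length 2**n_log2 (illustrative)."""
    a, b = [1], [1]
    for _ in range(n_log2):
        a, b = ([v for p, q in zip(a, b) for v in (p, q)],
                [v for p, q in zip(a, b) for v in (p, -q)])
    return a, b


def acf(seq, k):
    """Aperiodic ACF: sum_{n=0}^{N-k-1} seq(n) * conj(seq(n+k))."""
    return sum(seq[n] * seq[n + k].conjugate() for n in range(len(seq) - k))


def sample_average(y, N, M, start):
    """bar-y(n) = (1/M) * sum_m y(start + m*N + n), n = 0..N-1."""
    return [sum(y[start + m * N + n] for m in range(M)) / M for n in range(N)]


def timing_metric(ybar, Rb, shift):
    """Eq. (4) with V_h = 1: sum over k = 1..N-1 of |R^S(k, shift) + R_b(k)|."""
    N = len(ybar)
    total = 0.0
    for k in range(1, N):
        r = sum(ybar[(n + shift) % N] * ybar[(n + shift + k) % N].conjugate()
                for n in range(N - k))
        total += abs(r + Rb[k])
    return total


def estimate_timing(ybar, Rb, hypotheses):
    """Eq. (5): hypothesis with the smallest metric (first one on ties)."""
    return min(hypotheses, key=lambda t: timing_metric(ybar, Rb, t))


a, b = golay_pair(7)                                  # N = 128
N, M, theta = len(a), 16, 9
Rb = [acf(b, k) for k in range(N)]                    # precomputed, as in the text
y = [0j] * theta + [complex(v) for v in a] * M        # noiseless, single-tap channel
ybar = sample_average(y, N, M - 1, start=N)           # skip the first segment
theta_hat = estimate_timing(ybar, Rb, range(N // 4))
print(theta_hat, timing_metric(ybar, Rb, theta))
```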
The central idea of the presented method is to exploit the complementary property of the Golay sequence. First, the ACF (2) is defined by exploiting the properties of the uncorrelated multipath channel. Next, at the correct symbol time, the metric (4) attains its minimum value because of the complementary property of the Golay sequence.
4. Performance Evaluation
Monte Carlo simulations are conducted to evaluate the performance of the proposed estimator. The number of subcarriers is 512; the modulation scheme is BPSK; the carrier frequency is 60 GHz; the bandwidth is 2640 MHz; the subcarrier spacing is 4.125 MHz; the length of the CP is 128 samples; the sampling period is 0.379 ns; N = 128; and M = 16. The performance is measured in terms of the bit error rate (BER) and the mean-squared error (MSE), both averaged over 20000 trials.
Figure 2: BERs as a function of SNR over the residential channel based on the symbol times derived by the proposed, Schmidl's, and cross-correlation estimators. [Plot: BER (log scale, 10^0 to 10^-6) versus SNR (0 to 14 dB); curves: Schmidl's Method, CrossCorr Method, Proposed Method.]

Figure 3: MSE of the symbol time estimate (relative to the ideal one) of the proposed estimator as a function of SNR under various ω over the residential channel. [Plot: MSE (log scale, 10^-3 to 10^-5) versus SNR (0 to 14 dB); curves: Proposed Method with ω = 1, 2, 3, 4.]
Figure 2 plots the BERs as a function of signal-to-noise ratio (SNR) over the residential channel model based on the symbol times derived by the proposed, auto-correlation (Schmidl's method [9]), and cross-correlation (CrossCorr) estimators. The residential channel impulse response is generated by the CM1 residential channel model [16]. The timing error is randomly generated from the uniform distribution over [0, N/4]. As shown, the BERs of Schmidl's and the cross-correlation estimators exhibit an error floor owing to the reduction of the signal-to-interference-and-noise ratio (SINR) caused by synchronization errors.

When Δθ = 0, the metric (4) sums over 1 ≤ k ≤ N − 1 and ideally has N − 1 zero terms; hence, we propose to decimate the metric to reduce the complexity. If the metric is decimated by ω, the metric (4) becomes

$$T(\Delta\theta) = \sum_{i=0}^{\lfloor (N-2)/\omega \rfloor} \left| R^S_{\bar y}(1+i\omega,\Delta\theta) + R_b(1+i\omega) \right| \qquad (6)$$
where ⌊·⌋ denotes the floor (greatest integer) function. Figure 3 plots the MSE of the symbol time estimate relative to the ideal one as a function of SNR under various ω over the residential channel model. As presented, the performance degrades as ω increases because the metric samples the ACF lags more sparsely. Meanwhile, the computational complexity is inversely proportional to ω. Therefore, to trade performance for computational complexity without excessive performance loss, we propose ω = 2: as shown in Table 1, $R_a(k)$ and $R_b(k)$ are exactly 0 for even k, so even lags contribute less to the metric (6) than odd lags.

5. Architecture Design
The overall architecture of the synchronizer is displayed in Fig. 4, where clk_16x denotes the clock at the sampling frequency. Besides the divide-by-16 clock divider (DIV), there are three main blocks, AVG, RY, and THETA_EST, which calculate the sample average, the time-shifted ACF, and the timing estimate, respectively.

5.1. Sample Average Block (AVG)
To achieve 2.64 GSamples/s, the AVG, which performs the overlap-and-add operation, is designed with sixteen parallel signal paths operating at 1/16 of the sampling frequency. Figure 5 plots the datapath of AVG, where s(7,4f) and Q denote a 7-bit signed number with 4 fractional bits and the quantization function, respectively.
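A behavioral sketch of the AVG block follows. It is not the authors' RTL: the function names are hypothetical, each outer loop iteration stands for one cycle of the divided clock in which sixteen lanes each add one sample into an N-entry accumulator, and the quantizer assumes round-half-up with saturation to s(7,4f) (range −4 to 3.9375 in steps of 1/16), a rounding mode the text does not specify. In hardware, Q would be applied separately to the real and imaginary parts; the sketch uses real samples. With M = 16, the final division by M could be realized as a 4-bit right shift.

```python
import math

WORD_BITS, FRAC_BITS = 7, 4              # s(7,4f): 7-bit signed, 4 fractional bits


def quantize(x, word_bits=WORD_BITS, frac_bits=FRAC_BITS):
    """Hypothetical Q(.): round half up to a multiple of 2**-frac_bits, then saturate."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    code = math.floor(x * scale + 0.5)
    return max(lo, min(hi, code)) / scale


def parallel_average(samples, N, M, lanes=16):
    """Overlap-and-add of M length-N segments, `lanes` samples per slow-clock cycle."""
    if N % lanes:
        raise ValueError("N must be a multiple of the number of lanes")
    acc = [0.0] * N
    for cycle in range(M * N // lanes):   # one iteration = one divided-clock cycle
        base = cycle * lanes
        for lane in range(lanes):         # these additions run in parallel in hardware
            acc[(base + lane) % N] += samples[base + lane]
    return [quantize(v / M) for v in acc]


N, M = 128, 16
samples = [((7 * i) % 11 - 5) / 4 for i in range(M * N)]   # deterministic toy input
ybar = parallel_average(samples, N, M)
```

The lane-by-lane accumulation gives the same result as averaging each sample position over the M segments directly; parallelism only changes how many additions happen per clock cycle.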
Figure 4: Architecture of the synchronizer.
Figure 5: Datapath of the block, AVG.
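5.2. Time-Shifted ACF Block (RY)
For illustration purposes, we assume ω = 1 and θ = 0 (i.e., $\Delta\theta = \tilde\theta - \theta = \tilde\theta$). In this case, the ACF is given by

$$\begin{aligned} R^S_{\bar y}(k,\tilde\theta) &= \sum_{n=0}^{N-k-1} \bar y(n+\tilde\theta)\,\bar y^*(n+\tilde\theta+k) \\ &= R^S_{\bar y}(k,\tilde\theta-1) - \bar y(\tilde\theta-1)\,\bar y^*(\tilde\theta-1+k) + \bar y(\tilde\theta+N-k-1)\,\bar y^*(\tilde\theta+N-1). \end{aligned} \qquad (7)$$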
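The recursion in (7), together with the multiplier sharing between lags k and N − k described in this subsection, can be checked against direct evaluation of the ACF. The sketch assumes mod-N indexing of $\bar y$ (as suggested by the y_bar[127]·y_bar[0] product in Fig. 8), uses hypothetical function names, and models only the arithmetic, not the 8-path pipeline. Each pair (k, N − k) needs two complex products per update, because the term added to $R^S_{\bar y}(k,\tilde\theta)$ is the conjugate of the term removed from $R^S_{\bar y}(N-k,\tilde\theta)$, and vice versa.

```python
import random


def direct_acf(ybar, k, t):
    """Reference: sum_{n=0}^{N-k-1} ybar(n+t) * conj(ybar(n+t+k)), indices mod N."""
    N = len(ybar)
    return sum(ybar[(n + t) % N] * ybar[(n + t + k) % N].conjugate()
               for n in range(N - k))


def recursive_step(prev, ybar, t):
    """Update R(k, t-1) -> R(k, t) for k = 1..N-1 via (7); prev[k] holds R(k, t-1)."""
    N = len(ybar)
    cur = list(prev)
    old = ybar[(t - 1) % N]                  # bar-y(t-1), equal to bar-y(t+N-1) mod N
    for k1 in range(1, N // 2 + 1):
        k2 = N - k1
        p_out = old * ybar[(t - 1 + k1) % N].conjugate()       # term removed from R(k1)
        p_in = ybar[(t + N - k1 - 1) % N] * old.conjugate()    # term added to R(k1)
        cur[k1] += p_in - p_out
        if k2 != k1:
            # Shared products: R(k2) removes conj(p_in) and adds conj(p_out).
            cur[k2] += p_out.conjugate() - p_in.conjugate()
    return cur


rng = random.Random(7)
N = 16
ybar = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
R = [direct_acf(ybar, k, 0) for k in range(N)]
for t in range(1, 2 * N):                    # sweep hypotheses, wrapping past N
    R = recursive_step(R, ybar, t)
max_err = max(abs(R[k] - direct_acf(ybar, k, 2 * N - 1)) for k in range(N))
print(f"max deviation from direct evaluation: {max_err:.2e}")
```

Starting from one direct evaluation, every later hypothesis costs only the two products per lag pair, which is what allows a constant processing duration per hypothesis.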
From (7), the current ACF, $R^S_{\bar y}(k,\tilde\theta)$, can be obtained from the previous ACF, $R^S_{\bar y}(k,\tilde\theta-1)$, by removing one product and adding one product. Besides, we propose to use 8 parallel signal paths to reduce the clock rate, as shown in Fig. 6. Figure 7 presents the pipeline operation of the ACF and its output order for various k and $\tilde\theta$. Furthermore, by expanding (7), Fig. 8 displays the terms involved in determining $R^S_{\bar y}(1,0)$, $R^S_{\bar y}(127,0)$, $R^S_{\bar y}(1,1)$, and $R^S_{\bar y}(127,1)$. Generally, for a given $\tilde\theta$, it can be observed that, for $k_1 + k_2 = N$, $R^S_{\bar y}(k_1,\tilde\theta)$ and $R^S_{\bar y}(k_2,\tilde\theta)$ can share a complex multiplier, which achieves a constant processing duration and reduces the number of required multipliers by a factor of 2. Notably, for ω = 2, only RY_1, RY_3, RY_5, and RY_7 are required, and the area is reduced by half.

Figure 6: Datapath of the block, RY.
Figure 7: Pipeline operation of the ACF and its output order for various k and $\tilde\theta$.
Figure 8: Expanded terms used for calculating $R^S_{\bar y}(1,0)$, $R^S_{\bar y}(127,0)$, $R^S_{\bar y}(1,1)$, and $R^S_{\bar y}(127,1)$.

5.3. Theta Estimation Block (THETA_EST)
After obtaining the ACF, the metric (6) is calculated, as shown in Fig. 9. The ACF of the complementary Golay sequence, b(n), is added to the ACF of the received signals, $R^S_{\bar y}(k,\tilde\theta)$. The square-root operation required for the magnitude is implemented using the square-root approximation (SRA) [17]. The index of the metric with the minimum value is output as the symbol time estimate.
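A behavioral sketch of THETA_EST under the decimated metric (6) follows. Assumptions: the function names are hypothetical; the magnitude is approximated with the alpha-max-plus-beta-min rule (α = 1, β = 1/2), a common square-root-free estimate used here as a stand-in because the exact SRA variant and coefficients of [17] are not given in the text; and the ACF values for all hypotheses are supplied as a table rather than streamed. With α = 1 and β = 1/2, the estimate never underestimates |z| and overestimates it by at most about 11.8%.

```python
def sra_magnitude(z, alpha=1.0, beta=0.5):
    """Alpha-max-plus-beta-min estimate of |z| (stand-in for the SRA of [17])."""
    x, y = abs(z.real), abs(z.imag)
    return alpha * max(x, y) + beta * min(x, y)


def decimated_lags(N, omega):
    """Lags used in (6): k = 1 + i*omega for i = 0..floor((N-2)/omega)."""
    return [1 + i * omega for i in range((N - 2) // omega + 1)]


def theta_est(acf_table, Rb, omega):
    """acf_table[t][k] holds R^S(k, t); returns (argmin_t, min metric) of (6)."""
    lags = decimated_lags(len(Rb), omega)
    best_t, best_T = None, float("inf")
    for t, R in enumerate(acf_table):                        # one hypothesis per step
        T = sum(sra_magnitude(R[k] + Rb[k]) for k in lags)   # add R_b(k), SRA, sum
        if T < best_T:                                       # keep the running minimum
            best_t, best_T = t, T
    return best_t, best_T


# Toy N = 4 example: Rb is the ACF of b = [1, -1, 1, 1]; row t = 1 equals the
# ACF of a = [1, 1, 1, -1], so the complementary sum cancels there.
Rb = [4, -1, 0, 1]
acf_table = [[4, 2, 0, 2], [4, 1, 0, -1], [4, 1j, 0, 0]]
print(theta_est(acf_table, Rb, omega=1))   # -> (1, 0.0)
```

For N = 128, `decimated_lags(128, 2)` selects the 64 odd lags 1, 3, ..., 127, which is why only the odd-indexed RY paths are needed for ω = 2.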
Figure 9: Datapath of the block, THETA_EST.

Figure 10: Fixed-point BER of the proposed estimator as a function of SNR over the residential channel, ω = 1 and ω = 2. [Plot: BER (log scale, 10^0 to 10^-6) versus SNR (0 to 14 dB); curves: Floating Point, ω = 1; Fixed Point, ω = 1; Fixed Point, ω = 2.]
5.4. Fixed-Point Bit Error Rate (BER) Performance
Figure 10 presents the fixed-point BER simulation of the proposed estimator as a function of SNR over the residential channel for ω = 1 and ω = 2. The floating-point BER for ω = 1 is also displayed as the benchmark. Compared with this benchmark, the fixed-point design with ω = 2 shows less than 1 dB degradation and can still efficiently enhance the system performance.

Table 2: Implementation Summary, ω = 1 and ω = 2.
                        ω = 1                        ω = 2
Block          Gate Count   Power (mW)     Gate Count   Power (mW)
AVG               7.4K         2.3            7.4K         2.3
RY               56.9K        17.1           36.2K        10.1
THETA_EST        18.8K         6.1           16.2K         4.8
Total            83.1K        25.5           59.8K        17.2

5.5. Experimental Result
Table 2 lists the implementation summary for ω = 1 and ω = 2, designed in TSMC 90 nm CMOS technology. An equivalent gate is counted as the size of a two-input NAND gate. The power consumption is based on post-layout simulation. As shown, with less than 1 dB performance degradation, the design with ω = 2 reduces the gate count and power consumption by about 28% and 32.5%, respectively, compared with the design with ω = 1.
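The savings quoted above follow directly from Table 2. The short check below, with the table transcribed by hand, recomputes the totals and the relative reductions of the ω = 2 design with respect to the ω = 1 design.

```python
# Table 2, transcribed by hand: (gate count in K gates, power in mW) per block.
table2 = {
    "AVG":       {1: (7.4, 2.3),   2: (7.4, 2.3)},
    "RY":        {1: (56.9, 17.1), 2: (36.2, 10.1)},
    "THETA_EST": {1: (18.8, 6.1),  2: (16.2, 4.8)},
}


def totals(omega):
    """Sum gate count and power over all blocks for a given decimation factor."""
    gates = sum(row[omega][0] for row in table2.values())
    power = sum(row[omega][1] for row in table2.values())
    return gates, power


g1, p1 = totals(1)
g2, p2 = totals(2)
gate_saving = 100 * (g1 - g2) / g1
power_saving = 100 * (p1 - p2) / p1
print(f"gates: {g1:.1f}K -> {g2:.1f}K ({gate_saving:.1f}% saved)")
print(f"power: {p1:.1f} mW -> {p2:.1f} mW ({power_saving:.1f}% saved)")
```

This reproduces the totals of 83.1K/59.8K gates and 25.5/17.2 mW, with savings of about 28.0% and 32.5%; nearly all of the reduction comes from the RY block.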
6. Conclusions
This study proposes a new symbol time synchronization algorithm and its architecture design based on the good correlation properties of the complementary Golay sequence. The proposed method can be utilized for training sequences with a repeated structure. Furthermore, by adopting a parallel design methodology, the proposed architecture can meet the stringent high-throughput specification of promising millimeter wave communication systems.
Acknowledgements
This work is supported in part by the grant MOST 105-2221-E-006-019-MY2, Taiwan.
Appendix A. Derivation of the Total Channel Variance
To estimate the total channel variance $V_h$, note that the sample average of the received power is $\frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}|y_m(n)|^2$, whose mean value is

$$E\left[\frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}|y_m(n)|^2\right] = \begin{cases} \sum_{l=0}^{L}\sigma^2_{h(l)} + \sigma_w^2 = V_h + \sigma_w^2, & H_1 \\ \sigma_w^2, & H_0 \end{cases} \qquad (A.1)$$

where $H_1$ and $H_0$ denote the hypotheses that the packet is present and not present, respectively. The presence of signals is typically determined by packet detection implemented using an energy detector. Therefore, $V_h$ can be estimated by

$$\hat V_h = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left|\,y_m(n)|_{H_1}\right|^2 - \hat\sigma_w^2 \qquad (A.2)$$

where $y_m(n)|_{H_1}$ denotes the received signal $y_m(n)$ under the $H_1$ hypothesis, $\hat\sigma_w^2 = \frac{1}{N_1}\sum_{n=0}^{N_1-1}\left|\,y(n)|_{H_0}\right|^2$, and $N_1$ denotes the duration of $H_0$.
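The estimator (A.2) amounts to subtracting a noise-power estimate from the average received power. The sketch below is illustrative only: the helper name is hypothetical, the packet detector's $H_1$/$H_0$ decision is assumed to be already made, and the toy signal uses a single complex tap h(0) = 0.6 + 0.3j (so $V_h = 0.45$) with ±1 preamble samples, which keeps the signal power constant.

```python
import math
import random


def estimate_channel_variance(y_h1, y_h0):
    """(A.2): average |y|^2 under H1 minus the noise power estimated under H0."""
    power_h1 = sum(abs(v) ** 2 for v in y_h1) / len(y_h1)       # (1/MN) sum |y_m(n)|^2
    noise_var_hat = sum(abs(v) ** 2 for v in y_h0) / len(y_h0)  # (1/N1) sum |y(n)|^2
    return power_h1 - noise_var_hat, noise_var_hat


rng = random.Random(2017)
N, M, N1 = 128, 16, 2048
h0 = 0.6 + 0.3j                      # single tap: V_h = |h0|^2 = 0.45 (toy value)
noise_var = 0.1
std = math.sqrt(noise_var / 2)       # per-component std of circular AWGN


def awgn():
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


y_h0 = [awgn() for _ in range(N1)]                                 # packet absent
y_h1 = [h0 * rng.choice((-1, 1)) + awgn() for _ in range(M * N)]   # packet present
Vh_hat, noise_var_hat = estimate_channel_variance(y_h1, y_h0)
print(f"V_h estimate {Vh_hat:.3f} (true 0.45), noise estimate {noise_var_hat:.3f}")
```

With MN = 2048 preamble samples and N1 = 2048 noise-only samples, the standard deviation of the estimate in this toy setting is below 0.01, so the printed value lands close to 0.45.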
References
[1] IEEE, Standard for information technology–Telecommunications and information exchange between systems–Local and metropolitan area networks–Specific requirements–Part 11: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications Amendment 3: Enhancements for very high throughput in the 60 GHz band, IEEE Std. 802.11ad (2011) 1–372. doi:10.1109/IEEESTD.2011.6018236.
[2] H. Steendam, M. Moeneclaey, Analysis and optimization of the performance of OFDM on frequency-selective time-selective fading channels, IEEE Trans. Commun. 47 (12) (1999) 1811–1819. doi:10.1109/26.809701.
[3] W. J. Shin, D. H. Kim, Y. H. You, Block-wise frequency offset estimation scheme in MIMO-OFDM systems, AEU-Int. J. Electron. Commun. 66 (12) (2012) 979–984. doi:10.1016/j.aeue.2012.04.007.
[4] H. T. Hsieh, W. R. Wu, Maximum likelihood timing and carrier frequency offset estimation for OFDM systems with periodic preambles, IEEE Trans. Vehicular Tech. 58 (8) (2009) 4224–4237. doi:10.1109/TVT.2009.2019820.
[5] W. L. Chin, Maximization of effective signal power in DCT window for symbol time synchronization in optical fast OFDM, IEEE/OSA J. Lightw. Technol. 31 (5) (2013) 740–748. doi:10.1109/JLT.2012.2232642.
[6] A. Ishaque, G. Ascheid, Efficient MAP-based estimation and compensation of phase noise in MIMO-OFDM receivers, AEU-Int. J. Electron. Commun. 67 (12) (2013) 1096–1106. doi:10.1016/j.aeue.2013.08.016.
[7] J. Fang, E. P. Simon, M. Berbineau, M. Lienard, Joint channel and phase noise estimation in OFDM systems at very high speeds, AEU-Int. J. Electron. Commun. 67 (4) (2013) 295–300. doi:10.1016/j.aeue.2012.09.002.
[8] W. L. Chin, Blind symbol synchronization for OFDM systems using cyclic prefix in time-variant and long-echo fading channels, IEEE Trans. Vehicular Tech. 61 (1) (2012) 185–195. doi:10.1109/TVT.2011.2177502.
[9] T. M. Schmidl, D. C. Cox, Robust frequency and timing synchronization for OFDM, IEEE Trans. Commun. 45 (12) (1997) 1613–1621. doi:10.1109/26.650240.
[10] M. J. L. Pan, T. C. Clancy, R. W. McGwier, Physical layer orthogonal frequency-division multiplexing acquisition and timing synchronization security, Wiley Wirel. Commun. Mob. Comput. 16 (2) (2016) 177–191. doi:10.1002/wcm.2500.
[11] W. Zou, Y. Hu, B. Li, Z. Zhou, Z. Cui, An improved exclusive region scheduling algorithm-based timeslot allocation scheme for mmWave WPANs, Wiley Wirel. Commun. Mob. Comput. 14 (13) (2014) 1276–1286. doi:10.1002/wcm.2231.
[12] M. J. E. Golay, Complementary series, IRE Trans. Inf. Theory IT-7 (2) (1961) 82–87. doi:10.1109/TIT.1961.1057620.
[13] Y. Li, W. B. Chu, More Golay sequences, IEEE Trans. Inf. Theory 51 (3) (2005) 1141–1145. doi:10.1109/TIT.2004.842775.
[14] H. Y. Liu, C. Y. Lee, A low-complexity synchronizer for OFDM-based UWB system, IEEE Trans. Circuits Syst. II, Exp. Briefs 53 (11) (2006) 1269–1273. doi:10.1109/TCSII.2006.882804.
[15] J. Y. Yu, C. C. Chung, C. Y. Lee, A symbol-rate timing synchronization method for low power wireless OFDM systems, IEEE Trans. Circuits Syst. II, Exp. Briefs 55 (9) (2008) 922–926. doi:10.1109/TCSII.2008.923405.
[16] A. Sadri, 802.15.3c usage model document (UMD), IEEE Draft (2006) 1–32. doi:10.1109/TAP.2009.2030524.
[17] M. Allie, R. Lyons, A root of less evil [digital signal processing], IEEE Signal Process. Mag. 22 (2) (2005) 93–96. doi:10.1109/MSP.2005.1406500.