OQPSK signals


Microprocessors and Microsystems 32 (2008) 437–446

Contents lists available at ScienceDirect

Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro

A programmable carrier phase independent symbol timing recovery circuit for QPSK/OQPSK signals

Paolo Zicari, Emanuele Sciagura, Stefania Perri, Pasquale Corsonello *

Department of Electronics, Computer Science and Systems, D.E.I.S. – University of Calabria, 42/C Via P.Bucci, I-87036 Arcavacata di Rende (CS), Italy

Article info

Article history: Available online 2 July 2008

Keywords: Symbol timing recovery; QPSK; OQPSK; Timing error detector; Synchronization

Abstract

This paper presents an efficient and optimized carrier phase independent programmable Symbol Timing Recovery (STR) circuit. The novel structure is highly versatile: it can be configured at runtime to work in different conditions. BPSK, QPSK and OQPSK modulations are all supported thanks to runtime variable control coefficients. This approach also provides flexibility in performance and support for different sampling rates. The proposed circuit is presented in a Digital PLL loop structure and is designed according to the Software Defined Radio (SDR) philosophy, which requires ever more flexible communication solutions able to support different protocols and standards. The hardware implementation delivers high performance, while the configurable architecture guarantees flexibility. When implemented on a Xilinx XC4VLX60 FPGA chip, the new circuit reaches a maximum running frequency of 108.7 MHz, thus sustaining a symbol rate of 10 MSps when 10 samples per symbol are employed.

© 2008 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: [email protected] (P. Corsonello). 0141-9331/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.micpro.2008.06.001

1. Introduction

Flexibility and adaptability are becoming the most important features for future telecommunication systems, owing to the rapid evolution of different standards and protocols in the world of communications. For this reason, in the last few years, the design of efficient hardware architectures able to support different standards, multiple radio protocols, enhanced features and functional upgrades has received a great deal of attention. The ideal solution for completely flexible and versatile communication systems is represented by Software Defined Radio (SDR) systems, which aim to reduce the analog portion of the transceiver as much as possible, limiting it to only the front-end stage (antenna, band-pass filter and low noise amplifier), while migrating most of the involved signal processing towards digital implementations [1]. Re-programmability is guaranteed by the use of real time digital signal processing engines like DSPs (Digital Signal Processors), FPGA (Field Programmable Gate Array) accelerators [2] and general purpose processors.

Nowadays, there are several difficulties in digitally processing Radio Frequency (RF) and Intermediate Frequency (IF) signals inside real time communication systems, owing to the limited A/D and D/A converter performance in terms of jitter, resolution and sampling rate [1]. For this reason, feasible SDR systems are employed only for the implementation of the base-band (BB) processing stages and the digital down-conversion from IF to BB. Moreover, according to the SDR Digital Signal Processing perspective, most of the transceiver functionalities should be realized by software solutions to support the desired flexibility. The drawback is that software solutions perform poorly when compared with a hardware implementation. When highly computationally expensive operations need to be executed, an interesting compromise between flexibility and performance is achieved through configurable hardware architectures, which can support different standards with just a few input parameter changes.

In this paper a new Symbol Timing Recovery (STR) circuit based on a Digital PLL (Phase Locked Loop) is presented. The main feature of the proposed architecture is its versatility. In fact, it guarantees the carrier phase independent timing synchronization of all BPSK, QPSK and OQPSK demodulated signals, thus offering a flexible hardware platform targeted for use in wireless applications such as satellite communications where high speed data transmissions are required. When realized within a Xilinx XC4VLX60 FPGA device, the proposed circuit occupies 243 slices and reaches a maximum running frequency of 108.7 MHz. A symbol rate of 10 MSps is sustained when 10 samples per symbol are employed in QPSK and OQPSK modulation schemes, by processing 14-bit in phase and 14-bit in quadrature input samples.

The rest of the paper is organized as follows: in Section 2, the main characteristics of BPSK, QPSK and OQPSK are underlined; in Section 3, a timing error estimation algorithm is proposed, which is suitable for BPSK, QPSK and OQPSK flexibility requirements; in Section 4, the proposed versatile Symbol Timing Recovery circuit is described; the Matlab/Simulink digital communication base-band model used to provide a complete modeling structure for prediction and performance estimation is introduced in Section 5; finally, in Section 6 the obtained results are discussed.

2. The BPSK, QPSK and OQPSK modulations

Binary Phase Shift Keying (BPSK), Quadrature Phase Shift Keying (QPSK), and Offset Quadrature Phase Shift Keying (OQPSK) are three specific Phase Shift Keying modulation schemes widely used in wireless applications such as satellite communications and mobile communications [3,4]. The QPSK modulation can be considered as the combination of two orthogonal BPSK signals. For this reason it requires half the channel bandwidth for the same bit rate and provides the same error performance as BPSK for a given transmitted energy per bit [4]. The OQPSK modulation is an improvement, in which the instantaneous phase change is limited to 90° instead of 180°. This provides a reduction of the amplitude envelope fluctuation and consequently better performance when transmitted over band-limited nonlinear channels. The OQPSK modulation scheme is very similar to QPSK except for the fact that the Quadrature channel (Q) is delayed by Tsym/2 (where Tsym is the symbol time) compared to the In-Phase channel (I). The Tsym/2 offset in the Q channel causes a reduction in maximum envelope variations, which may be translated into better spectral efficiency [5] and a lesser distortion of the OQPSK signal in comparison with the QPSK signal [6].

Fig. 1 shows the BPSK, QPSK and OQPSK modulations for the input data stream "11000111". It can be seen that, for the QPSK modulation, the bit sequence is split into odd and even numbered bits, which are transmitted over the in phase and quadrature components. In OQPSK, the alignment of the in phase and quadrature streams is offset by the duration of a bit period Tsym/2.

The generic structure of an I-Q digital communication receiver is reported in Fig. 2. The Down Conversion block translates the received signal from the Intermediate Frequency (IF) to the Base-Band (BB) frequency, the Synchronization block executes the carrier recovery and the symbol timing recovery functions, while the Data Detection block includes a decision circuit and, possibly, a decoding circuit for the generation of the received data bits.

The Symbol Timing Recovery (STR) is one of the most critical functions in a digital synchronous communications receiver. After the Down Conversion process, the optimum symbol detection time needs to be extracted from the base-band, noisy in phase I and quadrature Q signal samples, in order to reduce the number

Fig. 1. BPSK, QPSK and OQPSK modulations. [Waveforms for the data stream 11000111. Panels: BPSK; QPSK in phase, QPSK quadrature, QPSK in phase and quadrature; OQPSK in phase, OQPSK quadrature, OQPSK in phase and quadrature. Time axes are marked in multiples of Tsym, with the OQPSK quadrature transitions at odd multiples of Tsym/2.]


Fig. 2. General structure of an I-Q digital communication receiver. [Blocks: Down Conversion (ADC with fixed sampling, NCO, π/2 phase shifter, I and Q low-pass filters); Synchronization (Carrier Recovery Circuit, Symbol Timing Recovery Circuit); Data Detection (Decision and Decoding Circuits, producing the received data).]

of symbol decision errors and consequently to ensure the best Bit Error Rate (BER) performance. Over the last few decades different solutions have been proposed in the literature for the STR design [7–18,26]. Feedback schemes use the timing error to control the feedback loop [7,9–16,26], whereas feedforward approaches employ the timing error to estimate directly the optimal signal sample near the maximum eye-open instant [8,17,18].

3. The Timing Error Detection algorithms

The Timing Error Detection (TED) function is the most critical operation executed by the Symbol Timing Recovery block for the timing synchronization. In fact, the block that calculates the timing error typically needs to be specialized for the specific modulation scheme adopted for the communication. Different timing error estimation algorithms have been proposed in the literature. As an example, the algorithms demonstrated in [9–14,25–26] are targeted to QPSK modulation schemes, whereas those proposed in [8,19] are specialized for the OQPSK modulation. In what follows, the BPSK modulation will be considered a particular condition of QPSK modulation, where only one of the in phase and quadrature channels is considered.

The focus of the proposed work is the realization of a flexible STR circuit, suitable for supporting all BPSK, QPSK, and OQPSK modulation schemes. Moreover, the property of carrier phase offset independence needs to be satisfied, which guarantees better performance for the detection of signals characterized by low signal to noise ratios [7,8]. These specific requirements have led to the search for a flexible TED architecture, which could be used for the different modulations.

Initially, the BPSK/QPSK Gardner algorithm [11] was chosen as the best candidate Timing Error Detector to be implemented in hardware thanks to its important features: (1) the error estimation uses only two samples per symbol; (2) it guarantees carrier phase offset independence; (3) it does not require complex mathematical operations, thus making an efficient and fast hardware implementation feasible. Before executing the Gardner timing error detection for OQPSK signals, the realignment of the in phase and quadrature input samples is necessary. The in phase sample needs to be delayed by a bit period Tsym/2, because the transmitted quadrature signal was previously delayed by Tsym/2 as shown in Fig. 1. Unfortunately, as demonstrated in the following, the Gardner algorithm loses the property of carrier phase independence when preceded by the realignment operation.

The Gardner algorithm [11] calculates the QPSK timing error by applying Eq. (1),

$$e_n = x_{n-1/2}\,(x_n - x_{n-1}) + y_{n-1/2}\,(y_n - y_{n-1}) \tag{1}$$

where {x_n} and {y_n} are the samples of the in phase and quadrature signal sequences, the index n represents the symbol number, and the index n − 1/2 represents the intermediate sample between the symbols n and n − 1. The carrier phase error θ causes a rotation of the in phase and quadrature samples, as reported in (2), where u_n and v_n are the samples affected by the phase error θ.

$$u_n = x_n\cos\theta - y_n\sin\theta, \qquad v_n = x_n\sin\theta + y_n\cos\theta \tag{2}$$

In [11], Gardner demonstrates the carrier phase independence property of the algorithm for QPSK signals by verifying that the timing error e'_n, reported in (3), which is calculated on the samples u_n and v_n affected by the carrier phase error, is equal to the timing error e_n calculated on the samples x_n and y_n free of any phase error, as reported in (4).

$$e'_n(\theta) = u_{n-1/2}\,(u_n - u_{n-1}) + v_{n-1/2}\,(v_n - v_{n-1}) \tag{3}$$

$$e'_n(\theta) = e_n \tag{4}$$
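The equality in (4) can be checked numerically. The following is a minimal sketch, assuming two samples per symbol stored in flat arrays (even indices are symbol instants, odd indices are the intermediate samples) and arbitrary random sample values; the function name `gardner_error` and the test values are illustrative and not part of the original work.

```python
import numpy as np

def gardner_error(i_s, q_s, n):
    """Eq. (1)/(3) with two samples per symbol.

    Index 2n is symbol n, index 2n-1 is the intermediate sample n-1/2
    (storage layout is an assumption of this sketch).
    """
    return (i_s[2 * n - 1] * (i_s[2 * n] - i_s[2 * n - 2])
            + q_s[2 * n - 1] * (q_s[2 * n] - q_s[2 * n - 2]))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)          # in phase samples, free of phase error
y = rng.standard_normal(16)          # quadrature samples, free of phase error

theta = 0.7                          # arbitrary carrier phase error (rad)
u = x * np.cos(theta) - y * np.sin(theta)   # Eq. (2)
v = x * np.sin(theta) + y * np.cos(theta)

symbols = range(1, 8)
e_clean = np.array([gardner_error(x, y, n) for n in symbols])
e_rot = np.array([gardner_error(u, v, n) for n in symbols])
max_diff = float(np.max(np.abs(e_rot - e_clean)))
print("largest |e'_n - e_n|:", max_diff)
```

The difference stays at rounding-error level for any θ, because the cross terms in cos θ sin θ cancel and the remaining terms combine through cos²θ + sin²θ = 1.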

When OQPSK modulation is considered, the received quadrature sample y'_n represents the original sample y_n shifted by Tsym/2, as reported in (5).

$$(x'_n,\; y'_n) = (x_n,\; y_{n-1/2}) \tag{5}$$

In order to apply the Gardner timing error detection algorithm to OQPSK signals, the received in phase x'_n and quadrature y'_n samples need to be realigned by Tsym/2, as reported in (6), to be coherent with the QPSK Gardner calculation.

$$e_n^{OQPSK} = x'_{n-1/2}\,(x'_n - x'_{n-1}) + y'_n\,(y'_{n+1/2} - y'_{n-1/2}) = x_{n-1/2}\,(x_n - x_{n-1}) + y_{n-1/2}\,(y_n - y_{n-1}) \tag{6}$$

The in phase and quadrature samples u'_n and v'_n, which are obtained when a carrier phase error θ is considered, are reported in (7).

$$u'_n = x'_n\cos\theta - y'_n\sin\theta, \qquad v'_n = x'_n\sin\theta + y'_n\cos\theta \tag{7}$$

Unfortunately, the Gardner timing error e'_n^OQPSK reported in (8), calculated on the realigned u' and v' samples corrupted by the carrier phase error θ, is different from the error e_n^OQPSK calculated in absence of phase error, as reported in (9), because the trigonometric simplifications are no longer possible. This results in the loss of the carrier phase independence property.

$$e_n'^{\,OQPSK}(\theta) = u'_{n-1/2}\,(u'_n - u'_{n-1}) + v'_{n-1}\,(v'_{n-1/2} - v'_{n-3/2}) \tag{8}$$

$$e_n'^{\,OQPSK}(\theta) \neq e_n^{OQPSK} \tag{9}$$
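The loss of phase independence stated in (9) can also be observed numerically. The sketch below is only an illustration under stated assumptions: samples are stored two per symbol, the Tsym/2 offset of (5) is modelled as a one-sample delay of the quadrature stream, and the realignment uses the index pattern of (6) applied to the rotated samples of (7). The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)   # in phase samples, two per symbol
y = rng.standard_normal(16)   # quadrature samples before the Tsym/2 offset

# Eq. (5): the received quadrature stream lags by half a symbol (one sample here).
x_rx = x
y_rx = np.concatenate(([0.0], y[:-1]))

def rotate(i_s, q_s, theta):
    """Eq. (7): carrier phase error applied to the received samples."""
    return (i_s * np.cos(theta) - q_s * np.sin(theta),
            i_s * np.sin(theta) + q_s * np.cos(theta))

def realigned_gardner(i_s, q_s, n):
    """Index pattern of Eq. (6): quadrature terms are read half a symbol later."""
    return (i_s[2 * n - 1] * (i_s[2 * n] - i_s[2 * n - 2])
            + q_s[2 * n] * (q_s[2 * n + 1] - q_s[2 * n - 1]))

symbols = range(1, 7)
# Reference error computed on the aligned, phase-error-free samples.
e_ref = np.array([x[2 * n - 1] * (x[2 * n] - x[2 * n - 2])
                  + y[2 * n - 1] * (y[2 * n] - y[2 * n - 2]) for n in symbols])

u0, v0 = rotate(x_rx, y_rx, 0.0)
u1, v1 = rotate(x_rx, y_rx, 0.7)
e_zero = np.array([realigned_gardner(u0, v0, n) for n in symbols])
e_rot = np.array([realigned_gardner(u1, v1, n) for n in symbols])

err_zero = float(np.max(np.abs(e_zero - e_ref)))   # realignment alone is exact
err_rot = float(np.max(np.abs(e_rot - e_ref)))     # phase error changes the result
print(err_zero, err_rot)
```

With θ = 0 the realigned detector reproduces the aligned Gardner error, whereas a non-zero θ mixes in phase and quadrature samples taken at different instants, so the estimate changes.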

Consequently, to maintain the carrier phase independence property for all BPSK, QPSK and OQPSK modulations, a square and differentiate timing error recovery algorithm was considered. As the carrier phase error term θ disappears from the magnitude of the sampled input signal, the timing error estimation is carrier phase insensitive. The square and differentiate timing error recovery algorithm, proposed in [19] for the timing error detection of OQPSK signals only, has been taken into consideration because a complete characterization in terms of the S-curve representation demonstrated its effectiveness in detecting OQPSK signals. Moreover, it does not require complex operations. The timing error estimation algorithm proposed in [19] considers a received OQPSK filtered base-band complex signal, expressed by (10),

$$r(t) = e^{j\theta}\left[\sum_{k=-\infty}^{+\infty} a_k^I\, g(t - kT_{sym} - \tau) + j\sum_{k=-\infty}^{+\infty} a_k^Q\, g\!\left(t - kT_{sym} - \frac{T_{sym}}{2} - \tau\right)\right] \tag{10}$$

where θ is the carrier phase error; a_k^I and a_k^Q are the in-phase and quadrature terms of the kth symbol; g(t) = g_T(t) ∗ g_R(t) is the impulse response of the cascaded pulse shaping transmitting and receiving filters; Tsym is the symbol period; τ is the timing error; finally, the Tsym/2 term specifies the timing offset between the in phase and the quadrature samples. The input signal r(t) is assumed to be sampled every sample period Ts. The squared magnitude of the sampled input signal r(t), reported in (11), contains a spectral component at 1/Tsym, but it does not include the carrier phase error term θ. For this reason it is well suited to a carrier phase insensitive timing error estimation.

$$s(nT_s) = \left|\sum_{k=-\infty}^{+\infty} a_k^I\, g(nT_s - kT_{sym} - \tau) + j\sum_{k=-\infty}^{+\infty} a_k^Q\, g\!\left(nT_s - kT_{sym} - \frac{T_{sym}}{2} - \tau\right)\right|^2 \tag{11}$$
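The disappearance of θ follows in one step. As a short worked check, write z(t) for the bracketed sum in (10), a shorthand introduced here only for this derivation, so that r(t) = e^{jθ} z(t):

$$s(nT_s) = \left|r(nT_s)\right|^2 = \left|e^{j\theta}\right|^2 \left|z(nT_s)\right|^2 = \left|z(nT_s)\right|^2, \qquad \text{since } \left|e^{j\theta}\right| = 1 \text{ for any real } \theta .$$

Any quantity computed from s(nTs) alone therefore inherits this insensitivity to a constant carrier phase error.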

While conventional TEDs extract the timing synchronization by adopting a narrow band filter centred on 1/Tsym or an analog PLL [10], a simple digital solution is obtained by differentiating s(nTs) as shown in (12), and estimating the OQPSK timing error as reported in (13).

$$x(nT_s) = \frac{1}{2T_{sym}}\left[s((n+1)T_s) - s((n-1)T_s)\right] \tag{12}$$

$$e_{OQPSK}(\tau) = x\!\left(nT_s - \frac{T_{sym}}{4}\right)\cdot\left[x(nT_s) - x\!\left(nT_s - \frac{T_{sym}}{2}\right)\right] \tag{13}$$

The presented work extends the validity of the algorithm proposed in [19], originally limited to OQPSK signals, to the QPSK modulation, while still maintaining the property of carrier phase independence. The received QPSK/OQPSK filtered base-band complex signal is now expressed more generally by (14),

$$r'(t) = e^{j\theta}\left[\sum_{k=-\infty}^{+\infty} a_k^I\, g(t - kT_{sym} - \tau) + j\sum_{k=-\infty}^{+\infty} a_k^Q\, g(t - kT_{sym} - \Omega - \tau)\right] \tag{14}$$

where the Tsym/2 term is now replaced by Ω, which depends on the adopted modulation as specified in (15).

$$\Omega = \begin{cases} 0 & \text{if QPSK} \\[4pt] \dfrac{T_{sym}}{2} & \text{if OQPSK} \end{cases} \tag{15}$$

The squared magnitude of the sampled input signal r'(t) is reported in (16), while its differentiation is reported in (17).

$$s'(nT_s) = \left|\sum_{k=-\infty}^{+\infty} a_k^I\, g(nT_s - kT_{sym} - \tau) + j\sum_{k=-\infty}^{+\infty} a_k^Q\, g(nT_s - kT_{sym} - \Omega - \tau)\right|^2 \tag{16}$$

$$x'(nT_s) = \frac{1}{2T_{sym}}\left[s'((n+1)T_s) - s'((n-1)T_s)\right] \tag{17}$$

In this way, the OQPSK timing error estimation reported in (18) remains the same as the one proposed in [19]. The QPSK timing error estimation is obtained by choosing different samples of the differentiated squared signal x'(nTs), as reported in (19): the samples of x'(nTs) are taken at a distance of Tsym/2 from each other, instead of Tsym/4.

$$e_{OQPSK}(\tau) = x'\!\left(nT_s - \frac{T_{sym}}{4}\right)\cdot\left[x'(nT_s) - x'\!\left(nT_s - \frac{T_{sym}}{2}\right)\right] \tag{18}$$

$$e_{QPSK}(\tau) = x'\!\left(nT_s - \frac{T_{sym}}{2}\right)\cdot\left[x'(nT_s) - x'(nT_s - T_{sym})\right] \tag{19}$$

The Matlab simulated S-curve for the QPSK modulation, obtained in absence of noise and with a roll-off factor α = 0.5 of the raised cosine filter impulse response, is reported in Fig. 3; it maintains the same sinusoidal shape as the OQPSK S-curve reported in [19], but with a period of Tsym instead of Tsym/2.

Fig. 3. The simulated S-curve for the QPSK modulation.

4. The Novel Symbol Timing recovery architecture

The proposed work has focused on the realization of a flexible STR circuit, suitable for supporting all BPSK, QPSK, and OQPSK modulation schemes, without requiring a prior acquisition of the carrier phase. The proposed STR architecture is based on a typical feedback Symbol Timing Recovery loop scheme consisting of a Timing Error Detector block, a Loop Filter and a Numerically Controlled Oscillator (NCO), as shown in Fig. 4. The Timing Error Detector block determines the timing error e as the phase difference between the reconstructed symbol clock synch_strobe and the input data signal r. The estimated phase error e is filtered by the Loop Filter block in order to reduce the variance of the timing error and the static timing offset. The filtered phase error signal fe is fed to the Numerically Controlled Oscillator, which suitably adjusts the phase of the synchronism signal synch_strobe used for the correct symbol detection.

Fig. 4. The feedback symbol timing recovery loop. [Blocks: Timing Error Detector (input r, output e), Loop Filter (output fe), NCO (output synch_strobe fed back to the Timing Error Detector).]
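The detector at the head of this loop, as defined by (16) to (19), can be sketched in a few lines. This is a minimal illustration rather than the implemented circuit: it assumes a sample period normalised to 1, a number of samples per symbol `n_sps` divisible by 4 so that the Tsym/4 delay is an integer number of samples (the value 8 is hypothetical), and a random complex test signal used only to check insensitivity to a constant phase rotation. The function name `sq_diff_ted` is illustrative.

```python
import numpy as np

def sq_diff_ted(r, n_sps, modulation):
    """Square and differentiate timing error, Eqs. (16)-(19).

    r          : complex base-band samples
    n_sps      : samples per symbol (Tsym / Ts), assumed divisible by 4
    modulation : "QPSK" uses a Tsym/2 spacing, "OQPSK" uses Tsym/4
    """
    if modulation not in ("QPSK", "OQPSK"):
        raise ValueError("modulation must be 'QPSK' or 'OQPSK'")
    if n_sps % 4:
        raise ValueError("this sketch needs n_sps divisible by 4")
    s = np.abs(r) ** 2                               # Eq. (16)
    x = np.zeros_like(s)
    x[1:-1] = (s[2:] - s[:-2]) / (2.0 * n_sps)       # Eq. (17), Ts = 1
    d = n_sps // 2 if modulation == "QPSK" else n_sps // 4
    n = np.arange(2 * d, len(x))
    return x[n - d] * (x[n] - x[n - 2 * d])          # Eq. (19) or Eq. (18)

rng = np.random.default_rng(2)
r = rng.standard_normal(64) + 1j * rng.standard_normal(64)
r_rot = r * np.exp(1j * np.pi / 8)                   # constant carrier phase error

e_qpsk, e_qpsk_rot = sq_diff_ted(r, 8, "QPSK"), sq_diff_ted(r_rot, 8, "QPSK")
e_oqpsk, e_oqpsk_rot = sq_diff_ted(r, 8, "OQPSK"), sq_diff_ted(r_rot, 8, "OQPSK")
print(np.allclose(e_qpsk, e_qpsk_rot), np.allclose(e_oqpsk, e_oqpsk_rot))
```

Only the delay spacing changes between the two modulations, which is what makes a single datapath with configurable delays sufficient.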

Fig. 7. NCO scheme. [An accumulation register with a Z-1 feedback path is incremented by Ts/Tsym plus fe; its MSB provides the strobe.]

4.1. The Timing Error Detector

A flexible, carrier phase independent TED architecture, suitable for both QPSK and OQPSK modulations, is proposed here. It uses the square and differentiate algorithm described in the previous section. The block diagram of the realized TED architecture is shown in Fig. 5. By suitably setting the Delay Registers it is possible to process QPSK or OQPSK demodulated signals. In accordance with (19), when a QPSK signal is fed to the TED, the two Delay Registers are enabled by the en_ted signal every Tsym/2. When the TED is used for the OQPSK modulation, the registers are instead enabled every Tsym/4, according to Eq. (18).

4.2. The Loop Filter

The Loop Filter of the proposed STR architecture is reported in Fig. 6. It is a proportional-plus-integral filter whose z-transfer function is reported in (20), where K_P and K_I are respectively the proportional and integral coefficients. These coefficients need to be properly set to obtain the desired speed and stability properties of the filter.

$$F(z) = K_P + \frac{K_I}{1 - z^{-1}} \tag{20}$$
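In the time domain, (20) corresponds to a running sum of the scaled error plus a proportional term. A minimal floating-point sketch follows, assuming the integrator includes the current input sample (as the factor 1/(1 − z⁻¹) implies); the register placement in the actual circuit of Fig. 6 may differ, and the coefficient values are arbitrary examples.

```python
def pi_loop_filter(errors, kp, ki):
    """Proportional-plus-integral filter, F(z) = kp + ki / (1 - z^-1)."""
    acc = 0.0                    # integral path state
    filtered = []
    for e in errors:
        acc += ki * e            # integral path: running sum of ki * e
        filtered.append(kp * e + acc)   # proportional path added to it
    return filtered

# Step response to a constant timing error of 1 (example coefficients).
fe = pi_loop_filter([1.0, 1.0, 1.0], kp=0.5, ki=0.25)
print(fe)
```

A constant error makes the output ramp by K_I per step, which is how the integral path removes a static timing offset, while K_P sets the immediate reaction.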

4.3. The Numerically Controlled Oscillator

The proposed Numerically Controlled Oscillator calculates the instantaneous symbol period T'sym as reported in (21), where Ts is the sample period and Tsym is the nominal symbol period.

$$T'_{sym} = \frac{T_s}{\dfrac{T_s}{T_{sym}} + fe} \tag{21}$$

Fig. 7 shows the NCO block diagram. The number of samples per symbol is equal to the ratio ⌊Tsym/Ts⌋, and at every Ts period the cyclic accumulation register is incremented by the value Ts/Tsym + fe. The most significant bit of the accumulation register is used as the symbol strobe, and it goes high when the accumulated value exceeds the normalized nominal symbol period. As the accumulation register is not reset after each symbol period, the residual interval from the previous symbol period detection is taken into account for the following one.

5. Evaluation model and simulations

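Fig. 5. The proposed TED architecture. [The in phase and quadrature components of r(t) are squared and summed into s(nTs), differentiated over two Ts delays into x(nTs), and combined through two Delay Registers enabled by en_ted to produce e(τ).]

Fig. 6. The digital Loop Filter scheme. [Proportional path with gain KP and integral path with gain KI and a Z-1 accumulator, summed to produce fe from e.]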

In order to evaluate the performance of the proposed STR scheme for the different modulations, a simulation environment was set up using the MATLAB/SIMULINK software tools. Fig. 8 shows the block diagram of the simulation system, which represents an M-PSK/OQPSK base-band modem structure. A random symbol sequence whose values fall within the (0, M − 1) range is generated by a Random Symbol Generator and sent to the M-PSK/OQPSK base-band modulator. The Base-band Modulator block modulates the input symbols using the M-ary phase shift keying method, providing at its output a base-band representation of the modulated signal, where M represents the number of points in the signal constellation. Moreover, the modulator applies a delay Td to the quadrature channel, where Td can assume two possible values: 0 for BPSK and QPSK, or Tsym/2 when an OQPSK modulation is tested. The in phase and quadrature signals coming from the modulator are band-limited by a TX filter with impulse response g_T(t). The AWGN channel block adds white Gaussian noise according to a specified input SNR level, and the RX filter block shapes the received in phase and quadrature signals fed to the Symbol Timing Recovery block under test. The Symbol Decision Logic block uses the synchronism signal Synch strobe generated by the STR block to detect the output symbol data, which are compared with the input data coming from the Random Symbol Generator in the Symbol Error Rate Computation block for the Symbol Error Rate (SER) estimation. Two Square Root Raised Cosine Filters (SRRCF) h(t) have been used for the TX and RX filters, as the SRRCF impulse response h(t) reported in (22) is characterized by the property that the convolution g(t) = g_T(t) ∗ g_R(t) = h(t) ∗ h(t) is approximately equal to the impulse response of the normal raised cosine filter reported in (23).


Fig. 8. The M-PSK/OQPSK base-band modem structure simulation model. [Signal chain: Random Symbol Generator, M-PSK/OQPSK Baseband Modulator (parameters M and Td), TX filter gT(t), AWGN Channel (parameter SNR), multiplication by e^{jθ}, RX filter gR(t), I and Q inputs to the Symbol Timing Recovery and Symbol Decision Logic blocks (linked by Synch strobe), Symbol Error Rate Computation comparing generated and detected symbols to give the SER.]

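Fig. 9. The constellation maps for θ = 0 and θ = π/8.

$$h(t) = 4\alpha\,\frac{\cos\!\big((1+\alpha)\pi t/T\big) + \dfrac{\sin\!\big((1-\alpha)\pi t/T\big)}{4\alpha t/T}}{\pi\sqrt{T}\,\big(1 - (4\alpha t/T)^2\big)} \tag{22}$$

$$g(t) = \frac{\sin(\pi t/T)}{\pi t/T}\cdot\frac{\cos(\pi\alpha t/T)}{1 - 4\alpha^2 t^2/T^2} \tag{23}$$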
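The raised cosine response (23) can be evaluated directly to confirm its zero crossings at the symbol instants, which is the property that makes it free of inter-symbol interference when sampled at the correct time. The sketch below is an illustration only: the function name `raised_cosine`, the normalisation T = 1 and the handling of the removable singularity at t = ±T/(2α) are choices made here, and α > 0 is assumed.

```python
import numpy as np

def raised_cosine(t, T=1.0, alpha=0.5):
    """Raised cosine impulse response g(t) of Eq. (23), for alpha > 0."""
    x = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * alpha * x) ** 2
    singular = np.isclose(denom, 0.0)          # t = +/- T / (2 alpha)
    safe = np.where(singular, 1.0, denom)      # avoid dividing by zero
    # np.sinc(x) is sin(pi x) / (pi x), the first factor of Eq. (23).
    g = np.sinc(x) * np.cos(np.pi * alpha * x) / safe
    limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * alpha))   # value at the singularity
    return np.where(singular, limit, g)

# Samples at the symbol instants t = kT, k = -5 ... 5, with the roll-off used in the paper.
taps = raised_cosine(np.arange(-5, 6), T=1.0, alpha=0.5)
print(np.round(taps, 12))
```

Only the centre tap is non-zero at the symbol instants; any timing error τ moves the sampling point away from these zeros, which is why the timing recovery loop matters for the SER.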

In order to test the carrier phase error independence property of the TED, the rotation of the constellation map produced by a carrier phase error was simulated by introducing an input phase offset e^{jθ}, which multiplies the received signal coming from the AWGN channel, as shown in Fig. 8. Fig. 9b shows the rotation of the constellation map caused by the introduced phase error θ. Fig. 10 reports the SER versus the signal to noise ratio Es/No (dB) for the QPSK and the OQPSK base-band Matlab/Simulink model of Fig. 8. The IDEAL SER curve was added to the graph; it represents the symbol error rate evaluated when the Symbol Decision Logic is directly controlled by the Tsymbol Generator, the same one that also synchronizes the Random Symbol Generator, as shown in Fig. 11. In this case the symbol error rate depends only on the SNR level introduced in the test and is not influenced by the estimation of the timing error. For the simulation test, a roll-off factor α = 0.5 was used for the filters, 50,000 symbols were generated with a uniform random distribution, and 10 samples per symbol were used for the timing synchronization. Preliminary simulation tests were performed to find the

Fig. 10. SER vs. Es/No (dB) for QPSK–OQPSK modulation schemes.


Fig. 11. The IDEAL SER calculation scheme. [Same chain as Fig. 8 (Random Symbol Generator, M-PSK/OQPSK Baseband Modulator, TX filter gT(t), AWGN Channel, e^{jθ}, RX filter gR(t)), with the Symbol Decision Logic clocked at Tsym by the Tsymbol Generator instead of the STR block; output IDEAL SER.]

Fig. 12. The STR top level architecture. [STR circuit: TED (14-bit inputs IN_I and IN_Q, output e), Loop Filter (output fe), NCO (outputs en_out, en_ted, en_lpf), with clk, reset and en_recovery inputs. Decision Logic: a sampling stage of two D flip-flops with clock enable producing OUT_I and OUT_Q. Programmable Control Logic: Address Decoder (reg_address, write_enable, data_in; enable_1 to enable_3) driving the Ki_filter, Kp_filter and NCO Control Registers.]

filter coefficients which guarantee the best performance in terms of SER and speed in the tested conditions. Moreover, it was possible to demonstrate that the approximation of the integral and proportional filter coefficients to the nearest power of two does not cause significant changes in the filtering functionality inside the whole STR system.
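A power of two coefficient turns each multiplication into a bit shift. The following sketch shows the idea under stated assumptions: "nearest" is taken here in the logarithmic sense (rounding log2 of the coefficient), which the text does not specify, and the helper names and the coefficient value 0.3 are hypothetical.

```python
import math

def nearest_pow2_shift(k):
    """Return s such that 2**s approximates k > 0 (rounding in the log2 domain)."""
    return round(math.log2(k))

def shift_mul(value, s):
    """Multiply an integer by 2**s using shifts only.

    Negative s is a right shift, which floors the result in Python
    (arithmetic shift for negative values).
    """
    return value << s if s >= 0 else value >> -s

s_k = nearest_pow2_shift(0.3)        # 0.3 is approximated by 2**-2 = 0.25
scaled = shift_mul(1000, s_k)        # 1000 * 0.25 computed without a multiplier
print(s_k, scaled)
```

Storing only the shift amount also shortens the coefficient word that has to be programmed at runtime.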


6. Hardware design and implementation results

The complete architecture of the implemented STR circuit is depicted in Fig. 12. At each system clock (clk) cycle, the STR circuit processes one pair of 14-bit in phase IN_I and quadrature IN_Q input samples. The NCO block provides the strobe signals to the other blocks for the best symbol detection. In more detail, the en_out output signal identifies the input samples to be used by the Decision Logic block for the optimum symbol detection. The en_out signal is activated every symbol period Tsym for the QPSK modulation, while it is activated every Tsym/2 for OQPSK. In the latter case, the in phase and quadrature samples need to be selected half a symbol period apart from each other, as described in Fig. 1. The en_ted enable signal is used by the TED block for the timing error estimation. The en_ted signal is activated every Tsym/2 according to the QPSK error estimation reported in (19), while it is activated every Tsym/4 for the OQPSK modulation, as specified in (18). The en_lpf signal enables the clk input to the accumulation register inside the Loop Filter so that the error filtering is synchronized with the error calculation in the TED block. For this reason the en_lpf signal is the same as the en_out signal.

Fig. 13 shows the implemented NCO circuit. At each clock cycle, the quantity W coming from the Programmable Control Logic block is summed with the filtered error fe and accumulated in the Acc_Register. W is set by the control block to Ts/Tsym for the QPSK modulation, and to 2 × Ts/Tsym for the OQPSK modulation; in this way it is possible to generate the strobe enable signals en_out and en_ted according to the selected modulation. The en_out output signal is asserted on the positive edge of the most significant bit of the Acc_Register, while the en_ted signal is asserted on the positive edge of the following bit. The dimensions of the adders and registers were chosen to avoid loss of precision during the calculation.

Fig. 14 shows the implemented Loop Filter circuits. To increase the speed and to reduce the area occupancy, an optimized version of the Loop Filter was implemented, in which the multipliers used in the circuit of Fig. 14a have been replaced by the two Barrel Shifter modules reported in Fig. 14b, which execute multiplications by the power of two approximation of the integral and proportional filter coefficients.

The full programmability of the STR circuit is ensured by the Programmable Control Logic block. It is structured as a set of registers which control the filter coefficients and the NCO block. The registers can be set at runtime to support evolving working conditions. A control register is set by specifying its address through the reg_address input port and enabling the write operation of the new data_in value through the activation of the write_enable input signal. The Ki and Kp Filter Control Registers store respectively the integral and proportional filter coefficients, which are used for a fine runtime adjustment of the speed and the loop acquisition time. The NCO Control Register sets the W input to the NCO block, to support both the QPSK and OQPSK modulations and different sampling frequencies Ts/Tsym.

Two D-type flip-flops with clock enable are used to detect the symbol represented by the OUT_I and OUT_Q output logic bits, which are the sign bits of the IN_I and IN_Q input signals sampled

Fig. 13. The implemented NCO circuit. [A 34-bit adder sums W, fe and the content of the Acc_Register accum(33:0), clocked by clk; accum(33) drives en_out, while accum(32) and sum(32) are used for en_ted.]
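The accumulator behaviour described for the NCO can be sketched as a behavioural model. The sketch assumes that W and fe are integers scaled so that a full wrap of the 34-bit register corresponds to one period of en_out, and it derives both strobes from rising edges of the two top bits of the register; the exact fixed-point scaling and edge logic of the implemented circuit are not given in the text, so these are assumptions, as is the function name `nco_run`.

```python
ACC_BITS = 34  # register width as labelled in Fig. 13

def nco_run(w, fe_seq, acc_bits=ACC_BITS):
    """Accumulate w + fe every clock; return the en_out and en_ted strobes."""
    mask = (1 << acc_bits) - 1
    acc = 0
    prev_msb = prev_next = 0
    en_out, en_ted = [], []
    for fe in fe_seq:
        acc = (acc + w + fe) & mask            # cyclic accumulation, never reset
        msb = (acc >> (acc_bits - 1)) & 1      # accum(33)
        nxt = (acc >> (acc_bits - 2)) & 1      # accum(32)
        en_out.append(int(msb and not prev_msb))   # rising edge of the MSB
        en_ted.append(int(nxt and not prev_next))  # rising edge of the next bit
        prev_msb, prev_next = msb, nxt
    return en_out, en_ted

# QPSK with 10 samples per symbol: W = Ts/Tsym = 1/10 of full scale, no loop correction.
w_qpsk = (1 << ACC_BITS) // 10
en_out, en_ted = nco_run(w_qpsk, [0] * 1000)
out_count, ted_count = sum(en_out), sum(en_ted)

# OQPSK: doubling W doubles both strobe rates.
en_out_o, en_ted_o = nco_run(2 * w_qpsk, [0] * 1000)
out_count_o, ted_count_o = sum(en_out_o), sum(en_ted_o)
print(out_count, ted_count, out_count_o, ted_count_o)
```

Over 1000 clocks the model produces about one en_out strobe per Tsym and one en_ted strobe per Tsym/2 for QPSK, and twice those rates for OQPSK; a non-zero fe advances or delays the wrap instant, which is how the loop shifts the strobe phase.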

Fig. 14. The implemented Loop Filter circuits with: multipliers (a); barrel shifters (b). [In (a) the 19-bit error e is multiplied by the 15-bit Kp and Ki coefficients; in (b) the multipliers are replaced by Barrel Shifters controlled by 4-bit Kp and Ki values. In both versions the registers are clocked by clk with clock enable en_lpf, and the 34-bit output is fe.]


at the optimum clocking time specified by the en_out strobe signal. At the beginning of the symbol timing recovery, the global reset input signal needs to be asserted to initialize all the internal registers, while the en_recovery input signal is used to enable the recovery functionality.

Different implementations of the proposed STR circuit were carried out using the Xilinx Virtex-4 XC4VLX60 FPGA chip. The highest speed is reached by the implementation that exploits barrel shifters in the Loop Filter module and the dedicated DSP48 slices in the TED block. DSP48 slices were introduced by Xilinx into its FPGA devices starting from the Virtex-4 family to support high speed DSP functionalities based on multiplication and addition arithmetic operations; they include an 18-bit by 18-bit two's complement multiplier and a three-input 48-bit adder/subtracter. The solution optimized for the specific Virtex-4 FPGA chip uses 3 DSP48 slices for the implementation of the multipliers present in the TED circuit reported in Fig. 5, while 243 CLB slices are occupied by the rest of the circuit, and a maximum running frequency of 108.7 MHz is reached. The STR circuit was also implemented with the multipliers calculating the proportional and the integral terms in the Loop Filter block; in this case 5 DSP48 slices and 207 CLB slices are employed, reaching a maximum running frequency of 108.4 MHz. For the sake of completeness, more generic implementations were carried out which avoid the use of FPGA chip specific resources. The LUT implementation of the multipliers resulted in an increased area occupancy and decreased speed (601 slices and 98.7 MHz for the solution which uses barrel shifters, 842 slices and 92.8 MHz for the LUT-based multiplier version). Table 1 summarizes the implementation results in terms of area resources, maximum clock frequency and dissipated power.

A comparison of the presented STR circuit with the other FPGA solutions present in the literature is reported in Tables 2 and 3. In more detail, Table 2 reports the comparison of the hardware characterizations, while Table 3 reports the performance. The FPGA solutions presented in [9,24] are not reported in the comparison because of the lack of implementation details. For the sake of completeness, three implementations of the Early-Late

Table 1 Implementation results

| STR | Resources | Area (slices) | LUTs | FFs | Special resources | F max (MHz) | Power @ 90 MHz (mW) |
|---|---|---|---|---|---|---|---|
| Solution (1) | Loop Filter uses barrel shifters, TED uses DSP multipliers | 243 | 400 | 208 | 3 DSP48s | 108.7 | 456.96 |
| Solution (2) | Loop Filter uses DSP multipliers, TED uses DSP multipliers | 207 | 307 | 230 | 5 DSP48s | 108.4 | 457.22 |
| Solution (3) | Loop Filter uses barrel shifter, TED uses LUT-based multipliers | 601 | 1101 | 208 | no | 98.7 | 495.1 |
| Solution (4) | Loop Filter uses LUT-based multipliers, TED uses LUT-based multipliers | 842 | 1540 | 237 | no | 92.8 | 494.94 |
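The Loop Filter variants in Table 1 differ in how the proportional and integral terms are computed: with barrel shifters or with multipliers. The Python sketch below illustrates the difference under a stated assumption, namely that the shifter variant restricts both gains to negative powers of two. The function names, the gain values and the integer error samples are hypothetical and are not taken from the VHDL design.

```python
def pi_filter_shift(errors, kp_shift, ki_shift):
    """PI loop filter with gains 2**-kp_shift and 2**-ki_shift (assumed form).

    Python's >> on integers is an arithmetic shift, which mimics a
    two's complement barrel shifter.
    """
    integral = 0
    outputs = []
    for e in errors:
        integral += e >> ki_shift               # integral term, accumulated
        outputs.append((e >> kp_shift) + integral)  # proportional + integral
    return outputs


def pi_filter_mult(errors, kp, ki):
    """Same PI structure, with arbitrary gains applied by multipliers."""
    integral = 0.0
    outputs = []
    for e in errors:
        integral += ki * e
        outputs.append(kp * e + integral)
    return outputs


# Hypothetical timing-error samples chosen as multiples of 16, so that the
# shifts discard no bits and both variants agree exactly.
errors = [64, -32, 16, 0]
print(pi_filter_shift(errors, kp_shift=2, ki_shift=4))
print(pi_filter_mult(errors, kp=0.25, ki=0.0625))
```

When an error sample is not a multiple of the shifted power of two, the shift version truncates the low-order bits, so the two variants no longer match exactly. Multipliers remove the power-of-two restriction on the gains; the resource and speed figures for each variant are those reported in Table 1.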

Table 2 Hardware comparison

| STR | FPGA | Area (slices) | Embedded multipliers | f (MHz) | TED algorithm | Modulation |
|---|---|---|---|---|---|---|
| [20] | XC2V1000 | 84 | No | 91 | Early-late | BPSK |
| [21] | XCV100E | 719 LUT, 616 FF | No | 48.7 | Early-late | BPSK |
| [25] | Altera EP1S25F780C5 | NA | NA | 72 | Early-late | QPSK |
| [22] | XC2V1000 | 719 | 18 | 91 | Gardner | BPSK, QPSK |
| [23] | XC2VP7 | 138 | 2 | 106 | Gardner | BPSK, QPSK |
| [23] | XC2VP7 | 404 | No | 77.6 | Gardner | BPSK, QPSK |
| New (1) | XC4VLX60 | 243 | 3 | 108.7 | Square and differentiate | BPSK, QPSK, OQPSK |
| New (2) | XC4VLX60 | 207 | 5 | 108.4 | Square and differentiate | BPSK, QPSK, OQPSK |
| New (3) | XC4VLX60 | 601 | No | 98.7 | Square and differentiate | BPSK, QPSK, OQPSK |
| New (4) | XC4VLX60 | 842 | No | 92.8 | Square and differentiate | BPSK, QPSK, OQPSK |

Table 3 Performance comparison

| STR | f (MHz) | Symbol rate (MSps) | #Samples/symbol | Digital input data sample precision (# bit) | SER | ES/N0 (dB) | Modulation |
|---|---|---|---|---|---|---|---|
| [20] | 91 | 45 | 2 | 10 | NA | NA | BPSK |
| [21] | 48.7 | NA | NA | NA | NA | NA | BPSK |
| [25] | 18 | 1 | 2 | 10 | NA | NA | QPSK |
| [22] | 91 | 45 | 2 | 8 | 169 × 10e-4 | N0 = 0 | QPSK |
| [22] | 91 | 32 | 3 | 8 | 128 × 10e-4 | N0 = 0 | QPSK |
| [22] | 91 | 8 | 11 | 8 | 37 × 10e-4 | N0 = 0 | QPSK |
| [23] | 106 | 10 | 10 | 14 | 1.4 × 10e-4 | 12 | QPSK |
| [23] | 77.6 | 7.7 | 10 | 14 | 1.4 × 10e-4 | 12 | QPSK |
| New (1) | 108.7 | 10 | 10 | 14 | 2.6 × 10e-4 | 12 | QPSK |
| New (1) | 108.7 | 10 | 10 | 14 | 18.6 × 10e-4 | 12 | OQPSK |
| New (2) | 108.4 | 10 | 10 | 14 | 2.6 × 10e-4 | 12 | QPSK |
| New (2) | 108.4 | 10 | 10 | 14 | 18.6 × 10e-4 | 12 | OQPSK |
| New (3) | 98.7 | 9 | 10 | 14 | 2.6 × 10e-4 | 12 | QPSK |
| New (3) | 98.7 | 9 | 10 | 14 | 18.6 × 10e-4 | 12 | OQPSK |
| New (4) | 92.8 | 9 | 10 | 14 | 2.6 × 10e-4 | 12 | QPSK |
| New (4) | 92.8 | 9 | 10 | 14 | 18.6 × 10e-4 | 12 | OQPSK |


gate based STR circuits [20,21,25] are also reported. Since [25] proposes an entire QPSK receiver that also includes the carrier recovery circuit, it was not possible to isolate the hardware resources used by the symbol timing recovery circuit alone for inclusion in the comparative Table 2. It is worth pointing out that the solution proposed here is the most flexible and versatile of the compared implementations, because it is the only hardware solution that supports all of the BPSK, QPSK and OQPSK modulations; by contrast, the solutions reported in [20,21] consider only BPSK signals, processing just one of the in-phase and quadrature QPSK components, while the solutions in [22,23,25] do not target OQPSK modulation. Moreover, the novel solution provides this flexibility at the cost of only a limited extra area occupancy compared to the other solutions. The two most important parameters that characterize the performance of a timing synchronizer are the Symbol Error Rate (SER) and the symbol rate: the SER measures the quality of the received data by counting the erroneously detected symbols in a noisy received signal, while the symbol rate represents the speed. For a detailed comparison of different STR implementations, the SER and the symbol rate need to be considered together with the other parameters reported in Table 3, to which they are strictly related: the signal to noise ratio, the maximum working frequency of the implemented circuit, the number of samples used per symbol, and the digital input data precision. In fact, the symbol rate is determined by both the maximum frequency of the implemented circuit and the number of samples used per symbol. Using more samples per symbol yields a more accurate symbol detection and thus a lower SER, at the cost of a lower symbol rate.
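The two figures of merit discussed above can be sketched in a few lines of Python. The sketch assumes that the circuit consumes one input sample per clock cycle, so that the maximum clock frequency divided by the number of samples per symbol bounds the symbol rate; the function names and the example symbol sequences are hypothetical.

```python
import math


def max_symbol_rate_msps(f_max_mhz, samples_per_symbol):
    """Upper bound on the symbol rate in MSps.

    Assumes one input sample is processed per clock cycle.
    """
    return f_max_mhz / samples_per_symbol


def symbol_error_rate(sent, detected):
    """Fraction of detected symbols that differ from the transmitted ones."""
    if not sent or len(sent) != len(detected):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(1 for s, d in zip(sent, detected) if s != d)
    return errors / len(sent)


# Maximum clock frequencies of the four solutions, 10 samples per symbol.
for f_max in (108.7, 108.4, 98.7, 92.8):
    bound = max_symbol_rate_msps(f_max, 10)
    print(f_max, "MHz ->", math.floor(bound), "MSps (whole MSps)")

# Toy SER computation on a hypothetical four-symbol sequence.
print(symbol_error_rate([0, 1, 2, 3], [0, 1, 3, 3]))
```

Rounded down to whole MSps, the bound gives 10 MSps for the two DSP48-based solutions and 9 MSps for the two LUT-based ones, which agrees with the symbol rates listed for the new circuit in Table 3.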
The precision of the sampled input data should also be mentioned: it determines the precision of the timing error computation and thus influences the SER. Higher data precision improves the SER, but reduces the speed owing to more complex circuits and longer critical paths. Good performances are reached by the proposed solution: SERs of 2.6 × 10e-4 and 18.6 × 10e-4, for the QPSK and OQPSK modulations respectively, are obtained under the following test conditions: a signal to noise ratio of 12 dB; two 2's complement 14-bit integer values, for the in-phase and quadrature signals, input to the synchronizer; and 10 samples for each detected symbol. Under these conditions, a symbol rate of 10 MSps is reached by the solution optimized for the specific Virtex-4 FPGA, while a rate of 9 MSps is obtained by the solution that does not use the DSP-specific resources. Table 3 shows that similar symbol rates are reached by the compared solutions when the same number of samples per symbol is considered, while the novel implementation reaches a much better SER than the solution proposed in [22], even though the latter was tested without any added noise.

7. Conclusions

In this paper a new, highly versatile programmable Symbol Timing Recovery circuit for BPSK, QPSK and OQPSK modulations is proposed; moreover, carrier phase independence is guaranteed. A Matlab/Simulink model was designed in order to characterize the architecture in terms of implementation loss in the presence of an AWGN channel. The circuit was designed in VHDL and implemented on a Xilinx Virtex-4 XC4VLX60 device. Its versatility was tested by changing at runtime the modulation, the Loop Filter coefficients and the number of samples per symbol;

a processing rate of 10 MSps is reached for both QPSK and OQPSK modulations when 10 samples per symbol are used.

References

[1] E. Buracchini, The Software Radio Concept, IEEE Communications Magazine 38 (September) (2000) 138–143.
[2] C. Dick, F. Harris, M. Rice, FPGA implementation of carrier synchronization for QAM receivers, Journal of VLSI Signal Processing 36 (January) (2004) 57–71.
[3] J.G. Proakis, Digital Communications, McGraw-Hill, New York, 1989.
[4] L.W. Couch II, Digital and Analog Communications Systems, Fourth ed., Macmillan, P.C., New York, 1993.
[5] J.D. Oetting, A comparison of modulation techniques for digital radio, IEEE Transactions on Communications COM-27 (December) (1979) 1752–1762.
[6] R. Raich, G.T. Zhou, Analyzing spectral regrowth of QPSK and OQPSK signals, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 4, May 2001, pp. 2673–2676.
[7] W.G. Cowley, L.P. Sabel, The performance of two symbol timing recovery algorithms for PSK demodulators, IEEE Transactions on Communications COM-42 (June) (1994) 423–429.
[8] A. D'Amico et al., Feedforward joint phase and timing estimation with OQPSK modulation, IEEE Transactions on Vehicular Technology 48 (May) (1999) 824–832.
[9] C. Dick, F. Harris, M. Rice, Synchronization in software radios–carrier and timing recovery using FPGAs, in: 2000 IEEE Symposium on Field-Programmable Custom Computing Machines, 17–19 April 2000, pp. 195–204.
[10] M. Oerder, H. Meyr, Digital filter and square timing recovery, IEEE Transactions on Communications 36 (May) (1988) 605–611.
[11] F.M. Gardner, A BPSK/QPSK timing-error detector for sampled receivers, IEEE Transactions on Communications COM-34 (May) (1986) 423–429.
[12] D. Verdin, T.C. Tozer, Symbol-timing recovery for M-PSK modulation schemes using the signum function, IEE Colloquium on New Synchronisation Techniques for Radio Systems (1995). 27 November, 2/1–2/7.
[13] D.R. Judd et al., Data synchronization simulation using the MATHWORKS communication toolbox, IEEE International Conference on Communications 2 (1996) 706–710. 23–27 June.
[14] D. Lim, A modified Gardner detector for symbol timing recovery of M-PSK signals, IEEE Transactions on Communications 52 (October) (2004) 1643–1647.
[15] M. Rahnema, Symbol timing recovery algorithms and their evaluation for burst communication, International Journal of Wireless Information Networks 5 (4) (1998).
[16] U. Lambrette, K. Langhammer, H. Meyr, An aliasing-free receiver with variable sample rate digital feedback M/T NDA timing synchronization, Wireless Personal Communications I 8 (2) (1998) 165–183. September.
[17] N. Doan Vo, T. Le-Ngoc, Maximum likelihood (ML) symbol timing recovery (STR) techniques for reconfigurable PAM and QAM modems, Wireless Personal Communications 41 (3) (2007) 379–391. May.
[18] A. Gesell, J.B. Huber, B. Lankl, G. Sebald, Data-aided symbol timing estimation for linear modulation, AEUE – International Journal of Electronics and Communications 56 (5) (2002) 303–311.
[19] H. Lee, M. Kim, Timing error detector for OQPSK signal, IEEE 62nd Vehicular Technology Conference, vol. 3, 25–28 September 2005, pp. 25–28.
[20] P. Zicari, P. Corsonello, S. Perri, An efficient circuit for bit-detection and timing recovery using FPGAs, in: Proceedings of the 13th IEEE International Conference on Electronics, Circuits and Systems – ICECS'06, Nice, France, December 10–13, 2006.
[21] B. Cerato, L. Colazzo, M. Martina, A. Molino, F. Vacca, Parametric FPGA early-late DLL implementation for a UMTS receiver, in: Conference on Signals, Systems and Computers, Conference Record of the 36th Asilomar, 3–6 November, 2002.
[22] Z. Jian, W. Nan, K. Jingming, W. Hua, High speed all digital symbol timing recovery based on FPGA, in: Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, vol. 2, 23–26 September 2005, pp. 1402–1405.
[23] E. Sciagura, P. Zicari, S. Perri, P. Corsonello, An efficient and optimized FPGA feedback M-PSK symbol timing recovery architecture based on Gardner timing error detector, in: Proceedings of the EUROMICRO IEEE International Conference on Digital System Design, August 2007, pp. 102–108.
[24] H. Wei, S. Ming, Z. Hui, Optimal design and simulation for multi-rate symbol timing recovery in software radio QPSK demodulation, in: Third International Conference on Computational Electromagnetics and its Applications Proceedings, 2004, pp. 312–315.
[25] J.K. Hwang, C.H. Chu, FPGA implementation of an all-digital T/2-spaced QPSK receiver with Farrow interpolation timing synchronizer and recursive Costas loop, in: IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC2004), August 4–5, 2004, pp. 248–251.
[26] Y. Linn, A self-normalizing symbol synchronization lock detector for QPSK and BPSK, IEEE Transactions on Wireless Communications 5 (2) (2006). February.