A reduced complexity multipulse compression system


Speech Communication 10 (1991) 171-178 North-Holland


S.G. Bakamidis, N.A. Glaros and G. Carayannis
National Technical University of Athens, 157 73 Zografou, Athens, Greece

Received 24 August 1989 Revised 26 October 1990

Abstract. A reduced complexity multipulse coder is proposed in this paper. A 33% complexity reduction affects the overall speech output quality only slightly, as is demonstrated by both informal listening and SNR computation. On the other hand, it makes possible the implementation of the coder with cheap commercial signal processing chips. The TMS32020 has been used for a real-time implementation of the proposed algorithm at 16 kbit/s with only fixed-point computations. The method studied here can be considered as an alternative to the regular pulse excitation technique.

Keywords. Speech compression, fast techniques, multipulse excitation, fixed-point arithmetic.

1. Introduction

Many hybrid techniques exist for medium rate speech coding (Jayant and Noll, 1984; Sluyter, 1983/84); one of the most efficient is the multipulse excited coder first proposed by Atal and Remde (1982). This coder uses a small number of pulses as its excitation waveform. During the analysis procedure the amplitudes and locations of the excitation pulses are obtained so as to minimize a perceptually weighted error. Since the introduction of the original multipulse excitation concept, many different variants have been proposed (Senensieb et al., 1984; Singhal and Atal, 1984; Lefevre and Passien, 1985; Kroon and Deprettere, 1984; Singhal and Atal, 1989), improving the computational complexity and/or the speech quality at the low end of the medium rate band.

In this paper a faster technique is proposed. This technique lies between the original multipulse excitation and the regular pulse excitation (Kroon et al., 1986). In the first technique the pulses are placed anywhere in the analysis frame, while in the regular pulse excitation the positions of the pulses are fixed and the amplitudes are optimally determined by a least-squares analysis-by-synthesis procedure. In the proposed technique a mild restriction is applied to the pulse positions, leading to a significant cut in the computational complexity without any noticeable loss in speech quality when compared to the basic multipulse approach. Computer simulation results and informal listening confirm the efficiency of the method.

In Section 2 the new reduced complexity multipulse algorithm is presented and the reduction in both computational load and bit rate is considered. In Section 3 the behavior of the new algorithm is compared to the classical algorithm in terms of SNRseg and subjective tests. Section 4 refers to some hardware implementation issues.

0167-6393/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)

2. Reduced complexity multipulse algorithm

One way to reduce the computational load of the multipulse algorithm is to impose restrictions on the positioning of the multipulse sequence. Here, the restriction of positioning the pulses only at even or only at odd places across the search interval is proposed. This restriction is based on two observations. First, the maxima of the cross-correlation are not very sharp: the value at the position immediately before or after a maximum does not differ much from the maximum itself. Secondly, the iterative method for pulse search is such that a pulse can partially correct errors due to the incorrect placing of previous pulses. Also, it is clear that there is a 50% chance of placing a pulse at the same position as in the original algorithm. This is especially true for the first pulses, which are placed mainly at pitch period onsets for voiced sounds.

A block diagram of the basic multipulse coder structure is given in Figure 1. The weighted error e(n) is obtained by filtering the difference between the original signal and the synthetic signal. The multipulse excitation is obtained by minimizing e(n). The synthesis filter is fed with the multipulse excitation to produce the synthetic signal, which is compared to the original signal to produce the error signal e'(n). The aim of the error weighting filter is to concentrate the coding noise in the formant regions, where it is less audible (Atal and Remde, 1982). Its transfer function W(z) is given by

$$W(z) = \frac{A(z)}{A(z/\gamma)} = \frac{1 - \sum_{k=1}^{p} a_k z^{-k}}{1 - \sum_{k=1}^{p} a_k \gamma^{k} z^{-k}} \tag{1}$$

where $\gamma \in [0, 1]$. Let us assume that the analysis is performed over a speech segment of N samples and that it is desired to place M pulses in that segment, with the restriction that the pulses are placed only on odd or only on even positions. So, the ith pulse is placed at position $n_i$ with amplitude $g_i$, where $n_i = 2\lambda$ or $n_i = 2\lambda + 1$, $\lambda \in \mathbb{Z}$. Thus the excitation signal u(n) can be written as

$$u(n) = \sum_{i=1}^{M} g_i \, \delta(n - n_i), \tag{2}$$

where $\delta(n)$ is the unit pulse signal

$$\delta(n) = \begin{cases} 1, & \text{if } n = 0, \\ 0, & \text{else.} \end{cases} \tag{3}$$
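As an illustration of Eqs. (2) and (3), the following minimal Python sketch builds the excitation u(n) from a list of pulse positions and amplitudes and enforces the even-or-odd restriction. The function name `build_excitation`, the `parity` argument and the 0-based sample indexing are assumptions made for this example; they do not come from the paper.

```python
def build_excitation(frame_len, positions, amplitudes, parity=0):
    """Return u(n), n = 0..frame_len-1, as the sum of g_i * delta(n - n_i).

    parity=0 allows pulses only at even indices, parity=1 only at odd ones.
    (0-based indexing is an assumption of this sketch.)
    """
    if len(positions) != len(amplitudes):
        raise ValueError("one amplitude per pulse position is required")
    u = [0.0] * frame_len
    for n_i, g_i in zip(positions, amplitudes):
        if not 0 <= n_i < frame_len:
            raise ValueError(f"pulse position {n_i} is outside the frame")
        if n_i % 2 != parity:
            raise ValueError(f"pulse position {n_i} violates the parity restriction")
        u[n_i] += g_i  # contribution g_i * delta(n - n_i)
    return u


# Two pulses on the even grid of an 8-sample frame.
u = build_excitation(8, positions=[2, 6], amplitudes=[0.5, -1.25])
print(u)
```

Because only every second position is admissible, a position can be stored as the index n_i // 2 together with a single parity flag for the whole frame, which is where the bit saving discussed later in this section comes from.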

After m pulses have been placed, the resulting weighted error e(n) can be written as (Lefevre and Passien, 1985)

$$e^{[m]}(n) = r(n) * h'(n) - \sum_{i=1}^{m} g_i \, h'(n - n_i), \tag{4}$$

where r(n) is the LP residual and h'(n) is the convolution of the impulse response of the synthesis filter with the impulse response of the error weighting filter. The above relation leads to an iterative form of the error $e^{[m]}(n)$:

$$e^{[m]}(n) = e^{[m-1]}(n) - g_m \, h'(n - n_m). \tag{5}$$

Fig. 1. Simplified block diagram of the basic multipulse coder.

Minimization of the quantity

$$E^{[m]} = \sum_{n=1}^{N} \left( e^{[m]}(n) \right)^2 = \sum_{n=1}^{N} \left( e^{[m-1]}(n) - g_m \, h'(n - n_m) \right)^2 \tag{6}$$

with respect to $g_m$ gives

$$g_m = \frac{\sum_{n=1}^{N} e^{[m-1]}(n) \, h'(n - n_m)}{\sum_{n=1}^{N} h'(n - n_m)^2} \tag{7}$$

and $E^{[m]}(k)$ is minimized for $k = n_m$, $n_m = 2\lambda$ or $n_m = 2\lambda + 1$, $\lambda \in \mathbb{Z}$, which maximizes the quantity (Lefevre and Passien, 1985)

$$t^{[m-1]}(k) = \frac{\sum_{n=1}^{N} e^{[m-1]}(n) \, h'(n - k)}{\sum_{n=1}^{N} \left[ h'(n - k) \right]^2}. \tag{8}$$

$t^{[m]}$ can be recursively calculated by the formula (Lefevre and Passien, 1985)

$$t^{[m]}(k) = t^{[m-1]}(k) - g_m \, r'_{hh}(|n_m - k|), \quad 1 \le k \le N, \quad k = 2\lambda \text{ or } k = 2\lambda + 1, \ \lambda \in \mathbb{Z}, \tag{9}$$

where $r'_{hh}$ is the autocorrelation of the impulse response h'(n). As stated in (Senensieb et al., 1984),

$$h'(n) = h(n) \, \gamma^{n}, \tag{10}$$

and because $\gamma$ is less than one, the envelope of h'(n) decays more rapidly than that of h(n). This allows us to write

$$h'(n) = 0 \quad \text{for } n > n_1, \text{ where } n_1 \ll N. \tag{11}$$

Also, h'(n) = 0 for n < 0 due to the causality of the filter. A typical value for $n_1$ is 15, leading to a significant computational saving (Lefevre and Passien, 1985). The autocorrelation $r'_{hh}(n)$ then becomes

$$r'_{hh}(n) = 0 \quad \text{for } |n| > n_1. \tag{12}$$

Taking into account (11) and (12), $t^{[0]}$ can be written as

$$t^{[0]}(k) = r(k) + \sum_{n=1}^{n_1} \left( r(k - n) + r(k + n) \right) r'_{hh}(n), \quad 1 \le k \le N, \quad k = 2\lambda \text{ or } k = 2\lambda + 1, \ \lambda \in \mathbb{Z}, \tag{13}$$

and

$$t^{[m]}(k) = t^{[m-1]}(k) - g_m \, r'_{hh}(|n_m - k|), \quad |n_m - k| \le n_1, \quad k = 2\lambda \text{ or } k = 2\lambda + 1, \ \lambda \in \mathbb{Z}. \tag{14}$$
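The search described by Eqs. (7) to (14) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes 0-based sample indices, treats residual samples outside the frame as zero, and normalizes the autocorrelation by its zero-lag value so that the gain of Eq. (7) reduces to the cross-correlation value at the chosen position. All function names are hypothetical.

```python
def weighted_impulse_response(h, gamma, n1):
    """h'(n) = h(n) * gamma**n, kept only for 0 <= n <= n1 (Eqs. (10), (11))."""
    return [h[n] * gamma ** n for n in range(min(n1 + 1, len(h)))]


def normalized_autocorrelation(hw):
    """Autocorrelation of h' divided by its zero-lag value (sketch assumption)."""
    energy = sum(x * x for x in hw)
    return [sum(hw[i] * hw[i + k] for i in range(len(hw) - k)) / energy
            for k in range(len(hw))]


def initial_crosscorrelation(r, rhh):
    """t[0](k) as in Eq. (13); samples of r outside the frame count as zero."""
    N = len(r)

    def at(i):
        return r[i] if 0 <= i < N else 0.0

    return [at(k) + sum((at(k - n) + at(k + n)) * rhh[n] for n in range(1, len(rhh)))
            for k in range(N)]


def restricted_multipulse(r, h, num_pulses, gamma=0.9, n1=15, parity=0):
    """Return [(position, amplitude), ...] with positions on one parity grid."""
    hw = weighted_impulse_response(h, gamma, n1)
    rhh = normalized_autocorrelation(hw)
    t = initial_crosscorrelation(r, rhh)
    candidates = range(parity, len(r), 2)  # only even (or only odd) positions
    pulses = []
    for _ in range(num_pulses):
        n_m = max(candidates, key=lambda k: abs(t[k]))  # position search
        g_m = t[n_m]                                    # gain, Eq. (7)
        pulses.append((n_m, g_m))
        for k in candidates:                            # update, Eq. (14)
            lag = abs(n_m - k)
            if lag < len(rhh):
                t[k] -= g_m * rhh[lag]
    return pulses


# Toy example: a residual with one impulse and a short impulse response.
residual = [0.0] * 16
residual[4] = 1.0
print(restricted_multipulse(residual, [1.0, 0.5, 0.25], num_pulses=1, gamma=1.0))
```

Both the position search and the update loop visit only every second sample, which is the source of the reduction in the per-pulse workload compared with the unrestricted search.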

Table 1 summarizes the above described algorithm. A count of the operations is also included.

Table 1. Computational organization and complexity of the reduced complexity multipulse algorithm

| Step | Operation | Multiplications | Additions |
|---|---|---|---|
| 1.1 | Calculation of the weighted impulse response: $h'(n) = h(n)\gamma^{n}$, $0 \le n \le n_1$ | $2n_1$ | |
| 1.2 | Calculation of $r'_{hh}$: $r'_{hh}(k) = \sum_{i=0}^{n_1 - k} h'(i)h'(i+k) \Big/ \sum_{i=0}^{n_1} (h'(i))^2$, $0 \le k \le n_1$ | $(n_1 - k + 2)(n_1 + 1)$ | $(2n_1 - k)$ |
| 1.3 | Calculation of the error signal $r(n)$ | $(2pN)$ | $(2pN)$ |
| 1.4 | Calculation of $t^{[0]}$: $t^{[0]}(k) = r(k) + \sum_{n=1}^{n_1} (r(k-n) + r(k+n))\, r'_{hh}(n)$, $1 \le k \le N$, $k = 2\lambda$, $\lambda \in \mathbb{Z}$ | $(n_1 + 1)[N/2]$ | $n_1[N/2]$ |
| | $m = 0$; iterations: | | |
| 2.1 | $m = m + 1$ | | |
| 2.2 | Find $n_m$ such that $\lvert t(n_m) \rvert = \max \lvert t(i) \rvert$, $1 \le i \le N$, $i = 2\lambda$, $\lambda \in \mathbb{Z}$ | | |
| 2.3 | $g_m = t(n_m)$ | | |
| 2.4 | If $m = M$ then go to step 3 | | |
| 2.5 | Update of $t(k)$: $t^{[m]}(k) = t^{[m-1]}(k) - g_m\, r'_{hh}(\lvert n_m - k \rvert)$, $-n_1 + n_m \le k \le n_1 + n_m$, $k = 2\lambda$, $\lambda \in \mathbb{Z}$ | $(n_1 + 1)(m - 1)$ | $(n_1 + 1)(m - 1)$ |
| 3.0 | End | | |

A major advantage of the modified algorithm is that it saves bits in coding, because the positions of the pulses are exclusively odd or exclusively even. For example, if in the original algorithm the pulse search interval is 64 points, in the modified algorithm it is only 32, which means a saving of 1 bit per pulse. If the pulse rate is 1500 pulses/s, then a total saving of 1500 bit/s is obtained, which is very significant; it can be used to increase the pulse rate for a given bit rate, the precision of coding the pulse amplitudes, or both. As another example, let us consider a combinatorial scheme for coding the pulse positions (Montagna and Omologo, 1986). If M pulses are placed in a frame of length N, then only $\binom{N}{M}$ possibilities exist, so the number of bits B per frame is given by

$$B = \log_2 \binom{N}{M}. \tag{15}$$

If N = 160 and M = 36, then from (15) B = 120 bits/frame, rounded up to whole bits. When the pulses are placed at odd or even positions, the actual frame length becomes N/2 and the number of bits B' per frame is given by

$$B' = \log_2 \binom{N/2}{M}. \tag{16}$$

Substitution of N and M in (16) gives B' = 76 bits/frame, which means that the actual saving is 44 bits/frame, or 2.2 kbit/s for the above example with 20 ms frames. One possible application is the use of this algorithm for wide-band speech or audio, where the savings could be significant since the sampling rate is much higher.

Let us now discuss the disadvantages of the restriction imposed on the pulse positions. It is obvious that such a restriction will result in some degradation of the synthetic speech quality. The question is how much this quality differs from the original one, and what compensation can be applied to regain as much as possible of the lost quality by using the ability of the new algorithm to save bits in coding. This is discussed further in the next section, where the evaluation of the modified algorithm in terms of computer simulation results and informal listening tests is presented.
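The figures quoted for the combinatorial position-coding example (N = 160, M = 36, 20 ms frames) can be reproduced with a short Python check. The helper name `position_bits` is introduced here for illustration only, and rounding up to a whole number of bits is an assumption that matches the values given in the text.

```python
import math


def position_bits(frame_len, num_pulses):
    """Bits needed to index one of C(frame_len, num_pulses) position patterns."""
    return math.ceil(math.log2(math.comb(frame_len, num_pulses)))


N, M = 160, 36
frames_per_second = 50  # one 20 ms analysis frame every 1/50 s

b_full = position_bits(N, M)             # unrestricted positions
b_restricted = position_bits(N // 2, M)  # even-only or odd-only positions
saving = b_full - b_restricted

print(b_full, b_restricted, saving, saving * frames_per_second)
# prints: 120 76 44 2200
```

The restricted scheme additionally has to signal whether the even or the odd grid is in use if that choice varies from frame to frame; the text does not specify this, so it is not counted here.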

3. Evaluation

The above described algorithm has been implemented on a digital computer and tested extensively in order to investigate its performance. In all the experiments a linear prediction filter with 10 poles is used. The predictor parameters are calculated every 20 ms using the autocorrelation method with the Split-Schur algorithm (Delsarte and Genin, 1987). The multipulse analysis is performed in successive non-overlapping 20 ms intervals and the error weighting factor is taken to be $\gamma = 0.9$.

Both the amplitudes and the positions of the pulses are quantized, leading to the following rates, which have been tested successfully: 16, 12 and 9.6 kbit/s. Table 2 gives a summary of the parameters used in both the original and the reduced complexity algorithms. These parameters are kept constant throughout the tests.

Table 2. Analysis parameters

| Parameter | Value |
|---|---|
| Sampling frequency | 8 kHz |
| LP analysis procedure | Autocorrelation |
| Autocorrelation window type | Hamming |
| Prediction order | 10 LP coefficients |
| Multi-pulse analysis interval | 20 ms |
| Error weighting factor | 0.9 |

In Figure 2(a) a typical speech fragment is shown (solid line) along with the synthetic signals produced by both the original MP (dotted line) and the modified MP (dashed line) algorithms using the same number of pulses per frame (30 pulses). It should be noticed that the synthetic waveforms are very close to each other, which is a good indication of the performance of the modified MP. Figures 2(b) and 2(c) depict the multipulse sequences corresponding to the original and the modified MP, respectively. Here there are some differences in the pulse positioning, as expected from the imposed restriction, and this also affects the amplitudes of the pulses. Also, some small pulses are placed near big pulses, acting as correction terms in case the latter are wrongly positioned. This is the major advantage of the multipulse algorithm: at every new pulse it tries to minimize the total error.

Fig. 2. (a) Original signal (solid line); synthetic signal from original MP (dotted line); synthetic signal from modified MP (dashed line). (b) Multipulse sequence from original MP. (c) Multipulse sequence from modified MP. Horizontal axis: number of samples.

Finally, Figure 3 depicts the average SNRseg obtained as a function of the pulse rate for a database of four utterances spoken by three males and one female. The units on the horizontal axis indicate the number of pulses used per analysis frame. The + and × signs give the average SNRseg for the original and the modified MP, respectively, for a given number of pulses per analysis interval. These results are based on a non-quantized version of the coder. It can be observed that the two curves are quite close to each other and that their difference is always about 1.5 dB. This is the price paid for the decrease in computational complexity by almost 33%. Another observation is that part of the lost SNR can be regained by slightly increasing the pulse rate; the resulting increase in bit rate can be fully compensated by the bit rate reduction due to the restricted positioning of the pulses. It must also be noticed that the additional computational effort is almost negligible. Another alternative is to use the saved bits for coding the pulse amplitudes, which, as is well known, results in a better output quality. Amplitude optimization has not been used throughout the analysis, because its computational complexity renders it more or less impractical for hardware implementation with currently available cheap signal processors.

Fig. 3. Average SNRseg (dB) as a function of the number of pulses for a database of four speakers. + = average SNRseg for the original MP; × = average SNRseg for the modified MP.

Informal listening tests have been conducted at different bit rates for both the original and the modified MP in order to compare them. The modified MP was judged to be almost equivalent to the original MP. Some small degradation was observed for a very high-pitched female voice (average pitch 330 Hz), but this is a rather extreme case.
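The paper does not spell out how SNRseg is computed. The sketch below uses the usual definition (the mean over frames of the per-frame SNR in dB), which is an assumption here, as are the function name `segmental_snr`, the default 160-sample frame (20 ms at 8 kHz) and the choice to skip silent frames.

```python
import math


def segmental_snr(original, synthetic, frame_len=160, eps=1e-12):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        s = original[start:start + frame_len]
        y = synthetic[start:start + frame_len]
        signal_energy = sum(a * a for a in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal_energy <= eps:
            continue  # skip silent frames (a choice of this sketch)
        snrs.append(10.0 * math.log10(signal_energy / max(error_energy, eps)))
    if not snrs:
        raise ValueError("no non-silent frame to evaluate")
    return sum(snrs) / len(snrs)


# A constant signal reproduced with a 10% amplitude error gives about 20 dB.
print(round(segmental_snr([1.0] * 320, [0.9] * 320), 3))
```

Averaging in the dB domain gives low-energy frames the same weight as loud ones, which is why SNRseg is often preferred to a single global SNR when comparing speech coders.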

4. Real-time implementation of the reduced complexity multipulse algorithm

A PC-compatible board, built around a 20 MHz TMS32020 digital signal processor (Texas Instruments, 1987) in a standard configuration mode and equipped with a 16-bit A/D and D/A conversion circuit, has been used for the realization of the multipulse excitation (MP) algorithm.

4.1. Arithmetic used: MP's precision requirements

After exhaustive testing in a simulation environment with many speech signals, it has been found that the physical process underlying the mathematics of the various MP procedures satisfies all the prerequisites for the use of fixed-point arithmetic. The dynamic range of the input/output and intermediate quantities involved in these procedures is limited to intervals of the form [-n, n), where n can be 1, 2, 4, 8, 16 or 32. To handle the various values of n, mixed fixed-point formats are used.

The trade-offs between word length and accuracy under the time and speech quality constraints led to several versions of the MP procedures, each employing a different combination of word lengths and fixed-point formats. A 32-bit word length was necessary in the precision-sensitive algorithms, such as the cross-correlation, impulse response and autocorrelation computations. A 16-bit word length, alone or in combination with 32-bit data, was used only for those MP procedures for which the average SNR and the segmental SNR dropped by no more than 1 dB and 2 dB, respectively, from their full-accuracy values, or for which there was no perceptible loss in the overall quality of the speech output.

In experiments using SNR computation to determine the appropriate word length, it was observed that whenever the word length guaranteed at least four correct fractional decimal digits for all the quantities involved, the resulting synthetic speech quality was indistinguishable from that of the full-accuracy approach. This empirical rule seems to govern MP's arithmetic.
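The relation between the range [-n, n), the word length and the number of reliable fractional decimal digits can be illustrated with a small Python sketch. The interpretation of "four correct fractional decimal digits" used below (rounding error of at most half a unit in the fourth decimal place) is an assumption of this example, and the function names are hypothetical; the paper does not give the exact formats used on the TMS32020.

```python
import math


def fractional_bits(word_len, n):
    """Fractional bits left in a signed word covering [-n, n), n a power of two."""
    int_bits = int(math.log2(n))
    if 2 ** int_bits != n:
        raise ValueError("n must be a power of two")
    return word_len - 1 - int_bits  # one bit is used for the sign


def guaranteed_decimal_digits(frac_bits):
    """Largest d with rounding error 2**-(frac_bits + 1) <= 0.5 * 10**-d."""
    return math.floor(frac_bits * math.log10(2))


def quantize(x, word_len, n):
    """Round x to the nearest representable value, saturating at the limits."""
    scale = 2 ** fractional_bits(word_len, n)
    q = round(x * scale)
    q = max(-n * scale, min(n * scale - 1, q))
    return q / scale


for n in (1, 2, 4, 8, 16, 32):
    f16 = fractional_bits(16, n)
    f32 = fractional_bits(32, n)
    print(n, f16, guaranteed_decimal_digits(f16), f32, guaranteed_decimal_digits(f32))
```

Under this interpretation, a 16-bit word reaches four decimal digits only for the narrowest ranges, which is consistent with the use of 32-bit words in the precision-sensitive procedures.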

4.2. Implementation issues

In order to satisfy the MP algorithm's need for real-time execution speed, some special software optimization techniques have been developed, which allow a time-efficient mapping of the MP algorithm onto the TMS32020 hardware environment. These techniques, along with other optimization possibilities, are summarized as follows:

(1) Merging of two arrays into a single one using a memory overlapping scheme, which allows two arrays to be indexed by a single TMS32020 memory address pointer.

(2) Dynamic storage of an array in data memory. An array of dimension p, dynamically stored in memory and used by a loop executed n times, occupies n*p memory locations and advances one place to higher addresses after every execution of the loop. This method requires extra memory to be reserved for a specific array. It nevertheless accelerates execution by eliminating a number of storage and index-modifying instructions.

(3) Replacement of a loop with its body instructions repeated in program memory as many times as the loop is executed. In this manner faster execution of a loop is accomplished at the expense of program memory. Application of this method to small loops executed multiple times within larger loops may save up to 4000 instruction cycles, depending on the particular procedure.

(4) Wide use of the powerful MAC/MACD instructions, which in repeat mode can perform inner products, convolution and digital filtering operations rapidly.

The implementation of the original multipulse algorithm under the above considerations generated TMS32020 code taking 118200 to 126300 cycles (minimum and maximum execution times), while 100000 cycles per segment is the upper limit at an 8 kHz sampling rate. Applying the 33% complexity reduction introduced by the proposed multipulse algorithm to the above execution times shows that the real-time implementation of the modified MP algorithm is indeed feasible on the TMS32020 DSP.

To improve the poor performance of 16-bit arithmetic in the case of very small numbers, scaling/normalization routines have been embedded at certain points of the program. Such scaling is necessary especially for the input parameters of the synthesis routine, which otherwise seems to fail when silent or low-energy speech frames are being processed.

With these improvements the TMS32020 real-time 16 kbit/s multipulse code reproduces the voice of a speaker in an acceptable way. Almost natural quality is obtained at 16 kbit/s. The whole range of multipulse coders from 9.6 to 16 kbit/s can be easily derived from the initial real-time multipulse assembly program. Some of these coders have also been implemented and tested. A decrease in their performance has been noticed as the bit rate is lowered.
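The real-time budget quoted in Section 4.2 can be checked with simple arithmetic. The sketch below assumes that one TMS32020 instruction cycle takes four clock periods (200 ns at 20 MHz), which reproduces the 100000-cycle limit stated in the text for a 20 ms segment, and it applies the 33% reduction uniformly to the measured cycle counts, as the text does. The constant and function names are illustrative only.

```python
CLOCK_HZ = 20_000_000
CLOCKS_PER_INSTRUCTION_CYCLE = 4  # assumption: 200 ns instruction cycle at 20 MHz
FRAME_SECONDS = 0.020             # 20 ms analysis segment (Table 2)


def cycles_per_frame():
    """Instruction cycles available while one speech segment is being sampled."""
    return round(CLOCK_HZ / CLOCKS_PER_INSTRUCTION_CYCLE * FRAME_SECONDS)


def reduced(cycles, reduction=0.33):
    """Cycle count after a uniform complexity reduction."""
    return round(cycles * (1.0 - reduction))


budget = cycles_per_frame()
for original in (118_200, 126_300):  # min and max cycles of the original MP code
    new = reduced(original)
    print(original, new, "fits" if new <= budget else "too slow")
```

With these numbers the original code exceeds the budget in both the best and the worst case, while the reduced version stays below it with a margin of roughly 15% in the worst case.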

5. Conclusion

In this paper a modification of the original MP algorithm is studied. The computational effort is reduced by 33%, while informal listening tests have shown that the overall performance of the modified algorithm is almost equivalent to that of the original one. Another interesting point that makes the proposed algorithm attractive is that, for a given number of pulses, the bit rate is reduced compared to that of the original multipulse algorithm. This reduction is due to the pulse position bits saved by restricting the pulses to odd or even places. So, for a constant bit rate, more pulses can be allocated, which translates to better speech quality.

References

B.S. Atal and J.R. Remde (1982), "A new model of LPC excitation for producing natural-sounding speech at low bit rates", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., pp. 614-617.

P. Delsarte and Y. Genin (1987), "On the splitting of classical algorithms in linear prediction theory", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-35, pp. 645-653.

N.S. Jayant and P. Noll (1984), Digital Coding of Waveforms: Principles and Applications to Speech and Video (Prentice-Hall, Englewood Cliffs, NJ).

P. Kroon and E.F. Deprettere (1984), "Experimental evaluation of different approaches to the multipulse coder", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., pp. 10.4.1-10.4.4.

P. Kroon, E.F. Deprettere and R.J. Sluyter (1986), "Regular-pulse excitation: A novel approach to effective and efficient multipulse coding of speech", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 5.

J.-P. Lefevre and O. Passien (1985), "Efficient algorithms for obtaining multipulse excitation for LPC coders", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., pp. 25.6.1-25.6.4.

R. Montagna and M. Omologo (1986), "Some results on multipulse linear predictive coding", GLOBECOM 86, Houston, TX, 1-4 December 1986.

G.A. Senensieb, A.J. Milbourn, A.H. Lloyd and I.M. Warrington (1984), "A non-iterative algorithm for obtaining multipulse excitation for linear predictive speech coders", Proc. Internat. Conf. Acoust. Speech Signal Process., pp. 10.2.1-10.2.4.

S. Singhal and B.S. Atal (1984), "Improving performance of multi-pulse LPC coders at low bit rates", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., pp. 1.3.1-1.3.4.

S. Singhal and B.S. Atal (1989), "Amplitude optimization and pitch prediction in multipulse coders", IEEE Trans. Acoust. Speech Signal Process., Vol. 37, No. 3.

R.J. Sluyter (1983/84), "Digitization of speech", Philips Techn. Rev., Vol. 41, No. 7/8.

Texas Instruments Inc. (1987), Digital Signal Processing Applications with the TMS320 Family.