Digital Signal Processing 9, 76–88 (1999) Article ID dspr.1999.0334, available online at http://www.idealibrary.com on
Fast Search Methods for Spectral Quantization

John Leis* and Sridha Sridharan†

*Faculty of Engineering, University of Southern Queensland, Toowoomba, Queensland, Australia; †Speech Research Laboratory, Queensland University of Technology, Brisbane, Queensland, Australia

Leis, J., and Sridharan, S., Fast Search Methods for Spectral Quantization, Digital Signal Processing 9 (1999), 76–88.

In this paper, we examine the computational requirements for the split-vector class of vector quantizers when applied to low-rate speech spectrum quantization. The split-vector quantization techniques are able to reduce the complexity and storage requirements of the 24-bit per frame spectral quantizer to manageable proportions. However, further dramatic reductions in computational complexity are possible, as will be demonstrated. As the fast-search algorithms reported in the literature are somewhat data dependent, it has been necessary to carefully evaluate several methods specifically for the speech coding problem. A total of six methods have been evaluated for their effectiveness in this task, and we show that a so-called “geometric” fast-search method results in a reduction in the average search time of an order of magnitude. © 1999 Academic Press
1. PROBLEM FORMULATION

In coding speech for transmission or storage, a significant portion of the available bandwidth is taken up with the quantization of the short-term spectrum. This short-term spectrum is normally derived from the linear predictor (LP) coefficients of the one-sample predictor,

$$\hat{s}(n) = a_1 s(n-1) + a_2 s(n-2) + \cdots + a_P s(n-P) \tag{1}$$

$$= \sum_{k=1}^{P} a_k\, s(n-k). \tag{2}$$
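As a minimal sketch of Eqs. (1) and (2), the one-sample prediction may be computed as follows. The function name `predict_sample`, the coefficient values, and the test signal are illustrative assumptions and are not taken from the paper.

```python
def predict_sample(a, s, n):
    """Return s_hat(n) = sum_{k=1}^{P} a_k * s(n - k), as in Eq. (2).

    a : predictor coefficients [a_1, ..., a_P] (illustrative values only)
    s : list of samples, indexed from 0
    n : index of the sample to predict; at least P earlier samples are needed
    """
    order = len(a)
    if n < order:
        raise ValueError("need at least P past samples before index n")
    return sum(a[k - 1] * s[n - k] for k in range(1, order + 1))


# Hypothetical second-order predictor; a_1 = 2, a_2 = -1 extrapolates a ramp.
coeffs = [2.0, -1.0]
signal = [1.0, 2.0, 3.0, 4.0]
prediction = predict_sample(coeffs, signal, 3)  # predicts signal[3] from signal[2], signal[1]
```

For this ramp the prediction equals the true sample, which is why the ramp was chosen; real speech frames will of course leave a nonzero residual.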
Direct quantization of the LPC coefficients is known to be unsatisfactory, principally because the LP spectrum is sensitive to perturbations in the LP
coefficients and because the LP spectral sensitivity is not localized to one particular frequency region. A better domain for quantization is the line spectrum frequency (LSF, also called line spectrum pair or LSP) representation, which was originally introduced by Itakura [1] but which has only recently been exploited in low-rate speech coding to a significant degree. Given the LPC model with coefficients $a_i$, the LSF representation is found by decomposing $A(z)$ into two polynomials $P(z)$ and $Q(z)$ as [2, 3]:

$$\left.\begin{matrix} P(z) \\ Q(z) \end{matrix}\right\} = A(z) \pm z^{-(P+1)} A(z^{-1}). \tag{3}$$
The resulting LSFs are interleaved on the unit circle, with the roots of $P(z)$ corresponding to the odd-numbered indices and the roots of $Q(z)$ corresponding to the even-numbered indices. The coefficients of the synthesis filter $1/(1 - A(z))$ are recovered simply by

$$A(z) = \frac{P(z) + Q(z)}{2}. \tag{4}$$
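The decomposition of Eq. (3) and the recovery of Eq. (4) can be illustrated on coefficient lists. The sketch below assumes that $A(z)$ is stored as the list of its coefficients of $z^0, z^{-1}, \ldots, z^{-P}$ and that $P(z)$ takes the upper sign; both the storage layout and the function names are assumptions made for illustration. Root finding (to obtain the LSFs themselves) is not attempted.

```python
def lsf_polynomials(alpha):
    """Form P(z) and Q(z) = A(z) +/- z^-(P+1) A(z^-1) as coefficient lists.

    alpha : [alpha_0, ..., alpha_P], coefficients of z^0 ... z^-P (assumed layout)
    """
    padded = list(alpha) + [0.0]               # A(z), extended to order P + 1
    mirrored = [0.0] + list(reversed(alpha))   # z^-(P+1) * A(z^-1)
    p_poly = [u + v for u, v in zip(padded, mirrored)]
    q_poly = [u - v for u, v in zip(padded, mirrored)]
    return p_poly, q_poly


def recover_a(p_poly, q_poly):
    """Eq. (4): A(z) = (P(z) + Q(z)) / 2, returned with one trailing zero."""
    return [(u + v) / 2.0 for u, v in zip(p_poly, q_poly)]


alpha = [1.0, -0.75, 0.25]          # illustrative coefficients, not speech data
p_poly, q_poly = lsf_polynomials(alpha)
recovered = recover_a(p_poly, q_poly)
```

With this layout $P(z)$ has symmetric and $Q(z)$ antisymmetric coefficients, and the mirrored terms cancel in the sum, so the original coefficients are returned exactly.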
To attain the lowest possible rates, vector quantization (VQ) [4, 5] of the LSFs is used, which has been demonstrated to be satisfactory at rates of 24 to 26 bits per 20-ms frame [3]. In a vector quantizer, a codebook $\mathcal{C}$ of size $N \times k$ maps the $k$-dimensional space $\mathbb{R}^k$ onto the reproduction vectors (also called codevectors or codewords):

$$Q: \mathbb{R}^k \to \mathcal{C}, \qquad \mathcal{C} = (c_1, c_2, \ldots, c_N)^T, \qquad c_i \in \mathbb{R}^k. \tag{5}$$
The codebook can be thought of as a finite list of vectors, $c_i$, $i = 1, \ldots, N$. The codebook vectors are preselected through a clustering or training process to represent the training data. In the coding process of vector quantization, the input samples are handled in blocks of $k$ samples, which form a vector $x$. The VQ encoder searches the codebook for an entry $c_i$ that serves best as an approximation for the current input vector $x_t$ at time $t$. The encoder minimizes the distortion $d(\cdot)$ to give the optimal estimated vector $\hat{x}_t$:

$$\hat{x}_t = \arg\min_{c_i \in \mathcal{C}} d(x_t, c_i). \tag{6}$$
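The exhaustive search implied by Eq. (6) can be sketched as below, taking the squared Euclidean error as the distortion. The names `squared_error` and `full_search` and the three-entry codebook are hypothetical and serve only to make the procedure concrete.

```python
def squared_error(x, c):
    """Squared Euclidean distortion between input x and codevector c."""
    return sum((xj - cj) ** 2 for xj, cj in zip(x, c))


def full_search(x, codebook, distortion=squared_error):
    """Return (index, distortion) of the codevector minimizing d(x, c_i)."""
    best_index = 0
    best_d = distortion(x, codebook[0])
    for i in range(1, len(codebook)):
        d = distortion(x, codebook[i])
        if d < best_d:                 # strict test keeps the first minimum on ties
            best_index, best_d = i, d
    return best_index, best_d


# Toy two-dimensional codebook (illustrative values, not trained LSF data).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index, d_min = full_search([0.9, 1.2], codebook)
```

Every codevector is visited once, so a codebook of $N$ entries costs $N$ full distortion evaluations per input vector; this is the baseline against which the fast-search methods are measured.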
The index i thus derived constitutes the VQ representation of x. Speech spectrum quantization typically requires of the order of 24 to 26 bits per frame [3]. Because of the large computational load and memory required for direct implementation at this rate, structured or product-code vector quantizers have been introduced [5]. Of these, multistage and split vector quantizers have found the greatest applicability. The multistage quantizer is a suboptimal approach which uses the summation of a set of smaller codebooks. The split
vector quantizer encodes the 10-dimensional vector using smaller subvectors, typically of dimensions 3 + 3 + 4.

In any vector quantizer, the computational complexities of the encoder and decoder are highly asymmetrical; the complexity of the encoder is substantially greater than that of the decoder. The simplistic exhaustive-search method is guaranteed to find a global minimum, given the current codebook. However, real-time constraints usually dictate a maximum encoding delay. Even non-real-time applications such as voice store-and-forward mail benefit from a reduction in the computational complexity, as it permits either a lower encoding time on any given hardware or the use of less expensive hardware without degrading performance.

Although it is axiomatic that a vector quantizer will require substantially greater computational resources than a scalar quantizer, a number of simplifications may be made to the VQ search algorithm to reduce the complexity. These fall into the following categories:

Implementation-specific simplifications. These include conventional techniques such as loop unrolling and precalculation of constants. This approach is well documented in the general literature and is not specific to the vector quantization problem and, as such, is not considered further here.

Partial-search techniques. This approach aims to eliminate certain codevectors from the search before the full distortion is calculated. The computational reduction attainable depends upon how early the codevector may be eliminated as a candidate in the current search.

Algebraic simplifications. This approach uses a mathematical simplification of the distortion metric to both precalculate certain constants and apply preliminary tests to remove certain codevectors from candidature.

Geometric considerations. By considering the search process in $\mathbb{R}^k$, a number of simplifications based on geometric arguments may be made by considering the $k$-dimensional volume enclosed by all admissible codevectors. This method is demonstrated here to be the most powerful of all, resulting in a reduction in the search time of an order of magnitude for the spectrum encoder.

The exact reduction attainable (if indeed any reduction is possible) is somewhat dependent upon the source characteristics; hence the motivation for the present study in connection with speech spectrum quantization. The majority of the fast algorithms published have been examined in the context of Gauss–Markov sources (to gauge a theoretical performance bound), or image data sources (as the most common application area of VQ).

The partial-search and algebraic techniques require at least one scalar comparison to be performed on each vector of the codebook, and hence the upper bound on the reduction in complexity is limited by the codebook size. The geometric methods, however, eliminate most codevectors in the codebook without any computation and thus form an extremely powerful approach, with the potential for substantial performance improvements.

Many fast-search algorithms, and variations thereof, have been proposed in the recent past [6–13]. Essentially, these may be classified into suboptimal
methods and full-search equivalent methods, the notion of optimality being applied to the average distortion of the encoder as compared to a full (or exhaustive) search. Because of the exacting nature of the encoding problem under consideration, attention has been limited to full-search equivalent methods.

The Euclidean distance metric

$$d_2(x, c_i) = \lVert x - c_i \rVert^2 = \sum_{j=1}^{k} (x_j - c_{ij})^2, \qquad i = 1, 2, \ldots, L, \tag{7}$$

where $x$ is the source vector and $\mathcal{C} = \{c_i\}_{i=1}^{L}$ represents the set of codebook vectors, is in common use. One obvious simplification is to use the $L_1$ (“city block”) distortion criterion, in order to eliminate the squaring operation on each vector component:

$$d_1(x, c_i) = \lVert x - c_i \rVert_1 = \sum_{j=1}^{k} \lvert x_j - c_{ij} \rvert, \qquad i = 1, 2, \ldots, L. \tag{8}$$
This criterion has been found to give substantially inferior performance when used to search a codebook of speech LSF vectors.

In order to properly evaluate the fast-search algorithms, a criterion for evaluation of the performance gain is needed. Traditionally, this has been the number of multiplication operations required, as this was normally the slowest processor operation (usually by at least an order of magnitude). Newer processor architectures using the reduced instruction set computer (RISC) and digital signal processor (DSP) philosophies generally implement all instructions in one processor clock cycle, thus rendering the “number-of-multiplications” criterion obsolete. Considering this, the results reported in the remainder of this paper are in terms of the number of vector comparisons. Thus a codebook of size $L$ requires $L$ vector comparisons, and a successful fast-search technique will have $R$ vector comparisons with $R \ll L$.
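The reason the city-block criterion is not full-search equivalent can be seen from a two-entry example in which the two metrics select different codevectors. The sketch below is illustrative only: the codebook values are contrived, the names `d2`, `d1`, and `nearest` are assumptions, and nothing is implied about the size of the degradation on real LSF data.

```python
def d2(x, c):
    """Squared Euclidean distortion, Eq. (7)."""
    return sum((xj - cj) ** 2 for xj, cj in zip(x, c))


def d1(x, c):
    """City-block (L1) distortion, Eq. (8)."""
    return sum(abs(xj - cj) for xj, cj in zip(x, c))


def nearest(x, codebook, metric):
    """Index of the codevector with the smallest distortion under `metric`."""
    return min(range(len(codebook)), key=lambda i: metric(x, codebook[i]))


# Contrived example: d2 gives 9 and 8, whereas d1 gives 3 and 4.
codebook = [[3.0, 0.0], [2.0, 2.0]]
x = [0.0, 0.0]
choice_l2 = nearest(x, codebook, d2)
choice_l1 = nearest(x, codebook, d1)
```

Here the Euclidean search selects the second codevector and the city-block search the first, so the $L_1$ search returns a codevector with larger squared error.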
2. PARTIAL DISTORTION SEARCH

The partial distortion search was originally put forward by Bei and Gray [9] and enhanced by Paliwal and Ramasubramanian [8]. This method simply terminates the distortion calculation prematurely if the accumulated distortion is greater than the lowest found in the codebook thus far, because in that case the codevector under consideration cannot possibly be a candidate for the minimum global distortion. Given the current trends in processor architecture (Section 1), the validity of this method must be examined carefully. This is because the method effectively trades off additional arithmetic operations (subtraction, addition, and multiplication) for an additional comparison during the calculation of each component.
If it is assumed that the processor time required for an operation is the same for every operation type (addition, subtraction, multiplication, or comparison), then a full search over a vector of dimension $k$ requires $k$ subtractions, $k$ additions, and $k$ multiplications, for a total of $3k$ operations. Let the partial distortion calculation terminate after $R$ iterations, where $R \le k$. The partial distortion method requires an additional comparison at each stage, for a total of $4R$ operations. For the method to be of any benefit, it is required that

$$4R \le 3k \quad \therefore \quad R \le \tfrac{3}{4}\, k, \tag{9}$$
where equality holds when there is no benefit of the partial search over a full search. For vectors of small dimension, such as those found in the split vector quantizer considered here, the value of $k$ may be only 3 or 4, and hence the likely benefit is restricted. For the split vector quantizer with subvectors of dimension 3, 3, and 4 and a codebook size of 256 per subvector, the computational load associated with this method was found to be reduced by a factor of approximately two. These results are summarized in Table 1, where a full-search comparison would rate as 256 vector comparisons per stage.

In real-time applications, not only the average number of comparisons but also the worst-case number of comparisons must be considered. To this end, Fig. 1 shows the distribution of the number of vector comparisons over a test database of 32,768 vectors (outside the training sequence). This indicates that the computational load associated with this method is reduced on average by a factor of two and is always less than 75% of the full-search method. Note that this method does not require any additional storage, apart from the small additional program space taken up by the additional comparison.
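The early-termination idea and the operation count behind Eq. (9) may be sketched as follows. The unit-cost model (four operations per component processed) follows the assumption stated above; the function names, the operation counter, and the toy codebook are hypothetical, and the sketch does not reproduce the vector-comparison measure used for Table 1.

```python
def partial_distortion_search(x, codebook):
    """Nearest-codevector search that abandons a codevector as soon as its
    running squared error reaches the best distortion found so far.

    Returns (index, distortion, ops), where ops charges 4 unit-cost
    operations (subtract, multiply, add, compare) per component processed.
    """
    best_index, best_d, ops = -1, float("inf"), 0
    for i, c in enumerate(codebook):
        acc = 0.0
        abandoned = False
        for xj, cj in zip(x, c):
            diff = xj - cj
            acc += diff * diff
            ops += 4
            if acc >= best_d:          # cannot beat the current best: stop early
                abandoned = True
                break
        if not abandoned:
            best_index, best_d = i, acc
    return best_index, best_d, ops


def max_useful_terms(k):
    """Largest integer R satisfying 4R <= 3k, from Eq. (9)."""
    return (3 * k) // 4


codebook = [[1.0, 1.0, 1.0, 2.0], [5.0, 5.0, 5.0, 5.0], [0.0, 0.0, 0.0, 0.0]]
index, d_min, ops = partial_distortion_search([1.0, 1.0, 1.0, 1.0], codebook)
full_search_ops = 3 * 4 * len(codebook)   # 3k operations per codevector, k = 4
```

In this contrived case the second and third codevectors are abandoned after one component each, giving 24 operations against 36 for the full search. For $k = 3$ and $k = 4$ the break-even values are $R = 2$ and $R = 3$, which illustrates why the benefit is limited for short subvectors.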
3. ALGEBRAIC EXPANSION METHODS

The so-called “algebraic” methods have been introduced to the generalized VQ encoding problem by Torres and Huguet [11]. The approach is simply to expand the Euclidean metric equation and simplify algebraically. The method is applicable to speech LSF quantization, because it requires $c_{ij} \ge 0 \;\; \forall\, i, j$. Two
TABLE 1
Average Number of Vector Comparisons Using the Partial Distortion Algorithm

| | Split 1 | Split 2 | Split 3 |
|---|---|---|---|
| Partial search | 126 | 122 | 101 |
| Full search | 256 | 256 | 256 |
FIG. 1. Histograms of the number of comparisons required for a partial-distortion search in a 256 vector per stage split vector quantizer.
variations were proposed, the “single-test” algorithm and the “double-test” algorithm. For the split vector quantizer, neither of the algebraic algorithms yielded any computational savings. This is due to:
(i) The utilization of a metric which assumes the computational burden for comparisons is equal to that for arithmetic operations.
(ii) The small size of the codevectors in split VQ.
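The kind of expansion on which such methods rest can be sketched using the identity $\lVert x - c_i \rVert^2 = \lVert x \rVert^2 - 2\,(x^T c_i - \tfrac{1}{2}\lVert c_i \rVert^2)$, in which $\lVert x \rVert^2$ is common to all codevectors and $\tfrac{1}{2}\lVert c_i \rVert^2$ can be precomputed. The code below shows only this textbook rearrangement; it is not the single-test or double-test algorithm of [11], whose elimination tests are not reproduced here, and all names are illustrative.

```python
def precompute_half_energies(codebook):
    """Offline step: 0.5 * ||c_i||^2 for every codevector."""
    return [0.5 * sum(cj * cj for cj in c) for c in codebook]


def expanded_search(x, codebook, half_energies):
    """Minimize ||x - c_i||^2 by maximizing x . c_i - 0.5 * ||c_i||^2."""
    best_index, best_score = 0, None
    for i, c in enumerate(codebook):
        score = sum(xj * cj for xj, cj in zip(x, c)) - half_energies[i]
        if best_score is None or score > best_score:
            best_index, best_score = i, score
    return best_index


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]   # toy non-negative codevectors
half_energies = precompute_half_energies(codebook)
index = expanded_search([0.9, 1.2], codebook, half_energies)
```

The per-codevector cost becomes $k$ multiplications and $k$ additions, which removes the subtractions but still touches every component; this is consistent with the observation above that little is gained when $k$ is only 3 or 4.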
4. GEOMETRICAL METHODS

The three fast, full-search equivalent algorithms presented by Huang et al. [6] have been examined for their performance in both split- and multistage-VQ applications. In the remainder of this section, Algorithms I, II, and III refer to the designation of [6]. A simple modification to Algorithm II of [6] is also suggested, which yields improved performance for LSF data. These methods utilize precomputed distance tables, but the experimental results obtained have confirmed that quite substantial reductions in computational requirements may be achieved. The additional memory requirements are not unreasonable, given current DSP memory availability. The methods are based on the triangle inequality, which for a test vector $x$
and codebook vectors $c_i$ and $c_j$ states that

$$d(c_i, c_j) \le d(x, c_i) + d(x, c_j). \tag{10}$$
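The method requires a precomputed table of all the distances $d(c_i, c_j)$ in the codebook $\mathcal{C}$, requiring $L(L-1)/2$ entries. From (10), any codevector $c_j$ with $d(c_i, c_j) > 2\,d(x, c_i)$ satisfies $d(x, c_j) > d(x, c_i)$ and so cannot be the best match; the search is thereby confined to a hypersphere about $c_i$. A good choice of the initial starting vector $c_i$ will minimize the size of the search hypersphere. The original paper [6] suggests that a good basis for this choice is the norm of each vector

$$r_i = \lVert c_i \rVert \tag{11}$$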
and shows that it is an appropriate choice for image quantization. The closest $r_i$ to $r_x = \lVert x \rVert$ in the table is chosen as the initial $c_i$ for the fast search.

Algorithm II of [6] does not require a distance matrix of all $d(c_i, c_j)$, only a set of $r_i$ computed as in Eq. (11). Given an input vector $x$ and a starting point $c_i$, the best match $c_k$ in the codebook must satisfy

$$r_x - h_i \le r_k \le r_x + h_i, \tag{12}$$
where $h_i$ is the distance from $x$ to $c_i$. This may be visualized in two dimensions as a region bounding $\lVert x \rVert \pm h_i$. Again, the initial codevector $c_i$ is chosen as the codevector whose $r_i$ is closest to $r_x$. Algorithm III of [6] is simply a combination of Algorithms I and II: a codevector is searched only if it passes both tests, so that the search region is the intersection of the search regions for Algorithms I and II.
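The two elimination tests can be sketched as below. This is a simplified illustration written from Eqs. (10) to (12), not the procedure of [6]: it applies one scalar test per codevector rather than exploiting sorted tables, it stores the full $L \times L$ distance matrix rather than $L(L-1)/2$ entries, and all function names and data are assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def norm(a):
    """Euclidean norm, as in Eq. (11)."""
    return math.sqrt(sum(u * u for u in a))


def closest_norm_index(x, norms):
    """Starting codevector: the one whose norm r_i is closest to r_x."""
    r_x = norm(x)
    return min(range(len(norms)), key=lambda i: abs(norms[i] - r_x))


def norm_pruned_search(x, codebook, norms):
    """Algorithm II-style test: skip c_k unless r_x - h <= r_k <= r_x + h."""
    r_x = norm(x)
    start = closest_norm_index(x, norms)
    best, h, evaluations = start, dist(x, codebook[start]), 1
    for k in range(len(codebook)):
        if k == start or abs(norms[k] - r_x) > h:
            continue
        d = dist(x, codebook[k])
        evaluations += 1
        if d < h:
            best, h = k, d
    return best, h, evaluations


def triangle_pruned_search(x, codebook, table, start):
    """Algorithm I-style test: skip c_j when d(c_best, c_j) >= 2h, by Eq. (10)."""
    best, h, evaluations = start, dist(x, codebook[start]), 1
    for j in range(len(codebook)):
        if j == start or table[best][j] >= 2.0 * h:
            continue
        d = dist(x, codebook[j])
        evaluations += 1
        if d < h:
            best, h = j, d
    return best, h, evaluations


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
norms = [norm(c) for c in codebook]                               # offline table
table = [[dist(ci, cj) for cj in codebook] for ci in codebook]    # offline table
x = [0.9, 0.1]
result_ii = norm_pruned_search(x, codebook, norms)
result_i = triangle_pruned_search(x, codebook, table, closest_norm_index(x, norms))
```

Both searches return the same codevector as an exhaustive search, because a codevector is skipped only when the triangle inequality guarantees that it cannot be strictly closer than the current best. In this toy case the norm test leaves two distance evaluations and the distance-table test one, against five for the full search.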
5. PERFORMANCE BOUNDS OF THE GEOMETRICAL ALGORITHMS

Figure 2 compares the implementation of the three geometric algorithms for a Gauss–Markov source with correlation coefficient of 0.9 for vector dimensions $P$ from 2 to 10 and a fixed codebook size of $L = 1024$ codevectors. The following observations are made with respect to Fig. 2:
(i) Substantial performance gains are possible for vector dimensions of four or less, using any of Algorithms I, II, or III.
(ii) Algorithm I (requiring a precomputed distance matrix of size $L(L-1)/2$) reduces the number of computations substantially for all vector dimensions up to $P = 10$.
(iii) Algorithm II (requiring a precomputed magnitude table of size $L$) reduces the number of computations moderately, but only for vector dimensions up to $P = 5$.
(iv) Algorithm III (requiring both a precomputed magnitude table of size $L$ and a precomputed distance matrix of size $L(L-1)/2$) provides marginally better performance than Algorithm I.

Note that Fig. 2 implies that Algorithm II requires more vector comparisons than the full search for vector dimensions greater than about five. It is
FIG. 2. Fast vector quantization using a Gauss–Markov source; Comparison of Huang et al. algorithms using a 1024-entry codebook.
hypothesized that this is due to a poor initial choice of ci, resulting in the search disk (hypersphere) of Algorithm I being diametrically opposed to the final codevector. The initial choice, using Eq. (11), may define any point on a circle in two dimensions, or a hypersphere in higher-dimensional space. Naturally, the number of comparisons could be limited to a maximum of the full-search case by the simple expedient of storing a flag for each codevector to indicate whether or not the distance has been computed in the current search iteration. This result does, however, suggest that Algorithm I alone would not provide any performance gain for speech LSF data of dimension P ⫽ 10 (for multistage VQ), but it would be an appropriate choice for lower P (for split VQ). The observed behaviour of the algorithm for a Gauss–Markov source motivates further evaluation for the quantization of spectral parameters, because a first-order Gauss–Markov model is an imperfect approximation for the evolution of the speech spectrum. This is considered in the following section.
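The flagging expedient mentioned above can be illustrated with a hypothetical re-centring search loop, in which the search region is rebuilt around each new best codevector and previously examined codevectors may therefore be revisited. The loop structure is an assumption made for illustration and is not claimed to be the procedure of [6]; the dictionary `computed` plays the role of the per-codevector flag.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def recentring_search(x, codebook, table, start, use_flags=True):
    """Triangle-inequality search that restarts around each new best match.

    Returns (index, distance, evaluations). With use_flags=True a distance
    already computed in this search is reused instead of being recomputed.
    """
    computed = {}          # index -> distance: the "flag" for each codevector
    evaluations = 0

    def distance_to(j):
        nonlocal evaluations
        if use_flags and j in computed:
            return computed[j]
        evaluations += 1
        computed[j] = dist(x, codebook[j])
        return computed[j]

    best = start
    h = distance_to(best)
    improved = True
    while improved:
        improved = False
        for j in range(len(codebook)):
            if j == best or table[best][j] >= 2.0 * h:
                continue                   # eliminated by Eq. (10)
            d = distance_to(j)
            if d < h:
                best, h = j, d
                improved = True
                break                      # re-centre on the new best match
    return best, h, evaluations


# Contrived worst case: a poor starting point far from the input vector.
codebook = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
table = [[dist(ci, cj) for cj in codebook] for ci in codebook]
x = [4.2, 0.0]
flagged = recentring_search(x, codebook, table, start=0, use_flags=True)
unflagged = recentring_search(x, codebook, table, start=0, use_flags=False)
```

Both variants return the same codevector, but without the flags the number of distance evaluations in this example exceeds the codebook size, whereas with the flags it cannot exceed it.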
6. FAST SEARCH VQ FOR SPEECH DATA

In this section, application of the fast-search algorithms of Huang et al. [6] is considered for the speech quantization problem for both split and multistage vector quantizers.
TABLE 2
Comparison of Fast-Search Algorithms for Multistage Vector Quantization

| Full search | II | IIA | III |
|---|---|---|---|
| 4096 | 2151 | 2060 | 348 |
6.1. Comparison of Algorithms

Table 2 compares the fast-search Algorithms II and III for multistage VQ. Only the first stage of a two-stage, 12-bit per stage codebook is shown. Due to the high interframe correlation of speech, the use of an alternate measure for determining the initial starting point $c_i$ is proposed, which is referred to as Algorithm IIA. The starting point is chosen based on the norm of the previous frame’s codevector,

$$r_x = \lVert \hat{x}_{t-1} \rVert. \tag{13}$$
It is seen that this method (IIA) provides an improvement over method II. Method III provides by far the greatest reduction in the number of vector comparisons (less than one-tenth that of the full-search), but it requires the
FIG. 3. Number of vector comparisons for split vector quantization using fast-search Algorithm III.
distance table to be prestored. For a 4K codebook this is not feasible, requiring approximately $8.4 \times 10^6$ precomputed values.
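The storage figure quoted above follows from the $L(L-1)/2$ table size, and the starting-point rule of Eq. (13) is a small change to the norm-based choice. The sketch below checks the arithmetic and shows one possible reading of Algorithm IIA; the function names and the toy codebook are assumptions, and the comparison with three 256-entry subcodebooks is added here only for scale.

```python
import math


def distance_table_entries(codebook_size):
    """Number of distinct d(c_i, c_j) pairs: L(L - 1)/2."""
    return codebook_size * (codebook_size - 1) // 2


def norm(v):
    return math.sqrt(sum(u * u for u in v))


def initial_index_iia(previous_codevector, norms):
    """Eq. (13): start from the codevector whose norm is closest to the
    norm of the previous frame's quantized vector (illustrative reading)."""
    r_prev = norm(previous_codevector)
    return min(range(len(norms)), key=lambda i: abs(norms[i] - r_prev))


multistage_entries = distance_table_entries(4096)    # one 12-bit stage
split_entries = 3 * distance_table_entries(256)      # three 8-bit subcodebooks

codebook = [[0.1, 0.2], [0.3, 0.5], [0.6, 0.9]]      # toy values only
norms = [norm(c) for c in codebook]
start = initial_index_iia(codebook[1], norms)        # previous frame chose entry 1
```

A 4096-entry codebook needs 8,386,560 table entries, in agreement with the approximate figure of $8.4 \times 10^6$, while three 256-entry subcodebooks need 97,920 in total.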
6.2. Comparison of Codebook Structures

The fast search algorithms are now compared for the split, multistage, and split-differential quantizers. Figure 3 shows the histogram of the number of comparisons required for each subcodebook for a split vector quantizer, with a vector split of 3-3-4 subvectors, each with a codebook size of 256 entries. The rate of the codebook is thus 24 bits per frame. It is evident that each of the split subvectors provides a substantial reduction in the number of comparisons required, from the full-search case of 256 comparisons. For the first two subvectors, the number of comparisons is nearly always less than 64, meaning that the complexity (or equivalently, time required) is reduced by a factor of four.

For the case of the split-vector quantization of interframe LSF differences, the fast-search algorithm exhibits essentially similar behaviour as shown in Fig. 4. In this case, the performance of the last subvector is consistently worse than either of the previous subvectors or the corresponding subvector for direct vector quantization. It is suggested that this may be due to the unstructured distribution of the codevectors in $\mathbb{R}^k$, when compared to the highly structured
FIG. 4. Number of vector comparisons for frame-differential split vector quantization using fast-search Algorithm III.
distribution observed in the LSF components themselves. Notwithstanding this, the number of vector comparisons is still reduced by a factor of two in nearly all cases.

Figure 5 shows the experimental results obtained for a multistage vector quantizer, containing three stages of 256 codevectors each. Note that the scale on the histogram of Fig. 5 has been extended beyond the size of the codebook. Evidently, the first stage yields a modest reduction in the number of comparisons required when compared to the full-search equivalent requirement of 256. However, some codevectors require a larger number of comparisons as compared to the full-search case. This is especially so for the second and third stages. This phenomenon was observed in Section 5, where a hypothesis to account for this inferior performance was put forward.

Table 3 compares the average number of comparisons required for split VQ, differential split VQ, and multistage VQ, where it is seen that a reduction of approximately one order of magnitude is achievable for split VQ. It is concluded that the fast search Algorithm III described here is well suited to split vector quantization of speech spectrum parameters, resulting in a reduction by a factor of approximately 10 in computational complexity. Differential split quantization also benefits somewhat, with the average search time reduced by a factor of the order of 5 to 10. Multistage VQ exhibits the “re-searching” phenomenon
FIG. 5. Number of vector comparisons for multistage vector quantization using fast-search Algorithm III.
TABLE 3
Comparison of Average Fast Search Results (Number of Vector Comparisons)

| Method | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|
| Split vector quantization | 20 | 19 | 35 |
| Differential split VQ | 29 | 25 | 48 |
| Multistage VQ | 95 | 291 | 306 |
| Full-search product-code VQ | 256 | 256 | 256 |
observed in the higher dimension Gauss–Markov case and, as such, provides only a modest complexity reduction, and then only in the first stage. Subsequent stages in a multistage VQ do not benefit from the fast-search procedure.
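The reduction factors quoted in this section can be checked directly from the averages of Table 3. The short calculation below assumes that each column of the table is one stage (or split) and that the per-frame cost is the sum over the three stages; the variable names are illustrative.

```python
FULL_SEARCH = 256   # vector comparisons per stage for an exhaustive search

# Average vector comparisons per stage, as read from Table 3.
average_comparisons = {
    "split VQ": [20, 19, 35],
    "differential split VQ": [29, 25, 48],
    "multistage VQ": [95, 291, 306],
}

# Reduction factor for each stage, and for the whole frame (all three stages).
per_stage = {
    name: [FULL_SEARCH / n for n in counts]
    for name, counts in average_comparisons.items()
}
per_frame = {
    name: 3 * FULL_SEARCH / sum(counts)
    for name, counts in average_comparisons.items()
}
```

On these figures the per-frame reduction is about 10.4 for split VQ and about 7.5 for differential split VQ, whereas for multistage VQ it is only about 1.1, with the second and third stages falling below unity (more comparisons than the full search).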
7. CONCLUSIONS

The product-code vector quantization techniques are able to reduce complexity and storage requirements of the 24-bit per frame spectral quantizer to manageable proportions. However, further dramatic reductions in computational complexity are possible, as has been demonstrated in this paper. As the fast-search algorithms reported in the literature are somewhat data dependent, it has been necessary to carefully evaluate each method for the problem at hand. A total of six methods have been compared: partial distortion search, single-test algebraic, double-test algebraic, and three geometric algorithms. Four of these methods are useful for vector quantization of speech spectral parameters, with the algebraic methods proving to be unsuccessful. Of the methods tested, the geometric approach yields the greatest reduction in computational complexity for both split and multistage vector quantization. However, the additional memory requirements preclude the use of this method for multistage vector quantization in practice. Spectral compression using split vector quantization has been demonstrated to be well suited to the geometric fast-search method, resulting in a reduction in complexity of an order of magnitude.
REFERENCES

1. Itakura, F. Line spectral representation of linear predictive coefficients of speech signals. J. Acoust. Soc. Am. 57, No. S35 (1975).
2. Soong, F. K., and Juang, B. H. Optimal quantization of LSP parameters. IEEE Trans. Speech Audio Process. 1, No. 1 (1993), 15–24.
3. Paliwal, K. K., and Atal, B. S. Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Process. 1, No. 1 (1993), 3–14.
4. Gray, R. M. Vector quantization. IEEE Acoustics, Speech and Signal Process. Magazine (1984), 4–28.
5. Gersho, A., and Gray, R. M. Vector Quantization and Signal Compression. Kluwer Academic, Dordrecht/Norwell, MA (1992).
6. Huang, C. M., Bi, Q., Stiles, G. S., and Harris, R. W. Fast full search equivalent encoding algorithms for image compression using vector quantization. IEEE Trans. Image Process. 1, No. 3 (1992), 413–416.
7. Soleymani, M. R., and Morgera, S. D. A fast MMSE encoding technique for vector quantization. IEEE Trans. Commun. 37, No. 6 (1989), 656–659.
8. Paliwal, K. K., and Ramasubramanian, V. Effect of ordering the codebook on the efficiency of the partial distance search algorithm for vector quantization. IEEE Trans. Commun. 37, No. 5 (1989), 538–540.
9. Bei, C. D., and Gray, R. M. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. COM-33, No. 10 (1985), 1132–1133.
10. Lo, K. T., and Cham, W. K. Subcodebook searching algorithm for efficient VQ encoding of images. IEE Proceedings-I 140, No. 5 (1993), 327–330.
11. Torres, L., and Huguet, J. An improvement on codebook search for vector quantization. IEEE Trans. Commun. 42, No. 2/3/4 (1994), 656–659.
12. Soleymani, M. R., and Morgera, S. D. An efficient nearest neighbour search method. IEEE Trans. Commun. COM-35, No. 6 (1987), 677–679.
13. Tai, S. C., Lai, C. C., and Lin, Y. C. Two fast nearest neighbour searching algorithms for image vector quantization. IEEE Trans. Commun. 44, No. 12 (1996), 1623–1628.
JOHN LEIS received the Bachelor of Electrical Engineering from the University of Southern Queensland in 1988 and the Master of Engineering Science and Doctor of Philosophy from Queensland University of Technology in 1991 and 1999, respectively, with research concentrating in the areas of low bit rate speech encoding and its interaction with automatic speaker identification. He is currently a lecturer at the University of Southern Queensland and has presented several papers in the areas of speech modelling, speech compression, speaker identification, and related areas.

SRIDHA SRIDHARAN obtained his B.Sc. (Electrical Engineering) from University of Ceylon, his M.Sc. (Communication Engineering) from the University of Manchester Institute of Science and Technology, United Kingdom, and his Ph.D. (Signal Processing) from the University of New South Wales, Australia. Dr. Sridharan is a senior member of the IEEE, USA, and a corporate member of IEE of the United Kingdom and IEAust of Australia. He is currently an associate professor in the School of Electrical and Electronic Systems Engineering of the Queensland University of Technology (QUT) and he is also the head of the Speech Research Laboratory at QUT.