Digital Signal Processing 14 (2004) 54–71 www.elsevier.com/locate/dsp
A cascade coding technique for compressing large-dynamic-range integer sequences

Lixue Wu (a), Adam Zielinski (b,*), and John S. Bird (a)

(a) School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
(b) Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 3P6, Canada

* Corresponding author. E-mail address: [email protected] (A. Zielinski).
Abstract

The compact representation of integers is an important consideration in areas such as data compression. In this paper a novel, simple, and effective approach for lossless compression of large-dynamic-range integer sequences is presented. It is shown that the proposed code is better than other codes for stepwise distributions. An optimization method for finding the optimal coding parameters is described; the method links the optimization to curve fitting. Because of the simplicity of the coding scheme, the data compression algorithms can be implemented using hardware circuitry to meet the requirements of real-time applications.

© 2003 Elsevier Inc. All rights reserved.
1. Introduction

Despite rapid progress in improving mass-storage density and digital communication system performance, compression of data representing sampled signals continues to be important in many engineering and research areas, since it can overcome data storage and transmission bandwidth limitations. Many coding techniques produce broadband residue (large-dynamic-range integer) sequences that must be retained for exact recovery of the original data. Examples include linear predictive coding (LPC) of speech, audio, image, and seismic waveforms [1–9]. Lossless (exact-recovery) compression methods such as Huffman and arithmetic coding [10] can be applied efficiently to residue sequences when the dynamic range of the residues to be encoded is small. However, these techniques are not efficient for the residue sequences of LPC [11], where the residue ranges are large, generally
of order 10^3. Therefore, considerable research effort has been devoted to improving Huffman and arithmetic coding for such cases [12–16].

In this paper a novel, simple, and effective approach for lossless compression of large-dynamic-range integer sequences is presented. This new coding scheme is called cascade coding. A method for finding the optimal coding parameters is also described; it recasts the optimization as a best curve fit in the sense of least weighted squared errors. With these techniques, a lossless data compression algorithm for large-dynamic-range integer sequences is developed. The cascade code can reach an efficiency of 95%. Applied in conjunction with LPC to sonar data, the algorithm offers a compression ratio between 2 and 3 [18]; because of the nature of sonar data, LPC with time-varying gain produces large-dynamic-range residue sequences (the terms efficiency and compression ratio are defined mathematically in Section 2). The compression algorithm can be implemented using hardware circuitry to meet the requirements of real-time applications.
2. Cascade coding

To approach the maximum compression ratio, a two-step process is normally involved: (1) making the samples statistically independent (reduction of redundancy); (2) making the average code word length (the average number of bits needed to code each sample) equal, or nearly equal, to the source entropy of the statistically independent samples (the term source entropy is defined mathematically below). The second step is the focus of this paper.

In this paper, it is assumed that the residue sequence consists of independent, nonzero, signed integer samples. Such an integer sequence may be obtained, for example, from LPC applied to various signals from analogue-to-digital converters [14,18]. The original data (integers) are operated upon according to a particular algorithm to produce the compressed data. Reversing the process, the compressed data are decompressed to reproduce the original data. The degree of data reduction obtained as a result of the compression process is known as the compression ratio. This ratio compares the quantity of compressed data to the quantity of original data and is given by [17]

    compression ratio = length of original data string (bits) / length of compressed data string (bits).    (1)

The efficiency of coding is defined as [10]

    efficiency = (source entropy / average code word length) × 100%,    (2)

where the source entropy is defined as the average of the base-2 logarithm of the inverse of the source symbol probabilities.

Huffman coding can be applied to compress such a sequence. However, its compression efficiency decreases as the dynamic range of the integers increases [11], because the Huffman coding tree, and the storage required for it, grows. To maintain high efficiency for large-dynamic-range integers, a novel, simple, and effective approach is proposed for lossless compression of integers. The new coding scheme is called cascade coding. The cascade code can reach an efficiency of 95%.
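To make Eqs. (1) and (2) concrete, the short Python sketch below computes the empirical source entropy of a symbol sequence and the resulting compression ratio and efficiency for a given average code word length. The function names and the example sequence are illustrative only and are not part of the original paper.

```python
from collections import Counter
from math import log2

def source_entropy(symbols):
    """Average of log2(1/p) over the empirical symbol probabilities (see Eq. (2))."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * log2(total / c) for c in counts.values())

def compression_ratio(original_bits, compressed_bits):
    """Eq. (1): ratio of original to compressed length in bits."""
    return original_bits / compressed_bits

def efficiency(entropy_bits, average_code_word_length):
    """Eq. (2): source entropy over average code word length, as a percentage."""
    return 100.0 * entropy_bits / average_code_word_length

# Illustrative use: a toy residue sequence coded at 4 bits/sample on average.
residues = [1, -1, 2, 1, -3, 1, 8, -1, 2, -2]
H = source_entropy(residues)
print(f"entropy = {H:.3f} bits/sample, efficiency at 4 bits/sample = {efficiency(H, 4):.1f}%")
```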
Fig. 1. Primary structure of a cascade coder. (a) Encoder. (b) Decoder.
The most straightforward coding of an integer x such that |x| ≤ K is natural binary coding. Here the number of bits required for each integer is ceil(log2 K) + 1, where ceil(·) rounds the argument up to the nearest integer. While simple and fast, natural binary coding is far from optimum, since the distribution of integer values is generally not uniform. The proposed cascade coding exploits these statistics: symbols that occur more frequently are encoded with fewer bits per symbol and the least frequent symbols with more bits per symbol, so that the average number of bits per symbol is reduced. Cascade encoders are realized as a sequence of small natural binary encoders, each one operating on the residue of the preceding natural binary encoder. Figures 1a and 1b show the primary structure of a cascade coder. Several encoding stages are involved in the encoding process. The general rules for cascade coding are as follows (a minimal sketch of an encoder and decoder built from these rules is given after the worked example below):

• Encoding
(1) The first bit of a code word is used to code the sign of the nonzero integer, with '0' for positive and '1' for negative.
(2) Each positive integer (or simply integer) is passed through a series of cascade encoding stages.
(3) The input of an encoding stage is an integer and the outputs are a binary code and an integer called a residue. The residue is the maximum of the input minus (2^b − 1) and 0, where b is the number of bits assigned to this stage. The binary code is the natural binary code of the input if the residue is zero; otherwise, it is all zeros of length b.
(4) The output residue is the input of the next encoding stage.
(5) All stages with nonzero input are called active, while stages with zero input are called inactive.
(6) An active stage with zero residue output turns all the following stages inactive.
(7) An inactive stage has no binary code output.
(8) A bit accumulator collects the binary code outputs from all active stages to form a binary sequence which represents the integer.

• Decoding
(1) A binary sequence is fed to a bit distributor, which is followed by a series of parallel decoding stages.
(2) The last active stage is identified by the presence of a nonzero binary code at that stage, and the following stages are turned inactive.
(3) The bit distributor partitions the binary sequence into binary codes and distributes them to every active stage.
(4) An active decoding stage decodes its binary input and outputs a residue, which is the integer representation of the binary input if it is nonzero and 2^b − 1 otherwise.
(5) An inactive stage has no residue output.
(6) The original integer is obtained by summing all residues.

As an illustration, Fig. 2 shows the cascade procedure for coding nonzero integers. The encoder and decoder have four stages; 3, 2, 2, and 2 bits are assigned to the first, second, third, and fourth stages, respectively. Figure 2a illustrates the encoding of the integer 8. The fourth stage in this example is inactive due to the zero residue output of the third stage. The first bit of the binary output at the first stage denotes that the encoded integer is positive. Figure 2b demonstrates the decoding of the integer 8. The fourth stage is inactive due to the nonzero binary input at the third stage. The first bit of the binary input at the first stage indicates that the encoded integer is positive.
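The following is a minimal Python sketch of the encoding and decoding rules above for a fixed bit allocation. It follows the paper's convention that the first stage's bit count includes the sign bit, but it does not implement the refinement used in Section 3, where the last stage may use all 2^{b_N} code words; the function names are illustrative, not from the original paper.

```python
def cascade_encode(x, bits):
    """Encode a nonzero signed integer using the cascade rules.

    bits[0] includes the sign bit, so the first stage carries bits[0]-1
    magnitude bits; later stages carry bits[i] magnitude bits each.
    """
    assert x != 0
    out = '0' if x > 0 else '1'                  # encoding rule (1): sign bit
    value = abs(x)
    widths = [bits[0] - 1] + list(bits[1:])
    for w in widths:                             # encoding rules (2)-(8)
        cap = (1 << w) - 1                       # largest value this stage can finish
        if value <= cap:
            out += format(value, f'0{w}b')       # natural binary code; stage is terminal
            return out
        out += '0' * w                           # all-zero code marks an oversize input
        value -= cap                             # residue passed to the next stage
    raise ValueError("integer exceeds the range of this bit allocation")

def cascade_decode(code, bits):
    """Invert cascade_encode for the same bit allocation."""
    sign = -1 if code[0] == '1' else 1           # decoding: sign bit
    pos, total = 1, 0
    widths = [bits[0] - 1] + list(bits[1:])
    for w in widths:
        field = code[pos:pos + w]
        pos += w
        if int(field, 2) != 0:                   # nonzero code word: last active stage
            return sign * (total + int(field, 2))
        total += (1 << w) - 1                    # all-zero code word: residue is 2^w - 1
    raise ValueError("no terminating (nonzero) code word found")

# Reproduce the Fig. 2 example: allocation {3, 2, 2, 2}, integer 8.
code = cascade_encode(8, [3, 2, 2, 2])
print(code, cascade_decode(code, [3, 2, 2, 2]))  # expected: 0000010 8
```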
3. Coding efficiency

To discuss the efficiency of cascade codes, assume that the distribution function of symbol probabilities is a symmetric stepwise function. In such a case only the right half of the distribution function, corresponding to positive integers, is of interest. Consequently, the sign bit in cascade codes is not considered in the discussion.
Fig. 2. The cascade procedure for coding nonzero integers. The encoder and decoder have four stages; 3, 2, 2, and 2 bits are assigned to the first, second, third, and fourth stages, respectively. (a) Encoding the integer 8. (b) Decoding the integer 8.
Furthermore, if each step height in the distribution function P(k) is a negative power of 2, the entropy of the source can be expressed as

    source entropy = −Σ_k P(k) log2 P(k) = Σ_{i=1}^{N} n_i a_i 2^{−a_i},    (3)

where 2^{−a_i} is the height of the ith step in the distribution function, n_i is the number of symbols contained in the ith step, and N is the total number of steps in the distribution function. Now assume a cascade encoder that consists of N stages. Each stage is assigned b_i bits such that 2^{b_i} − 1 ≥ n_i for i < N, and 2^{b_N} ≥ n_N. Such an encoder can encode all symbols with the distribution function discussed above. The average length of the encoded code words is given by

    average code word length = Σ_{i=1}^{N} n_i (Σ_{j=1}^{i} b_j) 2^{−a_i}.    (4)
If the bit allocation b_i is chosen such that

    Σ_{j=1}^{i} b_j = a_i,    (5)

the cascade codes can reach an efficiency of 100%. To ensure the normalization condition

    1 = Σ_k P(k) = Σ_{i=1}^{N} n_i 2^{−a_i} = Σ_{i=1}^{N} n_i 2^{−Σ_{j=1}^{i} b_j},    (6)

the bit allocation must also satisfy

    2^{b_i} − 1 = n_i,  for i < N,    (7)

and

    2^{b_N} = n_N.    (8)
This can be verified by substituting Eqs. (7) and (8) into Eq. (6).

Shown in Fig. 3 is an illustrative example of a cascade encoder that reaches an efficiency of 100%.

Fig. 3. A cascade encoder that reaches an efficiency of 100%. (a) Distribution function of symbol probabilities. (b) Bit allocation.

The entropy of the source is calculated from the given distribution function, that is,

    source entropy = 2 · 2^{−2} · 2 + 6 · 2^{−4} · 4 + 8 · 2^{−6} · 6 = 3.25.    (9)

The cascade encoder consists of three stages. The bit allocation of the encoder is {2, 2, 2}; excluding the sign bit in the first stage, the bit allocation is {1, 2, 2}. Such a bit allocation satisfies the conditions given by Eqs. (5), (7), and (8). The cascade code for each symbol can be obtained using the encoding rules described in Section 2. For convenience these cascade codes are tabulated in Table 1. The average length of the cascade codes is then calculated. In this example,

    average code word length = 2 · 2 · 0.25 + 6 · 4 · 0.0625 + 8 · 6 · 0.015625 = 3.25.    (10)
Therefore, this cascade encoder reaches an efficiency of 100%.
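As a quick sanity check, the short Python snippet below recomputes Eqs. (9) and (10) from the step parameters of Fig. 3 (n = {2, 6, 8}, a = {2, 4, 6}). It is only a numerical verification of the worked example, not part of the original derivation.

```python
# Step sizes n_i, code word lengths a_i = sum of b_j, and step heights 2^{-a_i} from Fig. 3.
n = [2, 6, 8]
a = [2, 4, 6]

entropy = sum(ni * (2 ** -ai) * ai for ni, ai in zip(n, a))       # Eq. (9)
avg_len = sum(ni * ai * (2 ** -ai) for ni, ai in zip(n, a))       # Eq. (10)

print(entropy, avg_len, f"efficiency = {100 * entropy / avg_len:.1f}%")   # 3.25 3.25 100.0%
```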
4. Optimization of coding parameters

The efficiency of cascade coding depends on the choice of bit allocations b_i, where b_i is the number of bits assigned to the ith stage (i.e., the bits required for encoding non-oversize residues at the ith stage). Knowing the occurrence probability of each integer value, the average bits per integer (ABPI, the average number of bits needed to represent an integer) in cascade coding can be expressed as

    n̄ = b_1 Σ_{k=1}^{2^{b_1−1}−1} Prob(integer = k)
        + (b_1 + b_2) Σ_{k=2^{b_1−1}}^{2^{b_1−1}+2^{b_2}−2} Prob(integer = k)
        + (b_1 + b_2 + b_3) Σ_{k=2^{b_1−1}+2^{b_2}−1}^{2^{b_1−1}+2^{b_2}+2^{b_3}−3} Prob(integer = k) + · · ·
        + b_1 Σ_{k=−(2^{b_1−1}−1)}^{−1} Prob(integer = k)
        + (b_1 + b_2) Σ_{k=−(2^{b_1−1}+2^{b_2}−2)}^{−2^{b_1−1}} Prob(integer = k)
        + (b_1 + b_2 + b_3) Σ_{k=−(2^{b_1−1}+2^{b_2}+2^{b_3}−3)}^{−(2^{b_1−1}+2^{b_2}−1)} Prob(integer = k) + · · · .    (11)
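Equation (11) is straightforward to evaluate numerically. The sketch below is one way to compute the ABPI for a signed-integer probability model and a candidate bit allocation, mirroring the stage boundaries of Eq. (11) (which cover magnitudes only up to the sum of 2^{w_i} − 1 over the stages); the function name and the toy sequence are illustrative only.

```python
from collections import Counter

def average_bits_per_integer(prob, bits):
    """Evaluate Eq. (11): the expected code word length (ABPI) for signed integers.

    `prob` maps each nonzero integer k to Prob(integer = k); bits[0] includes the
    sign bit, so the first stage covers magnitudes 1 .. 2**(bits[0]-1) - 1.
    """
    widths = [bits[0] - 1] + list(bits[1:])     # magnitude field width of each stage
    abpi = 0.0
    lower = 1                                   # smallest magnitude in the current region
    length = 1                                  # accumulated code word length (sign bit first)
    for w in widths:
        length += w
        upper = lower + (1 << w) - 2            # largest magnitude finished in this region
        for k in range(lower, upper + 1):
            abpi += length * (prob.get(k, 0.0) + prob.get(-k, 0.0))
        lower = upper + 1
    return abpi

# Illustrative use: empirical probabilities from a toy residue sequence, allocation {3, 2, 2}.
residues = [1, -1, 2, 1, -3, 1, 7, -1, 2, -2, 1, -1, 4, -5, 1, 1]
p = {k: c / len(residues) for k, c in Counter(residues).items()}
print(average_bits_per_integer(p, [3, 2, 2]))   # 3.5 bits per integer for this toy sequence
```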
Table 1
Cascade codes for different integer symbols

Integer symbol   Binary code   Symbol probability      Integer symbol   Binary code   Symbol probability
 1               00            0.25                    −1               10            0.25
 2               0001          0.0625                  −2               1001          0.0625
 3               0010          0.0625                  −3               1010          0.0625
 4               0011          0.0625                  −4               1011          0.0625
 5               000001        0.015625                −5               100001        0.015625
 6               000010        0.015625                −6               100010        0.015625
 7               000011        0.015625                −7               100011        0.015625
 8               000000        0.015625                −8               100000        0.015625
The ABPI can be minimized by selecting optimal values of the set of bit allocations {b_i}. One possible method is an exhaustive search for the set {b_i} that produces the minimum ABPI. Since the method is based on exhaustive trials, an impractical number of computations may be required: for a cascade encoder of N stages coding an integer sequence with maximum K, the number of trials can be up to [ceil(log2 K)]^N. The exhaustive search, however, yields an absolute minimum that can be used to verify the minimum obtained by other optimization algorithms. The exhaustive search also provides the ABPI for every possible bit allocation. For the integer data tested with the exhaustive search, many suboptimal bit allocations are found near the optimum. This indicates that cascade coding can provide stable performance even if the bit allocation is shifted from its optimum.

In order to obtain optimal coding parameters more efficiently, a new heuristic method called least squares is proposed here, whereby the above optimization problem is replaced by best fitting a curve to the distribution function of symbol occurrence probabilities, Prob(integer = k). In order to describe the method of least squares, a new objective function has to be derived. As discussed in Section 2, the first bit in the binary representation of a nonzero integer is used to indicate the sign of the integer; the cascade coding technique encodes the magnitude of the integer only. Therefore, it is assumed that all integers are positive integers k with occurrence probabilities P(k). Without loss of generality, it is further assumed that P(k − 1) ≥ P(k), where k ≤ K and K is the maximum integer.

As proved in Section 3, cascade codes are optimal when the distribution function of symbol occurrence probabilities is a symmetric stepwise function in which each step height is a negative power of 2. Assume a cascade encoder of N stages with each stage assigned b_i bits, i.e., a bit allocation {b_i} for i = 1, . . . , N, or a bit vector b = [b_1, b_2, . . . , b_N]. The optimal codes (integers with a certain distribution function that can be coded with 100% coding efficiency) associated with this encoder have a symmetric stepwise distribution function. The height of the ith step in the distribution function, corresponding to the symbol occurrence probabilities of the 2^{b_i} − 1 integers in the ith region, is equal to 2^{−t_i}, where t_i = Σ_{j=1}^{i} b_j is the code word length for encoding integers in the ith region. This distribution function is called the optimal stepwise function. The optimal stepwise function S(k) can then be written as
    S(k) = 2^{−t_1},  for 1 ≤ k ≤ 2^{b_1} − 1,
           2^{−t_2},  for 2^{b_1} ≤ k ≤ 2^{b_1} + 2^{b_2} − 2,
           2^{−t_3},  for 2^{b_1} + 2^{b_2} − 1 ≤ k ≤ 2^{b_1} + 2^{b_2} + 2^{b_3} − 3,
           ...
           2^{−t_N},  for Σ_{j=1}^{N−1} 2^{b_j} − (N − 2) ≤ k ≤ K ≤ Σ_{j=1}^{N} 2^{b_j} − N + 1.    (12)

Note that the symbol occurrence probability of the integer k is also given by S(k). From the definition of t_i, the code word length for encoding the integer k is

    c(k) = t_1,  for 1 ≤ k ≤ 2^{b_1} − 1,
           t_2,  for 2^{b_1} ≤ k ≤ 2^{b_1} + 2^{b_2} − 2,
           t_3,  for 2^{b_1} + 2^{b_2} − 1 ≤ k ≤ 2^{b_1} + 2^{b_2} + 2^{b_3} − 3,
           ...
           t_N,  for Σ_{j=1}^{N−1} 2^{b_j} − (N − 2) ≤ k ≤ K ≤ Σ_{j=1}^{N} 2^{b_j} − N + 1.    (13)

The coding efficiency for the above optimal codes can be calculated using Eq. (2). That is, since −log2 S(k) = c(k),

    efficiency = source entropy / average code word length
               = [−Σ_{k=1}^{K} S(k) log2 S(k)] / [Σ_{k=1}^{K} c(k)S(k)]
               = [Σ_{k=1}^{K} S(k)c(k)] / [Σ_{k=1}^{K} c(k)S(k)] = 100%.    (14)

Now assume that the actual integers to be coded have a distribution function P(k), for 1 ≤ k ≤ K, instead of the optimal stepwise function S(k). The coding efficiency for this case can be recalculated as

    efficiency = [−Σ_{k=1}^{K} P(k) log2 P(k)] / [Σ_{k=1}^{K} c(k)P(k)].    (15)

Under the realistic assumption that the deviations of the actual distribution function P(k) from the optimal stepwise function S(k) are on both sides (plus or minus) and can be arranged in pairs, it is shown that

    Σ_{k=1}^{K} c(k)P(k) = Σ_{k=1}^{K} c(k)S(k),    (16)

and

    −Σ_{k=1}^{K} P(k) log2 P(k) = −Σ_{k=1}^{K} S(k) log2 S(k) − Σ_{k=1}^{K} w(k) d^2(k),    (17)
where w(k) = (ln 2)^{−1} 2^{c(k)} and d(k) = P(k) − S(k) is the error between the actual distribution function and the optimal stepwise function (see Appendix A for details). Thus, Eq. (15) can be written as

    efficiency = [−Σ_{k=1}^{K} P(k) log2 P(k)] / [Σ_{k=1}^{K} c(k)P(k)]
               = [−Σ_{k=1}^{K} S(k) log2 S(k) − Σ_{k=1}^{K} w(k)d^2(k)] / [Σ_{k=1}^{K} c(k)S(k)]
               = 1 − Σ_{k=1}^{K} W(k)d^2(k),    (18)

where

    W(k) = w(k) / Σ_{i=1}^{K} c(i)S(i)    (19)

is a weighting function. It is clear that the reduction of the coding efficiency for a given bit allocation {b_i}, or b, is directly related to the sum of weighted squared errors E(b), given by

    E(b) = Σ_{k=1}^{K} W(k, b) d^2(k, b),    (20)

and E(b) = 0, or d^2(k, b) = 0, leads to a 100% efficiency, as for the optimal codes. Therefore, the sum of weighted squared errors E(b) is the new objective function for the method of least squares. In curve fitting, the method of least squares states that the best-fit curve of a given type is the curve with the minimal sum of squared deviations (least squared errors) from a given set of data. In cascade coding, the method of least squares states that the best bit allocation for the N-stage encoder is the one whose optimal stepwise curve (function) has the minimal sum of weighted squared deviations (least squared errors) from the given symbol occurrence distribution curve (function) of the data.

Note that the number of stages N is not a variable in the objective function. The optimization is carried out for every given N, starting from N = 1. Let n̄_opt,N be the minimum ABPI for an N-stage cascade encoder. The optimization is terminated when adding one more stage yields a larger optimal ABPI; that is,

    n̄_opt,N < n̄_opt,N+1    (21)

is the criterion for the termination of the optimization (see Appendix B for details). The bit allocations b_i are subject to the constraints b_i ≥ 1 for i = 1, . . . , N, and

    b_N = ceil[log2(K − Σ_{j=1}^{N−1} 2^{b_j} + N − 2 + 1)],    (22)

to ensure that b_N is large enough to code the integers left over by the preceding N − 1 stages. For steeply descending distribution functions, the initial bit allocation in an optimization algorithm may be set to b_1 = ceil(log2 K) and b_i = 1 for i = 2, . . . , N. An illustrative sketch of such an optimization loop is given below.
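The following Python sketch illustrates the search procedure for positive-integer probabilities P(k). For clarity it uses the exhaustive-search objective (the average code word length Σ_k c(k)P(k), with the regions of Eq. (13)) rather than the least-squares objective E(b) of Eq. (20), but it applies the last-stage constraint of Eq. (22) and the termination criterion of Eq. (21). All function names and the test distribution are illustrative assumptions, not the paper's implementation.

```python
from itertools import product
from math import ceil, log2

def stage_regions(bits):
    """Yield (t_i, low, high): code word length and magnitude region per stage (Eqs. (12)-(13)).
    Intermediate stages finish 2^b - 1 integers; the last stage may use all 2^b code words."""
    low, t = 1, 0
    for i, b in enumerate(bits):
        t += b
        count = (1 << b) if i == len(bits) - 1 else (1 << b) - 1
        high = low + count - 1
        yield t, low, high
        low = high + 1

def average_length(prob, bits):
    """Average code word length sum_k c(k) P(k) for positive-integer probabilities `prob`."""
    return sum(t * sum(prob.get(k, 0.0) for k in range(low, high + 1))
               for t, low, high in stage_regions(bits))

def last_stage_bits(K, head):
    """Eq. (22): bits needed in the final stage to cover the integers left over by `head`."""
    leftover = K - sum((1 << b) - 1 for b in head)
    return max(1, ceil(log2(leftover))) if leftover > 1 else 1

def optimize(prob, K, max_bits=None):
    """Exhaustive search over allocations for growing N, terminated via Eq. (21)."""
    max_bits = max_bits or ceil(log2(K))
    best_prev, best_alloc = float('inf'), None
    for n_stages in range(1, max_bits + 2):
        best_n, alloc_n = float('inf'), None
        for head in product(range(1, max_bits + 1), repeat=n_stages - 1):
            if sum((1 << b) - 1 for b in head) >= K:
                continue                          # head already covers the whole range
            bits = list(head) + [last_stage_bits(K, head)]
            cost = average_length(prob, bits)
            if cost < best_n:
                best_n, alloc_n = cost, bits
        if alloc_n is None or best_n >= best_prev:  # Eq. (21): adding a stage no longer helps
            return best_alloc, best_prev
        best_prev, best_alloc = best_n, alloc_n
    return best_alloc, best_prev

# Illustrative use: a geometric-like source over magnitudes 1..30.
P = {k: 0.15 * 0.85 ** (k - 1) for k in range(1, 31)}
print(optimize(P, K=30))
```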
5. Performance

To evaluate the performance of the cascade encoder we apply cascade coding to a sequence consisting of 1000 integers. The integers are obtained by quantizing a continuous Gaussian random sequence with zero mean and variance σ^2. The quantizer is defined by

    Q(x) = i,   for 0 ≤ i − 1 ≤ x < i,
          −i,   for −i < x ≤ −(i − 1) ≤ 0,    (23)

where x is a sample of the Gaussian random sequence and i is a positive integer. Integer sequences with different variances are generated. The optimal set of bit allocations for each sequence is determined using the proposed method, and the integers are cascade encoded using the optimal coding parameters. Coding efficiencies and the optimal bit allocations for the different integer variances are tabulated in Table 2 (a short sketch of this test-sequence generation is given after the table). It is seen that cascade coding can achieve an efficiency of about 95%. The bit allocations listed in Table 2 show the structure of the cascade coders: except for the first stage, the stages have an equal, or nearly equal, size (number of bits assigned to the stage). Therefore, these stages can be implemented recursively using only one stage, which reduces the complexity of the hardware circuitry if the compression algorithm is implemented in hardware. As the dynamic range of the integers increases, the coding efficiency decreases slightly.

To compare cascade coding with other coding methods, the generated integer sequences are also encoded using Huffman coding. A comparison of Huffman and cascade coding for different integer variances is shown in Fig. 4. Cascade coding is superior to Huffman coding even when the integer variance is small.

Table 2
Coding efficiencies and optimal bit allocations for different sample variances
σ^2     Range         Entropy  Bits/sample  Efficiency (%)  Bit allocation
2       {−4, 6}       2.583    2.688        96.11           2, 1, 1, 1, 1, 1
4       {−7, 7}       3.121    3.219        96.95           3, 1, 1, 1, 1
8       {−12, 11}     3.596    3.639        98.81           3, 1, 1, 1, 1, 1, 1, 1, 1, 1
16      {−16, 16}     4.036    4.195        96.2            4, 1, 1, 1, 1, 2, 1, 1
32      {−21, 21}     4.517    4.616        97.86           4, 2, 2, 1, 1, 2, 2
64      {−21, 28}     4.995    5.197        96.12           5, 2, 3, 1, 1, 1
128     {−33, 36}     5.457    5.742        95.04           5, 3, 3, 2, 1, 2
256     {−46, 60}     6.016    6.233        96.52           6, 3, 3, 1, 3, 3
512     {−72, 68}     6.429    7.023        91.54           7, 3, 1, 1
1024    {−123, 95}    6.87     7.3          94.11           7, 3, 2, 3, 3, 3, 3, 3, 3, 3, 1
2048    {−132, 137}   7.315    8.011        91.31           8, 3, 2
4096    {−185, 207}   7.801    8.647        90.21           8, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2
8192    {−266, 309}   8.185    9.111        89.83           9, 3, 3, 3, 3, 3, 3, 3, 3
16384   {−361, 458}   8.494    9.96         85.28           9, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
32768   {−530, 602}   8.816    10.07        87.5            10, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
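The sketch below is one plausible way to generate the test sequences described above: 1000 samples of a zero-mean Gaussian with variance σ^2, mapped to nonzero integers by the quantizer of Eq. (23). It uses NumPy for the random draws; the function names are illustrative and the exact random sequences of the original experiments are, of course, not reproducible.

```python
import numpy as np

def quantize(x):
    """Eq. (23): map a real sample to a nonzero integer."""
    return int(np.floor(x)) + 1 if x >= 0 else -(int(np.floor(-x)) + 1)

def test_sequence(variance, n=1000, seed=0):
    """n quantized samples of a zero-mean Gaussian with the given variance."""
    rng = np.random.default_rng(seed)
    return [quantize(x) for x in rng.normal(0.0, np.sqrt(variance), size=n)]

for var in (2, 64, 1024):
    seq = test_sequence(var)
    print(var, min(seq), max(seq))   # dynamic range grows roughly with the standard deviation
```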
Fig. 4. Comparison between Huffman coding and cascade coding for different integer variances. (a) Efficiency. (b) Average number of bits per integer.
Cascade coding is also beneficial because of its simplicity. It is much faster than other available coding methods except natural binary coding. The data compression algorithm can be implemented using hardware circuitry to meet the requirements of real-time applications. For variable-length integer coding, each scheme is best suited to a particular distribution. Cascade coding is more appropriate than other compression methods when the distribution of symbol probabilities is close to a stepwise function. Since the stepwise function has many more parameters to vary than common parametric distributions, cascade coding is suitable for a wide range of data with various distribution functions. Furthermore, only a few bits are needed to transmit the side information (the bit allocation), so cascade coding can still maintain high efficiency when the total number of symbols is large. The fast optimization algorithm for finding the optimal coding parameters and the simplicity of the coding scheme make it possible to adapt the coding parameters during transmission. For stringent applications, where transmission delay is not acceptable, a training technique can be used: past data samples are used to find the optimal coding parameters for the current data samples.
6. Conclusions This work has contributed two advances to lossless large-dynamic-range integer sequence coding: (1) a new coding scheme, called cascade coding, for efficiently coding integers with large dynamic range, and (2) an optimization technique, called least squares, for quickly finding the optimal set of coding parameters. These techniques result in coding efficiency gains and significant computational improvements in the implementation of data compression.
Acknowledgment This work was supported by the Natural Sciences and Engineering Research Council of Canada.
Appendix A

Assume a cascade encoder with each stage assigned b_i bits, i.e., a bit allocation {b_i}. The optimal codes associated with this encoder have a symmetric stepwise distribution function. The height of the ith step in the distribution function, corresponding to the symbol occurrence probabilities of the 2^{b_i} − 1 integers in the ith region, is equal to 2^{−t}, where t = Σ_{j=1}^{i} b_j is the code word length for encoding integers in the ith region. This distribution function is called the optimal stepwise function, illustrated by the dashed line in Fig. 5.

Now assume that the actual distribution function of the 2^{b_i} − 1 integers in the ith region deviates from the optimal stepwise function associated with the bit allocation {b_i}, as
Fig. 5. An illustration of the actual distribution function (solid line) and the optimal stepwise function (dashed line) for integers in the ith region.
indicated by the solid line in Fig. 5. Further assume that the deviations are symmetric about the optimal stepwise function (this assumption will be removed later). That is, if P(m) = 2^{−t} + d, then there exists a unique n in the ith region such that P(n) = 2^{−t} − d, where P(m) and P(n) are the symbol occurrence probabilities of the integers m and n, respectively, and d is the error between the symbol occurrence distribution function and the optimal stepwise function, as shown in Fig. 5. The contribution of the integer m to the source entropy is given by

    −P(m) log2 P(m) = −(2^{−t} + d) log2(2^{−t} + d)
                    = −(2^{−t} + d) [ln 2^{−t} + ln(1 + 2^t d)] / ln 2
                    ≈ −(2^{−t} + d) (−t ln 2 + 2^t d) / ln 2
                    = 2^{−t} t + td − d/ln 2 − 2^t d^2/ln 2.    (24)

Here ln(1 + 2^t d) is expanded in a Taylor series and the second- and higher-order terms are neglected, since d ≪ 2^{−t}, i.e., 2^t d ≪ 1. Similarly, the contribution of the integer n to the source entropy is

    −P(n) log2 P(n) = −(2^{−t} − d) log2(2^{−t} − d) = 2^{−t} t − td + d/ln 2 − 2^t d^2/ln 2.    (25)

The sum of the contributions of the integers m and n to the source entropy is then

    −Σ_{k=m,n} P(k) log2 P(k) = 2^{−t+1} t − 2^{t+1} d^2/ln 2.    (26)
If the symbol occurrence probabilities of the integers m and n do not deviate from their optimal level 2^{−t}, then the sum of the contributions of the integers m and n to the source entropy is

    −Σ_{k=m,n} P(k) log2 P(k) = 2^{−t} t + 2^{−t} t = 2^{−t+1} t.    (27)
Therefore, for the deviated distribution function the source entropy is reduced by the weighted squared error (ln 2)^{−1} 2^{t+1} d^2, which results in a reduction of the coding efficiency. The weighted squared error can be written as

    2^{t+1} d^2/ln 2 = (2^{−t+1}/ln 2) (d/2^{−t})^2,    (28)

where d/2^{−t} is the relative error. Since 2^{−t+1} decreases with t, it is interesting to see that smaller integers are weighted more heavily in the reduction of the coding efficiency. This is expected, since these integers have higher symbol occurrence probabilities. On the other hand, the sum of the contributions of the integers m and n to the average code word length is given by

    P(m)t + P(n)t = (2^{−t} + d)t + (2^{−t} − d)t = 2^{−t+1} t.    (29)

If the symbol occurrence probabilities of the integers m and n do not deviate from their optimal level 2^{−t}, then the sum of the contributions of the integers m and n to the average code word length is

    2^{−t} t + 2^{−t} t = 2^{−t+1} t.    (30)

The above equation is identical to Eq. (29). Therefore, symmetric deviations of the distribution function from its optimum have no effect on the average code word length. The reduction of the coding efficiency in this case is purely due to the decreased contributions of the integers to the source entropy caused by the deviations of the distribution function. For a chosen bit allocation {b_i}, the overall reduction of the coding efficiency is the sum of the weighted squared errors between the symbol occurrence distribution function and the optimal stepwise function associated with the bit allocation {b_i}. It is therefore concluded that the optimal stepwise function of a chosen bit allocation should approximate the symbol occurrence distribution function as closely as possible.

For the case of nonsymmetric deviations, integers in the ith region can be arranged in pairs such that P(m) = 2^{−t} + d(m) and P(n) = 2^{−t} − d(n), with d(m) ≈ d(n). Equation (26) can then be rewritten as

    −Σ_{k=m,n} P(k) log2 P(k) = 2^{−t+1} t + (t − 1/ln 2) δ_{m,n} − (2^t/ln 2)[d^2(m) + d^2(n)],    (31)

where δ_{m,n} = d(m) − d(n) is the nonsymmetric error of the deviation. Summing the second term in Eq. (31) over all integer pairs in the ith region that deviate from the optimal stepwise function, the contribution ∆ of the nonsymmetric deviation errors to the source entropy is given by

    ∆ = (t − 1/ln 2) Σ δ_{m,n},    (32)
where δ_{m,n} is a zero-mean random variable, so that Σ δ_{m,n} ≈ 0. Therefore, the contribution ∆ vanishes and the source entropy is reduced by the sum of the weighted squared errors (ln 2)^{−1} 2^t Σ_k d^2(k). The above derivation can also be applied to the calculation of the average code word length, which yields

    P(m)t + P(n)t = 2^{−t+1} t + t[d(m) − d(n)].    (33)

The second term vanishes after summation over all integer pairs. Thus, the deviations of the distribution function from its optimum have no effect on the average code word length, and the conclusion obtained for symmetric deviations is also valid for nonsymmetric deviations.
Appendix B

Now assume a cascade encoder that consists of N stages and that each stage is assigned b_i bits. The optimal codes associated with this encoder have a symmetric stepwise distribution function. The height of the ith step in the distribution function is equal to 2^{−(k+b_i)}, where k = Σ_{j=1}^{i−1} b_j is the sum of the bits of all stages that precede the ith stage in the encoder. The ith stage can code 2^{b_i} − 1 integers in the ith region, each having a symbol occurrence probability of 2^{−(k+b_i)}.

If the ith stage is split into two sub-stages a and b, assigned n_1 and n_2 bits respectively, then 2^{n_1} − 1 integers can be coded with stage a and the remaining 2^{b_i} − 1 − (2^{n_1} − 1) integers with stage b. Note that n_1 < b_i; otherwise stage a codes all integers in the ith region and n_2 = 0. For a chosen n_1, the integer n_2 can be calculated as follows:

    2^{n_2} − 1 ≥ 2^{b_i} − 1 − (2^{n_1} − 1),
    2^{n_2} ≥ 2^{b_i} − 2^{n_1} + 1,
    n_2 ≥ log2(2^{b_i} − 2^{n_1} + 1),

so that

    n_2 ≥ ceil[log2(2^{b_i} − 2^{n_1} + 1)]
        = ceil[log2(2^{n_1}(2^{b_i−n_1} − (1 − 2^{−n_1})))]
        = n_1 + ceil[log2(2^{b_i−n_1} − (1 − 2^{−n_1}))].    (34)

Let c = 1 − 2^{−n_1}. Since n_1 ≥ 1,

    1/2 ≤ c < 1.    (35)

Furthermore, since b_i − n_1 ≥ 1,

    c/2^{b_i−n_1} < 1/2.    (36)

Thus,

    1/2 < 1 − c/2^{b_i−n_1},    (37)

and this leads to

    2^{b_i−n_1−1} < 2^{b_i−n_1} − c < 2^{b_i−n_1}.    (38)
That is,

    b_i − n_1 − 1 < log2(2^{b_i−n_1} − c) < b_i − n_1.    (39)

Therefore,

    ceil[log2(2^{b_i−n_1} − (1 − 2^{−n_1}))] = b_i − n_1.    (40)

Substituting Eq. (40) into Eq. (34) yields

    n_2 ≥ b_i.    (41)

The contributions of the 2^{b_i} − 1 integers coded in the ith stage to the average code word length in the calculation of the coding efficiency are

    w_1 = (2^{b_i} − 1)(k + b_i) 2^{−(k+b_i)}    (42)

for the case of no split (one stage, the ith stage), and

    w_2 = (2^{n_1} − 1)(k + n_1) 2^{−(k+b_i)} + [2^{b_i} − 1 − (2^{n_1} − 1)](k + n_1 + n_2) 2^{−(k+b_i)}    (43)

for the case of split stages (stages a and b), where 2^{b_i} − 1, 2^{n_1} − 1, and 2^{b_i} − 1 − (2^{n_1} − 1) are the numbers of integers to be coded and k + b_i, k + n_1, and k + n_1 + n_2 are the code word lengths at the ith stage and at stages a and b, respectively. To compare the contributions for the two cases, the common factor 2^{−(k+b_i)} in Eqs. (42) and (43) is removed for clarity, and n_2 = b_i is chosen from Eq. (41) as the shortest code word for stage b. Then, Eq. (43) minus Eq. (42) leads to

    w_2 − w_1 = (2^{n_1} − 1)(k + n_1) + [2^{b_i} − 1 − (2^{n_1} − 1)](k + n_1 + b_i) − (2^{b_i} − 1)(k + b_i)
              = 2^{b_i} n_1 − 2^{n_1} b_i + b_i − n_1 > 0,    (44)

since b_i − n_1 ≥ 1, and 2^{b_i} n_1 − 2^{n_1} b_i ≥ 0 for b_i − n_1 ≥ 1 (see Appendix C for details). Thus, a single stage coding the 2^{b_i} − 1 integers in the ith region contributes less to the average code word length. It is therefore concluded that the number of stages of a cascade encoder should be as few as possible.
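As an informal numerical check of this comparison (illustrative only, not part of the original appendix), the snippet below evaluates the single-stage and split-stage contributions of Eqs. (42)–(44), with the common factor dropped, for k = 0, b_i = 4, and n_1 = 2.

```python
k, bi, n1 = 0, 4, 2
n2 = bi                                      # shortest allowable sub-stage b, from Eq. (41)
w1 = (2 ** bi - 1) * (k + bi)                # Eq. (42), common factor 2^{-(k+bi)} dropped
w2 = (2 ** n1 - 1) * (k + n1) + (2 ** bi - 2 ** n1) * (k + n1 + n2)   # Eq. (43)
diff = 2 ** bi * n1 - 2 ** n1 * bi + bi - n1                          # closed form in Eq. (44)
print(w1, w2, w2 - w1, diff)                 # 60 78 18 18: splitting the stage costs more
```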
Appendix C

To prove that 2^{b_i} n_1 − 2^{n_1} b_i ≥ 0 for b_i − n_1 ≥ 1, a continuous function f is defined as

    f(x) = 2^x − 1 − x/n_1,    (45)

where x ≥ 1 and n_1 ≥ 1. The derivative of f is

    f′(x) = 2^x ln 2 − 1/n_1 > 0,    (46)

since 2^x ln 2 > 1.38 for x ≥ 1 and 1/n_1 ≤ 1 for n_1 ≥ 1. Therefore, f(x) is an increasing function for x ≥ 1 and n_1 ≥ 1. Observe that

    f(1) = 1 − 1/n_1 ≥ 0.    (47)
Thus,

    f(x) = 2^x − 1 − x/n_1 ≥ 0.    (48)

Now let x = b_i − n_1; then

    f(b_i − n_1) = 2^{b_i−n_1} − 1 − (b_i − n_1)/n_1 ≥ 0.    (49)

This leads to

    2^{b_i} n_1 − 2^{n_1} b_i ≥ 0.    (50)
References

[1] N.S. Jayant, Digital Coding of Waveforms, Prentice Hall, New York, 1984.
[2] T. Robinson, Shorten: compression for waveforms, anonymous ftp at ftp://svr-ftp.eng.cam.ac.uk/comp.speech, Cambridge University Engineering Department, 1994.
[3] S. Takamura, M.A. Takagi, Hybrid lossless compression of still images using Markov models and linear prediction, Lect. Notes Comp. Sci. 974 (1995) 203–208.
[4] C.V. Peterson, Lossless compression of seismic data, in: Proc. 26th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, 1992, pp. 26–28.
[5] S.D. Stearns, L. Tan, N. Magotra, Lossless compression of waveform data for efficient storage and transmission, IEEE Trans. Geosci. Remote Sens. 31 (1993) 645–654.
[6] M. Hans, R.W. Schafer, Lossless compression of digital audio, IEEE Signal Process. Mag. (2001) 21–32.
[7] P. Craven, M. Gerzon, Lossless coding for audio discs, J. Audio Eng. Soc. 44 (9) (1996) 706–720.
[8] Y. Takamizawa, M. Iwadara, A. Sugiyama, An efficient lossless coding algorithm and its application to audio coding, Electron. Commun. Jpn. Part 3: Fundament. Electron. Sci. 82 (4) (1999) 1388–1395.
[9] G.D.T. Schuller, B. Yu, D. Huang, B. Edler, Perceptual audio coding using adaptive pre- and post-filters and lossless compression, IEEE Trans. Speech Audio Process. 10 (6) (2002) 379–390.
[10] T.C. Bell, I.H. Witten, J.G. Cleary, Text Compression, Prentice Hall, Englewood Cliffs, NJ, 1990.
[11] L. Tan, Theory and techniques for lossless waveform data compression, Ph.D. dissertation, University of New Mexico, 1992.
[12] M. Hankamer, A modified Huffman procedure with reduced memory requirement, IEEE Trans. Commun. 27 (6) (1979) 930–932.
[13] S.D. Stearns, L. Tan, N. Magotra, A bi-level coding technique for compressing broadband residue sequences, Digit. Signal Process. 2 (3) (1992) 146–156.
[14] S.D. Stearns, Arithmetic coding in lossless waveform compression, IEEE Trans. Signal Process. 43 (1995) 1874–1879.
[15] R.G. Gallager, D.C. van Voorhis, Optimal source codes for geometrically distributed integer alphabets, IEEE Trans. Inform. Theory IT-21 (1975) 228–230.
[16] Y. Hu, C. Chang, A new lossless compression scheme based on Huffman coding scheme for image compression, Signal Process. Image Commun. 16 (4) (2001) 367–372.
[17] S. Ruth, P. Kreutzer, Data compression for large business files, Datamation 18 (9) (1972) 62–66.
[18] L. Wu, A. Zielinski, J.S. Bird, Lossless compression of hydroacoustic image data, IEEE J. Ocean. Eng. 22 (1) (1997) 93–101.