Neurocomputing 122 (2013) 298–309
Self-organizing maps for the design of multiple description vector quantizers

Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva

Department of Electrical Engineering and Information Technology, University Federico II of Naples, Italy
Article history: Received 3 January 2013; received in revised form 24 May 2013; accepted 12 June 2013; available online 2 July 2013. Communicated by B. Hammer.

Abstract
Multiple description coding is an appealing tool to guarantee graceful signal degradation in the presence of unreliable channels. While the principles of multiple description scalar quantization are well understood and solid guidelines exist to design effective systems, the same does not hold for vector quantization, especially at low bit-rates, where burdensome and unsatisfactory design techniques discourage its use altogether in applications. In this work we use self-organizing maps to design multiple description VQ codebooks. The proposed algorithm is flexible, fast and effective: it deals easily with a large variety of situations, including the case of more than two descriptions, with a computational complexity that remains fully affordable even for large codebooks, and a performance comparable to that of reference techniques. A thorough experimental analysis, conducted in a wide range of operating conditions, proves the proposed technique to perform on par with well-known reference methods based on greedy optimization, but with a much lower computational burden. In addition, the resulting codebook can itself be optimized, thus providing even better performance. All experiments are fully reproducible, with all software and data available online for interested researchers. © 2013 Elsevier B.V. All rights reserved.
Keywords: Self-organizing maps; Vector quantization; Multiple description coding; Reproducible research
1. Introduction

Worldwide data traffic is by now dominated by multimedia content, which typically requires some form of source coding, that is, ultimately, quantization. An ever higher share of such traffic flows over highly heterogeneous networks, including wireless and reconfigurable networks, with large packet loss probabilities, calling for a complex interplay between source and channel coding. In this scenario, the conventional error control approaches, based on error detection and packet retransmission, are not acceptable because of exceedingly long delays, significant bit-rate overhead, or practical implementation issues [1]. In such cases, multiple description coding (MDC) represents a valuable alternative [2–5], since it guarantees, with limited overhead, a graceful degradation of the decoded signal in the presence of one or multiple channel failures. The simplest model of MDC, depicted in Fig. 1, comprises only two alternative channels over which the information is split. If both channels work properly, the signal is reconstructed by the central decoder, with the full quality allowed by the source coding technique at the given bit rate. However, even in the case of failure of one channel, the surviving side description will allow a
reasonable reconstruction of the source, although at a reduced quality. This paradigm is therefore similar to scalable source coding, with the difference that, here, the descriptions are equally important, and each one can be used by itself to recover an approximation of the source. Needless to say, just like the constraint of scalability entails a performance loss with respect to unconstrained coding, the more stringent constraint imposed by MDC leads to a further loss, which is the price to pay in order to guarantee an acceptable performance on unreliable networks. In the special case of quantization (be it scalar or vectorial) the descriptions are just two scalar indexes, say i_1 and i_2. These are used to address dedicated look-up tables at the side decoders or, if both are available, combined to form a new index i_0 = I(i_1, i_2) to address a look-up table at the central decoder. The combination rule I(·,·) is known as the index assignment. Theoretical results on MDC are quite limited; major contributions are due to Ozarow [6] and El Gamal and Cover [7], who provided the rate-distortion region for the special but important case of memoryless Gaussian sources with squared-error distortion. Later on, Vaishampayan proposed [8] a Lloyd-type algorithm for the design of MD scalar quantizers, shedding some light on the importance of a suitable index assignment, and suggesting some practical solutions to this problem. After this seminal work, the research on MDC has become very intense, with developments in several directions, including the index assignment problem, which has drawn significant attention until very recently [9,10].
Fig. 1. Block diagram of a simple multiple-description codec.
A topic of obvious interest is the extension of MD ideas from scalar to vector quantization, that is, the design of multiple description vector quantization (MDVQ) systems, so as to take advantage of the higher structural freedom, and hence the potential gains, existing in a higher-dimensional space. Such an extension, however, is not trivial, precisely because of the index assignment problem, since codewords have a natural order only in a 1-dimensional space. As a consequence, most of the research in this field focused on the high-resolution case, where elegant solutions based on lattice vector quantization can be devised [11–13], and some theoretical results can be obtained for simple sources [14]. However, the high-resolution case is, by definition, of limited practical interest for source coding applications and, in any case, lattice VQ works well only with sources that possess (or acquire after some preprocessing) nice smooth statistics. In the low-resolution case, as said before, the key problem in the design of multiple description VQ codebooks is to find a suitable index assignment, and several techniques have been proposed for this task. A fairly common approach consists in first designing a generic VQ codebook, with no special constraint, and then reassigning the indexes to maximize a suitable quality metric. A major drawback of this approach is the computational complexity of the index assignment, which grows very quickly with the codebook size. A brute-force approach can be used only for very small codebooks, and sophisticated optimization techniques, such as simulated annealing [15,16], move the limit only a little farther. Therefore one must resort to greedy optimization techniques, such as the multiple description BSA (MDBSA) proposed in [17], which is a suitable modification of the well-known binary switching algorithm (BSA) [18]. MDBSA pursues an optimal solution through local moves consisting in the reassignment of just a few indexes at a time. At each step, only the best move is accepted, until a local optimum is reached. MDBSA provides generally good results but, despite the simplified optimization procedure, its complexity is still prohibitive if the codebook size is relatively large, especially if more than two descriptions are considered. A more interesting approach consists in designing, by means of suitable constraints, a VQ codebook that possesses a priori the desired index organization, thus avoiding any burdensome post-processing. In [19], for example, a simple algorithm is proposed to generate a tree-structured codebook which can be readily used for multiple description VQ. However, it works only for the case of non-redundant descriptions, a very special situation that implies a large performance loss w.r.t. the single description case. In this work (preliminary results of which were presented in [20]), we follow this second approach and propose a new algorithm for MDVQ design based on Kohonen's self-organizing maps (SOM) [21]. Originally developed in the context of neural networks, and based also on Hubel and Wiesel's studies on the organization of the cat's visual cortex [22], the SOM has found countless practical applications in various fields of engineering, as in [23–25], to mention just some of the most recent ones
(see [26] and references therein for a thorough review). The link with vector quantization is especially strong, since the SOM can be regarded as a tool to design vector quantization codebooks characterized by suitable topological properties, useful to address a number of problems, as shown in Section 2. Here, we develop a modified version of the SOM which, thanks to suitable constraints enforced during the design phase, produces near-optimal VQ codebooks which already possess the index structure suitable for multiple description coding. Avoiding any subsequent index rearrangement, the overall computational burden reduces to that of central codebook design, while all additional codebooks needed by the side decoders are obtained as byproducts. In summary, the proposed approach is characterized by the following major properties:
- all codebooks are designed at once, with no need for further rearrangement;
- central codebook quality and computational complexity are comparable to those of the generalized Lloyd algorithm;
- it is extremely flexible, allowing the design of MDVQ systems with any number of descriptions, be they redundant or non-redundant, possibly asymmetric, etc.

In the next section we recall some basic concepts and notation on scalar and vector quantization and their multiple-description versions; then in Section 3, after a brief description of the SOM, with special focus on its applications to VQ, we present the proposed technique and analyze its major steps in detail. Section 4 discusses the results of a large set of simulation experiments aimed at assessing the algorithm's potential, and Section 5 draws conclusions and outlines future research.
2. Multiple description vector quantization

A quantizer can be regarded as the combination of two elementary processing blocks, as shown in Fig. 2, an encoder E: x → i and a decoder D: i → y_i, where x, y ∈ ℝ^K and i ∈ {1, 2, …, N}. The input vector x is therefore approximated by a new vector

x̂ = Q(x) = D[E(x)] = y_i    (1)
chosen from a finite collection of template vectors (or codevectors, or codewords) called the codebook Y = {y_1, y_2, …, y_N}. Given a suitable distortion measure d(x, x̂) and the probability law of the input vectors p(x), a quantizer is characterized by its average distortion

D = E[d(X, X̂)] = ∫_{ℝ^K} p(x) d(x, x̂) dx    (2)
with E[·] indicating statistical average, and by its coding rate

R = (log_2 N) / K    (3)
that is, the number of bits per sample needed to single out the index i. The optimal quantizer for a source with distribution p(x) provides the minimum average distortion achievable with the available coding rate (we neglect entropy coding in this quick summary of concepts; the reader is referred to [27] for a thorough treatment). Only in some simple cases, however, can optimal quantizers be designed exactly.
Fig. 2. Block diagram of quantizer coding and decoding.
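As a concrete illustration of the encoder/decoder pair of Fig. 2 and of Eqs. (1)–(3), the following minimal sketch (ours, not the authors' code; a NumPy environment and squared-error distortion are assumed) encodes a vector by the nearest-codeword rule and decodes by table look-up:

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index i of the codeword closest to x (squared-error distortion)."""
    d = np.sum((codebook - x) ** 2, axis=1)   # distance to every codeword
    return int(np.argmin(d))

def vq_decode(i, codebook):
    """Table look-up: reproduce x as the i-th codeword."""
    return codebook[i]

# toy example: N = 4 codewords in dimension K = 2
Y = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
x = np.array([0.7, -0.9])
i = vq_encode(x, Y)
x_hat = vq_decode(i, Y)
rate = np.log2(len(Y)) / Y.shape[1]          # R = log2(N) / K, Eq. (3)
print(i, x_hat, rate)                        # -> 2 [ 1. -1.] 1.0
```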
As shown in Fig. 2, the decoder, equipped with the codebook Y, carries out a simple table look-up. Likewise, to perform its work, the encoder needs a partition of the input space, Ω = {ω_1, ω_2, …, ω_N} with ⋃_i ω_i = ℝ^K and ω_i ∩ ω_j = ∅, and outputs the index i whenever x ∈ ω_i. Designing a quantizer, therefore, amounts to choosing a suitable couple of codebook and space partition. By using the squared error d(x, y) = ∥x − y∥² as distortion measure, as we will always do in the following, and expanding the distortion integral as

D = ∑_i ∫_{ω_i} p(x) ∥x − y_i∥² dx    (4)
it follows readily that necessary conditions for optimality are

y_i = E[x | x ∈ ω_i]    (5)

and

ω_i = {x : ∥x − y_i∥² ≤ ∥x − y_j∥², ∀j}    (6)
that is, the codevectors must be the centroids of their quantization regions, while the latter divide the space according to a minimum-distance rule w.r.t. the codevectors. The iterated alternate application of these conditions leads to the well-known generalized Lloyd algorithm [28], which provides a locally optimal quantizer.
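A minimal sketch of the generalized Lloyd iteration just described (batch form, squared-error distortion; an illustration under our own choices of initialization and stopping rule, not the authors' implementation):

```python
import numpy as np

def gla(train, N, n_iter=50, seed=0):
    """Generalized Lloyd algorithm: alternate the nearest-neighbor partition (Eq. 6)
    and the centroid update (Eq. 5) on a training set of shape (S, K)."""
    rng = np.random.default_rng(seed)
    Y = train[rng.choice(len(train), N, replace=False)].copy()  # initial codebook
    for _ in range(n_iter):
        # nearest-codeword assignment for every training vector
        d = ((train[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        # centroid condition: replace each codeword by the mean of its cell
        for n in range(N):
            cell = train[idx == n]
            if len(cell) > 0:
                Y[n] = cell.mean(axis=0)
    return Y

# usage: 2-D Gaussian training set, 8-codeword codebook
train = np.random.default_rng(1).normal(size=(4096, 2))
Y = gla(train, N=8)
```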
To describe a multiple-description quantizer we start again from its structure, shown in Fig. 1 with reference to the case of L = 2 descriptions. This simple case allows us to introduce all major concepts while avoiding unnecessarily heavy notation; the extension to an arbitrary number of descriptions is conceptually obvious. As shown in the figure, the encoder outputs two indexes, i_1 ∈ {1, 2, …, N_1} and i_2 ∈ {1, 2, …, N_2}, which are sent to the decoder on independent lossy channels, characterized as discrete memoryless channels with symbol loss probabilities p_1 and p_2. At the decoder side there are now four (2^L) decoders, to cope with the different loss events. When one of the indexes is lost, the side decoder associated with the remaining index, either D^(1) or D^(2), is used, and therefore

x̂ = D^(1)(i_1) = y^(1)_{i_1}    (7)

or

x̂ = D^(2)(i_2) = y^(2)_{i_2}    (8)
If both indexes are lost, a trivial decoder (not shown explicitly in the figure) is used, which always outputs the same codevector, y_0 = E[X]. When both indexes are received, instead, they are used by the so-called central decoder to address a codebook of size N, with max(N_1, N_2) ≤ N ≤ N_1·N_2, and produce a higher-quality approximation of the input vector

x̂ = D^(0)(i_1, i_2) = y^(0)_{i_1,i_2}    (9)

The design goal becomes the minimization of the expected distortion

D(p) = (1−p_1)(1−p_2) D^(0) + (1−p_1) p_2 D^(1) + p_1 (1−p_2) D^(2) + p_1 p_2 σ_X²    (10)

where

D^(0) = ∑_{i_1,i_2} ∫_{ω_{i_1,i_2}} p(x) ∥x − y^(0)_{i_1,i_2}∥² dx    (11)

and similar formulas hold for D^(1) and D^(2), while σ_X² is the variance of the input samples. It is apparent that, even in this simple case, and even assuming balanced descriptions (N_1 = N_2) and channels (p_1 = p_2 = p), as we will do in the following, the design problem becomes much more complex than with ordinary quantization, as it involves several codebooks and a non-trivial encoding rule, with many more degrees of freedom.
A central role in the design is played by the so-called index assignment matrix (IAM), which performs the mapping from the side indexes output by the encoder to the central-codebook index used when both descriptions are received, i_0 = I(i_1, i_2). To gain insight into the role of the IAM let us consider the simple scalar quantizer depicted in Fig. 3. Assuming a uniform pdf in [−1, +1] for the input x, the 10-level central quantizer shown in part (a), with quantization cells ω_i = [(i−6)/5, (i−5)/5], i = 1, …, 10, and codewords y_i placed at the center of each cell, is optimal. Now we build a multiple-description quantizer by using Y as the central codebook Y^(0), with associated partition Ω^(0) = Ω, and by selecting the index assignment matrix shown in part (b) of the figure, where rows are indexed by i_1 and columns by i_2, and the matrix entries are central-description indexes i_0. Assume, for example, that the input falls into the sixth quantization cell, x ∈ ω^(0)_6. The encoder then sends over the channels the row and column indexes corresponding to i_0 = 6, therefore i_1 = 2 and i_2 = 3. If both are received, the decoder retrieves the correct index, i_0 = 6, and uses the associated codeword for reproduction, x̂ = y^(0)_6. However, if one description is lost, say the second one, the decoder uses the received index, i_1 = 2, to address the first side decoder and hence produces x̂ = y^(1)_2. By looking at the IAM, it is clear that the codeword y^(1)_2 is used whenever the input falls into cells ω^(0)_3, ω^(0)_4 and ω^(0)_6, whose union corresponds therefore to the second quantization cell, ω^(1)_2, of the first side quantizer. More generally, the central partition and the IAM identify uniquely the side partitions Ω^(1) and Ω^(2). These, in turn, by the centroid condition, allow one to design the side codewords as

y^(l)_{i_l} = E[X | X ∈ ω^(l)_{i_l}]    (12)

and therefore identify the side codebooks Y^(1) and Y^(2). In our running example, the side quantizers are shown in parts (c) and (d) of the figure. Note that some cells of the side quantizers are composed of disjoint segments, violating the minimum-distance rule of Eq. (6) and giving rise to quantizers that cannot be optimal. This observation has general validity: optimal side quantizers are achievable only in the trivial case when N_1 = N_2 = N, at the price of very low transmission efficiency (large redundancy), since the same
Fig. 3. Example of MDSQ: (a) 10-level central quantizer, (b) index assignment matrix, (c) and (d) 4-level side quantizers.
information is duplicated over the two channels. In general the quality of the side quantizers becomes worse and worse as N grows towards N_1·N_2 and the redundancy diminishes. There is therefore an (obvious) trade-off between redundancy and robustness against channel errors. In any case, given a near-optimal central codebook of size N, the quality of the side decoders depends almost exclusively on the IAM. The matrix of Fig. 3(b) is certainly a sensible choice as it produces side quantization cells which are fairly compact, with associated codewords that approximate well all input samples falling in their cell. A different IAM would generate, for the very same central quantizer, different side quantizers, with different (and probably higher) expected distortions. From this example it should be clear that selecting a suitable assignment matrix is the central problem in multiple description quantization: one should ensure that cells that appear in the same row or in the same column of the matrix are as close as possible. In scalar quantization, since the cells of the central quantizer are aligned along an axis, this goal is easily obtained by minimizing the index spread along rows and columns, a well-known rule first suggested in [8]. In vector quantization, however, involving a higher-dimensional space, cells are no longer aligned, and building a good IAM becomes a challenging problem, infeasible by brute force as soon as the codebook sizes grow beyond a few units.
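To fix ideas, the following sketch (illustrative only; the index assignment matrix `iam` below is a hypothetical toy example, not the one of Fig. 3) shows how an IAM drives multiple-description encoding and decoding for two descriptions: the encoder quantizes with the central codebook and splits the index into its row/column pair, while the decoder uses the central, a side, or the trivial codebook depending on which indexes arrive.

```python
import numpy as np

def md_encode(x, central_codebook, iam):
    """Quantize x with the central codebook and map the index to (i1, i2) via the IAM."""
    d = np.sum((central_codebook - x) ** 2, axis=1)
    i0 = int(np.argmin(d))
    i1, i2 = iam[i0]                      # index assignment: i0 -> (i1, i2)
    return i1, i2

def md_decode(i1, i2, central, side1, side2, iam_inv, mean_vector):
    """Central, side, or trivial decoding, depending on which indexes were received."""
    if i1 is not None and i2 is not None:
        return central[iam_inv[(i1, i2)]]   # both received: central decoder, Eq. (9)
    if i1 is not None:
        return side1[i1]                    # only the first description, Eq. (7)
    if i2 is not None:
        return side2[i2]                    # only the second description, Eq. (8)
    return mean_vector                      # both lost: y0 = E[X]

# hypothetical toy setup: 4 central codewords, 2x2 IAM
Y0 = np.array([[-1.5, 0.0], [-0.5, 0.0], [0.5, 0.0], [1.5, 0.0]])
iam = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}      # i0 -> (i1, i2)
iam_inv = {v: k for k, v in iam.items()}
# side codewords: here the plain mean of the codewords on each row/column,
# which assumes equally likely central cells (see Eq. (12) for the general case)
side1 = np.array([Y0[[0, 1]].mean(axis=0), Y0[[2, 3]].mean(axis=0)])
side2 = np.array([Y0[[0, 2]].mean(axis=0), Y0[[1, 3]].mean(axis=0)])
i1, i2 = md_encode(np.array([0.6, 0.1]), Y0, iam)
print(md_decode(i1, None, Y0, side1, side2, iam_inv, Y0.mean(axis=0)))  # -> [1. 0.]
```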
3. Design of MDVQ by the SOM

In this section we show how the SOM can be used for the efficient design of high-quality multiple description vector quantizers. Given our focus, we will use a terminology closer to the quantization world than to neural networks, speaking for example of codevectors rather than synaptic weights. A brief preliminary review of the SOM from this point of view will help in understanding the proposed solution.

3.1. Self-organizing maps for vector quantization

The SOM can be seen as an improvement of the well-known K-means clustering algorithm [29], frequently used for VQ codebook design. In sequential K-means, an arbitrary initial codebook Y = {y_1, y_2, …, y_N} is gradually adjusted in response to the submission of a training set. Specifically, for each training vector x_s the nearest codevector y_{n_s} is singled out and displaced towards the input

y_{n_s} = y_{n_s} + α(s)(x_s − y_{n_s}),    (13)
thus reducing coding distortion, on average. The function α(s) controls the speed of adaptation to the training set: it is rather large initially, to let codewords quickly match the training set, and then decreases slowly to zero to guarantee convergence. The major improvement of the SOM w.r.t. K-means consists in the organization of the codebook according to a desired structure, typically a regular lattice in a multidimensional space. This is obtained by introducing suitable mutual interactions among codevectors. Accordingly, the adaptation formula becomes

y_n = y_n + α(s) β(s, n, n_s)(x_s − y_n)    (14)
valid for each n = 1, …, N, implying that all codevectors in general, and not just the best matching one, are updated with each new training vector. The function β(·,·,·) is responsible for the strength of interaction among codevectors: if codewords y_n and y_m are desired to be close in the codebook, β(·, n, m) must have a positive, and possibly large, value, while a negative value will ensure that the two codewords remain far apart. In any case, all interaction
Fig. 4. Three codebooks with different a priori index structure designed for the same source.
strengths should vanish as s goes to infinity, when the desired organization has already been obtained, so as to allow for a fine tuning to the training set. By properly choosing the strength of each connection, namely the intensity of interaction between couples of codevectors, one forces the codebook to assume the desired topology. An example is shown in Fig. 4, where three codebooks with different desired structures are designed for the same source. All of the codebooks adapt quite well to the input distribution, a bivariate correlated Gaussian, but each of them preserves its own topology due to the interaction among codevectors. In practice, the mutual interaction function depends in turn on a measure d(n, m) of the distance among indexes, that is, β(s, n, m) = β(s, d(n, m)). Different index distances suit different problems involving the use of ordered codebooks. For example, a linear distance d(n, m) = |n − m| guarantees that codewords with close indexes are also close in the Euclidean space, a property that can be used to carry out an effective compression of the indexes output by a vector quantizer [30,31]. The use of a Hamming distance d(n, m) = h(n, m), instead, produces a codebook with a hypercubic structure, which provides zero-redundancy protection to VQ indexes transmitted on lossy channels [30]. A more complex index distance can be used to design trellis-coded VQ systems [32]. It is worth emphasizing that using the SOM in place of the generalized Lloyd algorithm (GLA) does not entail, in general, a performance loss. In fact, although SOM codebooks are not optimal for the given training set, the increase in distortion w.r.t. the locally optimal GLA codebooks is very limited, if a reasonably large training set is used. In addition, for out-of-training data, SOM codebooks usually outperform GLA's, since each training vector is
used to adjust several codewords at a time, thus increasing the effective training set size. As for the computational burden, although the SOM performs more updates than the GLA for each training vector, it needs a much smaller training set to converge and therefore presents a lower overall complexity [33].

3.2. Proposed solution

We now show that good multiple description VQ codebooks can be easily designed by means of the SOM, using suitable mutual interactions among the indexes. Indeed, Section 2 makes clear that, given a suitable IAM and a low-distortion codebook, we only need to make sure that codewords in the same line of the matrix are close to one another. This goal can be met by designing the codebook by means of the SOM and choosing suitable interaction strengths that enforce the constraints of interest. To keep reasoning by examples, let us consider the design of an MDVQ system with a 3-bit central decoder and two 2-bit side decoders for a 2D source with correlated (ρ = 0.5), unit-variance, Gaussian distribution. To this end, we use the toroidal IAM shown in Fig. 5(a) and, with reference to (14), define

α(s) = α_0 [Δ_α]^s    (15)

and

β(s, n, m) = 1                if n = m,
             β_0 [Δ_β]^s      if n, m belong to the same line,
             0                otherwise.    (16)
Choosing reasonable values for the parameters, α_0, Δ_α, β_0, Δ_β ∈ [0, 1], the SOM produces the codebooks shown in Fig. 6, where central codewords are marked by small dots, while the larger dots indicate the codewords of the first (left) and second (right) side description. A few comments are in order. First of all, the average signal-to-noise ratio of this SOM codebook is 13.06 dB, as opposed to 13.03 dB for the corresponding GLA codebook (both computed on 100 trials and the same training set), showing that the mutual interaction constraints did not interfere with the correct development of the codebook. Moreover, codewords are obviously
Fig. 7. A 5-bit MDVQ codebook (small dots) with two 3-bit side codebooks (large dots).
well organized so as to provide good-quality side descriptions, as is made visually clear by the color coding. For example, if the encoder selects a (small) red codeword, but the second index, say, is erased by the channel, the codeword used by the first side decoder (large red) is still very close to the original, because it was obtained as the average of two codewords (small red) which were very close to one another from the beginning. Finally, notice that the IAM does not respect the minimum-spread rule. This should be obvious, after a moment's thought, since the first and last codewords need not be far apart in a K-dimensional space, but this observation makes clear that there are indeed new degrees of freedom to be exploited, and problems to be solved. This trivial example should not hide the complexity of the problem. Considering the same 2D distribution as before, but going to 5-bit central and 3-bit side descriptions, the SOM provides again well-organized codewords, as shown in Fig. 7, a result that could not easily be devised otherwise. In higher-dimensional spaces, of course, guessing a good solution is much more challenging. At this point, having conveyed the basic idea, and taking for granted the ability of the SOM to enforce proximity constraints, we can summarize the design procedure as follows (a minimal sketch is given right after this list):
1. select the index assignment matrix;
2. define the mutual interaction strengths and their temporal evolution;
3. run the SOM algorithm with a suitable training set to design the central codebook;
4. compute side codewords by weighted averaging of central codewords.
We now discuss each of these steps in some depth.
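A minimal sketch of steps 2–3 for L = 2, using the row/column neighborhood of Eq. (16) and the schedules of Eqs. (15) and (18); this is our illustration of the idea, not the authors' released implementation (available at www.grip.unina.it), and the default parameter values simply mirror those reported in Section 4.1.

```python
import numpy as np

def som_mdvq(train, iam, n_central, alpha0=0.5, d_alpha=0.9,
             beta0=1.0, d_beta=0.8, n_epochs=30, seed=0):
    """SOM design of an MDVQ central codebook (L = 2).
    iam[i0] = (i1, i2): index assignment of central codeword i0.
    Codewords sharing a row or a column of the IAM attract each other (Eq. 16)."""
    rng = np.random.default_rng(seed)
    Y = train[rng.choice(len(train), n_central, replace=False)].copy()
    # neighbors[n] = central indexes sharing a line of the IAM with n
    neighbors = [[m for m in range(n_central)
                  if m != n and (iam[m][0] == iam[n][0] or iam[m][1] == iam[n][1])]
                 for n in range(n_central)]
    for e in range(n_epochs):
        alpha = alpha0 * d_alpha ** e            # Eq. (15), per-epoch schedule (Eq. 18)
        beta = beta0 * d_beta ** e               # same-line interaction strength, Eq. (16)
        for x in train:
            ns = int(np.argmin(((Y - x) ** 2).sum(axis=1)))   # best-matching codeword
            Y[ns] += alpha * (x - Y[ns])                      # winner update, beta = 1
            for m in neighbors[ns]:
                Y[m] += alpha * beta * (x - Y[m])             # same-line updates
    return Y
```

Step 4 (the side codewords) is sketched in Section 3.2.4 below, and the IAM itself can be generated as diagonal or toroidal as discussed in Section 3.2.1.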
Fig. 5. Some index assignment matrices: toroidal (used in the example of Fig. 6), diagonal, raster-scan.
Fig. 6. A 3-bit MDVQ codebook (small dots) with two 2-bit side codebooks (large dots).
3.2.1. Index assignment matrix

The examples discussed above make clear that the minimum-spread rule is not a reliable guideline anymore, since the optimal assignment can depend on several factors, such as the dimensionality of the input space (more precisely, the effective dimensionality of the input pdf), the coding rate, and the number of descriptions. On the other hand, looking for a universal rule that allows one to build satisfactory IAMs for all operating conditions is not a reasonable goal, given the variety of situations one can legitimately consider. A major element to take into account, however, is certainly the expected entropy of the side codebooks. Given the probabilities p_i = Pr(X ∈ ω_i), i = 1, …, N, of using the various codewords of a VQ codebook Y, we define the codebook entropy as H(Y) = −∑_{i=1}^{N} p_i log_2 p_i. This quantity is a measure of how evenly the codewords are used, and in fact it reaches its maximum, log_2 N, when all codewords are used with equal probability p_i = 1/N. It is important to realize that a quantizer's performance does not really depend on its size, N, as usually
Fig. 8. Diagonal and toroidal 3-d IAMs (up) for a 4-bit central codebook with three 2-bit descriptions, and their 2D projections (down) after losing one description.
assumed, but rather on its effective size, defined as N_eff = 2^{H(Y)}, hence on its entropy. Leaving some codewords unused impairs the performance of a quantizer, but this is only a limiting case of under-using some codewords and over-using others, namely, using them unevenly. When designing an MDVQ system, therefore, one should try to guarantee that the codewords of all side quantizers are used as evenly as possible. Consider again the IAM of Fig. 5(a) and assume that all central codewords have the same probability of occurrence. If one description is lost, the eight central codewords are mapped two-by-two into the four codewords of the other side codebook, which will again have equal probabilities of occurrence, allowing (though not ensuring) good encoding performance. A badly designed IAM, instead, like the one shown in Fig. 5(c), would generate one side codebook with reduced effective size and poor performance. We now restrict our attention to two types of IAMs, which we call "diagonal" and "toroidal". For the sake of simplicity, we consider only the case of balanced descriptions, the extension to the general case being straightforward; therefore the IAM is a hypercube of side N_S in a discrete L-dimensional space. Diagonal IAMs were introduced and described in [8] based on the minimum-spread rule and are characterized by codewords that lie close to the main diagonal of the hypercube, i_1 = i_2 = ⋯ = i_L, with 1 ≤ i_l ≤ N_S the index associated with the l-th description. A possible 8-codeword diagonal IAM is shown in Fig. 5(b) for the case L = 2 and N_S = 4. In toroidal IAMs, instead, central codewords occupy the cells (i_1, i_2, …, i_L) which obey the formula

(i_L − 1) = [ ∑_{l=1}^{L−1} (i_l − 1) + o ] mod N_S    (17)

where o is an offset going from zero to o_max. As an example, for L = 2, N_S = 4, and o_max = 1, we obtain the matrix of Fig. 5(a), where o = 0 singles out the codewords on the main diagonal, and o = 1 the codewords on the next diagonal in modulo-4 geometry. With L = 2 descriptions, assuming near-uniform probability for the central codewords, the maximum-entropy constraint is
approximately met by both types of matrices. Therefore, the choice must be guided by other considerations. In particular, since toroidal IAMs add the unnecessary, and possibly detrimental, constraint that the last codeword be close to the first one (see again the example of Fig. 6), we will always use diagonal IAMs in this case. On the contrary, for L > 2 descriptions, diagonal IAMs perform very badly when one description is lost, because the diagonal codewords of the (L−1)-dimensional matrix obtained by projection (itself diagonal) are severely overloaded, while some off-diagonal codewords are unused. In this situation we will therefore use toroidal IAMs, which always associate the same number of central codewords to any side codeword, thereby exhibiting maximum entropy, irrespective of the number of lost descriptions. To gain insight into this point consider the 16-codeword IAMs shown in Fig. 8 for the case L = 3 and N_S = 4. When an index of the diagonal IAM (up-left) is lost, the codewords are projected onto a side codebook (down-left) where only 10 of the 16 available combinations are used, characterized by an effective size N_eff ≤ 10. On the contrary, the codewords of the toroidal IAM (up-right) are projected onto a side codebook (down-right) where all combinations are used, with an effective size that can potentially reach 16.
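The toroidal IAM of Eq. (17) is easy to generate programmatically. The sketch below (illustrative, with variable names of our own choosing) enumerates its cells for given L, N_S and o_max, then drops one description and counts the distinct surviving side cells, which is exactly the effective-size argument made above for Fig. 8.

```python
from itertools import product

def toroidal_iam(L, NS, o_max):
    """Cells (i1, ..., iL), 1-based, obeying (iL - 1) = (sum_{l<L}(il - 1) + o) mod NS, Eq. (17)."""
    cells = []
    for o in range(o_max + 1):
        for head in product(range(NS), repeat=L - 1):      # (i1-1, ..., i_{L-1}-1)
            last = (sum(head) + o) % NS
            cells.append(tuple(i + 1 for i in head) + (last + 1,))
    return cells

# example of Fig. 8 (toroidal case): L = 3, NS = 4, 16 central codewords (o_max = 0)
cells = toroidal_iam(L=3, NS=4, o_max=0)
print(len(cells))                                           # 16 central codewords
for drop in range(3):                                       # lose one description
    proj = {tuple(c[l] for l in range(3) if l != drop) for c in cells}
    print(drop, len(proj))                                  # all 16 combinations survive
```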
3.2.2. Mutual interaction strengths

Given the IAM, we must define the mutual interactions among codewords, which break down to selecting the neighborhood of each codeword, and the parameters that govern the evolution of the interaction strength over time. For two descriptions, the obvious choice is to consider as neighbors all codewords belonging to the same row or column, as in (16). The same approach, extended to the case of L = 3 descriptions (the case L > 3 is dealt with in the same way), requires taking into account not only codewords belonging to the same line, but also those belonging to the same plane. In fact, when just one
description is lost, the two indexes correctly received single out a line in the IAM where all codewords are required to be close to one another, and hence to their weighted average. Likewise, if two descriptions are missing, the correctly received index singles out a plane in the IAM where, as before, all codewords should be relatively close to one another. Therefore, with respect to the case L = 2, described by Eq. (16), we should introduce and carefully tune beforehand another couple of parameters, say γ and Δ_γ, and significantly increase the computational burden of the SOM, as more codewords need to be updated with each training vector. However, we can spare this effort, because same-plane codewords always have some common neighbors (with a sensible IAM), and hence will be kept relatively close anyway through indirect interactions. Therefore, we will use exactly the rule of (16) for any number of descriptions, including the functional form of the interaction strength, with weights that are all equal in the neighborhood for reasons of symmetry.
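For an arbitrary number of descriptions, "same line" simply means that two index tuples agree on all coordinates but one. A one-function test (a sketch following the rule of Eq. (16); the name is ours):

```python
def same_line(a, b):
    """True if index tuples a and b of an L-dimensional IAM lie on the same line,
    i.e., they differ in exactly one coordinate (Eq. (16) neighborhood)."""
    return sum(ai != bi for ai, bi in zip(a, b)) == 1

print(same_line((2, 3, 1), (2, 3, 4)))   # True: same line along the third axis
print(same_line((2, 3, 1), (2, 4, 4)))   # False: same plane only (two coordinates differ)
```

Same-plane tuples (two coordinates differing) are deliberately excluded, for the reasons of indirect interaction explained above.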
3.2.3. Running the SOM

Once the IAM and the form of the interaction strengths have been defined, it only remains to select a training set that faithfully represents the input signal, and suitable parameters that guarantee the development of a near-optimal central codebook with the desired organization. As for the first item, a commonly accepted rule of thumb is that the ratio between training set size and codebook size should exceed 100. The training set is then cycled until convergence. Although the SOM works well, and better than the GLA, also with smaller training sets, we will use quite large training sets to ensure a correct analysis of the results; faster schedules can certainly be used in actual operation. As for the parameters, α_0, Δ_α, β_0, Δ_β, they were set once and for all based on just a few preliminary experiments. It should be realized, however, that their choice is not very critical, provided that both α(s) and β(s) decrease slowly enough to guarantee multiple scans of the training set, and that β(s) vanishes well before α(s), to allow fine tuning of the codebook to the input after the desired organization has been reached.
3.2.4. Design of side codebooks

The side codewords are obtained as weighted averages of selected central codewords. In fact, assuming that the centroid condition holds for the SOM codebook (which is approximately although not rigorously true), the central codewords can be written as

y^(0)_{i_1,i_2} = E[X | X ∈ ω^(0)_{i_1,i_2}]

where we consider the two-description case for the sake of notational simplicity. In practice, statistical means are replaced by sample averages on the training set, namely

y^(0)_{i_1,i_2} = ⟨x_s | x_s ∈ ω^(0)_{i_1,i_2}⟩

As for the side codewords, focusing for example on the first description, they can be written as

y^(1)_{i_1} = ⟨x_s | x_s ∈ ω^(1)_{i_1}⟩ = ⟨x_s | x_s ∈ ⋃_{i_2=1}^{N_{S2}} ω^(0)_{i_1,i_2}⟩

or, making the averages explicit,

y^(1)_{i_1} = (1 / S^(1)_{i_1}) ∑_{i_2=1}^{N_{S2}} ∑_s [x_s | x_s ∈ ω^(0)_{i_1,i_2}]

and eventually

y^(1)_{i_1} = ∑_{i_2=1}^{N_{S2}} (S^(0)_{i_1,i_2} / S^(1)_{i_1}) y^(0)_{i_1,i_2} = ∑_{i_2=1}^{N_{S2}} (p_{i_1,i_2} / p_{i_1}) y^(0)_{i_1,i_2}

where S^(0)_{i_1,i_2} is the number of training vectors in ω^(0)_{i_1,i_2}, p_{i_1,i_2} is estimated as S^(0)_{i_1,i_2}/S, and similar relations hold for the corresponding side-codebook quantities. It is clear that all these quantities, computed just once in the design process, involve negligible complexity.
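A sketch of step 4 (the relations above) for two descriptions: given the trained central codebook, the IAM, and the empirical cell counts, each side codeword is the probability-weighted average of the central codewords on its line. Variable names are ours, not the paper's.

```python
import numpy as np

def side_codebooks(Y0, iam, counts, N1, N2):
    """Side codewords as weighted averages of central codewords (two descriptions).
    Y0: (N0, K) central codebook; iam[i0] = (i1, i2); counts[i0] = training vectors in cell i0."""
    K = Y0.shape[1]
    Y1, S1 = np.zeros((N1, K)), np.zeros(N1)
    Y2, S2 = np.zeros((N2, K)), np.zeros(N2)
    for i0, (i1, i2) in iam.items():
        Y1[i1] += counts[i0] * Y0[i0]      # accumulate S^(0)_{i1,i2} * y^(0)_{i1,i2}
        S1[i1] += counts[i0]
        Y2[i2] += counts[i0] * Y0[i0]
        S2[i2] += counts[i0]
    Y1 /= np.maximum(S1, 1)[:, None]       # divide by S^(1)_{i1} (guard against empty lines)
    Y2 /= np.maximum(S2, 1)[:, None]
    return Y1, Y2
```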
4. Experimental results

In this section we present the results of a number of experiments carried out in order to assess the performance of the proposed solution. Since our primary goal is to establish the soundness of the proposed technique, we will focus on comparing its performance with a well-known reference technique, GLA followed by MDBSA, in the same operating conditions. This bars the option of working on the central/side-distortion trade-off, i.e., adjusting the former to significantly lower the latter, which is possible with the SOM but not with the GLA. It is worth underlining that this research is fully reproducible. All source code and all details on the experiments are available at ⟨www.grip.unina.it⟩, allowing interested researchers to replicate the experiments, perform new tests of their choice (e.g., working on the aforementioned central/side-distortion trade-off) and modify the source code in order to improve the algorithm performance or enlarge the spectrum of viable applications. Originally, the source code had been developed entirely in Matlab, but the most challenging routines have been rewritten in C++ to reduce the running times and, also, to free the complexity assessment from Matlab-related technicalities, thus allowing a more reliable comparison among different algorithms. Apart from compile-time optimization, no effort is made to further reduce running times by means of algorithm-specific tricks, such as the use of fast algorithms to compute the best-matching codeword, e.g. [34], the efficiency of which can vary wildly from case to case.

4.1. Organization of the experiments

These experiments compare the proposed multiple description version of the SOM algorithm, together with the SOM+MDBSA suite (SOM+ for short), with the GLA+MDBSA (GLA+) reference technique. Unfortunately, there is no source code available on the web to establish other comparisons, for example with techniques based on lattice VQ. Results will be given in terms of signal-to-noise ratio (SNR) and CPU-time and will always be obtained, unless otherwise stated, as the average over 10 independent repetitions of the experiment. To save computation, SOM parameters are updated at each new pass over the training set (epoch e) rather than at each sample, therefore

α(e) = α_0 [Δ_α]^e,   β(e) = β_0 [Δ_β]^e    (18)
The weights, selected by means of some preliminary analysis as α_0 = 0.5, Δ_α = 0.9, β_0 = 1.0 and Δ_β = 0.8, have been used throughout the experiments. The algorithm stops when the relative improvement in central distortion over successive epochs goes below 10^−4. After a few epochs the codebook is typically already well organized, β(e) is quite close to zero, and further passes serve only to improve the encoding SNR. Performance depends on many different factors, regarding both the MDVQ system and the channels. To simplify the analysis we will consider balanced codebooks and channels, the latter characterized by their symbol erasure rate (SER) p, namely, the
probability that an index is erased by the channel. We use the mean square error (MSE) distortion measure and should therefore consider MSE-vs-p curves. However, the decoding MSE is completely determined by the central- and side-decoder MSEs computed on the test set,

MSE = ∑_{l=0}^{L} (L choose l) p^l (1−p)^{L−l} MSE_l    (19)

where MSE_l, for l = 0, …, L−1, is the average distortion observed when l descriptions are lost, and MSE_L = σ². Therefore, apart from a few cases of special relevance, we will provide only these synthetic performance parameters, actually the corresponding SNRs

SNR_l = 10 log_10 (σ² / MSE_l)    (20)
in a compact tabular form. As for the running times, they refer to a desktop PC with a 64-bit CPU (Intel(R) Core(TM) i7-2600 at 3.40 GHz), 8 GB of main memory, and Matlab version R2011b. All codebooks are designed using a training set of size 1024·N_0, with N_0 the cardinality of the central codebook. This is somewhat more than the usual rule of thumb, because we want to reliably assess the proposed technique, leaving speed-ups for future improvements. The same stopping condition is used for SOM and GLA codebook design, while for the BSA convergence is declared when no binary switch can further increase the average SNR. The major (independent) parameters of interest in the experiments are
- the number of descriptions, L;
- the number of bits per side description, R_S = log_2(N_S);
- the source pdf;
- the number of components per source vector, K;
- the redundancy, α;
where the redundancy is defined, here, as the relative excess coding rate, that is, α = (L·R_S − R_0)/R_0. Derived quantities are instead the number of bits for the central codebook, R_0 = L·R_S/(α + 1), and the coding rate in bit/sample, R = R_S·L/K. Exploring a parameter space this large is obviously a challenging task. To shed some light on the major behaviors of interest we proceed systematically, considering first a significant pivot experiment, and then studying the dependence of performance on just one of its parameters at a time, varying it in a suitable range.

4.2. Pivot experiment

In this experiment we consider a source that emits independent identically distributed (i.i.d.) Gaussian vectors with K = 2 uncorrelated components, and a multiple description VQ system with L = 2 descriptions, 64-codeword side codebooks (namely, R_S = 6), and redundancy α = 0.5. Therefore we have a 256-codeword central codebook (R_0 = 8), and a 6 bit/sample coding rate.
Table 1. Performance parameters for the Pivot experiment.

Measure         GLA     GLA+    SOM     SOM+
SNR_0^D (dB)    21.14   21.14   21.11   21.11
SNR_0 (dB)      21.04   21.04   21.07   21.07
SNR_1^D (dB)    1.13    13.78   12.36   13.85
SNR_1 (dB)      1.13    13.78   12.35   13.84
CPU-time (s)    29.1    51.8    13.8    22.6

(The central-codebook figures, SNR_0^D and SNR_0, are shared by GLA and GLA+, and by SOM and SOM+, since the optimization step does not modify the central codebook.)
Fig. 9. Overall encoding performance for the Pivot experiment.
Table 1 reports the synthetic performance parameters. Only in this table do we also show some figures that, to save space, will not be considered further. For example, we also consider straight GLA with no post-processing, which, of course, exhibits the same central distortion as GLA+ but a much higher side distortion, reflecting the lack of any useful structure. We also report the design SNRs (superscript D), computed on the training set, which are always very close to the corresponding quantities computed on the test set, confirming the correctness of the design and allowing us to discard this information in further tables. Turning to the GLA vs SOM comparison, we see that the SOM exhibits a slightly higher central SNR than the GLA, a small gain that will be consistently observed in all experiments. In terms of side SNR, instead, the SOM loses about 1.5 dB w.r.t. GLA+, a loss that is completely recovered by optimizing the codebook with the MDBSA (SOM+). On the other hand, the SOM is much faster than GLA+ (13.8 vs 51.8 s), so the small performance loss might be acceptable in view of the higher efficiency. To gain better insight into actual performance, Fig. 9 shows the overall SNR observed at the decoder as a function of the symbol erasure rate p. It seems obvious that the small loss in side SNR exhibited by the SOM does not entail a significant loss in the overall quality, suggesting that this strategy can be preferable when the cost of design begins to grow, and especially when the codebook sizes increase. It is therefore important to study this dependence.
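The curves of Fig. 9 follow directly from Eqs. (19)–(20). A small sketch (ours; for illustration it uses the SOM column of Table 1) that turns the per-loss SNRs into the overall decoding SNR at a given erasure rate p:

```python
import numpy as np
from math import comb

def overall_snr(snr_per_loss, p, var=1.0):
    """Overall decoding SNR for L descriptions, Eqs. (19)-(20).
    snr_per_loss[l] = SNR with l descriptions lost (l = 0, ..., L-1); losing all gives MSE = var."""
    L = len(snr_per_loss)
    mse = [var / 10 ** (s / 10) for s in snr_per_loss] + [var]   # MSE_l, plus MSE_L = sigma^2
    total = sum(comb(L, l) * p ** l * (1 - p) ** (L - l) * mse[l] for l in range(L + 1))
    return 10 * np.log10(var / total)

# SOM column of Table 1 (L = 2): SNR_0 = 21.07 dB, SNR_1 = 12.35 dB
for p in (1e-4, 1e-2, 1e-1):
    print(p, round(overall_snr([21.07, 12.35], p), 2))   # approaches SNR_0 as p -> 0
```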
4.3. Dependence on codebook size

In this experiment we keep the same setting as in the pivot case but change the codebook size. The side codebooks go from eight codewords (R_S = 3) up to 512 (R_S = 9), and the central codebook grows accordingly from 16 to 4096 codewords. It is worth underlining that for the small vectors considered here (K = 2) the largest codebooks correspond to a fairly high coding rate, with up to 9 bit/sample sent on the channel. As for distortion, the experimental results, shown in Table 2, provide little new information: both SNR_0 and SNR_1 grow quite regularly, with a relatively stable gap of about 3 dB, the former, and just above 3 dB, the latter, w.r.t. the theoretical bounds [1,35]. The SOM exhibits a small loss of about 1 dB w.r.t. GLA+ in terms of SNR_1 which, however, is completely eliminated with the optimization.
Table 2. Performance parameters (dB) as a function of R_S.

R_S   SNR_0 (GLA)   SNR_0 (SOM)   SNR_1 (GLA+)   SNR_1 (SOM)   SNR_1 (SOM+)
3     9.61          9.62          6.10           5.75          6.06
4     13.27         13.29         8.81           8.28          8.79
5     17.16         17.19         11.44          10.55         11.57
6     21.04         21.07         13.78          12.35         13.84
7     24.95         24.98         15.79          14.95         16.21
8     28.79         28.79         17.66          16.80         18.44
9     –             32.42         –              18.72         –

Table 3. Computational burden (CPU-time, s) as a function of R_S.

R_S   GLA design   GLA optim.   SOM design   SOM optim.
3     0.1          0.0          0.1          0.0
4     0.4          0.0          0.4          0.0
5     3.1          0.5          2.4          0.3
6     29.1         22.7         13.8         8.8
7     333.1        1172.5       89.5         322.8
8     3623.7       74,940.4     570.0        26,539.1
9     –            –            3786.2       –

Table 4. Performance parameters (dB) as a function of K.

K     SNR_0 (GLA)   SNR_0 (SOM)   SNR_1 (GLA+)   SNR_1 (SOM)   SNR_1 (SOM+)
1     39.66         39.67         25.92          26.01         26.14
2     21.04         21.07         13.78          12.35         13.84
4     10.11         10.13         6.34           5.86          6.34
8     4.91          4.92          2.66           2.05          2.65
16    2.36          2.36          1.07           0.66          1.07
32    1.14          1.13          0.45           0.32          0.47
Results in terms of CPU-time, reported in Table 3, are much more interesting, since we observe a very fast growth of the computational costs as the codebook resolutions increase beyond a few bits. The optimization phase (MDBSA), in particular, very quickly becomes the dominant burden, as its CPU-time grows by a factor close to 50 with each new bit of the side codebooks. The computational cost is somewhat smaller when starting from a SOM codebook, thanks to the better starting point, but still unaffordable in the long run. Indeed, we ran a single experiment (instead of 10 repetitions) for GLA+ and SOM+ for R_S = 8, and none for R_S = 9. However, we can keep using the SOM codebooks, without subsequent optimization, at least until R_S = 9 and possibly beyond, since their cost grows much more slowly than MDBSA's and also more slowly than GLA's. Table 2, on the other hand, shows that the central and side SNRs of the SOM codebooks keep growing regularly, at about the same pace as the theoretical bounds, suggesting that the same good performance can be obtained at higher bit-rates.

4.4. Dependence on other parameters for L = 2

In this subsection we present the results obtained by varying other parameters w.r.t. the pivot experiment, but always for the case of L = 2 descriptions. We will be somewhat less thorough than in the previous subsection, because most results are fairly predictable and do not point out phenomena of special relevance. Table 4 shows SNR results obtained for various vector dimensions, from K = 1 (multiple description scalar quantization) to K = 32. This case is especially important for actual applications, because VQ guarantees a larger and larger performance gain w.r.t. SQ as the vector dimension K grows. In our pivot experiment we considered K = 2 because this is the smallest non-trivial case, obtaining a coding rate R = R_S·L/K = 6 bit/sample, which indeed is relatively large for compression applications. However, if we consider typical values for VQ, such as K = 16, the coding rate soon drops to 0.75 bit/sample, which is perfectly reasonable for low bit-rate compression of images or video, for example.
Table 5. Performance parameters (dB) as a function of ρ.

ρ       SNR_0 (GLA)   SNR_0 (SOM)   SNR_1 (GLA+)   SNR_1 (SOM)   SNR_1 (SOM+)
0       21.04         21.07         13.78          12.35         13.84
0.5     21.66         21.70         14.43          12.90         14.56
0.9     24.62         24.71         17.57          16.12         17.42
0.99    29.45         29.69         22.27          22.12         22.35
0.999   34.27         34.46         24.93          25.16         25.28
Since the coding rate decreases as 1/K, we observe a sharp worsening of performance with increasing K, pretty well aligned with the theoretical bounds. In all cases, the performance of GLA+ and SOM+ is practically the same, while the SOM alone is just a little worse. As for the design times, they grow very slowly with increasing vector dimension: when K goes from 2 to 32, the CPU-time goes from 53 to 134 s for GLA+ and from 23 to 150 s for SOM+, a negligible dependence if compared to that on the codebook size. The case K = 1 is added for completeness but must be considered separately, since the codewords can be put in order and associated with the indexes of the diagonal IAM right away. Of course the performance is about the same as that of [8]. Starting from this last remark, it is interesting to consider the results of Table 5, concerning 2D Gaussian vectors with correlation index ρ going from 0 (the pivot experiment) to 0.999. As ρ approaches 1, the performance of the SOM in terms of SNR_1 gets closer and closer to that of GLA+, eventually surpassing it for ρ = 0.999, even though the GLA codewords are sorted for increasing mean, after design, in order to improve the subsequent optimization. In this case, the small further improvement granted by SOM+ is clearly immaterial. For this experiment, we also tested the SOM with a toroidal rather than diagonal IAM. Central distortions are always very close to those of the GLA, while SNR_1 values show a bimodal behavior. For low correlation, the toroidal matrix seems slightly preferable, e.g., 13.04 vs 12.35 dB before optimization and 14.04 vs 13.84 dB after optimization, for ρ = 0. On the contrary, a significant loss is observed at high correlations: for ρ = 0.999 the SNR_1 value drops from 25.16 to 19.25 dB, and the optimization recovers this loss only partially, reaching just 23.43 dB as opposed to 25.28. In view of the higher robustness, diagonal IAMs seem therefore preferable for the case L = 2, but other assignments could provide improvements in some cases, as shown by this example. Table 6, instead, shows results for ρ = 0 but different pdfs. Again, the performance of GLA+ and SOM+ appears to be always aligned, and just a little better than the SOM. In absolute terms, the SNRs decrease somewhat, as expected, going from the simplest uniform source to the most challenging mixture of Gaussians.
Table 6. Performance parameters (dB) as a function of the pdf.

Pdf        SNR_0 (GLA)   SNR_0 (SOM)   SNR_1 (GLA+)   SNR_1 (SOM)   SNR_1 (SOM+)
Uniform    24.11         24.12         16.37          15.10         16.32
Gaussian   21.04         21.07         13.78          12.35         13.84
Laplace    19.95         19.96         12.59          11.43         12.89
Mixture    22.65         22.69         15.00          13.52         15.29

Table 7. Performance parameters (dB) as a function of α.

α      SNR_0 (GLA)   SNR_0 (SOM)   SNR_1 (GLA+)   SNR_1 (SOM)   SNR_1 (SOM+)
0.25   25.72         25.59         10.32          8.58          10.35
0.50   21.04         21.07         13.78          12.35         13.84
0.75   17.70         17.72         14.12          13.27         14.03
It is worth pointing out that MDVQ guarantees a pretty good performance also for this source, which has uncorrelated but not independent components and would be difficult to handle for multiple-description coders based on linear methods. CPU-times are hardly worth mentioning; we only notice that GLA+ slows down a bit with more complex sources, reaching 80 s for the Laplace pdf. SOM and SOM+, instead, are very stable at about 14 and 24 s, respectively. Finally, with Table 7 we try to shed some light on the dependence on the redundancy level α. As in the previous cases, while all other parameters are kept fixed, three redundancy levels α are considered: 0.25, 0.5 and 0.75. This leads to very different balances between central and side distortions for the three MDVQ systems, which, given the same coding rate, should be more appropriate for reliable (α = 0.25) or unreliable (α = 0.75) channels. Whatever the situation, also in this case we observe a clear equivalence between the GLA+ and SOM+ designs, with the SOM showing a limited loss. In Fig. 10 we show the overall SNR observed at the decoder as a function of the symbol erasure rate p, and in all cases the gap between optimized and non-optimized SOM codebooks appears to be very thin. It is also interesting to observe that the high-redundancy design, at least for this choice of parameters, never grants a significant gain, even in the high SER region. Significant CPU-times are observed only for α = 0.25, when the central codebook is larger.
Fig. 10. Overall encoding performance for various levels of redundancy.

4.5. Sample results for L > 2

In this subsection we deal with the case of more than two descriptions, which can be of interest for very unreliable environments such as, for example, ad hoc wireless networks. Although the increased degrees of freedom would call for a wider experimental investigation, we will only present some sample experiments, for several reasons. First of all, most major trends highlighted for L = 2 do not change when more descriptions are available. The most compelling reason, however, is that the computational burden of our reference technique, GLA+MDBSA, becomes intolerably large even for experimental environments comparable to those easily explored for L = 2. Considering that, in addition, there are no simple theoretical bounds for this case, we limit ourselves to a few basic experiments, comparable to the pivot experiment of Section 4.2, for which a meaningful comparative analysis is still feasible. In Table 8 we show results for L = 3. As in the pivot experiment, we consider i.i.d. Gaussian vectors with K = 2 uncorrelated components, with a coding rate of 6 bit/sample obtained by using three symmetric 4-bit side codebooks.
Table 8. Performance parameters for L = 3 and various values of R_0.

# bits     Technique   SNR_0   SNR_1   SNR_2   CPU-time (s)
R_0 = 8    GLA+        21.03   19.59   2.74    5250
           SOM         21.08   21.08   0.27    13
           SOM+        21.08   19.43   2.99    6570
R_0 = 9    GLA+        23.96   15.11   1.23    4418
           SOM         24.01   16.97   1.12    52
           SOM+        24.01   18.27   1.33    12,599
R_0 = 10   GLA+        26.89   9.15    1.12    2044
           SOM         26.81   11.43   1.22    253
           SOM+        26.81   12.52   1.33    9875
The central codebook has 256 codewords (R_0 = 8, α = 0.5), like in the pivot experiment, or else 512 (R_0 = 9, α = 0.33) or 1024 (R_0 = 10, α = 0.2) codewords. As in all other experiments, the GLA and SOM central codebooks guarantee a very close performance in the absence of losses. Turning to SNR_1, we observe for the case R_0 = 8 a quite peculiar situation, since the SOM guarantees a much better performance than the optimized codebooks. This happens because 2·R_S = R_0 in this case, namely two descriptions are sufficient to single out the central codeword unambiguously, and the toroidal IAM used for the SOM enacts precisely this kind of mapping (see again Fig. 8). On the down side, since in this case there is just one codeword on each line, no indirect interaction between codewords takes place, and in fact the value of SNR_2 is much smaller than in the other cases. Using the BSA afterwards (in this case, the optimization phase depends on the estimated SER p; we put p = 0.05, corresponding to a rather unreliable channel, which is reasonable for this kind of system), some codewords change position in the IAM, thus reducing SNR_1 while increasing SNR_2. After optimization, the GLA and SOM codebooks are almost identical, and in fact their overall coding performance curves, shown in Fig. 11, are indistinguishable, while the SOM curve exhibits a small loss in the high SER region. In the figure we also show results obtained by the SOM with a diagonal IAM, obviously not competitive.
The small performance increase guaranteed by the BSA comes at the price of a huge computational burden. Even setting the maximum number of BSA iterations to 1000, the optimization of such a small codebook required over 5000 s of CPU, to be compared with the SOM's 13 s. For this reason, the figures reported in Tables 8 and 9 refer to a single run of the experiment rather than 10 repetitions.

Fig. 11. L = 3. Overall encoding performance for R_0 = 8.

In the cases R_0 = 9 and R_0 = 10, where the toroidal IAM becomes fully sensible, the SOM turns out to be even more convenient, both in terms of CPU-time and encoding quality, with a gain of about 2 dB in SNR_1 w.r.t. GLA+. SOM+, starting from a much better initial point, grants a further small gain in performance, although at the price of an even higher computational burden. The overall performance curves for the case R_0 = 10, shown in Fig. 12, confirm this analysis, pointing by far to the SOM as the preferable design technique.

Fig. 12. L = 3. Overall encoding performance for R_0 = 10.

Finally, in Table 9 we show some results for L = 4, using the same general setting as before except for the four side codebooks, which use three bits each instead of four, leading to the usual 6 bit/sample coding rate. We only report results for the case R_0 = 10, for which the toroidal IAM guarantees indirect interaction between codewords at design time. Again, the SOM outperforms GLA+ both in terms of quality (see also Fig. 13) and CPU-time, while subsequent optimization guarantees just a small improvement in quality at a huge increase in the computational burden.

Table 9. Performance parameters for L = 4 and R_0 = 10.

Technique   SNR_0   SNR_1   SNR_2   SNR_3   CPU-time (s)
GLA+        26.89   10.95   1.74    0.46    3653
SOM         26.86   12.45   1.79    0.64    240
SOM+        26.86   15.37   2.13    0.77    22,628

Fig. 13. L = 4. Overall encoding performance for R_0 = 10.
5. Conclusions and future research

We have presented a new tool for the design of low-resolution multiple description VQ codebooks, based on a suitable version of the self-organizing feature maps. The proposed algorithm is both conceptually and computationally simple, it can be readily applied to any source, and it allows one to deal equally well with two or more descriptions. Experiments show that its performance is comparable to that of an MDVQ designed by resorting to direct optimization techniques, but with a much reduced computational cost. Although we spent considerable effort to provide insight into the merits of this approach, a number of issues remain to be examined, such as the use of asymmetric codebooks, the trade-off between central and side distortion, or the use of time-saving design strategies. On the other hand, our goal was to provide a proof of concept of this approach, leaving to further studies its more thorough characterization and, especially, its fine-tuning to the needs of actual MDC systems, where replacing scalar quantization with vector quantization can help improve the performance with minimal complexity and design effort.
Giovanni Poggi received the Laurea degree in electronic engineering from the University Federico II of Naples, Italy, in 1988. He is currently a Professor of Telecommunications with the Department of Electrical Engineering and Information Technology, University Federico II of Naples, Italy, and Coordinator of the Telecommunication Engineering School. His current research interests are focused on statistical image processing, including compression, restoration, and segmentation of remote-sensing images, both optical and SAR, and the detection of image forgeries. Prof. Poggi has been an Associate Editor for the IEEE Transactions on Image Processing and Elsevier Signal Processing.
Davide Cozzolino received the Laurea degree in computer engineering from the University Federico II of Naples, Italy, in 2011. He is currently a Ph.D. student in electronic and telecommunications engineering at the University of Naples. His research interests include image processing, particularly image compression and the detection of image forgeries.
Luisa Verdoliva received the Laurea degree in telecommunications engineering and the Ph.D. degree in information engineering from the University of Naples Federico II, Naples, Italy, in 1998 and 2002, respectively. She is currently a Researcher with the Department of Electrical Engineering and Information Technology, University of Naples Federico II. Her current research interests include image processing, particularly denoising and compression of remote sensing images.