Signal Processing 17 (1989) 353-364, Elsevier Science Publishers B.V.
THE PERCEPTUAL RELEVANCE OF SCALE-SPACE IMAGE CODING
J.B.O.S. MARTENS and G.M.M. MAJOOR
Institute for Perception Research, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Received 25 May 1988 Revised 11 November 1988
Abstract. In our research on image coding algorithms we have adopted the following starting points. First, the processing performed by coding algorithms should match as closely as possible what we know about the human visual system. Second, because acceptable objective criteria are lacking, proper evaluation of coding algorithms, as well as of their parameter settings, requires perceptual experiments. In this paper we summarize the so-called scale-space model and describe its application to image coding. In the scale-space model an image is passed through Gaussian filters of decreasing bandwidth. The variation between successively filtered responses is very systematic, so that little information is needed to pass between them. Starting from a low-resolution version of the original image, we make a prediction for a higher-resolution version. Only the prediction errors need be transmitted to recover this higher-resolution picture. The process is repeated at a number of resolutions (called scales) in order to arrive at the original image. For data-reduction purposes, several approximations of these prediction errors can be studied. Evaluation of the resulting coded images is done by means of perceptual experiments. It is also shown in this paper that a one-to-one correspondence can be established between the different stages of the scale-space coder and a well-known model of the human visual system that is based on psychophysical data.
Zusammenfassung (translated from the German). In our investigations of image coding algorithms we have started from the following points. First, the processing performed by coding algorithms should come as close as possible to what we know about the human visual system. Second, because acceptable objective criteria are lacking, perceptual experiments are required for the proper evaluation of coding algorithms and of their parameter settings. In this contribution we summarize the so-called scale-space model and describe its application to image coding. In the scale-space model an image is passed through Gaussian filters of decreasing bandwidth. The differences between successively filtered responses are very systematic, so that little additional information is needed to get from one to the next. Starting from a low-resolution version of the original image, a prediction is made for a version of higher resolution. Only the prediction errors have to be transmitted in order to recover the higher-resolution image. The process is repeated for a certain number of resolutions (called scales) in order finally to arrive at the original image. For the purpose of data reduction, several approximations of these prediction errors can be investigated. The resulting coded images are evaluated by means of perceptual experiments. It is also shown in this contribution that a one-to-one correspondence can be established between the different stages of the scale-space coder and a well-known model of the human visual system that is based on psychophysical data.
Résumé (translated from the French). In our research on image coding algorithms we have adopted the following principles. First of all, the processing carried out by a coding algorithm must agree as closely as possible with what we know of the human visual system. Second, owing to the lack of acceptable objective criteria, the correct evaluation of coding algorithms, as well as of the chosen parameters, calls for perception experiments. In the present article we summarize the so-called "scale-space" model and describe its application to image coding. According to this model, an image passes through Gaussian filters of decreasing bandwidth. The variation of the successively filtered responses is very systematic, so that very little information is needed to pass from one to the other. Starting from a low-resolution version of the original image, we predict what a higher-resolution version will be. It suffices to transmit the prediction errors in order to recover this higher-resolution image. The process is repeated for a certain number of resolutions (or "scales") in order to arrive at the original image. With a view to reducing the amount of data, several approximations of these prediction errors can be studied. The resulting coded images are evaluated with the aid of perception experiments. It is also shown that a one-to-one correspondence can be established between the different stages of the "scale-space" coder and a well-known model of the human visual system, on the basis of psychophysical data.

0165-1684/89/$3.50 © 1989, Elsevier Science Publishers B.V.

J.B.O.S. Martens, G.M.M. Majoor / Scale-space image coding
Keywords. Image coding, scale space, visual perception, human visual model, multiresolution, image pyramid.
1. Introduction

In a recent review of image-coding algorithms, Kunt et al. [9] state that only systems that incorporate properties of the human visual system will be able to achieve the high data compression factors (10 and higher for static images) that are currently being aimed at. Four examples of recent coding algorithms [4,6,7,18] based on this philosophy have been presented. They have in common that information at different spatial frequencies is processed separately. The motivation for such an approach is easily understood. Original images are always sampled on a grid, the mesh size of which is dictated by the highest frequency present in the original analog image. However, these high frequencies occur at relatively few locations in the image (i.e. at the edges). In the remaining regions, data compression is possible because of the lower frequency content or, equivalently, the higher spatial correlation. To separate regions of different spatial frequency content, processing in distinct frequency bands is required.

Contemporary coding algorithms often adopt an a priori subdivision of the frequency domain. This subdivision is mostly dictated by technical convenience rather than by perceptual arguments. Recently, an alternative technique, called scale-space filtering, has been proposed for the study of signals under varying bandwidth conditions [3,19-21]. A given signal is passed through Gaussian filters of continuously varying bandwidth. The variation between successively filtered responses is very systematic, and important image features, such as edges, can be made explicit in this representation [2,13]. Moreover, several striking correspondences between important scale-space characteristics and properties of the human visual system, such as size invariance and the presence of multiple spatial frequency-tuned channels, have been pointed out by Koenderink [8].

The purpose of this paper is to make this correspondence more explicit. In the next section, the continuous formulation of scale space is briefly reviewed. In Section 3, we discuss the implications of spatial sampling and show that current image pyramids are suboptimal, because they use identical filters to perform the decimation and the interpolation operations. The structure of an image coder based on the scale-space model is subsequently derived. We also present some results of a subjective evaluation experiment that was performed with this coder in order to estimate the quantization thresholds. We know of only one other recent paper, by Watson [15], in which such a systematic evaluation of an image coding scheme has been attempted. In the final section, we review the contrast matching model of Swanson, Wilson and Giese [14] and show its relation to the scale-space model.
2. Continuous scale space

In a scale-space representation, an $n$-dimensional input signal $L(x)$, where $x = [x_1, x_2, \ldots, x_n]^T$ is the position vector, is convolved with Gaussian kernels

$$D(x, s) = \pi^{-n/2}\, e^{-ns} \exp(-x^T x / e^{2s}), \tag{1}$$

with spread $e^{s}/\sqrt{2}$ ranging from zero to infinity ($-\infty < s < \infty$). The frequency characteristic of such a filter is also Gaussian and given by

$$d(\omega, s) = \exp(-\omega^T \omega\, e^{2s} / 4), \tag{2}$$

where $\omega = [\omega_1, \omega_2, \ldots, \omega_n]^T$ is the vector of angular frequencies. On the basis of the argumentation in Koenderink [8], i.e. that the representation should have no preferred scale, we have adopted the scale parameter $s$ to characterize the Gaussian filters. The scale-space representation of the signal $L(x)$ is then defined as the $(n+1)$-dimensional signal

$$L(x, s) = L(x) * D(x, s), \tag{3}$$

for $-\infty < s < \infty$. An important property of this representation is that it satisfies the following heat equation:

$$L_s(x, s) = \tfrac{1}{2}\, e^{2s}\, \Delta L(x, s), \tag{4}$$

where

$$\Delta L(x, s) = \sum_{i=1}^{n} \frac{\partial^2 L(x, s)}{\partial x_i^2} \tag{5}$$

is the $n$-dimensional Laplacian and $L_s$ is the partial derivative with respect to $s$. Equation (4) links the variations along the spatial dimensions to the variation along the scale axis. This implies that, at least in a purely mathematical sense, the signal $L(x, s_0)$ at any given scale $s_0$ completely determines the entire representation (for all scales). Indeed, transforming from $L(x, s_0)$ to $L(x, s_1)$ is accomplished by filtering $L(x, s_0)$ with

$$h(\omega, s_0 \to s_1) = \exp\bigl(-\omega^T \omega\, (e^{2 s_1} - e^{2 s_0}) / 4\bigr). \tag{6}$$

This is however only true if $L(x, s_0)$ is known for all positions and with unlimited precision, especially if $s_1$ is smaller than $s_0$ (because the filter is unstable in this case). In practical applications, the scale-space signals are sampled. The important question to be answered is therefore: how much of the information contained in the samples of $L(x, s_0)$ can be recovered from the samples of $L(x, s_1)$? This interpolation problem was considered by Martens [11] in a more general framework, and the main results are summarized in the next section.
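The relation between Eqs. (2) and (6) can be checked numerically. The following is a minimal sketch for the one-dimensional case, assuming NumPy; the function names and the two scale values are illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_response(omega, s):
    """Frequency characteristic of Eq. (2) for n = 1: exp(-omega^2 e^{2s} / 4)."""
    return np.exp(-(omega ** 2) * np.exp(2.0 * s) / 4.0)

def transfer_response(omega, s_from, s_to):
    """Filter of Eq. (6) that takes L(x, s_from) to L(x, s_to)."""
    return np.exp(-(omega ** 2) * (np.exp(2.0 * s_to) - np.exp(2.0 * s_from)) / 4.0)

omega = np.linspace(-3.0, 3.0, 61)
s_fine, s_coarse = -0.5, 0.4  # illustrative scale values

# Going to a coarser scale: filtering at s_fine followed by the transfer
# filter is the same as filtering directly at s_coarse.
cascade = gaussian_response(omega, s_fine) * transfer_response(omega, s_fine, s_coarse)
cascade_matches = bool(np.allclose(cascade, gaussian_response(omega, s_coarse)))

# Going to a finer scale: the gain exceeds one and grows with |omega|,
# which is the instability mentioned in the text.
refine_gain = transfer_response(omega, s_coarse, s_fine)
max_refine_gain = float(refine_gain.max())

print(cascade_matches, max_refine_gain > 1.0)
```

The growing gain in the refining direction is why noise or finite precision in $L(x, s_0)$ prevents exact recovery of finer scales in practice.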
3. Discrete scale space

Given a signal $L_0(x)$ and a window function $V(x)$, we can define the signal

$$L_1(x) = L_0(x) * D(x), \tag{7}$$

with

$$D(x) = V^2(x), \tag{8}$$

as a filtered version of the original. Suppose that the signal $L_1(x)$ is sampled on a grid with spacing $T$. It was shown by Martens [11] that the sample $L_1(kT)$ minimizes the quadratic error measure

$$\int V^2(x - kT)\,[L_1(kT) - L_0(x)]^2\, dx, \tag{9}$$

and is hence an optimum estimate for the mean value of $L_0(x)$ in the neighbourhood of coordinate $kT$. A prediction for $L_0(x)$ can be constructed by approximating $L_0(x)$, within the window $V(x - kT)$, by its mean value $L_1(kT)$. This is equivalent to interpolating the samples $L_1(kT)$ with the filter

$$P(x) = V(x) / W(x), \tag{10}$$

where

$$W(x) = \sum_k V(x - kT) \tag{11}$$

is a spatial weighting function which must be different from zero for all positions $x$. This weighting function guarantees that an input consisting of a constant value at all sample coordinates $kT$ gives rise to a constant output function. The above decimation/interpolation process is illustrated in Fig. 1.

Fig. 1. Optimum decimation and interpolation.

Vol. 17, No. 4, August 1989

The most obvious difference with current practice in pyramid coding is that the decimation and interpolation filters are different. For instance, for the case of the window function

$$V(x) = [\,1-2a \quad 1 \quad 4a \quad 1 \quad 1-2a\,], \tag{12}$$

the optimum interpolation function is given by

$$P(x) = \tfrac{1}{2}\,[\,1-2a \quad 1 \quad 4a \quad 1 \quad 1-2a\,], \tag{13}$$

assuming that the decimation factor $T$ is equal to two. This function is identical to the one used by Burt and Adelson [4]. The optimum decimation filter is however different from the interpolating filter and given by

$$D(x) = \frac{1}{A}\,[\,(1-2a)^2 \quad 1 \quad (4a)^2 \quad 1 \quad (1-2a)^2\,], \tag{14}$$

where

$$A = 2(1-2a)^2 + 2 + (4a)^2 = 4(6a^2 - 2a + 1) \tag{15}$$

is a normalization factor. The fact that this choice of filters results in an improvement can be easily verified by a simulation on natural images. We take the difference between the original image and an approximated image, derived by decimation followed by interpolation, as shown in Fig. 1 (with $T = 2$). The entropy of the difference image is plotted in Fig. 2 as a function of the parameter $a$ for two different input images. The solid lines show the results when the decimation and interpolation filters are equal, while the dotted lines correspond to the case where these filters are unequal and given by the above equations. The modified filters demonstrate an improvement over the original filters in all cases.

Fig. 2. Entropy in bits of difference image for original and modified filters for two separate input images.

Application of the above results to scale space is straightforward, provided we take the signals $L_0(x)$ and $L_1(x)$ equal to typical scale-space signals $L(x, s_0)$ and $L(x, s_1)$ respectively, with $s_0 < s_1$. The corresponding window function

$$V(x) = \pi^{-n/4}\, e^{-n\bar{s}/2} \exp(-x^T x / 2e^{2\bar{s}}) \tag{16}$$

is then equal to a Gaussian function with spread

$$e^{\bar{s}} = \sqrt{e^{2 s_1} - e^{2 s_0}}, \tag{17}$$

because filtering $L(x, s_0)$ with $V^2(x)$ results in the signal $L(x, s_1)$. The limiting condition in this resampling process is that the weighting function $W(x)$ must be different from zero for all positions. Assuming a square sampling grid with spacing $T$, this condition can be satisfied by an adequate choice of the sampling parameter $\tau$, which is defined by

$$\tau = T / e^{\bar{s}}. \tag{18}$$

The contrast of the periodic weighting function for the 1-D case was also calculated by Martens [11] and is plotted in Fig. 3. We conclude that the sampling parameter $\tau$ must be smaller than 6, because otherwise the weighting function approaches values near zero. Note also that for values of $\tau$ up to 3, the weighting function is well modeled by a sine modulation (i.e. only the first harmonic). The $n$-dimensional weighting function is the product of $n$ one-dimensional weighting functions, so that the same range of parameter values for $\tau$ can be used in the multidimensional case.

Fig. 3. Contrast of weighting function as a function of $\tau$. The dotted line shows the contrast of the first harmonic.
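The five-tap example of Eqs. (12)-(15) can be reproduced in a few lines. The sketch below assumes NumPy, a one-dimensional signal, zero padding at the borders and the illustrative value $a = 0.4$; the helper names are hypothetical and are not taken from the paper or from [11].

```python
import numpy as np

def window(a):
    """Window V(x) of Eq. (12), taps at offsets -2..2."""
    return np.array([1 - 2 * a, 1.0, 4 * a, 1.0, 1 - 2 * a])

def weighting(v, T=2):
    """W(x) of Eq. (11), evaluated at each of the T sampling phases."""
    offsets = np.arange(len(v)) - len(v) // 2
    return np.array([v[offsets % T == phase].sum() for phase in range(T)])

def interpolation_filter(v, T=2):
    """P(x) = V(x) / W(x), Eq. (10)."""
    offsets = np.arange(len(v)) - len(v) // 2
    return v / weighting(v, T)[offsets % T]

def decimation_filter(v):
    """D(x) = V^2(x), Eq. (8), normalized to unit sum as in Eq. (14)."""
    return v ** 2 / np.sum(v ** 2)

def decimate(signal, d, T=2):
    return np.convolve(signal, d, mode="same")[::T]

def interpolate(samples, p, T=2):
    up = np.zeros(len(samples) * T)
    up[::T] = samples  # zero-stuffing between the retained samples
    return np.convolve(up, p, mode="same")

a = 0.4  # illustrative kernel parameter
v = window(a)
w = weighting(v)
p = interpolation_filter(v)
d = decimation_filter(v)

# A constant input should come back as a constant away from the borders.
recon = interpolate(decimate(np.full(32, 3.0), d), p)
print(w, p, d, recon[6:26])
```

For $T = 2$ the weighting function equals 2 at both sampling phases, which is why $P(x)$ reduces to $V(x)/2$ in Eq. (13), while the decimation filter uses the squared taps.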
4. Scale-space coder

The application of the discrete scale-space formalism in a predictive coding strategy is straightforward. The resulting pyramid coding scheme is shown in Fig. 4.

Fig. 4. Scale-space coder/decoder.

The original image $L_0$, with sample spacing $T_0$, is the lowest level of the image pyramid. A number of discrete scale-space signals $L_i$ are derived from $L_0$ by Gaussian filtering with $D(x, s_i)$, followed by sampling with spacing $T_i$. In a discrete implementation, the filter coefficients are the samples of band-limited Gaussians. The scale values $s_1, s_2, \ldots$ are the parameters of the coder. In practice, we take the sampling distances $T_i$ to be multiples of $T_0$. This is technically convenient, because the subsampling operation then reduces to a simple decimation process. If we impose that $T_i = 2^i T_0$ [4] and furthermore take the sampling parameter $\tau$ to be constant, the scale factors are determined by

$$e^{s_i} = d_i\, \frac{2^i T_0}{\tau}, \tag{19}$$

where

$$d_i^2 = \tfrac{4}{3}\,(1 - 4^{-i}), \tag{20}$$

for $i = 1, 2, \ldots$ We note that the scale step $s_i - s_{i-1}$ converges to $\ln 2$. This is in agreement with psychophysical data. These data support the existence in the human visual system of a number of distinct spatial frequency-tuned channels approximately one octave apart [16,17].

The Gaussian image at the top, $L_n$, has a large sampling distance and can be transmitted with a limited bandwidth. By means of the interpolation operator $P_n$, a prediction can be made for the signal $L_{n-1}$, one level below $L_n$. Only the prediction error needs to be transmitted. Data compression can be accomplished by quantizing this prediction error before transmission. At the receiver, this prediction error is added back to the prediction for $L_{n-1}$. Recursive application of this technique at a number of different levels (or equivalently, different spatial scales) leads to a coded image $\hat{L}_0$ that is an approximation of the original $L_0$. The degree of quantization at the different levels determines not only the data reduction, but also the quality of the resulting image.

There are two important aspects in which this scale-space coder differs from existing pyramid coders. First, we use broadly tuned Gaussian filters in our coder, among other reasons because these filters have characteristics that correlate well with existing data on the human visual system. Second, distinct decimation and interpolation filters are used to get an optimum tuning of the two processes.

If we apply the above scale-space coder to natural images, a number of observations can be made. First, it turns out that the prediction error signals have a probability distribution which is centered around zero and is very peaked. Second, the standard deviation and the entropy of the prediction error signal usually increase with increasing scale. In Fig. 5, we have plotted the entropy of the largest difference image as a function of the sampling parameter $\tau$ for a number of natural images.
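The scale values implied by Eqs. (19) and (20) are easy to tabulate. The sketch below uses only the Python standard library; it assumes the form $e^{s_i} = d_i\, 2^i T_0/\tau$ with $d_i^2 = \tfrac{4}{3}(1 - 4^{-i})$, and the choices $\tau = 2$, $T_0 = 1$ and six levels are illustrative.

```python
import math

def scale_values(levels, tau, T0=1.0):
    """Scale values s_1..s_levels for a constant sampling parameter tau."""
    values = []
    for i in range(1, levels + 1):
        d_squared = (4.0 / 3.0) * (1.0 - 4.0 ** (-i))  # Eq. (20)
        values.append(math.log(math.sqrt(d_squared) * 2 ** i * T0 / tau))  # Eq. (19)
    return values

s = scale_values(levels=6, tau=2.0)
steps = [later - earlier for earlier, later in zip(s, s[1:])]

for i, step in enumerate(steps, start=2):
    print(f"s_{i} - s_{i - 1} = {step:.5f}")
print(f"ln 2      = {math.log(2.0):.5f}")
```

The steps decrease monotonically towards $\ln 2$, i.e. towards a spacing of one octave between successive scales.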
For values of $\tau$ between 3 and 6, we see that the scale-space coder performs better than the Laplacian pyramid coder, even if the decimation filter in this coder is replaced by the optimum choice presented in the previous section. If $\tau$ is larger than 6, then the weighting function $W$ approaches values close to zero, and consequently the scale-space coder breaks down.

Fig. 5. Entropy in bits of difference image at scale $s_0$ as a function of the sampling parameter $\tau$ for two input images. The horizontal lines indicate the lowest entropy that can be obtained with the Laplacian pyramid (see Fig. 2).

Note that, although we use Gaussian filters with a frequency characteristic that is broadly tuned, the overall system performs better than the Laplacian coder with its more finely tuned frequency filters. This implies that care must be taken in applying arguments in the frequency domain to the efficiency of a signal decomposition in the spatial domain.

The prediction error images are usually coded by means of pointwise quantization. It is well known that for signals $E$ with a non-uniform probability distribution $p(E)$ the optimum quantization, i.e. the one leading to the minimum mean square error, is non-uniform [12]. A useful approximation to the optimum quantizer was described by Algazi [1]. It consists of a pointwise nonlinearity $S$ of the form

$$S(E) = k_1 \int p(E)^{1/3}\, dE + k_2, \tag{21}$$

where $k_1$ and $k_2$ are arbitrary constants, followed by a uniform quantizer $Q$. At the receiver, a nonlinear mapping $R = S^{-1}$ is applied to compensate for $S$. The amplification factor $k_1$ is determined by the input range of the uniform quantizer. The offset $k_2$ is chosen such that $S(0) = 0$. In Fig. 6 we show the nonlinear mapping that was derived for the prediction error at scale $s_0$. A training set consisting of natural images was used to approximate the probability distribution of the prediction error signal.

Fig. 6. Nonlinear input mapping derived from the statistics of the prediction error at scale $s_0$.
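The companding quantizer of Eq. (21) can be sketched numerically. The example below assumes NumPy and, purely for illustration, a Laplacian density for the prediction error; the paper estimates $p(E)$ from a training set of natural images instead. The grid, the density parameter and the quantizer step are hypothetical choices.

```python
import numpy as np

def build_compander(grid, density, out_range=1.0):
    """Tabulate S(E) = k1 * integral of p(E)^(1/3) dE + k2 on `grid`."""
    g = density ** (1.0 / 3.0)
    # Cumulative trapezoidal integral of p^(1/3).
    table = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(grid))))
    table -= np.interp(0.0, grid, table)          # k2: enforce S(0) = 0
    table *= out_range / np.max(np.abs(table))    # k1: fit the quantizer input range
    return table

def compress(e, grid, table):
    """Nonlinearity S applied before the uniform quantizer."""
    return np.interp(e, grid, table)

def expand(y, grid, table):
    """Receiver mapping R = S^{-1} (the table is strictly increasing)."""
    return np.interp(y, table, grid)

def quantize(y, step):
    """Uniform quantizer Q."""
    return step * np.round(y / step)

grid = np.linspace(-128.0, 128.0, 2049)
b = 4.0                                        # assumed Laplacian parameter
density = np.exp(-np.abs(grid) / b) / (2.0 * b)
table = build_compander(grid, density)

step = 0.1
levels = expand(step * np.arange(10), grid, table)  # reconstruction levels for E >= 0
spacing = np.diff(levels)

errors = np.array([-50.0, -3.0, 0.0, 2.5, 40.0])
decoded = expand(quantize(compress(errors, grid, table), step), grid, table)
print(levels.round(2), decoded.round(2))
```

The reconstruction levels are dense near zero, where the peaked distribution concentrates most prediction errors, and become progressively coarser for large errors.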
5. Coding results

A software simulation of the coding scheme of Fig. 4 has been realized, and a number of images of natural scenes were coded. For historical reasons, the sampling parameter $\tau$ was taken equal to 2 in our experiments, although we realize that higher values of $\tau$ result in a better coder performance. The original images were digitized with 8 bits/pixel on a grid of 512 by 512 pixels. Down to 1 bit/pixel (compression ratio 8), the quality of the coded images is usually very close to that of the originals. Bit rates below 0.5 bit/pixel (compression ratio above 16) imply too coarse a quantization, or even deletion, of the prediction error image on the smallest scale, and consequently always result in images that are noticeably unsharp. In the intermediate region, different degrees of quantization noise and unsharpness are present.

Because no acceptable objective criteria for evaluating the quality of (coded) images are available, the best alternative at this point consists in measuring image quality by means of psychophysical techniques. The simplest experiment that can be carried out is a detection experiment, where the coded images are compared with the original image. This corresponds to the case that no visible coding artefacts are accepted, and is referred to as perceptually lossless coding by Watson [15].

In the experiment described below, the coding consisted in uniformly quantizing the prediction error images at scales $s_0$ and $s_1$ (hence, without including the nonlinearities $S$ and $R$ in Fig. 4). The quantization steps were $q_{s_0}$ and $q_{s_1}$ respectively, and measurements were performed for different ratios $q_{s_1}/q_{s_0}$. From the point of view of coding, the quantization effects at the smallest scales are the most interesting to study. Indeed, signals at higher scales contribute much less to the overall number of bits. The viewing conditions were according to CCIR Recommendation 500 [5], with the exception that the peak luminance was increased to 115 cd/m$^2$. The viewing distance, equal to six times the monitor height, corresponded to a viewing angle of 1 arcmin for the sample spacing and approximately nine degrees for the entire screen. A two-alternative forced-choice experiment, using the transformed up-down method of Levitt [10], was used to measure thresholds. The outputs of such an experiment are the quantization parameter values for which the coded image can be distinguished from the original in 79% of the presentations. The two images were presented simultaneously, one on each half of the screen.

Results for two subjects and three natural images are shown in Fig. 7. The different scenes that were used in the experiments are illustrated in Fig. 8. The images on the left are the originals, while the images on the right are coded with quantization parameters above threshold (at least for the viewing conditions used in the experiment). In order to enable a better comparison, enlarged parts of these images are shown in Fig. 9. Although the detection thresholds differ between subjects and scenes, two conclusions can be drawn from these measurements.
First, the positions of the thresholds in the quantization plane indicate that the quantization errors on the different scales are largely independent. If this were not the case, then the threshold along the diagonal in the quantization plane should be much closer to the origin. As far as we know, this is the first time that such an independence between quantization effects has been demonstrated quantitatively. Note that this independence is also an argument for the perceptual relevance of the approach, because it indicates that the noise contributions of the quantizations at the different scales do not influence each other, and are hence probably perceived by different mechanisms in the visual pathway.

Fig. 7. Detection thresholds for 3 different natural images and 2 subjects. The bars indicate the 95% confidence intervals. [Panels: (a) subject HR, (b) subject BM; scenes: wanda, harbour, tower; horizontal axis $q_0$.]
Fig. 8. Original and coded images. (a) Wanda with $q_0 = 20$, $q_1 = 20$. (b) Tower with $q_0 = 15$, $q_1 = 15$. (c) Harbour with $q_0 = 15$, $q_1 = 15$.

Second, contrary to what has been reported in the literature [4,15], we find that, at least at threshold, the sensitivity to quantization errors at the different scales is approximately equal. Most studies up to now include only subjective impressions of the developed coding schemes and insufficient information about the viewing and display conditions, so that comparisons are difficult. In the measurements of Watson, two differences in the experimental conditions might be important. First, the screen luminance was proportional to the pixel value. In many practical applications where monitors are used, this situation is unrealistic. Indeed, monitors have a nonlinear characteristic which results in a screen luminance that is proportional to the pixel value to the power $\gamma$, with $\gamma$ close to 3. We have however repeated our experiments with a linearized display and found essentially the same results as in Fig. 7, so that we can exclude the monitor characteristic as a determining factor in these experiments. We are currently performing experiments using nonuniform quantization in order to determine if this latter difference in experimental conditions can account for the discrepancies.

Fig. 9. Enlarged parts of the images in Fig. 8.

One of the motivations for including our psychophysical measurements in this paper is to demonstrate that more extensive experiments, under controlled and reproducible viewing conditions, are required to decide on the perceptual relevance of the quantization errors at the different scales. It is our viewpoint that this perceptual evaluation of image coding schemes has been neglected for too long. The fact that thresholds are scene-dependent demonstrates the importance of using identical, and well-selected, image material for comparing different coding schemes. Even during experiments, thresholds for one specific image varied, as subjects became more familiar with the experimental conditions and found new image features on which to base their judgement. The results presented in Fig. 7 should therefore be considered as worst case.
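The coding condition used in the threshold experiment of this section, uniform quantization of the prediction errors at the two finest scales, can be illustrated with a small one-dimensional pyramid. This is a sketch under stated assumptions rather than the authors' implementation: it assumes NumPy, substitutes the five-tap filters of Section 3 (with an illustrative $a = 0.4$) for the Gaussian filters of the coder, computes the prediction errors from unquantized signals as in the Laplacian pyramid [4], and uses a synthetic test signal. All function names are hypothetical.

```python
import numpy as np

A_PARAM = 0.4  # illustrative kernel parameter a
V = np.array([1 - 2 * A_PARAM, 1.0, 4 * A_PARAM, 1.0, 1 - 2 * A_PARAM])
P = V / 2.0                    # interpolation filter, Eq. (13)
D = V ** 2 / np.sum(V ** 2)    # decimation filter, Eq. (14)

def decimate(x):
    return np.convolve(x, D, mode="same")[::2]

def interpolate(x):
    up = np.zeros(2 * len(x))
    up[::2] = x
    return np.convolve(up, P, mode="same")

def quantize(e, step):
    return step * np.round(e / step)

def encode(l0, q0, q1):
    """Return the top image and the quantized prediction errors at s1 and s0."""
    l1 = decimate(l0)
    l2 = decimate(l1)
    e1 = quantize(l1 - interpolate(l2), q1)
    e0 = quantize(l0 - interpolate(l1), q0)
    return l2, e1, e0

def decode(l2, e1, e0):
    """Rebuild the image by adding each prediction error to its prediction."""
    l1 = interpolate(l2) + e1
    return interpolate(l1) + e0

rng = np.random.default_rng(0)
original = 128.0 + np.cumsum(rng.normal(0.0, 5.0, size=64))  # synthetic scan line

q0, q1 = 15.0, 15.0
coded = decode(*encode(original, q0, q1))
max_error = float(np.max(np.abs(coded - original)))
print(max_error, (q0 + q1) / 2.0)
```

In this open-loop arrangement the reconstruction error is the sum of the quantization error at $s_0$ and the interpolated quantization error at $s_1$, so it cannot exceed $(q_0 + q_1)/2$; the two contributions enter additively, which is the situation probed by varying the two step sizes independently.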
6. Perceptual relevance

In the introduction we stated that there are two different ways in which human visual perception enters into our work on image coding. One way, i.e. the use of perceptual experiments in order to determine certain parameter values in the coder, was illustrated in the previous section. The second objective of our perceptual approach towards image coding is that we want the analysis performed by the coder and the analysis performed by the human visual system to be closely related [9]. Some of these correspondences have already been mentioned, but will now be discussed in more detail.

One of the few models that can account quantitatively for both threshold and suprathreshold spatial vision is the one proposed by Swanson et al. [14]. A block diagram of this model is reproduced in Fig. 10. The first stage of this model consists of a number of linear spatial filters, also called channels. In Fig. 10, four such filters, denoted N, S, T and U in order of increasing size, are shown. They were derived from the threshold experiments by Wilson and Bergen [16]. Due to the inherent experimental error, it is practically impossible to decide on the exact number and characteristics of these filters. For instance, Wilson and Gelb [17] have presented an alternative threshold model using six filters. The general consensus is that four to six medium-bandwidth filters, usually with a characteristic close to a difference of Gaussians and with peak frequencies approximately one octave apart, enable a close agreement to be obtained between model predictions and measured threshold data. The filters used in our coder clearly satisfy this condition.

Fig. 10. Outline of the contrast matching model described by Swanson et al. [14]. First stage consists of four medium-bandwidth linear spatial filters: N, S, T and U in order of increasing size. Second stage consists of four contrast transfer functions. Third stage consists of a contrast response computed by response pooling among the four mechanisms.

The second stage of the model in Fig. 10 consists of a number of contrast transfer functions. They correspond to the nonlinear mappings $S$ included in our coder. Differences between these contrast transfer functions can account for suprathreshold matching experiments. In these experiments, test patterns of varying frequency are matched in perceived contrast with a standard pattern of fixed contrast and spatial frequency. When the standard pattern is set to contrasts slightly above threshold, isoresponse curves fall off at high and low frequencies, just as threshold curves do. This implies that the thresholds for the high and low frequency channels are higher than for the intermediate frequency channels. When the standard pattern is set to higher contrasts, isoresponse curves flatten. This implies that perceived contrast must increase more quickly with increasing physical contrast for high and low frequencies than for midrange frequencies. At high contrasts, contrast transfer curves must become identical, because isoresponse curves are flat. The contrast transfer functions for the N and S channels are shown in Fig. 11. The similarity between these contrast transfer functions and the nonlinear mapping of Fig. 6 is obvious.

Fig. 11. Contrast transfer functions for N and S mechanisms. The N mechanism remains near zero longer because it has a higher threshold, but accelerates more quickly so that both mechanisms reach the same value at high contrasts.

The third stage in the model of Fig. 10 produces a contrast response that is computed by response pooling among the different mechanisms. This was modelled by

$$F = \Bigl(\sum_i F_i^{\varepsilon}\Bigr)^{1/\varepsilon}, \tag{22}$$

where $F_i$ is the output of the contrast transfer function in channel $i$, and $F$ is the combined response. Values of $\varepsilon = 2$ and $\varepsilon = 4$ were applied in the model, and were said to make little difference. This third stage corresponds to the reconstruction or synthesis part in our coder, that is, the function usually performed by the receiver. This involves passing the prediction error at scale $s_i$ through the nonlinear mapping $R_i$, and combining these outputs through linear operations. Assuming that these nonlinear mappings can be approximated by power laws with a power in the range 2-4, as is indeed the case, we conclude that there is a close correspondence with the response pooling mechanism. The main difference is the exponent $1/\varepsilon$. This exponent is however irrelevant, because the response $F$ is only used to decide if stimuli match.

7. Conclusions

In this paper we have introduced a new coding scheme based on the scale-space formalism. It was shown that the scheme is not only effective in coding images but is also supported by psychophysical knowledge. Current research involves psychophysical experiments aimed at determining image quality and data reduction rates as a function of the coding parameters. We have shown that the few experimental results that are available today (i.e. Watson [15] and the results mentioned in this paper) can even lead to contradictory conclusions. Trying to resolve these discrepancies may lead to a better insight into how natural scenes are perceived by human observers. Therefore, we feel that, in order to increase our confidence in current principles of image coding, more experiments are needed in which these principles are tested against the reality of visual perception. In parallel, alternative image coding methods that incorporate other properties of the human visual system, such as directional selectivity, are also being studied.
References

[1] V. Algazi, "Useful approximations to optimum quantization", IEEE Trans. Commun. Technol., Vol. COM-14, No. 3, June 1966, pp. 297-301.
[2] H. Asada and M. Brady, "The curvature primal sketch", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, No. 1, January 1986, pp. 2-14.
[3] J. Babaud, A. Witkin, M. Baudin and R. Duda, "Uniqueness of the Gaussian kernel for scale-space filtering", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, No. 1, January 1986, pp. 26-33.
[4] P. Burt and E. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans. Commun., Vol. COM-31, No. 4, April 1983, pp. 532-540.
[5] CCIR Recommendation 500, Method for the subjective assessment of the quality of television pictures, 1974.
[6] A. Ikonomopoulos and M. Kunt, "High compression image coding via directional coding", Signal Process., Vol. 8, No. 2, April 1985, pp. 179-203.
[7] M. Kocher and M. Kunt, "A contour-texture approach to image coding", Proc. Int. Conf. Acoust. Speech Signal Process., ICASSP-82, Paris, France, May 1982, pp. 436-440.
[8] J. Koenderink, "The structure of images", Biol. Cybern., Vol. 50, 1984, pp. 363-370.
[9] M. Kunt, A. Ikonomopoulos and M. Kocher, "Second generation image-coding techniques", Proc. IEEE, Vol. 73, No. 4, April 1985, pp. 549-574.
[10] H. Levitt, "Transformed up-down methods in psychoacoustics", J. Acoust. Soc. Am., Vol. 49, No. 2, 1971, pp. 467-477.
[11] J.B. Martens, "Applications of scale space to coding", submitted to IEEE Trans. Commun., 1987.
[12] J. Max, "Quantizing for minimum distortion", IRE Trans. Inf. Theory, Vol. 6, March 1960, pp. 7-12.
[13] F. Mokhtarian and A. Mackworth, "Scale-based description and recognition of planar curves and two-dimensional shapes", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, No. 1, January 1986, pp. 34-43.
[14] W. Swanson, H. Wilson and S. Giese, "Contrast matching data predicted from contrast increment thresholds", Vision Res., Vol. 24, No. 1, 1984, pp. 63-75.
[15] A. Watson, "Efficiency of a model human image code", J. Opt. Soc. Am. A, Vol. 4, No. 12, December 1987, pp. 2401-2417.
[16] H. Wilson and J. Bergen, "A four mechanism model for threshold spatial vision", Vision Res., Vol. 19, No. 1, 1979, pp. 19-32.
[17] H. Wilson and D. Gelb, "Modified line-element theory for spatial-frequency and width discrimination", J. Opt. Soc. Am. A, Vol. 1, No. 1, January 1984, pp. 124-131.
[18] R. Wilson, H. Knutsson and G. Granlund, "Anisotropic nonstationary image estimation and its applications: Part II - Predictive image coding", IEEE Trans. Commun., Vol. COM-31, No. 3, March 1983, pp. 398-406.
[19] A. Witkin, "Scale-space filtering", Proc. 7th Int. Joint Conf. Artificial Intelligence, Karlsruhe, 1983, pp. 1019-1021.
[20] A. Yuille and T. Poggio, "Fingerprint theorems for zero-crossings", MIT Artificial Intell. Lab. Tech. Rep. AIM-730, Cambridge, MA, 1983.
[21] A. Yuille and T. Poggio, "Scaling theorems for zero-crossings", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, No. 1, January 1986, pp. 15-25.