Vision Res. Vol. 27, No. 8, pp. 1319-1326, 1987
Printed in Great Britain. All rights reserved
0042-6989/87 $3.00 + 0.00
Copyright © 1987 Pergamon Journals Ltd
IS PATTERN MASKING PREDICTED BY THE CROSS-CORRELATION BETWEEN SIGNAL AND MASK?

TERRY CAELLI and GIAMPAOLO MORAGLIA
Department of Psychology, University of Alberta, Edmonton, Alberta, Canada T6G 2E9

(Received 29 June 1986; in revised form 22 January 1987)
Abstract-We sought to determine whether the cross-correlation between signals and masks consisting of Gaussian modulated grating patterns is consistent with observed psychophysical masking effects. Our experimental results support this relationship, and suggest further specifications for the determinants of pattern masking effects.

Masking    Cross correlation    Gabor signal    Pattern recognition    Spatial frequency channel
INTRODUCTION

It is generally assumed that the degree to which a masking pattern disrupts the detection of a target depends upon several factors besides their spatio-temporal proximity. One of these factors is the "similarity" between pattern and mask. This paper is concerned with examining one possible perceptually relevant measure of similarity in pattern masking effects: the cross-correlation between the luminance profiles of signal and mask.

In the context of masking by spatial frequency gratings, it has been assumed that the masking effects (being band-limited) reflect the frequency response ranges of spatial frequency "channels" operating in human pattern vision (see e.g. Wilson et al., 1983, p. 873). The enumeration of such channels has been the concern of many reports over the past decade, and of importance to this approach is that the channels should be few in number and fixed (invariant) in their frequency response (ibid.). Most recent quantitative formulations for the shape and localization of such channels reduce to an ensemble of spatially contiguous two-dimensional Gabor profiles, being gaussian modulated sinusoids of the form
$$f(x, y) = \exp\{-\alpha^2[(x - x_0)^2 + (y - y_0)^2]\} \times \exp\{-2\pi i[u_0(x - x_0) + v_0(y - y_0)]\}. \tag{1}$$
Here $(x_0, y_0)$ determines the spatial center of one such detector and so its spatial localization (Papoulis, 1968; Graham et al., 1978; Daugman, 1983). These signals, having a space constant $\alpha$, are also localized in the two-dimensional frequency domain [centered at $(u_0, v_0)$ with bandwidth of $1/\alpha$]. The grating is oriented at $\theta_0 = \tan^{-1}(v_0/u_0)$ and has a radial frequency of $f_0 = \sqrt{u_0^2 + v_0^2}$. Such functions also approximate simple cell receptive field profiles in the vertebrate visual cortex (Kulikowski et al., 1982).
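As a concrete illustration of the profile in (1), the following Python sketch samples the real part of a Gabor function on a pixel grid. The function names, the added phase term, the choice of centre pixel, the orientation convention (angle of the frequency vector measured from the u axis) and the default values (a 32 x 32 array, a space constant of 1/8 per pixel, 2 cycles per 32 pixels) are assumptions made for illustration only; they are not taken from the authors' software.

```python
import math

def gabor_profile(size=32, alpha=1.0 / 8.0, u0=2.0 / 32.0, v0=0.0, phase_deg=0.0):
    """Sample the real part of a Gabor profile on a size x size grid.

    alpha is the space constant (per pixel) and (u0, v0) the centre
    frequency in cycles per pixel. The phase offset is an added
    assumption; it does not appear in the complex form of equation (1).
    """
    x0 = y0 = size / 2.0  # assumed spatial centre of the profile
    phase = math.radians(phase_deg)
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - x0, y - y0
            envelope = math.exp(-alpha ** 2 * (dx * dx + dy * dy))
            carrier = math.cos(2.0 * math.pi * (u0 * dx + v0 * dy) + phase)
            row.append(envelope * carrier)
        rows.append(row)
    return rows

def radial_frequency(u0, v0):
    """Length of the frequency vector (u0, v0)."""
    return math.hypot(u0, v0)

def orientation_deg(u0, v0):
    """Angle of the frequency vector, in degrees (assumed convention)."""
    return math.degrees(math.atan2(v0, u0))

profile = gabor_profile()
centre = profile[16][16]      # envelope and carrier both equal 1 here
one_over_e = profile[24][16]  # 8 pixels from the centre along y
```

With a space constant of 1/8 per pixel the envelope falls to 1/e at 8 pixels from the centre, which is why `one_over_e` is read off at that distance.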
This formulation for a spatial frequency channel allows for the explicit representation of the channel's point spread function, by (1), and its (complex) frequency "response"

$$F(u, v) = \exp\{-(1/\alpha^2)[(u - u_0)^2 + (v - v_0)^2]\} \times \exp\{-2\pi i[x_0(u - u_0) + y_0(v - v_0)]\}. \tag{2}$$
To this date, it is assumed that an explanation of masking effects, in this context, demands the consideration of the spectral power densities of signal and mask. Since this approach has been generally successful, little effort has been devoted to examining alternative predictors of masking effects in terms of the image-domain properties of such images. In the following experiment we attempted in particular to determine if, and to what extent, in the context of the above stimuli (1, 2), masking can be predicted in terms of the image domain similarity between the luminance profiles of test and mask. An answer to this question, we believe, should help to determine whether the current assumptions about masking are indeed a necessary component of a satisfactory explanation of this effect.

A method for capturing the similarity between two images (or detecting a signal in noise) is the matched filter, or cross-correlator (Papoulis, 1968). The cross-correlation function between two signals $f(x, y)$ and $g(x, y)$ is defined by (Rosenfeld and Kak, 1976)

$$C(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\alpha - x, \beta - y)\, g(\alpha, \beta)\, d\alpha\, d\beta. \tag{3}$$
This function measures the degree to which the two images match each other (in the least-squares sense, see the Appendix) as a function of their relative displacement [(x, y) in (3)]. This will be shown to be an important consideration in evaluating the correlation between two luminance profiles, especially when the latter does not peak at x = y = 0 in (3): the case when f and g do not have identical phase spectra.

In the experiment to be reported, we sought to determine whether the similarity between the luminance profiles of signal and mask, as indexed by the peak cross-correlation value, may be a reliable predictor of masking effects. We were concerned, in particular, with investigating whether global two-dimensional luminance profile similarities, as captured by cross-correlation, can index the masking effect as well as the explicit representations of such profiles via frequency, orientation, or phase values, per se. The hypothesis to be tested in the following experiment can thus be simply stated: if the degree to which the signal and mask luminance profiles are correlated is the fundamental spatial predictor of masking [also considering that the peak in the correlation function may not be at (0, 0) in (3)], we should expect a similar amount of masking for a given correlation value, irrespective of the type of parameter which produces such value.

METHOD
Observers

Two experienced psychophysical observers, both male and with corrected-to-normal acuity, served in this study.
All stimuli were generated by a PDP 11/23 computer coupled to an Imaging Technology (ITI) image processing system, with 8-bit pixel resolution. The stimuli were displayed on an Electrohome T.V. monitor with P32 phosphor. Signals and masks consisted of Gabor profiles, as defined by (1), with a fixed bandwidth or space constant (α), where the gaussian decayed to 1/e in 5.25' (8 pixels). Two signals were used throughout the experiment: a symmetric Signal 1 [phase, φ = 0°, frequency f = 2 picture cycles, and orientation θ = 0° (vertical grating)], and an asymmetric Signal 2 (φ = 90°) (see Fig. 1). The signals were generated in a 32 x 32 pixel array such that the spatial frequency of 2 picture cycles corresponded to 2 cycles in 32 pixels or 1 cycle in 11'. All other spatial frequencies were tabulated, in picture cycles, relative to this digital format. These signals had identical d.c. and peak-to-peak contrast: the energy (r.m.s.) of Signal 1, however, was slightly greater than that of Signal 2.

Sixty-four Gabor masks were obtained from the factorial combination of 4 different phase (0, 30, 60, 90 deg), radial frequency (2, 3, 4, 8 picture cycles) and orientation (0, 15, 30, 45 deg) values (see Fig. 1). Each such mask was presented in a 32 x 32 pixels format, and fitted into one of 64 nonoverlapping positions of a 256 x 256 pixels "plate" having a thin line down its center (as shown in Fig. 1). Each Gabor profile subtended a visual angle of 21', with the 256 x 256 region subtending a visual angle of 2.8° from a viewing distance of 1.78 m, set by a head rest. The (2, 3, 4, 8) picture cycle frequencies thus converted to approximately (6, 9, 12, 24) cycles per degree of visual angle. The plate was displayed for 330 msec at high contrast (97%), and was immediately followed by a 33 msec presentation of the target signal at lower contrast (56%). A central fixation spot was displayed before and after each trial, and the space average luminance was permanently set at 38 cd/m².
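The stimulus geometry described above can be checked with a few lines of arithmetic. The Python sketch below enumerates the 4 x 4 x 4 factorial mask set and converts picture cycles to cycles per degree from the stated patch size of 21'. All names (`masks`, `cycles_per_degree`, and the constants) are hypothetical and introduced only for this illustration.

```python
from itertools import product

PHASES_DEG = (0, 30, 60, 90)
FREQUENCIES_PICTURE_CYCLES = (2, 3, 4, 8)
ORIENTATIONS_DEG = (0, 15, 30, 45)

PATCH_PIXELS = 32            # each Gabor profile is 32 x 32 pixels
PATCH_ARCMIN = 21.0          # visual angle of one profile, in arcmin
PLATE_PATCHES_PER_SIDE = 8   # 8 x 8 = 64 nonoverlapping positions

# One entry per mask: every combination of phase, frequency and orientation.
masks = [
    {"phase_deg": p, "picture_cycles": f, "orientation_deg": o}
    for p, f, o in product(PHASES_DEG, FREQUENCIES_PICTURE_CYCLES, ORIENTATIONS_DEG)
]

def cycles_per_degree(picture_cycles, patch_arcmin=PATCH_ARCMIN):
    """Convert cycles per patch into cycles per degree of visual angle."""
    return picture_cycles * 60.0 / patch_arcmin

plate_pixels = PLATE_PATCHES_PER_SIDE * PATCH_PIXELS
plate_degrees = PLATE_PATCHES_PER_SIDE * PATCH_ARCMIN / 60.0
arcmin_per_pixel = PATCH_ARCMIN / PATCH_PIXELS
cpd = [round(cycles_per_degree(f), 1) for f in FREQUENCIES_PICTURE_CYCLES]
# cpd == [5.7, 8.6, 11.4, 22.9]
```

Under these assumptions the exact conversions come out slightly below the rounded values of 6, 9, 12 and 24 cycles per degree quoted in the text, and 8 pixels correspond to 5.25', in agreement with the stated decay distance of the gaussian.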
The relationships between the signal-mask peak cross-correlation values, and their respective frequency, orientation, and phase differences, are illustrated in Figs 2 and 3 for Signal 1 and Signal 2 respectively. These figures reveal that cross-correlation decreases monotonically as the signal-mask differences along the parameters defining such stimuli increase. These figures, thus, illustrate the subtle distinction which we are endeavoring to ascertain here: that the explicit use of such parameter states may not be necessary to represent, or predict, masking effects with these signals, if the latter are well indexed by cross-correlation, a unitary measure of luminance profile similarity.
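The peak cross-correlation between a signal and a mask can be sketched directly from definition (3). The Python code below is a minimal brute-force version, assuming real-valued Gabor patches, zero padding outside the 32 x 32 array, and no rescaling to the 0-255 range; the helper names `gabor` and `peak_cross_correlation`, the phase convention and the orientation convention are illustrative assumptions rather than the authors' implementation.

```python
import math

def gabor(freq_cycles, orient_deg, phase_deg, size=32, alpha=1.0 / 8.0):
    """Real Gabor patch; frequency is given in cycles per `size` pixels."""
    theta = math.radians(orient_deg)
    u0 = freq_cycles / size * math.cos(theta)
    v0 = freq_cycles / size * math.sin(theta)
    phase = math.radians(phase_deg)
    c = size / 2.0  # assumed centre of the patch
    return [
        [
            math.exp(-alpha ** 2 * ((x - c) ** 2 + (y - c) ** 2))
            * math.cos(2.0 * math.pi * (u0 * (x - c) + v0 * (y - c)) + phase)
            for x in range(size)
        ]
        for y in range(size)
    ]

def peak_cross_correlation(f, g):
    """Largest value of sum f(a - x, b - y) * g(a, b) over all shifts (x, y).

    Both arrays must be square and of equal size; samples that fall
    outside the arrays are treated as zero.
    """
    n = len(f)
    best = -math.inf
    for dy in range(-(n - 1), n):
        b_lo, b_hi = max(0, dy), min(n, n + dy)
        for dx in range(-(n - 1), n):
            a_lo, a_hi = max(0, dx), min(n, n + dx)
            total = 0.0
            for b in range(b_lo, b_hi):
                f_row = f[b - dy][a_lo - dx:a_hi - dx]
                g_row = g[b][a_lo:a_hi]
                total += sum(p * q for p, q in zip(f_row, g_row))
            best = max(best, total)
    return best

signal_1 = gabor(2, 0, 0)
c_same = peak_cross_correlation(signal_1, gabor(2, 0, 0))   # mask identical to signal
c_freq8 = peak_cross_correlation(signal_1, gabor(8, 0, 0))  # mask at 8 picture cycles
energy = sum(v * v for row in signal_1 for v in row)
```

When the mask equals the signal the peak occurs at zero shift and equals the signal energy; a mask at a different frequency gives a smaller peak. Taking the maximum over all shifts, rather than the value at zero shift, matters for masks whose phase differs from that of the signal.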
Fig. 1. The 256 x 256 pixels image used in the experiment, consisting of 64 different versions of a Gabor profile, obtained by factorially combining four spatial frequency (2, 3, 4 and 8 picture cycles with respect to a 32 x 32 pixels format), orientation (0, 15, 30, 45°) and phase angle (0, 30, 60, 90°) values. The target signals (1, 2) are outlined by the square apertures (dark = 1, light = 2).
Figs 2 and 3. The peak values of the cross-correlation ($C_{\max}$) between the 64 masks and Signals 1 (Fig. 2) and 2 (Fig. 3) are plotted as a function of the frequency (f), orientation (θ°) and phase (φ°) values of the masks. In all cases the values of $C_{\max}$ [according to equation (3)] were divided by the same constant to fit within a 0-255 (8-bit) pixel range. [Plots not reproduced; one panel per mask orientation, vertical axis $C_{\max}$.]

Procedure

On each trial, the sequence of events was the following. The observer fixated the black spot until ready to initiate a trial by depressing a button. The plate was then presented, to be immediately followed by the signal, displayed at one of 64 orthogonal positions in the now blank display area, each position corresponding exactly to one of the regions previously occupied by one masking element. The observer's task was to respond as quickly and accurately as possible, by pushing one of two correspondingly positioned buttons, as to whether the signal had appeared to the left or to the right of the vertical center line of the plate. The reappearance of the fixation spot following the response indicated the onset of a new trial. The observers were administered 40 presentations of Signals 1 and 2 in each of the 64 masking positions, for a total of 40 x 2 x 64 = 5120 trials per observer. The data were collected over 20 separate sessions, each involving the randomized presentation of the target signals twice in each of the 64 positions.
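The trial bookkeeping can be sketched as follows. This Python fragment builds one randomized session (2 signals x 64 positions x 2 repeats) and checks that 20 such sessions give 40 presentations per signal and position. The names, the fixed random seed and the row-major numbering of positions on an 8 x 8 grid are assumptions for illustration; the source does not describe how the randomization was implemented.

```python
import random
from collections import Counter

N_POSITIONS = 64
SIGNALS = (1, 2)
REPEATS_PER_SESSION = 2
N_SESSIONS = 20

def session_trials(rng):
    """Return one session: each signal twice at each position, shuffled."""
    trials = [
        (signal, position)
        for signal in SIGNALS
        for position in range(N_POSITIONS)
        for _ in range(REPEATS_PER_SESSION)
    ]
    rng.shuffle(trials)
    return trials

def correct_side(position, columns=8):
    """'left' or 'right' of the centre line, assuming row-major numbering."""
    return "left" if position % columns < columns // 2 else "right"

rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
sessions = [session_trials(rng) for _ in range(N_SESSIONS)]
counts = Counter(trial for session in sessions for trial in session)
total_trials = sum(len(session) for session in sessions)  # 5120
```

Each session holds 256 trials, so 20 sessions reproduce the 5120 trials per observer stated in the text.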
RESULTS
Figures 4 and 5 show the proportion of correct responses, averaged over the two observers, for the various frequency, orientation and phase differences of the masks with Signal 1 (Fig. 4) and Signal 2 (Fig. 5). Here we have plotted our results as a function of the peak cross-correlation values ($C_{\max}$) between the appropriate signal and mask, though we have split these data over the mask and signal parametric descriptions. It is of particular importance to note the degree to which $C_{\max}$ indexes detection performance independently of the spatial parameters used to generate specific $C_{\max}$ values. As can be seen, all graphs shown in Fig. 4 show similar slopes. The same is true of the graphs shown in Fig. 5. Further, it is interesting to note that $C_{\max}$ orders the effects of the parametric manipulations of the mask on signal detection, with frequency and orientation having the greatest effects on performance. Thus, it may not be necessary to assume that the visual system is "less sensitive" to phase than to frequency and orientation. Rather, it may be sufficient to note that, quite simply, phase differences have, in these signals, less effect on $C_{\max}$ (this is not true, however, for images in general).

Finally, it is interesting to note that cross-correlation statistics (like $C_{\max}$) enable the comparison between the effects of noncommensurate variables on the signal and mask images. Orientation, spatial frequency or phase differences, in other words, create various degrees of image differences which can be indexed by a single parameter as $C_{\max}$. This point cannot be overemphasized, since it demonstrates that conclusions like those pertaining, e.g. to the visual system's insensitivity to spatial phase, as a function of signal-mask bandwidth, may simply follow from the fact that phase differences with these signals do not significantly affect the signal-mask cross-correlation peaks.

One possible model consistent with the above results is the following. Underlying families of receptive fields are densely mapped over the region of the signal. Mask and target perception comes about by the activities of the more active units. We further assume that the activity profiles are Poisson-like; that response variance, in other words, increases with its mean. This has the effect of increasing the response noise (variance) with increased activity. Consequently, when the mask is presented, noise is introduced into the more active units triggered by both mask and signal when they are highly correlated in space. Conversely, when the mask and signal are uncorrelated, no similar families of detectors are activated, so keeping the internal noise low.
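The qualitative model above can be illustrated with a toy calculation. The Python sketch below assumes that the mask drives the signal-tuned units in proportion to a normalised signal-mask correlation between 0 and 1, and that the noise variance equals the mean activity (the Poisson-like assumption). The function `detectability`, its parameters and the numerical values are all hypothetical; the paper gives no quantitative form of the model.

```python
import math

def detectability(signal_drive, mask_drive, correlation, baseline=1.0):
    """d'-like index: signal increment over the s.d. of Poisson-like noise.

    `correlation` is assumed to be a normalised signal-mask similarity in
    [0, 1]; `baseline` is an assumed resting activity of the units.
    """
    mean_activity = baseline + correlation * mask_drive
    noise_sd = math.sqrt(mean_activity)  # Poisson-like: variance equals mean
    return signal_drive / noise_sd

# Detectability of a fixed signal as the mask becomes more similar to it.
indices = [detectability(2.0, 20.0, c) for c in (0.0, 0.25, 0.5, 1.0)]
```

In this sketch the index falls monotonically as the correlation rises, which mirrors the direction of the masking effect described in the text without claiming any quantitative fit.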
Figs 4 and 5. The proportion of correct responses (P), averaged over the two observers, is plotted as a function of the peak cross-correlation ($C_{\max}$) between the masks and Signal 1 (Fig. 4) and Signal 2 (Fig. 5) respectively, and in terms of the orientation (θ = 0, 15, 30, 45°) and frequency (f: 2, 3, 4, 8 picture cycles) values of the masking profiles. The results for the four phase values (0, 30, 60, 90°) of the masking signals at each orientation/frequency combination are also plotted, although they are not denoted by separate symbols. They can be identified, however, since, for each graph, and for each symbol denoting frequency, the data points nearest to the origin of the graph stand for a mask with a phase angle of 0°, the other values being plotted in ascending order. [Plots not reproduced; one panel per mask orientation, horizontal axis $C_{\max}$, vertical axis P.]

The results of these experiments provide evidence in favour of cross-correlation as a basic spatial comparison measure involved in pattern masking, at least within the spatio-temporal bounds studied here. The observed masking effects were well defined, varying from no masking (P ≈ 1.0) to complete masking (P ≈ 0.5). In all cases studied, our results indicate that the masking effect of a given luminance profile on a target signal can be largely indexed by the peak of their cross-correlation. Such findings, thus, reveal that a measure of pattern similarity in the image domain can account for masking effects usually explained in terms of the responses of visual channels specifically sensitive to the 2D power spectral components of signal and mask. Our results do not run counter to this assumption. They by no means imply, however, that the latter is a necessary one.

Due to the nature of the stimuli used in such studies, image similarity among these stimuli is explicitly related (see, for example, Figs 2 and 3) to their spectral similarity. In the case of natural images, however, structurally different images may well have identical power spectra (Oppenheim and Lim, 1981). This allows for the possibility that image similarity, as captured by cross-correlation, would lead to predictions of masking effects significantly different from those to be expected on the basis of the power spectral characteristics or, indeed, the spectral distance, of signal and mask.

While we intend to address these situations shortly, the findings of a recent study should be mentioned here. Caelli and Yuzyk (1985) explored the conditions under which an image appeared, to observers, to be perceptually fused with, segregated from, or hidden by, another image added to the first. The results demonstrated that their power spectra similarities were poor predictors of the perceptual outcome. Though image pattern information is more critically encoded by the phase spectrum (Oppenheim and Lim, 1981), the experimental results could not be predicted by the images' phase spectra either. By contrast, these results could largely be accounted for in terms of cross-correlation. These findings, thus, strongly suggest that characteristics of the cross-correlators may provide valuable constraints for the understanding of the processes invoked in pattern masking, detection and recognition.
Acknowledgements-This project was funded by grant No. A2668 from the Natural Science and Engineering Research Council of Canada. The second author is supported by Grant No. RA2525 from the Alberta Heritage Foundation for Medical Research. We would like to thank the two anonymous referees whose constructive criticisms were greatly appreciated by the authors.

REFERENCES

Caelli T. M. and Yuzyk J. (1985) What is perceived when two images are combined? Perception 14, 41-48.
Daugman J. (1983) Six formal properties of two-dimensional anisotropic visual filters: structural principles and frequency/orientation selectivity. IEEE Trans. Systems, Man Cybernet. SMC-13, 882-888.
Graham N., Robson J. G. and Nachmias J. (1978) Grating summation in fovea and periphery. Vision Res. 18, 815-825.
Kulikowski J. J., Marcelja S. and Bishop P. (1982) Theory of spatial position and spatial frequency relations in the receptive fields of simple cells in the visual cortex. Biol. Cybernet. 43, 187-198.
Oppenheim A. V. and Lim J. S. (1981) The importance of phase in signals. Proc. IEEE 69, 529-541.
Papoulis A. (1968) Systems and Transforms with Applications in Optics. McGraw-Hill, New York.
Rosenfeld A. and Kak A. C. (1976) Digital Picture Processing. Academic Press, New York.
Wilson H., McFarlane D. and Phillips G. (1983) Spatial frequency tuning of orientation selective units estimated by oblique masking. Vision Res. 23, 873-882.

APPENDIX

We prove here that matching two images in terms of the peak value of their cross-correlation satisfies a least-squares matching criterion. This latter criterion corresponds to the finding of shift values (x, y) which minimize the function

$$h(x, y) = \iint [f(\alpha - x, \beta - y) - g(\alpha, \beta)]^2\, d\alpha\, d\beta. \tag{A.1}$$

That is

$$h(x, y) = \iint f^2(\alpha - x, \beta - y)\, d\alpha\, d\beta + \iint g^2(\alpha, \beta)\, d\alpha\, d\beta - 2\iint f(\alpha - x, \beta - y)\, g(\alpha, \beta)\, d\alpha\, d\beta. \tag{A.2}$$

Since the first two terms do not depend on the shift (x, y), clearly (A.2) is minimized when

$$\iint f(\alpha - x, \beta - y)\, g(\alpha, \beta)\, d\alpha\, d\beta \tag{A.3}$$

is maximized, the latter being the matching criterion for a cross-correlation process.
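The equivalence stated in the Appendix can be checked numerically. The Python sketch below uses small arrays with circular (wrap-around) shifts, an assumption made so that the energy of the shifted image is exactly independent of the shift, as it is for the infinite integrals in the text. The function names and the test images are illustrative only.

```python
import random

def shifted(f, x, y):
    """Circularly shifted copy: result[b][a] = f[b - y][a - x]."""
    n = len(f)
    return [[f[(b - y) % n][(a - x) % n] for a in range(n)] for b in range(n)]

def cross_correlation(f, g, x, y):
    """Discrete analogue of (A.3) at shift (x, y), with wrap-around."""
    fs = shifted(f, x, y)
    return sum(p * q for fr, gr in zip(fs, g) for p, q in zip(fr, gr))

def squared_difference(f, g, x, y):
    """Discrete analogue of (A.1) at shift (x, y), with wrap-around."""
    fs = shifted(f, x, y)
    return sum((p - q) ** 2 for fr, gr in zip(fs, g) for p, q in zip(fr, gr))

n = 8
rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
f = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
g = shifted(f, 2, 3)  # g is f displaced by (x, y) = (2, 3)

shifts = [(x, y) for x in range(n) for y in range(n)]
best_least_squares = min(shifts, key=lambda s: squared_difference(f, g, *s))
best_correlation = max(shifts, key=lambda s: cross_correlation(f, g, *s))

energy_f = sum(v * v for row in f for v in row)
energy_g = sum(v * v for row in g for v in row)
```

Both criteria select the same shift, and at every shift the squared difference equals `energy_f + energy_g - 2 * cross_correlation(f, g, x, y)`, which is the discrete counterpart of (A.2).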