Neural Networks, Vol. 4, pp. 543-564, 1991
0893-6080/91 $3.00 + .00 Copyright © 1991 Pergamon Press plc
Printed in the USA. All rights reserved.
ORIGINAL CONTRIBUTION
On the Network-Based Emulation of Human Visual Search
JACK F. GERRISSEN
Philips Research, Institute for Perception Research, Eindhoven, The Netherlands
(Received 22 December 1989; revised and accepted 20 February 1990)
Abstract--We describe the design of a computer emulator of human visual search. The emulator mechanism is eventually meant to support ergonomic assessment of the effect of display structure and protocol on search performance. As regards target identification and localization, it mimics a number of characteristics of human search performance under experimental conditions of target/nontarget separability and stimulus complexity. Performance aspects accounted for include: feature space adaptation, attention orienting and cueing, processing stages in early vision, data-limited and resource-limited processing, automatic and controlled processing, search asymmetry and illusory conjunction, spatial and temporal masking, and disengaged/engaged attentional states. Feature space adaptation results from the (supervised) training for optimized target/nontarget discrimination. In the actual emulation experiments there are three processing modes: Global identity processing, Sectoral-directional emphasis of identity processing, and Local identity processing. The chosen network approach (which was motivated on information-theoretical grounds) proved instrumental in our quest for a set of 'humanoid' system behaviors that would function as a descriptive framework of visual search performance. Within the task constraints imposed on human perception and action in the usual experimental setting, the emulator may serve as a means to study subject training aspects, input data limitation (e.g., low input excitation or short exposure time), memory data limitation (e.g., search asymmetries, illusory conjunction), resource limitation (e.g., display size, lateral masking), and the effects of self-paced or externally paced shifts of attention.
Keywords--Visual search, Feature space, Pooled response modeling, Visual attention, Neural network, Primed network training.
1. THE 'WHY' AND 'HOW' OF EMULATING HUMAN VISUAL SEARCH
Selection of an element from the visual surround for further processing, for instance, with the objective of verifying its state or significance (in a given task context), is taken to be a crucial (early) phase of visual perception. Failure to select for further perceptual processing would render forms of functional blindness (e.g., Posner, Walker, Friedrich, & Rafal, 1984). If we (human beings) were not equipped with the "early selection facility," then perception of our visual surround would be built from a sequence of snapshots, each of which would need to be fully processed before the next could be taken in for processing. Early selection is a prerequisite for real-time and continuous monitoring of elements or segments in a visual surround where, due to object change or motion and ego motion, partial scene changes occur continuously. The term "visual search performance" refers to the selection behavior, and its efficacy.
1.1. The Measurement of Search Performance
Visual search performance is generally studied in experimental settings where subjects are required to rapidly check the presence and, if so, the location of a search target which competes for visual attention with surrounding distractors or noise patterns. The knowledge gained from these studies is used to deepen the insight into visual sensory processing, and is being applied in the design of visual information (signs, pictures, displays, etc.). The paradigmatic assumption is that for discrimination of the target from its distracting surround, the observer's visual attention is directed to particular sensory features, which may include: spatial saliency, color, configurative saliency, and motion specifics. There is a principal learning effect on the observer's sensitivity for the discriminative feature(s). To not contaminate the search performance results with the learning effects, in the experiment design the sensitivity variance is usually taken care of either by using visual stimuli with culturally inherent ease of discrimination (e.g., alphanumeric characters, regular shapes, symbols), or by exhaustive (supervised) training of the subjects on target/nontarget relations in the stimulus set. A second objective of the instruction or supervised training is to suppress the sources of variance "between" input processing and the required decision or action output (e.g., left/right button press). For the sake of psychophysical soundness the subject's perceptual action is thus downgraded to a level of performing a Stimulus-Response (SR) transformation.
Requests for reprints should be sent to Jack F. Gerrissen, Philips Research, Institute for Perception Research, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.
1.2. Emulation as a Means to Integrating Fragmentary Knowledge
The research reported here is motivated by a long-felt need for a device that, from a human-factors point of view, could produce sensible predictions, or even reliable estimates, of human search performance for given stimulus displays, thereby taking account of the many aspects of human visual search, including: feature space adaptation (Lambert, 1987), attention orienting and cueing (Posner & Cohen, 1984), processing stages in early vision (Treisman, 1985), data-limited and resource-limited processing (Norman & Bobrow, 1975), automatic and controlled processing (Shiffrin & Schneider, 1977), search asymmetry and illusory conjunction (Treisman & Gormican, 1988), spatial and temporal masking (Butler & Currie, 1986), and disengaged/engaged attentional states (Posner & Cohen, 1984; Fischer & Breitmeyer, 1987). Knowledge on each of these aspects has so far only been validated within the display structure and protocol of the corresponding experimental paradigm. For a human-factors evaluation of a display structure and protocol involving several aspects, the fragments of knowledge do not offer much help in composing an overall prediction of the related visual search performance. As a consequence, for every new (or not yet documented) display structure and protocol the human-factors specialist needs to run elaborate experiments to come up with a sound judgement. An emulator of visual search performance, which properly integrates the fragments into a coherent framework, might save time and effort both on the part of the subjects and on the part of the experimenter. Instead of the term "simulation" we use the term "emulation" to emphasize our intention to mimic behavioral characteristics rather than investigate their underlying mechanisms, as is usually done in simulation studies.
1.3. Global Design Requirements
1.3.1. Transformation Simplicity and Supervised Learning. There is a notable objection to perception models employing supervised learning principles in network architectures having only a couple of layers of processing between the environmental input and the action or decision output. Linsker (1988), for instance, argues that:
... such a process involving more than a few layers appears biologically implausible, and its performance may scale poorly as the number of layers is increased. ... In a complex network, or in an animal's brain, it is totally unclear how a component layer is to "decide" what transformation its connections should perform--if we assume that the layer needs to "know" what environmental features are important for the animal to respond to ... (p. 116)
We fully support this position in view of research objectives having to do with modeling the animal's total perceptual system. Our modest goal, however, is to emulate a subordinate process that is closer to conditioning than to the creative and intelligently adaptive meaning formation central to true visual perception. As was noted in Section 1.1, supervised learning and simplicity of input/output transformation are inherent to visual search performance in many experimental settings.
1.3.2. Data-limited and Resource-limited Performance. In the capacity, or resource, modeling view of attention the success of the required structure identification depends upon both the quality of the data and the processing resources that are used (Norman & Bobrow, 1975). Attention can be seen as a pool of processing capacity that may be recruited in varying amounts for the execution of different tasks, possibly at different locations of the visual field. Whenever an increase in the amount of processing resources can result in improved performance, Norman and Bobrow say that the task (or performance on that task) is resource-limited. Whenever performance is independent of processing resources, they say that the task is data-limited. For efficient target/surround discrimination under substantial task loading, the human subject attempts to apply a search strategy which optimizes the trade-off between discrimination error rate and the drawing on information intake resources, as these are constrained by the task loading (e.g., execution time limitation, low contrast, etc.). For instance, a need for longer eye fixation duration (= a processing resource) is in conflict with constrained execution time of the visual discrimination task, and as a result, when the task is sufficiently loaded, the time constraint will impose a suboptimal (shorter) fixation, causing a performance degradation. Resource-limited processes generally impose speed-accuracy trade-offs (Norman & Bobrow, 1975).
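The distinction can be made concrete with a small numerical sketch. It assumes a hypothetical performance-resource function (a saturating exponential chosen for illustration, not taken from Norman and Bobrow) and labels an operating point as resource-limited when a little extra resource still improves performance, and as data-limited otherwise. The names `performance`, `limitation`, `step`, and `tol` are illustrative only.

```python
import math


def performance(resource: float, data_quality: float) -> float:
    """Hypothetical performance-resource function (assumption of this sketch).

    Performance rises with the invested resource and saturates at a ceiling
    set by the quality of the data (both in arbitrary units).
    """
    return data_quality * (1.0 - math.exp(-resource))


def limitation(resource: float, data_quality: float,
               step: float = 0.1, tol: float = 1e-3) -> str:
    """Label an operating point by the gain obtained from `step` extra resource."""
    gain = (performance(resource + step, data_quality)
            - performance(resource, data_quality))
    return "resource-limited" if gain > tol else "data-limited"


if __name__ == "__main__":
    for r in (0.5, 2.0, 10.0):
        print(r, limitation(r, data_quality=1.0))
```

With this toy function, small resource investments fall in the resource-limited regime, while a large investment leaves performance pinned at the data ceiling, so that further resources no longer help.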
The emulator should allow for studying the impact of task constraints on the data-limited and resource-limited character of search performance. Data limitations could be related to exposure time, exposure rate, and exposure quality of stimuli, while resource-limited processing might be induced by numerosity of stimulus objects, uncertainty about target identity and target position, and the like.
1.3.3. Automatic and Effortful Processing. In a situation of resource-limited processing the human subject deals with external elements competing for allocation of processing capacity; they solicit, in a way, attention. Because of the limitations of central capacity, at a realistic level of task complexity a form of serial (or sequential) processing can be observed. In the related literature this is regarded as effortful (or attentive) processing (see, e.g., Posner, 1978). When a process occurs without interference or potential competition with other (parallel) processes then, apparently, there is no need for coordinated allocation of attention, and, as opposed to effortful or attentive processing, sensory processing is regarded to be in the automatic-processing mode. Our emulator starts the search in a mode of automatic (parallel) processing aimed at global identity analysis of the stimuli and, if further identification and location are needed, its mode of processing becomes more sequential in support of increasingly local verification of the stimulus identity.
1.4. Feature Space Adaptation Preceding the Identification and Localization of Targets
1.4.1. Feature Space Adaptation. If the task execution is feature-based, the success of identification and location of target elements is expected to vary with target-surround discriminability along the relevant feature dimension(s). In other words, supervised learning for optimized execution of a search task produces a psychological set, which might be regarded as a feature space adaptation such that the discriminative target features have a minimal overlap with the nontarget features. This bias or foreknowledge about the target category of features and its relation to the surround causes the processing resources to be "specialized" towards the detection of target category elements (e.g., Lambert, 1987). Thus, the emulator should allow for a learning phase similar to the traditional target/nontarget discrimination learning in visual search experiments, i.e., starting from a set of basic feature dimensions there is a specialization (adaptation) with respect to the specific target/nontarget combination(s) to be dealt with. Once learned, the capacity to optimally deal with a specific target-and-surround category may be "stored" as a characteristic "neural" structure, and made available as a set or "vocabulary" (Treisman & Gormican, 1988) to be recruited at a
next occasion. Shiffrin and Schneider (1977), in their treatment of automatic and attentive processes, have found that exhaustive learning potentially establishes an increased capacity for automatic processing, and report evidence for hypotheses concerning the storage, maintenance, and initiation modes of this capacity. Establishment of feature-space adaptation (either by learning or by instant recruitment of a vocabulary) is the starting condition that biases the three modes of processing to be implemented by our emulator: Global identity processing, Sectoral-directional emphasis of identity processing, and Local identity processing.
1.4.2. Global Identity Processing. In view of the potential variety of sets or vocabularies, Treisman and Gormican (1988) propose the Pooled-Response Model in which they conjecture the interactive coexistence of a map of locations that "specifies where in the display things are, but not what they are," and a free-floating-feature representation that potentially supports the what analysis, but does not capture the actual locations of the features. Figure 1 depicts the generics of the Pooled-Response Model, here showing a feature space composed of three feature maps containing a global decomposition along the three corresponding feature dimensions. The map of locations is used to emphasize in each of the feature maps a localized region of (sub-)patterns exhibiting the features. The term Global Identity Processing refers to the identity classification based on feature map contents, but without the localized emphasis imposed by the interaction with the map of locations. As regards the identity/location ("what/where") process duality, we shall follow a search-oriented line of argumentation, meaning that in the prototype situations within the visual search paradigm the subject faces the task to rapidly decide whether or not there is a target-like pattern in the field of view, and if there is, then find out where it is. We argue that an early and global identity decision is key to the efficient allocation of (expensive) location processing resources. In addition to the global identity decision, to further increase the efficiency of allocation there is a sectoral-directional mode "between" the global and local modes of identity processing.
[Figure 1. Diagram labels: Recognition/classification processing; Temporary Object Representation; feature X maps; feature Z maps; Map of locations.]
FIGURE 1. The framework of locations map and feature maps as suggested by Treisman and Gormican (1988) in support of their Pooled Response Model.
1.4.3. Sectoral-directional Emphasis of Identity Processing. A number of researchers have found that attention cueing (= foreknowledge of spatial target location) produces benefits not just around the "indicated" spot but rather across the retinal segment where the cue occurred (e.g., Tassinari, Aglioti, Chelazzi, Marzi, & Berlucchi, 1987). Rizzolati, Riggio, Dascola, and Umilta (1987) propose a premotor theory of attention and postulate a link between covert orienting of attention and the programming of explicit ocular movement. They argue that the direction component of the programming is available prior to the distance component of the movement. This direction-contingence of location is supported by findings from neurophysiology (Bruce & Goldberg, 1985; Cohen & Henn, 1972; Hou & Fender, 1979). Many experimental results can be taken to support the hypothesized directional facilitation, i.e., there are reaction-time or accuracy (error rate) benefits within a retinal sector centered on the directional axis of the cue (Cheal & Lyon, 1989; Eriksen & Webb, 1989; Julesz & Bergen, 1983; Muller & Rabbit, 1989; Posner & Cohen, 1984). The effects of directional facilitation, and related aspects of spatial and temporal masking, are manifest in the way the emulator "narrows down" the initial global identity analysis (across the whole field of view) to identity analysis in a relevant sector only.
1.4.4. Local Identity Processing. Direction is but one component of location.
Local rather than directional facilitation may be established in three ways: (a) by an external explicitly positional cue (e.g., Cheal & Lyon, 1989); (b) by positional expectation (e.g., the arrow experiment by Posner & Cohen, 1984; also: Egly & Homa, 1984; and Muller & Findlay, 1987); (c) by development of a more localized region within a retinal sector where there was initial sectoral-directional facilitation. Bruce and Goldberg (1985) find neurophysiological evidence for the latter route to local facilitation. We implemented (c) in a way to also accommodate (a) and (b).
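The narrowing from global, via sectoral-directional, to local identity processing described in Sections 1.4.2 to 1.4.4 can be sketched as a three-stage procedure. This is a minimal illustration under stated assumptions, not the emulator's network: the `Item` record, the `target_evidence` score (standing in for the output of an identity classifier), the number of sectors, and the decision threshold are all hypothetical, and pooling within a sector is simply taken to be the maximum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    angle_deg: float        # direction of the item as seen from fixation
    eccentricity: float     # distance from fixation, in degrees of angle
    target_evidence: float  # hypothetical identity-classifier output in [0, 1]


def sector_of(angle_deg: float, n_sectors: int) -> int:
    """Index of the directional sector that contains `angle_deg`."""
    index = int((angle_deg % 360.0) // (360.0 / n_sectors))
    return min(index, n_sectors - 1)  # guard against rounding at 360 degrees


def search(items: List[Item], n_sectors: int = 8,
           threshold: float = 0.5) -> Optional[Item]:
    # Global mode: is there target-like evidence anywhere in the field?
    if not items or max(it.target_evidence for it in items) < threshold:
        return None
    # Sectoral-directional mode: pool evidence per direction (here: maximum)
    # and keep only the sector with the strongest pooled evidence.
    pooled = [0.0] * n_sectors
    for it in items:
        s = sector_of(it.angle_deg, n_sectors)
        pooled[s] = max(pooled[s], it.target_evidence)
    best = max(range(n_sectors), key=lambda s: pooled[s])
    # Local mode: verify identity item by item inside the selected sector only.
    candidates = [it for it in items
                  if sector_of(it.angle_deg, n_sectors) == best]
    return max(candidates, key=lambda it: it.target_evidence)
```

The point of the staging is that the (expensive) item-by-item verification runs only when the global decision indicates a target-like pattern, and then only over one sector rather than the whole field of view.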
2. DESIGN CONSIDERATIONS
The basic architecture of our emulator was inspired by the Treisman and Gormican (1988) views on "response pooling," and in the sequel we shall use some of their theoretical constructs. One important construct was termed "Levels of Processing," meaning that
... image dimensions, such as intensities, wavelengths, retinal locations, and binocular disparities, are coded at one level, then combined and transformed to define at another level the dimensions of real-world objects and surfaces, such as reflectances, surface colors, distances, and locations in three-dimensional space. Early segregation and grouping may depend on one set of elements, and a new vocabulary of elements, specialized for the purpose, may be recruited to describe objects rather than local surfaces and edges ... (p. 16).
We formalize this by way of a four-level "Global Structure System" (Gerrissen, 1982). At level 0 (dataless systems) the visual world is the object system and image systems represent the objects as scene patches with well-defined variables of visual quality (luminance, color, contrast, etc.). At level 1 we find data systems which segment the scene in clusters of image points exhibiting visual qualities that make them perceptually alike (see also the segment definition by Ballard, 1984). Inadequate or erroneous input data may invalidate processing results at higher levels; ideally, the data systems organize their processes to comply with the data relevance and data quality conditions which would optimize the efficacy of the resources recruited at those higher levels. At level 2 (generative systems) a feature system was defined, registering feature occurrence in a feature space. Structure and vocabulary of the feature space are such that the structure system at level 3 could apply its structure identification function for successful detection, discrimination, or recognition. Attentive (or effortful) allocation of resources is found at each one of the levels 1, 2, and 3. Input data quality is established at level 1 only, and is at best maintained in the higher level processing. In the first section we discuss data and resource limitations at the input (level 1) of the emulator, and the possible trade-offs for optimized search performance. In the second section, a rate-distortion-theoretic treatment of feature encoding (level 2) and feature space adaptation is given. In the third section it is shown that learning and consequent classification properties of back propagation (BP) network architectures can very well implement the discussed feature encoding and feature space adaptation. The resultant structure identification (level 3) based on optimal target/nontarget feature classification shall be discussed in Section 3.
2.1. Data-Limited and Resource-Limited Intake of Visual Data
In this section we discuss the resource-limited and data-limited processes at level 1, the input of the emulator. We think of the input data system as a retinal array of intensity measurements. With a finite number of array elements, the reconstruction of detail of the visual scene suffers from quantization noise. Snyder, Laughlin, and Stavenga (1977) have developed a computational framework for evaluating the information capacity of a humanoid retinal array. The computed information capacity is a logarithmic measure of the number of distinct patterns possibly encoded by the array, at given element size, photon intensity I and its fluctuations C (or contrast) averaged over time. For a kind of retinal array (see Figure 5), as we use in our emulator, built from concentric rings containing small receptors near the center but increasing in size toward the periphery, we have computed the information capacity (Gerrissen, 1982) as a function of mean photon intensity I and contrast C, eye integration time $\tau$, and retinal eccentricity. Results are graphically depicted in Figure 2. We see that, at given C = 1.0, for low photon counts P (e.g., log P = log I·$\tau$ = 2) the central region is very poor on information capacity, while for high photon counts (e.g., log P = 8) it is superior to the rest.
[Figure 2. Information capacity plotted against retinal eccentricity (in degrees of angle), with curves for log P = 2, 3, 4, 6, and 8 at C = 1.0.]
FIGURE 2. Information capacity at different retinal eccentricities for different levels of average photon counts (i.e., log I·$\tau$ = 2, 3, 4, 6, 8) and mean contrast C = 1.
At certain levels of photon count (e.g., log I·$\tau$ = 3) information capacity is more or less uniform over the retinal array, whereas for other levels of photon intensity there is an optimal information capacity at some eccentricity. When confronting a visual stimulus, at given I and C, the eye integration time $\tau$ is a modulator of quality of information intake (Gerrissen, 1984a, 1984b). The integration time, the particular array configuration, and the photic properties of the modeled retinal receptors, in fact, constitute a resource allocation facility at level 1 (data systems). Allocation of the resource, i.e., the available optimal information capacity, is applied in response to the objective data conditions I and C. Due to limitations of duration of eye integration (at the low end a minimum counting time is needed to overcome photon noise, and at the high end we are, for several reasons, limited to a few seconds at the most) we deal with a finite pool of information capacity. Hence, general "limited-capacity" arguments might be applied to analyze the data-limited and resource-limited aspects of the level 1 processes.
[Figure 3. Information capacity plotted against integration time $\tau$ (in secs, from .001 to 10), with curves for eccentricities of 2, 4, 6, 8, 12, 16, and 20 degrees.]
FIGURE 3. Information capacity development for increased integration time is of a data-limited nature for eccentric retinal locations, while for more central locations there is a cross-over to a resource-limited nature.
The curves in Figure 3 depict the development of information capacity as a function of integration time for a number of retinal eccentricities, when I = 100,000 and C = 1.0. Both data-limited and resource-limited processes may be identified. At 2 degrees of eccentricity (Figure 3) there is a steady increase of information capacity (= the resource) when the integration time is extended. If integration time is too short, then at this eccentricity data intake might suffer from lack of spatial resolution, but as more time is allotted the quality of data intake improves due to increased spatial resolution. Thus, in the depicted situation we have resource-limited processing at 2 degrees eccentricity. Whenever performance does not depend on processing resources the task is data-limited. The curves (Figure 3) for 8, 16, and 20 degrees of eccentricity clearly reach their respective asymptotes of information capacity. Further increase of integration time has virtually no effect on intake performance. At these eccentricities we have data-limited processing for given I and C values. The curves for 4 and 6 degrees of eccentricity (Figure 3) are in transition from a resource-limited to a data-limited nature of processing. The curves, seen as a sequential development of information capacity, show how optimal allocation of information capacity "grows from the outside in," i.e., information capacity steeply increases at a certain eccentricity until it tapers off, to gradually change into a data-limited process, while a more central part takes over. This cross-over point represents the optimal intake condition at that eccentricity, implicating a maximization of performance without wasting process resources. Further increase of integration would delay the response, while resultant improvement of accuracy would be less than at the more central part.
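The "outside in" development and the cross-over from resource-limited to data-limited intake can be illustrated with toy curves. The `capacity` function below is a hypothetical saturating curve, not the Snyder, Laughlin, and Stavenga computation behind Figure 3; its ceiling and time constant are assumptions chosen only so that peripheral locations saturate early while central ones keep rising. The cross-over is taken, again as an assumption of this sketch, to be the first sampled integration time after which the next sample adds less than 5% capacity.

```python
import math
from typing import List, Optional


def capacity(tau: float, ecc_deg: float) -> float:
    """Toy capacity curve: saturating growth with integration time `tau` (s)."""
    ceiling = 1000.0 / ecc_deg           # hypothetical asymptote
    time_constant = 40.0 / ecc_deg ** 2  # hypothetical, in seconds
    return ceiling * (1.0 - math.exp(-tau / time_constant))


def crossover_time(ecc_deg: float, taus: List[float],
                   min_gain: float = 0.05) -> Optional[float]:
    """First sampled tau at which the next sample adds less than `min_gain`
    (relative) capacity; None if the curve is still rising over the range."""
    for tau, next_tau in zip(taus, taus[1:]):
        now, later = capacity(tau, ecc_deg), capacity(next_tau, ecc_deg)
        if (later - now) / later < min_gain:
            return tau
    return None


TAUS = [0.001 * 2 ** n for n in range(14)]  # 1 ms up to about 8 s

if __name__ == "__main__":
    for ecc in (2, 4, 8, 16, 20):
        print(ecc, crossover_time(ecc, TAUS))
```

In this toy setting the more eccentric locations reach their cross-over at shorter integration times, whereas the curves for small eccentricities return `None`, i.e., they remain resource-limited over the whole sampled range.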
The particular integration time involved was coined the "optimal intake moment" $\hat{\tau}$ (Gerrissen, 1984a). For the range of usual integration times (10 < $\tau$ < 200 msecs), the daylight range of I and C conditions, and an array cell diameter $\Delta\varphi_i$ (at retinal eccentricity i) we have found that information intake is optimal at moment
$$\hat{\tau}(I, C, \Delta\varphi_i) = \frac{2}{(\Delta\varphi_i)^2 \, I \, C^2}. \tag{1}$$
If, as a first approximation, we assume the cell diameter ($\Delta\varphi$) to be linearly proportional (constant k) to eccentricity ($\alpha$), then for given I and C
$$\hat{\tau}(\alpha) = \frac{2}{(k\alpha)^2 \, I \, C^2}. \tag{2}$$
The hyperbolic nature of this function highlights the two faces of the integration time versus eccentricity relation:
1. Information intake in the paracentral region of the retina is optimal at short integration times (for the human eye in the range of 30-60 msecs);
2. The quality of information intake in the more central part may to a large extent be maximized by the integration time.
As a correlate to eye integration time in our emulations we employ an incremental development of input activation (see Figure 4), so that at the mean stimulus "intensity" in the first increment only peripheral parts reach a significant activation, while at the final increment the central region may reach its peak information capacity.
[Figure 4. Axis labels: cell excitation; retinal eccentricity; cell activation.]
FIGURE 4. The "smoothed-threshold" transfer function at the input cell layer; the four stages of input integration applied in the emulations span the range from subthreshold to saturation excitation.
2.2. Feature Space Adaptation
Rate-Distortion Theory may be applied to feature encoding (Gerrissen, 1982), such that the encoding performance is equated to the average distortion at a rate of information transfer from object data to classified feature(s). Equation (3) evaluates the average distortion as a function of a priori object characteristics and subjective process "settings."
$$d = \sum_{j} \sum_{k} P(j)\, Q(k/j)\, \rho(j, k), \tag{3}$$
where:
• P(j) = the a priori probability of occurrence of the (discriminative) feature j in the visual input;
• Q(k/j) = the probability of a j-feature causing a target identification (if k = the target), or a nontarget identification (if k = a nontarget);
• $\rho(j, k)$ = the distortion coefficient, which may be regarded as the cost of a k-classification in case of a j-feature.
We suppose that through adequate and unconstrained target/nontarget learning the human subject
establishes a processing bias ("set"), which minimizes d by emphasizing the low costs, while de-emphasizing the high costs. Low costs are generally associated with nontarget features causing a nontarget classification, or target features causing a target classification. All other "causal" relations between features and target/nontarget classifications then incur a higher cost. We designate with $\{j_{nt}\}$ occurrences of one or more instances from specific nontarget feature classes and with $\{j_t\}$ occurrences of one or more instances from specific target feature classes. Rate-distortion optimization requires that with a given target/nontarget set of stimuli and its related frequencies of occurrence of $\{j_{nt}\}$ and of $\{j_t\}$, the discriminative encoding matrix Q(k/j) should be composed in a manner to maximize the rate of feature encoding while having the average (cost-weighted) distortion d not exceeding the permissible-error criterion. For a treatment of the rate constraint in relation to the information capacity of the encoder and the permissible error we refer to the literature on rate-distortion theory (e.g., Berger, 1971). Classification on the basis of a feature space division in a class $\{j_{nt}\}$ of specifically nontarget features and a complementary class $\{j_t\}$ of specifically target features is a simplification that does not properly take account of the occurrences of combinations of features drawn from both classes. We argue, however, that if Q(k/j) is indeed optimized in a way to obtain a small d then the incurred inaccuracy will not fundamentally affect the reasoning behind our hypothesis that feature encoding characteristics need to change with changing target/nontarget stimulus sets, to maintain optimal target/nontarget discrimination performance. In other words, at a given information capacity of the encoder (see, e.g., Figs.
2 and 3), if the permissible-error criterion is $D_p$ then a matrix composition $Q_p(k/j)$ is needed for optimal rate of encoding of a stimulus set with frequencies of occurrence $F_p\{j_t\}$ and $F_p\{j_{nt}\}$, respectively, such that $d \le D_p$. Any other feature class division ($F_q\{j_t\}/F_q\{j_{nt}\}$) would lead to either a failure to meet the distortion constraint or a suboptimal rate of encoding. The $(p, D_p, Q_p, F_p\{j_t\}, F_p\{j_{nt}\})$-tuple is a model of the encoding tactics developed by the human subject in the learning or instruction phase preceding the actual experiment. The p, D, and F-s are task-driven entities while development or recomposition of the Q implies the feature (encoding) space adaptation in response to these entities.
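As a worked illustration of eqn (3) and the distortion constraint, the sketch below evaluates the average distortion d for a two-class example and checks it against a permissible-error criterion. All numbers (the feature probabilities, the encoding matrix, the 0/1 cost table, and the criterion value) are made up for illustration, and the rate is computed as the mutual information between feature and classification, which is the usual rate measure in rate-distortion theory but is an assumption here about what the text intends.

```python
import math
from typing import Dict

Table = Dict[str, Dict[str, float]]


def average_distortion(P: Dict[str, float], Q: Table, rho: Table) -> float:
    """d = sum_j sum_k P(j) Q(k/j) rho(j, k), as in eqn (3)."""
    return sum(P[j] * Q[j][k] * rho[j][k] for j in P for k in Q[j])


def encoding_rate(P: Dict[str, float], Q: Table) -> float:
    """Mutual information (bits) between feature j and classification k."""
    classes = {k for j in Q for k in Q[j]}
    q = {k: sum(P[j] * Q[j].get(k, 0.0) for j in P) for k in classes}
    rate = 0.0
    for j in P:
        for k, q_kj in Q[j].items():
            if P[j] > 0.0 and q_kj > 0.0:
                rate += P[j] * q_kj * math.log2(q_kj / q[k])
    return rate


# Hypothetical example: one target feature class and one nontarget class.
P = {"j_t": 0.2, "j_nt": 0.8}
Q = {"j_t": {"target": 0.9, "nontarget": 0.1},
     "j_nt": {"target": 0.05, "nontarget": 0.95}}
rho = {"j_t": {"target": 0.0, "nontarget": 1.0},   # a miss costs 1
       "j_nt": {"target": 1.0, "nontarget": 0.0}}  # a false alarm costs 1
D_p = 0.1  # hypothetical permissible-error criterion

d = average_distortion(P, Q, rho)
meets_criterion = d <= D_p
```

Here d = 0.2 · 0.1 + 0.8 · 0.05 = 0.06, so this particular Q satisfies the criterion; changing the frequencies P while keeping Q fixed changes d, which is the sense in which a new stimulus set calls for a recomposed encoding matrix.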
2.3. A Connectionist System Emulating the Feature Encoding and Classification
In this section we briefly motivate our "connectionist option" for configuring the emulator's feature space.
Feature space adaptation, as described in the preceding section, might well be considered the product of a Maximum A Posteriori (MAP) estimation algorithm in search for "most probable" values of a random variable given some knowledge about the actual value distribution of the variable. In "A Unified Framework for Connectionist Systems," Golden (1988) expresses this view of the MAP-character of connectionist classification by defining optimal classification as computing a "most probable" or MAP estimate of the response vector for a given stimulus vector. Optimal learning in this context is defined by Golden as "computing MAP estimates of the network's inter-unit connection strengths for a given set of vectors learned by the network." Following Golden's (1988) formalism, connectionist classification is characterized by the system's (or network's) subjective probability function $p_s(X; A)$ and a function V(X; A), where X is the input vector and A the (multidimensional) vector of connection strengths in the network. Changes in the classification dynamics are reflected by changes of V(X; A), i.e., the classification process minimizes the function value for the input expectancy X*, and if this input expectancy changes or if the connection strength matrix changes then there will be a general change of V-values for the given set of input vectors X. In other words, the classification process attempts to minimize the distance between V(X; A) and V(X*; A) by adaptation of A for a training set of input vectors X (invoking a certain expectancy X*), and while doing so evaluates the function V(X; A). Golden (1988) states and proves "his" Fundamental Theorem which connects the function V(X; A) to the subjective probability function $p_s(X; A)$ expressing the network's belief that X occurs in the network's environment.
For the BP network architecture applied, this function is a conditional probability function p_s(O|I; A), indicating the network's belief that O is the appropriate response for stimulus I given some set of connection strengths A. If O were the target/nontarget response, and I the input features drawn from the {j_nt} and {j_t} feature classes, then the subjective probability function could well be compared to the discriminative encoding matrix Q(k/j), in view of its assignment tactics that stem from the input-expectancy-related constraints in eqn (3). Thus, the MAP character of BP connectionist classification can be made to implement the suggested rate-distortion-based optimization of response, and consequently we propose a BP connectionist architecture that develops encoding tactics similar to those implemented by the discriminative encoding matrix Q(k/j). The roles of the distortion coefficient ρ(j, k), the permissible-error criterion D, and the frequencies of occurrence F{j_t} and F{j_nt} will implicitly be incorporated through the training conditions and the classification criteria applied in the network learning phase. Information intake constraints, as discussed earlier, are imposed on the activation patterns at the network's input.
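The MAP reading of classification can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gibbs-type relation in which the subjective probability falls off as exp(-V); the exact form of Golden's theorem is not reproduced here. The candidate responses, the expectancy, and the toy function `toy_v` are hypothetical. Under any such monotone relation, the response with the lowest V is also the most probable one.

```python
import math


def subjective_probabilities(v_values):
    """Turn V values into probabilities, assuming p is proportional to exp(-V)."""
    lowest = min(v_values)  # shift for numerical stability; does not change p
    weights = [math.exp(-(v - lowest)) for v in v_values]
    total = sum(weights)
    return [w / total for w in weights]


def map_response(candidates, v_function):
    """Return the candidate with the highest subjective probability (MAP)."""
    v_values = [v_function(c) for c in candidates]
    probabilities = subjective_probabilities(v_values)
    best_index = max(range(len(candidates)), key=probabilities.__getitem__)
    return candidates[best_index], probabilities


EXPECTANCY = (1.0, 0.0)  # hypothetical (target, nontarget) response expectancy


def toy_v(response):
    """Hypothetical V: squared distance between a response and the expectancy."""
    return sum((r - e) ** 2 for r, e in zip(response, EXPECTANCY))


candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
best, probabilities = map_response(candidates, toy_v)
```

In this sketch minimizing V and maximizing the subjective probability select the same response, which is the sense in which classification is described above as MAP estimation.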
3. EMULATION OF EARLY IDENTITY PROCESSING
In this section we discuss the actual emulation of identity processing, illustrating the formation of the feature space and the verification of the adaptation hypothesis. The validity of this "early part" of the emulator (as yet lacking the localization process) as a descriptive framework for phenomena in human visual search was evaluated by checking its target/nontarget discrimination performance against observed human discrimination performance in a number of paradigmatic settings. Part of the results of this evaluation has been reported in an earlier publication (Gerrissen, 1989); in the present paper we have chosen to demonstrate the model's descriptive power only in a "search-asymmetry" setting, which is known to induce a few salient characteristics of human target search. We focus on feature-space adaptation and the first two modes of identity processing, i.e., global identity processing and sectoral-directional emphasis of identity processing (Section 1.4).

3.1. Network Training for Global Identity Processing The network's input is a crude representation of a retinal hemifield, built from cells arranged in radial beams (Figure 5). A radial beam (RB) is a single file of eight cells, beginning with its smallest cell in the center of the hemifield and ending with its largest cell in the periphery. Adjacent active RBs may constitute a sector. There is, in fact, an underlying input layer of uniformly distributed "counters" of input excitation which "feeds" these retinal cells. Larger cells receive activation from more counters, and the receptive fields of the cells overlap. It has been shown (Linsker, 1988) that a layer structure of this kind (overlapping local feedforward connections) can be self-adaptive so as to develop center-surround cells or even orientation-selective cells, in line with the Hubel and Wiesel (1968) findings. However, for reasons of flexibility and simplicity of feature learning we maintain "hard" links between counters and cells. The only neural characteristic of the input data system relates to the application of the sigmoidal activation function:

$$a_i = \frac{1}{1 + e^{-\mathrm{net}_i}} \qquad (4)$$

with

$$\mathrm{net}_i = \sum_j f_{ij}\, c_j + \theta_i \qquad (5)$$

where
• f_ij = 1 if there exists a link from counter j to cell i, else f_ij = 0;
• c_j = the output of counter j;
• θ_i = the bias at the input of cell i.

FIGURE 5. The retinal array built from radial beams, each containing (overlapping) cells that increase in size with eccentricity. (Figure labels: input "counters"; cell layer; radial beam (RB); hemifield built from radial beams of cells.)

In subsequent layers of our feedforward network (Figure 6), a similar activation function implements the smoothed differentiable version of a threshold function that is generally found in BP multilayer networks (see, e.g., Rumelhart & McClelland, 1986). Its application in the input data system adds to the "neural" character of input data processing, in that cell excitation only leads to significant activation after it exceeds the cell's noise threshold, and that activation saturates at high excitation. Because of the temporal development of input activation (Figure 4), as first mentioned at the end of Section 2.1, the larger, eccentric cells will achieve a significant activation at an earlier moment (e.g., at t1 or t2) in the input-integration interval (the correlate of the eye integration interval) than the smaller, more central cells. Later in the integration interval, say at t4, the latter may cause a higher activation per unit of array area because of the sigmoid-function-induced saturation of eccentric cell activation. Under conditions of unconstrained stimulus exposure, at moments t2 and t3 there may be the same or more information capacity per unit area in the central region than there is in the periphery, due to the difference in spatial resolution (Figure 1). For the discussion in this section we assume input integration to be at t2 (Figure 4), when the excitation of small cells has begun to exceed the threshold while the largest cells in the array have not yet started to saturate. From Section 3.3 onwards we shall discuss the particular role of the input bias θ_i and the emulator mechanisms which more extensively exploit the temporal development of activation.

Since the links between the two layers of the input data system do not take part in the network's long-term dynamics (the learning), in the further discussion we will refer only to the cell layer. In our current implementation of the model we have a hemifield consisting of 12 radial beams of 8 cells each, making up an array of 96 cells. These cells project their activation into a multilayer network (Figure 6); all cells are fully connected with the nodes in the first hidden layer.

FIGURE 6. The four-layer back-propagation network: input cells organized in 12 radial beams with 8 cells each, a first hidden layer, a second hidden layer of 10 cells, and the target/nontarget output nodes. The remainder of the caption, which refers to the subnetwork of Figure 7, is illegible in the source.

In a first phase of network learning, a subnetwork architecture (Figure 7) was used to establish a basic feature space. The subnetwork was trained to classify the orientations (vertical, horizontal, 45 degrees, 135 degrees) of line elements incident at the input array, regardless of their actual position. This feature abstraction capacity was then, at the start of a second phase, transferred (in the form of the resultant matrix of connection strengths) to the larger network (Figure 6), to prime the learning of target/nontarget classification. Target and nontarget patterns were built from the line elements used in the first phase. In going from the first phase to the second phase, the input cell layer remained unchanged, while the first hidden layer was extended with 7 new cells, and the output cells from the first phase were incorporated into a second hidden layer consisting of 10 cells. These extra cells were included to enable the development of additional feature dimensions. For transparency of exposition in the present paper we shall only report and discuss our findings when using a single feature dimension: orientation. It should be noted, however, that an important property of our emulator is its flexibility of feature space establishment. Any set of feature dimensions (e.g., element size, curvature, color, etc.) deemed sensible in view of the applied stimuli may be learned to span the feature space. Line orientation is an arbitrary first choice of feature dimension.

FIGURE 7. The subnetwork that was trained to classify four basic orientations of line elements (vertical, horizontal, 45 degrees, and 135 degrees), regardless of their actual position on the array.

Two nodes constitute the output layer: one output node evaluating the "targetness" of the input pattern, and the other evaluating the "nontargetness." Knowledge gained from the first learning phase was in a way transplanted by priming the weight matrix of the second phase with values from the weight matrix of the first phase. Learning started, figuratively speaking, from a level of knowledge of line-orientation features. The second hidden layer could, already at the beginning of the pattern learning phase, exploit the classification qualities installed in the connections to the four cells which formerly served in the output layer of the feature-learning network. The output cells produce values in the interval [0, 1], which might be regarded as probabilities of target or nontarget presence in the hemifield.

Figure 8 illustrates the classification performance of the network. In each of the depicted cases an isolated pattern occurs at one of the five possible locations on the hemifield. The network learned to classify X as the target and H, L, and T as nontargets. Consequently, in cases (a) and (b) we find proper target and nontarget values at the output nodes. For cases (c) and (d) the network gracefully generalizes its target/nontarget discrimination capabilities to the (biased) classification of patterns which were not part of the training set of patterns. Targetness evaluation should be regarded as an answer to the question "Do we have something of a target in view?," prior to facing the subsequent "... If so, where is it?" It will be shown in the sequel that when nontargets are introduced on the hemifield, competing with actual target presence, the network responds with a lower targetness value, implying a degradation of target certainty.

FIGURE 8. Classification performance with single-pattern stimuli; the N and the K were not part of the training set of patterns. Targetness values in response to the X and the H confirm the success of satisfying the learning (error) criteria applied, while in response to the N and the K there is a clear classification bias related to the established class separation in feature space.

Case   Target node   Nontarget node
(a)    0.098         0.902
(b)    0.908         0.092
(c)    0.333         0.661
(d)    0.872         0.128
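The input data system of eqns (4) and (5) can be sketched in a few lines. The 12 beams of 8 cells come from the text; the counter layout in `build_links` (each cell k of a beam pooling counters k to 2k+1, so that cells grow with eccentricity and neighbouring fields overlap), the uniform counter output of 0.5, and the bias of -3.0 are hypothetical choices made only to obtain a runnable example.

```python
import math

N_BEAMS = 12                             # radial beams in the hemifield (from the text)
CELLS_PER_BEAM = 8                       # cells per beam, central cell first (from the text)
COUNTERS_PER_BEAM = 2 * CELLS_PER_BEAM   # hypothetical counter density


def build_links():
    """Return, for each of the 96 cells, the counter indices j with f_ij = 1."""
    links = []
    for beam in range(N_BEAMS):
        offset = beam * COUNTERS_PER_BEAM
        for k in range(CELLS_PER_BEAM):
            # Hypothetical layout: cell k pools counters k .. 2k+1 of its beam.
            links.append([offset + j for j in range(k, 2 * k + 2)])
    return links


def cell_activations(counter_outputs, links, biases):
    """Eqn (5): net_i = sum_j f_ij c_j + theta_i; eqn (4): a_i = sigmoid(net_i)."""
    activations = []
    for cell_links, theta in zip(links, biases):
        net = sum(counter_outputs[j] for j in cell_links) + theta
        activations.append(1.0 / (1.0 + math.exp(-net)))
    return activations


links = build_links()
counters = [0.5] * (N_BEAMS * COUNTERS_PER_BEAM)  # uniform input excitation
biases = [-3.0] * len(links)                      # illustrative bias theta_i
acts = cell_activations(counters, links, biases)
first_beam = acts[:CELLS_PER_BEAM]
```

With uniform excitation the larger, more eccentric cells in this sketch reach a higher activation than the small central ones, which mirrors the remark that eccentric cells become significantly active earlier in the integration interval.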
3.2. Effects of Prior Feature Learning First-phase learning of the line-orientation features establishes a ("free-floating") feature pool that underlies the "vocabulary" (Treisman & Gormican, 1988) of specific feature combinations which, in turn, supports the classification in the target/nontarget layer (see Figure 1). In order to get a feel for the impact of prior feature abstraction learning, we experimented with three different approaches to feature-knowledge transfer: (a) PRIME-FEATURES, where we only primed the pattern-learning phase with the connection matrix from the first phase, but allowed the network to freely modify those connection weights as needed during pattern learning (we followed this approach to learning in the case depicted in Figure 8); (b) FIX-FEATURES, where the connection weight matrix from the first phase was kept unchanged, and the network could only reorganize the connections outside this matrix in order to achieve correct classification of patterns; (c) NO-FEATURES, where we did not transfer any previously learned feature "knowledge," but instead had the network learn the patterns from scratch. Figure 9 depicts the rate of convergence in pattern learning when we applied the three different approaches. Note that PRIME-FEATURES and FIX-FEATURES started with a trained subnetwork, while NO-FEATURES started with a completely (randomly) initialized network.

FIGURE 9. Learning performance (overall error on a logarithmic scale from 1 down to .001, against iterations from 0 to 2000) for the PRIME-FEATURES, FIX-FEATURES, and NO-FEATURES approaches, with target X and nontargets H, L, and T. The remainder of the caption is illegible in the source.

In Figure 10 are depicted the activation profiles of the second hidden layer in the cases of the three single-pattern stimuli, each at one of the five possible locations in the hemifield, as shown in the upper row of the figure (first a single L at position 1, then a single T at position 5, and finally a single X at position 3). In the second row we find the activation profiles when applying, respectively, the PRIME-FEATURES approach, the FIX-FEATURES approach, and the NO-FEATURES approach. Target/nontarget classification was in all cases correct and well above the 0.9 mark. With the two approaches that share the line-feature learning phase we see the clear utilization of the feature sensitivity established in the first (leftmost) nodes of the second hidden layer, i.e., pattern X abundantly activating nodes 3 and 4 while inhibiting nodes 1 and 2, and patterns L and T causing a somewhat reversed profile among those four nodes, with the exception of node 2, which for the FIX-FEATURES approach does not get near the first node activation level. The activation profiles for the NO-FEATURES approach also show complementary shapes in the respective target and nontarget situations, but now all nodes seem to participate in pattern discrimination.

FIGURE 10. The activation profiles of the second hidden layer show that the PRIME-FEATURES and the FIX-FEATURES networks more or less follow up on the feature classes established in the first learning phase, while the NO-FEATURES network establishes a feature space division of a very different nature.

One might have expected that with such massive participation, the NO-FEATURES approach would produce networks with discriminative characteristics superior to those resulting from the PRIME-FEATURES and FIX-FEATURES approaches. We studied discrimination robustness in a setting as illustrated in Figure 11. The depicted three-pattern stimulus was "presented" at the input of the same (fully trained) networks, while we observed the target node activation and the activation profile of the second hidden layer. It should be noted that in the pattern learning cycle, for every three different nontarget instances (H, L, and T) there were two identical target (X) instances per hemifield position, and as a result, target learning is more extensive than individual nontarget learning. In the case of the three-pattern stimulus, the one target (X) succeeds in more or less neutralizing the activation from the two nontargets (T and L). From the activation profiles in Figure 11 it can be observed that the X-like activation is more pronounced than the L-or-T-like activation for the PRIME-FEATURES and FIX-FEATURES networks. For the NO-FEATURES network, however, there is no such X-like dominance. This observation is reflected in the activation level of the network target nodes: the PRIME-FEATURES network target node reached 0.69, the FIX-FEATURES network target node 0.76, while the NO-FEATURES target node only reached 0.53. Addition of more nontarget patterns to the input stimulus caused a rapid breakdown of the level of activation of the NO-FEATURES target node, which contrasted with the continued robustness of the two other networks. This is in itself a notable result; feature-based priming has its merits in terms of classification generalization along the chosen feature class partitioning. But a more essential reason for preferring the PRIME-FEATURES and FIX-FEATURES approaches over the NO-FEATURES approach for our particular modeling purposes is the actual presence of a representation of a feature map (the activation profile of the second hidden layer) with an a priori meaning, such that map development or change can be described and reproduced. We shall from now on refrain from the NO-FEATURES approach, and further compare the two other approaches.

FIGURE 11. Activation profiles and target node values for the three approaches in the case of a multipattern stimulus at the retinal array; the NO-FEATURES network produces the lowest targetness value.
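The three transfer approaches differ only in how the second-phase weights are initialized and which of them may still change. The following is a minimal sketch of that difference for a single weight matrix (first hidden layer to second hidden layer). The layer sizes 7, 4, and 10 are taken from the text; the size of the first-phase hidden layer, the initialization range, the learning rate, and all function names are hypothetical, and the gradient is a stand-in rather than a real back-propagation pass.

```python
import random

PHASE1_HIDDEN = 5    # hypothetical: the source does not state this size
NEW_HIDDEN = 7       # first hidden layer is extended with 7 cells (from the text)
PHASE1_OUTPUTS = 4   # four orientation cells of the first phase (from the text)
SECOND_HIDDEN = 10   # second hidden layer of 10 cells (from the text)


def init_second_phase(phase1, mode, rng):
    """Return (weights, trainable) for the first-hidden to second-hidden links.

    mode is "PRIME-FEATURES", "FIX-FEATURES", or "NO-FEATURES".
    """
    cols = PHASE1_HIDDEN + NEW_HIDDEN
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
               for _ in range(SECOND_HIDDEN)]
    trainable = [[True] * cols for _ in range(SECOND_HIDDEN)]
    if mode != "NO-FEATURES":
        for r in range(PHASE1_OUTPUTS):
            for c in range(PHASE1_HIDDEN):
                weights[r][c] = phase1[r][c]              # transplant learned weights
                trainable[r][c] = mode != "FIX-FEATURES"  # freeze only when fixed
    return weights, trainable


def gradient_step(weights, trainable, gradient, rate=0.5):
    """Apply one update, skipping frozen (non-trainable) links."""
    for r, row in enumerate(weights):
        for c in range(len(row)):
            if trainable[r][c]:
                row[c] -= rate * gradient[r][c]


rng = random.Random(0)
phase1 = [[1.0] * PHASE1_HIDDEN for _ in range(PHASE1_OUTPUTS)]  # stand-in weights
ones = [[1.0] * (PHASE1_HIDDEN + NEW_HIDDEN) for _ in range(SECOND_HIDDEN)]

primed, primed_mask = init_second_phase(phase1, "PRIME-FEATURES", rng)
fixed, fixed_mask = init_second_phase(phase1, "FIX-FEATURES", rng)
scratch, scratch_mask = init_second_phase(phase1, "NO-FEATURES", rng)
for w, m in ((primed, primed_mask), (fixed, fixed_mask), (scratch, scratch_mask)):
    gradient_step(w, m, ones)
```

After one update the transplanted block has moved in the primed network, is untouched in the fixed network, and never existed in the network trained from scratch; links outside the transplanted block remain trainable in all three cases.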
Once target/nontarget discrimination learning has stabilized and has reached the criterion level of at least 0.9 activation for the correct node (target or nontarget), FIX-FEATURES networks perform as well as or better than PRIME-FEATURES networks. But the discrimination learning part is problematic for the FIX-FEATURES networks. As long as the discriminative feature conjunctions correspond largely with the learned line-orientation features, discrimination learning is feasible but very slow, as can be inferred from the curves in Figure 9. If, however, the discriminative feature conjunctions do not neatly map onto the set (or a subset) of line-orientation features, then learning becomes extremely slow, and often tends not to converge toward the 0.9 criterion level. Figure 12 depicts the learning curves for a training set in which the pattern T was the target and horizontal and vertical bars were the nontargets; the PRIME-FEATURES network converges after a limited number of iterations, whereas the FIX-FEATURES network's learning curve reaches an asymptote at a high level of overall error.

FIGURE 12. Learning performance in PRIME-FEATURES and FIX-FEATURES networks, when T was the target and horizontal and vertical bars were the nontargets; the FIX-FEATURES network is not only slower, but reaches its asymptote at a high overall error of classification.

This problematic learning behavior of the FIX-FEATURES networks makes them less desirable for our emulation purposes. The training set that caused the learning performance depicted in Figure 12, for instance, was needed for the emulation of visual attentive performance in situations that impose "search asymmetry" and "illusory conjunction" characteristics (Gerrissen, 1989). A conclusion more fundamental to the issues treated is that the failure of FIX-FEATURES to accommodate different target/nontarget combinations equally well can be seen to support our "feature space adaptation" reasoning. For optimal discrimination performance in cases of different target/nontarget combinations, corresponding recompositions of the feature space are needed. The PRIME-FEATURES networks do, indeed, fulfill the adaptation requirement and succeed in discrimination learning for different pattern sets. It was decided, on these grounds, that our further emulations of visual discrimination performance would be exclusively based on the PRIME-FEATURES approach.
3.3. Sectoral-Directional Emphasis of Identity Processing Figure 13 illustrates some of the (PRIME-FEATURES) network's performance characteristics in multicharacter cases. In the first two cases (Figures 13(a) and 13(b)) one may observe the general effect of lateral masking, in that the targetness value (= activation level of the target node) is lower than in the case of isolated target occurrence (see Figure 8). The third case (Figure 13(c)) illustrates the tendency for the targetness evaluation to break down in the face of more interaction with nontarget patterns surrounding the target. The fourth case only differs from the third case in its increased relative "contrast" of the target, which in our simulation was implemented through amplification (1.5×) of input activation at the cells contained in the target (X) projection. As a result, a significant increase of targetness value could be observed, which might be interpreted as an increased certainty about target presence in the hemifield. This is in accordance with the general experience that a high-contrast visual target tends to stand out among surrounding lower-contrast patterns.

FIGURE 13. Classification performance in terms of targetness and nontargetness values in multipattern cases.

Case   Target node   Nontarget node
(a)    0.878         0.123
(b)    0.824         0.178
(c)    0.556         0.450
(d)    0.736         0.268

If we consider object contrast enhancement to be an objective (or external) bias of feature analysis, then presently we introduce a mechanism that causes
a subjective (or internal) spatial bias of feature analysis. The grouping of cells in radial beams is essential to the functioning of the mechanism we now describe. From Figure 14 we see that the cells within a radial beam (RB) might be collectively disinhibited by means of efferent signals originating from the Sectoral Bias Controller (SBC). The cells communicate their outputs along two paths: (a) they project their level of activation to a first hidden layer of the network, and (b) per RB they participate in activating their RB-summator. RB activity, as relayed to the SBC by the corresponding RB-summator, may trigger a disinhibitive feedback to the cells of the RB so as to increase their input bias θ_i (eqn (5)). As a result, the incoming excitation (from the "counters") is boosted at the cells' input, leading to increased cell activation. If the increase of activation causes a significant improvement of targetness, then the Sectoral Targetness Monitor (SecTM) allows the maintenance of the disinhibitive feedback, and a further development of input activation may eventually establish target certainty. If, however, there is no significant improvement of targetness, the SecTM will issue a reset wave towards the SBC, disabling its nodes currently involved in the application of the feedback. This causes cancellation of the current selective disinhibition, and reallocation of feedback to another active sector (in our system, adjacent high-activity RBs are clustered into a sector). Disinhibitive feedback is then sent to this sector while the SecTM checks for a significant targetness increase, and so on. (Reset waves temporarily disable SBC nodes in a fashion similar to the response of "dipole fields" to nonspecific reset waves as described by Carpenter & Grossberg, 1987.) Improvement of targetness is checked by the SecTM through measurement of the targetness differential Tdiff, defined as the target node activation minus the nontarget node activation. In the course of the input-activation integration, the SecTM-SBC cooperation sets off a rapid succession of reallocations of sectoral facilitation until a high-targetness sector is found. The net product of this automatic process will be a more selective identity verification (superior to the global identity verification discussed in Section 3.1), plus early "knowledge" about the direction component of target location. The latter may be taken to be consistent with the findings that in preparing an attention movement the direction information is available earlier than the distance (or amplitude) information (Bruce & Goldberg, 1985; Cohen & Henn, 1972; Hou & Fender, 1979). Any pattern or pattern element incident on a sector which has been selected for SBC-based disinhibition is thus subjectively emphasized in the process of target evaluation.

FIGURE 14. Extending the initial feedforward network with a selective feedback path, where a Sectoral Targetness Monitor (SecTM) checks for targetness improvement when the Sectoral Bias Controller (SBC) selectively disinhibits one radial beam of cells, or a few radial beams (forming a sector); if the SecTM does not register a significant improvement then it issues a reset to the SBC, which results in a reallocation of disinhibition to another active sector (or single radial beam).
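The SecTM-SBC cooperation described above amounts to a try, check, and reset loop over candidate sectors. The sketch below shows that control flow only. The function names, the `min_improvement` threshold (the text speaks of a "significant" improvement without quantifying it), and the table of output activations are hypothetical; in the emulator the activations would come from the network itself.

```python
def tdiff(target_activation, nontarget_activation):
    """Targetness differential: target node minus nontarget node activation."""
    return target_activation - nontarget_activation


def sectoral_search(sectors, evaluate, min_improvement=0.05):
    """Try disinhibiting each candidate sector until targetness improves.

    sectors  -- candidate sectors, e.g. ordered by RB-summator activity
    evaluate -- maps a disinhibited sector (or None for no feedback) to the
                (target, nontarget) output activations
    Returns (locked sector or None, resulting Tdiff, sectors that were reset).
    """
    baseline = tdiff(*evaluate(None))
    reset_sectors = []
    for sector in sectors:
        with_feedback = tdiff(*evaluate(sector))
        if with_feedback - baseline >= min_improvement:
            return sector, with_feedback, reset_sectors  # SecTM keeps the feedback
        reset_sectors.append(sector)                     # SecTM issues a reset
    return None, baseline, reset_sectors


# Hypothetical output activations for a three-sector stimulus.
toy_outputs = {
    None: (0.56, 0.45),
    "sector 1": (0.40, 0.60),
    "sector 2": (0.80, 0.20),
    "sector 3": (0.50, 0.50),
}
locked, final_tdiff, resets = sectoral_search(
    ["sector 1", "sector 2", "sector 3"], toy_outputs.get
)
```

In this toy run the first sector lowers Tdiff and is reset, after which the second sector raises it and the feedback is maintained, which also yields the direction of the presumed target.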
To study the effect of this sectoral emphasis on Tdiff, yet without the intricacies of the SecTM-SBC dynamics, we experimented with application of a sector-selective gain of 2.0 (cell excitation in the sector was doubled). Figure 15 illustrates the application of selective gain and its impact on Tdiff. Application of the gain in a sector-selective manner causes a Tdiff increase only in the sector actually containing the (learned) target X, while the application of gain in the other sectors causes a decrease. When gain is applied to the sector as depicted in Figure 15(a), the SecTM would register a decrease instead of the desired increase of Tdiff and would then cause the SBC to turn its initial disinhibition off, whereas in the situation depicted in Figure 15(b), with an increase of Tdiff as a function of gain, the SBC could lock the feedback loop and support the development of target certainty in the course of the input-activation integration.

FIGURE 15. Application of selective disinhibition to three different sectors of the retinal array (Tdiff plotted against sectoral gain from 1.0 to 2.0); only in the sector containing the target X does it cause an increase of the targetness differential Tdiff.
3.4. Application in a Case of "Search Asymmetry" Elsewhere (Gerrissen, 1989) we have reported more results of our emulation experiments. It was shown in several ways that sectoral emphasis of identity processing clearly improved target/nontarget discrimination performance in cases where global identity processing grew unreliable due to increasing numbers of nontargets on the display. For illustration purposes in the context of the present paper we shall elaborate on our findings concerning "search asymmetry," an aspect of visual search not easily accounted for in computational models of pattern discrimination. Detection or search performance tends to differ between two conditions that involve the same discrimination when the roles of target and nontarget are interchanged. Treisman and Souther (1985) obtained striking asymmetries of search performance when using patterns α and β, which shared all but one element (present in α while absent in β), in an arrangement where the condition {α = target; β = nontarget} was contrasted with the reversed condition {α = nontarget; β = target}. Figure 16 depicts the different stimulus configurations used in the experiments; the graphs indicate Tdiff development when selective gain was applied in the sector containing the searched target. In cases (a) and (c) (Figures 16(a) and 16(c)) T was the learned target and the vertical bar was the learned nontarget, while in cases (b) and (d) (Figures 16(b) and 16(d)) the vertical bar was the target and T was the nontarget. We see that presence of the discriminative feature in the target T was responded to with a clear Tdiff increase, but that with absence of the same feature, when the vertical bar was the target, the application of such gain did not result in a comparable increase. Similar emulation results were obtained when using E and F patterns. Regarding the impact of input integration time on sectoral targetness, we observe that the additional, more centrally positioned nontarget in cases (c) and (d) only started to manifest itself at t2. Indeed, as was concluded in several experimental studies (e.g., Bouma, 1970), at short stimulus exposure there is more lateral masking from eccentric patterns than from more central patterns.

FIGURE 16. Tdiff response to target sector facilitation (Tdiff plotted against sectoral gain from 1.0 to 2.0) in cases of multipattern stimuli that invoke "search asymmetry." In cases (c) and (d) the additional (more central) nontarget hardly affects targetness at short integration time (e.g., t1).

In the next section we review the network's mechanisms underlying the emulated search asymmetry, and in Section 3.6 we show how the discrimination
characteristics combined with the SecTM-SBC dynamics let the emulator closely mimic human search behavior.

3.5. Data-Limited and Resource-Limited Feature Classifications The graphs in Figure 16 hint at resource allocation issues which might be treated similarly to the treatment of resource and data limits at the input-data system level (Section 2.1). Selective disinhibition is a resource, and its allocation in cases (a) and (c) of Figure 16 resulted in improved identity processing; hence we have in these cases a resource-limited performance. In cases (b) and (d) the allocation of selective disinhibition does not significantly improve the identity processing, and we might conclude that there is some form of data limitation on performance. Norman and Bobrow (1975) distinguish two forms of data limitations: those resulting from the signal and those resulting from memory. When performance is directly dependent on the quality of the input data signal, they call the (intake) process signal data-limited (as in Section 2.1). When the "quality of the representation of the stored paradigm" has a clearly greater impact on performance than the allocation of more resources, they call the process memory data-limited. We shall now relate Norman and Bobrow's "quality of the representation of the stored paradigm" to some emulator properties underlying the performance differences.

In the pattern learning phase for the cases in Figure 16, each one of the patterns T and "bar" was "presented" in turn at the five positions on the hemifield. This training cycle was repeated until classification performance stabilized. Correct classification for single pattern presentation was achieved: when T was the target, a T at one of the positions caused an activation value in excess of 0.9 at the target node and an activation value smaller than 0.1 at the nontarget node, while a bar at the input invoked the reversed situation (nontarget node activation in excess of 0.9, and target node activation lower than 0.1); the same held, with the roles reversed, when the bar was the target. In Figure 17 are graphically depicted the activation profiles of the second hidden layer (feeding the output nodes) of the network, in cases of individual bar and T occurrences, and in the two configurations (two I's and one T: I + T + I, and two T's and one I: T + I + T) in which they operate as targets and nontargets within the multipattern stimuli of Figures 16(a) and (b). It is clear that for bar classification the network exploited the vertical-line orientation sensitivity which was readily available at the first node of this hidden layer (see Figures 6 and 7). For T-classification, it might have been expected that there would be high activation at the first and second nodes, accounting for, respectively, the vertical and horizontal bars of the T-patterns, but the activation profiles show that for a T-pattern at the input the activation of the first node is kept low, and only at the second node do we find high activation.
Thus, for the sake of optimally discriminating between bar and T, the network has chosen to develop negative connection weights on the links terminating at the first node of the second hidden layer, coming from those first hidden-layer nodes registering the co-occurrence of vertical and horizontal bars (as with the T pattern). As a result, co-occurrence of vertical and horizontal bars tends to inhibit or offset the activation of the first node in the second layer. This, indeed, is a sensible strategy for discriminating a subset pattern (the bar) from its cover or superset pattern (the T) when they are presented successively at the input (as happens in the network training). Now, when the bar and T patterns occur simultaneously to constitute a multipattern stimulus, the inhibition due to co-occurrence will lower the activation that originates from the features representing the bar. This is even more so when selective sectoral gain is applied in the sector containing the co-occurrence (the T), as is depicted by the bar chart in Figure 17 for the I + T + I case. When the bar is the target and T the nontarget (see the bar chart in Figure 17 for the T + I + T case) the two T-patterns strongly inhibit the activation of the first node, and the level of second node activation clearly dominates the activation profile of the second hidden layer, causing, in turn, a high value at the nontarget (T) node. Selective sectoral gain in the sector containing the bar only causes a decrease of this dominance without any significant restoration of the first node activation, which would be a prerequisite for proper target classification at the output of the network. The "search asymmetry" seems to result from a paradox, in that a classification strategy which successfully distinguishes pattern α from pattern β, when the patterns are each presented in isolation, causes an asymmetric breakdown of discrimination between the two when they occur simultaneously in the context of a multipattern stimulus.
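The inhibition argument above can be illustrated with a toy calculation. The following is only a minimal sketch under stated assumptions: the two-node layer, the pooled feature counts, and all weight values (`W_VERTICAL`, `W_COOCCUR`, `W_HORIZONTAL`, `B_NODE2`) are hypothetical and are not taken from the trained network. It merely shows how weights that separate single I and T patterns at the 0.9/0.1 criterion can silence the bar-sensitive node once two T-patterns are present.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical weights, chosen for illustration only (not the paper's values).
W_VERTICAL = 3.0     # excitation of node 1 per vertical bar in the display
W_COOCCUR = -6.0     # inhibition of node 1 per vertical+horizontal co-occurrence
W_HORIZONTAL = 6.0   # excitation of node 2 per horizontal bar
B_NODE2 = -3.0       # bias keeping node 2 low when no horizontal bar is present


def hidden_profile(n_bars, n_ts):
    """Return (node1, node2) activations for n_bars I-patterns and n_ts T-patterns.

    Assumes feature evidence is simply pooled (summed) over the display.
    """
    vertical = n_bars + n_ts   # every I and every T contains one vertical bar
    horizontal = n_ts          # only a T contains a horizontal bar
    cooccur = n_ts             # only a T has both bars together
    node1 = sigmoid(W_VERTICAL * vertical + W_COOCCUR * cooccur)
    node2 = sigmoid(W_HORIZONTAL * horizontal + B_NODE2)
    return node1, node2


if __name__ == "__main__":
    displays = [("I", 1, 0), ("T", 0, 1), ("I + T + I", 2, 1), ("T + I + T", 1, 2)]
    for label, bars, ts in displays:
        n1, n2 = hidden_profile(bars, ts)
        print(f"{label:10s} node1={n1:.3f} node2={n2:.3f}")
```

With these assumed weights both single patterns are classified cleanly, the T evidence at node 2 survives in the I + T + I display, and the bar evidence at node 1 collapses in the T + I + T display, which is the asymmetric breakdown described in the text.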
The search asymmetry characteristic of our emulation stems from a learned pattern of excitatory and inhibitory connections which emphasizes the difference in quality of feature-representation. The co-occurrence of the vertical and horizontal bar in the T allowed a superiority of representation quality relative to the (vertical) bar by way of development of inhibition of vertical bar features. More generally, improved quality of feature-representation may result from more extensive or more effective learning. For instance, when discriminating vertical versus tilted bars the search asymmetry favored the vertical bars (Treisman & Souther, 1985), which may be due to extensive cultural training on vertical bar features prior to tilted bar learning in the experimental setting. In terms of Norman and Bobrow's (1975) "quality of the representation of the stored paradigm" it could be observed in the discussed case that for classification of successively presented single-pattern stimuli the representation was adequate for both the T and the bar, but for classification of multipattern stimuli the representation is better for the T than for the bar.
FIGURE 17. Comparison of activation profiles at the second hidden layer for single-pattern stimuli, T and I, respectively, and multipattern stimuli where first a target T has to compete with two nontargets I, and then a target I has to compete with two nontargets T; in both multipattern cases the T-features dominate the activation profile, causing a T-versus-I search asymmetry. (Panels: I and T single-pattern stimuli; I + T + I multipattern stimulus; T + I + T multipattern stimulus. Horizontal axis: nodes 1-10 in second hidden layer. Legend: sectoral gain.)
3.6. Temporal and Spatial Aspects of the SecTM-SBC Dynamics
In search-asymmetry settings (Figure 16) Treisman and Souther (1985) demonstrated that with presence of the feature in the target (cases (a) and (c)) detection is very fast and with little effect of display size (= total number of patterns displayed), but that with absence of the feature (cases (b) and (d)) the target was detected only through a slow and apparently serial search, requiring around 40 msec per pattern. We now incorporate the SecTM-SBC dynamics in the emulation operation. In case (a), since there is only a clear Tdiff increase in the sector containing the target T, the SBC shall not lock on any but the target sector to allow target certainty to develop. In the cases where detection was fast, addition of another nontarget to the display caused in Treisman and Souther's experiment an average search delay of 3 msec. As long as the display does not get overly cluttered with patterns, the addition of a pattern potentially imposes on average (because of the random sector selection) a delay of about half the time needed to check a sector for Tdiff increase. Case (c) search performance thus differs from case (a) search performance only in terms of a short average delay of a few msec. Sector rejection due to a lack of Tdiff increase is in our emulations equated with twice the incurred average delay (2 × 3 = 6 msec). In the cases where detection was slow the sector-selective identity processing apparently fell short of producing sufficient target certainty in an early phase of the input integration interval. Having rejected sectors at the rate of 6 msec per sector, the SBC soon comes to a point where all its (potentially active) nodes have been disabled, and as a result the SecTM-SBC interaction ceases. In our emulations, when the SecTM-SBC activity dwindles down another part of the processing architecture takes over: Local identity processing. Its principal reliance on accuracy of local processing-emphasis makes for two basic processing characteristics: long time constants of processing and sequentiality of processing. In Section 4 we discuss the mechanisms and show that they very well support the emulation of the 40 msec-per-pattern findings by Treisman and Souther (1985). It is important to note that the SecTM-SBC dynamics may cease in two ways:
1. In fast detection cases the SBC has locked on to a sector and facilitates development of target certainty through disinhibitive feedback to cells producing the pattern activations in the sector;
2. In slow detection cases the SBC has run out of active nodes, and does not issue disinhibitive feedback to any cell.
Local identity processing thus has its starting conditions dependent on the outcome of the SecTM-SBC processing.
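The timing argument for the SecTM-SBC dynamics (6 msec per rejected sector, hence about 3 msec of average delay per added nontarget) can be checked with a short calculation. This is a sketch under assumptions that are not spelled out in the text: each pattern is taken to occupy its own sector, and the SBC is taken to draw sectors uniformly at random without revisiting rejected ones. The function names are hypothetical.

```python
import random

REJECT_MS = 6.0  # time to reject one sector, as equated in the text (2 x 3 msec)


def mean_lock_on_delay(n_sectors, reject_ms=REJECT_MS):
    """Exact mean delay before the SBC locks on to the target sector.

    The target sector is equally likely to be drawn 1st, 2nd, ..., n-th,
    and every sector drawn before it costs one rejection.
    """
    delays = [k * reject_ms for k in range(n_sectors)]
    return sum(delays) / n_sectors


def exhaustion_delay(n_sectors, reject_ms=REJECT_MS):
    """Delay until every active sector has been rejected (slow-detection case)."""
    return n_sectors * reject_ms


def simulate_lock_on(n_sectors, rng, reject_ms=REJECT_MS):
    """One random trial: shuffle the sector order and count rejections before the target."""
    order = list(range(n_sectors))
    rng.shuffle(order)
    return order.index(0) * reject_ms  # sector 0 is assumed to hold the target


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, mean_lock_on_delay(n), exhaustion_delay(n))
    rng = random.Random(1)
    trials = [simulate_lock_on(5, rng) for _ in range(10000)]
    print(sum(trials) / len(trials))
```

Under these assumptions the mean lock-on delay is 6 × (n − 1) / 2 msec, so each additional nontarget adds 3 msec on average, while exhausting all n sectors in the slow-detection case costs 6n msec before Local identity processing takes over.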
4. LOCAL IDENTITY PROCESSING
Apart from learning target-nontarget identities, in the learning or instruction phase of many visual search experiments, the subjects also develop a mental map of the possible positions of target occurrence. Analogous to the simplicity and flexibility of basic-feature-space learning in our emulations, we wanted to enable the supervised learning of a map of locations (see Figure 1). The network architecture should allow for various map configurations, including a number of isolated positions (e.g., the five we are using), rings or circles, and sectors or segments. After establishment of the map of locations, its interaction with feature space processing imposes a local focus of identity processing. Humans may localize a search target overtly (employing eye movement) or covertly (shifting attention without shifting the line of sight). While overt movement (called "foveation") produces increased acuity, covert selection of a spatial area will show lower threshold (Bashinski & Bacharach, 1980) and faster reaction time (Posner & Cohen, 1984) for target detection. In the present section we shall relate the interaction between the map of locations and the feature space to covert selection of a spatial area that is scrutinized for target identity. We have assumed that the part of human visual search which is emulated by our global and sectoral-directionally emphasized identity processing is automatic (without consequent mental load). Local identity processing emulates the part of human visual search that is supposed to be of a "controlled" or "attentive" nature. Covert shifts from one location to another involve three component mental operations. Posner and Cohen (1984) suggest that first attention is disengaged from its current focus, then attention must move to the target, and finally attention engages the target. In much of the experimental work in support of this view on attentional shifts, some form of spatial precueing is applied to facilitate the detection at the (precued) location(s). Two effects were noted with respect to the cued area: the facilitation (of detection) effect, and the inhibition-of-return effect (Posner & Cohen, 1984). It was found that while the facilitation effect appears to move with the eyes (i.e., it is mapped in retinotopic coordinates), the inhibition-of-return effect remains in a fixed location (i.e., mapped in environmental coordinates). Posner and Cohen (1984) suggest that such an inhibition effect may have evolved to maximize sampling of the visual environment. We shall elaborate the emulator's facilitation mechanisms that stem from our map of locations; the inhibition effect will only be briefly discussed in view of future extensions to the network architecture.
4.1. The Locations Map Focussing the Identity Processing
The interaction between a locations map and the feature space should result in an identity decision (target/nontarget) relative to a position in the field of view. In this section we describe our implementation of such interaction in the emulator. If we consider the processes described under the headings "Global identity processing" and "Sectoral-directional emphasis of identity processing" as being automatic ("effortless"), then the local identity processing could be seen to implement the more effortful scrutiny of (local) occurrences of features. Figure 18 schematically depicts the proposed neural network architecture. From the input array of cells there is now an additional processing path to a layer which represents the map of locations. Salient characteristics of this processing path are: 1) the units in the locations map operate in a winner-take-all fashion, and 2) they project top-down feedback to the input layer. The map of locations cooperates with the Local Targetness Monitor (LocTM) to eventually select (or engage) a location causing a high activation of the target node. The LocTM-LocMap architecture and its dynamics are in some ways similar to the SecTM-SBC architecture and dynamics. Nodes in the LocMap are associated with "receptive fields" of first-layer cells as a result of the supervised (back propagation) learning of the possible target locations in the field of view. Each node in the LocMap then will summate the activation from the first-layer cells incident on its associated receptive field. Whereas in the SBC one node or one cluster of nodes was selected randomly for application of disinhibitive feedback to the corresponding radial beam or sector, in the LocMap the highest active node suppresses in a winner-take-all (see, e.g., Grossberg, 1976) fashion the other LocMap nodes while selectively disinhibiting the first-layer cells within its receptive field. In other words, a sufficiently active region at the input layer will cause the bottom-up
links to assign the potentially winning node, which then reinforces the assignment by sending disinhibitive feedback through the top-down links to the input region. In so doing, the hidden layers in the feature-space network will mainly be exposed to cell activation from this local region, and effectively base their identity processing on the locally present features. If the LocTM registers a significant Tdiff increase it causes engagement of the LocMap node to the region to further scrutinize its targetness. But if the LocTM detects a lack of Tdiff increase, it issues a reset wave towards the LocMap, causing a selective inhibition of the active (winning) node. Grossberg (1980) uses a similar map-reset mechanism in his adaptive resonance theory (ART) architectures as a correlate of dipole field dynamics (a dipole field consists of opponent processing channels which are gated by habituating chemical transmitters ... a nonspecific arousal burst induces selective and enduring inhibition of the active units within a dipole field). The enduring "chemical" inhibition is supposed to underlie the inhibition-of-return in that by suppressing the currently highest active node the next highest nodes get a chance to have their receptive fields sampled for targetness. A new winning node now emerges so that disinhibitive feedback is sent "down" to its associated region of input cells, and inhibitory action is effectuated onto cells beyond this region. If again Tdiff does not significantly improve then a reset wave will be issued, etc. Emergence of a winning node and the consequent establishment of selective disinhibitive and inhibitive "traffic" in the top-down links is seen to emulate the discussed engagement of attentional focus. Abortion of the traffic to restore the neutral situation necessary to enable unbiased competition among the nodes yet to be assigned is seen to emulate disengagement of attentional focus. We relate an engagement-disengagement cycle to the roughly 40 msec search delay per display pattern as found in the slow-detection cases (Section 3.6).
FIGURE 18. Extending the network with a location processing path, where a Locations Target Monitor (LocTM) checks for Tdiff improvement when the "map of locations" layer [...] to the cell array for scrutiny of local feature occurrences; if the LocTM does not register a Tdiff improvement it issues a reset wave to the LocMap node (and its associated receptive field), which [...] redirects the (local) focus of identity processing. (Diagram labels: Target/Nontarget, reset, SecTM, LocTM, location node.)
4.2. Rapid and Slow Localization
At the end of the previous section it was noted that the transition from the SecTM-SBC mode to the LocTM-LocMap mode follows one of two paths. If, at transition time, the SBC "delivers" a disinhibited sector then the LocMap nodes in this sector will most probably dominate the winner-take-all competition, but if there is not any disinhibited sector at transition time then none of the LocMap nodes will have an a priori advantage. In this section we shall explain why and how the emulator establishes a rapid localization along the first path, and a slow localization along the second.
In many attention-shift experiments the applied stimuli only activate a few well-separated locations in the field of view. For such stimuli the SecTM-SBC mechanism would certainly produce a disinhibited target sector (if any) at the beginning of the local identity processing. Node (location) engagement in the LocTM-LocMap manner then rapidly takes place within that sector. This correctly emulates the experimental findings by Fischer and coworkers (e.g., Fischer, 1986; Fischer & Breitmeyer, 1987) with subjects facing similarly simple stimuli. It was found that the time required to initiate an eye saccade (the "saccade reaction time," SRT) to a target is significantly shortened when there had not been a prior engagement of attention. In our emulation frame of mind, we interpret the findings as resulting from the establishment (through training) of an utterly simple locations map in which the target node is often the first winning unit within the SBC-disinhibited sector. Consequently, since the saccade-direction component can be derived from the sectoral-directional priming, the saccade programming then only needs an update on the saccade-amplitude component. Along the second path of transition from the SecTM-SBC mode to the LocTM-LocMap mode, due to the absence of an SBC-disinhibited sector, there is neither directional priming of the saccade programming, nor is there a sectoral facilitation of the potential target node. Instead, the presently highest active location will develop the winning node causing engagement, as well as the incorporation of the corresponding direction and amplitude components in the saccade programming. If this location turns out not to contain the target (low Tdiff) then disengagement of attention is initiated, a no-go saccade decision is issued, and a new saccade programming needs to be prepared. Without a disinhibited sector much processing time and effort are spent in going from one map location to the other.
With an SBC-disinhibited sector, however, maintaining the sectoral facilitation until all its active LocMap nodes have been checked is in line with insights in the processing of saccade direction and amplitude, as developed by Hou and Fender (1979). They have found that the partial saccade program concerning the direction component is kept in a buffer memory; if a new engagement occurs in a direction different from the old one, the partial program has to be erased, which takes an extra 40-80 msec of processing time. Sector priming of local identity processing, thus, emulates the buffer storage of the saccade direction component and its consequent impact on covert as well as overt shift of attention. When the LocTM fails to register high Tdiff at the locations in the sector (which may happen when the sectoral emphasis has produced "illusory conjunction" of features (see Treisman & Gormican, 1988)), the sectoral facilitation is effectively nullified due to the enduring node inhibition caused by the LocTM reset waves. The highest active node beyond the sector then attracts engagement, etc.
Our current implementation of the map of locations does not accommodate actual eye movement. A change of eye fixation will cause location shifts on the input array of cells. Only when map architecture and map learning secure translation-invariant location-to-node association can the proposed local identity processing deal with eye movement. For a new version of the emulator, we experiment with "chord-spaces" that represent interlocation relations along a spatial-direction dimension and a spatial-distance dimension (Gerrissen, 1982, 1984a). In a chord-space-like locations map, when a direction is fixed, then at the transition from the "old" to the "new" engagement the distance relation between the two is highlighted. The map evolves in this manner into a relative-distance (or amplitude) look-up table, which is inherently translation-invariant.
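The engagement-disengagement cycle of Section 4.1 and the two transition paths of Section 4.2 can be summarized in a small sketch. It is an illustrative simplification, not the emulator itself: the function name `local_search`, the multiplicative `prime_gain` standing in for sectoral disinhibition, and the example activation values are all assumptions; only the winner-take-all selection, the reset-wave inhibition of rejected nodes, and the 40 msec cycle come from the text.

```python
CYCLE_MS = 40.0  # engagement-disengagement cycle duration quoted in the text


def local_search(activation, has_target, primed=frozenset(), prime_gain=2.0):
    """Winner-take-all location search with reset waves.

    activation: dict, location -> summed receptive-field activation
    has_target: dict, location -> True if the LocTM would register a
                significant Tdiff increase at that location
    primed:     locations inside an SBC-disinhibited sector (may be empty)
    Returns (engaged location or None, visit order, msec spent on rejections).
    """
    inhibited = set()  # nodes silenced by reset waves (inhibition of return)
    visits = []
    while len(inhibited) < len(activation):
        # Competition among nodes not yet inhibited; primed nodes get a boost.
        scores = {
            loc: act * (prime_gain if loc in primed else 1.0)
            for loc, act in activation.items()
            if loc not in inhibited
        }
        winner = max(scores, key=scores.get)
        visits.append(winner)
        if has_target[winner]:
            return winner, visits, len(inhibited) * CYCLE_MS
        inhibited.add(winner)  # reset wave: enduring inhibition of the winner
    return None, visits, len(inhibited) * CYCLE_MS


if __name__ == "__main__":
    activation = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5}
    has_target = {loc: loc == "D" for loc in activation}
    print(local_search(activation, has_target))                # no primed sector
    print(local_search(activation, has_target, primed={"D"}))  # target sector primed
```

Without a primed sector the sketch visits locations in order of activation and accumulates one 40 msec cycle per rejected location (slow localization); when the target's location lies in the primed sector it wins the first competition (rapid localization).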
5. DISCUSSION
The emulator incorporates a number of constructs from the "Pooled Response Model" by Treisman and Gormican (1988), the resource-limited/data-limited process dichotomy by Norman and Bobrow (1975), and the "Orienting of Attention" framework by Posner and Cohen (1984). Also, a "Levels of Processing" (Treisman & Gormican, 1988) approach was followed, starting at level 1 with an array of cells receiving photon-count-like excitation P from objects in the visual world, at level 2 target/nontarget feature separation processing in feature maps, and feature-based target/nontarget classification at level 3. Network extensions were discussed which emulate the growth of localized identity, initially with a sectoral bias of identity processing (= a sense of direction) developing into a location-selective feature analysis under control of a map of locations. Feature-space adaptation, as we hypothesized it to develop during target/nontarget classification learning, was indeed observed when we studied the activation profiles of the hidden layers of the network. Specifically geared to optimal separation of target from nontarget features when patterns were presented successively, the adapted feature space induced search "suboptimalities" similar to those observed in human visual search. Basic feature-space learning, as well as locations map learning, are kept simple and flexible to allow for inclusion of a variety of feature dimensions and map configurations so as to easily associate with
much of the human subject training in search experiments. For transparency of exposition, we limited ourselves in the present paper to a single (arbitrary) choice of feature dimension and a five-node map associated with five well-isolated "receptive fields." More and other feature dimensions may be learned, as well as other configurations of the locations map. Also, the issues related to object "photon intensities" and "eye integration time" merit a more elaborate treatment. For instance, a scene where object features have different intensities will lead to a search performance which differs (in both spatial and temporal aspects) from performance on an identical scene but without intensity differences. We need to further study the performance consequences of these temporal and spatial (function of eccentricity) developments of feature space segmentations. Yet another limitation concerns the cell density we applied in the data input system. With more cell "acuity" the emulation results might even more realistically mimic the human visual search. We do not expect scaling problems to occur when the number of input cells is increased, as long as the second hidden layer and the locations map do not need to drastically grow in order to accommodate complex feature spaces and map configurations.
5.1. Feature Space Adaptation
The common point of departure for all simulation cases was the initial subnetwork capacity of line-orientation classification. It was shown that from then onwards the network training resulted in feature space changes that tended to optimize target/nontarget separation in the particular stimulus set. The emulator thus parallels the human observer in that it initially may possess a basic feature space enabling a workable, not yet optimal, target/nontarget discrimination, and through training on the particular stimulus set it will optimize its discrimination capacity. In other words, the emulator will exhibit learning effects on discrimination performance, i.e., insufficient training would cause suboptimal performance. Differences in training efficacy from one pattern to the other cause memory-data-limited (Norman & Bobrow, 1975) discrimination performance underlying search asymmetries. Once established, an adapted feature space effectively operates as a psychological set which optimizes the execution of the learned task, but which may turn inadequate when a task change occurs (e.g., exposure to a pattern not present in the training set). This task-oriented adaptation that is achieved through task or task-component learning is a powerful characteristic generally absent from nonnetwork models.
5.2. Global Identity Processing
In our somewhat simplistic chronometrics of visual search we assume the minimal duration of an engagement-disengagement cycle to be in the order of 40 msec. For the very first engagement to fully develop roughly 20 msec is needed. As far as our emulation cases are concerned, we position Global identity processing and Sectoral-directional emphasis of identity processing in that initial interval. And since we assumed a 6 msec set-up time for sectoral facilitation, pure global identity processing (without sectoral emphasis) should be seen to only operate during the first few msec. In our emulations, this would be prior to or around moment t1 (Figure 4) of input integration, and consequently the paracentral patterns are then favored over the more central patterns. Properly masked stimulus flashes in the order of 3-5 msec indeed enable human subjects to gather sufficient feature information for a first global identification of the stimulus, but they generally fail to relate identity elements to their location or direction of occurrence.
5.3. Sectoral-Directional Emphasis of Identity Processing
SecTM-SBC assignment of sectoral disinhibition is random, and hence induces automatic processing, i.e., the assignment does not result from competition among the visual objects. In the literature on visual search very often automatic processing seems to imply parallel processing, and controlled (or effortful) processing to imply sequential processing. The SecTM-SBC mode of processing is a hybrid that cannot be reconciled with the parallel/sequential dichotomy. Whereas (local) focus is total in LocTM-LocMap processing, i.e., feature space processing is exclusively concentrated on the contents of one receptive field with all other inputs inhibited, the SBC disinhibition causes only a gradual facilitation of the sector contents relative to all other inputs present at the cell layer. Processing for target/nontarget classification is essentially parallel and global, but supported by a rapid sequence of randomly assigned (subglobal) facilitations. The essential difference between the controlled processing by the LocTM-LocMap subsystem and the automatic processing by the SecTM-SBC subsystem is that assignment/selection by the former is externally solicited (the "winning node" competition), while sector assignment by the latter is automatic and free-running until the SecTM registers a high Tdiff. The mode difference is manifest in the processing time-constants:
• typically 6 msec per sector assignment for the SecTM-SBC subsystem, and
• typically 40-80 msec per node assignment for the LocTM-LocMap subsystem.
Early direction finding by the emulator is in line with results from experimental work on attention shifts, both covert and overt. Successful sectoral-directional priming of the local identity processing leads to correct emulation of rapid target localization, as opposed to the slow localization which is emulated when there is no sectoral-directional priming.
5.4. Local Identity Processing
Establishment of the network weights in the location network does not depend on the choice of target/nontarget patterns. In general, the map of locations is expected to not favor any particular feature other than local input activation. The charm of the dual (identity/location) architecture is that adaptation for optimal target/nontarget separation is limited to feature space learning only; the location network then remains unchanged. The nonfeature-oriented location selection has led to successful emulations of the slow-detection cases in a series of visual search experiments by Treisman and Souther (1985). Given a locations map that relates to a number of disjoint regions on the input array, there will be a clear sequence of processing foci, each on one of the learned map locations. Location assignment, thus, is of a sequential processing nature. The identity processing itself, while locally focussed, is the same as in the other (more parallel-processing) modes. The efficacy of location assignment in the context of a search task depends to a large extent on the success of the preceding SecTM-SBC operation. If the sector was found that contained the target, then assignment will rapidly converge on the target location, but if not, then assignment turns into a slow serial search. The LocTM-LocMap operation so far might be considered an objective location selection, as opposed to subjective location selection which could result from location-map activation through pathways originating at "higher" levels of processing. Such subjective location selection would enable the emulation of symbolic precueing (Posner & Cohen, 1984).
In conclusion, it could be stated that in view of our emulation goals the chosen network approach was instrumental in our quest for a set of "humanoid" system behaviors constituting a descriptive and predictive framework of visual search performance. Within the task constraints imposed on the human perception and action in the usual experimental setting, the emulator may serve as a means to study subject training aspects, input data-limitation (e.g., low-input excitation or short exposure time), memory data limitation (e.g., search asymmetries, illusory conjunction) and resource-limitation (e.g., display size, lateral masking), and the effects of self-paced or externally paced attention shifts. At the input the emulator needs a digitized version of the single-pattern stimuli during the training phase and of the (generally) multipattern stimuli for the actual search emulation.
REFERENCES
Ballard, D. H. (1984). Parameter nets. Artificial Intelligence, 22, 235-267.
Bashinski, H. A., & Bacharach, V. R. (1980). Enhancement of perceptual sensitivity as a result of selectively attending to spatial locations. Perception and Psychophysics, 28, 241-248.
Berger, T. (1971). Rate distortion theory. New Jersey: Prentice-Hall.
Bouma, H. (1970, April). Interaction effects in parafoveal letter recognition. Nature, 226, 177-178.
Bruce, C. J., & Goldberg, M. E. (1985). Primate frontal eye fields. I: Single neurons discharging before saccades. Journal of Neurophysiology, 53, 603-635.
Butler, B. E., & Currie, A. (1986). On the nature of perceptual limits in vision. Psychological Research, 48, 201-209.
Carpenter, G. A., & Grossberg, S. (1978). Neural dynamics of category learning and recognition: Attention, memory consolidation, and amnesia. In S. Grossberg (Ed.), The adaptive brain 1 (pp. 239-286). New York: North-Holland.
Cheal, M-L., & Lyon, D. (1989). Attention effects on form discrimination at different eccentricities. Quarterly Journal of Experimental Psychology, 41A, 719-746.
Cohen, B., & Henn, V. (1972). Unit activity in the pontine reticular formation associated with eye movements. Brain Research, 46, 403-410.
Egly, R., & Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology: Human Perception and Performance, 10, 778-793.
Eriksen, C. W., & Webb, J. M. (1989). Shifting of attentional focus within and about a visual display. Perception and Psychophysics, 45(2), 175-183.
Fischer, B. (1986). The role of attention in the preparation of visually guided eye movements in monkey and man. Psychological Research, 48, 251-257.
Fischer, B., & Breitmeyer, B. (1987). Mechanisms of visual attention revealed by saccadic eye movements. Neuropsychologia, 25(1A), 77-83.
Gerrissen, J. F. (1982). Theory and model of the human global analysis of visual structure. IEEE Transactions on Systems, Man, and Cybernetics, SMC-12, 6, 805-817.
Gerrissen, J. F. (1984a). Theory and model of the human global analysis of visual structure; Part II: The space-time and visual value segment. IEEE Transactions on Systems, Man, and Cybernetics, SMC-14, 6, 847-862.
Gerrissen, J. F. (1984b). Modulation of globality at the input of the visual system. In A. G. Gale & F. Johnson (Eds.), Theoretical and applied aspects of eye movement research (pp. 535-542). Amsterdam: North-Holland.
Gerrissen, J. F. (1989). Computer emulation of preattentive mechanisms in vision: Emergent WHAT and WHERE. In D. Brogan (Ed.), Visual search (pp. [...]). London: Taylor & Francis.
Golden, R. M. (1988). A unified framework for connectionist systems. Biological Cybernetics, 59, 109-120.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: Part I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1-51.
Hou, R. L., & Fender, D. H. (1979). Processing of direction and magnitude by the saccadic eye-movement system. Vision Research, 19, 1421-1426.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215-243.
Julesz, B., & Bergen, J. R. (1983). Textons, the fundamental elements in preattentive vision and perception of textures. Bell System Technical Journal, 62, 1619-1645.
Lambert, A. (1987). Expecting different categories at different locations and spatial attention. Quarterly Journal of Experimental Psychology, 39A, 61-76.
Linsker, R. (1988). Self-organization in a perceptual network. IEEE Computer, 21, March, 105-117.
Muller, H. J., & Findlay, J. M. (1987). Sensitivity and criterion effects in the spatial cuing of visual attention. Perception and Psychophysics, 42, 383-399.
Muller, H. J., & Rabbitt, P. M. A. (1989). Spatial cueing and the relation between the accuracy of "where" and "what" decisions in visual search. Quarterly Journal of Experimental Psychology, 41A, 747-773.
Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Posner, M. I. (1978). Chronometric explorations of the mind. Hillsdale: Lawrence Erlbaum.
Posner, M. I., & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and Pe[...]
[...] processing: Explorations in the microstructure of cognition, Volume 1: Foundations. London: The MIT Press.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Snyder, A. W., Laughlin, S. B., & Stavenga, D. G. (1977). Information capacity of eyes. Vision Research, 17, 1163-1175.
Tassinari, G., Aglioti, S., Chelazzi, L., Marzi, C. A., & Berlucchi, G. (1987). Distribution in the visual field of the costs of voluntarily allocated attention and of the inhibitory after-effects of covert orienting. Neuropsychologia, 25, 55-71.
Treisman, A. (1985). Preattentive processing in vision. Computer Vision, Graphics, and Image Processing, 31, 156-177.
Treisman, A., & Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95(1), 15-48.
Treisman, A., & Souther, J. (1985). Search asymmetry: A diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General, 114, 285-310.