$$I[P_i;\,t/Q] \;=\; \begin{cases} \sum_l p_{l,i}\,\log\dfrac{p_{l,i}}{q_{l,i}} & \text{if } r = 0,\\[4pt] \sum_l p_{l,i}\,\big[(q_{l,i})^r - (p_{l,i})^r\big] & \text{if } r < 0,\\[4pt] \sum_l p_{l,i}\,\big[(p_{l,i})^r - (q_{l,i})^r\big] & \text{if } r > 0, \end{cases} \tag{1}$$

with p_{l,i} = p(G_{l,i}|P_i, t) and q_{l,i} = p(G_{l,i}|t−1), where if r > 1 the system exhibits a risk-seeking posture with respect to gambles on location-dependent attention, while r < 1 implies risk aversion. Risk neutrality is given by r = 1. Proof. See Appendix C. From Proposition 3, we have that for an image reconstruction fidelity given by bitrate r, the attention score of location P_i will be the total sum of the utilities for consequences, I[P_i; t/Q], obtained at the times t when attention was directed to P_i, up to the given bitrate r. Fig. 1 illustrates two different plots of I[P_i; t/Q] for varying r.
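As a sketch of how Eq. (1) could be evaluated numerically, the three branches below follow the forms derived in Appendix C; the list-based interface is an illustrative assumption, not the paper's implementation:

```python
import math

def info_gain(p, q, r):
    """Expected increase in utility I[Pi; t/Q] of Eq. (1) (sketch).

    p -- gray-level distribution p(G_{l,i} | Pi, t) after attending Pi
    q -- prior distribution p(G_{l,i} | t-1)
    r -- risk attitude with respect to gambles on location-dependent attention
    """
    if r == 0:   # logarithmic utility: Kullback-Leibler information gain
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if r > 0:    # utility u(p) = p^r, Eq. (14)
        return sum(pi * (pi ** r - qi ** r) for pi, qi in zip(p, q))
    # r < 0: utility u(p) = -p^r, Eq. (13)
    return sum(pi * (qi ** r - pi ** r) for pi, qi in zip(p, q))
```

For p equal to q the gain vanishes for every r (the nilpotence condition), and in this example sharpening the distribution relative to the prior yields a positive gain under all three attitudes.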
ARTICLE IN PRESS J.A. Garcı´a et al. / Pattern Recognition 43 (2010) 1618–1630
Fig. 1. (Left) 3D plot of the expected increase in utility I[P_i; t/Q] provided by the allocation of attention to spatial location P_i at time t, when the initial probability distribution Q is strictly positive, for risk attitudes r (with respect to gambles on location-dependent attention) in the range [−0.6, 0.6]; (Right) 2D plots of I[P_i; t/Q] provided by the allocation of attention to spatial location P_i for varying time t, with risk attitude r set to −0.2, −0.1, 0, 0.1, and 0.2.
Hence we can now give a formal definition of the global attention map for a given image, following the rational approach to the measurement of attention score:

Definition 2 (Multi-bitrate attention map). The multi-bitrate attention map that measures the attention score, following Postulates 1–7, at any spatial location P_i and any bitrate r of image reconstruction fidelity, is

$$\{A(P_i, r)\}_{P_i,\,r}, \tag{2}$$

where

$$A(P_i, r) = \sum_{t_{P_i}} I[P_i;\, t_{P_i}/Q], \tag{3}$$
with the sum over times t_{P_i}, up to the given bitrate r, such that attention is directed to P_i at time t_{P_i}; and I[P_i; t/Q] is given by Eq. (1). The multi-bitrate attention map {A(P_i, r)}_{P_i, r} provides a computational attention score for each spatial location P_i at both high- and low-quality versions of the image reconstruction. The novelty of this map is that: (i) it allows distinct attention scores for the same spatial location at different picture qualities, which may be relevant, for example, in Internet advertising applications; (ii) it avoids certain forms of behavioral inconsistency in the absence of a priori knowledge about the locations of interest, which is a characteristic of rational systems; and (iii) no particular integration of feature information (e.g., color, intensity, orientation) is used in assigning attention scores to the points, so computational attention is not tuned for only certain images. Figs. 2 and 3 illustrate the multi-bitrate attention map for two different scenes that represent a military vehicle in a complex rural background (see Fig. 4). The first column—(A)–(C)—shows the attention scores for the reconstructions given in the second column—(D)–(F). Higher intensity in (A)–(C) means a higher attention score for the respective reconstructions in (D)–(F). To obtain image reconstructions at different bitrates of visual quality, the computational attention model follows Refs. [18,19]. Here we use a very efficient implementation of the rational model of computational attention, whose interest lies not only in cost reduction but also in real-time analysis of new images. Score maps and image reconstructions were blended into each other to form the images illustrated in the third column—(G)–(I). Fig. 2(J)–(L) show that the most salient locations are within the target area for the three different quality reconstructions in Fig. 2(D)–(F). But the situation is more complex for the target
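A minimal sketch of how the map of Definition 2 could be accumulated; the `gains` record of (time, utility-increase) pairs per location and the `bitrate_after` function mapping a time step to the bitrate reached are hypothetical interfaces introduced only for illustration:

```python
def attention_map(gains, bitrate_after):
    """Sketch of {A(Pi, r)}: A(Pi, r) sums the utility increases I[Pi; t/Q]
    accrued at the times t when attention was directed to Pi, up to the
    given bitrate r (Eq. (3)). `gains` maps Pi -> list of (t, I[Pi; t/Q])
    pairs; both interfaces are assumptions."""
    def A(Pi, r):
        # Only the visits reached before bitrate r contribute to the score.
        return sum(g for t, g in gains.get(Pi, []) if bitrate_after(t) <= r)
    return A
```

A location never attended gets score zero at every bitrate, and scores are non-decreasing in r, matching the cumulative reading of Eq. (3).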
scene illustrated in Fig. 3. As can be seen from Fig. 3(K), the target is the most salient area for the image reconstruction given in Fig. 3(E), while there are other areas within the complex rural background (see Fig. 3(J) and (L)) with greater saliency than the target for the lower- and higher-quality versions of the reconstruction (Fig. 3(D) and (F)).
4. Experimental results

In this section, we study the relationship between computational attention and the visual target distinctness measured by human observers. Rohaly et al. [20] showed that if computational models of early human vision are good predictors of target saliency for humans performing visual search and detection tasks, they may be used to compute the visual distinctness of image subregions (target areas) in digital imagery. To this aim we compute the multi-bitrate attention map {A(P_i, r)}_{P_i, r} for each target scene in a database presented in [21,22] (see Figs. 4–7). The images used in the experiment are target sections from slides made during the DISSTAF (distributed interactive simulation, search and target acquisition fidelity) field test, which was designed and organized by NVESD (Night Vision & Electro-optic Sensors Directorate, Ft. Belvoir, VA, USA) and held in Fort Hunter Liggett, California, USA [21]. Next we calculate the lowest bitrate, r*, of reconstruction fidelity for which the attention score of some point in the target area is in the upper quartile of the attention map {A(P_i, r*)}_{P_i} at bitrate r*. A small value of r* means that the computational model brings attention onto the target using a low bitrate of picture quality, which corresponds to a high saliency of the target area. Also, a psychophysical experiment is performed in which human observers estimate the visual distinctness of the targets in the same database. The procedure of the psychophysical experiment is described in [21,22]. The subjective ranking induced by the psychophysical target distinctness is adopted as the reference rank order. An evaluation function may then be used to study the efficacy of the computational attention model.
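The computation of r* described above can be sketched as follows; the `A`, `locations`, and `target` interfaces are hypothetical, and `statistics.quantiles` supplies the upper-quartile threshold:

```python
import statistics

def lowest_bitrate_on_target(A, locations, target, bitrates):
    """Return the lowest bitrate r* at which the attention score of some
    point in the target area lies in the upper quartile of {A(Pi, r*)}_Pi
    (sketch; a small r* means the model finds the target at low quality)."""
    for r in sorted(bitrates):
        scores = [A(P, r) for P in locations]
        q3 = statistics.quantiles(scores, n=4)[2]  # 75th-percentile cut point
        if any(A(P, r) >= q3 for P in target):
            return r
    return None
```

In the toy run below, the target point is far down the ranking at the low bitrate and dominates the map at the higher one, so the sketch returns the higher bitrate.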
To avoid the perils of inferring too much from correlations, the evaluation function P_CC is defined as the fraction of correctly classified targets (using the computational model) with respect to the reference rank order (target distinctness as measured by human observers):

P_CC = (number of correctly classified targets) / (number of targets).
Fig. 2. (A)–(C) Attention scores for three different quality versions—given in (D)–(F)—of a highly visible target; (G)–(I) blending of the respective score maps and image reconstructions; (J)–(L) most salient locations.
In the following we analyze the comparative results of computational attention and various quantitative measures for predicting visual target distinctness. The quantitative measures include the signal-to-noise ratio (SNR_log), root mean square error (RMSE), and mean absolute error (MAE). They quantify target distinctness by means of the difference between the signal from the target-and-background scene and the signal from the background-with-no-target. We also compare the performance of the rational model against a well-known model of computational attention, Itti's attention model [5].
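The three baselines can be sketched from the scene/background difference described above; the flat-list signals and the SNR normalization (background power over error power) are assumed textbook forms, not necessarily the paper's exact constants:

```python
import math

def distinctness_measures(scene, background):
    """Sketch of the quantitative baselines: RMSE, MAE, and a log SNR
    computed from the pixel-wise difference between the target-and-background
    scene and the background-with-no-target. Interfaces and the SNR
    normalization are illustrative assumptions."""
    diffs = [s - b for s, b in zip(scene, background)]
    mse = sum(d * d for d in diffs) / len(diffs)        # mean squared difference
    rmse = math.sqrt(mse)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    power = sum(b * b for b in background) / len(background)
    snr_log = 10 * math.log10(power / mse)              # assumes mse > 0
    return rmse, mae, snr_log
```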
4.1. Experiment 1: assessment of parameter r for the computational attention model

Following Proposition 3, this section is intended to elicit information on the risk attitudes of computational attention with respect to gambles on location-dependent attention (see Eq. (1) in Proposition 3) using the DISSTAF images (see Figs. 4–7). The elicitation of the optimal risk attitude in Eq. (1) using a rational system of computational attention is performed as follows. The target images of the DISSTAF database are rank-ordered using computational attention with the parameter r in Eq. (1) taking values between −1 and 1. The respective fraction of correctly classified targets P_CC using computational attention models with parameter r between −1 and 1 is illustrated in Fig. 8. The optimal value of the parameter r produces the model achieving the highest fraction of correctly classified targets P_CC over the rational models of computational attention being compared (with parameter r between −1 and 1). From Fig. 8 we conclude that a computational attention model with parameter r around −0.6 is best able to compute a visual target distinctness rank ordering that correlates with human observer performance. Hence, in the following experiments, the rational model of computational attention is used with parameter r = −0.6.
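The elicitation step amounts to a one-dimensional grid search over the risk attitude; a sketch, where `pcc_for(r)` is a hypothetical scorer returning the fraction of correctly classified targets for attitude r:

```python
def elicit_risk_attitude(pcc_for, step=0.1):
    """Sketch of Experiment 1: scan r over [-1, 1] on a grid and keep the
    risk attitude whose rank ordering maximizes P_CC. `pcc_for` is an
    assumed interface; the paper reports the optimum near r = -0.6."""
    n = int(round(2 / step))
    candidates = [round(-1 + i * step, 10) for i in range(n + 1)]
    return max(candidates, key=pcc_for)
```

With a mock P_CC curve peaked at −0.6, the search recovers that attitude.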
4.2. Experiment 2 A subset (dataset #1) of twelve complex natural images containing a single target (and twelve empty images of the same
Fig. 3. (A)–(C) illustrate attention scores for three quality versions, respectively (D)–(F), of a military vehicle in low visibility conditions; (G)–(I) show blendings of the respective score maps and image reconstructions; (J)–(L) most salient locations.
rural backgrounds with no target) was used in this second experiment. Following the psychophysical experiment described in [21], the image pairs from dataset #1 were clustered into four subsets of targets with comparable visual distinctness: {1,2,7,11}, {17,20,21,22}, {24,27,32}, and {35} (see Figs. 4–7). The comparative results of the rational model of computational attention with risk aversion (r = −0.6), and those of both quantitative and qualitative measures, are illustrated in Table 1. The bottom of each column shows the probability of correct classification P_CC of the rank order in that column with respect to the reference rank order given in column 2. Significant rank-order permutations are displayed in boxes in Table 1. Note that rank-order permutations among targets of the same cluster, with comparable visual distinctness, are not significant, and so those targets are counted as correctly classified. Both RMSE and SNR_log produce a rank order with six significant order reversals. The other targets have been attributed rank orders that do not differ significantly from the reference rank order. RMSE and SNR_log yield a probability of correct classification P_CC = 0.5. MAE produces a rank order with five significant order reversals, and it yields a probability P_CC = 0.58. Table 1 also shows the rank order using the rational model of computational attention. The highest value of the evaluation function, P_CC = 0.83, is obtained for computational attention, which produces a rank order with two significant order reversals. Fig. 9 (-EXP. 2-) shows these results with a bar chart to emphasize that the computational attention model is a better predictor than MAE, RMSE, and SNR_log.
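The cluster-aware scoring can be sketched as follows; the reading of "correctly classified" as occupying a position held in the reference order by a target of the same distinctness cluster is my interpretation of the rule above, and the dict-based interface is an assumption:

```python
def p_cc(predicted, reference, cluster_of):
    """Sketch of the evaluation function P_CC under the clustering rule:
    within-cluster permutations are not penalized, so a target counts as
    correct when its position in the predicted rank order is held, in the
    reference order, by a target of the same cluster."""
    hits = sum(cluster_of[p] == cluster_of[q]
               for p, q in zip(predicted, reference))
    return hits / len(reference)
```

Swapping two targets inside a cluster leaves P_CC at 1.0; swapping across cluster boundaries costs both positions involved.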
4.3. Experiment 3

A second subset (dataset #2) of fifteen targets (grouped into four clusters {2,4,5,9,11}, {12,15,18,19}, {24,26,27,29,32}, and {36}) was used in this third experiment (see Figs. 4–7). The resulting rank orders of MAE, RMSE, and SNR_log are listed in Table 2. Again, the bottom of each column shows the probability of correct classification of the rank order in that column with respect to the reference rank order in column 2. Of these three quantitative measures, SNR_log yields the highest probability (P_CC = 0.6), with six significant order reversals (they
Fig. 4. First cluster of targets with comparable visual distinctness (targets #1–#11): target and non-target scenes.
Fig. 5. Second cluster of targets with comparable visual distinctness (targets #12–#22): target and non-target scenes.
Fig. 6. Third cluster of targets with comparable visual distinctness (targets #23–#34): target and non-target scenes.
Fig. 7. Fourth cluster of targets with comparable visual distinctness (targets #35–#36): target and non-target scenes.
Fig. 8. 2D plot of P_CC versus risk attitude r, for r between −1 and 1.

Table 1. Column 1: dataset in Experiment 2; column 2: the reference rank order; columns 3–6: the resulting rank orders of MAE, RMSE, SNR_log, and computational attention. The bottom of each column shows the probability of correct classification of the rank order in that column with respect to the reference rank order in column 2.

are displayed in boxes). Fig. 9 (-EXP. 3-) illustrates these results with a bar chart. Table 2 also displays the rank order of the computational attention model. Again, the highest value of the evaluation function, P_CC = 0.8, is obtained for computational attention, which produces a rank order with only three significant order reversals (see Fig. 9 -EXP. 3-).

4.4. Experiment 4
Now we study the comparative performance between the proposed attention model and a well-known model of computational attention [5], using the DISSTAF database. Itti and Koch [23] applied Itti's model of human visual search, based on the concept of a ``salience map'' [5], to a wide range of target detection tasks using the DISSTAF images. The saliency of objects in the visual environment is encoded through a 2D map. In Itti and Koch [23], low-level visual features are extracted in parallel from nine spatial scales, and the resulting feature maps are combined to yield three saliency maps, for color, intensity, and orientation. These, in turn, feed into a single saliency map (a 2D layer of integrate-and-fire neurons). Competition among the neurons in this map yields a single winning location corresponding to the next attended target. This location is then transiently suppressed (inhibition of return), causing the focus of attention to shift to the next most salient location. With respect to the predicted search times of Itti's attention model on the DISSTAF images, Itti and Koch [23] found a poor correlation between human and model search times (see Fig. 8 in [23]). This may be a consequence of the fact that Itti's attention model was originally designed not to find small, hidden targets, but rather to find the few most obviously conspicuous objects in an image. For a dataset of 33 image pairs (target and non-target images) from the DISSTAF database, we have also calculated the probability of correct classification P_CC using the computational attention model that follows Postulates 1–7. From the psychophysical experiment [21], the image pairs in the dataset can be clustered into four subsets of targets with comparable visual distinctness: {1,2,3,4,5,6,7,8,10,11}, {12,13,14,15,16,17,18,19,20,21,22}, {23,24,25,26,27,28,29,30,31,32,34}, and {35} (see Figs. 4–7).
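The winner-take-all scan with inhibition of return described above can be sketched as follows; the grid-as-dict interface and the fixed suppression radius are illustrative assumptions, not the model's actual neural dynamics:

```python
def scanpath(saliency, k, radius=1):
    """Sketch: repeatedly select the most salient location, then transiently
    suppress it and its neighborhood (inhibition of return) so the focus of
    attention shifts to the next most salient location.
    `saliency` maps (x, y) -> salience value."""
    s = dict(saliency)
    visited = []
    for _ in range(min(k, len(s))):
        winner = max(s, key=s.get)
        visited.append(winner)
        for p in s:  # suppress the attended neighborhood
            if abs(p[0] - winner[0]) <= radius and abs(p[1] - winner[1]) <= radius:
                s[p] = float('-inf')
    return visited
```

Note how a runner-up inside the suppressed neighborhood is skipped, so attention jumps to the next peak elsewhere in the map.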
In this case the rational model of computational attention once again yields a high probability of correct classification (P_CC = 0.7272), which implies a correlation between human and model predictions of visual attention. Recall that the rational model of attention does not extract any visual feature such as color, intensity, or orientation. Instead, it is based only on the multi-bitrate attention map {A(P_i, r)}_{P_i, r} for each target scene, where A(P_i, r) is defined as in Definition 2. Summarizing, the rational model of computational attention that follows Postulates 1–7 with a risk-aversion attitude (r = −0.6) shows the best overall performance in these four experiments.
5. Conclusions

In this paper we have proposed a different approach to computational attention: first state some general
Fig. 9. Probability of correct classification P_CC using MAE, RMSE, computational attention, and SNR_log, for Experiment 2 (-EXP. 2-) and Experiment 3 (-EXP. 3-).
Table 2. Column 1: dataset in Experiment 3; column 2: the reference rank order; columns 3–6: the resulting rank orders of MAE, RMSE, SNR_log, and computational attention. The bottom of each column shows the probability of correct classification of the rank order in that column with respect to the reference rank order in column 2.
principles that the solution of the problem must obey, and then derive the solution that exactly satisfies those principles. On the computational attention problem we have imposed three coherence axioms, two quantization axioms, and two additional axioms that restrict the form of utilities for consequences. The result is a rational model of computational attention. It is not rare that one would like to impose more axioms than are jointly compatible. It may also happen that the axiomatic computational attention resulting from the original list of axioms reacts very badly to some significant image. One must then formalize the characteristics of the image, state an additional axiom that specifies how the computational attention should behave in this situation, and finally determine the greatest subset of axioms from the original list that is compatible with the new axiom. Of course, compatibility may hold for several distinct such subsets. In any case, the critical difference with respect to the approaches discussed in the Introduction section is that we are able to predict exactly the behavior of the axiomatic solution according to its principles. Thus, the rational model of computational attention can choose at any time among alternative spatial locations in such a way as to avoid certain forms of behavioral inconsistency. We have proved in Proposition 3 that a rational system for the allocation of attention might exhibit either a risk-seeking posture or risk aversion with respect to gambles on location-dependent attention. This allows the rational model of attention to be tuned for some kinds of images while still reacting very well to other kinds of pictures, by simply changing risk attitudes within the same framework. For the DISSTAF database, it was demonstrated that rational models of computational attention with a risk-aversion attitude with respect to gambles on location-dependent attention can be used to predict visual target distinctness. The risk-aversion attitude seems to be a consequence of the possible presence of small, hidden military vehicles in some complex rural backgrounds of the DISSTAF database. The validity and generalizability of the elicitation of risk attitudes for a rational model of computational attention such as the one presented here could be enhanced by exploring different databases. While this empirical assessment should be regarded cautiously, on the basis of these preliminary findings it appears that the multi-bitrate attention map, which measures the attention score following Postulates 1–7, gives consistent results and is suitable for further work.

Acknowledgments

The authors thank the referees for suggesting several good ways to improve the original manuscript.
Appendix A. Proof of Proposition 1

(a) Proposition 1(a) states that, to avoid certain forms of behavioral inconsistency when a computational system chooses a location of interest upon which to focus attention at any time, the gray-level occurrence sets G_i = {G_{l,i} : l ∈ L} should be represented by probability distributions R_i = {p(G_{l,i}|P_i, t) : l ∈ L}. In such a framework, the actions available to the system are the various probability distributions R_i over G_i, the latter constituting the gray-level occurrence set corresponding to each possible location of interest upon which to direct attention. This result comes directly from Proposition 2.11 in Bernardo and Smith [24], which establishes formally that coherent, quantitative measures of uncertainty about events must take the form of probabilities: (i) coherent, quantitative degrees of belief have the structure of a finitely additive probability measure; moreover, (ii) significant events, i.e., events which are practically possible but not certain, should be assigned probability values in the open interval (0, 1).

(b) Proposition 1(b) asserts that options in the selection of a spatial location to allocate attention at time t cannot be ordered without a specification of utilities (numerical values) for the consequences. Assuming a definition of utility that
only involves comparisons among consequences and options constructed with standard events, we would expect the utility of a consequence to be uniquely defined and to remain unchanged as new information is obtained, since the preference patterns among consequences are unaffected by additional information. This is indeed the case, as Proposition 2.21 in [24] establishes for decision problems in which extreme consequences are assumed to exist. In our problem it is attractive to have available the possibility, for conceptual and mathematical convenience, of dealing with sets of consequences not possessing extreme elements. But Proposition 2.23 in [24] extends Proposition 2.21 to the more general situation in which extreme consequences are not assumed to exist.
Appendix B. Proof of Proposition 2

A well-behaved utility function [25–27] is local, and thus the value of the distribution R_i = {p(G_{l,i}|P_i, t) : l ∈ L} is to be assessed in terms of the probability it assigns to the actual outcome. This leads to simplifications that make it a useful working hypothesis:

$$u(R_i, G_{l,i}) = u(p_{l,i}),$$

with p_{l,i} = p(G_{l,i}|P_i, t) being the probability of gray level l in the neighborhood of location P_i that would result from the improvement in reconstruction fidelity achieved by allocating attention to P_i at time t. Let u_{P_1,...,P_n}(p_{l,1}, ..., p_{l,n}) represent the utility function for consequences over spatial locations P_1, P_2, ..., P_n. Keeney and Raiffa [28] have demonstrated that the hypothesis of value independence given in Postulate 6 holds if and only if, for all P_i, there are utility functions u_{P_i}(p_{l,i}) and constants a_i such that

$$u_{P_1,\ldots,P_n}(p_{l,1}, \ldots, p_{l,n}) = \sum_i a_i\, u_{P_i}(p_{l,i}), \tag{4}$$

where Σ_i a_i = 1. That is, u_{P_1,...,P_n}(p_{l,1}, ..., p_{l,n}) has an additive form.

Let p^w_{l,j} be the probability of gray level l using the worst reconstruction fidelity in the neighborhood of location P_j, and p^b_{l,j} the probability of gray level l using the best reconstruction fidelity in the neighborhood of P_j. Without loss of generality we scale the utility function u_{P_j}(p_{l,j}) so that u_{P_j}(p^w_{l,j}) = 0 and u_{P_j}(p^b_{l,j}) = 1. Postulate 7 states that

$$u_{P_1,\ldots,P_n}(p_{l,1}, \ldots, p_{l,i}, \ldots, p^w_{l,j}, \ldots, p_{l,n}) = u_{P_1,\ldots,P_n}(p_{l,1}, \ldots, q\,p_{l,i}, \ldots, p^b_{l,j}, \ldots, p_{l,n}), \tag{5}$$

with 0 < q < 1, and for all p_{l,i}, where 1 − q is the proportion of p_{l,i} that would be given up to achieve the improvement in reconstruction fidelity that results from changing p^w_{l,j} to p^b_{l,j}. Then, substituting Eq. (4) in Eq. (5), we find that

$$a_i\, u_{P_i}(p_{l,i}) = a_i\, u_{P_i}(q\,p_{l,i}) + a_j \tag{6}$$

for all p_{l,i}. Following Pliskin et al. [29], we first show that the three functional forms are consistent with Postulate 7. If u(p) = log(p), then a_i = 1/(1 − log q) and a_j = −log q/(1 − log q) are consistent with Eq. (6). Similarly, if u(p) = p^r or u(p) = −p^r, then the values a_i = q^r and a_j = 0 are consistent with Eq. (6). Summarizing, we have proved that log(p), p^r for r > 0, and −p^r for r < 0 are consistent with Postulate 7 when two particular probabilities p^w_{l,j} and p^b_{l,j} are involved in the trade-offs. By a corollary given in [29] we can easily prove that this is sufficient for Postulate 7 to hold for any pair of probabilities.

We next prove that these three functions are the only ones consistent with Postulate 7 (to this aim we follow a proof suggested in another context by Keeney and Raiffa [28]). We twice differentiate Eq. (6) and divide the second derivative of each side by the first derivative, which gives (suppressing the subscripts)

$$\frac{u''(p)}{u'(p)} = q\,\frac{u''(qp)}{u'(qp)}. \tag{7}$$

Recursively substituting qp for p in Eq. (7), it follows that

$$\frac{u''(p)}{u'(p)} = q^n\,\frac{u''(q^n p)}{u'(q^n p)} \tag{8}$$

for all p and all n ≥ 0. Given that u is a ``well-behaved'' function, this implies the existence of lim_{p→0} p u''(p)/u'(p). Then, for any p_1, we have

$$\lim_{p\to 0} p\,\frac{u''(p)}{u'(p)} = \lim_{n\to\infty} q^n p_1\,\frac{u''(q^n p_1)}{u'(q^n p_1)} \overset{(a)}{=} p_1\,\frac{u''(p_1)}{u'(p_1)}, \tag{9}$$

where (a) follows from Eq. (8). And, similarly, for any p_2 we have

$$\lim_{p\to 0} p\,\frac{u''(p)}{u'(p)} = \lim_{n\to\infty} q^n p_2\,\frac{u''(q^n p_2)}{u'(q^n p_2)} = p_2\,\frac{u''(p_2)}{u'(p_2)}. \tag{10}$$

From Eqs. (9) and (10), it follows that for any p_1 and p_2

$$p_1\,\frac{u''(p_1)}{u'(p_1)} = p_2\,\frac{u''(p_2)}{u'(p_2)}, \tag{11}$$

or, equivalently,

$$-p\,\frac{u''(p)}{u'(p)} = c. \tag{12}$$

By integration we obtain that the utility function u(p) must have one of the three functional forms: log(p) for r = 0, p^r for r > 0, and −p^r for r < 0, where r = 1 − c and c is the constant in Eq. (12). Following Machina [30], we know that the shape of u(p) determines risk attitudes. For r < 1 the utility is a concave function: (i) p^r for 0 < r < 1; (ii) log(p) for r = 0; or (iii) −p^r for r < 0. Since a system with a concave utility function will always prefer receiving a sure gain to the ``gamble'' itself, concave utility functions are termed risk averse. For r > 1 the utility function must have the form p^r, which is convex. This implies that the resulting system prefers bearing the risk rather than receiving the sure gain of the expected value of the ``gamble'' on location-dependent attention. Hence such a utility function is termed risk loving. This proves Proposition 2.
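The integration step can be spelled out; a short derivation of the step the proof leaves implicit, under the normalization used above:

```latex
-\,p\,\frac{u''(p)}{u'(p)} = c
\;\Longrightarrow\;
\frac{d}{dp}\,\log u'(p) = -\frac{c}{p}
\;\Longrightarrow\;
u'(p) = K\,p^{-c}.
% Integrating once more, with r = 1 - c:
%   c \neq 1:  u(p) = \frac{K}{1-c}\,p^{1-c} + K'
%              (a positive multiple of p^{r} when r > 0,
%               of -p^{r} when r < 0),
%   c = 1:     u(p) = K \log p + K'   (the r = 0, logarithmic case).
```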
Appendix C. Proof of Proposition 3

From Proposition 2, the utilities of reporting p(G_{l,i}|t−1) or p(G_{l,i}|P_i, t) might be log p(G_{l,i}|t−1) and log p(G_{l,i}|P_i, t), respectively. Thus, conditional upon the allocation of attention to spatial location P_i at time t, the expected increase in utility that would result from the improvement in visual quality achieved by allocating attention to P_i at t is given by

$$I[P_i;\,t/Q] = \sum_l p(G_{l,i}|P_i,t)\,\big[\log p(G_{l,i}|P_i,t) - \log p(G_{l,i}|t-1)\big] = \sum_l p(G_{l,i}|P_i,t)\,\log\frac{p(G_{l,i}|P_i,t)}{p(G_{l,i}|t-1)},$$

which, by Theorem 1 in Garcia et al. [22], is non-negative and verifies the nilpotence condition. That is, the expected increase in utility in this case is the Kullback–Leibler information gain, which has a minimal number of properties that are natural and, thus, desirable for predicting visual distinctness from 2D digital images (see [22] for further details).
Similarly, it can be proved that if preferences are described by the utility function for r < 0, then the expected increase in utility provided by the improvement in visual quality achieved by allocating attention to P_i at t, when the initial probability distribution is Q = {p(G_{l,i}|t−1) : l ∈ L}, is given by

$$I[P_i;\,t/Q] = \sum_l p(G_{l,i}|P_i,t)\,\big\{[p(G_{l,i}|t-1)]^r - [p(G_{l,i}|P_i,t)]^r\big\}, \tag{13}$$

while if preferences are described by the utility function for r > 0, then the expected increase in utility is given by

$$I[P_i;\,t/Q] = \sum_l p(G_{l,i}|P_i,t)\,\big\{[p(G_{l,i}|P_i,t)]^r - [p(G_{l,i}|t-1)]^r\big\}. \tag{14}$$
This proves Proposition 3.

References

[1] J.R. Bergen, B. Julesz, Parallel versus serial processing in rapid pattern discrimination, Nature 303 (1983) 696–698.
[2] C. Koch, S. Ullman, Shifts in selective visual attention: towards the underlying neural circuitry, Hum. Neurobiol. 4 (1985) 219–227.
[3] M.I. Posner, Y. Cohen, R.D. Rafal, Neural systems control of spatial orienting, Philos. Trans. R. Soc. Lond. B Biol. Sci. 298 (1982) 187–198.
[4] W. James, The Principles of Psychology, Dover, New York, 1950.
[5] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 1254–1259.
[6] A. Treisman, Features and objects: the fourteenth Bartlett memorial lecture, Q. J. Exp. Psychol. A 40 (1988) 201–237.
[7] O. Le Meur, P. Le Callet, D. Barba, A coherent computational approach to model the bottom-up visual attention, IEEE Trans. Pattern Anal. Mach. Intell. 28 (5) (2006) 802–817.
[8] G.M. Boynton, Attention and visual perception, Curr. Opin. Neurobiol. 15 (2005) 465–469.
[9] M. Mancas, Computational Attention: Towards Attentive Computers, Presses universitaires de Louvain, ISBN 978-2-87463-099-6, Belgium, 2007, pp. 1–267.
[10] W. Osberger, A.J. Maeder, Automatic identification of perceptually important regions in an image using a model of the human visual system, in: 14th International Conference on Pattern Recognition, Brisbane, Australia, 1998.
[11] K.N. Walker, T.F. Cootes, C.J. Taylor, Locating salient object features, in: P.H. Lewis, M.S. Nixon (Eds.), Proceedings of the British Machine Vision Conference, vol. 2, BMVA Press, 1998, pp. 557–566.
[12] T.N. Mudge, J.L. Turney, R.A. Voltz, Automatic generation of salient features for the recognition of partially occluded parts, Robotica 5 (1987) 117–127.
[13] A. Oliva, A. Torralba, M.S. Castelhano, J.M. Henderson, Top-down control of visual attention in object detection, in: Proceedings of the IEEE International Conference on Image Processing, vol. 1, 2003, pp. 253–256.
[14] L. Itti, P. Baldi, Bayesian surprise attracts human attention, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2006, pp. 1–9.
[15] C.M. Privitera, L. Stark, Algorithms for defining visual regions of interest: comparison with eye fixations, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 970–982.
[16] T. Bosse, W. Doesburg, P. Maanen, J. Treur, Augmented metacognition addressing dynamic allocation of tasks requiring visual attention, in: Proceedings of the 12th International Conference on Human–Computer Interaction, Lecture Notes in Computer Science, vol. 4565, Springer, Berlin, 2007.
[17] P. Maanen, L. Koning, K. Dongen, Design and validation of HABTA: human attention-based task allocator, in: Proceedings of the First International Workshop on Human Aspects in Ambient Intelligence, Darmstadt, Germany, November 10, 2007.
[18] J.A. Garcia, R. Rodriguez-Sanchez, J. Fdez-Valdivia, Justice in quantizer formation for rational progressive transmission, Opt. Eng. 43 (2004) 2105–2119.
[19] J.A. Garcia, R. Rodriguez-Sanchez, J. Fdez-Valdivia, Progressive Image Transmission: The Role of Rationality, Cooperation and Justice, PM-140, SPIE Press, Bellingham, Washington, USA, 2004, p. 230.
[20] A.M. Rohaly, A.J. Ahumada, A.B. Watson, Object detection in natural backgrounds predicted by discrimination performance and models, Vision Res. 37 (23) (1997) 3225–3235.
[21] A. Toet, F.L. Kooi, P. Bijl, J.M. Valeton, Visual conspicuity determines human target acquisition performance, Opt. Eng. 37 (7) (1998) 1969–1975.
[22] J.A. Garcia, J. Fdez-Valdivia, X.R. Fdez-Vidal, R. Rodriguez-Sanchez, Information theoretic measure for visual target distinctness, IEEE Trans. Pattern Anal. Mach. Intell. 23 (4) (2001) 362–383.
[23] L. Itti, C. Koch, Target detection using saliency-based attention, in: NATO SCI-12 Workshop on Search and Target Acquisition, Utrecht, The Netherlands, June 21–23, 1999, pp. (3-1)–(3-10).
[24] J.M. Bernardo, A.F.M. Smith, Bayesian Theory, Wiley Series in Probability and Statistics, Wiley, Chichester, UK, 1994.
[25] P.C. Fishburn, The Foundations of Expected Utility, D. Reidel, Dordrecht, The Netherlands, 1982.
[26] G. Herden, N. Knoche, C. Seidel, W. Trockel (Eds.), Mathematical Utility Theory, Springer, Wien, 1999.
[27] B.P. Stigum, F. Wenstøp, Foundations of Utility and Risk Theory with Applications, Kluwer Academic Press, Dordrecht, The Netherlands, 1983.
[28] R.L. Keeney, H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs, Wiley, New York, 1976.
[29] J.S. Pliskin, D.S. Shepard, M.C. Weinstein, Utility functions for life years and health status, Oper. Res. 28 (1980) 206–224.
[30] M.J. Machina, Choice under uncertainty: problems solved and unsolved, Econ. Perspect. 1 (1) (1987) 121–154.
About the Author—J.A. GARCÍA was born in Almería, Spain. He received the M.S. and Ph.D. degrees, both in Mathematics, from the University of Granada in 1987 and 1992, respectively. Since 1988 he has been with the Computer Science Department (DECSAI) at Granada University, where he is now a Full Professor. Author of over 100 technical papers and three books, he has devoted the last 14 years to developing computer vision models for biomedicine, astronomy, cartography, feature extraction, clustering, image representation, image distortion, visual target distinctness, and image compression.
About the Author—ROSA RODRÍGUEZ-SÁNCHEZ was born in Granada, Spain. She received the M.S. and Ph.D. degrees, both in Computer Science, from the University of Granada in 1996 and 1999, respectively. She is currently with the Computer Science Department (DECSAI) at Granada University, where she is an Associate Professor. Her current interests include computer vision, visual perception, and image coding.
About the Author—J. FDEZ-VALDIVIA was born in Granada, Spain. He received the M.S. and Ph.D. degrees, both in Mathematics, from the University of Granada in 1986 and 1991, respectively. Since 1988 he has been with the Computer Science Department (DECSAI) at Granada University, where he is now a Full Professor. His current interests include computer vision, image representation, feature detection, visual target distinctness, image coding, and biomedical applications. His research work is summarized in over 100 papers in scientific journals and conference proceedings and three books in the field of Computer Vision.