An almost general theory of mean size perception

Vision Research 83 (2013) 25–39
Jüri Allik a,b,*, Mai Toom a, Aire Raidvee a, Kristiina Averin a, Kairi Kreegipuu a

a Department of Psychology, University of Tartu, Estonia
b Estonian Academy of Sciences, Estonia

Article history: Received 14 September 2012; Received in revised form 10 February 2013; Available online 13 March 2013

Keywords: Mean size perception; Ensemble characteristics; Compulsory pooling; Associative law of addition; Internal noise; Capacity limitation; Selection

Abstract

A general explanation for the observer's ability to judge the mean size of simple geometrical figures, such as circles, was advanced. Results indicated that, contrary to what would be predicted by statistical averaging, the precision of mean size perception decreases with the number of judged elements. Since mean size discrimination was insensitive to how total size differences were distributed among individual elements, this suggests that the observer has limited cognitive access to the sizes of individual elements, which are pooled together in a compulsory manner before size information reaches awareness. Confirming the associative law of addition, observers are indeed sensitive to the mean, not to the sizes of individual elements. All existing data can be explained by an almost general theory, namely the Noise and Selection (N&S) Theory, formulated in exact quantitative terms and implementing two familiar psychophysical principles: the size of an element cannot be measured with absolute accuracy, and only a limited number of elements can be taken into account in the computation of the average size. It was concluded that the computation of ensemble characteristics is not necessarily a tool for surpassing the capacity limitations of perceptual processing. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Knowing how much Sir Francis Galton was obsessed with counting, it is not surprising that, when he visited a livestock fair in Plymouth in 1906, he collected all 787 ballots on which participants of a weight-judging competition marked their guesses about the weight of a particularly fat ox, after it was slaughtered and dressed. Galton published a note in Nature showing that, if the democratic principle "one vote, one value" was applied, the mean of all 787 guesses was only a pound off the actual weight of 1,198 lb, which was closer than the prediction made by any single voter (Galton, 1907). James Surowiecki documented this and many similar examples demonstrating the wisdom of the crowd in his well-received book (Surowiecki, 2004). These demonstrations served to rehabilitate, partly at least, the image of the crowd, which had usually been portrayed, at least since the publication of The Crowd (Le Bon, 1896), the famous book by Gustave Le Bon, as sentimental, hysterical, and with an intellect equal to that of a small child or a "savage." Later studies, usually under the heading "intuitive statistician," demonstrated that not only a crowd but also individual observers can make, occasionally at least, remarkably accurate statistical judgments (Peterson & Beach, 1967). Please have a brief look at the following list of two-digit numbers and try to guess, immediately after reading them, what their arithmetic mean is. Here they are: 54, 25, 79, 39, and 83. What was your guess? If your guess is no more than a few units off 56, then you may be proud of your statistical intuition. Many studies have shown that human observers give estimates surprisingly close to the actual mean, even if the number of items approaches 20 or more and there is no time for elaborate mental arithmetic (Anderson, 1967; Smith & Price, 2010). A reasonable explanation is that we all have an intuitive number sense which allows us to provide an approximate but reasonably accurate solution to different statistical tasks (Dehaene, 2011; Dehaene, Dehaene-Lambertz, & Cohen, 1998). After replacing numbers with geometric figures, it was also noticed that people can make fairly precise judgments about the average size of several geometric objects, typically lines or circles (Ariely, 2001; Chong & Treisman, 2003, 2005b; Fouriezos, Rubenfeld, & Capstick, 2008; Miller & Sheldon, 1969; Spencer, 1961, 1963). These studies were mainly inspired by the observation that the reduction of a set of similar items to a prototypical mean helps to economize on the limited capacity of the visual system by replacing multiple representations of individual elements with a statistical summary characterizing the set as a whole (Ariely, 2001; Chong & Treisman, 2003; Chong et al., 2008). These studies have also demonstrated that the mean size of a group of geometric figures can be judged almost as precisely as the length of a single line or the size of a circle. Applying statistical rules to the perception of mean size, one could expect that precision will increase


with the number of judged elements. Since the precision of the sample mean improves with the square root of the sample size (√N), it is also expected that mean size judgment would improve with the square root of the number of elements to be judged (e.g. Fouriezos, Rubenfeld, & Capstick, 2008). In fact, the precision of mean size judgment is typically found to be independent of the number of judged elements (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2005b; Fouriezos, Rubenfeld, & Capstick, 2008; Spencer, 1961, 1963), which indicates that the human observer is not fully able to exploit the advantages of statistical aggregation. However, some researchers have seen the irrelevance of the total number of elements as evidence that the process of perceptual averaging is carried out automatically, presumably by an array of parallel "computers" (Alvarez, 2011; Ariely, 2001, 2008; Chong & Treisman, 2005b; Chong et al., 2008). The idea of involuntary and massively parallel processing of mean size was substantially strengthened by the finding that observers can fairly accurately report the average size of an array of geometric objects, even when they cannot recall the size of individual elements in the array (Ariely, 2001; Chong & Treisman, 2003, 2005a; Corbett & Oriet, 2011). Since mean size can be computed without explicit knowledge about individual elements, this has been taken as additional evidence in favor of automated and parallel computing.

Nevertheless, in spite of considerable interest and an increasing number of studies, the main properties of parallel and effortless computation of mean size remain poorly understood. Even one of the central claims, that mean size can be computed outside of focused attention, needs to be viewed with reservation, since it was demonstrated that all published evidence can be explained, in principle at least, through various focused-attention strategies, without invoking a special mechanism for average size perception (Myczek & Simons, 2008; Simons & Myczek, 2008; Solomon, Morgan, & Chubb, 2011). Indeed, all advocates of effortless and massively parallel computation assume, more or less explicitly, that all or at least nearly all exposed elements are taken into account in the judgment of the mean size. This premise, however, is very difficult to maintain after many convincing demonstrations that the observer is at times virtually blind to a substantial amount of information. Several well-known phenomena such as the invisible gorilla (Chabris & Simons, 2010; Simons & Chabris, 1999), inattentional blindness (Simons, 2000), and change blindness (Simons & Rensink, 2005) suggest that the proposition about the use of all available information is unrealistic. Many studies in the framework of ideal observer analysis have also shown that, in numerous situations, the observer is able to use only a fraction of the available information (Burgess et al., 1981; Raidvee et al., 2011; Rose, 1948). In the most extreme example, Myczek and Simons (2008) demonstrated that an observed precision of about 4–7% in the judgment of mean size can be explained by assuming that sampling only 2–3 elements is sufficient to achieve the observed accuracy.
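To make the statistical expectation discussed above concrete, the following minimal Monte Carlo sketch (our illustration, not the authors' code; the noise value and trial count are arbitrary) shows that the standard deviation of a sample mean shrinks as 1/√N when N independent, equally noisy measurements are averaged:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_single = 8.0        # assumed noise (pixels) of a single size measurement
n_trials = 100_000        # number of simulated trials per condition

for n in (1, 2, 4, 8):
    # each trial: n independent noisy measurements of the same size, averaged
    sample_means = rng.normal(0.0, sigma_single, size=(n_trials, n)).mean(axis=1)
    print(f"N = {n}: SD of the mean = {sample_means.std():.2f}, "
          f"predicted sigma/sqrt(N) = {sigma_single / np.sqrt(n):.2f}")
```

If mean size perception behaved like this ideal averager, the slope parameter of the psychometric function would halve every time the number of elements is quadrupled; the data reviewed above show that it does not.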
Another reason why there is not a sufficiently general theory is that previous research on the perception of mean size has identified several vital but still facultative properties of the computation of average size. It was observed, for instance, how concentration of attention (Alvarez & Oliva, 2008, 2009; Ariely, 2001), different visual cues (Alvarez, 2011; Chong & Treisman, 2005a), rapid temporal presentation (Corbett & Oriet, 2011; Joo et al., 2009), the minimally required exposure time (Whiting & Oriet, 2011), resistance to object substitution masking (Jacoby, Kamke, & Mattingley, 2012), and previous adaptation (Corbett et al., 2012) affect the ability to estimate mean size. Surprisingly little attention has been paid to the defining properties of the statistical averaging process itself. For example, it is well known that the order in which addends are summed does not change the end result (the commutative law). Similarly, the grouping of added numbers does not affect the sum (the associative law). If the observer's task, for example, is to

discriminate the mean size of four circles in comparison to a reference, then it does not matter whether we add, for instance, 4 size units to the diameter of only one of them or we add one size unit to the diameters of all four circles. Intuitively, it is more likely that the human observer can more easily notice an outlier which is 4 size units larger than the reference size, rather than four small increments of 1 size unit added to each of the four circles. However counterintuitive it may seem, any theory insisting that the perceptual system is capable of computing mean size must confront the challenge of showing that these two cases result in an identical perceptual outcome. Grouping these 4, or any other number of units, into different packages does not affect the sum nor, consequently, in theory, the perception of the mean size. As far as we know, this very easily falsifiable prediction has never been tested before.

The series of studies reported in this paper were conducted with the goal of carrying out a systematic study of mean size perception. In addition to testing some of the fundamental properties of statistical averaging, we will propose a very simple and almost universal explanation for a large range of facts related to the perception of mean size. By theory, we understand a formal system expressed in exact mathematical terms which is able to reproduce all observer response functions with the required accuracy. The proposed theoretical explanation should be sufficiently general so that it depends on only some universal stimulus attributes (number of elements, their sizes, etc.), not particular details (color of elements, their exposure time, sharing of attention between two tasks, etc.) which could vary from one situation to another but have only marginal effects on mean size judgments. The proposed model is also expected to contain a minimal number of free (to be determined) parameters, all of which are anticipated to have a very transparent psychological interpretation.

One of the two basic assumptions on which the proposed Noise and Selection (N&S) Theory is based is that neither the size of a single object nor the mean size of a set of objects can be measured with absolute precision. The size of every single element must be measured and transformed into its subjective representation. This process of transformation is inherently noisy, and the measured value varies from one trial to another even if the physical size remains constant. This means that the perceived size is represented on a psychological continuum with some positional error, what Thurstone called "discriminal dispersion" (Thurstone, 1927a, 1927b). This dispersion is obviously not a constant, but rather demonstrates a systematic relationship with some stimulus parameters. One of the very first attempts in the history of experimental psychology established that the just noticeable difference (JND) between two sizes increases approximately in proportion to the total size of these objects (for a review see Wolfe, 1923). The interest of the founders of experimental psychology, as well as of current researchers, in this quantitative relationship, usually known as Weber's "law," was mainly motivated by the understanding that, from an error function of discrimination, it is possible to recover the metrics that the observer is using to make decisions about spatial intervals, or any other visual attribute (Dzhafarov & Colonius, 2011).
The existence of non-zero discriminal dispersion means that, even if the number of processed elements is well within capacity limitation, their size cannot be judged without error (Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011). On the other hand, we need to assume that, if the number of judged elements exceeds the capacity of attention or working memory, then it is very unlikely that all available information can be processed. In order to explain experimental data, it is necessary to assume that the observer’s decisions about a large number of elements may be actually based on a substantially smaller subsample of these. When the number of judged elements exceeds capacity limitation, a sizable number of elements is simply ignored and the presence of the elements in the stimulus has no effect on


judgment accuracy (Myczek & Simons, 2008; Raidvee, Averin, & Allik, 2012). This selection principle means that the judged sizes of individual elements are indeed pooled together, very much like arithmetic addition or some other process comparable to summation, provided that the addends are not all, but only a subsample of, the displayed elements. Both of these principles – no size can be judged without measurement error and not all elements are necessarily taken into account – are well known to psychologists and are among the basic tools of psychophysical analysis. What is perhaps less common is that the joint application of these two principles is sufficient to formulate a very simple model, the N&S theory, which is able, with remarkable accuracy, to explain the majority of empirical facts established to date about mean size perception. Along with the testing of basic axioms of addition, they constitute a sufficiently general explanation for how human observers estimate the mean size of simple geometric figures.

2. General methods

There were four separate studies conducted to analyze the perception of the mean size of geometric figures. Like many previous researchers, we decided to study one of the simplest geometric shapes – circles. Stimuli were presented on a flat LCD monitor screen, which was viewed from about 70 cm by the observer. From this viewing distance, one pixel subtended about 2 min of arc. All stimuli were generated and answers recorded by programs written in MATLAB (The MathWorks, Inc.). Three observers with normal or corrected-to-normal vision participated in all four studies. Sets of circles with various sizes were presented on a display screen for a short time period and observers had to indicate, by pressing the respective keys on the keyboard, whether the mean size of the test circles was larger than the previously seen reference circle. No feedback about correctness was given. The typical stimulus arrangement used in Studies 1, 3, and 4 is shown in Fig. 1. The figure shows the central reference circle (dotted line), which was switched off 0.5 s before the appearance of the test circles in 8 fixed positions. These positions were placed on an imaginary circle with a radius equal to 375 pixels (approximately 16° in diameter when viewed from a distance of 0.7 m). The width of the contours was 2 pixels. The test circles were either smaller or larger


than the size of the reference circle and, in this figure, they occupy all 8 available positions. If the number of test circles N was less than 8, then the occupied positions were chosen randomly. The only exception to the general method was one series in Study 2, which had a slightly different arrangement. In that series of experiments, only two circles (test and reference) were exposed simultaneously at equal horizontal distances from a central fixation point. Observers were instructed to report which of the two presented circles was larger in size. At the beginning of each trial, a blue dot was exposed, marking the central fixation point. The test and reference circles were exposed for 500 ms. After each response, there was a 0.5 s pause before the next trial.

3. Study 1: The precision of mean size judgment as a function of the number of judged elements

As shown in several previous studies, precision in mean size discrimination varies very little with the number of objects judged (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2005b; Spencer, 1961, 1963). As noted above, from the viewpoint of statistical theory, such a result indicates that the precision of mean size computation, relative to what statistical averaging would allow, decreases with the number of elements. The precision of mean size judgments is expected to improve with the square root of the number of elements (√N), provided that the variance of the judged sizes remains fairly constant (e.g., Fouriezos, Rubenfeld, & Capstick, 2008). On the other hand, it is also firmly established that the error of size discrimination grows nearly proportionally with the length of the spatial extent being judged (Burbeck & Hadden, 1993; Solomon, Morgan, & Chubb, 2011; Wolfe, 1923). As an inevitable consequence, if the mean size of a set of geometric figures is estimated on the basis of the total length of the judged attribute – the total height of N lines, the cumulative diameter of N circles, etc. – the error of judgment is expected to grow proportionally with the number of judged elements. For example, the stimulus property that is supposedly judged in the eight-item (N = 8) display must be eight times larger than the same property judged in the single-item display. Since many previous studies have shown that mean size judgment accuracy remains approximately invariant regardless of the total number of elements being judged, this poses a challenge to any theory attempting to explain perception of mean size. Interestingly, Daniel Kahneman noticed a substantial psychological difference between the average length of simple geometric figures, such as lines, and their cumulative length (Kahneman, 2011). Without providing details, Kahneman claims that it is difficult, or even impossible, to compute the total length of a set of lines at a glance, while it is easy – automatic and effortless – to register the average length of these lines, just as it is easy to register their color (Kahneman, 2011; see Fig. 8). Thus, an experiment in which the total number of judged elements is varied systematically is a rather crucial benchmark for various theories explaining the perception of mean size. In this study, we try to establish how the precision of mean size judgment depends on the number of judged elements, looking for a clue as to why the precision of mean size judgments does not improve with the square root of the number of judged elements, as predicted by elementary statistical considerations.

3.1. Methods

Fig. 1. A schematic view of the stimulus consisting of 8 circles presented in Study 1. The reference circle (dotted) was removed before the presentation of the test circles.
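To make the display geometry concrete, here is a minimal sketch (our illustration, not the authors' MATLAB program) of how the eight fixed positions could be computed; it assumes the positions are spaced evenly on the imaginary placement circle of radius 375 pixels described above, which Fig. 1 suggests but the text does not state explicitly:

```python
import math
import random

RADIUS = 375       # radius of the imaginary placement circle, in pixels (from the text)
N_POSITIONS = 8    # number of fixed positions

def placement_positions(radius=RADIUS, n=N_POSITIONS):
    """Centres of the fixed positions, assumed evenly spaced on the placement circle."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def occupied_positions(n_circles, rng=random):
    """Randomly choose which of the fixed positions hold test circles on a given trial."""
    return rng.sample(placement_positions(), n_circles)

# e.g. a four-circle trial occupies four randomly chosen positions out of eight
print(occupied_positions(4))
```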

Sets with N = 1, 2, 4, and 8 circles were used to form separate blocks in this study. Within each block of the experiment, the total number of circles presented was kept constant, but their position and size were varied from trial to trial. To construct the stimuli, a set of base-circles was generated for each series so that the average of the set was equal to the size of the reference circle used,


[Fig. 2 appears here: twelve panels of choice probability as a function of Δd (pixels), one for each observer (J.A., K.A., M.T.) and set size (N = 1, 2, 4, 8); each panel lists the fitted μ and σ of the cumulative normal and the model parameters K and ς.]

Fig. 2. Mean size discrimination curves as a function of the number of circles (N = 1, 2, 4, or 8) for the three observers. Blue circles and red rectangles represent the empirical and the theoretical data, respectively. The dotted line is the cumulative normal distribution, with the parameters μ and σ providing the best fit to both the empirical and the simulated data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

which was 150 pixels in all trials. The base value for a single test circle was 150 pixels or 6.4°. For two test circles (N = 2), the base values were 144 and 156 pixels. For four test circles (N = 4), four base sizes were chosen: 138, 144, 156, and 162. For eight test circles (N = 8), sizes were varied around the following base values: 126, 132, 138, 144, 156, 162, 168, and 174. In each trial, the average size of the presented stimuli was 2%, 4%, 6%, 8%, 10%, or 12% larger or smaller than the size of the reference circle. To achieve the required difference in average size, the diameters of the circles in the corresponding base set were modified in two ways: the diameters of all N base-circles were increased or decreased by Δd pixels, or the diameter of only one randomly selected base-circle was altered by NΔd pixels. Here, Δd was calculated from the values in the corresponding base set and the required mean size of the ensemble. The only test-sets allowed were those in which the difference between the diameters of the largest and smallest circles did not exceed half of the size of the reference. For example, if the mean size of the eight test circles presented must be 4% greater than the reference, then the stimulus may consist of circles having diameters 126, 138, 144, 156, 162, 168, 174, and 180 pixels. Here Δd = 6 and 8Δd was added to the diameter of the second circle in the corresponding base set. On average, there were 80 trials for every condition.
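As a sketch of this construction (our re-implementation of the procedure just described, not the authors' code; the additional constraint on the largest–smallest difference is omitted for brevity), the two ways of distributing the required change of the mean could look as follows:

```python
import random

BASE_SETS = {1: [150],
             2: [144, 156],
             4: [138, 144, 156, 162],
             8: [126, 132, 138, 144, 156, 162, 168, 174]}  # every set has a mean of 150 px

def make_test_set(n, percent_change, spread_evenly, rng=random):
    """Return test-circle diameters whose mean is percent_change % away from the
    150-pixel reference, changing either all circles by delta_d or one by n * delta_d."""
    base = BASE_SETS[n]
    delta_d = 150 * percent_change / 100      # required change of the mean, in pixels
    if spread_evenly:
        return [d + delta_d for d in base]    # every diameter changed by delta_d
    target = rng.randrange(n)                 # one randomly selected base-circle
    return [d + n * delta_d if i == target else d for i, d in enumerate(base)]

# e.g. a set whose mean is 4% above the reference, obtained by changing one circle only
print(make_test_set(8, 4, spread_evenly=False))
```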

3.2. Results and discussion

The mean size discrimination curves are shown in Fig. 2 as a function of the number of circles for the three observers. The columns correspond to the number of judged circles (N = 1, 2, 4, or 8) and the rows represent the data of the three observers (J.A., K.A., and M.T.). The probabilities that observers indicated that the mean size of the N test objects was larger than the size of the reference circle are shown as blue circles. All empirical psychometric data were approximated with a cumulative normal distribution for which the best fitting values of the mean (μ) and standard deviation (σ) were determined (shown in the inset to each panel). The mean value μ controls the horizontal position of the psychometric function and is expected to fluctuate around zero. Indeed, there was no remarkable tendency to under- or overestimate the judged size of the test circles relative to the reference. The standard deviation σ characterizes the slope of the psychometric function, showing the precision with which the mean size of the test elements could be discriminated from the reference size. Since one standard deviation corresponds to the 84.1th percentile point, sigma (σ) can also be interpreted as a measure of the JND: by how many pixels the mean test element size needs to be enlarged or shrunk relative to the reference circle for the observer to tell the difference between them in 84.1% of cases.¹ In all 12 panels, the best fitting psychometric function is remarkably close to the empirical data points (blue symbols). We estimated the quality of fit by computing the Pearson product-moment correlation between the observed and predicted data points. Since all correlations were exemplarily high, from 0.982 to 0.997 (median 0.991), we concluded that error accounted for a negligible amount (typically about 1.9%) of the total variance.

¹ Some readers may remember that the JND is typically defined as the 75% point of correct discrimination. Since no percentage is more "natural" than any other, we selected one standard deviation as the definition of the JND.
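For completeness, the fitted psychometric model described above can be written compactly (our notation, matching the parameters reported in the figure insets):

$$P(\text{"larger"} \mid \Delta d) \;=\; \Phi\!\left(\frac{\Delta d - \mu}{\sigma}\right),$$

where Φ is the standard normal cumulative distribution function; because Φ(1) ≈ 0.841, the JND defined as the 84.1% point equals σ.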


If we compare the columns of Fig. 2 with one another, we can see that the slopes of the psychometric functions do not vary systematically. The number of judged elements N seems to have only a minor effect on the slope of the psychometric function (σ), and hence on the JND. Although there were considerable differences between the observers, the number of judged elements had only an insignificant effect on the JND [F(3, 6) = 1.59, p = .287]. Since several studies have argued that not only the mean of the sizes but also the deviation from the mean plays an important role in mean size judgments (Fouriezos, Rubenfeld, & Capstick, 2008; Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011), we also had a closer look at the variance of the test circle diameters. Although we did not control the variance, its average value in separate trials increased with the number of test circles (of course, there was no variance when only one test circle was presented). Nevertheless, these differences were too small to exert a considerable impact on the mean size judgments. For example, the average standard deviation of the diameters was 17.6 pixels for four circles and 18.9 pixels for eight circles. We analyzed in more detail the judgments of the mean size of sets in which N = 8 test circles were presented on the screen. The standard deviation in these sets varied from 14.7 to 23.3 pixels. When the answers across all observers were predicted from the mean size and the standard deviation of each set of circles presented in a given trial, only the mean size had a significant contribution (b = 0.58, p < .000001), while the role of the standard deviation was rather small (b = 0.02, p = .079). Thus, we have no evidence that variance was taken into consideration in mean size judgments. This implies that the observer's judgments are mainly based on the mean value of the sizes, and that the precision with which the size of a single circle is perceived is almost as good as the perception of the mean size of eight circles.

Thus, our results are in good agreement with the previous literature, in which invariance with respect to the number of elements has been repeatedly demonstrated (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2005b; Fouriezos, Rubenfeld, & Capstick, 2008; Spencer, 1961, 1963). There have been occasional reports that an increasing number of judged elements results in better accuracy (e.g., Robitaille & Harris, 2011), but these studies suffer from considerable methodological problems. As a matter of fact, based on statistical considerations, judgment of the mean size of 8 elements is expected to be about 2.8 times more accurate than judgment of a single element. At the same time, the result almost certainly rules out any explanation which assumes that observer judgments are based on computing the total length of the perimeters of the individual circles, since that length increases linearly with the number of circles. One possibility is to be satisfied with a phenomenological level of explanation and speculate about possible reasons as to why judgment of the mean size of eight circles is about √8 times less efficient than the size judgment of a solitary circle. Another option is to move from these "half-way explanations" towards a more elaborate description, which could identify the more atomistic mechanisms responsible for the measurement of the size of each individual element.
As it turned out, a combination of two very simple mechanisms routinely used in many psychophysical analyses – additive noise and random selection – was able to explain all obtained psychometric curves with sufficient precision.


4. The proposed N&S model

We propose that the observer is indeed capable of measuring the size (radius, diameter, circumference, or any other linear measure) of the individual circles displayed, accurately summing these measures and dividing by the number of addends, which gives an estimate of their mean size. This process of perceptual aggregation can be correctly understood by applying two very simple psychophysical principles. First, directly following the Thurstonian tradition, we assume that each individual act of measurement can only be executed with a certain degree of unavoidable error, which disperses the measured size randomly around a certain mean value. Technically speaking, this means that a normally distributed random value with a standard deviation equal to ς (final sigma)² is added to each individual measurement result. This implies that exactly the same size of some geometrical figure is perceived to have slightly different subjective sizes on different occasions. This measurement error caused by internal noise cannot be suppressed or avoided by allocating attention to the processed item. Strictly speaking, we need to assume that the internal noise ς increases with the size of the measured element, but since the circles within an experiment did not differ very much in size, it is only the average noise that matters.

² We use this symbol (final sigma, ς) to distinguish it from the other sigma, σ, which is the habitual symbol for the standard deviation of the normal distribution.

The second proposal was that not all displayed elements are necessarily taken into account in the decision-making process. If the number of elements N exceeds capacity limits, then a smaller number of elements K is randomly selected from these N elements (K < N). Although it would be more realistic to assume that the selection process is not completely random and that observers have certain preferences, we can still start from the simplifying assumption that the selection process is nearly stochastic. The assumption of random selection is not very restrictive, provided that all stimulus conditions are randomized. Obviously, the ratio between the sampled and presented elements, K/N, determines the precision of the mean size judgment, which is manifested in the slope of the psychometric function. Thus, the selection principle assumes that the mean size is not always computed over all presented elements; sometimes only a fraction of these is used in the judgment of the mean size. It was thus proposed that it is necessary to specify only two values – ς and K – in order to reproduce an empirically obtained psychometric function of mean size discrimination.

In order to simulate observer judgments, we wrote a simple simulation program which was supplied with exactly the same values of the circles' diameters as shown in the real experiments. If the number of circles N surpassed the capacity limit (K < N), then K elements out of N were randomly selected for inspection. To the diameter of each selected element, a random value was added; this pixel value was randomly generated from a normal distribution N(0, ς). These diameters and the random values were added together and divided by the number of addends K. If the mean value of these K addends was larger than the diameter of the reference circle, then the answer "larger" was selected. If the mean value was smaller than that of the reference circle, then the answer "smaller" was given. Finally, if the mean of these K addends was equal to the reference size, then either answer, "smaller" or "larger", was chosen with equal probability, 0.5. In order to eliminate statistical flukes, all simulations were replicated 100 times. This number of replications was chosen because it was sufficient to eliminate noticeable random fluctuations in the final estimate. Since it is unrealistic to expect observers to be able to select exactly K elements in each trial, we allowed K to take not only integer but also fractional values. These fractions have a very simple interpretation: for example, K = 3.6 could mean that the observer picks out 3 elements in 40% of all trials and 4 elements in 60% of trials. Thus, a fractional K can be interpreted as a mixture of different integer-valued Ks across trials.
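The simulation just described can be condensed into a short sketch. The following is our re-implementation of the procedure outlined above (the original program was written in MATLAB and is not reproduced in the paper); a fractional K is handled as a probabilistic mixture of the two neighbouring integers, as explained in the preceding paragraph:

```python
import numpy as np

rng = np.random.default_rng(1)

def ns_trial(diameters, reference, k, noise_sd):
    """One simulated trial of the Noise & Selection (N&S) model.

    diameters : circle diameters (pixels) actually shown on that trial
    reference : diameter of the reference circle (pixels)
    k         : mean number of attended elements (may be fractional)
    noise_sd  : internal noise (the model's final sigma), in pixels
    Returns True if the model answers "larger".
    """
    diameters = np.asarray(diameters, dtype=float)
    n = diameters.size
    # fractional K: e.g. K = 3.6 -> 4 elements on 60% of trials, 3 on the remaining 40%
    k_trial = int(np.floor(k)) + int(rng.random() < (k - np.floor(k)))
    k_trial = max(1, min(n, k_trial))
    sample = rng.choice(diameters, size=k_trial, replace=False)  # random subsample
    noisy = sample + rng.normal(0.0, noise_sd, size=k_trial)     # noisy measurements
    mean_estimate = noisy.mean()
    if mean_estimate == reference:                               # tie: guess at random
        return rng.random() < 0.5
    return mean_estimate > reference

def ns_choice_probability(diameters, reference, k, noise_sd, n_reps=100):
    """Probability of answering "larger", estimated from repeated simulated trials."""
    return float(np.mean([ns_trial(diameters, reference, k, noise_sd)
                          for _ in range(n_reps)]))

# e.g. eight circles whose mean is 6 px above the 150-px reference,
# with K = 4.6 and noise = 7.2 px (values of the kind reported in Fig. 2)
print(ns_choice_probability([132, 138, 144, 150, 162, 168, 174, 180], 150, 4.6, 7.2))
```

Fitting the model to a psychometric function then amounts to fixing ς from the single-circle condition and searching for the K that brings the simulated choice probabilities closest to the observed ones, as described in the next paragraph.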


Next, we proposed that the judgment of the size of a single circle is a special case where the selected and presented numbers of elements are undeniably identical (K = N). This means that the slope of the psychometric discrimination function (σ) is solely determined by the internal noise (ς) and unaffected by under-sampling. Thus, we found a value for the internal noise (ς) in the condition where the size of a single circle was judged (N = 1), assuming that the level of internal noise remains approximately the same as the number of elements increases. Although this assumption may appear somewhat artificial, it is certainly more acceptable than supposing that the internal noise (ς) increases proportionally with the number of judged elements N. For each combination of observer and number of circles N, a program searched for the value of K that brought the points generated by the simulation model (red rectangles) closest to the observed data (blue circles). The optimal values of the N&S model, K and ς, are shown in the inset to each panel. As can be seen, the simulated model data are barely distinguishable from the empirically collected data, both falling close to the best fitting normal distribution (see Fig. 2). As expected, in most cases the simulated data fall almost perfectly along the cumulative normal distribution which provided the best fit to the empirical psychometric functions. All the correlations between the simulation fit and the best fitting psychometric function were equal to 1.0. Since observers did not press the two response buttons with equal probability, all empirical psychometric functions shown in Fig. 2 were shifted slightly either to the right or left. Although these shifts were rather small, they may still artificially increase the discrepancy between the empirical results and the theoretical prediction. For this very reason, the predictions of all models were shifted μ units along the abscissa so that the mean response would be equal to 0.5. As this kind of transformation would prohibit the direct application of discrete computational methods in the assessment of model fit, we chose to compare the empirical and theoretical curves via the best fitting cumulative normal distribution. In addition to visual inspection, a more formal approximation statistic – the percentage of explained variance – also confirms very good agreement between the empirical data and the predictions of the two-parameter model. Since both the empirical and theoretical points fitted exactly the same cumulative normal distribution almost flawlessly, it is possible to conclude that the proposed model provides a realistic picture of what is going on in the head of the observer when she or he is estimating the mean size. It is very satisfying that such a simple model, the N&S model, with only two free parameters – the number of randomly selected elements K and the magnitude of internal noise ς – is able to predict mean size judgments so well. It was slightly surprising that, according to the proposed model, observers started to omit elements even if there were only two of them on the screen (cf. Solomon, Morgan, & Chubb, 2011). When there were 8 circles to judge, observers behaved as if they were able to notice the sizes of only approximately one-half of them.
Although we do not know exactly what would happen if the number of presented elements exceeded 8, processing capacity seems indeed to be less than the magical number 7 (Miller, 1956) and closer to the equally magical 4 (Cowan, 2001). This conclusion may need revision, since there have been reports that observers' accuracy of mean size judgments was better with the entire display than with the presentation of only subsamples (Chong et al., 2008; Experiment 2). Also, Im and Halberda (2012) recently reported findings that 7 individual sizes were used in the mean size computation, which would certainly surpass the above-mentioned capacity limitations. However, this estimate of 7 individual sizes is very likely mistaken, because it is based on an equation which ignores the fact that the internal noise depends on the number of judged elements and that, in the case of comparative judgments, a correction by a factor of √2 should be applied (cf. Thurstone, 1927a, Case V).
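For readers unfamiliar with that correction, the reasoning is the standard Thurstone Case V argument: when two quantities, each represented with the same discriminal dispersion σ, are compared, the dispersion of their difference is

$$\sigma_{\mathrm{diff}} \;=\; \sqrt{\sigma^{2} + \sigma^{2}} \;=\; \sigma\sqrt{2},$$

so an analysis of comparative judgments that ignores this factor misattributes part of the comparison noise and thereby biases the estimate of how many elements were sampled.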

These results provide an answer to the question of why mean size judgment does not improve as the square root of the number of judged elements increases. The explanation, in terms of the proposed model, is rather obvious. With an increase in the number of displayed elements, observers were not able to attend to all elements, but only to a fraction of them. As the number of presented elements increased, the number of unattended elements increased as well. This indicates that, in terms of ideal observer analysis, the observer's performance decreases with an increase in the number of displayed elements. Previous researchers were obviously mistaken in assuming that the observed indifference to the total number of presented elements hints at the use of a global, parallel process that extracts a statistical summary of the average size of simple geometric objects in the display (Alvarez, 2011; Alvarez & Oliva, 2008; Ariely, 2001, 2008; Chong & Treisman, 2003, 2005b; Chong et al., 2008). Even if it is a global, parallel process that computes a statistical summary, it is obvious that the efficiency of this process decreases as the number of displayed elements increases. The invariance with respect to the number of judged elements can be explained by the fact that an increasing number of elements is ignored by the observer. Therefore, our results are in better accordance with the views of those few skeptics who have suggested that the results of mean size perception can be explained through various focused-attention strategies, without appealing to somewhat obscure massively parallel and effortless mechanisms (De Fockert & Marchant, 2008; Myczek & Simons, 2008; Simons & Myczek, 2008; Solomon, Morgan, & Chubb, 2011). Even if details of our proposed explanation turn out to need some correction, it still provides indisputable evidence that the averaging of all the items in the display is not required to explain perception of the mean size.

5. Study 2: The just noticeable difference as a function of reference size

All hypotheses concerning the human ability to compute statistical summaries need to start from the assumption that, whatever visual attribute we talk about – size, orientation, color, etc. – there must be some perceptual mechanism which is able to measure this physical attribute and transform it into an internal representation. If we talk, for example, about computing the average size of an ensemble of objects, then we need to postulate an ability to measure and represent the size of each of these objects, whether they are presented alone or together with other similar objects. Perhaps it is not a coincidence that one of the first observations, pivotal in the birth of experimental psychology, was the observer's ability to discriminate between two spatial intervals (Burbeck & Hadden, 1993; Wolfe, 1923). Before proceeding to the question of how the average size of an ensemble is computed, we need to establish a benchmark by demonstrating how the size of a single object is measured. For instance, it was noticed many years ago that measurements of subjective length are very similar whether they concern the length of a single geometric object or the average length of a collection of similar geometric objects (Miller & Sheldon, 1969). The purpose of this study is to test how the absolute size of the judged elements affects the JND, whether it concerns the size of a single object or the mean size of a group of objects.

5.1. Methods
In this study, we ran two series of experiments. In the first of these two series, the size of a single circle was judged relative to a reference. Two circles, the test and the reference, were presented in each trial. The diameter of one of the two circles (the reference) was kept constant across all trials, while the diameter of the


second circle (the test) was randomly chosen from a set of circles with diameters 1%, 2%, 4%, 6%, 8%, or 10% larger or smaller than the reference. In the three different experiment blocks, we used a reference circle with a diameter of 75, 150, or 300 pixels. The diameters of the test circles ranged from 68–83, 135–165, and 270–330 pixels, respectively. Circles were drawn to the left and right of the central fixation point, at a distance approximately equal to the diameter of the reference. To avoid circle alignment as a potential cue, the locations of the centers of the presented circles were varied randomly within a window of 5% of the size of the reference. The position of the test circle (to the left or right side) relative to the central fixation point was chosen randomly in each trial. The design of the second series was identical to Study 1. Four test circles were presented in eight possible positions, which were randomly selected. Like the first series, there were three different experiment blocks for the reference circle, with a diameter of 75, 150, or 300 pixels. The test circle diameters were in the range of −10% to +10% of the reference size. The observer's task was, as in the previous study, to indicate whether the mean size of the four test circles was smaller or larger than the reference circle.

5.2. Results and discussion

Fig. 3 demonstrates the mean size discrimination probabilities for the three different reference sizes (75, 150, and 300 pixels) and for the three observers (J.A., K.A., and M.T.). Panel 3A corresponds to judgments of a single circle and panel 3B to judgments of the mean size of 4 circles. In addition to the empirical data (blue circles), the results of the best simulation model (red rectangles) are shown. Empirical data were again distributed very closely to a cumulative normal distribution. The correlations between observed and predicted data varied from .939 to .998 (median .995), showing a superb fit. When simulating data for the judgment of a single circle, it was natural to suppose that K = 1: the observer is attentive throughout all trials and can judge the size in each of them. As shown in the inset to each panel, the best predicting value of ς increases nearly proportionally with the size of the reference circle. When judgments of the mean size of 4 circles were simulated (Fig. 3B), the model's free parameters K and ς were allowed to vary freely in order to search for the best fitting combination of these two parameters. As shown in the insets, the value of K remains fairly stable, being on average equal to 2.2, 3.1, and 2.4 for observers J.A., K.A., and M.T., respectively. It looks as if only 2–3 elements were taken into account in the judgment of the mean size of all four elements. What is remarkable is a nearly proportional increase in the standard deviation ς with an increase of the reference size. In order to observe the relationship between the JND and the reference size more transparently, we plotted, in Fig. 4, the standard deviations in pixels as a function of the reference size, also in pixels (please remember that one pixel was about 2 min of arc). Since all data points lay sufficiently close to a line, this indicates that Weber's law holds for both the judgment of a single circle and the

[Fig. 3A appears here: nine panels of choice probability as a function of Δd (pixels), one for each observer (J.A., K.A., M.T.) and reference size (75, 150, 300 pixels), for single-circle judgments with K = 1; each panel lists the fitted μ and σ and the model's ς.]

Fig. 3. Mean size discrimination probabilities and approximation functions for three different reference sizes (75, 150, and 300 pixels) and for three observers (J.A., K.A., and M.T.). Blue and red symbols represent empirical and theoretical data, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


[Fig. 3B appears here: nine panels of choice probability as a function of Δd (pixels) for judgments of the mean size of four circles, one panel for each observer (J.A., K.A., M.T.) and reference size (75, 150, 300 pixels); each panel lists the fitted μ and σ and the model parameters K and ς.]

Fig. 3. (continued)

[Fig. 4 appears here: two panels, "Judgment of the size of one circle" and "Judgment of the mean size of four circles", plotting precision (pixels) against the size of the reference circle for observers J.A., K.A., and M.T.]

Fig. 4. Precision of the judgment of the size of a single circle or of the mean size of four test circles, as a function of the size of the reference circle (75, 150, and 300) for the three participants (J.A., K.A., and M.T.).


mean size of four circles. In both cases, the JND increases in proportion to the judged size. What is different, however, is the slope of the Weber function: the JND increases more rapidly for the mean size judgment of four circles. This indicates that the judgment of the mean size of multiple objects is less precise than the judgment of a single object. The obtained result is obviously not in accordance with those theories which claim that the mean size of a group of geometric figures can be judged almost as precisely as the length of a single line or the size of a circle (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2005b). The fact that the internal noise ς increases nearly proportionally with the size of the circle must have some obvious consequences for the proposed N&S model. In the current version of the model, a random value with the same standard deviation ς was added to each individual measurement result, irrespective of its absolute size. A more realistic model requires that the magnitude of the noise be proportional to the circle's diameter. However, the range of the test circles' diameters in each trial was rather small. For that reason, making the added noise proportional to the circle's diameter changed the model's predictions only negligibly. Thus, we are aware that the internal noise increases with the size of the test elements, but for pragmatic reasons we leave the model unchanged, since its predictive accuracy would improve only minimally. Probably for the same reason, the exact form of the psychophysical function for perceived size (Teghtsoonian, 1965) has only theoretical and very little practical importance in the context of the current experimental paradigm.
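A minimal way of writing the size-dependent variant discussed above (our formulation, not an equation given in the paper) is to let the noise attached to each measured diameter d_i scale with a Weber fraction w:

$$\varsigma_i \;=\; w\, d_i ,$$

so that for a single element the JND grows in proportion to its diameter, as Weber's law requires; with the narrow within-trial range of diameters used here, replacing each ς_i by the common average ς changes the predictions only negligibly, as noted above.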

6. Study 3: Testing the associative law of addition

One obvious shortcoming of many previous studies of ensemble characteristics – any abstract property of an incoming visual image which is computed from multiple individual measures – is the failure to test the fundamental properties of the operations that are required for the computation of these properties. Of course, it is extremely important to know, for instance, what minimal exposure time is needed for mean size estimation (Whiting & Oriet, 2011) or how allocation of attention affects judgment precision (Alvarez & Oliva, 2008; Ariely, 2001; Haberman & Whitney, 2011), but these and other properties tell very little about the averaging process itself. As was stressed above, surprisingly little attention has been paid to the defining properties of the statistical averaging process, or, simply, addition. For example, it takes only two elements to average. If people indeed could see a statistical summary of the individual elements in the display, it is sufficient to present only two elements and ask observers to judge their average size. Since two elements almost certainly do not exceed the capacity of the focused-attention strategy, proponents of computing ensemble characteristics can prove their theories using, logically, the minimal number of elements for averaging, that is, two. As we have mentioned above, there are several fundamental properties of averaging that have never been critically tested. For example, every schoolchild learns that the order in which addends are summed does not change the end result (the commutative law). Similarly, the grouping of added numbers does not affect the sum (the associative law). For example, (4 + 1) + (4 + 1) is expected to give exactly the same result, 10, as the slightly different grouping 4 + (4 + 2). If the observer's task, for example, is to discriminate the mean size of two circles in comparison to a reference, then it does not matter whether we add 2 size units to the diameter of only one of them or one size unit to the diameters of both circles. Intuitively, it is more likely that the human observer can more easily notice an outlier which is 2 size units larger than the reference size, rather than two small increments of 1 size unit added to each of the two circles. Any theory insisting that the perceptual system is capable of computing the


mean size must demonstrate that these two cases result in identical perceptual outcomes. Otherwise, we need to assume that the mean size is inferred not on the basis of arithmetic aggregation but rather through some other operation which does not obey the associative law. As far as we know, this very easily falsifiable prediction has never been tested before.

6.1. Methods

In each trial, the mean size of two test circles was compared to the size of a previously seen reference circle. One circle was drawn 188 pixels to the left and the second 188 pixels to the right of the central fixation point. In each trial, the mean size of the presented stimuli was 1%, 2%, 4%, 6%, 8%, or 10% larger or smaller than the size of the reference (150 pixels). Two base-sets of circles with diameters [132, 168] and [150, 150] were constructed. Before presentation, the diameters of both circles from the chosen base-set were increased (or decreased) by Δd, or the diameter of only one of the base circles was altered by 2Δd. The observer's task was to indicate whether the mean size of the two test circles was smaller or larger than the reference circle.

6.2. Results and discussion

As it turned out, it was largely irrelevant whether the diameters of both circles were modified by Δd pixels or only one of them was altered by 2Δd pixels. Fig. 5 shows the results of the mean size judgments for equal (Ref = 150) and unequal (Ref = 132/168) base test sizes. Unlike the number of modified elements (adding 2Δd to one or Δd to two elements), the similarity of the sizes of the two judged circles had a substantial effect on judgment precision. All three observers were significantly more accurate in judging the mean size of two approximately equal circles than of two circles of which one was clearly smaller and the other clearly larger than the reference size. When we simulated the empirical data with the noise and selection model, the estimated internal variance was substantially smaller for the approximately equal-sized circles (Ref = 150) than for the unequal-sized circles (Ref = 132/168). To demonstrate the effect of these two factors – the number of changed elements and the disparity in sizes – we aggregated the data of all three observers and plotted them in Fig. 6. It can be clearly seen that modification of the size of only one circle (filled symbols) is not very different from modification of both elements (empty symbols). What really matters is whether the sizes of the judged elements were approximately equal (blue symbols) or there was a considerable disparity between their sizes (red symbols). A cumulative normal approximation shows that the slope of the psychometric function is considerably steeper for approximately equal-sized circles (σ = 10.96) than for unequal-sized circles (σ = 16.73). Thus, there was approximately a 6-pixel, or 12-min-of-arc, worsening of mean size discrimination performance when the circles had unequal sizes. Strictly speaking, this may be seen as a violation of the laws of arithmetic: any calculator is expected to find that (132 + 168)/2 is equal to (150 + 150)/2. Although there was a general tendency for the test circles to appear slightly larger than the reference size (the mean of the psychometric function was shifted toward positive values, μ = 5.67 and 5.06 for the one- and two-element change, respectively), these mean values were nevertheless approximately equal. Thus, the main problem is in the precision, not in the ability to compute the mean value.
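The design guarantees that the two manipulations are arithmetically equivalent; for the unequal base set, for example,

$$\frac{(132+\Delta d)+(168+\Delta d)}{2} \;=\; 150+\Delta d \;=\; \frac{(132+2\Delta d)+168}{2},$$

so any difference between the two conditions must arise in perception, not in the stimulus means.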
The perceptual system is considerably more precise in computing the mean size of approximately equal addends than of addends with a considerable size disparity. One possible interpretation is that the precision of the internal representation depends on the dispersion of the judged values.

[Fig. 5 panels: for each observer (JA, KA, MT) and base-set (Ref = 150; Ref = 132/168), choice probability is plotted against Δd (pixels), with empirical data, the best-fitting theoretical curves, and the fitted μ, σ, K, and ς values.]
Fig. 5. The mean size discrimination curves for the three observers when the diameters of both circles were modified by Δd pixels (upper panel, Ref = 132/168) or only one of them was altered by 2Δd pixels (lower panel, Ref = 150). Blue and red dots represent empirical and theoretical data, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

[Fig. 6 legend: conditions (1)+(2): μ = 5.67, σ = 10.96, %EV = 99.4; conditions (3)+(4): μ = 5.06, σ = 16.73, %EV = 98.87. Choice probability is plotted against Δd (pixels), aggregated over all subjects.]
Fig. 6. Mean size discrimination probabilities and the approximation curves when the size of one (filled symbols) or two (empty symbols) circles was changed separately for initially equal (blue) and unequal (red) circles. %EV is the percentage of the explained variance. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Models that were inspired by signal detection theory postulate that the observer is also sensitive to the variance of the designated stimulus property (Solomon, Morgan, & Chubb, 2011). A simpler explanation is that, if the judged values concentrate around one location on the internal number line (Dehaene, 2011), then each value can be represented with higher precision than values that land on disparate positions of the internal representation. In other words, it seems that the human observer can concentrate on only one particular region of the internal number line and performs worse when he or she must attend to two or more different regions of the internal number line simultaneously.

The most interesting result, however, was that approximately the same outcome was obtained in the two conditions in which the size increment was added to (or the decrement subtracted from) either only one circle or both circles equally. Perhaps unexpectedly, these two conditions (empty versus filled symbols) were practically identical. For the observers, it made no difference whether, say, 12 pixels were added to the size of only one circle or 6 pixels to both circles to make them distinctly different from the size of the etalon. In short, the observers were sensitive to the mean size of the two circles, not to the sizes of individual elements in isolation. Had one of the two elements been presented separately, its difference from the etalon size would certainly have been noticed; in the presence of another test circle, however, perhaps because it does not carry relevant information, it remains unnoticed. The fact that the observers were sensitive to mean size while ignoring the size of individual elements seems to suggest that they may have limited cognitive access to the size of individual elements. One unlikely explanation is that the observer simply follows the instruction and ignores the size of an individual element even if it could provide an easier answer to the posed question. Unfortunately, instructions are not sufficient to guarantee that the observer's judgment is based solely on the designated property of the presented stimuli (Nachmias, 2006). We know of no demonstration that the observer can follow instructions exactly even when an easier solution is available. Previous studies have also shown that fairly good discrimination of the mean size can be achieved even when observers cannot recall the size of individual elements (Ariely, 2001; Chong & Treisman, 2003; Corbett & Oriet, 2011). Only when the summary size difference reached a critical level did observers reliably discriminate the mean difference from the reference size. Awareness of the size of an individual element seems to be blocked by the presence of another element. The other side of the same coin is compulsory averaging. The presence of another element on the display and the task to estimate the mean size do not lead to a loss of information, as seems to happen in another perceptual phenomenon known as visual crowding (Levi, 2008; Pelli, Palomares, & Majaj, 2004). In the judgment of mean size, we are obviously dealing with compulsory averaging, where individual size information is not lost but combined into the perception of an ensemble characteristic, the mean size, which is computed on the basis of at least two separate elements. It is important to notice that even in this simplest case the statistical representation reduces the amount of information that needs to be preserved by 50% (cf. Ariely, 2001; Chong & Treisman, 2003). Compulsory averaging is not an unknown concept in the explanation of visual perception. For example, it was previously reported that, despite their inability to report the orientation of an individual patch, observers can reliably estimate average orientation, demonstrating that orientation information is pooled even though components may not be individually identifiable (Parkes et al., 2001). Also, in random-dot cinematograms the observer is unaware of individual displacements but is nevertheless able to compute and perceive a vector sum of these individual displacements (Allik, 1992; Allik & Dzhafarov, 1984). Since mean size discrimination performance depends critically on the total size difference of the judged elements from the reference, not on how this total difference is distributed among individual test elements, it is also possible to conclude that compulsory pooling of size information happens before information about size reaches awareness. Since the associative law of addition was shown to be preserved, this study provided rigorous proof for the perception of ensemble characteristics.

7. Study 4: Compulsory pooling of size information

The surprising discovery that the associative law holds in the perception of the mean size of two circles obviously needs further elaboration. If the observer's task, for example, is to discriminate the mean size of four circles in comparison to a reference, then it does not matter whether we add four size units to the diameter of only one of them or one size unit to the diameters of all four circles. Intuitively, it is more likely that the human observer can more easily notice an outlier which is four size units larger than the reference size, rather than four small increments of one size unit added to each of the four circles. Again, any theory insisting that the perceptual system is capable of computing the mean size must predict that the two cases result in an identical perceptual outcome. Grouping these four, or any other number of, units into different packages does not affect the mean value and, consequently, should not affect the perception of the mean size.
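In terms of the four-circle design used below, the bookkeeping behind this claim is (the concrete numbers are ours, inserted only for illustration; 150 pixels is the reference diameter from the Methods):

((150 + 4Δd) + 3·150)/4 = (2·(150 + 2Δd) + 2·150)/4 = (4·(150 + Δd))/4 = 150 + Δd.

Whichever way the same total increment is packaged, the arithmetic mean shifts by exactly Δd, so a genuine averager must treat the three conditions alike.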

7.1. Methods

The design of this study was similar to the case N = 4 of Study 1. Here the base-set consisted of four circles with diameters equal to the size of the reference circle (150 pixels). In each trial, the average size of the presented stimuli was 1%, 2%, 4%, 6%, 8%, or 10% larger or smaller than the size of the reference circle. These required values for the average size were obtained by increasing or decreasing the diameters of the base-circles. The difference delta (Δd) between 150 and the required mean size was distributed, as evenly as possible, among the base-circles in three ways: the diameters of all base-circles were increased or decreased by Δd pixels, or the diameters of two base-circles were increased or decreased by 2Δd pixels, or the diameter of only one base-circle was altered by 4Δd pixels. Thus, in three different conditions, one, two, or all four elements were different from the reference size, although the summary change was always identical.

7.2. Results and discussion

Similarly to the previous study, the way in which the summary increment or decrement was distributed among individual elements had only a negligible effect on the discrimination of mean size. Observers discriminated displays in which the diameter of only one circle was modified by 4Δd pixels almost as well as displays in which all four base-circles were enlarged or diminished by Δd pixels. Fig. 7 shows the empirical psychometric functions for the three observers, irrespective of the number of circles (1, 2, or 4) whose diameter was modified. Again, it was easy to find a combination of K and ς values of the N&S model providing the best fit (red rectangles) to the empirical data. As in the previous studies, the observers' data were explained as if they were able to take into account only 2–3 elements out of the four, in addition to a considerable amount of internal noise, which ranged from about 4 to 12 pixels. Psychometric functions aggregated across the three observers for the one (green), two (blue), and four (red) elements whose size was increased or decreased are shown in Fig. 8. The choice probability is plotted against the mean size difference from the reference. Since the slopes of all three psychometric functions were close to 5.9 pixels, this means that it was necessary to enlarge or diminish one of the four circles by about 23.6 pixels to make the mean size distinguishable from the reference. If we compare this value with the respective ς values in Study 1 (N = 1), then we see that it is minimally 2–3 times above the discrimination threshold. When the sizes of two of the elements were altered, a discrimination probability of 84.1% was achieved with about Δd = 11.8 pixels, which is also the discrimination threshold of a solitary circle. This could only mean that observers were unaware of the sizes of the individual elements, and all their responses were guided only by the mean size, which is pooled together across the sizes of all the elements they were able to take into account.

[Fig. 7 panels: for each observer (JA, KA, MT), choice probability is plotted against Δd (pixels), with empirical data, the best-fitting theoretical curves, and the fitted μ, σ, K, and ς values.]

Fig. 7. Mean size discrimination probabilities with the best theoretical approximation functions. Blue and red symbols represent empirical and theoretical data, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

[Fig. 8 legend: (1) μ = 1.47, σ = 5.96; (2) μ = 1.24, σ = 6.12; (4) μ = 1.23, σ = 5.54. Choice probability is plotted against Δd (pixels), aggregated over all subjects.]

Fig. 8. Psychometric functions aggregated across three observers for one (green), two (blue), and four (red) elements, the size of which was increased or decreased. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
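The per-element values quoted above follow from simple bookkeeping: with four elements, changing m of them by Δd shifts the mean by m·Δd/4. A small worked check (our code; the 5.9-pixel value is the slope estimate reported for the aggregated data, and the helper name is ours):

```python
def per_element_change(mean_shift, n_elements=4, n_changed=1):
    """Per-element diameter change needed to shift the mean of n_elements
    by mean_shift when only n_changed of them are altered."""
    return mean_shift * n_elements / n_changed

sigma = 5.9  # approximate slope of the aggregated psychometric functions (pixels)
for m in (1, 2, 4):
    print(m, per_element_change(sigma, n_changed=m))
# -> 23.6 pixels when one element is changed, 11.8 for two, 5.9 for all four,
#    matching the values discussed in the text.
```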

8. General discussion

8.1. The perception of ensemble characteristics

An ensemble characteristic is usually defined as any attribute that is computed by combining multiple individual measures either across space or across time (Alvarez, 2011). One of the most essential properties of ensemble characteristics is that they represent a whole set of objects, so that there may not be a single individual item which embodies the ensemble characteristic (Ariely, 2001). Thus, with ensemble characteristics it is impossible for the observer to confuse introspection with inspection, what Titchener called the stimulus error (Titchener, 1912), since these properties exist only in the consciousness induced by a brief exposure to the stimuli. Ensemble characteristics have been presented under various names, including 'gestalt properties,' 'stimulus prototypes,' and 'statistical summaries' (Alvarez, 2011), and they have served as paradigmatic examples for different research traditions, including gestalt psychology and various waves of the cognitive revolution. With this historical background, it is understandable why the perception of statistical properties has attracted such interest. Besides mean length (Ariely, 2001; Miller & Sheldon, 1969), orientation (Dakin & Watt, 1997; Miller & Sheldon, 1969; Parkes et al., 2001), spatial position (Alvarez & Oliva, 2008; Morgan, Hole, & Glennerster, 1990; Morgan et al., 2012), and luminance (Bauer, 2009), it has also been claimed that the human observer is able to compute average emotion (Haberman & Whitney, 2009) or even average sex (Haberman & Whitney, 2007). What appears to make ensemble characteristics particularly attractive is the conviction that these properties are computed automatically by an effortless and parallel array of mechanisms which can operate beyond the severe capacity limitations imposed by attention and short-term memory (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2003, 2005b). This may

be, however, a simplification, since one of the leading researchers has maintained that he never thought that a statistical summary can be computed by a powerful parallel processor with unlimited capacity (Sang Chul Chong, personal communication, December 9, 2012). One of the main arguments in favor of effortless and massively parallel processing is judgment accuracy, which typically varies in the range of 4–12%. But, as was shown in this and several previous studies (Myczek & Simons, 2008; Simons & Myczek, 2008), this precision does not guarantee that all displayed elements are taken into account in the judgment of the mean size or some other perceptual attribute. The observed judgment precision can be explained, in principle at least, by assuming that only 2–4 elements are inspected and that their parameters are used to infer the average size. One of the most serious disappointments arising from previous studies was the failure to provide solid proof that the human observer indeed computes the mean size, or anything else which is a sufficiently good proxy for the statistical average. Previous research was able to demonstrate, in the best case, only that the observer's judgments were not based on some other statistical parameter such as the minimum, maximum, median, or variance (Ariely, 2001; Chong & Treisman, 2003; Fouriezos, Rubenfeld, & Capstick, 2008). However, there is some clear evidence that observers are not able to follow instructions exactly and may substitute some other statistical measure for the computation of the statistical mean. For example, it was shown that the critical ratio, the forerunner of the modern t test, was the most important predictor of intuitive judgments about the sizes of groups of similar objects (Fouriezos, Rubenfeld, & Capstick, 2008). There is also some evidence of a tendency to ignore extremely small and large stimulus values in the calculation of the mean (de Gardelle & Summerfield, 2011). In most cases, the ability to average statistically is presumed rather than tested with a logical argument or a consistent experimental procedure. Since computing a statistical summary presumes abiding by the axioms of addition, it was, we believe, quite a fresh idea to test whether the associative law holds when the observer's task is to estimate the mean size. There was a very high risk that the associative law of addition would be falsified since, intuitively, it is very likely that the observer would notice the size of an outlier rather than the average size of the whole set of displayed elements. Two studies (Studies 3 and 4) provided, however, a very definite answer: observers' judgments were indeed based on the mean size, not on the sizes of any individual element in isolation. This may be seen as a paradox since, given enough time to inspect static images like the one shown in Fig. 1, most people would immediately notice that one circle is conspicuously smaller or larger than the rest. Nevertheless, with a relatively short exposure time and the task to estimate the mean size, the observer seems to be almost completely unaware of the size of the individual elements. In spite of this ignorance, the sizes of the individual elements are pooled together to give an impression of the mean size in an apparently compulsory manner.
There is evidence that compulsory pooling operates, for example, in the perception of orientation (Parkes et al., 2001), in selecting the landing position of saccades (Van der Stigchel, Heeman, & Nijboer, 2012), and in the perception of motion direction in random-dot patterns (Allik, 1992; Allik & Dzhafarov, 1984), but we are unaware of any previous direct proof of compulsory pooling in the perception of mean size. Since mean size discrimination performance depends critically on the total size difference, not on how this total difference is distributed among individual elements, it is also possible to conclude that compulsory pooling of size information happens before information about size reaches awareness. In sum, this was the first rigorous proof that the perceptual system can indeed compute a statistical summary, or at least some other statistic which satisfies the associative law of addition.


The ability to compute and perceive ensemble characteristics is usually seen as a remedy for an established bottleneck in focused attention and working memory, both of which are able to handle only a few objects. Once the precision argument is placed in doubt, there are very few arguments left in favor of a global parallel process extracting a statistical summary of the average size of the objects in the display. One of these arguments is the replicable observation that the precision of mean size judgment is nearly independent of the number of judged elements (Alvarez, 2011; Ariely, 2001, 2008; Chong & Treisman, 2003, 2005a, 2005b; Chong et al., 2008). Apparently by a misleading analogy with reaction-time data in visual search tasks, it was concluded that a flat response function speaks in favor of a practically unlimited perceptual capacity (Alvarez, 2011; Ariely, 2001, 2008; Chong & Treisman, 2005b; Chong et al., 2008). Since statistical aggregation predicts an improvement of mean size judgments with the number of judged elements, an almost flat response curve is in fact a firm sign of a decrease in precision relative to what averaging all elements would deliver. The results of this and previous studies (Myczek & Simons, 2008; Simons & Myczek, 2008) cannot sustain the optimism that the ability to compute statistical averages is an evolutionary adaptation for surpassing the capacity limitations of focused attention and working memory. In this study, the compulsory pooling of information was fully operational when the number of processed elements did not exceed the capacity of information processing: two or even four elements are clearly within the limits of both focused attention and short-term memory (Cowan, 2001; Miller, 1956). This seems to suggest that researchers who have advocated the perception of ensemble characteristics were partly wrong in assuming, tacitly at least, that the existence of this ability automatically implies an effortless and massively parallel processing capability which can surpass the focused attention bottleneck (Alvarez, 2011; Ariely, 2001). Among the general public and some researchers, it has become fashionable to argue that intuitive and automatic modes of cognition, the perception of ensemble characteristics being one of them, are superior to more deliberate and analytic ones (Chabris & Simons, 2010; Kahneman, 2011). However, just as inconsistent are those researchers who may have thought that the focused attention strategy is incompatible with the perception of ensemble characteristics (Myczek & Simons, 2008; Simons & Myczek, 2008). Perhaps one of the most remarkable conclusions of this study is the observation that focused attention and the perception of ensemble characteristics do not exclude each other. As we were able to demonstrate, the observer can concentrate on a small number of elements but nevertheless perceive them in a holistic manner, by extracting attributes which belong to a group of elements and not to any one of them in isolation.
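The prediction referred to here is the standard one for averaging independent noisy measurements: the standard deviation of the mean of N samples falls as 1/√N. A minimal check (our illustration; the 10-pixel noise value is arbitrary):

```python
import math

sigma = 10.0  # internal noise per element (pixels); arbitrary illustrative value
for n in (1, 2, 4, 8, 16):
    print(n, round(sigma / math.sqrt(n), 2))
# An ideal averager of all n elements should become steadily more precise;
# a nearly flat empirical precision curve therefore implies that the extra
# elements are not contributing to the judgment.
```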

8.2. What is new in the N&S theory?

All components of the proposed N&S theory are well known and widely used. For example, Myczek and Simons (2008) demonstrated that most existing data on mean size perception can be explained by sampling a subset of elements, each of which is represented accurately and free of any noise. Most other explanations use the idea of a noisy internal representation but make the tacit assumption that all available information is used in the judgments of the mean size (Alvarez, 2011; Ariely, 2001; Chong & Treisman, 2003; Fouriezos, Rubenfeld, & Capstick, 2008). Only in a very few studies have both principles, selection and noise, been applied simultaneously to the integration of local visual information into summary statistics (Dakin, 2001; Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011). These studies confirmed that not only noise but also undersampling limits the mean size or orientation judgments of multiple objects.


Evidently, the perception of the mean size cannot be a very demanding ability, since a quite primitive neural network is able to compute the mean size of a collection of geometric figures (Šetic, Svegar, & Domijan, 2007). These authors demonstrated that a two-dimensional neural network with three layers can compute something similar to the mean size. However, there is no hint of how this network would behave if, for example, the number of judged elements increased, or if size increments were distributed equally among all elements or allocated to only a few of them. The models that are more specific about the relevant aspects of mean size or orientation perception are usually inspired by techniques used in engineering and exploit the terminology of signal detection theory (Im & Halberda, 2012; Solomon, 2010; Solomon, Morgan, & Chubb, 2011). For example, the equivalent-noise approach is a way to measure the amount of internal noise by adding external noise to a stimulus and determining the point at which the subject's performance begins to deteriorate (Dakin, 2001; Dakin & Watt, 1997; Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011). Without any doubt, the model closest to the N&S model was proposed by Solomon, Morgan, and Chubb (2011), and it was specifically tailored to the equivalent-noise approach. The similarity between these two models is not surprising because so far none of the proposed explanations has managed to escape the fundamental Thurstonian postulate that visual attributes cannot be registered without measurement errors whose distributions are sufficiently close to the Gaussian function. Both models also use the idea of statistical efficiency, meaning that only a fraction of the total available sample of elements is used for the judgment. In detail, however, there are several relevant differences between the N&S model and the explanation proposed by the equivalent-noise approach (Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011). These models are applicable to situations where external noise is added and systematically varied. Their application is substantially limited if there is no external noise (e.g., there is only one test element, or several test elements of equal size). For instance, in the absence of external noise it is impossible to separate early and late noise in Solomon's model, since in the formula they are represented as two almost identical addends into which a sum can be arbitrarily split. This model is also based on the unproven assumption that neighboring elements are pooled together in an effectively noise-free way, which is a retreat from the basic Thurstonian postulate. However, the most distinctive feature of this and other models inspired by signal detection theory is that they are formulated in terms of macroscopic variables (the means and variances of the distributions from which circle diameters are supposedly drawn). In this respect they resemble the purely phenomenological theory of gases, which was devised by Gay-Lussac and his contemporaries to describe the relationships between macroscopically observed variables (e.g., pressure and volume are inversely proportional). That theory was soon replaced by the kinetic theory of gases, which describes a gas by the random motion of a large number of small particles (atoms or molecules) that collide with each other and with the walls of the container. These collisions explain the macroscopic properties of gases, such as pressure, temperature, and volume.
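For orientation, the equivalent-noise approach mentioned above is usually summarized by a relation of roughly the following form (our paraphrase of the standard formulation; the symbols are generic and do not follow this paper's notation):

σ²(observed) ≈ (σ²(internal) + σ²(external)) / N(effective),

where the observed discrimination threshold (in variance units) is determined jointly by the internal noise, the externally added noise, and the effective number of pooled samples; the internal noise is estimated from the point at which increasing the external noise starts to raise thresholds.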
At variance with previous explanations, the proposed N&S model describes every single stimulus element and postulates simple rules for how the measures of these elements are transformed into the judgment. No commitments whatsoever were made about the distribution from which the circle diameters were drawn. As a result, only two variables were needed to describe all the data on how the mean size is judged by the human observer. These two variables, representing two very simple principles, can easily be implemented in a mathematical formula, or in just a few lines of programming code which simulate the observer's judgments.
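To make the preceding sentence concrete, here is one such sketch (our code, not the authors' original implementation; it assumes Gaussian internal noise with standard deviation ς and random selection of at most K of the displayed elements, and the function and parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def judge_larger(diams, ref, k, sigma):
    """One simulated trial of an N&S-style observer: select at most k of the
    presented diameters, perturb each by Gaussian internal noise (sd = sigma),
    and report whether their mean appears larger than the reference."""
    diams = np.asarray(diams, dtype=float)
    sample = rng.choice(diams, size=min(k, diams.size), replace=False)
    noisy = sample + rng.normal(0.0, sigma, size=sample.size)
    return noisy.mean() > ref

# Simulated psychometric function for four equal circles, K = 3, sigma = 9 px
for delta in (-12, -6, -3, 0, 3, 6, 12):
    responses = [judge_larger([150 + delta] * 4, ref=150.0, k=3, sigma=9.0)
                 for _ in range(5000)]
    print(delta, sum(responses) / len(responses))
```

Fitting then amounts to finding the K and ς values that make such simulated choice probabilities match the empirical ones.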


In addition to transparency, these two familiar principles have very clear psychological content. The first of them, which forms the basis of Thurstonian psychophysical analyses, states that there is always an irreducible amount of internal noise, due to which exactly the same physical size is transformed into a set of psychological states corresponding to slightly different perceived sizes. The second principle states that the capacity to process relevant information is limited to a certain number of elements which can be taken into account in the judgment. An obvious consequence of the second principle is that, if the number of elements exceeds the capacity limitation, then there must be a considerable number of elements whose existence is simply ignored in stimulus processing. Of course, the idea that the observer's ability to receive, process, and remember information is severely limited was one of the most basic discoveries in psychology long before George Miller published his magical 7 ± 2 (cf. Jevons, 1871). What is perhaps less common in our proposal is that these two principles, internal noise and capacity limitation, need to be applied simultaneously (see also Im & Halberda, 2012; Solomon, Morgan, & Chubb, 2011). A sufficiently general theory of mean size perception cannot be built on only one of these two basic principles. Unfortunately, neither internal noise nor capacity limitation can be treated as some sort of psychological constant. Unlike physical constants such as the elementary electric charge, the minimal internal noise is not identical for all observers and can vary depending on the stimulus value. Even if the internal noise and the maximal number of processed elements are not constants, there should nevertheless be some basic rules that apply to them. For example, this study added further evidence in support of the well-established rule that internal noise is proportional to the estimated size, irrespective of whether this is the size of a single object or the mean size of multiple objects. Similarly, we could expect analogous rules to relate internal noise and capacity limitation either to some other stimulus properties or to the observer's psychological state (e.g., fatigue). It was certainly a significant observation that internal noise depended on the distribution of values along the internal number scale: internal noise was smaller when the values were grouped around the same mean value than when there was a considerable disparity between them (Study 3). Although there is not enough material for substantial speculation, it looks as if it is indeed costly to divide attention between two disparate regions of the internal number line. Another way to express the same idea is to say that the observer is sensitive not only to individual sizes but also to higher-order moments (e.g., the variance) of the distribution of sizes (cf. Solomon, Morgan, & Chubb, 2011). However, the number of such established rules is very limited, to say the least. Since we know very little indeed about the factors which determine, or at least restrict, internal noise and capacity limits, we cannot pretend that the proposed theory is genuinely universal. One of the largest theoretical challenges is the coupling of these two principles, internal noise and capacity limitation. Although, in theory, they are clearly distinct concepts, the effects they exert on the slope of the psychometric discrimination function are very similar.
For instance, it is very easy to see why the slope of the psychometric discrimination function becomes flatter with an increase in internal noise. Similarly, it is understandable why discrimination functions become flatter when the observer starts to omit elements which are available but which, due to the capacity limitation, cannot be used for the judgment. Both factors lead to a loss in the amount of visual information used, which is reflected in the flattening of the psychometric discrimination function. Mathematically at least, these two causes produce changes in the slope of the psychometric function which are formally inseparable (Raidvee et al., 2011). This leads us to a kind of uncertainty principle: although it is indisputable that the two factors, internal noise and capacity limitation, affect mean size judgment simultaneously, there seems to be a limitation on the precision with which the values of these two factors, K and ς, can be simultaneously determined.
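The formal inseparability is easy to reproduce in simulation: an observer with little noise but a severe sampling limit and an observer with more noise but no sampling limit can produce practically the same threshold. A self-contained sketch (our code and parameter values, chosen only to illustrate the trade-off):

```python
import numpy as np

rng = np.random.default_rng(2)

def threshold_84(k, sigma, n_elements=4, ref=150.0, n_trials=5000):
    """Mean-size difference (pixels) needed for ~84% 'larger' responses when
    an observer averages k noisy measurements (sd = sigma) of equal circles."""
    def p_larger(delta):
        hits = 0
        for _ in range(n_trials):
            sample = rng.normal(ref + delta, sigma, size=min(k, n_elements))
            hits += sample.mean() > ref
        return hits / n_trials
    for delta in np.arange(0.5, 30.0, 0.5):   # coarse search over differences
        if p_larger(delta) >= 0.841:
            return delta
    return float("nan")

print(threshold_84(k=4, sigma=12.0))  # large noise, all four elements used
print(threshold_84(k=1, sigma=6.0))   # half the noise, but only one element used
# Both parameter pairs yield a threshold of roughly 6 pixels, which is why a
# single psychometric slope cannot pin down K and the noise independently.
```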

9. Conclusions

It is surprising that observers ignore, in some conditions at least, the size of individual elements and that their judgments correspond only to the mean size of a subsample of objects. This is the strongest proof so far that the observer computes one of the ensemble characteristics, the mean size, sufficiently close to the statistical summary. By testing the associative law of addition, this study provided rigorous proof for the perception of the mean size, which has, thus far, been more often presumed than fastidiously demonstrated. There was no need to propose a new mechanism for average size perception. Two familiar psychophysical principles, that the size of an element cannot be measured with absolute accuracy and that only a limited number of elements can be taken into account in the computation of the average size, were enough to formulate a formal model, the N&S model, which accurately predicts all the empirical mean size discrimination functions.

Acknowledgments

This research was supported by grants from the Estonian Ministry of Science and Education (SF0180029s08 and IUT02-13) and the Estonian Science Foundation (#ETF8231). We are grateful to Delaney Skerrett for comments and to Sang Chul Chong for very helpful suggestions.

References

Allik, J. (1992). Competing motion paths in sequence of random dot patterns. Vision Research, 32(1), 157–165.
Allik, J., & Dzhafarov, E. N. (1984). Motion direction identification in random cinematograms: A general model. Journal of Experimental Psychology: Human Perception and Performance, 10(3), 378–393. http://dx.doi.org/10.1037//0096-1523.10.3.378.
Alvarez, G. A. (2011). Representing multiple objects as an ensemble enhances visual cognition. Trends in Cognitive Sciences, 15(3), 122–131. http://dx.doi.org/10.1016/j.tics.2011.01.003.
Alvarez, G. A., & Oliva, A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19(4), 392–398. http://dx.doi.org/10.1111/j.1467-9280.2008.02098.x.
Alvarez, G. A., & Oliva, A. (2009). Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proceedings of the National Academy of Sciences of the United States of America, 106(18), 7345–7350. http://dx.doi.org/10.1073/pnas.0808981106.
Anderson, N. H. (1967). Application of weighted average model to a psychophysical averaging task. Psychonomic Science, 8, 227–228.
Ariely, D. (2001). Seeing sets: Representation by statistical properties. Psychological Science, 12(2), 157–162. http://dx.doi.org/10.1111/1467-9280.00327.
Ariely, D. (2008). Better than average? When can we say that subsampling of items is better than statistical summary representations? Perception & Psychophysics, 70(7), 1325–1326. http://dx.doi.org/10.3758/pp.70.7.1325.
Bauer, B. (2009). Does Stevens's power law for brightness extend to perceptual brightness averaging? Psychological Record, 59(2), 171–185.
Burbeck, C. A., & Hadden, S. (1993). Scaled position integration areas: Accounting for Weber law for separation. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 10(1), 5–15. http://dx.doi.org/10.1364/josaa.10.000005.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 214(4516), 93–94.
Chabris, C. F., & Simons, D. J. (2010). The invisible gorilla: And other ways our intuitions deceive us. New York: Crown.
Chong, S. C., Joo, S. J., Emmanouil, T. A., & Treisman, A. (2008). Statistical processing: Not so implausible after all. Perception & Psychophysics, 70(7), 1327–1334. http://dx.doi.org/10.3758/pp.70.7.1327.
Chong, S. C., & Treisman, A. (2003). Representation of statistical properties. Vision Research, 43(4), 393–404. http://dx.doi.org/10.1016/S0042-6989(02)00596-5.
Chong, S. C., & Treisman, A. (2005a). Attentional spread in the statistical processing of visual displays. Perception & Psychophysics, 67(1), 1–13. http://dx.doi.org/10.3758/BF03195009.
Chong, S. C., & Treisman, A. (2005b). Statistical processing: Computing the average size in perceptual groups. Vision Research, 45(7), 891–900. http://dx.doi.org/10.1016/j.visres.2004.10.004.

Corbett, J. E., & Oriet, C. (2011). The whole is indeed more than the sum of its parts: Perceptual averaging in the absence of individual item representation. Acta Psychologica, 138(2), 289–301. http://dx.doi.org/10.1016/j.actpsy.2011.08.002.
Corbett, J. E., Wurnitsch, N., Schwartz, A., & Whitney, D. (2012). An aftereffect of adaptation to mean size. Visual Cognition, 20(2), 211–231. http://dx.doi.org/10.1080/13506285.2012.657261.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114. http://dx.doi.org/10.1017/S0140525X01003922.
Dakin, S. C. (2001). Information limit on the spatial integration of local orientation signals. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 18(5), 1016–1026.
Dakin, S. C., & Watt, R. J. (1997). The computation of orientation statistics from visual texture. Vision Research, 37(22), 3181–3192.
De Fockert, J. W., & Marchant, A. P. (2008). Attention modulates set representation by statistical properties. Perception & Psychophysics, 70(5), 789–794. http://dx.doi.org/10.3758/pp.70.5.789.
de Gardelle, V., & Summerfield, C. (2011). Robust averaging during perceptual judgment. Proceedings of the National Academy of Sciences of the United States of America, 108(32), 13341–13346. http://dx.doi.org/10.1073/pnas.1104517108.
Dehaene, S. (2011). The number sense: How the mind creates mathematics (revised and updated ed.). New York: Oxford University Press.
Dehaene, S., Dehaene-Lambertz, G., & Cohen, L. (1998). Abstract representations of numbers in the animal and human brain. Trends in Neurosciences, 21(8), 355–361.
Dzhafarov, E. N., & Colonius, H. (2011). The Fechnerian idea. American Journal of Psychology, 124, 127–140.
Fouriezos, G., Rubenfeld, S., & Capstick, G. (2008). Visual statistical decisions. Perception & Psychophysics, 70(3), 456–464. http://dx.doi.org/10.3758/pp.70.3.456.
Galton, F. (1907). One vote, one value. Nature, 75, 414.
Haberman, J., & Whitney, D. (2007). Rapid extraction of mean emotion and gender from sets of faces. Current Biology, 17(17), R751–R753.
Haberman, J., & Whitney, D. (2009). Seeing the mean: Ensemble coding for sets of faces. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 718–734. http://dx.doi.org/10.1037/a0013899.
Haberman, J., & Whitney, D. (2011). Efficient summary statistical representation when change localization fails. Psychonomic Bulletin & Review, 18(5), 855–859. http://dx.doi.org/10.3758/s13423-011-0125-6.
Im, H. Y., & Halberda, J. (2012). The effects of sampling and internal noise on the representation of ensemble average size. Attention, Perception, & Psychophysics, 1–9. http://dx.doi.org/10.3758/s13414-012-0399-4.
Jacoby, O., Kamke, M. R., & Mattingley, J. B. (2012). Is the whole really more than the sum of its parts? Estimates of average size and orientation are susceptible to object substitution masking. Journal of Experimental Psychology: Human Perception and Performance. http://dx.doi.org/10.1037/a0028762.
Jevons, W. S. (1871). The power of numerical discrimination. Nature, 3, 281–282.
Joo, S. J., Shin, K., Chong, S. C., & Blake, R. (2009). On the nature of the stimulus information necessary for estimating mean size of visual arrays. Journal of Vision, 9(9), 7. http://dx.doi.org/10.1167/9.9.7.
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.
Le Bon, G. (1896). The crowd: A study of the popular mind. London: T.F. Unwin.
Levi, D. M. (2008). Crowding – An essential bottleneck for object recognition: A mini-review. Vision Research, 48(5), 635–654. http://dx.doi.org/10.1016/j.visres.2007.12.009.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Miller, A. L., & Sheldon, R. (1969). Magnitude estimation of average length and average inclination. Journal of Experimental Psychology, 81, 16–21.
Morgan, M. J., Hole, G. J., & Glennerster, A. (1990). Biases and sensitivities in geometrical illusions. Vision Research, 30(11), 1793–1810. http://dx.doi.org/10.1016/0042-6989(90)90160-m.
Morgan, M. J., Mareschal, I., Chubb, C., & Solomon, J. A. (2012). Perceived pattern regularity computed as a summary statistic: Implications for camouflage. Proceedings of the Royal Society B: Biological Sciences, 279(1739), 2754–2760. http://dx.doi.org/10.1098/rspb.2011.2645.


Myczek, K., & Simons, D. J. (2008). Better than average: Alternatives to statistical summary representations for rapid judgments of average size. Perception & Psychophysics, 70(5), 772–788. http://dx.doi.org/10.3758/pp.70.5.772.
Nachmias, J. (2006). The role of virtual standards in visual discrimination. Vision Research, 46(15), 2456–2464. http://dx.doi.org/10.1016/j.visres.2006.01.029.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4(7), 739–744. http://dx.doi.org/10.1038/89532.
Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4(12), 1136–1169. http://dx.doi.org/10.1167/4.12.12.
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68(1), 29–46.
Raidvee, A., Averin, K., & Allik, J. (2012). Visibility versus accountability in pooling local motion signals into global motion direction. Attention, Perception, & Psychophysics, 74, 1252–1259. http://dx.doi.org/10.3758/s13414-012-0314-z.
Raidvee, A., Averin, K., Kreegipuu, K., & Allik, J. (2011). Pooling elementary motion signals into perception of global motion direction. Vision Research, 51(17), 1949–1957.
Robitaille, N., & Harris, I. M. (2011). When more is less: Extraction of summary statistics benefits from larger sets. Journal of Vision, 11(12), 18, 1–8.
Rose, A. (1948). The sensitivity performance of the human eye on an absolute scale. Journal of the Optical Society of America, 38, 196–208.
Šetic, M., Svegar, D., & Domijan, D. (2007). Modelling the statistical processing of visual information. Neurocomputing, 70(10–12), 1808–1812. http://dx.doi.org/10.1016/j.neucom.2006.10.069.
Simons, D. J. (2000). Attentional capture and inattentional blindness. Trends in Cognitive Sciences, 4(4), 147–155.
Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28(9), 1059–1074. http://dx.doi.org/10.1068/p2952.
Simons, D. J., & Myczek, K. (2008). Average size perception and the allure of a new mechanism. Perception & Psychophysics, 70(7), 1335–1336. http://dx.doi.org/10.3758/pp.70.7.1335.
Simons, D. J., & Rensink, R. A. (2005). Change blindness: Past, present, and future. Trends in Cognitive Sciences, 9(1), 16–20. http://dx.doi.org/10.1016/j.tics.2004.11.006.
Smith, A. R., & Price, P. C. (2010). Sample size bias in the estimation of means. Psychonomic Bulletin & Review, 17(4), 499–503. http://dx.doi.org/10.3758/pbr.17.4.499.
Solomon, J. A. (2010). Visual discrimination of orientation statistics in crowded and uncrowded arrays. Journal of Vision, 10(14), 19. http://dx.doi.org/10.1167/10.14.19.
Solomon, J. A., Morgan, M., & Chubb, C. (2011). Efficiencies for the statistics of size discrimination. Journal of Vision, 11(12), 13, 1–11. http://dx.doi.org/10.1167/11.12.13.
Spencer, J. (1961). Estimating averages. Ergonomics, 4, 317–328.
Spencer, J. (1963). A further study of estimating averages. Ergonomics, 6, 255–265.
Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. New York: Doubleday.
Teghtsoonian, M. (1965). The judgment of size. The American Journal of Psychology, 78(3), 392–402. http://dx.doi.org/10.2307/1420573.
Thurstone, L. L. (1927a). A law of comparative judgments. Psychological Review, 34, 273–286.
Thurstone, L. L. (1927b). Psychophysical analysis. American Journal of Psychology, 38, 368–389.
Titchener, E. B. (1912). The schema of introspection. American Journal of Psychology, 23, 485–508.
Van der Stigchel, S., Heeman, J., & Nijboer, T. C. W. (2012). Averaging is not everything: The saccade global effect weakens with increasing stimulus size. Vision Research, 62, 108–115. http://dx.doi.org/10.1016/j.visres.2012.04.003.
Whiting, B. F., & Oriet, C. (2011). Rapid averaging? Not so fast! Psychonomic Bulletin & Review, 18(3), 484–489. http://dx.doi.org/10.3758/s13423-011-0071-3.
Wolfe, H. K. (1923). On the estimation of the middle of lines. American Journal of Psychology, 34, 313–358.