Acta Psychologica 142 (2013) 370–382
Sample representativeness affects whether judgments are influenced by base rate or sample size

Natalie A. Obrecht a,⁎, Dana L. Chesney b
a Department of Psychology, William Paterson University, 300 Pompton Road, Wayne, NJ 07470, USA
b Department of Psychology, University of Notre Dame, 118 Haggar Hall, Notre Dame, IN 46556, USA
Article history: Received 2 August 2012; Received in revised form 14 January 2013; Accepted 18 January 2013; Available online 17 February 2013
PsycINFO classification: 2240 Statistics & Mathematics; 2300 Human Experimental Psychology; 2340 Cognitive Processes
Abstract
We investigated how people use base rates and sample size information when combining data to make overall probability judgments. Participants considered two samples from an animal population in order to estimate the probability of that animal being aggressive. Participants' judgments were influenced by subpopulation base rates when they were provided and linked to specific samples. When samples were not identified as coming from different subpopulations, judgments typically reflected sample size information. We conclude that 1) People can use base rates when combining samples to make an inference; 2) People can correctly use sampling information to determine when to use base rates, and 3) People are able to consider base rate and sample size information at the same time. Additionally, we found that individuals' numeracy correlates with the extent to which base rate and sample size information is used.
© 2013 Elsevier B.V. All rights reserved.
Keywords: Base rates; Judgment; Reasoning; Inductive reasoning; Mathematical cognition; Individual differences
1. Introduction

1.1. Information integration

When making judgments, people today have access to information from many different sources. If, for example, someone is interested in determining how likely a new restaurant is to have good food, she can visit various websites to see customer reviews. One complication of this strategy is that different sources may provide information about different subgroups within a population. For example, one website might show that only 20% of 520 people like a restaurant, while on another site 80% of 65 people report liking that restaurant. As larger samples typically provide more reliable information (Bernoulli, 1713/2005), it is statistically normative to weight percentage means by their sample sizes when combining data, in this case giving the 20% statistic more weight than the 80%. However, such a practice may not necessarily yield the best estimate of the general population (e.g. the chance that an individual will like a restaurant) when samples might represent
different subgroups (see Chesney & Obrecht, 2011, 2012). The ideal of weighting data by sample size assumes that samples have been randomly drawn from the same population. When this assumption holds true, then indeed, larger samples provide more reliable estimates and should be given more weight. If instead samples have been drawn from different subpopulations (e.g. men vs. women), a better estimate of the general population mean will be obtained by weighting sample means in proportion to their subpopulations' prevalence in that general population, i.e., by those subpopulations' base rates. Here we argue that when samples represent different subpopulations, it is normative to ignore sample size and instead weight data according to their base rates. In the example above, it would be unlikely that random samples from a population would yield diverse means of 20% and 80%, especially given how large each sample is. One might suspect that the two websites cater to different groups of people within the general population. If the base rates of the groups are unknown, it would be reasonable to average the two percentages and estimate a 50% chance that a person would like the restaurant. In contrast, if one knows that the first website with the poorer rating caters to the 5% of Americans who are accustomed to fine dining, while the second website represents the 95% of Americans who are not, one should weight the percentage data accordingly, ignoring sample size. This would result in an estimated 77% chance ([20% ∗ 5%] + [80%∗ 95%]) that an American would like the restaurant. If the data were instead weighted by sample
size, the opinions of fine diners would be overweighted relative to their numbers in the general population, yielding a very different estimate of 27% (20% [520/585] + 80% [65/585]). Indeed, if a person is a typical restaurant customer, rather than a fine diner, she could ignore the sample data from the fine diners altogether. However, in the current study we focus on the general case in which a judgment is made about an individual in a population whose subgroup membership is unknown.
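To make the arithmetic of this example concrete, the short sketch below (in Python, with illustrative variable names of our own choosing) computes the three candidate estimates discussed above; it is an illustration of the weighting schemes, not code from the original study.

```python
# Two review sites report the share of visitors who liked a restaurant.
# Site 1 caters to fine diners (5% of Americans), site 2 to everyone else (95%).
p1, n1, br1 = 0.20, 520, 0.05   # sample mean, sample size, subgroup base rate
p2, n2, br2 = 0.80, 65, 0.95

unweighted = (p1 + p2) / 2                                  # 0.50: base rates unknown
n_weighted = p1 * n1 / (n1 + n2) + p2 * n2 / (n1 + n2)      # ~0.27: same-population assumption
br_weighted = p1 * br1 + p2 * br2                           # 0.77: weight by subgroup prevalence

print(f"unweighted: {unweighted:.0%}, sample-size weighted: {n_weighted:.0%}, "
      f"base-rate weighted: {br_weighted:.0%}")
# unweighted: 50%, sample-size weighted: 27%, base-rate weighted: 77%
```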
1.2. Base rate use

Previous results are mixed regarding whether laypeople integrate base rates into their inferences. Base rate neglect has been demonstrated in the Bayesian reasoning literature in which people answer conditional probability problems (e.g. Bar-Hillel, 1980; Kahneman & Tversky, 1972). For example (Gigerenzer & Hoffrage, 1995): "The probability of breast cancer is 1% for women at age forty who participate in routine screening. If a woman has breast cancer, the probability is 80% that she will get a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also get a positive mammography. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? ___%" A correct response would integrate the hit rate (80%) with the base rates (1% have cancer; 99% don't have cancer) and false alarm rate (9.6%): (cancer base rate ∗ hit rate) / (cancer base rate ∗ hit rate + no-cancer base rate ∗ false alarm rate) = 7.8%. However, in such problems many participants focus their responses on the hit rate, and estimate, for example, an 80% chance that the woman has cancer (Bar-Hillel, 1980; Eddy, 1982; Gigerenzer & Hoffrage, 1995). Bayesian reasoning is improved, however, when data are presented as unstandardized natural frequencies that express probability in terms of subsets within a greater superset (Brase, 2008; Gigerenzer & Hoffrage, 1995; Obrecht, Anderson, Schulkin, & Chapman, 2012). For example, analogous to the problem above: "10 out of every 1,000 women at age forty who participate in routine screening have breast cancer. 8 of every 10 women with breast cancer will get a positive mammography. 95 out of every 990 women without breast cancer will also get a positive mammography. Here is a new representative sample of women at age forty who got a positive mammography in routine screening. How many of these women do you expect to actually have breast cancer?" (Gigerenzer & Hoffrage, 1995). Such natural frequency formats simplify the conditional probability computation and make set–subset relationships clear, which results in greater use of base rates (Evans, Handley, Perham, Over, & Thompson, 2000; Fiedler, Brinkmann, Betsch, & Wild, 2000; Girotto & Gonzalez, 2001; Macchi, 2000; Neace, Michaud, Bolling, Deer, & Zecevic, 2008; Yamagishi, 2003).
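As a concrete check on the two problem formats quoted above, the sketch below (Python; not from the original article, using only the numbers quoted from Gigerenzer and Hoffrage, 1995) computes the same posterior from the probability format and from the natural frequency format.

```python
# Probability format: base rate, hit rate, and false alarm rate.
base_rate, hit_rate, false_alarm = 0.01, 0.80, 0.096
posterior = (base_rate * hit_rate) / (
    base_rate * hit_rate + (1 - base_rate) * false_alarm)
print(f"P(cancer | positive) = {posterior:.3f}")   # 0.078, i.e. 7.8%

# Natural frequency format: 8 true positives out of 8 + 95 women who test positive.
print(f"8 / (8 + 95) = {8 / (8 + 95):.3f}")        # 0.078 -- same answer, simpler arithmetic
```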
In another variation of base rate studies, participants are given base rate information about a group made up of two kinds of individuals (e.g. there are 70 engineers and 30 lawyers) and also a personality description of an individual that they are told was randomly drawn from the group. As an example, a personality description might state "Tom W. is of high intelligence, is quite self-confident, and tends to be argumentative…". The participants' task is to judge the profession of the individual (e.g. engineer or lawyer). The personality description is designed to be stereotypical of one of the subgroups. Here Tom's description may sound typical of a lawyer, but the base rate data suggest that lawyers are less common in the group (out of 100 men, 70 are engineers and 30 are lawyers). The classic outcome is that personality descriptions trump base rates; most people say that the individual belongs to the group that matches the personality, even when that group has the lower base rate (Tversky & Kahneman, 1974). Although this classic study casts doubt on laypeople's use of base rates, other findings are more optimistic about humans' abilities to use base rates when making judgments. The order in which base rates are presented, relative to individual personality descriptions, like Tom's above, has been shown to influence judgments (Krosnick, Li, & Lehman, 1990). When base rates are presented after, rather than before, individual descriptions, judgments more closely reflect the base rate information (Krosnick et al., 1990; Obrecht, Chapman, & Gelman, 2009). Also, base rate use increases when sampling procedures are shown to be random (Gigerenzer, Hell, & Blank, 1988), and under a variety of other conditions (e.g. Bar-Hillel & Fischhoff, 1981; Schwarz, Strack, Hilton, & Naderer, 1991). Further, relating causal mechanisms to base rates in Bayesian problems increases base rate use (Tversky & Kahneman, 1980; also see Krynski & Tenenbaum, 2007).

A recent study by Pennycook and Thompson (2012) further tempers the conclusions which can be drawn from Tversky and Kahneman's (1974) seminal base rate neglect finding that stereotypical personality descriptions trump base rates when judging group membership. Pennycook and Thompson found that when participants were given just personality descriptions, without any base rate information, and were asked to judge the chances that a person belonged to a group (e.g. lawyer), responses were quite variable, typically falling between 50 and 100%. However, when base rates were also given, and were congruent with personality data (i.e. when personality descriptions favored the large base rate group), participants gave highly consistent probability estimates regarding the chances of a person belonging to the larger base rate group (i.e. most responses were between 90 and 100%). This decrease in the variability of responses indicates that when data are consistent with one another, people can integrate both base rates and personality descriptions together to make judgments. Moreover, as these patterns were seen even when participants were under time pressure, this suggests that "reasoning with base rates is…relatively effortless" (Pennycook & Thompson, 2012). However, the response pattern seen when personality and base rate information were consistent sharply contrasts with the bimodal responses that were seen when base rates and personality descriptions conflicted; here some responses appeared to reflect the base rate data, and others the personality information (Pennycook & Thompson, 2012). Overall, the literature on both the Bayesian and personality problems demonstrates that people use base rates under some conditions, such as when problems are simplified (e.g. Gigerenzer & Hoffrage, 1995), although this factor is sometimes underweighted relative to normative standards (see Kahneman & Tversky, 1996).

1.3. Sample size

Another factor that is of interest to our current study is sample size. As with base rates, the literature is somewhat mixed in regard to people's ability to use sample size when making judgments. When people use a sample to make an inference to a population, they are generally more confident in their judgments when the provided sample is comprised of a larger, rather than smaller, number of items (e.g. Irwin, Smith, & Mayfield, 1956; Jacobs & Narloch, 2001; Nisbett, Krantz, Jepson, & Kunda, 1983).
For example, when asked to make an inference about the average value of a deck of cards, people are more confident in their judgment after viewing a sample of 20, rather than 10, cards (Irwin et al., 1956). This appreciation for larger samples is also seen in tasks where people compare data from two populations in order to judge which has the higher mean value (Fouriezos, Rubenfeld, & Capstick, 2008; Masnick & Morris, 2008; Obrecht, Chapman, & Gelman, 2007; Obrecht, Chapman, & Suárez, 2010). In contrast, some studies have shown that the weight given to sample size is overly small or inconsistent (Bar-Hillel, 1979; Obrecht et al., 2007), and other research has shown sample size to be almost entirely neglected or
used incorrectly (e.g. Kahneman & Tversky, 1972; but see Evans & Dusoir, 1977 and Sedlmeier, 1998, for evidence that this misuse of sample size results from task difficulty).

1.4. Base rate or sample size?

When people are asked to make an inference based on data from multiple samples, Obrecht et al. (2009) demonstrated that they tend to average means in accordance with an unweighted averaging model, rather than weighting samples by their respective sample sizes. In Section 1.1's opening example, this would result in an estimated 50% chance of an individual liking the restaurant, which is the simple average of 80% and 20%, rather than the weighted average of 27% that incorporates sample size. Although this practice at first seems to reflect a non-normative use of sample size, as discussed in Section 1.1, averaging across samples actually makes sense when samples are thought to represent different subgroups within a population whose base rates are unknown (Chesney & Obrecht, 2011, 2012). Following this line of investigation, Chesney and Obrecht (2012) found that people are less likely to weight data by sample size when samples are implied to have come from different subgroups within a population. The authors manipulated the likelihood of samples representing the same population both verbally and numerically. In the verbal manipulation, multiple samples of an animal kind (e.g. leopards) were stated to have been drawn either from the same location, implying that they likely represent the same population, or from different countries, indicating that they might represent different subtypes within the animal population (e.g. different kinds of leopards). Participants were to combine the data from the different samples to make an overall judgment about the animal (e.g. the percent chance of a leopard having round markings). Judgments were more consistent with an unweighted (i.e., unknown base rate) averaging model that ignored sample size when samples came from different locations. Similarly, when numerical information implied samples were statistically less likely to have been randomly drawn from the same population (i.e. when sample means were very different from one another), participants again tended to make inferences that were more consistent with an unweighted average. These results are consistent with other data suggesting that people are sensitive to sampling processes (Gigerenzer et al., 1988; Kushnir, Xu, & Wellman, 2010) and that sample representativeness affects inferences (Rhodes, Brickman, & Gelman, 2008).

There is evidence to suggest that a statistical mechanism that allows individuals to extract and make use of base rate information is present in humans from early childhood. Eight-month-old infants show surprise when an experimenter who is ostensibly "not looking" consistently draws a low base rate item from a heterogeneous set (Xu & Denison, 2009). However, the infants were not surprised by this feat when the experimenter was able to visually target specific items. The children appear aware that sample rate should match population rate in the absence of a causal mechanism justifying a different result. Indeed, Krynski and Tenenbaum (2007) argue for a normative standard of inference that accounts not only for traditional statistical data, but also for causal models: they show that people use base rates when they fit into a causal structure.
It may be that the presence of such a causal structure allows people to utilize the same basic, base rate sensitive, statistical mechanism which guided the infants' performance in Xu and Denison's (2009) study. In a similar vein, here we contend that base rate and sample size data should not be used uniformly in all situations, but should instead be integrated into judgments when they are meaningful, as determined by the representativeness of the sample data. Although people temper their use of sample size according to whether data appear to be randomly selected, it is unknown whether individuals will selectively use base rates to weight and combine sample data drawn from different subgroups in a population. As noted in Section 1.1 above, it may be correct to give equal weight to
samples from different subpopulations when base rates are unknown. However, when base rates are given and relevant, they should be incorporated into judgments. Chesney and Obrecht's (2012) findings do not distinguish whether their participants were attempting to use base rates (with 50% being the "best guess") in their judgments, or merely disregarding sample size information when sampling was questionable. With the current study, we further investigate this topic, examining two main questions. First, do people utilize base rate information when combining samples to make an inference? Second, if base rate data are used, are they integrated into judgments only when they are relevant to the inference? Further, participants higher in numerical ability (Chapman & Liu, 2009) and with higher levels of education (Bramwell, West, & Salmon, 2006) have previously demonstrated superior Bayesian reasoning performance, and more numerate individuals show greater use of numeric information when integrating sample data (Chesney & Obrecht, 2012). Thus, we also investigated whether individual differences in numeracy correlate with use of base rates and sample size when integrating information.

2. Method

2.1. Participants

Undergraduates at the University of Notre Dame (n = 138) and William Paterson University (n = 49) participated online for course credit. One participant from William Paterson was dropped from the sample for failure to complete the task, leaving a final N of 186.

2.2. Design

Participants read 64 scenarios about aggressive behaviors observed in different kinds of fictitious animals and were asked to make inferences about aggression rates in these animal populations. In each scenario, participants were told there were two subgroups within an animal population. Then participants were given statements from two safari guides, each reporting how many animals they had observed, and the percentage of those animals that appeared aggressive. The 64 different scenarios resulted from crossing six dichotomous within-subjects independent variables, outlined in Table 1 and detailed in Sections 2.2.1–2.2.6. When all factors were manipulated to have their respective first level (see Table 1), the scenario read, "In the nusul population, 10% are green nusuls and 90% are yellow nusuls. One safari guide tells you that of the 5 green nusuls he has seen, 20% displayed aggressive behaviors. Another safari guide says that of the 40 yellow nusuls he has seen, 80% displayed aggressive behaviors. How likely do you think it is that the next nusul a safari guide sees will display aggressive behaviors?" In the opposite case in which all factors were manipulated to have their second level (see Table 1), the scenario read, "In the lofak population, some are wide eyed lofaks and some are narrow eyed lofaks. One safari guide tells you that of the 65 lofaks he has seen, 60% displayed aggressive behaviors. Another safari guide says that of the 520 lofaks he has seen, 40% displayed aggressive behaviors. How likely do you think it is that the next lofak a safari guide sees will display aggressive behaviors?"

2.2.1. Base rate status

All scenarios indicated there were two subgroups within a population. The base rates of these subpopulations were provided in half the trials (base rate status: given vs. not given). When base rates were given, one subgroup comprised 10% of the animal kind's population, and the other subgroup made up 90% (e.g. "In the nusul population, 10% are green nusuls and 90% are yellow nusuls.").
Otherwise participants were told only that "some" of the animals were in each subgroup (e.g. "some are wide eyed lofaks…").
Table 1
Example manipulations of within-subject factors. The two levels of each independent variable are shown with an example of the manipulated scenario text.

Factor | First level with example | Second level with example
Base rate status | Given: "In the nusul population, 10% are green nusuls and 90% are yellow nusuls" | Not given: "In the lofak population, some are wide eyed lofaks and some are narrow eyed lofaks."
Subpopulation status | Given: "of the 5 green nusuls he has seen, 20% displayed aggressive behaviors" | Not given: "of the 5 deegeos he has seen, 20% displayed aggressive behaviors"
Sample size magnitude | Small: "of the 5 green nusuls… of the 40 yellow nusuls…" | Large: "of the 65 mottled vorgs… of the 520 mottled vorgs…"
Percent spread | Wide: "20% displayed aggressive behaviors… 80% displayed…" | Narrow: "60% displayed aggressive behaviors… 40% displayed…"
Sample size weighted mean parity | N-weighted mean > 50%: "of the 5 green nusuls he has seen, 20%… of the 40 yellow nusuls he has seen, 80%" | N-weighted mean < 50%: "of the 5 green hatshs he has seen, 80%… of the 40 yellow hatshs he has seen, 20%"
Base rate/sample size congruency | Congruent: "10% are green nusuls… 90% are yellow nusuls… 5 green nusuls… 40 yellow nusuls" (the smaller [larger] base rate is paired with the smaller [larger] sample size) | Incongruent: "90% are hill inakus… 10% are valley inakus… 5 hill inakus… 40 valley inakus" (the larger [smaller] base rate is paired with the smaller [larger] sample size)

Note: For the sample size weighted mean parity factor, the sample size weighted model gives estimates either above or below 50%. For the base rate/sample size congruency factor, the base rate and sample size weighting models give very similar predictions (e.g. 74% and 73.3%) when congruent, but very different predictions (e.g. 26% vs. 73.3%) when incongruent.
This manipulation allowed us to test whether responses were affected by the inclusion of base rate data.

2.2.2. Sub-population status

The samples provided within a scenario were either stated to be from different subpopulations (sub-population status: given, e.g. "of the 5 green nusuls he has seen, 20% displayed aggressive behaviors…") or were only described by the basic animal kind label (sub-population status: not given, e.g. "of the 65 lofaks he has seen…"). In the latter case it was ambiguous whether the two samples represented random selections from the animal's general population, or if instead the samples were drawn from the two subgroups. By varying whether samples were labeled as coming from subpopulations or not, we manipulated the extent to which base rate data, when given, were relevant to the judgment. When samples come from different subgroups, it is reasonable to suspect that their differing aggression rates were caused by the difference in animal kind. For example, perhaps green nusuls are not particularly aggressive, but yellow nusuls are. When samples were instead described with the basic animal kind label, e.g. lofaks, the difference in aggression rates might instead be attributed to sampling error, and sample size, rather than base rates, should then be more relevant.

2.2.3. Sample size magnitude

The magnitude of the given pair of sample sizes was large (sample sizes of 65 and 520) or small (sample sizes of 5 and 40). In all scenarios, one sample was eight times larger than the other, and the statement regarding the smaller sample was presented first. Larger samples are generally more representative of the population from which they are drawn, yielding smaller standard errors relative to smaller samples. Thus, sample size magnitude is relevant when considering whether two samples were drawn from the same general population or two different subpopulations. Since the likelihood of divergent samples having come from the two different subpopulations increases as sample size magnitude goes up, larger sample size magnitude could increase base rate use when base rates were given.

2.2.4. Percentage spread

The difference between the aggression rates for a given pair of samples, the percentage spread, was wide, differing by 60% (20% vs. 80%), or narrow, differing by 20% (40% vs. 60%). Since the wide spread percentages, all else being equal, are less likely to have been drawn from the same general population than the narrow spread percentages, this factor could affect the extent to which base rates affect judgments.

2.2.5. Sample size weighted mean parity

We varied whether the larger sample size in a scenario was paired with the higher aggression rate (high sample size weighted mean parity;
e.g. “of the 40 yellow nusuls he has seen, 80% displayed aggressive behavior”) or the lower aggression rate (low sample size weighted mean parity; e.g. “of the 40 yellow hatshs he has seen, 20% displayed aggressive behavior”; see Table 1). Thus, if participants weight samples according to sample size, as shown in Eq. (2) below, their judgments should be higher in the high sample size weighted mean parity condition, compared to the low sample size weighted mean parity condition. High parity conditions yielded sample size weighted means above 50% (57.8% when spread was narrow, or 73.3% when spread was wide), while low parity conditions yielded sample size weighted means of comparable distances below 50% (42.2% when spread was narrow, or 26.7% when spread was wide). 2.2.6. Base rate/sample size congruency When sub-population base rates were given and samples were identified as coming from specific subpopulations, it was possible to weight sample aggression rates by subpopulation base rates (see Eq. (3) below). For these conditions, we manipulated whether base rate weighted means were congruent with sample size weighted means (i.e. base rate weighted means were 58%, 74%, 42%, and 26%, when the sample size weighted means were 57.8%, 73.3%, 42.2%, and 26.7%, respectively) or incongruent with sample size weighted means (i.e. base rate-weighted means were 42%, 26%, 58%, and 74%, when the sample size weighted means were 57.8%, 73.3%, 42.2%, and 26.7%, respectively). This was achieved by varying the order in which the base rates were presented (90% and 10%, vs. 10% and 90%), which simultaneously manipulated whether the larger base rate was paired with the larger or smaller sample size within a scenario, and whether base rate weighted and sample size weighted means were congruent (base rate/sample size congruency; see Table 1). The subpopulation listed first in the introductory sentence was always identified as composing the first listed (and thus smaller) sample, when such subpopulations were given. Thus, when the first listed subpopulation was stated to have the smaller (10%) base rate, it was paired with the smaller sample size, and thus yielded a base rate weighted mean congruent with the sample size weighted mean (base rate/sample size congruent). If instead the first listed subpopulation was stated to have the larger (90%) base rate, it was still paired with the smaller sample size, and thus yielded a base rate weighted mean incongruent with the sample size weighted mean (base rate/ sample size incongruent). When base rates were given, but samples were not declared to have been drawn from different subpopulations, we continued to vary the order in which the base rates were presented, which served as an additional presentation order control. Conditions where base rates were not presented were included twice, as though base rate/sample size congruency were still a factor in the design, though an invisible one, so as to balance the number of trials across conditions.
2.3. Procedure

Participants completed the study online. At the start of the experiment, they read the following introductory text, "Imagine that you work for a company that is developing safari tours through various nature preserves. For insurance purposes, you need to know how likely it is that the different kinds of animals your tour groups might encounter will display aggressive behaviors." Participants read 64 scenarios in which information was provided about a fictitious animal species. After reading each scenario, participants judged the percent chance of an animal of that species (e.g. inaku) being aggressive (i.e. 0 to 100% chance) and rated how confident they were in their response on a 7 point scale, where 1 indicated not at all sure and 7 indicated extremely sure. The 64 scenarios were presented one at a time in random order. Participants had to click a button to proceed to a new scenario and could not change submitted responses. After completing the 64 scenarios, participants' numerical abilities were assessed via a multiple choice version of the Lipkus, Samsa, and Rimer (2001) 10 question numerical literacy measure that has been used in previous studies (Chesney & Obrecht, 2012). Information regarding the participants' demographics, as well as their educational background, was also collected at this time.

3. Results & discussion

3.1. Coding

3.1.1. Numeracy and demographics

Preliminary analyses did not yield significant main effects of age, gender, major, or year in school on participants' percentage responses. Also, statistics course experience, i.e. whether participants had taken statistics or not, did not correlate with percentage responses or with individual differences in other factors thought to relate to numerical ability. Therefore, we did not include these factors in subsequent analyses. Direct indicators of participant numerical ability, however, were found to significantly predict responding. These three indicators included participants' raw score on the numerical literacy measure (M = 8.27 out of 10, SD = 2.16), population normalized z-scores of participants' self-reported scores on the math portions of the SAT or ACT (Mean z-score = 1.43, SD = 1.06), and whether the participant reported that they had ever taken calculus (142 yes, 44 no). These indicators were all significantly correlated with each other (math z-score & raw numerical literacy score, r = .54; math z-score & having taken calculus, r = .65; having taken calculus & raw numerical literacy score, r = .45; all ps < .001). We therefore used these three factors to construct high, medium, and low numeracy groups for the purpose of analysis. Participants who took calculus (n = 142, 76%), scored either a 9 or a 10 on the numerical literacy assessment (n = 117, 63%), and had math z-scores greater than the median of the sample (Median = 1.68, n = 78, 42%) were coded as members of the high numeracy group (n = 64). Participants who met two of these three criteria were coded as members of the medium numeracy group (n = 50), and those who met only one or none of these criteria were coded as members of the low numeracy group (n = 52). Twenty participants did not provide either ACT or SAT scores. These participants were coded as members of the high numeracy group if they both took calculus and scored a 9 or 10 on the numerical literacy measure (n = 3), and in the low numeracy group otherwise (n = 17).
This resulted in 67 participants in the high numeracy group, 50 in the medium numeracy group, and 69 in the low numeracy group.
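The grouping rule described above can be summarized as a simple classification function. The sketch below (Python, with hypothetical field names) is our restatement of that rule, not code from the study.

```python
def numeracy_group(took_calculus, literacy_score, math_z, sample_median_z=1.68):
    """Classify a participant as 'high', 'medium', or 'low' numeracy.

    Criteria (each scored as met / not met):
      1) ever took calculus,
      2) scored 9 or 10 on the 10-item numerical literacy measure,
      3) math SAT/ACT z-score above the sample median.
    Participants with no math z-score are 'high' only if they meet both
    remaining criteria, and 'low' otherwise.
    """
    if math_z is None:
        return "high" if took_calculus and literacy_score >= 9 else "low"
    met = sum([took_calculus, literacy_score >= 9, math_z > sample_median_z])
    return {3: "high", 2: "medium"}.get(met, "low")

print(numeracy_group(True, 10, 2.0))    # high
print(numeracy_group(True, 8, 2.0))     # medium
print(numeracy_group(False, 7, None))   # low
```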
3.1.2. Percentage scaling

In the following analyses, we considered three possible methods by which participants might average sample information to produce population estimates: a) giving all percentages equal weight (see Eq. (1)), yielding a mean of 50% in the current experiment, as would be correct if samples were drawn from different subpopulations whose frequencies in the general population were unknown; b) weighting percentages according to their sample sizes (see Eq. (2)), as would be correct if samples were drawn randomly from the same population; and c) weighting percentages according to their base rates (see Eq. (3)), as would be correct if samples were instead drawn from different subpopulations whose frequencies in the general population were known. Although we do not expect people to mentally combine data precisely via these formulas, we are interested in the extent to which their intuitive judgments reflect these models' predictions.

Unweighted mean: M = (p1 + p2) / 2   (1)

Sample size weighted mean: M = p1 [n1 / (n1 + n2)] + p2 [n2 / (n1 + n2)]   (2)

Base rate weighted mean: M = p1 (BR1) + p2 (BR2)   (3)

Accordingly, participants' percent estimates were scaled to indicate the extent to which they were consistent with the sample size weighted mean versus the unweighted mean, via a method previously used by Chesney and Obrecht (2012). Because the predictions of the sample size weighted and base rate weighted averaging models were systematically related, as we describe below, this scaling allowed us to evaluate the predictions of all three models at once. Responses were scaled according to the formula defined in Eq. (4):

Scaling formula: scaled response = (X − 50%) / (N-weighted prediction − 50%)   (4)
Percentage responses equal to the unweighted mean (always 50%) were coded as 0 and responses equal to the sample size weighted mean (57.8%, 73.3%, 42.2%, or 26.7%, depending upon the condition) were coded as 1. All other responses were scaled such that the scaled scores' relative proximity to 0 and 1 reflected the raw responses' relative proximity to the unweighted means and sample size weighted means. For example, if the sample size weighted mean was 73.3% (a wide spread, high sample size weighted mean parity case), a response of 60% would be scaled as .43 (60% − 50% = 10%; 73.3% − 50% = 23.3%; 10% / 23.3% = .43), indicating that the participant gave .43 as much weight to sample size as she should have, assuming data were randomly selected from the general population. Similarly, a response of 70% would be scaled as .86, and 80% would be scaled as 1.29 (indicating overweighting of sample size). Conversely, if the sample size weighted mean was 42.2% (a narrow spread, low sample size weighted mean parity case), a response of 45% would be scaled as .64, 40% would be scaled as 1.28, and 55% would be scaled as −.64. Thus, the scaled responses become more like the sample size weighted mean and less like the unweighted mean as values increase from 0 towards 1.

Further, recall that the base rate weighted means were approximately equal to the sample size weighted means in the base rate/sample size congruent conditions (e.g., sample size weighted mean = 73.3%, base rate weighted mean = 74%; see Table 1). In contrast, in the base rate/sample size incongruent conditions, these weighted means, while approximately equidistant from 50%, were in the opposite direction (e.g., sample size weighted mean = 73.3%, base rate weighted mean = 26%). Thus, when base rates and sample sizes were congruent, scaled responses become more like the sample size and base rate weighted means and less like the unweighted mean as values increase from 0 towards 1. In contrast, when base rates and sample sizes were incongruent, scaled responses closer to 1 reflected greater sample size use, while scaled responses closer to −1 reflected greater base rate use. Thus, in the incongruent conditions, responses become more like the base rate weighted mean and less like the unweighted mean as values decrease from 0 towards −1.
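A minimal sketch of this scaling (Python; the function name is ours) reproduces the worked examples above under the stated condition means.

```python
def scale_response(response, n_weighted_mean, unweighted_mean=50.0):
    """Scale a raw percentage response (Eq. 4):
    0 = unweighted mean, 1 = sample size weighted mean."""
    return (response - unweighted_mean) / (n_weighted_mean - unweighted_mean)

# Wide spread, high parity condition: sample size weighted mean = 73.3%.
for r in (60, 70, 80):
    print(r, round(scale_response(r, 73.3), 2))   # 0.43, 0.86, 1.29

# Narrow spread, low parity condition: sample size weighted mean = 42.2%.
for r in (45, 40, 55):
    print(r, round(scale_response(r, 42.2), 2))   # 0.64, 1.28, -0.64
```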
Fig. 1. Marginal means of scaled aggression rate judgments in base rate given conditions, as a function of subpopulation status, base rate/sample size (BR-N) congruency, and numeracy. A score of 0 is equivalent to the unweighted mean, while a score of 1 is equivalent to the sample size weighted mean, and is highly similar to the base rate weighted mean in the base rate/sample size congruent condition. A score of −1 indicates responses highly similar to the base rate weighted mean in the base rate/sample size incongruent condition. Error bars represent standard error. Note: Base rate/sample size congruent conditions are also low base rate first conditions, and base rate/sample size incongruent conditions are also high base rate first conditions.
This scaling allowed us to directly compare the extent to which participants' responses differed from those predicted by the three averaging models, regardless of spread and sample size weighted mean parity conditions.

3.2. Analyses

Nonsensical responses, such as estimations of aggression rates over 100%, were excluded from the analyses. Such responses comprised less than 1% of the data. Base rates were only given in half the scenarios (base rate given vs. base rate not given). As base rate/sample size congruency (or base rate presentation order) was only a relevant factor in the design when base rates were given, we separately analyzed base rate given and base rate not given cases. Note: Means reported are marginal means unless otherwise noted.

3.2.1. Base rates given

We ran a 2 × 2 × 2 × 2 × 2 × 3 mixed model ANOVA, with subpopulation status, sample size magnitude, percentage spread, sample size weighted mean parity, and base rate/sample size congruency (base rate presentation order) as within-subjects factors and numeracy as a between-subjects factor. Base rate/sample size congruency was only relevant when samples were linked to subpopulations. However, to control for order effects, the presentation order of the base rates was alternated in the same fashion whether subpopulations were given or not given. Analyses were run on the participants' scaled responses.

3.2.1.1. Base rates influenced responses. Our main experimental questions were: 1) Do people utilize base rate information when combining samples to make an inference? 2) Are base rates integrated into judgments only when they are relevant to the inference? Our results indicate that the answer to both of these questions is yes. There were significant main effects of subpopulation status (F(1,173) = 94.7, p < .0005, η2p = .35) and base rate/sample size congruency (F(1,173) = 235.2, p < .0005, η2p = .58), as well as an interaction between these factors (F(1,173) = 250.0, p < .0005, η2p = .59). Participants' estimates most strongly reflected the base rate weighted model (Eq. (3)) when samples were stated to represent different subpopulations. When base rate weighted means were congruent with sample size weighted means (base rate/sample size congruent condition), individuals gave responses that were consistent with both models (i.e. scaled responses near 1). In base rate/sample size incongruent conditions, the average response favored the base rate weighted model over the sample size weighted model (see Fig. 1), but judgments were actually bimodal (see Figs. 2 and 3), clustering either around the sample size weighted prediction (scaled as 1, see Eq. (2)) or the base rate weighted prediction (scaled as −1; see Eq. (3)). Note, weighting means by sample size when the base rates are contradictory would make sense if individuals
considered the given sample sizes to indicate the base rates of animal subpopulations on the safari route. Such bimodality in responses sharply contrasts with participants' more uniform response distributions when either base rate weighted and sample size weighted means were similar or samples were not linked to specific subpopulations (compare histograms in Fig. 2). Indeed, we find that some individuals consistently use base rates when making judgments (see Fig. 3). Further, when samples were not identified as coming from different subpopulations, the continued manipulation of base rate presentation order did not affect responding.1 Additionally, participants higher in numerical ability showed a greater tendency to integrate base rates into their judgments when it was appropriate to do so, that is, when subpopulations were linked to sample data and base rates were incongruent with sample size (main effect of numeracy F(2,173) = 6.8, p = .001, η2p = .07; interaction between subpopulation status and numeracy F(2,173) = 6.9, p = .001, η2p = .07; interaction between base rate/sample size congruency and numeracy F(2,173) = 11.8, p < .0005, η2p = .12; interaction among subpopulation status, base rate/sample size congruency, and numeracy, F(2,173) = 3.9, p = .022, η2p = .04; see Fig. 1). These effects were stable, and typically remained significant regardless of the inclusion criteria used for the data (e.g. excluding particularly low or high scaled responses; the division of numeracy groups). Further, ANOVAs separately analyzing responses in each numeracy group all showed significant interactions between subpopulation status and base rate/sample size congruency (p < .0005 in all cases). Thus, even though the least numerate participants were least sensitive, all groups' use of the base rates and sample size was affected by sampling information.
1 This was confirmed by a separate 2 (sample size magnitude) × 2 (percentage spread) × 2 (sample size weighted mean parity) × 2 (base rate presentation order) × 3 (numeracy) mixed model ANOVA on the 16 base rate given/subpopulation not given scenarios. The only significant effects were for sample size magnitude (F(1,181) = 7.3, p = .008, η2p = .04; small: M = .588, SE = .034; large: M = .676, SE = .038) and numeracy (F(2,181) = 14.5, p < .0005, η2p = .14; low: M = .403, SE = .052; medium: M = .701, SE = .061; high: M = .786, SE = .053). All other effects and interactions had ps > .09. A second follow up analysis (a 2 (base rate status) × 2 (sample size magnitude) × 2 (percentage spread) × 2 (sample size weighted mean parity) × 3 (numeracy) mixed model ANOVA) was run to determine if presenting base rates affected how participants responded when samples were not identified as coming from different subpopulations. Again there was no main effect of base rate status. Only main effects of sample size magnitude (F(1,179) = 13.3, p = .0005, η2p = .07; small: M = .600, SE = .032; large: M = .690, SE = .032) and numeracy (F(2,179) = 16.8, p < .0005, η2p = .16; low: M = .415, SE = .048; medium: M = .730, SE = .057; high: M = .789, SE = .049) were statistically significant. However, there was an interaction between sample size weighted mean parity and base rate status (F(1,179) = 5.2, p = .023, η2p = .03; base rate given/parity low: M = .618, SE = .056; base rate given/parity high: M = .648, SE = .058; base rate not given/parity low: M = .700, SE = .054; base rate not given/parity high: M = .614, SE = .063).
Fig. 2. Distribution of scaled aggression rate responses in the base rate given conditions as a function of subpopulation status, base rate/sample size (BR-N) congruency, and numeracy. A score of 0 is equivalent to the unweighted sample mean, while a score of 1 is equivalent to the sample size weighted mean, and is highly similar to the base rate weighted mean in the base rate/sample size congruent condition. A score of −1 indicates responses highly similar to the base rate weighted mean in the base rate/sample size incongruent condition. Scores less than −3 or greater than 3 (4% of responses) are not shown. Note: Base rate/sample size congruent conditions are also low base rate first conditions, and base rate/sample size incongruent conditions are also high base rate first conditions.
3.2.1.2. Additional effects. The ANOVA also revealed a main effect of sample size magnitude (F(1,173) = 4.3, p = .039, η2p = .02) and an interaction between sample size magnitude and numeracy (F(1,173) = 3.4, p = .037, η2p = .04). The least numerate participants particularly gave more weight to sample size when sample sizes were larger in magnitude (low numeracy: M = .396, SE = .048; medium numeracy: M = .520, SE = .053; high numeracy: M = .529, SE = .047) than when they were smaller in magnitude (low numeracy: M = .258, SE = .047; medium numeracy: M = .511, SE = .051; high numeracy: M = .524, SE = .045). This is not normatively correct. Either sample size magnitude should have had no effect or larger magnitudes should have led to decreased sample size use.
Fig. 3. Distribution of individual participants' mean scaled responses in base rate/sample size incongruent conditions as a function of numeracy, on the eight trials in which samples were identified as coming from different subpopulations with given base rates. The distributions, particularly the bimodality of the distributions of the medium and high numeracy groups, seem to indicate that individual participants tended to be fairly consistent in their use of a particular weighting strategy. Means of −1 indicate participants who typically used a base rate weighting strategy, while means of 1 indicate participants who typically used a sample size weighting strategy. More normal distributions would have been expected if individuals instead varied their strategies, with their likelihoods of using a particular strategy on a particular trial matching the frequency of that strategy in the group as a whole.
If participants were merely weighting by sample size as per Eq. (2), sample size magnitude should not have affected their responses. Since the ratio of the two sample sizes within a scenario was held constant (i.e. 5 to 40 = 65 to 520), the sample size weighted means should be equal across the two sample size magnitude conditions. However, since diverse sample means are less likely to be drawn from the same population when sample size is larger in magnitude (see Chesney & Obrecht, 2012), participants could justifiably give less weight to sample size when sample sizes were larger in magnitude because they might have inferred the data were drawn from discrete subpopulations.

Several more complex interactions were also uncovered: an interaction among subpopulation status, percentage spread, and base rate/sample size congruency (F(1,173) = 5.4, p = .021, η2p = .03); an interaction among subpopulation status, sample size weighted mean parity, and numeracy (F(2,173) = 3.1, p = .049, η2p = .03); an interaction among subpopulation status, percentage spread, base rate/sample size congruency, and numeracy (F(2,173) = 4.6, p = .012, η2p = .05); an interaction among subpopulation status, sample size magnitude, percentage spread, and numeracy (F(2,173) = 3.7, p = .028, η2p = .04); an interaction among percentage spread, sample size weighted mean parity, base rate/sample size congruency, and numeracy (F(2,173) = 3.2, p = .045, η2p = .04); an interaction among sample size magnitude, percentage spread, sample size weighted mean parity, base rate/sample size congruency, and numeracy (F(2,173) = 6.7, p = .002, η2p = .07); and an interaction among subpopulation status, sample size magnitude, sample size weighted mean parity, base rate/sample size congruency, and numeracy (F(2,173) = 4.0, p = .020, η2p = .04). Means and standard errors for all conditions are displayed in Table 2. Three of these interactions are subsumed by interactions of a higher order.

Generally, the high level interactions appeared to reflect minor differences in how numerical information was integrated by participants of different numeracy levels, as well as minor order effects idiosyncratic to the design of our study. Slightly different patterns of responding were found between the high and low numeracy groups, with the medium group often showing an intermediate pattern, or being similar to the high group. Take, for example, the interaction among sample size magnitude, percentage spread, sample size weighted mean parity, base rate/sample size congruency, and numeracy. Here the high and medium numeracy participants gave more weight to sample size when sample size and base rates were congruent. Although low numeracy participants showed a similar pattern, this effect was diminished when sample sizes were small in magnitude, percentages were close (narrowly spread), and the larger sample was paired with the smaller percent (low parity). Further, the interactions involving subpopulation status and base rate/sample size congruency are due in part to order effects, since base rate/sample size congruency merely affected presentation order when subpopulations were not given. It should be noted, however, that these complex interactions were unstable and tended to change significance status when different inclusion criteria were used. Thus, we do not believe it scientifically sound to draw conclusions from these data.
3.2.2. Base rates not given

A previous study (Chesney & Obrecht, 2012) found that individuals gave less weight to sample size when verbal or numerical information indicated that samples came from different subpopulations and base rates were not given. In order to test if these effects replicated in the current paradigm, we ran a 2 × 2 × 2 × 2 × 3 mixed model
Table 2
Marginal means and standard errors of scaled responses to scenarios for participants in low, medium, and high numeracy groups. Cells show M (SE). Columns are grouped by base rate status (given vs. not given) and numeracy group (low, medium, high).

Subpop. status | Sample size magnitude | Spread | Sample size weighted mean parity | Base rate/sample size congruency | Given, Low | Given, Medium | Given, High | Not given, Low | Not given, Medium | Not given, High
Given | Small | Wide | High | Congruent | 0.57 (0.09) | 0.89 (0.10) | 1.03 (0.09) | 0.28 (0.09) | 0.66 (0.10) | 0.57 (0.09)
Given | Small | Wide | High | Incongruent | 0.15 (0.12) | −0.36 (0.14) | −0.42 (0.12) | 0.33 (0.10) | 0.58 (0.11) | 0.60 (0.10)
Given | Small | Wide | Low | Congruent | 0.53 (0.08) | 0.97 (0.09) | 0.92 (0.08) | 0.31 (0.08) | 0.71 (0.09) | 0.69 (0.08)
Given | Small | Wide | Low | Incongruent | −0.12 (0.12) | −0.20 (0.14) | −0.35 (0.12) | 0.30 (0.09) | 0.71 (0.10) | 0.63 (0.09)
Given | Small | Narrow | High | Congruent | 0.69 (0.19) | 0.93 (0.21) | 0.80 (0.18) | 0.23 (0.16) | 0.43 (0.18) | 0.51 (0.16)
Given | Small | Narrow | High | Incongruent | −0.80 (0.23) | −0.49 (0.26) | −0.25 (0.22) | 0.11 (0.19) | 0.51 (0.22) | 0.45 (0.19)
Given | Small | Narrow | Low | Congruent | 0.31 (0.20) | 1.28 (0.22) | 1.01 (0.19) | 0.29 (0.20) | 0.79 (0.23) | 0.64 (0.20)
Given | Small | Narrow | Low | Incongruent | 0.05 (0.22) | −0.30 (0.25) | −0.45 (0.21) | −0.03 (0.20) | 0.70 (0.22) | 0.66 (0.19)
Given | Large | Wide | High | Congruent | 0.60 (0.08) | 0.95 (0.09) | 1.04 (0.08) | 0.44 (0.09) | 0.72 (0.10) | 0.66 (0.09)
Given | Large | Wide | High | Incongruent | −0.08 (0.12) | −0.41 (0.14) | −0.42 (0.12) | 0.55 (0.09) | 0.75 (0.10) | 0.64 (0.09)
Given | Large | Wide | Low | Congruent | 0.52 (0.08) | 0.95 (0.10) | 0.96 (0.08) | 0.32 (0.10) | 0.68 (0.11) | 0.61 (0.10)
Given | Large | Wide | Low | Incongruent | −0.15 (0.13) | −0.39 (0.14) | −0.42 (0.13) | 0.27 (0.09) | 0.78 (0.10) | 0.66 (0.09)
Given | Large | Narrow | High | Congruent | 0.77 (0.17) | 0.91 (0.19) | 0.94 (0.17) | 0.69 (0.18) | 0.52 (0.20) | 0.47 (0.18)
Given | Large | Narrow | High | Incongruent | −0.09 (0.19) | −0.41 (0.21) | −0.53 (0.19) | 0.08 (0.20) | 0.66 (0.22) | 0.56 (0.19)
Given | Large | Narrow | Low | Congruent | 0.87 (0.19) | 0.90 (0.21) | 0.82 (0.19) | 0.32 (0.16) | 0.63 (0.19) | 0.67 (0.16)
Given | Large | Narrow | Low | Incongruent | −0.09 (0.20) | −0.28 (0.22) | −0.39 (0.20) | 0.74 (0.17) | 0.85 (0.19) | 0.64 (0.17)
Not given | Small | Wide | High | Congruent | 0.41 (0.09) | 0.73 (0.11) | 0.82 (0.09) | 0.42 (0.10) | 0.67 (0.11) | 0.80 (0.10)
Not given | Small | Wide | High | Incongruent | 0.36 (0.10) | 0.64 (0.11) | 0.65 (0.10) | 0.36 (0.10) | 0.61 (0.11) | 0.87 (0.10)
Not given | Small | Wide | Low | Congruent | 0.37 (0.09) | 0.81 (0.10) | 0.80 (0.09) | 0.42 (0.09) | 0.84 (0.10) | 0.85 (0.09)
Not given | Small | Wide | Low | Incongruent | 0.08 (0.10) | 0.60 (0.11) | 0.85 (0.10) | 0.47 (0.09) | 0.83 (0.10) | 0.85 (0.09)
Not given | Small | Narrow | High | Congruent | 0.46 (0.17) | 0.63 (0.19) | 0.66 (0.17) | 0.14 (0.19) | 0.40 (0.21) | 0.62 (0.18)
Not given | Small | Narrow | High | Incongruent | 0.46 (0.20) | 0.56 (0.22) | 0.66 (0.19) | 0.36 (0.18) | 0.54 (0.20) | 0.64 (0.17)
Not given | Small | Narrow | Low | Congruent | 0.30 (0.18) | 0.76 (0.21) | 0.87 (0.18) | 0.49 (0.20) | 0.90 (0.23) | 0.88 (0.20)
Not given | Small | Narrow | Low | Incongruent | 0.31 (0.21) | 0.66 (0.24) | 0.72 (0.21) | 0.27 (0.18) | 0.81 (0.20) | 0.73 (0.18)
Not given | Large | Wide | High | Congruent | 0.55 (0.09) | 0.86 (0.10) | 0.80 (0.09) | 0.54 (0.09) | 0.72 (0.10) | 0.83 (0.09)
Not given | Large | Wide | High | Incongruent | 0.56 (0.10) | 0.76 (0.11) | 0.76 (0.10) | 0.60 (0.08) | 0.84 (0.10) | 0.86 (0.08)
Not given | Large | Wide | Low | Congruent | 0.36 (0.09) | 0.80 (0.10) | 0.95 (0.09) | 0.39 (0.08) | 0.75 (0.09) | 0.92 (0.08)
Not given | Large | Wide | Low | Incongruent | 0.49 (0.10) | 0.72 (0.11) | 0.72 (0.10) | 0.55 (0.08) | 0.86 (0.08) | 0.86 (0.07)
Not given | Large | Narrow | High | Congruent | 0.43 (0.18) | 0.73 (0.20) | 0.86 (0.17) | 0.70 (0.17) | 0.80 (0.19) | 0.67 (0.17)
Not given | Large | Narrow | High | Incongruent | 0.84 (0.19) | 0.78 (0.22) | 0.59 (0.19) | 0.62 (0.17) | 0.64 (0.19) | 0.69 (0.17)
Not given | Large | Narrow | Low | Congruent | 0.16 (0.19) | 0.78 (0.21) | 0.96 (0.19) | 0.43 (0.16) | 0.79 (0.18) | 0.83 (0.16)
Not given | Large | Narrow | Low | Incongruent | 0.61 (0.17) | 0.73 (0.19) | 0.76 (0.17) | 0.31 (0.15) | 1.00 (0.17) | 0.84 (0.15)
ANOVA on the data where base rates were not given, with subpopulation status, sample size magnitude, percentage spread, and sample size weighted mean parity as within-subjects factors and numeracy as a between-subjects factor. Recall that each participant responded to two trials in each of the sixteen resulting within-subjects conditions. Thus, the analysis was run on the mean of participants' scaled responses in these conditions. We found stable main effects for subpopulation status, sample size magnitude, percentage spread, and numeracy. No interactions were significant. Replicating the findings of Chesney and Obrecht (2012), participants gave less weight to sample size when they were told that the samples came from different subpopulations (F(1,178) = 24.9, p < .0005, η2p = .12; subpopulation not given: M = .659, SE = .031; subpopulation given: M = .527, SE = .032). Additionally, more numerate participants gave responses that were more consistent with the sample size weighted model than did less numerate participants (F(2,178) = 15.5, p < .0005, η2p = .15; high: M = .697, SE = .048; medium: M = .706, SE = .055; low: M = .375, SE = .046). Means are displayed in Fig. 4. However, we also found that participants gave more weight to sample size when sample sizes were larger in magnitude (F(1,178) = 9.5, p = .002, η2p = .05; large: M = .634, SE = .032; small: M = .552, SE = .032), and when percentage spread was wide (F(1,178) = 5.30, p = .022, η2p = .03; wide: M = .624, SE = .028; narrow: M = .562, SE = .035). These last two results are inconsistent with the findings of previous studies in which increased sample size magnitude reduced sample size sensitivity (Obrecht, 2010) and large spread between means decreased sample size use (Chesney & Obrecht, 2012). One possible explanation for these divergent findings is that the more extreme numerical values in the large sample size and wide spread conditions led our participants to think that the samples might be drawn from the different subpopulations noted in the scenarios' introductory sentences. However, given that these scenarios were interspersed with others where base rates were 10% and 90%, and that the stated sample sizes were in comparable ratios to these base rates, individuals may have assumed sample size was an indication of base rate, and thus given more weight to sample size in these instances. Further experimentation is needed to determine if this interpretation is correct.

3.2.3. Confidence ratings

In addition to our analyses of participants' percentage responses, we also analyzed participants' confidence ratings. Recall, for each percentage response, participants also indicated on a scale of 1 to 7 how confident they were of their answer. Unlike the probability estimates, the range of confidence ratings was uniform for all questions. Thus we were able to evaluate the confidence ratings in a single analysis. We ran a 2 × 2 × 2 × 2 × 2 × 2 × 3 mixed model ANOVA, with base rate status, subpopulation status, sample size magnitude, percentage spread, sample size weighted mean parity, and base rate/sample size congruency (base rate presentation order) as within-subjects factors and numeracy as the between-subjects factor. The results were consistent with the percentage response data.
Overall, participants were moderately confident in their responses: the mean confidence rating was 3.75 (SE = .09). Confidence was higher when base rate information was given (M = 3.83, SE = .09) rather than not given (M = 3.66, SE = .09; F(1,183) = 19.7, p < .001, η2p = .10). Base rate and subpopulation status interacted to affect confidence (F(1,183) = 44.5, p < .001, η2p = .20). Participants were most confident when base rate information was given and samples were related to subgroups (M = 3.91, SE = .09); they were least confident when base rates were not given, but subpopulations were (M = 3.52, SE = .09). Confidence was moderate when base rates were given and subpopulations were not (M = 3.75, SE = .10) and when neither was given (M = 3.80, SE = .09). It appears that participants knew that base rate and sampling information needed to be considered in concert when combining information from different subpopulations, and were thus particularly unconfident in their responses when it was indicated that samples were drawn from different subpopulations, but they did not have the necessary information, i.e. the base rates, to adjust their responses accordingly. Participants were also more confident when base rate and sample size information were congruent, as indicated by a main effect of base rate/sample size congruency (base rate/sample size congruent: M = 3.78, SE = .09; base rate/sample size incongruent: M = 3.72, SE = .09; F(1,183) = 11.5, p = .001, η2p = .06). Although the base rate/sample size congruency factor affected presentation order, the interactions described below suggest order itself does not account for this effect. Instead, the effect of congruency was driven by the conditions in which base rates were given (base rate status × base rate/sample size congruency: F(1,183) = 7.5, p = .007, η2p = .04). When base rates were given, participants were more confident when low base rates were presented first and thus base rate and sample information was congruent (congruent/low base rate first: M = 3.89, SE = .09; incongruent/high base rate first: M = 3.78, SE = .09). When base rates were not given, and thus the base rate/sample size congruency factor only affected the order in which subpopulations were presented, there was no effect (low first: M = 3.67, SE = .09; high first: M = 3.66, SE = .09). Further, base rate/sample size congruency only affected confidence when subpopulations were linked to samples (subpopulation status × base rate/sample size congruency: F(1,183) = 12.1, p = .001, η2p = .06). When subpopulations were given, participants were less confident in the incongruent conditions (congruent/low first: M = 3.78, SE = .09; incongruent/high first: M = 3.66, SE = .09). In contrast, the order manipulation did not affect confidence ratings when samples were not linked to subpopulations and thus the order manipulation could not affect congruency (congruent/low first: M = 3.78, SE = .09; incongruent/high first: M = 3.78, SE = .10). Thus, participants' confidence was affected in a normative manner such that they were only affected by sample size/base rate congruency when this factor was statistically relevant. In line with probability theory, participants were more confident in their responses when sample sizes were larger, rather than smaller, in magnitude (F(1,183) = 14.7, p < .001, η2p = .07; small: M = 3.68, SE = .09; large: M = 3.81, SE = .09). Also normative, they were less confident when the spread between the percentage data in a scenario
Fig. 4. Marginal means of scaled aggression rate judgments in base rate not given conditions, as a function of subpopulation status and numeracy. A score of 0 is equivalent to the unweighted mean and a score of 1 is equivalent to the sample size weighted mean. Error bars represent standard error.
was wide rather than narrow (F(1,183) = 13.5, p < .001, ηp² = .07; wide: M = 3.70, SE = .09; narrow: M = 3.79, SE = .09). Spread interacted with base rate status (F(1,183) = 3.9, p = .049, ηp² = .02) such that spread had a larger impact when base rates were not given (wide: M = 3.60, SE = .09; narrow: M = 3.72, SE = .09) than when they were given (wide: M = 3.81, SE = .09; narrow: M = 3.86, SE = .09). This pattern makes sense: when the sample percentages looked very different (i.e. in the wide spread condition), this suggested that the samples were drawn from different subgroups, yet when base rates were not given there was no way to weight the data according to these subgroups. Participants were also more confident when sample size weighted mean probabilities were closer to 1 than to 0 (sample size weighted mean parity: F(1,183) = 14.1, p < .001, ηp² = .07; high: M = 3.78, SE = .09; low: M = 3.71, SE = .09). This effect is not logically sound, but a similar result was found by Chesney and Obrecht (2012). Parity also interacted with sample size and spread (F(1,183) = 4.7, p = .032, ηp² = .02): participants' increased confidence in conditions with sample size weighted means greater than .5 was particularly strong when sample sizes were large and spread was wide (see Table 3). Further, there were two four-way interactions involving sample size weighted mean parity (base rate status × subpopulation status × base rate/sample size congruency × sample size weighted mean parity: F(1,183) = 7.2, p = .008, ηp² = .04; subpopulation status × sample size × spread × sample size weighted mean parity: F(1,183) = 6.6, p = .011, ηp² = .04; see Table 3), which are not easily interpretable.

While there was no main effect of numeracy on confidence ratings, numeracy did interact with other factors. There was a marginal interaction between base rate status and numeracy, such that the more numerate an individual was, the more her confidence
increased when base rates were given (F(2,183) = 2.9, p = .057, ηp² = .03; see Fig. 5). Subpopulation status and numeracy also interacted, such that highly numerate participants were particularly less confident when samples were linked to different subpopulations, in which case the typically taught method of weighting means by sample size was in doubt (F(2,183) = 3.8, p = .023, ηp² = .04; see Fig. 5). Participants with high numeracy were more sensitive to the interaction between base rate status and subpopulation status than those with lower numeracy (F(2,183) = 4.6, p = .011, ηp² = .05; see Fig. 5). Further, the confidence of more numerate participants was affected more by the sample size magnitude manipulation than that of less numerate participants (sample size × numeracy: F(2,183) = 5.7, p = .004, ηp² = .06; see Table 3). There were also a number of more complicated four-, five-, and six-way interactions involving numeracy (base rate status × subpopulation status × sample size × numeracy: F(2,183) = 7.9, p < .001, ηp² = .08; base rate status × subpopulation status × sample size × base rate/sample size congruency × numeracy: F(2,183) = 3.4, p = .036, ηp² = .04; base rate status × subpopulation status × spread × base rate/sample size congruency × numeracy: F(2,183) = 3.0, p = .05, ηp² = .03; base rate status × sample size × spread × base rate/sample size congruency × numeracy: F(2,183) = 3.1, p = .049, ηp² = .03; base rate status × sample size × spread × parity × numeracy: F(2,183) = 4.4, p = .013, ηp² = .05; base rate status × subpopulation status × spread × sample size weighted mean parity × base rate/sample size congruency × numeracy: F(2,183) = 3.6, p = .029, ηp² = .04). These effects can be viewed in Table 3; while not easily describable, they typically indicate that more numerate individuals were more sensitive to numerical information. No other effects were significant.
Table 3
Marginal means and standard errors of confidence ratings for all trial kinds for participants in the three numeracy groups. Values are M (SE) for each combination of base rate status (given vs. not given) and numeracy group (low, medium, high). Size = sample size magnitude; Parity = sample size weighted mean parity; Congruency = base rate/sample size congruency.
                                               Base rate given                           Base rate not given
Subpop.    Size   Spread  Parity  Congruency   Low          Medium       High            Low          Medium       High
Given      Small  Wide    High    Congruent    3.99 (0.18)  4.08 (0.22)  4.19 (0.19)    3.30 (0.20)  3.56 (0.24)  3.33 (0.20)
Given      Small  Wide    High    Incongruent  3.58 (0.19)  3.78 (0.23)  3.52 (0.20)    3.39 (0.19)  3.58 (0.22)  3.46 (0.19)
Given      Small  Wide    Low     Congruent    3.57 (0.19)  4.28 (0.22)  3.84 (0.19)    3.41 (0.19)  3.52 (0.22)  3.13 (0.19)
Given      Small  Wide    Low     Incongruent  3.73 (0.19)  3.76 (0.22)  3.78 (0.19)    3.32 (0.18)  3.50 (0.21)  3.21 (0.19)
Given      Small  Narrow  High    Congruent    3.99 (0.19)  3.96 (0.22)  4.06 (0.19)    3.73 (0.19)  3.70 (0.22)  3.34 (0.19)
Given      Small  Narrow  High    Incongruent  3.64 (0.18)  3.78 (0.22)  3.87 (0.19)    3.41 (0.19)  3.56 (0.22)  3.42 (0.19)
Given      Small  Narrow  Low     Congruent    3.77 (0.19)  3.84 (0.23)  3.99 (0.20)    3.64 (0.18)  3.70 (0.21)  3.30 (0.19)
Given      Small  Narrow  Low     Incongruent  3.90 (0.19)  3.64 (0.23)  3.73 (0.20)    3.48 (0.19)  3.44 (0.23)  3.57 (0.20)
Given      Large  Wide    High    Congruent    3.62 (0.20)  4.32 (0.23)  4.28 (0.20)    3.70 (0.18)  3.68 (0.21)  3.33 (0.18)
Given      Large  Wide    High    Incongruent  3.51 (0.18)  4.10 (0.22)  4.15 (0.19)    3.49 (0.19)  3.72 (0.22)  3.54 (0.19)
Given      Large  Wide    Low     Congruent    3.61 (0.20)  4.20 (0.23)  3.99 (0.20)    3.57 (0.19)  3.78 (0.22)  3.54 (0.19)
Given      Large  Wide    Low     Incongruent  3.68 (0.19)  3.96 (0.22)  4.09 (0.19)    3.35 (0.19)  3.70 (0.23)  3.25 (0.19)
Given      Large  Narrow  High    Congruent    3.99 (0.20)  4.40 (0.23)  4.30 (0.20)    3.57 (0.19)  3.78 (0.22)  3.49 (0.19)
Given      Large  Narrow  High    Incongruent  3.52 (0.18)  3.88 (0.22)  4.15 (0.19)    3.68 (0.19)  3.84 (0.22)  3.31 (0.19)
Given      Large  Narrow  Low     Congruent    3.62 (0.20)  4.16 (0.23)  4.28 (0.20)    3.58 (0.19)  3.86 (0.22)  3.60 (0.19)
Given      Large  Narrow  Low     Incongruent  3.61 (0.18)  3.90 (0.21)  4.25 (0.18)    3.58 (0.18)  3.70 (0.22)  3.51 (0.19)
Not given  Small  Wide    High    Congruent    3.59 (0.19)  3.82 (0.22)  3.72 (0.19)    3.65 (0.18)  3.72 (0.21)  3.93 (0.19)
Not given  Small  Wide    High    Incongruent  3.51 (0.18)  3.76 (0.21)  3.88 (0.19)    3.70 (0.20)  3.70 (0.23)  3.73 (0.20)
Not given  Small  Wide    Low     Congruent    3.48 (0.17)  3.64 (0.20)  3.76 (0.18)    3.46 (0.18)  3.66 (0.21)  3.64 (0.18)
Not given  Small  Wide    Low     Incongruent  3.22 (0.19)  3.52 (0.22)  3.66 (0.19)    3.71 (0.19)  3.72 (0.22)  3.49 (0.19)
Not given  Small  Narrow  High    Congruent    3.54 (0.18)  3.88 (0.21)  3.87 (0.18)    3.83 (0.19)  3.72 (0.22)  3.93 (0.19)
Not given  Small  Narrow  High    Incongruent  3.65 (0.18)  3.72 (0.21)  3.78 (0.18)    3.65 (0.19)  3.74 (0.22)  3.94 (0.19)
Not given  Small  Narrow  Low     Congruent    3.65 (0.19)  3.76 (0.22)  3.88 (0.19)    3.70 (0.19)  3.88 (0.22)  3.87 (0.19)
Not given  Small  Narrow  Low     Incongruent  3.73 (0.18)  3.76 (0.22)  3.81 (0.19)    3.55 (0.18)  3.80 (0.21)  4.05 (0.18)
Not given  Large  Wide    High    Congruent    3.48 (0.20)  3.94 (0.23)  3.84 (0.20)    3.49 (0.19)  3.80 (0.22)  4.00 (0.19)
Not given  Large  Wide    High    Incongruent  3.64 (0.19)  3.90 (0.22)  4.00 (0.19)    3.57 (0.18)  4.08 (0.21)  4.12 (0.18)
Not given  Large  Wide    Low     Congruent    3.73 (0.20)  3.84 (0.24)  3.88 (0.20)    3.36 (0.19)  3.76 (0.22)  3.93 (0.19)
Not given  Large  Wide    Low     Incongruent  3.71 (0.19)  3.86 (0.22)  3.73 (0.19)    3.55 (0.19)  3.88 (0.22)  3.96 (0.19)
Not given  Large  Narrow  High    Congruent    3.54 (0.19)  4.10 (0.22)  4.39 (0.19)    3.74 (0.18)  4.12 (0.21)  4.24 (0.18)
Not given  Large  Narrow  High    Incongruent  3.61 (0.19)  4.04 (0.22)  4.08 (0.19)    3.62 (0.18)  4.02 (0.21)  4.30 (0.18)
Not given  Large  Narrow  Low     Congruent    3.38 (0.19)  3.60 (0.23)  4.09 (0.19)    3.48 (0.18)  3.88 (0.21)  4.19 (0.18)
Not given  Large  Narrow  Low     Incongruent  3.51 (0.19)  3.62 (0.23)  4.09 (0.20)    3.67 (0.19)  3.82 (0.22)  4.10 (0.19)
Fig. 5. Marginal means of confidence ratings, as a function of base rate status, subpopulation status, base rate/sample size (BR-N) congruency, and numeracy. Confidence ratings of 1–7 were possible; the y-axis is restricted to the 3–4.5 range to facilitate visual comparison. Error bars represent standard error. Note: Base rate/sample size congruent conditions are also low base rate first conditions, and base rate/sample size incongruent conditions are also high base rate first conditions.
3.3. Conclusions

The current study demonstrates that people can use base rates when making judgments based on samples known to be drawn from different subpopulations. When combining data from different groups with known base rates, many participants gave probability estimates that were consistent with a base rate weighted model. When samples instead appeared to represent a general population, rather than different subgroups, people gave judgments that were more consistent with a weighted average model that incorporates sample size information. Further, participants were sensitive to whether base rates and sample sizes were congruent with one another (i.e. whether large base rates were paired with large sample sizes or not). When these factors were incongruent, responses were split such that some participants favored the base rate information, while others favored sample size information. These results parallel those of Pennycook and Thompson (2012), in which participants' responses were bimodal, with some participants giving responses consistent with base rates and others giving responses consistent with contrasting descriptive information.

Our finding that people are able to use both base rates and sample sizes when making judgments stands in contrast to previous literature suggesting that humans typically are not proficient at using either kind of information (e.g. Kahneman & Tversky, 1972). Instead, these results show that base rate use depends on the causal structure of a problem (Krynski & Tenenbaum, 2007; also see Ajzen, 1977). According to Krynski and Tenenbaum's model of causal Bayesian reasoning, statistical information is more likely to be used when it clearly maps onto a parameter within a causal model than when it does not relate to any parameter. Like Krynski and Tenenbaum, we provided situations in which the same data (e.g. base rates) were given, but found that they were used differently depending on the causal implications provided by other information. When the two percentages within a scenario described different subgroups within an animal kind, the difference between the sample percentages could be attributed to the subpopulations. In this case, their respective base rates were causally relevant; e.g. perhaps the yellow nusuls are especially aggressive, while the green ones are not. In contrast, when the percentages described samples from the same general population, the difference between sample percentages could be attributed to sampling error. In this case, sample size is causally relevant and base rates are not; e.g. since one percentage is based on a smaller sample, it could be less representative of the general population than the other. As Krynski and Tenenbaum point out, the finding that people use base rates differently as a function of a problem's causal structure is not predicted by heuristic accounts (Tversky & Kahneman, 1974) or by natural frequency accounts (Gigerenzer & Hoffrage, 1995). Instead, our data support the contention that humans use data that fit into their causal representations. This finding is consistent with work demonstrating that humans, and even human infants, attend to sampling processes and that these factors affect how inferences are made (Chesney & Obrecht, 2012; Gigerenzer et al., 1988; Kushnir et al., 2010; Rhodes et al., 2008; Xu & Denison, 2009).

In the current study we presented participants with fictitious animals in order to limit their prior knowledge and focus on our variables of interest, while maintaining a natural context that facilitates causal reasoning. However, this study was not without limitations. Although the design increased internal validity, it limited generalizability, because in real life inferences are made within a rich and complex causal context that includes specific knowledge. Also, the within-subjects design employed may have made the variables we manipulated more salient. Further studies could examine the strength of these effects in between-subjects designs, or test whether they change over time and exposure.

We also found that participants' use of base rate and sample size information is linked to their numerical ability. Participants with higher numeracy scores demonstrated more normative use of base rates and sample size, which is consistent with previous work showing that more numerate individuals are better able to use numeric information when making judgments (e.g. Chesney & Obrecht, 2012; Chapman & Liu, 2009; Obrecht et al., 2009). These judgments have important real world implications regarding how statistical information is used to make decisions (see Reyna, Nelson, Han, & Dieckmann, 2009 for a review) and how individuals communicate numerical risk information
(Anderson, Obrecht, Chapman, Driscoll, & Schulkin, 2011). It could be that people higher in numeracy also have a higher need for cognition (Cokely & Kelly, 2009) and gain more meaning from numbers (Peters et al., 2006), which allows them to respond in a more normative manner. Although numeracy correlates with presumably stable personality traits, such as need for cognition, training studies in both the developmental literature (Räsänen, Salminen, Wilson, Aunio, & Dehaene, 2009) and adult literature (Fong, Krantz, & Nisbett, 1986) have demonstrated that numerical education may, respectively, promote numerical comparison skills and improve statistical inference. Thus, it may be fruitful to further examine how interventions that improve numeracy may lead to improved numerical judgments. Nevertheless, despite the lack of intervention or training in our study, participants at all numeracy levels demonstrated some consideration of base rates.

In sum, these data indicate 1) People do utilize base rate information when combining samples to make an inference; 2) People integrate base rates into their judgments only when sampling information indicates base rates are relevant to the inference; and 3) Numerical ability plays a role in the extent to which base rates and sample sizes are used. Moreover, we find that people are able to consider base rate and sample size information at the same time. It appears that people's ability to use base rates is not as limited as previously suspected.

Acknowledgments

This research was partially supported by a Summer Research Stipend awarded to the first author by the Research Center for the Humanities and Social Sciences at William Paterson University. We thank Nicole McNeil, Gretchen Chapman, and Rochel Gelman for their support, Ezana Taddese for his help with data collection, and Chris Doran for his assistance with coding.

References

Ajzen, I. (1977). Intuitive theories of events and the effect of base-rate information on prediction. Journal of Personality and Social Psychology, 35, 303–314. http://dx.doi.org/10.1037/0022-3514.35.5.303
Anderson, B. L., Obrecht, N. A., Chapman, G. B., Driscoll, D. A., & Schulkin, J. (2011). Physicians' communication of Down syndrome screen test results: The influence of physician numeracy. Genetics in Medicine, 13, 744–749. http://dx.doi.org/10.1097/GIM.0b013e31821a370f
Bar-Hillel, M. (1979). The role of sample size in sample evaluation. Organizational Behavior and Human Performance, 24, 245–257. http://dx.doi.org/10.1016/0030-5073(79)90028-X
Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44, 211–233. http://dx.doi.org/10.1016/0001-6918(80)90046-3
Bar-Hillel, M., & Fischhoff, B. (1981). When do base rates affect predictions? Journal of Personality and Social Psychology, 41, 671–680. http://dx.doi.org/10.1037/0022-3514.41.4.671
Bernoulli, J. (1713). Ars conjectandi. Basel: Thurnisius. Edith Sylla's English translation, The Art of Conjecturing, together with Letter to a Friend on Sets in Court Tennis, Johns Hopkins University Press (Oscar Sheynin's English translation of Part IV, dated 2005, is at www.sheynin.de).
Bramwell, R., West, H., & Salmon, P. (2006). Health professionals' and service users' interpretation of screening test results: Experimental study. British Medical Journal, 333, 284–289. http://dx.doi.org/10.1136/bmj.38884.663102.AE
Brase, G. L. (2008). Frequency interpretation of ambiguous statistical information facilitates Bayesian reasoning. Psychonomic Bulletin & Review, 15, 284–289. http://dx.doi.org/10.3758/PBR.15.2.284
Chapman, G. B., & Liu, J. (2009). Numeracy, frequency, and Bayesian reasoning. Judgment and Decision Making, 4, 34–40.
Chesney, D. L., & Obrecht, N. A. (2011). Adults are sensitive to variance when making likelihood judgments. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 3134–3139). Austin, TX: Cognitive Science Society.
Chesney, D. L., & Obrecht, N. A. (2012). Statistical judgments are influenced by the implied likelihood that samples represent the same population. Memory & Cognition, 40, 420–433. http://dx.doi.org/10.3758/s13421-011-0155-3
Cokely, E. T., & Kelly, C. M. (2009). Cognitive abilities and superior decision making under risk: A protocol analysis and process model evaluation. Judgment and Decision Making, 4, 20–33.
Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249–267). Cambridge, England: Cambridge University Press.
Evans, J. St. B. T., & Dusoir, A. E. (1977). Proportionality and sample size as factors in intuitive statistical judgments. Acta Psychologica, 41, 129–137. http://dx.doi.org/10.1016/0001-6918(77)90030-0
Evans, J. St. B. T., Handley, S. H., Perham, N., Over, D. E., & Thompson, V. A. (2000). Frequency versus probability formats in statistical word problems. Cognition, 77, 197–213. http://dx.doi.org/10.1016/S0010-0277(00)00098-6
Fiedler, K., Brinkmann, B., Betsch, T., & Wild, B. (2000). A sampling approach to biases in conditional probability judgments: Beyond base rate neglect and statistical format. Journal of Experimental Psychology: General, 129, 399–418. http://dx.doi.org/10.1037/0096-3445.129.3.399
Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on thinking about everyday problems. Cognitive Psychology, 18, 253–292. http://dx.doi.org/10.1016/0010-0285(86)90001-0
Fouriezos, G., Rubenfeld, S., & Capstick, G. (2008). Visual statistical decisions. Perception & Psychophysics, 70, 456–464. http://dx.doi.org/10.3758/PP.70.3.456
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance, 14, 513–525. http://dx.doi.org/10.1037/0096-1523.14.3.513
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704. http://dx.doi.org/10.1037/0033-295X.102.4.684
Girotto, V., & Gonzalez, M. (2001). Solving probabilistic and statistical problems: A matter of information structure and question form. Cognition, 78, 247–276. http://dx.doi.org/10.1016/S0010-0277(00)00133-5
Irwin, W. F., Smith, W. A. S., & Mayfield, J. F. (1956). Tests of two theories of decision in an "expanded judgment" situation. Journal of Experimental Psychology, 51, 261–268. http://dx.doi.org/10.1037/h0041911
Jacobs, J. E., & Narloch, R. H. (2001). Children's use of sample size and variability to make social inferences. Journal of Applied Developmental Psychology, 22, 311–331. http://dx.doi.org/10.1016/S0193-3973(01)00086-7
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430–454. http://dx.doi.org/10.1016/0010-0285(72)90016-3
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582–591. http://dx.doi.org/10.1037/0033-295X.103.3.582
Krosnick, J. A., Li, E., & Lehman, D. R. (1990). Conversational conventions, order of information acquisition, and the effect of base rates and individuating information on social judgments. Journal of Personality and Social Psychology, 59, 1140–1152. http://dx.doi.org/10.1037/0022-3514.59.6.1140
Krynski, T. R., & Tenenbaum, J. B. (2007). The role of causality in judgment under uncertainty. Journal of Experimental Psychology: General, 136, 430–450. http://dx.doi.org/10.1037/0096-3445.136.3.430
Kushnir, T., Xu, F., & Wellman, H. M. (2010). Young children use statistical sampling to infer the preferences of others. Psychological Science, 21, 1134–1140. http://dx.doi.org/10.1177/0956797610376652
Lipkus, I. M., Samsa, G., & Rimer, B. K. (2001). General performance on a numeracy scale among highly educated samples. Medical Decision Making, 21, 37–44. http://dx.doi.org/10.1177/0272989X0102100105
Macchi, L. (2000). Partitive formulation of information in probabilistic problems: Beyond heuristics and frequency format explanations. Organizational Behavior and Human Decision Processes, 82, 217–236. http://dx.doi.org/10.1006/obhd.2000.2895
Masnick, A. M., & Morris, B. J. (2008). Investigating the development of data evaluation: The role of data characteristics. Child Development, 79, 1032–1048. http://dx.doi.org/10.1111/j.1467-8624.2008.01174.x
Neace, W. P., Michaud, S., Bolling, L., Deer, K., & Zecevic, L. (2008). Frequency formats, probability formats, or problem structure? A test of the nested-sets hypothesis in an extensional reasoning task. Judgment and Decision Making, 3, 140–152.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90, 339–363. http://dx.doi.org/10.1037/0033-295X.90.4.339
Obrecht, N. A. (2010). Sample size weighting in probabilistic inference (Doctoral dissertation). Rutgers University, New Brunswick.
Obrecht, N. A., Anderson, B., Schulkin, J., & Chapman, G. B. (2012). Retrospective frequency formats promote consistent experience-based Bayesian judgments. Applied Cognitive Psychology, 26, 436–440. http://dx.doi.org/10.1002/acp.2816
Obrecht, N. A., Chapman, G. B., & Gelman, R. (2007). Intuitive t-tests: Lay use of statistical information. Psychonomic Bulletin & Review, 14, 1147–1152. http://dx.doi.org/10.3758/BF03193104
Obrecht, N. A., Chapman, G. B., & Gelman, R. (2009). An encounter frequency account of how experience affects likelihood estimation. Memory & Cognition, 37, 632–643. http://dx.doi.org/10.3758/MC.37.5.632
Obrecht, N. A., Chapman, G. B., & Suárez, M. T. (2010). Laypeople do use sample variance: The effect of embedding data in a variance-implying story. Thinking and Reasoning, 16, 26–44. http://dx.doi.org/10.1080/13546780903416775
Pennycook, G., & Thompson, V. A. (2012). Reasoning with base-rates is routine, relatively effortless and context-dependent. Psychonomic Bulletin & Review, 19, 528–534. http://dx.doi.org/10.3758/s13423-012-0249-3
Peters, E., Vastfjall, D., Slovic, P., Mertz, C. K., Mazzocco, K., & Dickert, S. (2006). Numeracy and decision making. Psychological Science, 17, 407–413. http://dx.doi.org/10.1111/j.1467-9280.2006.01720.x
Räsänen, P., Salminen, J., Wilson, A. J., Aunio, P., & Dehaene, S. (2009). Computer-assisted intervention for children with low numeracy skills. Cognitive Development, 24, 450–472. http://dx.doi.org/10.1016/j.cogdev.2009.09.003
Reyna, V. F., Nelson, W. L., Han, P. K., & Dieckmann, N. F. (2009). How numeracy influences risk comprehension and medical decision making. Psychological Bulletin, 135, 943–973. http://dx.doi.org/10.1037/a0017327
Rhodes, M., Brickman, D., & Gelman, S. A. (2008). Sample diversity and premise typicality in inductive reasoning: Evidence for developmental change. Cognition, 108, 543–556. http://dx.doi.org/10.1016/j.cognition.2008.03.002
Schwarz, N., Strack, F., Hilton, D. J., & Naderer, G. (1991). Base rates, representativeness, and the logic of conversation: The contextual relevance of "irrelevant" information. Social Cognition, 9, 67–84. http://dx.doi.org/10.1521/soco.1991.9.1.67
Sedlmeier, P. (1998). The distribution matters: Two types of sample-size tasks. Journal of Behavioral Decision Making, 11, 281–301. http://dx.doi.org/10.1002/(SICI)1099-0771(1998120)11:4<281::AID-BDM302>3.0.CO;2-U
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. http://dx.doi.org/10.1126/science.185.4157.1124
Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology (pp. 84–98). Hillsdale, NJ: Erlbaum.
Xu, F., & Denison, S. (2009). Statistical inference and sensitivity to sampling in 11-month-old infants. Cognition, 112, 97–104. http://dx.doi.org/10.1016/j.cognition.2009.04.006
Yamagishi, K. (2003). Facilitating normative judgments of conditional probability: Frequency or nested sets? Experimental Psychology, 50, 97–106. http://dx.doi.org/10.1026//1618-3169.50.2.97