Judgments of associative memory

Cognitive Psychology 54 (2007) 319–353 www.elsevier.com/locate/cogpsych

William S. Maki *

Department of Psychology, Texas Tech University, Lubbock, TX 79409-2051, USA

Accepted 4 August 2006
Available online 18 September 2006

Abstract

Judgments of associative memory (JAM) were indexed by ratings given to pairs of cue and response words. The normed probabilities, p(response|cue), were obtained from free association norms. The ratings were linearly related to the probabilities. The JAM functions were characterized by high intercepts (50 on a 100-point scale) and shallow slopes (<0.5). The JAM function generalized across materials and method of rating. The function was not affected by expectancies or semantic similarity. Attempts to alter the function by making alternative responses more available were unsuccessful. A computer simulation model (MINERVA 2) exhibited the linear JAM function and successfully accounted for more complex phenomena (like the joint influence of forward and backward associative strengths on ratings). The shallow JAM slope appears to result from a fundamental lack of discrimination among associative strengths. The high intercept appears to result partly from an independent post-mnemonic source of bias producing over-estimation of association.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Memory; Memory judgments; Association; Associative memory; Metacognition; MINERVA

Experiments 1 and 2 were presented at the October 2003 meeting of ARMADILLO in College Station, TX. All the experiments and a preliminary version of the computational model were summarized in a poster presented at the meeting of the Psychonomic Society, Minneapolis, November 19, 2004. The version of the model described in this paper was presented at the meeting of the Society for Computers in Psychology, Toronto, November 10, 2005. For their assistance with the data collection, I thank Nicole Holstrom, Michael Mebane, and Amber Thompson. I am deeply indebted to Randy Engle, Ruth Maki, Cathy McEvoy, and Doug Nelson for their stimulating thoughts about “JAM” and its interpretation. Thanks also to Doug Nelson for sharing the rating data depicted in Fig. 9 and to Asher Koriat for permission to preview the results reported in Koriat, Fiedler, and Bjork (2006).

* Fax: +1 806 742 0818. E-mail address: [email protected].

0010-0285/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.cogpsych.2006.08.002

W.S. Maki / Cognitive Psychology 54 (2007) 319–353

1. Introduction

The structure of associative memory is a topic of long-standing theoretical and empirical interest in psychology (Deese, 1965; Esper, 1973). The associative links between words are presumed to arise from our experience with the co-occurrences of those words in written and spoken language (e.g., Spence & Owens, 1990). Typically, the links have been assessed by use of the method of free association. In free association tests, people are instructed to respond to a stimulus word (“cue”) with the first response word that comes to mind. A frequency distribution accumulated over a large number of participants is the basis for computing the probability that a response word occurs given a cue. These probabilities are then interpreted as the strengths of the associations between a cue and its response words. The most current and most ambitious such undertaking is the set of free association norms compiled over many years by Nelson, McEvoy, and Schreiber (2004).

Another technique for assessing word associations involves presenting both cue and response words and asking for an associative judgment (Garskof & Forrester, 1966; Haagen, 1949; Kamman, 1968; Koriat et al., 2006; Nelson, Dyrdal, & Goodmon, 2005). Garskof and Forrester (1966), for example, asked their participants to “imagine a group of 100 students and to try to estimate how many students would say the second word as their first association to the first word” (p. 503).

The relationship between free association and associative judgments is not clear. On the face of it, the two methods should be measuring the same thing: associative strength. However, judgments of probability and frequency tend to be victimized by judgmental biases such as the availability heuristic (Tversky & Kahneman, 1973). So judgments of associative strength may produce different answers than those found with the method of free association.
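The norming computation described above (tallying first responses to a cue and converting the counts to probabilities) can be sketched in a few lines. This is only an illustration: the response list is invented and is not taken from the Nelson et al. norms, and the function name forward_strengths is hypothetical.

```python
from collections import Counter

def forward_strengths(responses):
    """Return p(response | cue) for each response produced to one cue."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical first responses from 10 participants to the cue CAT
# (invented data, for illustration only).
cat_responses = ["MOUSE"] * 4 + ["DOG"] * 5 + ["FUR"]
strengths = forward_strengths(cat_responses)
print(strengths["MOUSE"])  # 0.4
```

In this toy sample, 4 of the 10 participants answer MOUSE, so p(MOUSE|CAT) is 0.4; real norms accumulate the same kind of tally over many more participants per cue.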
One study reporting such judgments was the particular motivation for the research to be reported here. Koriat (1981) provided some casual observations, in his Discussion section, of an associative judgment study. He presented pairs of words that varied considerably in normed associative strength and asked “people” to guess the probability that the second word would be the response to the first word in free association norms. Koriat observed that “Most people who were asked to guess these probabilities underestimated greatly the differences among the pairs” (Koriat, 1981, p. 597).

Koriat’s brief report does not illuminate exactly how the reduced discrimination of associative strengths comes about. Three possibilities are illustrated in Fig. 1. Associative judgments are represented on the vertical axis and plotted against normed strengths represented on the horizontal axis. The 45° line, with a zero intercept and unit slope, is what would be expected if observers were perfectly calibrated. Reduced discriminability would be reflected in functions with shallower slopes. The three functions depicted in Fig. 1 all have shallow slopes, but the reduced discriminability occurs for different reasons. The topmost function arises from severe overestimation of the associative strengths of weakly associated pairs. The middle function arises from overestimating strengths of weak pairs and underestimating strengths of strong pairs (regression to the mean). The bottom function arises from underestimating strengths of strongly associated pairs. The answer to the question of which of these functions might have been operating in Koriat’s study is speculative. There is a tendency toward overestimation in other domains of human judgments (Dunning, Johnson, Ehrlinger, & Kruger, 2003; Levin, Momen, Drivdahl, & Simons, 2000), but there are instances of regression toward the mean in frequency judgments

[Figure 1. Vertical axis: rated strength (0–100). Horizontal axis: normed associative strength (0–100). Plotted functions: Overestimate, Regression, Underestimate.]

Fig. 1. Possible functions relating rated associative strength to normed associative strength. The dashed line indicates perfect calibration where the ratings exactly equal the norms. The other functions are all alternative ways for decreasing discriminability among pairs with differing normed associative strengths.

(Tversky & Kahneman, 1973, p. 225) and recently reported instances of underestimation in studies of metacomprehension (Maki, Wheeler, & Zacchilli, 2004).

Koriat also suggested an explanation for the reduced discriminability of associative strengths. “Apparently, when both the stimulus and response terms are presented there is a tendency to ignore other possible responses to the stimulus word” (Koriat, 1981, p. 597). Here, too, there are multiple interpretations of Koriat’s suggestion. One possibility is that there is some judgmental bias towards over-reacting to pairs of words that are associatively related (as in Tversky & Kahneman, 1973; see also Koriat & Bjork, 2005). Another possibility is that the presence of the response word in a to-be-judged pair of words interferes with (or inhibits) retrieval of other possible responses (Dunlosky & Nelson, 1997).

The present series of experiments was undertaken for both empirical and theoretical reasons. The initial goals were to document the reduced discriminability of associative strengths during judgments of associative memory and to develop an empirical basis for choosing among the possible functions relating rated and normed associative strengths presented in Fig. 1. A third goal was to test the retrieval inhibition notion mentioned above. In Experiment 1, Koriat’s basic observation showing reduced discriminability among associative strengths was replicated and the source was found to be gross overestimation of strengths of weakly associated pairs. The overestimation generalizes across rating methods (Experiment 1) and word pairs differing in ranges and distributions of associative strengths (Experiments 2 and 3). The overestimation is not due to expectancies based on the context of many related pairs of words (Experiment 4). Nor does the overestimation appear to result from semantic relatedness (Experiment 5). The final two experiments (6 and 7) involved different manipulations that were intended to reduce the

hypothesized retrieval inhibition by making alternative responses more available; both manipulations failed to reduce the observed overestimation. These results prompted consideration of a different possibility: the robust overestimation of weakly associated pairs may not be a judgmental artifact but instead may be a natural product of our memory system. In support of that view, an extension of a multiple-trace model of memory (Hintzman, 1984) is developed. The model produces a function relating judged strength to free association norms that, like the observed functions, has a high intercept and shallow slope. In the closing discussion, the empirical observations and model are considered in the broader context of the literature on memory judgments.

2. Experiment 1

The aim of the first experiment was to replicate Koriat’s (1981) observations concerning the reduced discriminability of associative strengths during associative judgment tasks. In one version of the experiment, participants assigned numbers in the range 0–100 indicating the number of students, out of 100, who would give a word in response to a cue word. In the second version of the experiment, participants rated associative strengths on a 10-point scale. The two methods produced nearly identical results, both confirming Koriat’s observations and pinpointing the overestimation function in Fig. 1 as the source of the reduced discriminability of associative strengths.

2.1. Method

2.1.1. Participants

In Experiment 1a, 58 students were recruited from the Texas Tech University Psychology Department human subject pool (13 males and 45 females); they were compensated for service by course credit. In Experiment 1b, 33 females and 18 males were recruited from the same pool.

2.1.2. Materials

Forty pairs of words were selected on the basis of the Nelson et al. (2004) word association norms. The pairs differed in both forward and backward strengths. Forward strength (FSG) is the probability of a response word given a cue word, e.g., p(MOUSE|CAT). Backward strength (BSG) is the probability that a different group of subjects will produce the cue word given the response word, e.g., p(CAT|MOUSE). FSGs and BSGs were categorized as high (H), medium (M), or low (L). High strengths averaged 0.70 (range: 0.60–0.82), medium strengths averaged 0.40 (range: 0.35–0.48), and low strengths averaged 0.10 (range: 0.01–0.19). Two rating forms were created. Each form was constructed to have the following combinations of forward-backward strengths: H–H, M–H, L–H, H–L, M–L, L–L, H–M, and L–M. Each form included eight H–H pairs, eight L–L pairs, and four pairs of each of the other types. The pairs were printed on the rating form in a random order. The second form was created by reversing cues and responses within each pair and printing the pairs in a different random order. Pairs were selected so as to be roughly equal in word frequency for both cue words (M = 29, range: 1–216) and response words (M = 39, range: 1–320). Neither cue nor response frequencies differed among the categories of pairs, both Fs < 1. Cue set size (QSS) and target set size (TSS), however, were not controlled with the result that both

QSS and TSS differed significantly among types of pairs, F(4, 35) = 14.92 and F(4, 35) = 7.05, respectively. Across pair types, QSS averaged 13.0 and ranged from 8.7 for H–H to 21.9 for L–L pairs. TSS averaged 14.3 and ranged from 8.1 (H–H) to 20.2 (L–L). (QSS and TSS were controlled in Experiments 2 and 3.)

2.1.3. Procedure

Participants were tested in classrooms seating groups of 20–30. They first read a paragraph describing associative memory and its assessment using the free association procedure. To reinforce the written presentation, the students were given a small demonstration of the free association test in which they were instructed to write the first word that came to mind in response to the cue words LOST, OLD, and WORLD. The results of that test were revealed by a show of hands measuring the frequencies of the responses FOUND, NEW, and GLOBE. These cue-response pairs were selected so as to demonstrate large differences in normed forward strength (0.75, 0.47, and 0.18, respectively). In each testing group, the proportions of students raising their hands to the named responses roughly corresponded to those forward strengths.

Each student was presented with one of the two forms containing 40 pairs of cue words and response words. The instructions, both oral and written, were: “Assume 100 college students from around the nation gave responses to each CUE word. How many of these 100 students do you think would have given the RESPONSE word?” The students then wrote their numerical estimates beside each pair of words on the form.

Experiment 1b was conducted exactly as described above except that the students marked their responses on an optical scanning form. The rating scale was partitioned into 10 categories (labeled with the digits 0–9). Each category represented a range of number of responses to the cue word: 0–9, 10–19, ..., 90–100.

2.1.4. Data analyses

Categorical ratings in Experiment 1b ranged from 0 to 9 (the values on the optical scanning form). The ratings were multiplied by 10 to create a scale comparable to that in Experiment 1a (0–100). Because category 0 represents the range 0–9, the mean of that range (4.5) was added to each rating. Associative strengths were multiplied by 100, thus converting both dependent and independent measures to 100-point scales.

The significance of differences among pairs differing in forward and backward strengths was assessed by analyses of variance. In each experiment (1a and 1b), the analysis included forward strength (high, medium, low) and backward strength (high, low) as factors. (The medium backward strength condition was not included because the design was not completely balanced; medium backward strengths appeared in only two conditions.) In all of the experiments to be reported, two kinds of analyses were conducted, one using interactions with subjects as error terms and one using interactions with items. In the first kind, ratings were averaged over pairs of words within conditions; in the second kind, ratings were averaged over subjects for each pair of words. The reporting of statistical significance will focus on the analyses using subjects in the error terms. The analyses using items will be reported only when such analyses yielded a different pattern of significance. The significance level was set at p < .05 throughout.

In all the analyses, the main effect of forward strength was attributable to strong linear trends that accounted for most of the variance (R²s over 0.90). Linear regressions were plotted to portray those trends. The significance of linear and quadratic trend components

was assessed by computing trend coefficients using the method described by Keppel (1973, Appendix B) for the computation of orthogonal polynomial coefficients. In this experiment, the calculations of orthogonal polynomial coefficients were redundant because equal intervals separated the groups of word pairs with low, medium, and high average forward strengths (0.10, 0.40, and 0.70). But in later experiments, the intervals between strengths were not equal.

2.2. Results

In both Experiments 1a and 1b, mean ratings conformed to the overestimation pattern. Fig. 2 shows that most average ratings were above the calibration line and the degree of overestimation was higher for low forward strengths. Also, pairs with higher backward strengths were given higher ratings. (See Koriat & Bjork, 2005, and Koriat et al., 2006, for related findings.) Regression lines relating average ratings to average forward strengths were fit separately for each level of backward strength in both experiments. The expected fit given a perfectly calibrated set of ratings (the dashed line in Fig. 2) has a zero intercept and unit slope. In contrast, the obtained fits all have intercepts approximating 50 and slopes of less than one-half.

2.2.1. Experiment 1a

The main effects of forward and backward strengths were both significant, F(2, 114) = 119.96, and F(1, 57) = 61.88, as was the interaction, F(2, 114) = 5.86. For pairs with high backward strengths, only the linear trend was reliable. For pairs with low backward strengths, both linear and quadratic trends were significant. However, in the analysis of items, the interaction of forward and backward strengths was not significant, F(2, 58) = 0.94.

2.2.2. Experiment 1b

The same pattern was observed using the 10-point category rating method. Both main effects of forward and backward strengths were significant, F(2, 100) = 142.30, and F(1, 50) = 67.96; the interaction was significant as well, F(2, 100) = 12.54. The linear trend was significant for both high and low backward strengths, but the quadratic trend was reliable only for low backward strength pairs. Again, as in Experiment 1a, the interaction of forward and backward strengths was not significant when using interactions with items as error terms, F(2, 58) = 2.22.

2.2.3. Free association data

In all the experiments to be reported, participants responded to three cue words when the free association method was introduced in the instructions at the start of the experimental session. These data, while limited to three cue words, allow for a check on the extent to which our participant pool behaves normatively, where normative is defined by the Nelson et al. (2004) free association norms. In each of the experiments, the proportions of participants responding with the critical words mentioned in the instructions were close approximations to the normed probabilities. For example, in this experiment (using data from all 109 participants), the proportion of participants responding to the cue word LOST with FOUND was 0.734. The proportion responding to the cue word OLD with NEW was

[Figure 2, panel a. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Backward strength legend: High, Medium, Low. Fitted lines: JAM = 51.93 + 0.469 FSG, R² = 0.984; JAM = 44.19 + 0.481 FSG, R² = 1; JAM = 44.90 + 0.388 FSG, R² = 0.953.]

[Figure 2, panel b. Same axes and legend. Fitted lines: JAM = 54.66 + 0.430 FSG, R² = 0.995; JAM = 52.28 + 0.398 FSG, R² = 1; JAM = 51.23 + 0.277 FSG, R² = 0.912.]

Fig. 2. Functions relating rated strength to normed strength in Experiment 1a (a) and 1b (b). Pairs differed in backward strengths (high, medium, or low) as well as in normed forward strengths. Participants rated cue-response pairs in Experiment 1a on a 100-point scale, so for the perfectly calibrated participant the expected intercept is zero. Participants rated pairs on a 10-point scale (0–9) in Experiment 1b, so the expected intercept is the midpoint of the lowest category (4.5 on a 100-point scale).

0.495. The proportion responding to the cue word WORLD with GLOBE was 0.128. The corresponding probabilities from the association norms are 0.747, 0.473, and 0.182. The frequencies of these responses vs. other responses were compared between the Nelson et al. norms and our data with χ² tests; none of the comparisons was significant, indicating that our proportions are reasonable estimates of those reported by Nelson et al.
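The comparison with the norms can be illustrated with a simplified calculation. The sketch below is not the test reported above, which compared response frequencies between two samples; because the sample sizes behind the norms are not given here, it assumes the normed probabilities are fixed values and computes a one-degree-of-freedom goodness-of-fit χ² for each of the three pairs. The function name and the rounding of proportions to whole counts are assumptions made for illustration.

```python
def chi_square_gof(n, observed_prop, normed_prob):
    """One-df chi-square comparing an observed response count with the count
    expected under the normed probability (norms treated as fixed values)."""
    hits = round(observed_prop * n)           # participants giving the response
    observed = [hits, n - hits]
    expected = [normed_prob * n, (1 - normed_prob) * n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_05_DF1 = 3.841  # chi-square critical value for df = 1, alpha = .05

n = 109  # participants in Experiments 1a and 1b combined
pairs = {
    "LOST-FOUND": (0.734, 0.747),
    "OLD-NEW": (0.495, 0.473),
    "WORLD-GLOBE": (0.128, 0.182),
}
results = {name: chi_square_gof(n, obs, norm)
           for name, (obs, norm) in pairs.items()}
for name, value in results.items():
    print(name, round(value, 2), value < CRITICAL_05_DF1)
```

Under this simplifying assumption all three statistics fall below the critical value, which is in line with the nonsignificant comparisons reported in the text.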

2.3. Discussion

The results from this experiment support the following conclusions. First, Koriat’s report of reduced discriminability of associative strengths revealed in a judgment task is replicable. Second, the reduced discriminability is related to overestimation of weakly related pairs of words. Third, this overestimation generalizes across the specifics of the rating task. And, fourth, the associative judgments are influenced jointly by forward and backward strength.

This last conclusion needs some qualification. The interaction between forward and backward strengths was significant when the error terms in the analyses were based on subjects but not when the analyses were based on items. Use of a small number of items (40 pairs) may have reduced the sensitivity of the item analyses. The observed statistical power for the interaction with subjects in the error term was 0.866 and 0.996 for Experiments 1a and 1b, respectively; the corresponding observed power for the interaction with items in the error term was 0.206 and 0.436. The interaction of forward and backward strengths will be one of the constraints needing to be satisfied by any model relating associative judgments to free association.

In the remainder of the paper the judgments of associative memory task is nicknamed JAM, and the function relating rated and normed associative strengths is called the JAM function.

3. Experiment 2

The second experiment was meant to examine the extent to which the JAM function obtained in Experiment 1 might generalize across materials. The question was whether the JAM function would be found with different pairs of words distributed over a smaller range of associative values with word characteristics (such as QSS and TSS) better controlled.

3.1. Method

The experiment was conducted exactly like Experiment 1a but with new participants and a different set of materials.
Participants (12 males and 34 females) were recruited from the same human subject pool; they assigned numbers (0–100) to pairs of words obtained from the appendix in Nelson, McEvoy, and Pointer (2003). The cue-response pairs varied in forward strength; one-third of the pairs averaged 0.40, one-third averaged 0.17, and the remaining third averaged 0.07. Backward strength was minimized, averaging 0.05 across all pairs. Within each level of forward strength, the pairs were further classified by a combination of resonance (high or low) and connectivity (high or low). (Resonance is a measure of connection strengths between a response word and its associates. Connectivity is a measure of connection strengths among the associates of the response word in each pair.) Across combinations of forward strength, resonance, and connectivity, pairs were equated on other variables such as word frequency and QSS and TSS (see Nelson et al., 2003, for details). The pairs of words were split into two lists, 72 in each. Within each list, each of six randomized blocks contained one instance of each combination of strength, resonance, and connectivity. Two forms were prepared with different random orders within blocks.
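The blocked structure of each list can be sketched as follows. This is a hypothetical reconstruction of the layout only: it shuffles condition labels rather than the actual word pairs, and the function name, level labels, and seed are assumptions introduced for illustration.

```python
import itertools
import random

STRENGTHS = ["0.40", "0.17", "0.07"]   # mean forward strength of each level
RESONANCE = ["high", "low"]
CONNECTIVITY = ["high", "low"]

def build_list(n_blocks=6, seed=0):
    """Return n_blocks blocks; each block holds every combination of
    strength, resonance, and connectivity once, in a random order."""
    rng = random.Random(seed)
    cells = list(itertools.product(STRENGTHS, RESONANCE, CONNECTIVITY))
    blocks = []
    for _ in range(n_blocks):
        block = cells[:]        # one instance of each of the 12 combinations
        rng.shuffle(block)      # random order within the block
        blocks.append(block)
    return blocks

blocks = build_list()
print(sum(len(block) for block in blocks))  # 72
```

Three strength levels crossed with two levels each of resonance and connectivity give 12 combinations, so six blocks yield the 72 pairs per list described above. A second form would correspond to calling build_list with a different seed.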

[Figure 3. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Fitted line: JAM = 47.70 + 0.367 FSG, R² = 0.926.]

Fig. 3. Function relating rated strength to normed strength in Experiment 2. Ratings were on a 100-point scale.

3.2. Results and discussion

The pattern of ratings was very similar to that observed in Experiment 1; rated strengths were substantially overestimated. Fig. 3 shows average ratings as a function of forward strength (averaged across all other classifications). The linear fit has parameters similar to those obtained in Experiment 1: an intercept of nearly 50 and a slope of 0.38, a value much less than unity. The main effect of forward strength was significant, F(2, 90) = 88.66; both linear and quadratic components of trend were significant.

The replication of the (mostly linear) JAM function with the Nelson et al. (2003) word pairs shows that the function does not depend on specific word pairs or the range of associative values used. In this experiment the average associative strengths of the word pairs in the lowest and highest strength categories were 0.07 and 0.40; in Experiment 1, the corresponding averages were 0.10 and 0.70. Word frequency was controlled in both experiments, and set sizes were also controlled in Experiment 2. Thus, it does not seem likely that the JAM function is a result of such stimulus characteristics.

4. Experiment 3

In the preceding experiments, the normed strength continuum was treated categorically. For example, in Experiment 1, forward strength categories were centered at averages of 0.1, 0.4, and 0.7. One might then ask whether associative judgments are sensitive to the distribution of strengths. In this experiment, a new sample consisted of 90 pairs of words that were more nearly continuous in their forward strength values.
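Before turning to the method, it may help to see what a high intercept and shallow slope imply numerically. The sketch below takes the fitted parameters shown for Experiment 2 in Fig. 3 (JAM = 47.70 + 0.367 FSG) and computes the predicted rating and the amount of overestimation at each category mean, along with the point where the fitted line would cross the calibration line. The function names are hypothetical, and extrapolating the line beyond the strengths actually sampled is an assumption.

```python
def jam(intercept, slope, fsg):
    """Predicted rating (0-100) for a normed forward strength (0-100)."""
    return intercept + slope * fsg

def crossover(intercept, slope):
    """Forward strength at which the fitted line meets the calibration line
    (rating == strength), found by solving intercept + slope * x = x."""
    return intercept / (1 - slope)

# Fitted parameters shown for Experiment 2 in Fig. 3.
INTERCEPT, SLOPE = 47.70, 0.367

overestimates = {}
for fsg in (7, 17, 40):  # category means of forward strength, x 100
    predicted = jam(INTERCEPT, SLOPE, fsg)
    overestimates[fsg] = predicted - fsg
    print(fsg, round(predicted, 1), round(overestimates[fsg], 1))

print(round(crossover(INTERCEPT, SLOPE), 1))
```

The predicted overestimation shrinks as normed strength grows, and the fitted line would not reach the calibration line until a forward strength of roughly 75, well above the strongest pairs used in Experiment 2.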

4.1. Method

Ten pairs of words were selected from each of nine forward strength bands (0.0–0.1, 0.1–0.2, ..., > 0.8) with both cue and response words in the frequency range 1–500. For each of two forms, 10 blocks of pairs were created. For each block, one pair was randomly selected from each strength band and pairs were randomly ordered within blocks. Otherwise, procedures were identical to those used in Experiment 1b with the ratings done on 10-point scales with responses marked on optical scanning forms. Participants were recruited from the same subject pool as in previous experiments (7 males and 13 females).

4.2. Results and discussion

Because the normed strengths were chosen so as to be more nearly continuous than in the previous (and following) experiments, regression analyses were performed on the ratings. For each of the 90 pairs, ratings were averaged over subjects. Several predictor variables were obtained for each pair. Forward, backward, and mediated strengths were obtained from the Nelson et al. (2004) norms as were log transforms of cue and target frequencies and the set size values (QSS and TSS). In addition, the semantic distance separating the words in each pair was computed from the electronic dictionary WordNet (Fellbaum, 1998) using the method described by Maki, McKinley, and Thompson (2004). The distance measure developed by Jiang and Conrath (1997) was computed from WordNet (version 1.7.1) using the computational package, WordNet-Similarity-0.05 (Patwardhan & Pedersen, 2003). Although the measure of semantic distance is computational in nature, Maki et al. (2004) showed that human ratings of semantic similarity were influenced by the distance measure. Thus this measure is a valid means for selecting word pairs for their semantic relatedness. Lastly, a measure of similarity was computed based on cue and target co-occurrence in text (the cosine from latent semantic analyses, LSA; see Landauer & Dumais, 1997).

The mean rating for each pair is plotted against normed (forward) strength in Fig. 4. The most important thing to note about Fig. 4 is the close correspondence of the linear regression line to the results reported in the preceding experiments; the slope is shallow (0.3) and the intercept is again high (50). A hierarchical regression analysis showed that forward and backward strengths and their interaction accounted for significant variance in ratings, R² = 0.613, F(3, 86) = 45.40. The regression coefficients for the forward and backward strengths were b = 0.64 and 0.79, respectively, both ps < .05. For the interaction, b = 0.58, p = 0.08. The remaining variables (entered second) accounted for only a nonsignificant 3.3% of the variance in ratings, Fchange(7, 79) = 1.04. This pattern resulted from the fact that JAM was highly correlated with forward strength, r = .743, but only weakly related to semantic distance, r = .192. But JAM was also correlated with LSA (0.529) and LSA was in turn correlated with forward strength (0.560).

A second hierarchical regression analysis began by examining the effect of the two semantic variables (distance and LSA). The semantic variables accounted for significant variance when first entered, R² = 0.308, F(2, 87) = 19.32. When the associative variables and their interaction were entered next, the variance explained increased significantly, R² = 0.614, Fchange(3, 84) = 22.19, but the regression coefficients for semantic distance and LSA were reduced

[Figure 4. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Fitted line: JAM = 51.53 + 0.334 FSG, R² = 0.553.]

Fig. 4. Scatterplot of data obtained in Experiment 3. Each point represents the average rating (0–9) given a pair of words.

to nonsignificant values, 0.031 and 0.039. The remaining variables accounted for a nonsignificant 3.2% of the variance in JAM, Fchange(5, 79) = 1.43.

As in Experiment 1, normed forward and backward strengths accounted for much of the variance in ratings; also as in Experiment 1, there was a tendency for ratings to be influenced by the interaction of forward and backward strengths. The results support the conclusion in Experiment 2 that word frequency and set size are not factors in associative judgments. Moreover, the semantic variables, distance and LSA, did not contribute to associative judgments beyond what was already accounted for by the associative strength measures; this matter is examined further in Experiment 5.

5. Experiment 4

Even though the JAM function does not depend on the range and distribution of associative values, it still may result from a context in which all word pairs to be judged are associatively related. Judgments may then become inflated, perhaps because of expectancies generated by the associative context. To evaluate the role of such expectancies, the percentage of associatively related pairs was greatly reduced; only 40% of the pairs were associatively related. If expectancies were responsible for the reduced discriminability observed in the previous experiments, then a lower relatedness proportion should steepen the slope and lower the intercept of the function relating rated association to normed association.

5.1. Method

New participants (12 female and 14 male) were recruited from the same human subjects pool. All of the pairs of words used in Experiment 1 other than those with low

backward strengths were used to form unrelated pairs. The cue and response words in those pairs were randomly joined with the constraint that the original pairings were not permitted. The resulting pairs were checked against the association norms; none of the unrelated pairs were found (meaning that all unrelated pairs had estimated strengths of zero). The related pairs had average forward strengths of 0.104, 0.391, and 0.689 for low-, medium-, and high-strength pairs, respectively. The result was that each rating form contained 16 associated pairs (8 high strength, 4 medium strength, 4 low strength) and 24 unassociated pairs. Otherwise, the experiment was conducted exactly like Experiment 1b; the participants rated the pairs of words on the 10-point categorical scale.

5.2. Results and discussion

For the related pairs, mean rated strength was once again linearly related to mean normed strength and, as in the other experiments, the pattern was one of severe overestimation. The mean ratings plotted in Fig. 5 are fit by a linear function with an intercept of 59 and a slope of 0.2 (R² > 0.99). The main effect of forward strength on ratings was significant, F(2, 50) = 14.87, and only the linear component was reliable. The average rating given the unrelated pairs is represented by the black diamond on the ordinate in Fig. 5. Its 95% confidence limits (10.1 and 17.1) indicate that while the rating is significantly above 0 (and above the midpoint of the lowest category, 4.5), it is significantly below the intercept of the linear function for the related pairs.

Both the results for the judgments of related and unrelated pairs suggest that expectancies do not play a role in our associative judgments. The linear JAM function shown in Fig. 5 resembles those observed in the preceding experiments. Whatever bias might be present in the judgments of unrelated pairs is insufficient to explain the degree of overestimation observed for the related pairs.
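The reference values in this comparison can be laid out explicitly. The sketch below applies the rescaling used for the 10-point categorical ratings (multiply the category by 10 and add 4.5) and checks which reference values lie outside the reported 95% confidence limits for the unrelated pairs. It is a rough reading aid rather than a formal test: treating the fitted intercept as a fixed point ignores its own sampling error, and the function names are hypothetical.

```python
def rescale(category):
    """Map a 0-9 category rating onto the 100-point scale by multiplying
    by 10 and adding 4.5, the midpoint of the lowest category (0-9)."""
    return 10 * category + 4.5

def outside(value, lower, upper):
    """True if value lies outside the closed interval [lower, upper]."""
    return value < lower or value > upper

CI_LOWER, CI_UPPER = 10.1, 17.1   # reported 95% limits for unrelated pairs
FLOOR = rescale(0)                # lowest possible rescaled rating
INTERCEPT = 59.24                 # fitted intercept for related pairs (Fig. 5)

checks = {}
for label, value in [("zero", 0.0), ("floor", FLOOR), ("intercept", INTERCEPT)]:
    checks[label] = outside(value, CI_LOWER, CI_UPPER)
    print(label, value, checks[label])
```

All three reference values fall outside the interval: zero and the scale floor lie below it, and the intercept for related pairs lies far above it.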
[Fig. 5 plot. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Fitted line: JAM = 59.24 + 0.205 FSG, R² = 0.998.]

Fig. 5. Function relating rated strength to normed strength in Experiment 4. Ratings were on a 10-point scale. The diamond on the vertical axis is the average rating given associatively unrelated pairs.
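As a worked illustration of the size of the overestimation, the fitted line reported in Fig. 5 can be evaluated at the three mean normed strengths used in Experiment 4. The sketch below is only arithmetic on the reported coefficients. The function and variable names are illustrative, and normed strengths are assumed to be expressed as percentages so that both quantities share the 0 to 100 scale of the figure.

```python
# Fitted JAM function from Fig. 5 (Experiment 4): JAM = 59.24 + 0.205 * FSG.
INTERCEPT = 59.24
SLOPE = 0.205


def predicted_rating(fsg_percent):
    """Rating predicted by the fitted line for a normed strength in percent."""
    return INTERCEPT + SLOPE * fsg_percent


# Mean forward strengths of the low, medium, and high pairs (0.104, 0.391, 0.689),
# rescaled to percentages (an assumption made to match the axes of the figure).
mean_strengths = {"low": 10.4, "medium": 39.1, "high": 68.9}

predictions = {name: predicted_rating(fsg) for name, fsg in mean_strengths.items()}
overestimates = {name: predictions[name] - mean_strengths[name]
                 for name in mean_strengths}

# Normed strength at which the fitted line would meet the identity line
# (predicted rating equal to normed strength).
crossover = INTERCEPT / (1.0 - SLOPE)

for name in mean_strengths:
    print(f"{name:>6}: normed {mean_strengths[name]:5.1f}  "
          f"predicted {predictions[name]:5.1f}  "
          f"overestimate {overestimates[name]:5.1f}")
print(f"predicted ratings exceed normed strength below FSG = {crossover:.1f}")
```

On these coefficients the predicted overestimate shrinks from about 51 points for the low-strength pairs to about 4 points for the high-strength pairs, which is the combination of high intercept and shallow slope described in the text.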


6. Experiment 5

Pairs of words that are associatively related are also often semantically related. The re-pairing manipulation in Experiment 4 might have destroyed both associative and semantic relations between word pairs. But the remaining associatively related pairs might also have been related semantically. Although the correlational analyses in Experiment 3 suggest that the JAM function is not due to semantic relatedness, it seemed prudent to provide experimental evidence on this matter. In this experiment, the influence of semantic relatedness on associative judgments was examined. Word pairs were selected to be high, medium, or low in associative strength and also to be either high or low in semantic similarity.

6.1. Method

A new group of participants, 17 males and 21 females, was recruited from the same human subjects pool. Three of the participants were excluded from data analyses because of errors on the optical scanning forms (leaving data for 35 participants).

A new set of 54 word pairs was culled from the association norms (Nelson et al., 2004). Selection of high, medium, and low strength pairs based on forward strength values was constrained by semantic distance measurements. The pairs used in this experiment were selected so as to have very high or very low semantic distance values (computed using the method of Maki et al., 2004). Of the qualifying pairs, nine were selected in each of the six combinations of forward strength and semantic distance values. The means and ranges for each of the six combinations are given in Table 1. To check on their independence, a 2 × 3 analysis of variance was performed on each of the measures. For forward strength, only the strength main effect was significant; for semantic distance, only the distance main effect was reliable. Expressed as effect sizes (partial η²), these main effects were very large (both > 0.95) and the other effects and interactions were very small (all < 0.04).
In all respects not mentioned here, the experiment was procedurally identical to Experiment 1b. Participants rated the associative strength of the pairs of words using the 10-category scale.

Table 1
Means and ranges for association strengths and semantic distances for word pairs in Experiment 5

                      Strength
Distance   Measure    Low                 Medium              High
Low        FSG        0.13 (0.11–0.18)    0.37 (0.35–0.44)    0.71 (0.60–0.89)
           JCN        21.9 (20.2–24.5)    21.5 (20.2–24.5)    21.7 (20.2–23.3)
High       FSG        0.15 (0.13–0.18)    0.40 (0.35–0.45)    0.69 (0.61–0.80)
           JCN        1.7 (0–3.5)         2.0 (0–4.1)         1.9 (0–4.6)

Note. Forward strength is abbreviated FSG and semantic distance is abbreviated JCN.

6.2. Results and discussion

The comparison of interest is the effect of semantic similarity on the linear function relating rated associative strength to normed associative strength. Two such functions are plotted in Fig. 6, one for low semantic similarity (high distance values) and one for high semantic similarity (low distance values). The functions are nearly indistinguishable, having almost identical intercepts (about 51) and slopes (about 0.25). These values are within the range of those observed in the previous experiments. Thus semantic similarity appears to play little if any role in associative judgments once the effects of associative strength are factored out (a conclusion also reached by Garskof & Forrester, 1966).

[Fig. 6 plot. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Two series, Lo-Sim and Hi-Sim; fitted lines shown: JAM = 51.69 + 0.281 FSG (R² = 0.971) and JAM = 51.34 + 0.259 FSG (R² = 0.999).]

Fig. 6. Functions relating rated strength to normed strength in Experiment 5. The two functions are based on sets of word pairs that differ in semantic similarity between stimulus and response words in each pair (Lo-Sim vs Hi-Sim). Ratings were on a 10-point scale.

In this experiment, semantic similarity had no detectable impact on the JAM function. A 2 × 3 analysis of variance showed a significant main effect of forward strength, F(2, 68) = 87.00, but no effect of semantic distance, F(1, 34) = 2.19, and no interaction, F(2, 68) = 1.04. Most of the strength effect was due to the significant linear trend. The quadratic component was significant, but the departure from linearity was in the opposite direction to that observed in previous experiments.

The results of this experiment, combined with those of Experiment 3, provide no evidence whatsoever for a contribution of semantic distance to the JAM function. The effects of semantic distance, either by itself or interacting with associative strength, were numerically tiny and not statistically significant. The JAM functions for high and low semantic distance shown in Fig. 6 are nearly identical and both are well within the parameter ranges obtained in our previous experiments. These results also complement earlier reports of the independence of semantic and associative judgments (Garskof & Forrester, 1966; Maki et al., 2004).

7. Experiment 6

The preceding experiments establish the JAM function as a highly replicable phenomenon. It was shown to transcend method of rating, specific stimulus materials, range and


distribution of associative values, associative context defined by relatedness proportion, and semantic similarity. The next two experiments were intended to evaluate Koriat’s (1981) interpretation of the inaccurate judgments of associations. Koriat suggested that the reduced discriminability in associative judgments was caused by a tendency to ignore alternative responses when faced with a specific stimulus–response pair. It may be that alternative responses are not so much ignored as they are inaccessible. Having a response word physically present on a rating sheet may establish a local context making retrieval of alternative responses less likely. For example, what is the first word that comes to mind in response to ELM? Then consider the pair MOWER-GRASS. How many people, out of 100, are likely to give the word GRASS in response to MOWER?¹ GRASS is not the dominant associate of MOWER, but its presence here in print may make it difficult to think of a stronger associate.

The retrieval inhibition hypothesis was tested in the present experiment. The rating procedure was altered in an attempt to make alternate responses more available. Participants were shown each cue word alone and instructed to think about possible responses. Then the to-be-rated response word was revealed. This manipulation should have made the rating task more like a free association task and thus should have brought ratings into line with free association norms.

7.1. Method

Another group of participants was recruited from our human subjects pool (28 females and 26 males). The same materials and procedures were employed as in Experiment 5 except for one critical difference. The pairs to be rated were presented singly using an overhead transparency projector. The cue word was presented for 10 s and the participants were instructed to think about possible response words. Then the response word to be rated was revealed and presented until each of the participants had rated that pair.

7.2. Results and discussion

The important observation shown in Fig. 7 is the substantial overestimation and reduced discriminability that occurred in spite of the availability manipulation. The intercepts of the JAM functions are within the range of those from the previous experiments. Differences due to the semantic similarity variable appeared more pronounced in this experiment. The 2 × 3 analysis of variance indicated that the main effect of forward strength was reliable, F(2, 106) = 343.84, but that the main effect of semantic distance was not, F < 1. Although the interaction was significant using the error term based on subjects, F(2, 106) = 25.11, the interaction based on items was not, F(2, 48) = 1.43. Only the linear component of trend was significant.

In this experiment participants were asked to contemplate alternative responses for each cue word before the response word to be rated was revealed. Thus when the cue word

¹ TREE is the most common response to ELM (p = .62). LAWN is the most common response to MOWER (p = .66); GRASS is far less frequently given (p = .22).


[Fig. 7 plot. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Two series, Lo-Sim and Hi-Sim; fitted lines shown: JAM = 43.28 + 0.499 FSG (R² = 0.999) and JAM = 49.05 + 0.349 FSG (R² = 0.997).]

Fig. 7. Functions relating rated strength to normed strength in Experiment 6. The two functions are based on sets of word pairs that differ in semantic similarity between stimulus and response words in each pair (Lo-Sim vs Hi-Sim). Each stimulus word was presented prior to revealing the response word so that the participants could think of alternative responses. Ratings were on a 10-point scale.

was shown, the task resembled that of the free association test. Nevertheless, the JAM function was quite close to those previously observed.

8. Experiment 7

Although the participants in Experiment 6 were instructed to think about possible responses for a cue word, there is in fact no guarantee that they did so. Consequently a different method was devised for making alternative responses available. In this experiment, each cue word was accompanied in print by four of its normed response words. One of the response words was selected as the one to be rated. Of course there is no guarantee that the participants read the words, but at least the alternatives appeared visibly and in close proximity to the word being rated.

8.1. Method

8.1.1. Participants

For Experiment 7a, 41 new participants (16 males and 25 females) were recruited from the human subjects pool. Another 34 participants (25 males and 9 females) were recruited for Experiment 7b. Three were dropped from Experiment 7a and two were dropped from Experiment 7b due to omissions on the optical scoring sheets.

8.1.2. Procedure

Forty-eight cue words were selected from the Nelson et al. (2004) norms. Each of the cue words was constrained to have four response words as associates. From strongest


to weakest, the four response words averaged 0.379 (range: 0.311–0.503), 0.226 (range: 0.186–0.283), 0.110 (range: 0.091–0.150), and 0.047 (range: 0.036–0.061).

In Experiment 7a, each rating form contained 48 rows, one for each cue word, and each row contained seven words: the cue word, its four response words (the “context”) arranged in random order, a copy of the cue word, and, in the rightmost column, a copy of one of the response words. The participants were instructed to consider each cue word and its four context words and to imagine how likely those words would be as responses to the cue word in a free association test. The participants were told to assign a rating to the rightmost cue-response pair after considering the context. Four different rating forms were created by randomly ordering the rows. Each of the four response words for each cue word was rated in the experiment. The to-be-rated words were assigned randomly to different forms with the constraint that 12 words from each of the four levels of associative strength appeared on each form.

In Experiment 7b, four new forms were created by eliminating the leftmost cue word and its four context words from each row, leaving only the cue-response pair that was to be rated. Experiment 7b was conducted after Experiment 7a was completed. In other respects, these experiments were conducted like Experiment 1b; participants rated the associative strength of the critical pairs of words using the 10-category scale.

8.2. Results and discussion

Fig. 8 shows that the JAM function was not greatly altered by the context manipulation. The intercepts of the functions relating rated strengths to normed strengths, approximately 43 and 46, are well within the range of those previously observed.

[Fig. 8 plot. Vertical axis: JAM (rated strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Two series, Context and No context; fitted lines shown: JAM = 50.25 + 0.460 FSG (R² = 0.995) and JAM = 42.77 + 0.568 FSG (R² = 0.926).]

Fig. 8. Functions relating rated strength to normed strength in Experiment 7. Context was provided in Experiment 7a by display of four normed responses for each stimulus word; one of the words was selected to be rated. The four-word context was eliminated in Experiment 7b, leaving only the stimulus–response pairs to be rated. Ratings were on a 10-point scale.


A mixed 4 × 2 analysis of variance included associative strength as the within-subjects factor and presence vs. absence of context words (Experiment 7a vs. Experiment 7b) as the between-groups factor. Both main effects of strength and context were significant, F(3, 204) = 98.62 and F(1, 68) = 4.34, respectively. However, the effect of context was relatively weak as shown by the partial η² values (0.06 for the context variable vs. 0.59 for the strength variable). The interaction of associative strength and context was not reliable, F(3, 204) = 1.38, and only the linear component of trend was significant.

If the retrieval hypothesis were correct, making response words more available should have increased the discriminability between pairs of associatively related words. This increased discriminability would take the form of an increased slope and decreased intercept. However, in both Experiments 6 and 7 the functions relating rated and normed strengths approximated those in the preceding experiments. Thus the retrieval inhibition hypothesis was not supported. Perhaps, then, alternative response words in these rating tasks are accessible and not ignored (cf. Koriat, 1981). Some mechanism other than retrieval inhibition may be at work that produces inaccurate associative judgments.

9. General discussion

The research reported here on judgments of associative memory was motivated by Koriat’s (1981) observation that people underestimate differences in associative strengths when judging the associative relations between pairs of words. Such reduced discriminability could be caused by tendencies to overestimate strengths, underestimate strengths, or both (cf. Fig. 1). Experiment 1 showed that people greatly overestimate the frequency of responses given by other people in free association tests, especially so for those responses that are relatively infrequent.
The linear function relating judgments of associative memory (JAM) to normed associative strengths was characterized by a high intercept and shallow slope. This “JAM function” proved to be quite robust, generalizing across methods (Experiment 1), materials (Experiments 2 and 3), and range and distribution of associative strengths (Experiments 2 and 3). In Experiment 4, the proportion of associatively related pairs was greatly reduced, and in Experiment 5, semantic as well as associative relatedness was varied. The JAM function resisted these manipulations also.

The JAM function found in all the experiments was not due to differences in word frequency. Frequencies of the words used in Experiment 1 did not significantly vary across the different associative strength conditions. The words used in Experiment 2 were selected by Nelson et al. (2003) so as to equate frequency across associative conditions. In Experiment 3, frequencies of cue and target word were controlled statistically in regression analyses. Yet, in all three experiments similar JAM functions were observed.

Neither can the JAM function be attributed to variations in cue or target set sizes. Neither QSS nor TSS was controlled in Experiment 1, and both were (inadvertently) confounded with forward strength. However, both QSS and TSS were controlled in Experiment 2 and statistically controlled in Experiment 3. Thus all these characteristics of words that influence recall in memory experiments have little or nothing to do with the associative judgments studied here.

In the final two experiments, one interpretation of Koriat’s (1981) explanation for the reduced discriminability of associative strength was tested: the availability heuristic (Tversky & Kahneman, 1973). Availability is defined as “the ease with which instances come to mind” (Kahneman & Tversky, 1996, p. 582). More precisely, “a person could


estimate . . . the likelihood of an event, or frequency of co-occurrences by assessing the ease with which the relevant mental operation of retrieval, construction, or association can be carried out” (Tversky & Kahneman, 1973, p. 208). A refinement of the notion of availability was proposed in which judgments are inflated because the specific context provided by the printed response word inhibits the retrieval of other associates of the stimulus word.

The test of this retrieval inhibition hypothesis involved making alternative responses available either implicitly (Experiment 6) or explicitly (Experiment 7) in an attempt to reduce the overestimation and increase discriminability among associations. Participants were given several seconds in Experiment 6 to contemplate the associates of each stimulus word prior to viewing the response word to be rated. In Experiment 7, four associates of each stimulus word were printed on the rating form in order to increase the availability of alternative response words. Neither manipulation substantially altered the JAM function. These results suggest that the presence of the response word in a rating task does not reduce the availability of other responses. Nevertheless, as Koriat (1981) suggested, associates other than the immediate response word might be ignored even if available. That possibility needs to be the focus of future research.

9.1. On the generality of the JAM function

In addition to the present experiments, two other programs of research in different laboratories confirm the generality of the JAM function. All three investigations were conducted independently of each other and simultaneously.

Nelson et al. (2005) reported the results of an associative rating experiment. Their different participants, materials, and procedures present an opportunity to check on the generality of the results reported here.
Nelson et al. presented a sample of participants at a different university (South Florida) with somewhat different instructions and used a different rating scale. The participants in their study were told to rate pairs of words on a 7-point scale with no numeric anchors; a 10-point scale with a numerical range specified for each category was used in the present experiments. Nelson et al. instructed their participants to rate the pairs based on either association or “relatedness”; instructions in the present experiments focused on the associative connection between words as measured by frequency of response production in free association tests.

Of the 1,016 word pairs used in the Nelson et al. ratings, 945 appeared in the Maki et al. (2004) version of the association norms. The average ratings for these 945 pairs were adjusted to fall along the 100-point scale used in our experiments. A scatterplot of the rating for each pair plotted against the normed (forward) strength is presented in Fig. 9. The linear function resembles those found in our experiments in two important respects: it has a shallow slope and a high intercept. The slope is somewhat shallower than in the present experiments and the intercept is somewhat higher.

Koriat et al. (2006) have performed several associative judgment studies in Israel in which associative strength (using Hebrew association norms) was manipulated. Their reported means were plotted as a JAM function. Averaged over eight experiments, their JAM function has an intercept of 55.49 and a slope of 0.50. The slope is higher than those obtained in the experiments reported here and by Nelson et al. (2005). Interestingly, they also found the discontinuity for unrelated pairs (cf. Experiment 4 in this report).

Whether the quantitative differences among the three sets of results are important remains to be seen. The important point is that the JAM function appears to have


[Fig. 9 plot. Scatterplot. Vertical axis: JAM (rated associative strength), 0–100. Horizontal axis: FSG (normed associative strength), 0–100. Fitted line: JAM = 59.73 + 0.189 FSG, R² = 0.066.]

Fig. 9. Plot of data from Nelson et al. (2004). The original ratings were based on a 7-point scale. The average ratings were transformed to a 100-point scale for the sake of consistency with the other figures.

considerable generality: generality across materials, methods, subject population, and geography.

9.2. A model of associative judgments and free association

Discounting availability (via retrieval inhibition) eliminates just one possible cause of the inaccurate associative judgments. It still may be, as Koriat originally suggested, that people fall prey to a judgmental bias by simply ignoring potential response words other than the one present on the rating form. Further, it may be argued that overestimation is present in many judgmental domains (e.g., Dunning et al., 2003; Levin et al., 2000). However, attributing the JAM function to a general judgmental bias in the absence of some unifying theory is unsatisfying (cf. Gigerenzer, 1996). So before capitulating to an explanation of the observed JAM function phrased purely in terms of judgmental bias or metacognitive processes (Koriat et al., 2006), an explanation was sought in the memory system itself.

There are a handful of empirical constraints that guided the selection of a memory model. First, the model should be able to produce responses given a cue in a free association task and the probabilities of the responses should match those in the association norms (Nelson et al., 2004). Second, the model should be able to place numerical judgments on pairs of words, and the computed JAM values should match the empirical JAM function; the computed JAM function should be linear with a high intercept and shallow slope. Third, the computed JAM function should be sensitive to backwards associations (as in Experiment 1). Fourth, the computed JAM values for unrelated pairs should be quite low, resulting in a discontinuity between associated and unrelated pairs (as in Experiment 4).


The selection of a model was further guided by the application of variants of a computer simulation model, MINERVA 2 (Hintzman, 1984), to judgment and decision making phenomena: likelihood judgments (Dougherty, Gettys, & Ogden, 1999), judgmental biases (Fiedler, 1996), and illusory correlations (Smith, 1991). As will be shown in what follows, MINERVA 2 can be adapted to satisfy the constraints listed above. In the remainder of this discussion the standard features of the MINERVA class of models will be described along with the features of the decision-making version, MINERVA-DM (Dougherty et al., 1999), that are important for present purposes. Then a further extension will be described and used to simulate free association and the empirical JAM function.

MINERVA-JAM (M-JAM for short) is a modified version of MINERVA 2, a computer simulation created by Hintzman (1984, 1986, 1988) to model memory phenomena in the areas of frequency judgments, recognition memory, schema-abstraction, and paired associate learning. MINERVA 2 assumes that long-term memory (LTM) consists of a large store of individual traces representing past experiences. Each repetition of an experience results in the storage (to some degree imperfectly) of another memory trace. Each trace is represented by a vector of lexical and semantic features and portions of an experience are represented within the trace by subvectors. Reading about ELM and TREE, for example, results in the storage of a vector with subvectors representing ELM, TREE, and other (contextual) information. In MINERVA, “associative strength” is supplanted by multiple copies of traces.

Information is retrieved from MINERVA’s long-term memory by the presentation of a probe. The probe, also represented as a vector, is simultaneously matched against all traces. The results of the parallel matching process are combined into a single echo.
The degree to which the probe is similar to a trace determines the activation of that trace and activation summed over all traces determines one property of the echo, its intensity. The second property of an echo is its contents, also represented as a vector. The contents correspond to the sum of all memory traces weighted by their respective activation levels. The echo contents can be used to obtain cued recall; if only part of a vector is presented as a probe, the null vector elements are filled in as part of the echo content with their values determined by traces similar to the non-null elements of the probe vector. MINERVA 2 was applied to judgments of events whose frequencies were generated experimentally (e.g., Hintzman, 1988). Extra-experimental traces were ignored because of the assumption of strong contextual effects in making frequency judgments about experiences within the experimental session. Dougherty et al. (1999) took a similar approach in their application of MINERVA to decision making (DM). They created hypothesis and data subvectors for each trace. Traces were retrieved by probes containing only the data subvectors. If a trace was sufficiently similar to the probe on the basis of the data subvector, then it was also evaluated for similarity with respect to the hypothesis subvector of the probe. M-JAM makes a similar assumption. When a cue word is presented either during free association or during rating tasks, only the subvector of the probe corresponding to the cue is used to activate traces. That is, in M-JAM the only traces that enter into the probe matching process are usually just those that contain the subvector representing the cue word. In effect, the cue word acts

² As in Dougherty et al., the bare term MINERVA refers to features shared by all the versions of the model: MINERVA 2, MINERVA-DM, and MINERVA-JAM.


as a context (Hintzman, 1988) or data vector (Dougherty et al., 1999) in limiting the traces entering into the computation of the echo.

M-JAM further limits the traces that are compared to the probe. LTM contains multiple copies of traces representing the repeated co-occurrences of cue and response words. For even one cue word, the frequency of such traces could be very large. M-JAM places a limit on the number of traces that can be matched against a probe. The limit is assumed to be larger than the number of traces stored in a typical experimental session but much smaller than the total number of traces in LTM containing a given cue word. In the simulations that follow, the selection of traces for matching is assumed to be a random process. However, the selection process could be quite systematic but driven by factors as yet unexplored. For example, the sampling for any probe could be influenced by preceding experience in the experimental session. The selection process could be further motivated theoretically. We could assume that the matching process, even if conducted in parallel, consumes some limited resource. Thus operating on a relatively small sample of traces respects the limited capacity of the system. However rationalized, the result is the matching of each probe against a random sample of traces containing the cue word. The sampling assumption is one way in which M-JAM differs from its predecessors. (See Fiedler, 2000, for related ideas about the use of samples in human judgments.)

Because the contents of the echo obtained from a sample are based on all the traces in the sample, the response subvector in the echo most probably will not match exactly any known response word. In MINERVA 2, the echo content was treated as a probe and recursively matched against the memory traces.
In M-JAM, a measure of similarity (the cosine of two vectors) is computed for each candidate response word and the response word most similar to the response subvector in the echo is selected for output. Thus both models cope with the ambiguous recall problem (Hintzman, 1984).

The sampling assumption, combined with the response matching process, lets M-JAM produce responses during free association tests. Most of the time in a free association task, the sample of traces compared to the probe will contain high-frequency cue-response pairs. But some samples will contain more low-frequency pairs than high-frequency pairs. The result will be a distribution of responses that reflects the differing frequencies of their co-occurrence with the cue. This sampling process, and the resulting variability in response selection, mimics the model of free association proposed by Nelson, McEvoy, and Dennis (2000). In that model, the associative strength between a cue and response is represented as a distribution of strengths reflecting variability among both people and words. When presented with a cue word, a sample strength value is drawn from the distribution for each response and the response with the largest strength is produced. The variable strengths in Nelson et al. (2000) correspond to the random sampling in M-JAM and the selection of the highest strength in Nelson et al. (2000) corresponds to the choice of the response most similar to the probe in M-JAM.

But M-JAM goes further and is able to model associative judgments. Hintzman (1988) simulated experimentally manipulated frequency and showed that the echo intensity in MINERVA 2 was a linear function of presentation frequency. M-JAM closely follows MINERVA 2 in producing judgments of response frequency. Echo intensity is a function of the frequency of traces similar to the probe. In M-JAM the echo intensity is computed from the traces contained in the (random) sample for each cue word for each simulated subject.
The echo intensity scale is then mapped onto the range of possible responses (Hintzman, 1988, p. 532).
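The retrieval steps described above (matching a probe against a sample of traces, forming an echo, and choosing the candidate response nearest the echo) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulation reported in this paper. The cubing of similarity and the normalization by the number of relevant features follow Hintzman's published description of MINERVA 2. Treating the full cue-plus-response vector as the probe for a judgment is an assumption made here for illustration. The function names, the response OAK, and the frequencies are hypothetical, and encoding noise and feature overlap between words are left out.

```python
import math
import random

N_FEATURES = 29  # features per word vector (the value listed in Table 2)


def similarity(probe, trace):
    """Dot product divided by the number of features non-zero in probe or trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo(probe, traces):
    """Return (intensity, content) of the echo produced by probing the traces."""
    # Activation is the cube of similarity, so sign is preserved and close
    # matches dominate the echo.
    activations = [similarity(probe, trace) ** 3 for trace in traces]
    intensity = sum(activations)
    content = [sum(a * trace[j] for a, trace in zip(activations, traces))
               for j in range(len(probe))]
    return intensity, content


def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(u, v)) / norm


def free_associate(cue, traces, lexicon):
    """Probe with the cue alone; return the candidate word nearest the echo."""
    response_length = len(next(iter(lexicon.values())))
    probe = cue + [0] * response_length  # response subvector left null
    _, content = echo(probe, traces)
    response_part = content[len(cue):]
    return max(lexicon, key=lambda word: cosine(response_part, lexicon[word]))


def judge(cue, response, traces):
    """Echo intensity for a cue-response pair (assumed basis of the rating)."""
    intensity, _ = echo(cue + response, traces)
    return intensity


def sample_traces(cue, lexicon, frequencies, sample_size, rng):
    """Draw cue+response traces at random in proportion to their frequencies."""
    words = list(frequencies)
    weights = [frequencies[word] for word in words]
    drawn = rng.choices(words, weights=weights, k=sample_size)
    return [cue + lexicon[word] for word in drawn]


rng = random.Random(2007)


def random_word():
    return [rng.choice((-1, 1)) for _ in range(N_FEATURES)]


cue = random_word()                                  # stands in for ELM
lexicon = {"TREE": random_word(), "OAK": random_word()}
frequencies = {"TREE": 0.8, "OAK": 0.2}              # hypothetical, not normed values
traces = sample_traces(cue, lexicon, frequencies, 50, rng)

print("free association response:", free_associate(cue, traces, lexicon))
for word, vector in lexicon.items():
    print(word, "echo intensity:", round(judge(cue, vector, traces), 2))
```

Because every trace in the sample shares the cue subvector, a probe containing only the cue has a similarity of 0.5 to each noiseless trace, so each trace contributes 0.125 to the echo intensity whatever its response subvector.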


As reasoned above, M-JAM is capable of generating two dependent measures from the same set of traces. Probing a random sample of the traces appropriate for a given cue word produces either a response with some probability or a frequency judgment. Assuming that Hintzman’s (1988) observation generalizes to judgments of frequencies in long-term memory, M-JAM should produce a linear relation such as that shown in the experiments reported here. Moreover, the function should have a high intercept and shallow slope; that prediction follows from consideration of the similarity relations between the probe and the traces in the sample. Each trace in the sample contains a copy of the cue subvector and, as a result, the probe and each trace will contain many common elements thus inflating the echo intensity. However, the more elements shared by different traces, the more difficult will be the task of discriminating among them. Hence, the MINERVA models should produce JAM functions with high intercepts and shallow slopes. The workings of M-JAM also appear able to reproduce the effects of backwards associations (Experiment 1) and the low judgments of unrelated pairs of words (Experiment 4). In the MINERVA models, any collection of features can serve as a retrieval cue. For example, the subvector of features representing the context of an experimental session serves to retrieve only those traces created in the experiment (Hintzman, 1988). Thus the subvector representing a response word could act as a retrieval cue. So assume that each word in a presented cue-response pair functions as a retrieval cue. The cue word retrieves a set of its responses, one of which is the presented response word, and the response word retrieves a set of its cues, one of which is the presented cue word. This means that the cue-response pair will be doubly represented in the sample. The multiplication effect will be especially strong for pairs with high backwards strength. 
For pairs with no associative relation, the vectors retrieved by the cue word will bear little or no resemblance to the presented response word and a low echo intensity will result.

The foregoing was a conceptual account of how the MINERVA class of models (and M-JAM specifically) might deal with the several key observations about JAM. But the intuitive predictions need to be put to the test. The following computer simulations evaluate the predictions made by MINERVA-JAM.

9.2.1. M-JAM simulations

Table 2 contains a summary of the parameters and the values they were assigned in the simulations.

9.2.1.1. General simulation method. In MINERVA, experiences are encoded as vectors. Each vector consists of a string of elements with each element representing the presence (+1) or absence (−1) of a feature. A learning rate parameter, L, determines the fidelity of the stored trace. If L < 1, then the stored copy of the trace will be imperfect; each element of the trace is then stored as 0 with a probability of 1 − L. In the simulations of M-JAM, L was applied to retrieved vectors in the sample (because it does not matter in the simulation whether the learning rule is applied at storage or retrieval). In previous simulations, long-term memory was created by generating large numbers of traces from which to sample (e.g., Dougherty et al., 1999). A computationally more efficient method was used in the M-JAM simulations. The cue word's set of responses was represented as a probability distribution reflecting the frequency distribution in the environment. A power function (Anderson & Milson, 1989; Anderson & Schooler, 1991) was used to generate the simulated frequencies for each response word i, $f(i) = i^{-d}$,


Table 2
Parameter settings used in MINERVA 2 simulations

Parameter | Values | Notes
Vector length | 29, 29 | Numbers of features in cue and response vectors, respectively
Attention | 1.0, 0.86 | Probability of attending to the cue word. When less than 1.0, the response word is a retrieval cue for its associated cues with p = 1.0 − attention
L | 0.90 | Learning rate. Traces in sample are set to 0 with p = 1 − L
Sample size | 50 | Number of traces retrieved to the sample space
A | 0.318 | Intercept parameter for power function determining feature overlap between a cue and its targets
B | 0.12 | Slope parameter for power function determining feature overlap between a cue and its targets
C | 0.14 | Proportion of features common to a cue and all its targets
D | 0.17, 0.61, 0.895 | Exponent for power function used to compute environmental co-occurrence distributions. Values shown correspond to dominant associative probabilities of 0.099, 0.403, and 0.700
NR | 40 | Maximum number of associates for each cue

i = 1, ..., NR, where NR represents the number of response words co-occurring with the cue word. (Arbitrarily, NR = 40 in the simulations.) The f(i) values were accumulated and normalized, resulting in a cumulative relative frequency distribution used for the random samples. The rate parameter, d, was varied so as to produce free association probabilities closely approximating those in the actual experiments.

Each trace selected for inclusion in the sample was composed of cue and response sub-vectors. For each cue, all the cue vectors were the same and each context element was randomly determined. The response vectors shared common elements with each other and with the cue vector; the proportion of shared elements for each response vector was approximately 0.14. The remaining elements in the response vectors were changed as a function of distance from the cue, resulting in response vectors progressively less similar to the dominant response as a function of associative rank order, $R_a$. The probability of reversing an eligible feature in each response vector was a power function of associative rank, $p = 0.318\,R_a^{0.12}$. The empirical justification for the choice of this power function (and for the proportion of shared elements) is given in Appendix A.

The remaining computations are the same as those described for MINERVA 2 (Hintzman, 1984, 1986, 1988). Assuming vectors containing F features, the similarity of the probe P to any trace $T_i$ in the sample is given by

$$S_i = \left( \sum_{j=1}^{F} P_j T_{ij} \right) / N \qquad (1)$$

The divisor, N, is the number of "relevant" features, that is, features that have nonzero values in both probe and trace vectors. In MINERVA 2, the activation value is the cube of the similarity value.

$$A_i = S_i^3 \qquad (2)$$

The sign of S is preserved but traces more similar to the probe are weighted more heavily in the echo by this transformation.


Echo intensity, I, is the sum of the activation values over all M traces in the sample of traces selected for comparison to the probe.

$$I = \sum_{i=1}^{M} A_i \qquad (3)$$

The range of values of the intensity of the echo is determined by the sample size. If every trace in the sample is an exact duplicate of the probe, then each activation value will be +1.0 and then I = M. But if all the traces are uncorrelated with the probe, the expected value of each similarity score is zero so the expected value of I is also zero. Because the sample size is a potential variable, the echo intensity was corrected for sample size, $0 \le I/M \le 1$.

The content of the echo is the sum of all the traces in the sample weighted by each trace's activation value. The value of each vector element j in the content vector is

$$C_j = \sum_{i=1}^{M} A_i T_{ij} \qquad (4)$$
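The computations in Eqs. 1 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the reported simulations: the function names are hypothetical, vectors are plain lists with elements +1, −1, or 0, and N is counted as described in the text (features that are nonzero in both the probe and the trace).

```python
from typing import List, Sequence

Vector = Sequence[float]  # elements are +1, -1, or 0 (0 = feature not available)


def similarity(probe: Vector, trace: Vector) -> float:
    """Eq. 1: dot product of probe and trace divided by N, the number of
    relevant features (nonzero in BOTH vectors, as stated in the text)."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 and t != 0)
    if relevant == 0:
        return 0.0  # assumption: no shared features means no similarity
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def activation(sim: float) -> float:
    """Eq. 2: cubing preserves the sign and favors close matches."""
    return sim ** 3


def echo_intensity(probe: Vector, traces: Sequence[Vector]) -> float:
    """Eq. 3: sum of activations over the M sampled traces."""
    return sum(activation(similarity(probe, t)) for t in traces)


def echo_content(probe: Vector, traces: Sequence[Vector]) -> List[float]:
    """Eq. 4: activation-weighted sum of the traces, element by element."""
    content = [0.0] * len(probe)
    for trace in traces:
        a = activation(similarity(probe, trace))
        for j, element in enumerate(trace):
            content[j] += a * element
    return content


if __name__ == "__main__":
    probe = [1, -1, 1, -1]
    sample = [[1, -1, 1, -1], [1, -1, 1, -1], [-1, 1, -1, 1]]
    # Intensity corrected for sample size, I / M
    print(echo_intensity(probe, sample) / len(sample))
```

Dividing the intensity by the number of traces gives the size-corrected value I/M described above; a sample made up entirely of copies of the probe yields I/M = 1.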

For response production during free association tasks, the elements of the probe vector corresponding to the response word were set to zero. The corresponding elements of the echo’s content vector were compared to each response vector declared in the simulation. The cosine was computed as a measure of similarity between each response vector and the response sub-vector of the content. The response with the highest cosine was the one produced in the simulated free association task. (In the case of ties, the winner was determined randomly.) Three sets of simulation runs will be described. The first set compared the free association distributions produced by M-JAM to those in the association norms (Nelson et al., 2004). The second group of simulations focused on the effects of backwards associations observed in Experiment 1. The third set to be reported focused on the differences between ratings of associated and unrelated pairs found in Experiment 4. 9.2.1.2. Simulations of free association. M-JAM, like other MINERVA models, performs cued recall and hence can also perform free association tasks. As previously described, presenting a partial vector containing just the features corresponding to the cue word resulted in the filling in of the response features with the echo contents. The response vector most closely resembling those contents was then chosen from the set of response vectors for that cue for output. The exponent of the power function that generated the co-occurrence frequency (or strength) distribution, d, was varied in order to discover values of d that would produce distributions with associative strengths of the dominant associate roughly equivalent to those used in Experiment 1: 0.1, 0.4, and 0.7. The probability of a response vector for each associative rank was then obtained from 1000 such simulations. The entire procedure was repeated 30 times and the probabilities at each associative rank were averaged across the 30 replications. 
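The response-production rule described above (compare the response part of the echo content with each candidate response vector and output the one with the highest cosine, breaking ties at random) might be sketched as follows. The function names and the example vectors are hypothetical and serve only to illustrate the selection step.

```python
import math
import random
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)


def produce_response(content_response_part: Sequence[float],
                     candidates: Sequence[Sequence[float]],
                     rng: Optional[random.Random] = None) -> int:
    """Return the index of the candidate response vector that best matches
    the response sub-vector of the echo content; ties are broken randomly."""
    rng = rng or random.Random()
    scores = [cosine(content_response_part, c) for c in candidates]
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(winners)


if __name__ == "__main__":
    content = [0.9, -0.8, 0.7, 0.6]          # response part of an echo
    responses = [[1, -1, 1, 1], [1, 1, -1, -1], [-1, -1, 1, -1]]
    print(produce_response(content, responses, random.Random(0)))
```

Repeating this selection over many simulated probes and tallying how often each candidate wins gives a simulated free association distribution of the kind compared with the norms.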
The values of d that produced FSGs closest to those in Experiment 1 are shown in Table 2. For comparison purposes, cue words were extracted from the Nelson et al. (2004) norms such that the forward strength (FSG) of the dominant associate fell within one of three ranges, high strength (0.675–0.725), medium strength (0.375–0.425), or low strength (0.075–0.125). The FSGs were then averaged across cue words at each associative rank within each of the three ranges.
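To make the role of the exponent d concrete, the sketch below builds the power-function frequency distribution over NR = 40 response ranks and samples ranks from its cumulative form. It assumes the frequencies decline with rank (f(i) = i^(-d)); the function names are hypothetical. Note that the dominant-associate probabilities reported in Table 2 come from the full retrieval simulation, so the raw rank-1 share computed here is not expected to equal them.

```python
import bisect
import random
from itertools import accumulate
from typing import List


def response_distribution(d: float, n_responses: int = 40) -> List[float]:
    """Relative frequencies for ranks 1..n_responses under f(i) = i ** -d
    (assumed declining power function), normalized to sum to 1."""
    freqs = [i ** (-d) for i in range(1, n_responses + 1)]
    total = sum(freqs)
    return [f / total for f in freqs]


def cumulative(probs: List[float]) -> List[float]:
    """Cumulative relative frequency distribution used for sampling."""
    return list(accumulate(probs))


def sample_rank(cum: List[float], rng: random.Random) -> int:
    """Draw one associative rank (1-based) by inverting the cumulative
    distribution with a uniform random number."""
    u = rng.random()
    index = bisect.bisect_left(cum, u)
    return min(index, len(cum) - 1) + 1  # guard against rounding at the top


if __name__ == "__main__":
    rng = random.Random(42)
    for d in (0.17, 0.61, 0.895):  # exponents listed in Table 2
        probs = response_distribution(d)
        sample = [sample_rank(cumulative(probs), rng) for _ in range(50)]
        print(d, round(probs[0], 3), sample.count(1))
```

Larger values of d concentrate more of the sampled traces on the dominant response, which is the direction needed to move from low-strength to high-strength cue sets.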


The distributions obtained from the simulations and from the norms are displayed in Fig. 10 for high-, medium-, and low-strength dominant associates. Only the values for the first seven associative ranks are shown. Each distribution shown in Fig. 10 was fit quite well by a power function, $R^2$s > 0.95. The power functions for the simulated values are shown in the figure. Moreover, in each case, the simulated and normed values were quite close, $R^2$s > 0.97. Thus, on average, free associations produced by M-JAM closely track those in the Nelson et al. (2004) norms.

9.2.1.3. Simulation of Experiment 1. The second simulation was intended to explore how the effects of backward associations might be handled within M-JAM. It was shown in Experiment 1b (Fig. 2b) that backward associations increased judged associative strength. In recent studies of judgments of learning and associative judgments, Koriat and Bjork (2005) and Koriat et al. (2006) also showed that backwards associations increased judgments of word pairs. The question explored here is whether M-JAM would be sensitive to backwards associations. One way that backwards associations could influence judgments is if the response word of a pair sometimes functioned as a retrieval cue. Thus, two cue sets were included in each simulation. Set #1 was generated by the cue vector in the probe; each trace contained copies of the probe cue vector and the responses generated by the methods described in Appendix A. Set #2 was generated by the response vector in the probe; each trace contained copies of the probe response vector and the cues were generated by the methods described in Appendix A with one exception. The feature similarities between the cue and response and between the response and the cue are commutative. Thus, the probe cue vector was inserted in Set #2 as the cue that was most similar to the probe response vector.
The parameters (values of d) discovered for free association were used to generate the frequency distributions used in this simulation. The three forward strengths (low, medium, high) were combined factorially with two values of backward strength (low, high). At the outset, the influence of the backwards retrieval process was not known so an attention parameter was explored. The attention parameter weighted the frequency distributions

Fig. 10. Mean forward strengths (FSG) for simulated and observed free associations. See text for details.


of Sets #1 and #2. If attention (to the cue word) were complete (p = 1.0) then the influence of the backwards retrieval was nil and Set #2 traces never appeared in the sample. But if attention "wandered" to the response word, then the complement of the attention parameter contributed to the appearance of Set #2 traces in the sample. Values of attention were stepped from 0.80 to 0.88 in 0.01 increments. An attention value of 0.86 (meaning a weighting of 0.14 on the backwards association) was found to produce the best (scaled) fit to the data. For each combination of forward and backwards strengths, echo intensities were averaged over 100 simulation runs. Intensities were in turn averaged over 60 such simulated participants.

The results of the simulations are shown in Fig. 11, in which the echo intensity (simulated JAM) is plotted as a function of FSG (simulated free association). As in the previous simulation, the functions are strongly linear with nonzero intercepts. Moreover, the slope of the function for high backwards strengths is much larger than the slope of the function for low backwards strengths. Thus, M-JAM successfully captures another empirical result: the interactive effects of forward and backwards strengths on judgments of association.

9.2.1.4. Simulation of Experiment 4. What is the response of M-JAM to unrelated cue-response pairs? In Experiment 4, pairs varying in forward strength but with low backwards strengths were presented along with unrelated pairs (pairs that were completely unassociated). This simulation borrowed the low backwards strength results from the simulation of Experiment 1 and added one additional unrelated probe. The cue vector for that probe was paired with a response vector composed of random elements. The generation of feature elements was performed exactly as described for the simulation of Experiment 1, and the attention parameter was maintained at 0.86.
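One way to read the attention parameter is as a mixing weight: each trace in the sample is drawn from Set #1 (cue-driven retrieval) with probability equal to attention and from Set #2 (response-driven retrieval) otherwise. The sketch below illustrates that reading only; the per-trace mixing scheme, the function name, and the toy distributions are assumptions rather than details taken from the reported simulations.

```python
import random
from typing import List, Sequence, Tuple


def draw_sample(forward_probs: Sequence[float],
                backward_probs: Sequence[float],
                attention: float,
                sample_size: int,
                rng: random.Random) -> List[Tuple[str, int]]:
    """Draw (source, rank) pairs for one sample of traces.

    forward_probs  : distribution over the cue's responses (Set #1)
    backward_probs : distribution over the response word's cues (Set #2)
    attention      : probability that a trace comes from Set #1
    """
    sample = []
    for _ in range(sample_size):
        if rng.random() < attention:
            ranks = range(1, len(forward_probs) + 1)
            rank = rng.choices(ranks, weights=forward_probs)[0]
            sample.append(("set1", rank))
        else:
            ranks = range(1, len(backward_probs) + 1)
            rank = rng.choices(ranks, weights=backward_probs)[0]
            sample.append(("set2", rank))
    return sample


if __name__ == "__main__":
    rng = random.Random(7)
    forward = [0.5, 0.3, 0.2]    # toy distributions, for illustration only
    backward = [0.7, 0.2, 0.1]
    sample = draw_sample(forward, backward, attention=0.86,
                         sample_size=50, rng=rng)
    print(sum(1 for source, _ in sample if source == "set2"))
```

With attention = 1.0 no Set #2 traces enter the sample, matching the case in which backwards retrieval has no influence; with attention = 0.86 roughly 14% of the sampled traces are expected to come from Set #2.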

[Figure 11 plot: JAM (simulated) as a function of FSG (simulated) for High BSG, Low BSG, and Unrelated pairs. Fitted lines: JAM = 22.43 + 0.218 FSG, R² > 0.999 (High BSG); JAM = 24.30 + 0.105 FSG, R² = 0.990 (Low BSG).]

Fig. 11. Results of MINERVA-JAM simulations for Experiments 1b and 4.


The results of this simulation, based on 60 simulated participants, are also shown in Fig. 11. The important point to note is the one on the ordinate (where the free association probability is zero). The value of the echo intensity is part way between the intercept of the JAM function for the related pairs and zero, thus qualitatively reproducing the empirical observations shown in Fig. 5.

9.2.1.5. Scaling simulated results to empirical data. Both the empirical and simulated JAM functions are linear, so a simple linear function seemed like a reasonable choice for scaling the simulation results to fit the empirical data. The function relating the simulated results (M-JAM echo intensity) to the empirical results (JAM ratings) is shown in Fig. 12. Inspection of the scatter plot indicated that the data point from the unrelated pairs in Experiment 4 was an outlier, so only the data points obtained from associated pairs were used to compute the regression line. The relation is reasonably linear and has an intercept near zero. That regression line was used to transform the M-JAM echo intensity values obtained in the previous simulations into estimated JAM ratings. The results of this transformation are shown in Figs. 13 and 14. Fig. 13 displays the obtained and simulated JAM values for the backwards association data (Experiment 1b) together with the regression lines for the scaled, simulated values. The scaled simulation values are very close to those observed (root mean square error = 2.7). Fig. 14 displays the results for Experiment 4 (comparing the associated and unrelated pairs). The scaled simulation values for the associated pairs are very close to the corresponding observed points (root mean square error = 2.2). However, the scaled value for the unrelated pairs (on the ordinate) is 24.8 points higher than the value obtained in

[Figure 12 plot: Observed ratings (JAM) as a function of Simulated ratings (M) for Exp 1 (high BSG), Exp 1 (low BSG), Exp 4 (associated), and Exp 4 (unrelated). Fitted line: JAM = 3.52 + 2.162 M, R² = 0.917.]

Fig. 12. Relationship between observed ratings (Experiments 1b and 4) and values obtained from MINERVA-JAM simulations. The regression was based on the average ratings for the associated pairs in Experiments 1b and 4.


[Figure 13 plot: JAM (observed and simulated) as a function of FSG (normed associative strength) for JAM (High BSG), JAM (Low BSG), M2 (High BSG), and M2 (Low BSG). Fitted lines: JAM = 52.48 + 0.461 FSG, R² = 0.999 (High BSG); JAM = 56.09 + 0.231 FSG, R² = 0.990 (Low BSG).]

Fig. 13. MINERVA-JAM simulation results (open symbols) fitted to data from Experiment 1b (filled symbols). Values from the simulations were scaled using the relationship in Fig. 12.

[Figure 14 plot: JAM (observed and simulated) as a function of FSG (normed associative strength) for JAM (Associated), JAM (Unrelated), M2 (Associated), M2 (Unrelated), and De-biased points. Fitted line: JAM = 56.08 + 0.232 FSG, R² = 0.990.]

Fig. 14. MINERVA-JAM simulation results (open symbols) fitted to data from Experiment 4 (filled symbols). Values from the simulations were scaled using the relationship in Fig. 12. The de-biased points (open triangles) were adjusted downwards by an amount equal to the difference between simulated and observed ratings of unrelated pairs (diamonds on the ordinate).
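The scaling and de-biasing steps referred to in the captions of Figs. 12 and 14 amount to a linear transformation followed by a constant shift. The sketch below uses the regression coefficients shown in Fig. 12 (intercept 3.52, slope 2.162) and the 24.8-point discrepancy for unrelated pairs reported in the text; the function names and the example input are hypothetical.

```python
SCALE_INTERCEPT = 3.52   # from the regression in Fig. 12
SCALE_SLOPE = 2.162      # from the regression in Fig. 12
UNRELATED_BIAS = 24.8    # scaled minus observed rating for unrelated pairs


def scale_to_rating(simulated_value: float) -> float:
    """Map a simulated M-JAM value onto the 0-100 rating scale using the
    linear relation JAM = 3.52 + 2.162 M."""
    return SCALE_INTERCEPT + SCALE_SLOPE * simulated_value


def debias(scaled_rating: float) -> float:
    """Remove the additive, post-mnemonic bias by subtracting the
    discrepancy found for unrelated pairs."""
    return scaled_rating - UNRELATED_BIAS


if __name__ == "__main__":
    # Hypothetical simulated value taken from the low-BSG line in Fig. 11
    # at FSG = 40: 24.30 + 0.105 * 40 = 28.5
    simulated = 24.30 + 0.105 * 40
    scaled = scale_to_rating(simulated)
    print(round(scaled, 2), round(debias(scaled), 2))
```

Because the de-biasing step is a constant shift, it lowers the intercept of the scaled function without changing its slope.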


Experiment 4 (also shown on the ordinate). One interpretation of this discrepancy is that the model is egregiously wrong. But another possibility is that the model works fine but that the discrepancy shown in Fig. 14 results from the choice of values used to do the scaling. The scaling is applied after the mnemonic computations of M-JAM. In other words, the model just does not account for post-mnemonic judgment biases (as, for example, in Koriat et al., 2006). On this account, the predictions of the model need to be de-biased. One way to do that is to adjust the scaled simulation values by subtracting 24.8 points from each of the simulation values, in effect using the observed rating for the unrelated pairs to rescale the simulated values. The resulting function, for the associated pairs, is also plotted in Fig. 14. The de-biased version of the model, then, makes the claim that the true state of affairs is the middle function shown in Fig. 1. Associative strengths of low-strength pairs are overestimated and associative strengths of high-strength pairs are underestimated. But some additional bias contributes an independent, additive amount to the observed judgments.

10. Concluding observations

At the most general level, the computer simulations reported here show that MINERVA-JAM does a credible job of describing one of the principal experimental results: associative ratings are linearly related to free association probabilities with the linear function having a shallow slope and high intercept. However, MINERVA-JAM failed to express the full extent of human overestimation of associative strengths. Another source of judgmental bias seems to be required to bring simulated values into line with those observed. The present theoretical analysis is agnostic with respect to the source of that bias.
The bias may be related to the “illusions of competence” seen in studies of metacognition where it has been argued that “participants tend to perceive a relationship even between words that are unrelated according to word-association norms” (Koriat & Bjork, 2005, p. 189; see also Koriat et al., 2006). On this account, participants would over-estimate associations for related pairs and would even to some extent over-estimate associations for unrelated pairs.

MINERVA-JAM is an associative model and was here limited to a narrow application to associative judgments and free association. The MINERVA class of models, however, has been successfully applied in other domains. Hintzman (1984, 1988) used MINERVA 2 to model episodic memory phenomena such as paired associate learning and frequency judgments. Dougherty et al. (1999) used MINERVA-DM to model judgment and decision making, and Smith (1991) used MINERVA 2 to model illusory correlations. Most recently, Kwantes (2005) applied MINERVA-style computations to extract semantic information from a large corpus of text. Kwantes reported that the model, like LSA (Landauer & Dumais, 1997), supports semantic inferences via indirect links among words even in the absence of first-order co-occurrences.

In spite of its limitations, MINERVA-JAM has the advantage of producing both free association values (Fig. 10) and the linear JAM function (Fig. 11) from a common set of mechanisms. A theory concerned just with free association (as in Nelson et al., 2000) needs some supplemental means of producing associative judgments. A theory concerned only with judgments (as in Koriat & Bjork, 2005) needs some supplemental means of producing free associations. MINERVA-JAM provides a solution to both problems.

The present experiments were motivated by Koriat's (1981) observations about discriminations among associative strengths. But, more generally, judgments of associative memory fall in the domain of metacognition and more specifically metamemory: what people know about their own memory systems. Previously studied metamemory judgments have taken a variety of forms. Ease of learning (EOL) judgments are made on items prior to learning. Judgments of knowing (JOK) and judgments of learning (JOL) predict future performance and are made after items have been studied (and perhaps learned to some criterion). Feeling of knowing (FOK) judgments are made on cues for items not successfully remembered. All these measures may address different aspects of memory functioning (Leonesio & Nelson, 1990). Judgments of associative memory (JAM), which are made on cues for items experienced often and, presumably, well learned over long periods of time, now join this list.

The concluding remark is a disclaimer. First, MINERVA-JAM is not claimed to be the model of memory that is best suited to explain the observed JAM function. The only claim is that MINERVA-JAM is a model that provides a reasonably good qualitative and quantitative fit to the data (albeit when augmented with a source of judgmental bias). It would be surprising if other models that share representational and/or processing characteristics of MINERVA could not be coaxed into making similar predictions. Although the background for the simulations is rooted in the MINERVA class of models, it is possible to view the processes responsible for the JAM functions in different terms. The probability distribution adopted as a computational convenience in the simulations could be reinterpreted in terms of associative strength (cf. Nelson et al., 2000). The process of matching probes to traces in random samples could be recast as an information gathering process like a random walk (Ratcliff, 1978) that terminates after some criterial amount of evidence has been accumulated. Having other models predict the high intercept and shallow slope of the JAM function would strengthen the general claim.
The inaccuracy of associative judgments arises at least in part from the natural operation of our memory system when interrogated in different ways.

Appendix A

The development of large-scale databases has made it possible to study the relationship between associative and semantic relations among words. Associative norms have been developed by Nelson et al. (2004) and semantic feature norms have been developed by McRae, Cree, Seidenberg, and McNorgan (2005; see also Cree and McRae, 2003). Intuitively, it seems that strongly associated pairs of words, more so than weakly associated pairs, should be semantically similar and share semantic features. However, the quantitative relation between associative strength and semantic similarity was not known at the beginning of the work on JAM and the MINERVA model. That relationship needs to be determined both for the sake of inherent interest and as a prescription for selecting vector representations in models such as MINERVA.

Two related lines of research are described below. First, the associative and feature norms were cross-checked to discover word pairs in common. These common word pairs were then used to correlate feature overlap scores with associative rank order, $R_a$. Second, that empirical relation was modeled using vector representations of cues and their responses in which the semantic similarity between cues and responses becomes progressively less as a function of $R_a$. The result of the simulation work was a power function with three parameters that relates semantic feature overlap to $R_a$.


Data

Only those cue-response and response-response pairs that appeared in both the Nelson et al. associative norms and the McRae et al. semantic feature norms were included in the analyses. A search for matches between the two sets of norms resulted in 1,095 cue–target pairs (distributed over 360 cues) and 712 target–target pairs. For each cue word in the Nelson et al. norms, response words were rank-ordered according to forward strength from highest to lowest. Subsets of associates of some cue words in the associative norms are tied with respect to their forward strength. To break these ties, the responses in each set of tied values for each cue word were randomly ordered prior to assigning ranks. Then the cue-response pair was matched against the pairs in the semantic feature norms to retrieve the cosine. These cue-response cosines were averaged for each associative rank. Separately, for each cue word, each pair of adjacently ranked responses was also matched against the semantic feature norms. The resulting response-response cosines were also averaged for each associative rank. This entire procedure was repeated 100 times and cosines were averaged over the 100 replications.

Fig. A1 shows the average cosines as a function of associative rank for both cue-response and response-response data. Overall, both cue-response and response-response similarities declined as a function of associative rank (i.e., declined with increasing associative distance from the cue). Cue-response similarities were higher than were

[Figure A1 plot: Cosine as a function of Associative rank for Cue-response (model), Response-response (model), Cue-response (data), and Response-response (data). Power functions shown for the simulations: y = 0.4567x^(-0.278), R² = 0.9674; y = 0.2345x^(-0.1932), R² = 0.9862.]

Fig. A1. Empirical and simulated relations between feature overlap and associative rank. The cue-response functions are based on cosines between a cue word and its associated response words. The response-response functions are based on cosines between each adjacent pair of associates. The best fitting power functions are shown for the simulations. The power function fits for the data were $y = 0.445\,x^{-0.266}$ and $y = 0.209\,x^{-0.161}$ for the cue-response and response-response (target-target) data, respectively. Ranks beyond 17 were not used in fitting the response-response function.


response-response similarities. There was a pronounced curvature in the cue-response data and a power function provided a good fit to the data (also shown in Fig. A1). The similarities for the response-response pairs declined sharply after rank 17 and were not used in computing the response-response power function.

Simulation

A cue word was represented as a vector of features in which +1 represented the presence of a feature and −1 represented the absence of a feature. Word vectors contained an equal number of +1 and −1 elements. Response vectors were similarly represented as vectors of +1 and −1 elements. Each response vector contained a proportion, c, of "common" or "shared" elements copied from the cue vector. The remaining response vector elements were copied from the cue vector with identical or reversed signs. The probabilities of reversing an element were determined by power functions of associative rank, $p = a\,R_a^{b}$. Each simulation consisted of 10,000 simulated sets of responses. Vectors consisted of 100 elements. Within each simulation, both cue-response and response-response cosines were computed for each response vector. The best-fitting parameter values for a, b, and c were discovered through a manual grid search of the parameter space. The criterion for a good fit was a nearly identical match of the slopes and intercepts for the empirical and simulated cue-response regression lines. Note that no special effort was made to fit the response-response regression lines. In all the simulations, the relations of cue-response and response-response similarities (cos) to associative rank ($R_a$) were fit well by power functions, $R^2 > 0.88$. In all cases it was possible to find a set of parameters that resulted in a close to identical match between the empirical and simulated cue-response regression lines, every $R^2 \approx 1.00$.
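The vector-generation scheme described in this section can be illustrated with a short sketch. It is not the code used for the reported simulations: the function names are hypothetical, the shared elements are arbitrarily taken to be the first c × F positions, and the default parameter values (a = 0.318, b = 0.12, c = 0.14) are those listed in Table 2.

```python
import random
from typing import List


def make_cue(n_features: int, rng: random.Random) -> List[int]:
    """Cue vector with (as nearly as possible) equal numbers of +1 and -1."""
    half = n_features // 2
    cue = [1] * half + [-1] * (n_features - half)
    rng.shuffle(cue)
    return cue


def make_response(cue: List[int], rank: int, rng: random.Random,
                  a: float = 0.318, b: float = 0.12,
                  c: float = 0.14) -> List[int]:
    """Response vector for a given associative rank (1 = dominant).

    The first c * F elements are copied unchanged from the cue (assumed
    positions of the shared elements); every other element is copied and
    then sign-reversed with probability p = a * rank ** b.
    """
    n_common = round(c * len(cue))
    p_reverse = min(1.0, a * rank ** b)
    response = list(cue)
    for j in range(n_common, len(cue)):
        if rng.random() < p_reverse:
            response[j] = -response[j]
    return response


if __name__ == "__main__":
    rng = random.Random(11)
    cue = make_cue(100, rng)
    for rank in (1, 5, 20):
        response = make_response(cue, rank, rng)
        overlap = sum(x * y for x, y in zip(cue, response)) / len(cue)
        print(rank, round(overlap, 2))  # cosine between cue and response
```

Because every element is +1 or −1, the normalized dot product printed here equals the cosine between the cue and response vectors, and it tends to fall as rank increases because the reversal probability grows with rank.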
However, the simulated response-response regression lines most closely approximated the empirical results with one set of parameters, a = 0.318, b = 0.12, and c = 0.14. The simulated data points for these parameters are shown in Fig. A1 together with the best-fitting power functions. Both functions are nearly identical to those obtained from the data.

A caveat

Use of just one function relating feature-based similarity and associative rank order is certainly too simple for both practical and conceptual reasons. A single function restricts the cue sets studied to those that fall along that function. A similarity score dictates the rank order and vice versa. However, feature similarity between a cue and its responses might depend on the particular cue, so a given similarity score might be found at different associative ranks for different cues. The problem is that there is an insufficient number of cases common to both associative and semantic norms; a larger body of feature norms is needed. Nevertheless, in the absence of that data and a more thorough analysis, the present results provided a principled means of creating feature vectors.

References

Anderson, J. R., & Milson, R. (1989). Human memory: an adaptive perspective. Psychological Review, 96, 703–719.


Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.
Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and computation of the meaning of Chipmunk, Cherry, Chisel, and Cello (and many other such concrete nouns). Journal of Experimental Psychology: General, 132, 163–201.
Deese, J. (1965). The structure of associations in language and thought. Baltimore: The Johns Hopkins Press.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: a memory processes model for judgments of likelihood. Psychological Review, 106, 180–209.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between cue for judgments of learning (JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language, 36, 34–49.
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.
Esper, E. A. (1973). Analogy and association in linguistics and psychology. Athens, GA: University of Georgia Press.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: The MIT Press.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103, 193–214.
Fiedler, K. (2000). Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychological Review, 107, 659–676.
Garskof, B. E., & Forrester, W. (1966). The relationships between judged similarity, judged association, and normative association. Psychonomic Science, 6, 504.
Gigerenzer, G. (1996). On narrow norms and vague heuristics: a reply to Kahneman and Tversky (1996). Psychological Review, 103, 592–596.
Haagen, C. H. (1949). Synonymity, vividness, familiarity, and association value ratings of 400 pairs of common adjectives. The Journal of Psychology, 27, 453–463.
Hintzman, D. L. (1984). MINERVA 2: a simulation model of human memory. Behavior Research Methods, Instruments, and Computers, 16, 96–101.
Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace memory model. Psychological Review, 93, 411–428.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528–551.
Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan.
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582–591.
Kamman, R. (1968). A study of the properties of associative ratings and the role of association in word-word learning. Journal of Experimental Psychology Monograph, 78(Part 2), 1–16.
Keppel, G. (1973). Design and analysis: A researcher's handbook. Englewood Cliffs, NJ: Prentice-Hall.
Koriat, A. (1981). Semantic facilitation in lexical decision as a function of prime-target association. Memory & Cognition, 9, 587–598.
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one's own knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194.
Koriat, A., Fiedler, K., & Bjork, R. A. (2006). The inflation of conditional predictions. Journal of Experimental Psychology: General, 135, 429–447.
Kwantes, P. J. (2005). Using context to build semantics. Psychonomic Bulletin & Review, 12, 703–710.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Leonesio, R. J., & Nelson, T. O. (1990). Do different metamemory judgments tap the same underlying aspects of memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 464–470.
Levin, D. T., Momen, N., Drivdahl, S. B., & Simons, D. J. (2000). Change blindness blindness: the metacognitive error of overestimating change-detection ability. Visual Cognition, 7, 397–412.
Maki, R. H., Wheeler, A. E., & Zacchilli, T. L. (2004). When are the unskilled unaware? Paper presented at the meeting of the Psychonomic Society, November 19, Minneapolis, MN.
Maki, W. S., McKinley, L. N., & Thompson, A. G. (2004). Semantic distance norms computed from an electronic dictionary (WordNet). Behavior Research Methods, Instruments, and Computers, 36, 421–431.

W.S. Maki / Cognitive Psychology 54 (2007) 319–353

353

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, Instruments, and Computers, 37, 547–559. Nelson, D. L., Dyrdal, G. M., & Goodmon, L. B. (2005). What is preexisting strength? Predicting free association, similarity ratings, and cued recall probabilities. Psychonomic Bulletin & Review, 12, 711–719. Nelson, D. L., McEvoy, C. L., & Dennis, S. (2000). What is free association and what does it measure? Memory & Cognition, 28, 887–899. Nelson, D. L., McEvoy, C. L., & Pointer, L. (2003). Spreading activation or spooky action at a distance? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 42–52. Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, and Computers, 36, 402–407. Patwardhan, S., & Pedersen, T. (2003), WordNet::Similarity. . Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. Smith, E. R. (1991). Illusory correlation in a simulated exemplar-based memory. Journal of Experimental Social Psychology, 27, 107–123. Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength. Journal of Psycholinguistic Research, 19, 317–330. Tversky, A., & Kahneman, D. (1973). Availability: a heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232.