Cognitive Development, 12, 551-569 (1996)
A Fuzzy Trace Analysis of Categorical Inferences and Instantial Associations as a Function of Retention Interval
Melvin H. Marx
Bruce B. Henderson
Western Carolina University
Two experiments on children’s inferences and associative memory provided a test of predictions from fuzzy-trace theory. Specifically, it was expected that gist-based false recognitions would increase with age and that false recognitions would be uncorrelated with verbatim memory. In Experiment 1, children in Grades 1 through 5 heard lists of category labels, clustered instances from categories, and individual instances. On an immediate test, children indicated whether or not they had previously heard each of a series of individually presented test words. The test list consisted of old words, new words, and words that were categorically or semantically related to the studied word clusters. Children made more false recognition errors for instances than for categories. Verbatim memory and inferences were unrelated. In Experiment 2, the effect of a test delay on categorical inferences and associated instances was examined with children in Grades 1 to 6. With delay, false recognition of associated instances declined for children at all grade levels. In contrast, categorical inferences increased with delay for older children. Verbatim memory and inferences were uncorrelated under immediate and 1-day delay conditions, but there were some low but significant correlations across grades under the 7-day delay. The results of the two experiments are interpreted as supporting fuzzy-trace theory.
Inferences are ubiquitous in children’s thinking. They occur when children read text, listen to speakers, speak to listeners, reason about premises, and remember what has happened. It is arguably no overstatement to say that inferences are an essential part of almost any kind of higher order thinking or communication. The experiments reported here were designed to examine the generation of categorical inferences and associative memory across the elementary school years. A categorical inference is defined here as the generation of a new category label based on the study of a number of instances from the category.
We thank Dr. J. Jacobs, Principal, and Mr. Bob Bond, Principal, and the teachers and students at Highlands Elementary School and St. Edwards Lower School, Vero Beach, FL, for their assistance in carrying out this research. Thanks are also due to Valerie Reyna and Charles Brainerd for their critical reading of earlier drafts of this report. Correspondence and requests for reprints should be sent to Bruce Henderson, Department of Psychology, Western Carolina University, Cullowhee, NC 28723. E-mail: .
Marx and Henderson
The occurrence of categorical inferences in the experiments reported here was measured using an analogue of traditional false recognition procedures. Such procedures have been used in studies of “constructive memory” for sentences in adults (Bransford & Franks, 1971) and children (Paris & Carter, 1973; Paris & Lindauer, 1977). These studies have shown that adults and children make inferences that go beyond the information actually provided, filling in gaps in that information or elaborating on it. The interpretation of these findings, with roots going back to Bartlett’s classic studies (Bartlett, 1932), has commonly centered on the presumption that children form gist-based schemas or scripts from general knowledge to help organize material to be remembered. From the constructivist perspective, using general knowledge and the implications of new information to draw inferences can have both positive and negative influences on the accuracy of memory. On the one hand, inferences based on schemas, scripts, or clues may improve the ability to recall the gist of newly learned material and to fill in gaps in a way that facilitates communication. On the other hand, drawing inferences may lead to “intrusions” of various sorts into verbatim memory. For example, Brown, Smiley, Day, Townsend, and Lawton (1977) found that children who heard a story about Eskimos were more likely to mistakenly recall a sentence indicating that the weather was bad than were children who heard the same story about desert Indians. Other examples of the distorting effect of inferences include those associated with gender-role stereotypes (Carter & Levy, 1988) and with better liked and less liked animal characters (Marx & Henderson, 1993). A new approach to the interpretation of inferences comes from developments in fuzzy-trace theory (Brainerd & Gordon, 1994; Reyna & Kiernan, 1994).
Fuzzy-trace theory was developed originally to explain the counterintuitive finding that the ability to remember the premises in a reasoning task and the ability to reason successfully are independent and develop independently (see Brainerd & Reyna, 1993, and Reyna & Brainerd, 1990, for reviews). Reyna and Brainerd (Reyna, 1992; Reyna & Brainerd, 1992) argued that this independence has its source in the multiple traces formed in the memory process. They proposed that memories contain two different kinds of traces: verbatim traces and gist traces. Verbatim traces reflect memory for the surface details of physical occurrences (Brainerd, Reyna, & Kneer, 1995) and tend to decay rapidly, probably because of their susceptibility to interference (Brainerd & Gordon, 1994). Gist traces, on the other hand, reflect memory for senses and meanings and tend to last longer. In fuzzy-trace theory, the cognitive system is assumed to be intuitionist rather than computational or logical, with a bias to operate at the vaguest (fuzziest) level of a hierarchy of gist, possibly as a means of conserving resources (Reyna, 1992).
Development of Inferences
An important aspect of fuzzy-trace theory is its developmental implications (Reyna, 1992; Reyna & Brainerd, 1990). Fuzzy-trace theory predicts a developmental shift from verbatim to gist in remembering input forms (Reyna, 1995). Early in development, the memory system is assumed to put a premium on verbatim representations in encoding, whereas the gist-memory system evolves more slowly. Reyna (1992) suggested that verbatim processes predominate before the age of 6, that the shift to gist processes begins between the ages of 6 and 9, and that gist preference predominates by age 9 or 10. She also suggested that there is probably a point during early adolescence when verbatim processes actually deteriorate while gist processes continue to improve. Substantial evidence for the predictions of fuzzy-trace theory regarding memory-reasoning independence has now accumulated (Brainerd & Reyna, 1993). Recently, the principles of fuzzy-trace theory have been applied more broadly to other areas of cognitive development. For example, Brainerd and Gordon (1994) explored preschoolers’ and second graders’ memories for ordered numbers. They contrasted the integration retrieval hypothesis of traditional constructivist theories with the parallel retrieval hypothesis of fuzzy-trace theory. According to the integration hypothesis, gist memories are based on verbatim memories; according to the parallel hypothesis, verbatim and gist memories for numbers function and develop independently. In support of the parallel retrieval hypothesis, Brainerd and Gordon found that memory for the actual (verbatim) numerical inputs did not help either younger or older children remember gist. Moreover, in a second experiment, they found that instructions to extract gist improved preschoolers’ gist memories but impaired their verbatim memories.
Similarly, Reyna and Kiernan (1994) obtained more general support for the verbatim-gist independence predicted by fuzzy-trace theory in studies of 6- and 9-year-old children’s memory for sentences (verbatim tests) and related inferences (tests of meaning or gist). Reyna and Kiernan found that when children were instructed to remember sentences verbatim, accurate verbatim memory coexisted with systematic misrecognition of gist (true inferences), more so for older children. In contrast, when children were explicitly instructed to remember gist in a second experiment, memory and inferences were more dependent, as expected, because of the emphasis on gist in both memory strategies and inferences. Most recently, fuzzy-trace theory has been applied to the question of false recognition (Brainerd, Reyna, & Brandse, 1995; Brainerd, Reyna, & Kneer, 1995). In the false recognition paradigm, after an opportunity to remember something, an individual is asked whether or not targets and distracters had been encountered earlier. Traditional views of false recognition suggest that distracters that resemble the target will be falsely recognized because of constructive inferences from related schemas that lead to information being lost, distorted, or interfered with (Loftus & Hoffman, 1989). On this view, the processes that underlie true memories and false memories are the same. Developmentally, constructive inferences play a larger role with increasing age because more elaborate schemas develop.
A fuzzy-trace analysis of false recognition differs somewhat from the traditional constructivist approach. According to fuzzy-trace theory, when a person’s memory is tested for previous information, both verbatim traces and gist traces are retrieved. Verbatim retrieval should lead to recognition of target items. If the false recognition task stimulates retrieval of gist traces, the usual false recognition of related distracters will occur. However, fuzzy-trace theory suggests a more complicated set of circumstances. If a distracter cues retrieval of gist memories, those memories may, in turn, cue verbatim traces that help the individual discriminate between targets and related distracters. Under these circumstances, a “false-recognition reversal” (Brainerd, Reyna, & Kneer, 1995) may occur, in which related distracters are falsely recognized less frequently than unrelated distracters (although both may be relatively rare). In a series of studies, Brainerd, Reyna, and Kneer (1995) found a pattern of false-recognition reversals for several different types of material (rhymes, category names, category instances, and word associates). False-recognition reversals increased with age. In the experiments reported here, the false recognition of categories related to sets of studied instances (e.g., indicating that “boat” had been heard after hearing “canoe” and “raft”) is compared with the false recognition of instances associated with previously presented sets of instances (e.g., indicating that “dime” had been heard after hearing “penny” and “nickel”).
Bjorklund (1987) argued that false recognition of associated instances is due to the relatively automatic activation of semantic relations. Such false recognitions resemble inferences in that they do not indicate verbatim memory, but they are not gist-based. In the present experimental context, however, both the false recognition of category names after presentation of instances of the category and the false recognition of instances after presentation of associated instances are assumed to be due to the extraction of gist. The developmental prediction from fuzzy-trace theory is that false recognitions will increase with age. False recognition of category names, in contrast to false recognition of associated instances, is assumed to be relatively more indicative of children’s tendency to extract gist (Brainerd & Reyna, 1990). Older children should be more capable than younger children of extracting gist. As initial hypotheses, we expected both verbatim memory and false recognition of related category name distracters (relative to associated instances and unrelated distracters) to increase with age. This prediction is based on the ideas that both verbatim and gist memories are supposed to increase over the ages under study (Brainerd, Reyna, & Kneer, 1995) and that category names are more indicative of gist extraction than are associated instances. Finally, based on the proposed gist-verbatim independence, the tendency to make false recognitions should be uncorrelated with verbatim memory performance at each grade level.
EXPERIMENT 1
Sets of three instances of a category were presented in the study session along with single instances and category names. Subsequently, children were asked to indicate whether various test words had or had not been presented during study. Categorical inferences were measured by presenting the category names for sets of three studied instances, even though those category names had not occurred in the study list. False recognition of associated instances was measured by presenting an additional, unstudied instance from the same category as a previously studied set of three instances. New (unstudied) category and instance words were presented during testing as controls. The generation of categorical inferences was assumed to be indexed by false recognition of these category names relative to false recognition of the control category names. Similarly, activation of instantial associations was assumed to be indexed by false recognition of new, associated instances relative to false recognition of new instances unrelated to previously studied instances. Single category names and single instances were also included in the study list and repeated on the test to provide a measure of verbatim memory. Older children were expected to falsely recognize more category names and associated instances than younger children. The tendency to make inferences was expected to be uncorrelated with verbatim memory.
Method
Participants. Participants attended a public school in a small southeastern city. The 124 children were in Grades 1 (n = 39), 2 (n = 37), 4 (n = 25), and 5 (n = 23). There were approximately equal numbers of girls and boys in each group. Third graders were originally included in the study, but their data were unusable because the classroom teacher did not adequately follow the instructions.
Materials and Procedure. Study. Classroom teachers told the students that they would be read a list of words and later would be given a memory test on them. The study list consisted of 54 items (see Appendix, Form A) that were read to students at a 2-s rate. There were six filler items, three presented at the beginning of the list and three at the end. The rest of the list consisted of six category names, six instances from separate categories, and 12 sets of three instances from categories, with the instances of each set clustered together on the list.
Comprehension check. After the data for Experiments 1 and 2 were collected, a question was raised about the ability of younger children to understand the categorical and instantial relationships in the word sets used. To ensure that they could, additional data were collected and are briefly reported here. Children in the first (n = 15), second (n = 14), and third (n = 15) grades of the same school as was used for participants in Experiment 2, but who had not participated in that experiment, were tested. The children were told that they would play a word game. They were given answer sheets with sets of three related words on the left side and four single words on the right side. The word clusters on the left were the same ones used in Experiments 1 and 2, and the category names and associated instances that had been used as test items were the correct alternatives among the four words on the right side. The teacher read each set of three related words on the left and then read the four possible answers from the right side. The children were told to circle the word on the right that “goes best with the set.” The 12 word clusters that had categorical labels on the test list were presented first, after two practice items, and the 12 clusters that were followed by associated instances on the test list were then presented, again after two practice items.
The three distracters for each item were ordinary words unrelated to the clustered words (e.g., movie, hat, and sky, along with playground, for the cluster slide, sandbox, seesaw; desk, tractor, and sound, along with hate, for the cluster fear, envy, love). The results were quite clear. For categories, the mean percentages of correct responses were 98, 99, and 100 for Grades 1, 2, and 3, respectively. For instances, the corresponding percentages were 93, 98, and 100. There seems to be little doubt that even the youngest children were capable of comprehending both the categories and the instances related to the word clusters used in these experiments.
Test. In the test session, given immediately after study, students were told that a new list of words would be read and that some of the words would be ones that had been read to them previously and some would be new words. Their task was to circle “yes” on the answer sheet for any word they remembered having heard in the study session and “no” for any word they did not remember hearing. They were told to make one of the two responses for each item, even if they had to guess. The test items were read, along with their identifying numbers from the test answer sheet, at a 3-s rate.
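The 36-item test list for this experiment (six item types, six words each) was read in a randomized order, restricted so that three items of each type fell in each half of the list. A stratified shuffle that satisfies such a restriction can be sketched as follows. This is an illustrative sketch only, assuming Python’s standard `random` module: the article does not say how the order was generated, and the type labels, the function `make_test_order`, and the placeholder words are hypothetical.

```python
import random

# Hypothetical labels for the six item types (six test words of each type).
ITEM_TYPES = (
    "old_category", "old_instance", "new_category",
    "new_instance", "inferred_category", "associated_instance",
)

def make_test_order(items_by_type, seed=None):
    """Return a random test order with three items of each type per half.

    `items_by_type` maps every item type to a list of exactly six words.
    """
    rng = random.Random(seed)
    first_half, second_half = [], []
    for item_type in ITEM_TYPES:
        words = list(items_by_type[item_type])
        if len(words) != 6:
            raise ValueError(f"{item_type}: expected 6 words, got {len(words)}")
        rng.shuffle(words)
        # Assign three randomly chosen words of this type to each half.
        first_half += [(item_type, w) for w in words[:3]]
        second_half += [(item_type, w) for w in words[3:]]
    # Shuffling within each half keeps the 3/3 split for every type.
    rng.shuffle(first_half)
    rng.shuffle(second_half)
    return first_half + second_half

# Placeholder words stand in for the actual Appendix items.
items = {t: [f"{t}_{i}" for i in range(1, 7)] for t in ITEM_TYPES}
order = make_test_order(items, seed=1)
for number, (item_type, word) in enumerate(order, start=1):
    print(number, word)
```

Shuffling each type before splitting it, and then shuffling within each half, randomizes the order while guaranteeing the three-and-three split; a fixed `seed` simply makes a given order reproducible.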
The test list (see Appendix) consisted of six old category labels; six old single instances; six new category labels; six new instances; six category labels, each naming the category of one studied set of three instances; and six unstudied instances, each belonging to the category of one of the studied sets. The latter two types of test items were the crucial ones, permitting both a comparison of categorical and instantial false recognitions and the evaluation of each against the appropriate control items. To ensure the associability of test instances, only high-probability associates (e.g., “dime” after “quarter, penny, nickel,” and “salmon” after “tuna, bass, trout”) were used. The new category names and new instances were unrelated to study items. The test items were presented in a randomized order, with the restriction that three items of each type fall in each half of the list.
Design. Children’s yes responses were cast in a 4 (Grade) × 2 (Gender) × 2 (Item: Categorical vs. Instantial) × 3 (Test Type: Old, Inferential or Associated, New) design with repeated measures on the last two factors.
Results
Preliminary analyses indicated no gender differences and no interactions of gender with the other factors, so gender was dropped from further analyses. Means for each response type are summarized in Table 1. An overall 4 (Grade) × 2 (Item: Categorical vs. Instantial) × 3 (Test Type: Old, Inferential or Associated, New) analysis of variance (ANOVA) revealed main effects of item (categorical vs. instance), F(1, 100) = 17.46, p < .01, and test type, F(2, 200) = 287.77, p < .01. These main effects were qualified by Item × Test Type, F(2, 200) = 28.53, p < .01, and Grade × Test Type, F(6, 200) = 5.00, p < .01, interactions.
Grade Effects. Linear trends indicated a steady improvement over grades in verbatim memory (old items) for both category names, F(1, 120) = 7.62, p < .01, and instance items, F(1, 120) = 10.38, p < .01. No two grades differed (Scheffe tests) in verbatim recognition of categories, but the oldest children had significantly higher scores for verbatim recognition of instances than the youngest children (p < .05). The only other significant linear trend was a decreasing trend for inferred category names, F(1, 118) = 9.19, p < .01. Scheffe tests between grades indicated that second graders made more categorical inferences than third and fourth graders. Comparisons (t tests) between false recognition of inferred category names and of new (control) category names were not significant at any grade level. In contrast, at each grade level, children were more likely to falsely recognize associated than new instances (p < .01).
Correlations: Verbatim Memory and False Recognition. The relations between verbatim memory and the two types of false recognition were examined by correlating scores for correct recognition of old items with categorical and associated-instance false recognitions. These correlations do not provide as strict a test of the verbatim-gist relationship as the stochastic dependency analysis suggested by Reyna and Kiernan (1994), but that type of analysis could not be used here because old and inferred responses were based on different study items. The correlations between correct verbatim recognition and category and instance false-recognition responses, respectively, were -.27 and .34 for Grade 1, -.34 and .21 for Grade 2, .00 and .16 for Grade 4, and .17 and JO for Grade 5. None of these correlations was reliable (p > .05, two-tailed), indicating relative independence of verbatim recognition and false recognition of inferred category and associated instance items.

Table 1. Mean Positive Recognition Scores by Grade and Category and Instance Item: Experiment 1

                 Category Items               Instance Items
Grade    n     Old   Inferred   New       Old   Associated   New
1        39    3.5   1.3        1.4       3.6   2.3**        1.2
2        37    3.9   1.7        1.6       3.8   3.1**        1.2
4        25    4.3   0.7        1.0       4.1   1.9**        0.6
5        23    4.4   0.6        1.0       4.6   2.1**        1.3

Note. Positive responses (“yeses”) were correct for old items but were false recognitions for inferred or associated items and for new items. Asterisks indicate that mean inferred-new or associated-new error scores were reliably different, **p < .01.

Discussion
Consistent with fuzzy-trace theory, verbatim memory for the items used in this study was relatively good and increased with age. The lack of correlation between verbatim recognition and false recognition also supports fuzzy-trace theory. False recognition of category names declined with age, but false recognition of instances did not vary with age. A constructivist position would predict age increases in both kinds of false recognition because of increases in the elaboration of schematic networks. These findings are, however, interpretable in terms of fuzzy-trace theory and of the findings of Reyna and Kiernan (1994) for sentences and of Brainerd, Reyna, and Kneer (1995) for false recognition. According to fuzzy-trace theory, the accessibility of both verbatim memory and gist memory increases with age. It is possible that, in the present study, the test items encouraged reliance on verbatim memory and rejection of distracters, which neutralized age trends in gist memory.
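The within-grade correlations on which this conclusion rests can be sketched as follows. This is a minimal illustration, not the authors’ analysis: the item-type labels, the helpers `score_child`, `pearson_r`, `correlations_by_grade`, and `toy_sheet`, and the three toy answer sheets are all hypothetical. Each child’s “yes” answers are tallied by item type; correct recognitions of old items give the verbatim score, and the raw counts of inferred-category and associated-instance false recognitions are each correlated with that score separately within a grade.

```python
import math

# Hypothetical labels for the six kinds of test items.
ITEM_TYPES = (
    "old_category", "inferred_category", "new_category",
    "old_instance", "associated_instance", "new_instance",
)

def score_child(responses):
    """Tally 'yes' answers by item type from (item_type, said_yes) pairs."""
    yes = dict.fromkeys(ITEM_TYPES, 0)
    for item_type, said_yes in responses:
        if item_type not in yes:
            raise ValueError(f"unknown item type: {item_type}")
        yes[item_type] += int(said_yes)
    return {
        # Correct recognitions of studied items: the verbatim score.
        "verbatim": yes["old_category"] + yes["old_instance"],
        # Raw false-recognition counts for the two critical item types.
        "cat_false": yes["inferred_category"],
        "inst_false": yes["associated_instance"],
    }

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a score does not vary")
    return sxy / math.sqrt(sxx * syy)

def correlations_by_grade(sheets):
    """Map grade -> (r with categorical, r with instance false recognition).

    `sheets` is a list of (grade, responses) pairs, one per child.
    """
    by_grade = {}
    for grade, responses in sheets:
        by_grade.setdefault(grade, []).append(score_child(responses))
    result = {}
    for grade, scores in sorted(by_grade.items()):
        verbatim = [s["verbatim"] for s in scores]
        result[grade] = (
            pearson_r(verbatim, [s["cat_false"] for s in scores]),
            pearson_r(verbatim, [s["inst_false"] for s in scores]),
        )
    return result

def toy_sheet(old_yes, inferred_yes, associated_yes):
    """Hypothetical partial answer sheet: a given number of 'yes' per type."""
    sheet = []
    for item_type, n_yes in (("old_category", old_yes),
                             ("inferred_category", inferred_yes),
                             ("associated_instance", associated_yes)):
        sheet += [(item_type, True)] * n_yes + [(item_type, False)] * (6 - n_yes)
    return sheet

sheets = [(1, toy_sheet(2, 1, 3)), (1, toy_sheet(4, 2, 2)), (1, toy_sheet(6, 3, 1))]
print(correlations_by_grade(sheets))
```

In the toy data the verbatim score rises from 2 to 6 while categorical false recognitions rise in step and associated-instance false recognitions fall, so the sketch returns r = 1.0 and r = -1.0 for that grade. As the Results note, a correlation of this kind is a weaker test of verbatim-gist independence than the stochastic dependency analysis of Reyna and Kiernan (1994).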
Under immediate testing conditions, category names for studied instances may not have stimulated category-to-instance gist processing. Instead, consistent with the findings of Brainerd, Reyna, and Kneer (1995), instances of categories may have elicited retrieval of category names. The retrieved names may then have been checked against verbatim memory for the category name, which would have increased the likelihood of rejection. The developmental decrease in false recognition of category names from instances may thus reflect a developmental increase in the accessibility of verbatim memory. False recognition did occur for associated instances. Although associated instances are not in themselves gist-based (Bjorklund, 1987), the testing process may have cued children to engage in strategies that stimulated gist-like processing. Instance-level associations require little effort, and under these conditions gist traces may have outweighed verbatim memory and led to false recognition. In contrast to the power of category names to elicit verbatim memory of specific instances, the retrieval of particular instances may not have cued verbatim memory of other instances. Although the findings of Experiment 1 generally support fuzzy-trace theory, it is difficult to disentangle the relative contributions of verbatim memory and gist memory. One way to explore further the role of gist-memory accessibility in categorical inferences and associated instances is to strengthen factors that encourage gist memory while weakening the effects of verbatim memory. One such factor is delay of testing. In the second experiment, the effects of delay were examined.
EXPERIMENT 2
Since the days of Ebbinghaus, verbatim memory has been known to decline over time, and it has been shown to be less resistant to loss over retention intervals than memory for gist (e.g., Kintsch, Welsch, Schmalhofer, & Zimny, 1990). Reyna and Kiernan (1994) suggested that older children are more likely than younger children to show a decreased reliance on verbatim memory because they rely increasingly on the construction of gist representations. Thus, it is important to examine the effects of test delay on verbatim memory and on false recognition of categorical and instantial information. Experiment 2 replicated Experiment 1 but extended the range of grades tested and examined the effects of a retention interval on false recognition of categorical and instantial information. If categorical inferences are based more strongly on gist, they should be retained over a longer period than instantial associations. Also, older children should make relatively more categorical errors, and these errors should persist at a relatively higher rate over retention periods.
Method
Participants. Participants attended a private school in a small southeastern city. The 266 children were from 18 classes, three at each grade from first through sixth. For purposes of analysis, first and second, third and fourth, and fifth and sixth grades were combined to form three grade levels. The number of children in each grade level and experimental condition is listed in Table 2. There were approximately equal numbers of boys and girls at each level. The school had assigned children to classes so as to balance rated achievement, motivation, and gender, which reduced the possibility that delay was confounded with such characteristics.
Materials and Procedure. The test materials and basic procedures for Experiment 2 were essentially the same as those used in Experiment 1. However, two forms of the study list were used in this experiment to permit pairing of grades. Both forms included fillers, category names, instances from separate categories, and clustered instances. The forms differed in whether a particular category was represented by its name (e.g., farm animal) or by a set of its instances (e.g., cow, pig, sheep), and in whether a single instance of a category (e.g., dime) or a list of instances for that category (e.g., quarter, penny, nickel) was included. The only other major difference was the addition of the delay variable. The test session occurred immediately after the study session in the zero-delay condition and after 1 or 7 days in the other two conditions. All students heard the same test list. Whether a test item tested false recognition of a category name, false recognition of an associated instance, or verbatim memory depended on which of the two study forms had been used. For example, marking yes to “farm animal” was a correct verbatim response for those who had heard it on their study list, but a categorical false recognition for those who had heard “cow, pig, sheep” during study. Likewise, marking yes to “dime” was a correct verbatim response for those who had heard it, but an associated-instance false recognition for those who had heard “quarter, penny, nickel.”
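Because the same test list was paired with two study forms, the meaning of a “yes” depends on which form a child heard. That rule can be sketched as a small classifier. This is a hypothetical illustration: the `TestItem` type, the `classify_yes` function, the status labels, and the assignment of the two example items to “form 1” and “form 2” are assumptions, and the actual study lists appear in the Appendix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestItem:
    word: str
    kind: str  # "category" (a category name) or "instance"

def classify_yes(item, study_status):
    """Label a 'yes' answer given what the child heard at study.

    study_status is "heard" (the word itself was on the study list),
    "cluster" (three instances of the related category were heard instead),
    or "unrelated" (nothing related was on the study list).
    """
    if study_status == "heard":
        return "correct verbatim recognition"
    if study_status == "cluster":
        if item.kind == "category":
            return "categorical false recognition"
        return "associated-instance false recognition"
    if study_status == "unrelated":
        return "new-item false recognition"
    raise ValueError(f"unknown study status: {study_status!r}")

FARM_ANIMAL = TestItem("farm animal", "category")
DIME = TestItem("dime", "instance")

# Hypothetical assignment of the two example items to the two study forms.
STUDY_FORMS = {
    "form 1": {"farm animal": "heard", "dime": "cluster"},  # heard "quarter, penny, nickel"
    "form 2": {"farm animal": "cluster", "dime": "heard"},  # heard "cow, pig, sheep"
}

for form, status_of in STUDY_FORMS.items():
    for item in (FARM_ANIMAL, DIME):
        label = classify_yes(item, status_of[item.word])
        print(f"{form}: yes to {item.word!r} -> {label}")
```

Keeping the test item fixed and letting the study form decide the label mirrors the counterbalancing: each critical word serves as a verbatim item for one half of the children and as an inference or association probe for the other half.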
Design. Grades were combined in adjacent pairs (1-2, 3-4, 5-6) to avoid small ns in some cells and to permit counterbalancing of items by use of the two study-list forms. Gender of child and delay period were also between-subjects variables. The test list, the same one used in Experiment 1, contained both category and instance items, each of which was an old item (repeated from the study list), an inferred or associated item, or a new item; these variables were within-subject factors.
Results
Preliminary Analyses. The recognition responses to the test items were first analyzed in a 3 (Grade) × 3 (Test Delay) × 2 (Gender) × 2 (Item: Category vs. Instance) × 3 (Test Type: Old, Inferential or Associated, New) ANOVA with repeated measures on the last two factors. This initial analysis revealed no gender main effects or interactions, and the data were therefore reanalyzed without gender. The mean numbers of recognition responses by delay and grade are presented in Table 2. The 3 (Grade) × 3 (Test Delay) × 2 (Item: Category vs. Instance) × 3 (Test Type: Old, Inferential or Associated, New) ANOVA indicated reliable main effects of grade, delay, category versus instance item, and test type. However, all of these effects were qualified by higher order interactions, including a four-way interaction, F(8, 514) = 2.06, p < .05. The main effects and interactions involving test type are hardly surprising, because the test included measures of recognition of previously heard items, which tended to raise scores, as well as measures of recognition of potentially inferred or related items and new items, which tended to lower scores. Also unsurprising was a decline with delay in the number of items correctly recognized from the original list. Therefore, to begin to sort out the effects of central interest, separate analyses were done for each delay period on verbatim memory, on categorical inferences and associated instances, and on control items.
Table 2. Mean Positive Recognition Responses by Grade, Delay, and Category and Instance Items: Experiment 2

                          Category Items               Instance Items
Grades   Delay    n    Old   Inferred   New       Old   Associated   New
1-2      0        29   4.2   1.1*       0.7       4.3   1.4**        0.5
1-2      1-day    30   3.2   2.6        2.1       3.3   2.0          1.6
1-2      7-day    24   3.6   2.6        3.2*      3.1   2.3          2.7
3-4      0        25   4.8   1.9*       1.3       4.9   2.1**        1.0
3-4      1-day    27   4.3   2.8        2.6       4.2   3.0          2.7
3-4      7-day    28   3.5   3.5**      2.4       3.7   3.0          2.8
5-6      0        36   4.8   1.3        1.5       4.5   2.4*         1.0
5-6      1-day    35   4.0   3.2**      2.2       3.8   2.7*         2.1
5-6      7-day    32   3.4   2.8**      1.5       2.9   2.3          2.2

Note. Positive responses (“yeses”) were correct for old items but were false recognitions for inferred or related items and for new items. Asterisks indicate that mean inferred-new or related-new error scores were reliably different, *p < .05, **p < .01.
Verbatim Memory. In the analysis of the old (correct) recognition responses for the immediate test (zero delay), the 3 (Grade) × 2 (Item: Category vs. Instance) ANOVA yielded only a reliable grade difference, F(2, 87) = 3.69, p < .05. A Scheffe test revealed only that children in the middle group made more correct recognitions than the youngest children (p < .05). A similar result was obtained for the 1-day delay, with a reliable grade difference, F(2, 89) = 7.2&, p < .01. Again, children in the middle group had higher scores than those in the youngest group. No grade differences were obtained for recognition responses after a 7-day delay. Thus, absolute levels of verbatim memory did not differ significantly for categories and instances, leveled off in the two oldest groups, and declined with delay.
Inferred Versus Control False Recognitions: Grade Effects. These more critical analyses were cast in a 3 (Grade) × 2 (Item: Category vs. Instance) × 2 (Test Type: Inferential or Associated vs. New) ANOVA with repeated measures on the last two factors. For each level of delay, two types of follow-up analyses were conducted. First, for each delay and each grade group, the frequency of recognition responses to new items was compared with the frequency of responses to inferred categories or associated instances; the results of these two-tailed t tests are reported in Table 2. Second, when an overall one-way ANOVA was reliable, grade trends for each dependent variable were tested with polynomial contrasts for linear and quadratic trends. For the immediate test, there were main effects of grade, inferred or associated versus new item, and category versus instance item, which were qualified by a three-way interaction, F(2, 87) = 3.77, p < .05. At each grade level, children falsely recognized reliably more associated instance items than new instances.
Children at the two youngest grade levels also falsely recognized more category inferences than new category names, but there was no difference for the oldest group. Polynomial contrasts of grade effects for each of the dependent variables indicated no grade trend for recognition of category inference items or new instances, but linear trends for new category items, F(1, 87) = 6.88, p < .05 (with paired comparison differences reliable only between the oldest and youngest groups), and for associated instances, F(1, 87) = 13.46, p < .01 (with paired comparison differences reliable between each of the two oldest groups and the youngest group). That is, older children were more likely to say they recognized both test items that were new categories and instances of categories from which instance study items had been drawn.
For the 1-day delayed test, there were only main effects. Children incorrectly recognized more category than instance items, F(1, 89) = 4.70, p < .05, and more inferential or associated than new items, F(1, 89) = 16.60, p < .05. However, for both the categorical and instantial items versus new items, the t tests reported in Table 2 indicate that the differences were reliable only for the oldest group, with those children recognizing more inferred or associated than new items. There were no linear grade trends for recognition of either inferred or new category items or for new instances, but there was an increasing linear trend for associated instances, F(1, 89) = 6.36, p < .05. Older children were more likely to incorrectly recognize instances that could be associatively related to studied instances (Scheffé tests indicated that both older groups were significantly higher than the youngest group, p < .05). There were also reliable quadratic trends for the associated instances, F(1, 89) = 4.91, p < .05, and for incorrect recognition of new instances, F(1, 89) = 7.40, p < .05. In the latter case, children in Grades 3 and 4 were more likely than children in the other groups to incorrectly recognize new instances.
At the 7-day test, there were main effects of grade and of inferred or associated versus new item, and two-way interactions of inferred or associated versus new item with grade and with category versus instance. These were all qualified by a three-way interaction of grade, category versus instance, and inferred or associated versus new, F(2, 81) = 4.40, p < .05. As indicated by the t tests in Table 2, the youngest children falsely recognized more new items than inferred category items, whereas children in the older two groups falsely recognized more inferred category than new category items. There were no reliable differences between numbers of associated and new instance test items at any grade level. The only reliable linear grade trend was a decline over age in false recognitions of new categories, F(1, 81) = 24.32, p < .01.
There was a reliable quadratic trend for category inferences, F(1, 81) = 7.34, p < .01, with Scheffé tests indicating only an increment from the youngest to the middle grade levels. There was also a quadratic trend for associated instances, F(1, 81) = 8.00, p < .01, with the middle grade group higher than the oldest and youngest children.
Summary of delay effects by grade. Figure 1 summarizes the findings for categorical inferences and associated instances compared with control (new) item recognition by delay for each grade level. The differences between inferred category or associated instance errors and the corresponding control errors are graphed for each delay. In the youngest group, children made more categorical errors than false recognitions of new categorical items only in the zero-delay condition. For the two older groups, the likelihood of making categorical errors increased with delay (at the 7-day delay for the middle-grade group, and at both the 1- and 7-day delays for the oldest group), in contrast to the decline in instantial errors relative to false recognition of new items. There were reliable differences between recognition of associated instances and control instances in the 1-day delay
Figure 1. Mean inferred- or associated-control category and instance error differences by grade level and delay. [Panels: Categories, Instances; x-axis: Grades 1-2, 3-4, 5-6; separate functions for the 0-delay, 1-day, and 7-day tests.]
for the oldest group but no 7-day delay instance-new differences for any of the grade-level groups.
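The values plotted in Figure 1 are difference scores: for each grade group and delay, the mean false recognition of new control items is subtracted from the mean false recognition of inferred categories or associated instances. A minimal sketch of that arithmetic follows, assuming the cell means are already computed; the function name and all numbers are hypothetical and are not read from the figure.

```python
def difference_scores(inferred, control):
    """Inferred-minus-control mean false recognitions per (grade, delay) cell.

    Positive values mean children said "yes" more often to inferred
    categories (or associated instances) than to new control items,
    i.e., false recognition above the control baseline.
    """
    if inferred.keys() != control.keys():
        raise ValueError("inferred and control must cover the same cells")
    return {cell: inferred[cell] - control[cell] for cell in inferred}


# Hypothetical cell means for illustration only.
inferred = {("1-2", "0-delay"): 0.9, ("5-6", "7-day"): 1.2}
control = {("1-2", "0-delay"): 0.5, ("5-6", "7-day"): 0.4}
for (grades, delay), diff in difference_scores(inferred, control).items():
    print(f"Grades {grades}, {delay}: {diff:+.2f}")
```

Plotting differences rather than raw rates removes each group's general tendency to say "yes" to any new word, which is why a group can show many false recognitions overall yet a difference near zero.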
Correlations: Verbatim Memory and False Recognitions. Children's positive recognition responses to old items and to inferred category and associated instance items were correlated for each delay and grade. The correlations are reported in Table 3. Of the 18 within-grade-level correlations, only one, the instance correlation for the middle group in the 7-day delay condition, was significant. However, across grades there were low but statistically reliable correlations between verbatim memory and both categorical and instantial inferences under the 7-day delay condition.
Discussion
Children at all the ages studied made inferences, as indexed by reliably higher levels of false recognitions of category names from studied instances than of new (control) category names. This suggests the operation of what Marx (1992) called "inferential bias in memory." However, the extent of inferring exhibited by children was clearly influenced by memory test interval, developmental status, and the type of information to be remembered. The results also suggest that instantial associations are readily made, as reported earlier by Bjorklund (Bjorklund & de Marchena, 1984; Bjorklund & Jacobs, 1985). However, categorical inferences seem to be both more readily made by older children and more durable. Overall, the pattern of results for the immediate test in Experiment 2 was similar to that from Experiment 1. Verbatim memory increased with grade for categories and instances in both experiments. In both experiments, children, especially older children, were more likely to make instantial than
Table 3. Correlations Between Verbatim and Inferred Categorical or Associated Instantial Scores by Grade Level and Delay: Experiment 2

Item type:          Categorical                     Instantial
Delay / Grade   1-2    3-4    5-6   Across      1-2    3-4    5-6   Across
Immediate       .05    .03   -.04    .04        .23    .28    .07    .19
1-day           .17   -.01    .02    .10        .01    .37   -.23    .06
7-day           .34    .30    .15    .24*      -.25    .40*   .29    .31*

*p < .05, two-tailed.
categorical false recognitions at a rate above a baseline control. However, unlike the children in Experiment 1, the oldest two groups of children in Experiment 2 showed small but reliable tendencies to falsely recognize more inferred category labels than control labels. These results are consistent with the frequent occurrence of associative responding throughout the elementary school grades and the gradual increase with grade level in categorical responding reported by Bjorklund and de Marchena (1984). The different developmental patterns for verbatim recognition, categorical inferences, and associated instances support fuzzy-trace theory's proposition that gist-based and non-gist-based representations are developmentally independent. The correlational analyses again suggest substantial independence of verbatim and gist-based memory except after an extended delay. Finally, the radically different results obtained from immediate and delayed tests in this experiment indicate the importance of using more than one retention interval. Further implications of the results of the second experiment are discussed next.
GENERAL DISCUSSION
Several important findings of the first experiment were replicated in the immediate test group of the second experiment. First, as predicted by fuzzy-trace theory, verbatim memory increased with age (Brainerd & Gordon, 1994). Second, at the immediate test, there was evidence for fuzzy-trace theory's well-established finding of independence between verbatim and gist memories (Brainerd & Gordon, 1994; Brainerd & Reyna, 1993). Third, there was evidence that verbatim memory plays a relatively more important role in rejecting false recognitions of category names than of associated instances.
When a category name is a distractor (inferred from a set of instances), it is more likely to have been invoked in verbatim memory during study than is any particular instance cued by other instances, as found by Brainerd, Reyna, and Kneer (1995). Constructivist theories might well predict that either type of item would be largely gist-based, especially for older children. A fourth finding from both experiments was that, consistent with fuzzy-trace theory but in contrast to constructivist theories, false recognition did not consistently vary with age. Older children did not show a tendency to make more schema-based false recognitions than younger children. Instead, the developmental trends seemed to depend on the relative contributions of verbatim memory and gist memory in a way that neutralized developmental trends in any one particular direction (Brainerd, Reyna, & Kneer, 1995; Reyna & Kiernan, 1994).
The major contribution of the second experiment was to show several substantial effects of delay. First, to no one's surprise, verbatim memory declined with delay. Second, there was some evidence, particularly when data were collapsed across age groups, that the dependency between verbatim memory and gist memory increased with delay. Although fuzzy-trace theory predicts verbatim-gist independence for immediate memory, it suggests that dependency should increase over a delay. This occurs because during immediate testing, correct recognition is based on verbatim memory and false recognitions are based on gist memory. With increasing delay, as verbatim memory weakens, correct recognitions come to reflect both verbatim and gist memories, so the gist-based false recognitions begin to correlate with them (Brainerd et al., 1995; Reyna & Kiernan, 1994). A third finding, consistent with the results of earlier studies, is that false recognitions for categorical inferences, the items most susceptible to the influence of gist, greatly increased as the relative role of verbatim memory declined and the role of gist memory increased. The exception to this trend was the youngest group, who apparently did not form gist-based representations of the categorical information. It is possible that the development of spontaneous organizational strategies using categorical relations was responsible for this age difference (Bjorklund, Ornstein, & Haig, 1977). In contrast to the findings for categories, false recognitions for associated instances declined with delay. Perhaps this decline reflects less stable memories for associated instances, which are relatively easily activated (Bjorklund, 1987) and less gist-based (Brainerd, Reyna, & Brandse, 1995).
In summary, we conclude that the results of these experiments provide support for the fuzzy-trace theory of memory. The differential effects of delay and developmental status on memory for categories and instances are quite consistent with the predictions of fuzzy-trace theory.
The independence of verbatim and gist representations predicted by the theory (Reyna & Kiernan, 1994) is at least partially supported by the correlational analyses in both experiments. It appears that fuzzy-trace theory has potential for helping us understand at least one of the many kinds of inferences children make.
REFERENCES
Bartlett, F.C. (1932). Remembering: A study in experimental and social psychology. Cambridge, England: Cambridge University Press.
Bjorklund, D.F. (1987). How age changes in knowledge base contribute to the development of children's memory: An interpretive review. Developmental Review, 7, 93-130.
Bjorklund, D.F., & de Marchena, M.R. (1984). Developmental shifts in the basis of organization in memory: The role of associative versus categorical relatedness in children's free recall. Child Development, 55, 952-962.
Bjorklund, D.F., & Jacobs, J.W., III. (1985). Associative and categorical processes in children's memory: The role of automaticity in the organization of free recall. Journal of Experimental Child Psychology, 39, 599-617.
Bjorklund, D.F., Ornstein, P.A., & Haig, J.R. (1977). Development of organization and recall: Training in the use of organizational techniques. Developmental Psychology, 13, 175-183.
Brainerd, C.J., & Gordon, L.L. (1994). Development of verbatim and gist memory for numbers. Developmental Psychology, 30, 163-177.
Brainerd, C.J., & Reyna, V.F. (1990). Gist is the grist: Fuzzy-trace theory and perceptual salience effects in cognitive development. Developmental Review, 10, 365-403.
Brainerd, C.J., & Reyna, V.F. (1993). Memory independence and memory interference in cognitive development. Psychological Review, 100, 42-67.
Brainerd, C.J., Reyna, V.F., & Brandse, E. (1995). Are children's false memories more persistent than their true memories? Psychological Science, 6, 359-364.
Brainerd, C.J., Reyna, V.F., & Kneer, R. (1995). False-recognition reversal: When similarity is distinctive. Journal of Memory and Language, 34, 157-185.
Bransford, J.D., & Franks, J.J. (1971). The abstraction of linguistic ideas. Cognitive Psychology, 2, 331-380.
Brown, A.L., Smiley, S.S., Day, J.D., Townsend, M.A.R., & Lawton, S.C. (1977). Intrusion of a thematic idea in children's comprehension and retention of stories. Child Development, 48, 1454-1466.
Carter, D.B., & Levy, G.D. (1988). Cognitive aspects of early sex-role development: The influence of gender schemas on preschoolers' memories and preferences for sex-typed toys and activities. Child Development, 59, 782-792.
Kintsch, W., Welsch, D., Schmalhofer, F., & Zimny, S. (1990). Sentence memory: A theoretical analysis. Journal of Memory and Language, 29, 133-159.
Loftus, E.F., & Hoffman, H.G. (1989). Misinformation and memory: The creation of new memories. Journal of Experimental Psychology: General, 118, 100-104.
Marx, M.H. (1992). Development of inferences over elementary-school grades: III. Verbatim and forward-consequence inferential errors made by regular and gifted students. Bulletin of the Psychonomic Society, 30, 353-355.
Marx, M.H., & Henderson, B.B. (1993). Development of inferences over the elementary years: IV. Affective bias as a determinant of inferences. Bulletin of the Psychonomic Society, 31, 149-151.
Paris, S.G., & Carter, A.Y. (1973). Semantic and constructive aspects of sentence memory in children. Developmental Psychology, 9, 109-113.
Paris, S.G., & Lindauer, B.K. (1977). Constructive aspects of children's comprehension and memory. In R.V. Kail & J.W. Hagen (Eds.), Perspectives on the development of memory and cognition (pp. 35-60). Hillsdale, NJ: Erlbaum.
Reyna, V.F. (1992). Reasoning, remembering, and their relationship: Social, cognitive, and developmental issues. In M.L. Howe, C.J. Brainerd, & V.F. Reyna (Eds.), Development of long-term retention (pp. 103-127). New York: Springer-Verlag.
Reyna, V.F. (1995). Interference effects in memory and reasoning: A fuzzy-trace theory analysis. In F.N. Dempster & C.J. Brainerd (Eds.), New perspectives on interference and inhibition processes in cognition (pp. 29-61). San Diego: Academic.
Reyna, V.F., & Brainerd, C.J. (1990). Fuzzy processing in transitivity development. Annals of Operations Research, 23, 37-63.
Reyna, V.F., & Brainerd, C.J. (1992). A fuzzy-trace theory of reasoning and remembering: Paradoxes, patterns, and parallelism. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Vol. 2. Essays in honor of William K. Estes (pp. 235-259). Hillsdale, NJ: Erlbaum.
Reyna, V.F., & Kiernan, B. (1994). Development of gist versus verbatim memory in sentence recognition: Effects of lexical familiarity, semantic content, encoding instructions, and retrieval interval. Developmental Psychology, 30, 178-191.
APPENDIX
Study List A
Fillers (3 at beginning; 3 at end): airplane, carpet, dog, tank, pencil, fan
Category labels: boat, farm animal, state, politician, facial feature, solar system
Instances: carrot, salmon, Christmas, Miami, baseball, sprint
Clusters: September-June-August, army-marines-air force, almond-cashew-pistachio, captain-major-general, newspaper-radio-TV, quarter-penny-nickel, elm-maple-palm, Ford-Dodge-Pontiac, slide-sandbox-seesaw, fear-envy-love, prince-king-princess, rose-petunia-lily
Study List B
Fillers (3 at beginning; 3 at end): airplane, carpet, dog, tank, pencil, fan
Category labels: summer, nut, media, automobile, playground, flower
Instances: navy, colonel, dime, oak, hate, queen
Clusters: canoe-raft-runabout, bean-pea-spinach, tuna-bass-trout, cow-pig-sheep, New Year-Fourth of July-Halloween, Florida-Ohio-Georgia, mayor-senator-governor, Orlando-Melbourne-Tampa, eye-ear-nose, football-basketball-soccer, race-dash-run, earth-sun-moon
Test List
Christmas, automobile, storm, lake, state, oak, farm animal, colonel, circle, carrot, card game, summer, baseball, violin, playground, queen, solar system, building, wild animal, sprint, bake, dime, media, politician, boat, furniture, shoe, flower, navy, salmon, Miami, nut, color, lobster, facial feature, hate