Journal of Experimental Child Psychology 81, 157–193 (2002) doi:10.1006/jecp.2001.2647, available online at http://www.idealibrary.com
Age, Memory Load, and Individual Differences in Working Memory as Determinants of Class-Inclusion Reasoning

F. Michael Rabinowitz
Memorial University of Newfoundland, St. John’s, Newfoundland

Mark L. Howe
Lakehead University, Thunder Bay, Ontario, Canada

and

Kelly Saunders
University of New Brunswick, Fredericton, New Brunswick, Canada

We studied the effects of individual differences in speak-span scores and variations in memory demands on the class-inclusion performance of 10-, 13-, and 15-year-old children. The speak-span task was an age-appropriate modification of Daneman and Carpenter’s (1980) reading-span task and was considered to be a measure of global resources. The age variable was assumed to be a global index of skill development, and some of the specific skills hypothesized to be important in class-inclusion reasoning were estimated using a mathematical model. The results from both regression analyses and the mathematical model indicated that differences in age, speak span, and memory load all affected performance. Surprisingly, the effects of speak span and memory load were independent. However, the effects of each of these variables depended on the age level of the participants. Based on these findings, we argued that (a) resources vary continuously with age, (b) both skill level and global resources should be varied in developmental studies of problem solving, and (c) resource theories (e.g., Norman & Shallice, 1986) should be modified to account for developmental change. © 2002 Elsevier Science

Key Words: class-inclusion reasoning; cognitive development; global resource measures; mathematical modeling; memory/reasoning trade-off; problem solving; resource theory; working memory.

Preparation of this article was supported by Grants OGP0003334 (to Mark L. Howe) and OGP0002017 (to F. Michael Rabinowitz) from the Natural Sciences and Engineering Research Council of Canada. The authors thank Charles Brainerd, Ulrich Müller, and two anonymous reviewers for their many constructive suggestions regarding the article as well as the students, teachers, secretaries, vice principals, and principals at the Holy Cross Primary School, MacDonald Drive Junior High School, Macpherson Junior High School, and St. Paul’s Elementary School for making this research possible. Address correspondence and reprint requests to F. Michael Rabinowitz, Department of Psychology, Memorial University of Newfoundland, St. John’s, Newfoundland, Canada A1B 3X9. E-mail: [email protected].

0022-0965/02 $35.00 © 2002 Elsevier Science All rights reserved
RABINOWITZ, HOWE, AND SAUNDERS
Piaget (1952) studied class inclusion to investigate children’s understanding of both category membership and number. He presented sets of picture–question pairs in which a picture depicted two different subclasses (e.g., 4 dogs and 2 cats) belonging to the same superordinate class (e.g., animals). After the children looked at the picture, they were asked a class-inclusion question in which the larger subclass was compared to its superordinate class. Based on Piaget’s pioneering work and the subsequent mathematical development proposed by Hodkin (1987), it appears that people use one of three types of reasoning strategies when answering class-inclusion questions. They either guess and/or reason idiosyncratically, use subclass–subclass reasoning, or show an understanding of the question and use class-inclusion reasoning. When children use idiosyncratic reasoning, they may be using systematic rules that do not correlate with the question and, therefore, are not detected. Mathematical representations of this type of reasoning are indistinguishable from guessing and include the assumption that correct responding will occur at a chance level. When subclass–subclass reasoning is used, the superordinate class is treated as the subclass that is not specified in the question. For example, if the class-inclusion question is “Are there more dogs or more animals?” and “animals” is treated as “cats,” then children will incorrectly answer “more dogs.” Piaget found that children between 3 and 7 years old usually used subclass–subclass reasoning, whereas older children usually correctly solved the question by using class-inclusion reasoning. We have been using a modified version of Piaget’s task to evaluate the relationship between reasoning and remembering for the past 10 years (Howe & Rabinowitz, 1996; Howe, Rabinowitz, & Powell, 1998; Rabinowitz, Howe, & Lawrence, 1989). 
We assumed that class inclusion is a logical reasoning task that (a) has minimal mathematical prerequisites, (b) depends on the participants’ abilities to compare subclass and superordinate class, and (c) can be studied with materials drawn from either quantitative or qualitative (e.g., color) dimensions. We also assumed that our version of the task is representative of tasks that require reasoning and remembering. Campbell (1991) obtained evidence consistent with our first two assumptions and, consistent with the third assumption, argued that the Inhelder and Piaget (1969) logical inference model “predicts that class-inclusion problems with uncountable items (Dagenais, 1973) should not be appreciably harder than class-inclusion problems with countable items” (Campbell, 1991, p. 175). Because Markman (1978) reported that children over 10 years old often appreciated that the solution to number-based class-inclusion problems depended on logical necessity, obviating the need to keep the subclass information in memory, we modified the standard Piagetian task by adding color-based and subclass–subclass number-based questions in an attempt to get all participants to remember the number and/or color information associated with each subclass. Because it is necessary to remember color information to answer any of the color-based questions and to remember number information to answer the subclass–subclass number-based questions, it is likely that the participants who are included in our studies,
WORKING MEMORY AND CLASS-INCLUSION REASONING
independent of their age, attempt to remember the number and/or color information that appears in the statements that precede the questions. In our task, a computer is used to present written statements and questions. Each statement contains information about number and/or color of subclass members (e.g., “There are 4 brown dogs and 2 white cats”) and is followed by either a subclass–subclass, minor subclass–class, or class-inclusion question and three alternative answers. There are 11 possible question types that represent the combination of relevant dimension (number or color), number of items in the two subclasses (same or different), and type of comparison (subclass–subclass, minor subclass–class, or class inclusion). It is impossible to present the 12th and missing minor subclass versus superordinate class number-equal question because there is not a minor subclass unless the number of items in each subclass is different. There are four different forms of questions used. Equivalence judgments are required for both number and color dimensions following statements in which the number of items in the subclasses are the same; otherwise, superlative judgments are required. The equivalence questions are “Are there the same number of x1s as x2s?” (e.g., “Are there the same number of dogs as cats?”) and “Is the c-est x1 the same color as the c-est x2?” (e.g., “Is the brownest dog the same color as the brownest animal?”). The superlative questions are “Are there more x1s or more x2s?” (e.g., “Are there more dogs or more animals?”) and “Is the c-est x1 c-er than the c-est x2?” (e.g., “Is the brownest dog browner than the brownest cat?”). Note that there are a number of features of our task that merit emphasis. First, the child must remember color cues to answer any color questions. Second, the child must remember number cues to answer subclass–subclass number questions. 
Because the child does not know what question will follow a statement, we assume that an attempt is always made to remember the cue values associated with each of the subclasses. Third, the color-based questions are syntactically and semantically more complex than the number questions. In addition to the equivalence judgment required in the number problem, two superlative judgments are called for in the color-based questions. Fourth, the child must be a sufficiently skilled reader to understand and complete the task. This is the primary reason why our youngest participants are fourth-graders. However, as suggested above, if all participants attempt to remember the cues and usually use this cue information as a basis for reasoning in our task, then the findings are likely to be relevant to the class-inclusion performance of younger children in other variants of the task who, because they do not recognize the logical necessity of class inclusion (Markman, 1978), must make use of stored representations in reasoning about subclass–class relationships.

Attempts to Explain Class-Inclusion Reasoning

Many theorists have attempted to explain class-inclusion performance. Inhelder and Piaget (1969) proposed that children develop the ability to classify objects hierarchically during the concrete operations stage and that class inclusion depends on children’s ability to perform two operations, one involving the addition of classes and the other involving the subtraction of classes (for development and support of this position, see Müller, Sokol, & Overton, 1999). In his review, Winer (1980) argued that (a) there is considerable evidence suggesting that many children of concrete operational age fail standard class-inclusion tests; (b) across studies, fourth-grade and older children often correctly answered only 50% of the class-inclusion questions; and (c) studies showing early development of class inclusion, in which children age 7 or 8 years correctly solve class-inclusion problems, are rare and outnumbered by studies showing late development by a 3:1 ratio. Information processing explanations have been proposed (Brainerd & Reyna, 1995; Shipley, 1979; Trabasso et al., 1978; Winer, 1980). Trabasso et al. (1978) suggested that it is the acquisition of many skills that predicts success on class-inclusion questions and that it “seems unlikely that this knowledge follows a simple age progression or is structurally linked to quantification” (p. 179). These skills include perceptual, semantic, and linguistic operations. They list eight required components that include (a) interpreting the question as a request to compare two quantities, (b) comparing the resulting quantities, and (c) responding with a set of decision rules. Shipley (1979) discussed the role of linguistic factors emphasizing the non-naturalness and ambiguity of the class-inclusion question. Brainerd and Reyna (1995) assumed that both verbatim and gist memories are stored when stimuli are presented (fuzzy trace theory). Class-inclusion errors or “inclusion illusions” are assumed to occur because “inappropriate gists dominate reasoning and lead to inferences that contradict certain background inputs” (p. 73). 
They suggested that developmental changes in class-inclusion reasoning probably reflect that older children are better able to discriminate gist and verbatim alternatives, making it easier for them to access the appropriate verbatim memories. However, it appears that class-inclusion performance is dependent on more than skill acquisition. For example, Winer (1980) found that 10-year-olds have the necessary skills but often cannot correctly solve the class-inclusion problems. Thus, a complete explanation would seem to require more than a skills analysis. A different kind of explanation has been offered by the neo-Piagetian theorists, who attempt to capture and build on the structuralist features of the Piagetian model: Case’s (1985) efficiency model; Chapman’s (1987) attentional capacity model; Halford, Maybery, and Bain’s (1986) capacity model; and Pascual-Leone’s (1987) energy model. These theorists have posited that changes in a child’s performance are mediated by underlying changes in the actual or functional resource capacity of that individual. If both skill acquisition/automatization and working memory capacity are related to the development of class-inclusion performance, then a more encompassing model is needed. Using college students as participants, Howe et al. (1998) found evidence to support the inclusion of both of these theoretical possibilities. For example, color problems proved to be more difficult than number problems, which is consistent with the assumption that linguistic skills are important when solving class-inclusion problems (Shipley, 1979). Both increases in
working memory, an individual difference variable, and decreased memory load, an experimentally manipulated variable, correlated with an improvement in performance, which suggests that capacity may have a role to play in understanding developmental changes. A Resource Approach Because both skill and capacity are central concepts in most resource models, these models seem to be sufficiently encompassing to be useful in accounting for the relationship between reasoning and remembering in class-inclusion performance and, perhaps, in problem solving in general. We have reviewed a number of resource models (Howe & Rabinowitz, 1990), and to some extent, all of the models are underspecified and can be used interchangeably or even replaced by nonresource approaches. Within the context of resource theories, skill seems to represent any mental or physical response that is performed in completing a task, and overlearned skills become automatic and no longer require resources. One way this could happen is that overlearned skills could be directly retrieved from memory rather than constructed (Logan, 1988). As we noted (Howe & Rabinowitz, 1990), automaticity is probably the most appealing resource-based concept from a developmental perspective because, over time, a child gets to practice a large number of skills repeatedly. Some of these highly practiced skills (e.g., visual scanning, reading, writing) are useful in a large number of tasks and should be associated with developing expertise. Four ideas about capacity common to most resource theories and central to our arguments are that (a) capacity limits performance, (b) capacity varies across individuals, (c) capacity is used in performance, and (d) demands on capacity are negatively correlated with the level of relevant skills. 
Because in our class-inclusion work the key hypothetical constructs are operationalized using a mathematical model (i.e., reasoning and remembering skills), a psychological test (capacity), and an experimental manipulation (memory load), it does not matter whether the capacity is conceived as attention, energy, or working memory. Nevertheless, we choose to use the resource-limited, willed-attention, working memory model proposed by Norman and Shallice (1986) because it provides a fairly simple framework to deal with developmental changes in problem solving and the relationship between reasoning and remembering. Norman and Shallice (1986) suggested that the supervisory attentional mechanism is the limited resource. They assumed that deliberate attentional resources are employed in tasks that involve planning or decision making, troubleshooting, ill-learned or novel sequences, dangerous or technically difficult components, and the inhibition of strong habitual responses. Because participants are rarely presented with questions involving the comparison of a superordinate class and a subclass, it appears that all of these attention-demanding task characteristics, with the possible exception of troubleshooting, characterize class inclusion. A number of predictions can be derived from Norman and Shallice’s assumptions that the automatization of skills and working memory capacity play important
roles in problem-solving tasks that require attentional resources. First, if relevant skills improve or become automatic as children become older, then performance should improve with age because fewer resources will be required to complete the task. Second, if working memory capacity increases up to age 25 years and declines thereafter, then performance should follow an inverted U-shaped function with age on any resource-demanding task. Third, if at least one aspect of performance is above floor and all aspects are below ceiling, then an increase in task demands will produce a deterioration in at least one aspect of performance. These general conclusions, along with the assumption that participants assign capacity to memory processes before reasoning processes, yield a number of predictions about developmental changes in class inclusion.¹ If relevant skills become more proficient and the capacity of working memory increases with age, then class-inclusion performance should improve with age. Similarly, changes in task demands should have an effect so long as performance is below ceiling and above floor. As expected, memory performance deteriorates when task demands are increased for fourth-graders who are already performing at floor for reasoning (Howe & Rabinowitz, 1996). In older individuals, who perform above floor in both memory and reasoning, a different pattern of outcomes would be expected. So long as reasoning performance is well above floor, small increases in memory load would be expected to affect only reasoning, whereas larger increases might affect both reasoning and memory. As college students show appropriate class-inclusion reasoning in our baseline conditions, consistent with expectation, increases in memory load affect only their reasoning. However, baseline reasoning is not as robust in seventh-graders, ninth-graders, and seniors. 
Similar increases in memory load affect both reasoning and memory in these groups (Howe & Rabinowitz, 1996; Howe et al., 1998; Rabinowitz et al., 1989).

Objectives

In the current experiment, we wanted to determine the degree to which developmental changes in class-inclusion performance should be attributed to age-correlated increases in working memory capacity or to age-correlated changes in skill use and automatization. We attempted to sort out these possibilities by assessing individual differences in working memory, manipulating memory load, separating component processes (i.e., some of the relevant skills) using the mathematical model of class inclusion developed by Rabinowitz et al. (1989), using age as a marker for the development of other relevant skills, and measuring reading and choice latencies on the class-inclusion task. Reading latencies should reflect syntactic, semantic, and reading skills. These skills, as well as the skills involved in interpreting the question as a request to compare two qualities or quantities, comparing the resulting qualities or quantities, and responding with a set of decision rules, should be reflected in choice latencies. It is likely that the time involved in comparing qualities/quantities and using decision rules would be related to the development of both capacity and skill.

¹ Many variants of the class-inclusion task have been studied. From a resource perspective, a somewhat different growth curve, reflecting skill and resource requirements, would be expected to occur with each variant.

Working Memory Measure

Fourth-, seventh-, and ninth-graders were first tested for working memory capacity. A variety of tasks have been used as definitions of working memory by the neo-Piagetians. These measures have been criticized by Howe and Rabinowitz (1990) because task performance could be affected by the use of memory strategies. Howe et al. (1998) used Daneman and Carpenter’s (1980) reading-span task as a measure of working memory. They selected this task primarily because it fulfills many of their criteria for a satisfactory measure of working memory. In addition, they noted that the measure plays a central role in the Just and Carpenter (1992) activation-based language comprehension model and that Daneman and Carpenter’s conceptualization is similar to that offered by Rabinowitz et al. (1989). This measure was found to be a reliable predictor of college students’ performance on class-inclusion tasks (Howe et al., 1998). However, the reading-span task proved to be too difficult for the youngest participants in the current study. Because we wanted to keep as many features of this task as possible, an age-appropriate modification, called speak span, was developed and used.

The Mathematical Model

Rabinowitz et al. (1989) developed a model that characterizes the manner in which participants represent the information in the statements and interpret the questions when choosing among the three possible solutions. The parameters, which represent hypothetical constructs (i.e., some of the skills considered to be essential in performing the task), are specified by sets of equations and estimated by fitting the model to data. 
The model contains five parameters: two that are related to memory (e and d) and three that are related to reasoning (i, s, and u). The two memory-related parameters reflect the manner in which information is encoded at retrieval (i.e., when questions are answered). Here, e represents the probability of correctly encoding the values on the relevant dimension as same or different on the test, and 1 − e represents the probability of incorrect encoding. The parameter d, which is irrelevant to the number-same problems, represents the conditional probability of remembering the cue value associated with each subclass when the cues are correctly encoded as different. The conditional probability of associating each of the two cue values with the wrong subclass is 1 − d. These parameters are influenced by the way information is stored when the statements are presented and by the loss between initial presentation and the presentation of the question, but they do not reflect these processes directly. The three reasoning parameters (i = idiosyncratic, s = subclass–subclass, and u = understanding [see also Hodkin, 1987; Howe & Rabinowitz, 1996; Howe et al., 1998; Rabinowitz et al., 1989]), reflect the way in which questions involving the superordinate class and either subclass are interpreted. The parameter i represents
the probability that the participant either uses idiosyncratic reasoning that is not correlated with the correct answer or guesses, s represents the probability that the participant treats the question as involving only subclasses, and u represents understanding. The three reasoning parameters reflect events that are assumed to be mutually exclusive and exhaustive, and only 2 degrees of freedom are lost in their estimation:

1 = i + s + u. (1)

Thus, four free parameters are estimated for the number-different and color problems, whereas only three free parameters are estimated for the number-same problems. The parameters and associated definitions are summarized in Table 1. Different sets of equations (see Howe & Rabinowitz, 1996, Appendix B, or Howe et al., 1998, Appendix B) are used for the number-same and number-different problems. Only one set of equations is needed for the color-relevant problems because different colors are always associated with each of the two subclasses appearing in a statement.

As an illustration of the way the model works, consider the following. The child is presented with the statement “There are 6 dogs and 4 cats” followed by the class-inclusion question “Are there more dogs or more animals?” and the three choices “more dogs,” “more animals,” and “same number.” Suppose that the child correctly encodes the numerical values of the subclasses as different (which would occur with probability e) but incorrectly remembers that there are more cats than dogs (probability 1 − d). If either the subclass–subclass rule (probability s) or the class-inclusion rule (probability u) was then used, then the child would always choose the correct answer (more animals). The subclass–subclass rule would generate the correct answer with these assumptions because the child “remembers” that there are more cats than dogs. On the other hand, if the child then used an idiosyncratic rule or guessed (probability i), then the correct answer would occur a third of the time because idiosyncratic rules and guessing are assumed to be unrelated to the answer chosen.

TABLE 1
Theoretical Definitions of the Choice Model’s Parameters

Parameter    Theoretical definition

Memory
  e          Probability of correct encoding of dimensional cues as same or different
  d          Probability of correctly associating the cue with each of the two subclasses given the relevant dimension is accurately encoded as different

Reasoning
  u          Probability of understanding and accurately interpreting questions involving comparison of the superordinate class and subclass
  s          Probability of subclass–subclass interpretations of questions involving comparisons of the superordinate class and a subclass
  i          Probability of idiosyncratic interpretations of questions involving the superordinate class and a subclass

Source. This table originally appeared in Rabinowitz et al. (1989).
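
The dogs-and-cats illustration can also be worked through numerically. The Python sketch below is an illustration rather than the authors' implementation, and it covers only the case described in the text: a number-different class-inclusion question for which the subclass values have already been encoded as different. The full choice equations, including the branch for incorrect encoding (probability 1 − e), are given in the appendices cited in the text and are not reproduced here. The function name is hypothetical; the guessing rate of one third follows from the three response alternatives.

```python
def p_correct_given_encoded_different(d, i, s, u, tol=1e-9):
    """Probability of choosing "more animals" on a number-different
    class-inclusion question, conditional on the two subclass values
    having been encoded as different.

    d       : probability that each cue value is tied to the right subclass
    i, s, u : probabilities of idiosyncratic/guessing, subclass-subclass,
              and class-inclusion (understanding) interpretations
    """
    for name, value in (("d", d), ("i", i), ("s", s), ("u", u)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if abs(i + s + u - 1.0) > tol:
        raise ValueError("i, s, and u must sum to 1")

    # Idiosyncratic rules and guesses are unrelated to the answer,
    # so they hit the correct one of three alternatives a third of the time.
    guess = i / 3.0

    # Cues tied to the right subclasses: understanding gives the correct
    # answer, whereas the subclass-subclass rule compares dogs with cats
    # and yields "more dogs", which is wrong.
    cues_right = u + guess

    # Cues swapped: the child "remembers" more cats than dogs, so both the
    # subclass-subclass rule and understanding yield "more animals".
    cues_swapped = u + s + guess

    return d * cues_right + (1.0 - d) * cues_swapped


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the study.
    print(p_correct_given_encoded_different(d=0.9, i=0.2, s=0.3, u=0.5))
```

Algebraically the expression reduces to u + (1 − d)s + i/3, which makes explicit that, within this branch, misremembering the cue assignment can only help a child who relies on the subclass–subclass rule.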
METHOD

Participants

The participants were 88 (50 female and 38 male) fourth-graders (mean age = 114 months, SD = 4), 75 (32 female and 43 male) seventh-graders (mean age = 152 months, SD = 6), and 63 (33 female and 30 male) ninth-graders (mean age = 176 months, SD = 5). A consent letter was sent home from school and parental permission was required for each child who participated. The children attended one of four different public schools in the city of St. John’s, which served pupils from all socioeconomic groups. More than 90% of the children were Caucasian. The data from an additional 8 participants were not included. Due to time constraints, 5 of these participants did not complete the class-inclusion task, 1 participant was an immigrant who had not yet mastered English, and 2 ninth-grade participants used an obvious strategy (i.e., they rehearsed by using more than one word in the active set in their sentences) on the speak-span task.

Apparatus and Materials

A computer was used to present the speak-span and class-inclusion tasks. Both programs were written in Quick Basic 4.5. Stimuli were presented on a 22.5 × 27 cm monitor. Responses were entered using a 101-key enhanced keyboard. The speak-span task is a modification of Daneman and Carpenter’s (1980) reading-span task in which the participant is presented with sentences and required to verify whether each sentence is true or false. Following the presentation of all sentences in a set, the participant is asked to recall the final word appearing in each sentence. Because this task proved to be too difficult for fourth-graders, a speak-span task, which retained the structure but not the specific content of the reading-span task, was constructed using 110 English words ranging from 3 to 12 characters in length (see Appendix). The words were taken from fourth-grade books used in the school system and were arranged in sets containing different numbers of words (two, three, four, five, or six). 
Five sets of words (e.g., sets of two words) made up a group. For each child, the words were randomly assigned to the sets. After a word was presented, the child was required to make up a sentence containing the word.² The experimenter typed the sentence as the child spoke. After producing sentences for each of the words in a set, the child was asked to recall the words.

² Brainerd (personal communication, December 1999) noted that verbal elaboration is required in the speak-span task and, therefore, that the task might reflect individual differences in capacity and/or semantic processing. We concur with his assessment but note that this possibility probably cannot be excluded with any of the tasks used to define memory capacity. For example, using the standard Daneman and Carpenter (1980) task in which college students were required to answer true–false questions based on general information, Howe et al. (1998) found that the percentage of correct true–false answers was correlated with reading-span scores. On the other hand, they did not find a relationship between reading span and two other measures that should reflect semantic processes, namely reading and choice latencies. Similarly, with the exception of the choice latencies obtained with class-inclusion questions, neither reading nor choice latencies were related to speak-span scores in the current experiment. The relationship between semantic processing (and a variety of other variables) and definitions of memory capacity currently is unknown and merits further study.

In the class-inclusion task, the computer was used to control presentation of the verbal materials and to record choices. The children responded using the numerals 1, 2, and 3 on the number pad. The materials were arranged in blocks of 48 units, with each unit consisting of a statement with information about color or number, a question, and three alternative answers. For color, there was a statement (e.g., “There are black dogs and orange cats”), either a superlative question (e.g., “Is the blackest animal blacker than the blackest dog?”) or an equivalence question (e.g., “Is the blackest animal the same color as the blackest dog?”), and three alternatives to choose from when answering the question (e.g., “same color,” “dog blacker,” and “animal blacker”). For number, there was either a number-different statement followed by a superlative question (e.g., “There are 6 robins and 4 swallows. Are there more robins or more birds?”) or a number-equal statement followed by an equivalence question (e.g., “There are 6 robins and 6 swallows. Are there the same number of robins as birds?”), and three alternatives (e.g., “more robins,” “more birds,” and “same number”). As in the examples, each statement consisted of either two numbers (n1 and n2) and two nouns or two colors (c1 and c2) and two nouns. The values of the numbers ranged from 4 to 9, where n1 = n2 for the equivalence problems and n1 ≠ n2 for the superlative problems. Each of the 11 possible problems determined by question (subclass–subclass, minor subclass–class, or class inclusion), problem type (equivalence or superlative), and dimension (number or color), plus an additional number-different minor subclass versus class problem (the equivalence rather than the superlative form of the question was used in this case), appeared once in each of four successive blocks of 12 problems. 
The additional number-different minor subclass versus class problems were substituted for the impossible-to-construct number-same minor subclass versus class problems. For each individual, the numbers were randomly assigned to each number problem and the problems were quasi-randomly assigned, subject to the blocking constraint.

Design

All children completed the speak-span task before the class-inclusion task. Based on speak-span score, gender, and grade, each child was quasi-randomly assigned to either the statement-present (low memory load) or statement-absent (high memory load) class-inclusion condition. Memory load was assumed to be lower when the statement remained on the screen while the question appeared than when the statement was absent.

Procedure

All children were tested individually. The following speak-span instructions appeared on the monitor:
You will be presented with a number of sets of words. At the beginning of each set, you will be told how many words will occur in that particular set. I will then read you one word from the set. After I read the word, please make up a sentence using that word. After I type in your sentence, I will read the next word in the set. At the end of the set, you will be asked to remember each of the words that you made up a sentence about. You can remember the words in any order. After you have remembered as many words as you can, another set of words will start. The number of words per set will increase as the procedure continues. Before beginning the experiment, you will encounter a set of practice words.
The experimenter read these instructions to the child and answered any questions. Two-word sets were used in the pretest, which ended as soon as the child correctly recalled both words in a set or a total of five two-word sets had been presented. The purpose of the pretest was to familiarize the child with the procedure. After the pretest was completed, each child was presented with one to five test trials. A trial consisted of the presentation of five sets of words that constituted a group. Each set within a group contained the same number of words. Every child began with the two-word group. A child who reached criterion on the two-word trial was presented with a three-word trial and so forth. The most difficult trial involved six-word sets. At the beginning of each trial, the child was informed about how many words to expect in each set. After the experimenter read the first word aloud, the child was required to make up a sentence using that word. The experimenter typed the sentence. After all of the words in the set had been presented, the child was asked to recall the words in any order. Recall was self-paced. The experimenter typed the recalled words and hit the carriage return for each of the words in a set that the child omitted. Only the first three letters of the words that the experimenter entered were evaluated by the computer in an attempt to reduce the likelihood that typing errors affected the outcomes. If the child successfully recalled the words in at least three of the five sets making up a group, then the next trial, which involved adding an additional word to each set, was presented. Speak-span scores could range from 1 to 6. If a child was successful with fewer than two of the five sets on a trial, then a speak-span score corresponding to the number of words in each set on the prior successful trial was assigned. For example, if a child was successful with fewer than two sets on a three-word trial, then a score of 2 was assigned.
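The speak-span scoring rule (advance after at least three of five sets are recalled, assign a half-point score for exactly two, and assign the previous level for fewer than two) can be sketched in Python as follows. This is only an illustrative sketch: the function name `speak_span_score` and the list-based input format are assumptions introduced here, not part of the original testing software.

```python
def speak_span_score(successes_per_trial, first_level=2, max_level=6):
    """Return a speak-span score from the number of sets (0 to 5) recalled
    correctly on each successive trial, starting with the two-word trial.

    Hypothetical helper: the name and input format are illustrative only.
    """
    level = first_level
    for n_correct in successes_per_trial:
        if not 0 <= n_correct <= 5:
            raise ValueError("each trial has exactly five sets")
        if n_correct >= 3:
            # Criterion reached: move to the next set size,
            # or stop with the top score after the six-word trial.
            if level == max_level:
                return float(max_level)
            level += 1
        elif n_correct == 2:
            # Exactly two sets: midway between this level and the previous one.
            return level - 0.5
        else:
            # Fewer than two sets: score of the prior successful level
            # (1 if the child failed at the two-word level).
            return float(level - 1)
    raise ValueError("testing ended before a stopping rule was reached")


# Fewer than two sets correct on the three-word trial gives a score of 2.
print(speak_span_score([4, 1]))
# Exactly two sets correct on the three-word trial gives a score of 2.5.
print(speak_span_score([3, 2]))
```

Representing a child's record as the ordered list of successes per trial keeps the stopping rules explicit: the loop ends at the first trial on which criterion is not met, or after the six-word trial.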
A speak-span score of 1 reflected that the child failed to succeed on at least two two-word sets. If the child was successful in recalling exactly two of the five sets on a trial, then a speak-span score was assigned that was midway between the number of words per set on that trial and on the previous trial (e.g., 2.5 on a three-word trial). If the child was successful with more than two of the five sets, then he or she continued on to the next level (unless the child had completed level 6). After the child had finished the speak-span task and following a delay equivalent to the time to load the software, the class-inclusion task was presented. The instructions, which appeared on the monitor, contained the following information. There would be 48 questions, each question would be preceded by a statement containing the information needed to answer the question, and the questions would be
answered by pressing the correct key (either 1, 2, or 3 on the number pad). If the child had any questions about the instructions, then the experimenter answered them. The child pressed any key, and the first statement appeared. The child controlled the amount of time that the statement was visible by pressing a key when finished reading the statement. In the statement-absent condition, a trial consisted of the presentation of a statement that remained on the monitor until the child hit a key indicating that the statement had been read. Following an interval of 1 s, the question and related choices appeared on the monitor. After the child answered a question, the screen was cleared and 1 s later the next statement appeared. The procedure was identical in the statement-present condition except that the statement remained on the screen until the child answered the question. During a trial, each of the child’s responses terminated a clock. The time between the appearance of the statement and the child’s first response defined reading latency, whereas the time between the appearance of the question and related choices and the child’s answer defined choice latency. The keyboard was constantly monitored by the computer program so that responses occurring in inappropriate intervals (almost always associated with not releasing the key quickly enough) were ignored and disallowed responses (e.g., choice responses other than 1, 2, or 3) were corrected.

RESULTS

Three sets of analyses are described in this section. In an initial attempt to establish the utility of the speak-span measure, regression analyses of the speak-span data are presented first. Regression analyses of the reading latencies, choice latencies, and choice accuracy class-inclusion data are presented next, and a description of the likelihood ratio tests associated with the mathematical model completes the section.³ The regression and likelihood approaches provide complementary information.
In the regression analyses, (a) age and speak span are treated as quantitative rather than qualitative variables and (b) the influence of each variable of interest is assessed with all other variables partialled out. The mathematical model is used to (a) operationalize hypothetical constructs and (b) test hypotheses about them using likelihood-ratio tests. Because the mathematical model is sufficient (i.e., it fits the data), the statistical tests based on the model are more sensitive than tests based on general linear models.

³ In this article, 38 regression analyses and 4 experiment-wise likelihood ratio tests used with the mathematical model are reported. With such a large number of tests, the Type 1 error rate across the data analyses usually would be of concern. However, in the current case, it appears to be a minor issue for the following reasons. Follow-up tests involving the predictors in the regression equations were conducted only if the relevant R² value was significant. Similarly, the parameters in the mathematical model were tested only if the relevant experiment-wise test was significant. If we had adopted a .001 significance level for all of the regression coefficients and experiment-wise tests, then the overall probability of a Type 1 error would have been less than .05. This effectively occurred as p < .001 for each of the significant experiment-wise tests and for all but two of the significant R² values (the two exceptions were p < .01).
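The arithmetic behind the claim that a .001 criterion keeps the overall Type 1 error rate below .05 can be checked with a few lines of Python. This is a sketch under stated assumptions: it applies the Bonferroni inequality, which the footnote does not name, to the 42 tests it counts (38 regressions plus 4 experiment-wise tests), and the variable names are illustrative.

```python
# Number of tests counted in the footnote: 38 regressions + 4 experiment-wise tests.
n_tests = 38 + 4
alpha_per_test = 0.001

# Bonferroni upper bound on the probability of at least one Type 1 error.
bonferroni_bound = n_tests * alpha_per_test

# Exact rate if the tests were independent (an assumption; they need not be).
independent_rate = 1 - (1 - alpha_per_test) ** n_tests

print(f"Bonferroni bound: {bonferroni_bound:.3f}")
print(f"Rate under independence: {independent_rate:.4f}")
```

Both quantities fall below .05, which is the sense in which a per-test criterion of .001 controls the overall error rate for this number of tests.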
Analyses of the Speak-Span Data

A large number of tasks have been used in developmental research to operationalize (i.e., define) working memory (see Towse, Hitch, & Hutton, 1998). One criterion that might be used to ascertain the utility of the speak-span task as a definition of working memory is that there is a monotonic increase in scores on the task as a function of age. As expected, the linear correlation between age and speak-span score was positive, r(224) = .42, p < .001. It should be noted that most of the quantitative change in scores occurred between the fourth and seventh grades (means = 2.60, 3.37, and 3.52 for the fourth-, seventh-, and ninth-graders, respectively). The qualitative division of the children into low (speak-span scores < 3.0), medium (speak-span scores between 3.0 and 3.5), and high (speak-span scores > 3.5) working memory groups provides additional information about the correlation between age and speak span. The numbers of children who had low, medium, and high working memory, respectively, were 53, 27, and 8 (fourth-graders); 21, 31, and 23 (seventh-graders); and 10, 31, and 22 (ninth-graders), χ²(4) = 38.22, p < .001. Two other utility criteria are that the score has predictive validity and construct validity (i.e., the task provides a definition of working memory that predicts performance on other tasks in an expected manner). The second and third criteria were assessed using the class-inclusion data set in both regression analyses and tests performed using the mathematical model. Mean free-recall speak-span latency (i.e., the mean time between a computer prompt for first word, second word, . . . , and a response) across the trials in which the child produced a word was the predicted variable in a regression analysis in which age, gender, and speak-span scores were predictor variables, R²(3, 222) = .14, p < .001.
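The test of association between grade and working memory group can be recomputed from the reported frequencies. The sketch below assumes that the reported χ²(4) is an ordinary Pearson chi-square on the 3 × 3 table of counts; the function name is illustrative, and the result should come out close to the reported value.

```python
# Counts of low, medium, and high working memory children in each grade.
counts = {
    "fourth": [53, 27, 8],
    "seventh": [21, 31, 23],
    "ninth": [10, 31, 22],
}


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    rows = list(table)
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


chi2, df = pearson_chi_square(counts.values())
n_children = sum(sum(row) for row in counts.values())
print(f"N = {n_children}, chi-square({df}) = {chi2:.2f}")
```

The total of 226 children implied by these counts agrees with the 224 degrees of freedom reported for the age and speak-span correlation.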
Although both age and speak-span score correlated negatively and significantly with mean free-recall latency, only age was a significant predictor with the other variables partialled out, t(222) = 4.88, p < .001. Gender did not correlate significantly with any of the other variables.

Regression Analyses of the Class-Inclusion Data

Reading Latencies

A set of 12 regression analyses was performed on each of the dependent measures obtained on the class-inclusion task: reading latency in seconds, choice latency in seconds, and choice accuracy.⁴ In each analysis, the predicted variable was the mean score obtained with a particular question, dimension, and problem type (same–different with number, equivalence–superlative with color) combination obtained with one of the measures. For example, for one of the analyses, each participant’s average reading latency was calculated across the four subclass–subclass color equivalence questions. In all of the analyses, the predictor variables were memory load (the statement being either absent or present when the questions were presented), gender, age (months), speak-span score, and mean free-recall latency on the speak-span test.

⁴ We screened the data using analyses of variance on the reading latency, choice latency, and choice class-inclusion data. The findings reported by Howe and Rabinowitz (1996) were replicated for comparable age–condition combinations. Age and speak span entered into only one significant interaction in the three analyses of variance, Age × Speak Span × Present–Absent × Trial Blocks, which reflected that the reading latencies changed across the four blocks of 12 trials in a complex manner, F(12, 570) = 2.93, p = .001.

Because the participants did not know what question would follow a statement, all 12 reading latency analyses should have generated comparable results. Similar findings were obtained in 11 analyses, R²(5, 220) > .10, p < .001. Memory load and age were significant predictor variables with the remaining variables partialled out, t(220) > 2.50, p < .05. Comparable results were obtained in the remaining analysis if a ln(x + 1) transformation of the reading latencies, which reduces the impact of high scores, was used as the predicted variable. Reading latencies correlated negatively with age and were shorter if the statements were present rather than absent. The findings are consistent with the analyses of variance reported in earlier studies (Howe & Rabinowitz, 1996; Rabinowitz et al., 1989). It would appear that in the age range studied, older children process written information faster than do younger children and that the children attempt to memorize the information presented in the statements if that information is not available when the questions appear. Speak-span scores did not correlate with the mean reading latency measures. Thus, it appears that age, but not speak-span score, is an index of reading skill.
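The effect of the ln(x + 1) transformation on unusually long latencies can be illustrated with a short Python sketch. The latency values below are hypothetical and chosen only to show how the transformation compresses a single extreme score; they are not data from the study.

```python
import math

# Hypothetical mean reading latencies in seconds, one of them extreme.
latencies = [2.0, 3.0, 4.0, 40.0]

# ln(x + 1) transformation; log1p computes ln(1 + x).
transformed = [math.log1p(x) for x in latencies]

# Compare how far the extreme score sits from a typical score
# before and after the transformation.
raw_ratio = latencies[-1] / latencies[1]
transformed_ratio = transformed[-1] / transformed[1]

print([round(t, 2) for t in transformed])
print(f"raw ratio: {raw_ratio:.1f}, transformed ratio: {transformed_ratio:.1f}")
```

Because the logarithm grows slowly, the extreme latency ends up much closer to the others, so it exerts less leverage on a regression fitted to the transformed scores.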
Choice Latencies

The results obtained in the 12 regression analyses of the mean choice latencies are summarized in Table 2 and are consistent with the memory load and age findings reported by Howe and Rabinowitz (1996). With the exception of the color-different minor subclass–class questions, choice latencies were markedly influenced by memory load. Choice latencies were longer with the statement present than with the statement absent. When the statement is available, the children reread the statement while answering the question. Age also predicted mean choice latencies in most of the analyses. The linear correlations between age and choice latencies were negative in all cases in which age was a significant predictor. Older children tended to respond more quickly than did younger children. Again, it appears that age is an index of skill development, in this case reading and decision skills. Speak-span scores were a consistent predictor of choice latencies only with class-inclusion questions. Unlike the linear correlations between age and mean choice latencies, the correlations between speak-span scores and choice latencies were positive in all analyses in which speak span was a significant predictor. It appears that the children with higher speak-span scores are more likely to recognize the complexity of the class-inclusion questions and spend
TABLE 2
Regression Analyses for the Choice Latency Data

Problem                           R²(5, 220)  Absent or present  Gender  Age      Speak span  Recall latency
Subclass–subclass questions
  Number same                     .212***     6.18***            0.17    2.65**   0.12        2.36*
  Number different                .078**      4.14***            0.14    0.62     0.84        0.06
  Color equivalence               .188***     5.77***            0.27    3.69***  1.36        0.76
  Color superlative               .091***     4.34***            0.69    1.68     0.68        0.14
Minor subclass–class questions
  Number same                     .163***     4.53***            0.36    4.66***  1.51        0.61
  Number different                .194***     5.44***            0.19    4.47***  1.12        0.03
  Color equivalence               .196***     6.05***            0.48    3.66***  2.38*       0.21
  Color superlative               .129***     4.32***            1.93    2.32*    1.91        0.44
Class-inclusion questions
  Number same                     .145***     4.66***            0.23    2.69**   3.41***     0.44
  Number different                .157***     5.00***            0.03    3.59***  2.50*       0.07
  Color equivalence               .126***     4.42***            0.14    2.97**   2.06*       0.58
  Color superlative               .159***     4.32***            0.09    4.18***  2.12*       0.84

Note. With the exception of the R² values, all numbers in the table are t(220). *p < .05. **p < .01. ***p < .001.
more time reasoning about these questions than are children with lower speak-span scores. The differential abilities to attend/recognize and to reason both might depend on the availability of capacity in working memory.

Choice Accuracy

On the class-inclusion task, each correct response was scored as 2, errors consistent with appropriate same–different encoding of the dimension relevant to the question (E1 errors) were scored as 1, and the remaining errors (E2 errors) were scored as 0. For example, the E1 error associated with a number-different class-inclusion question involving cats and animals would be “more cats,” whereas the E2 error would be “same number.” It should be noted that the E2 class-inclusion color error is different from all of the other types of E2 errors because it cannot follow same–different encoding errors (i.e., same color encoding) unless the encoding error is associated with idiosyncratic interpretation. For example, if the statement is “There are white cats and brown dogs” and the question is “Is the whitest cat whiter than the whitest animal?,” then the E1 error would be “cat whiter” and the E2 error would be “animal whiter.”

The results obtained in the choice accuracy regression analyses are summarized in Table 3. The regression coefficients were significant in 8 of the 12 analyses. Memory load was a significant predictor, with the remaining variables partialled out, only with subclass–subclass and minor subclass–class questions involving number. Children’s performance was enhanced with these questions by having the statements present. Because these questions can be answered correctly with subclass–subclass reasoning, it is likely that the present statement served as a memory aid and facilitated performance by permitting the children in this age range to scan the statements and answer the questions. With the linguistically more complex color questions and with the conceptually more complex class-inclusion questions, it is likely that the color and/or number information must be kept in short-term memory for reasoning to proceed. If this interpretation is correct, then one would expect that memory load would be likely to affect the memory parameters in the mathematical model with children in the age range studied.

TABLE 3
Regression Analyses for the Choice Accuracy Data

Problem                           R²(5, 220)  Absent or present  Gender  Age      Speak span  Recall latency
Subclass–subclass questions
  Number same                     .109***     2.46*              2.20*   0.52     0.03        3.41***
  Number different                .108***     3.12**             1.65    0.55     2.16*       1.58
  Color equivalence               .009
  Color superlative               .037
Minor subclass–class questions
  Number same                     .109***     2.93**             1.18    0.10     3.40***     0.70
  Number different                .071**      2.11*              0.29    0.40     3.28**      0.68
  Color equivalence               .018
  Color superlative               .018
Class-inclusion questions
  Number same                     .230***     1.56               0.32    2.77**   5.68***     1.11
  Number different                .139***     0.77               0.35    2.09*    4.14***     0.59
  Color equivalence               .157***     1.45               0.62    3.06**   3.20**      0.64
  Color superlative               .172***     0.78               1.52    3.94***  3.02**      0.43

Note. With the exception of the R² values, all numbers in the table are t(220). *p < .05. **p < .01. ***p < .001.

Two other outcomes summarized in Table 3 merit mentioning. First, it would appear that the speak-span scores have both predictive and construct validity for class-inclusion performance. In 7 of the 8 analyses in which the multiple-regression coefficients were significant, speak span was a significant predictor. Furthermore, consistent with the resource model discussed (Norman & Shallice, 1986), the linear correlation between speak span and choice accuracy was positive in all 12 analyses. Second, further support for our general position comes from the fact that age and speak span both were significant predictors with class-inclusion questions. It appears that both working memory capacity and age-correlated changes in skills account for improvement in class-inclusion reasoning.
Modeling Class-Inclusion Choice Data

Before we can use the mathematical model to interpret working memory differences in reasoning and remembering, the parameters must be estimated and the degree of fit of the model to the choice data must be evaluated statistically. To see how this process works, we begin by defining the data space. For each age, working memory, dimension (color or number), problem type (same–different with number and equivalence–superlative with color), and memory load (absent or present), three data types were defined for each question type (subclass–subclass, minor subclass vs class, class inclusion): S = success, E1 = error usually associated with correctly encoding the values on each subclass as same or different, and E2 = remaining error usually associated with incorrectly encoding the values on each subclass as same or different. Different equations were developed for the subclass–subclass (ss), minor subclass versus class (msc), and major subclass versus class or class-inclusion (ci) questions. These equations and the associated likelihood functions appear in Howe and Rabinowitz (1996, Appendix B) and Howe et al. (1998, Appendix B). Because

$$1 = P(S) + P(E_1) + P(E_2), \tag{2}$$

2 degrees of freedom are associated with the data for each question type. Thus, when three question types are available to test the model, as in the color problems, a total of 6 degrees of freedom exist in the data. Of these, 2 degrees of freedom are necessary to estimate the memory parameters (e and d) and 2 more are needed to estimate the reasoning parameters (see Eq. 1). This leaves 2 degrees of freedom for assessing the goodness of fit of the model for the color problems. Similarly, because there were no minor subclass versus class number-same problems, the three number-same parameters were estimated using a data set containing 4 degrees of freedom. Finally, because number-different minor subclass versus class problems were used in lieu of the impossible minor subclass versus class number-same problems, the four number-different parameters were estimated using a data set containing 8 degrees of freedom (the two sets of minor subclass vs class problems were treated independently). A simplex procedure was used to estimate the parameters (Siddal & Bonham, 1974).

Necessity and Sufficiency Tests

Having defined the model, the data space, and the estimation procedure, only two steps remain: direct assessment of goodness of fit and hypothesis testing. Concerning goodness of fit, two tests are conducted: a necessity test and a sufficiency test. The necessity test examines whether a model with fewer parameters provides a statistically adequate account of the data. This involves comparisons of the likelihood of the data given the three- (number-same questions) or four-parameter model with the likelihood of the data given a simpler two-parameter model in which memory is assumed to be perfect (i.e., 1 = e = d). Specifically, the necessity test evaluates the null hypothesis that a model with
fewer parameters fits the data as well as a model with more parameters. Rejection of this null hypothesis means, in this case, that the model with three (number-same problems) or four parameters is necessary to account for these data and that memory encoding was not perfect. Thus, this test has a dual purpose. First, it serves to evaluate the necessity of having more (three or four) rather than fewer (two) parameters in the model. Second, it serves to confirm the necessity of positing memory processes to account for class-inclusion reasoning. The necessity tests for each level of working memory (by age, condition, and question) can be found in Table 4. As can be seen, the null hypothesis was rejected in 65 of the 72 tests. Of the 7 exceptions, 5 occurred with number-same problems with the statement present. This problem–condition combination imposes the lightest memory load on the children; therefore, it is not surprising that memory often appears to be perfect. Furthermore, Howe and Rabinowitz (1996) reported similar results. The sufficiency test, which examines whether a more complex model is required to account for the data, involves comparing the likelihood of the data given the three- (number-same problems) or four-parameter model with the likelihood of the data itself (i.e., when all of the empirical probabilities are free to vary, thus exhausting all of the information in the data). What this means is that the theoretical model (which reduces the number of parameters estimated) is compared to a data-based model (which does not limit the number of parameters estimated). Specifically, the sufficiency test, like the necessity test, evaluates the null hypothesis that a model with fewer parameters fits the data as well as a model with more parameters. Failure to reject the null hypothesis means, in this case, that the model with three (number-same questions) or four parameters is sufficient to account for the data.
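The logic shared by the necessity and sufficiency tests, comparing the likelihood of the data under a restricted model with the likelihood under a more general one, can be sketched in Python for a single question type with outcomes S, E1, and E2. Everything specific in the sketch is an assumption made for illustration: the counts and the restricted-model probabilities are hypothetical, the model equations themselves are not reproduced (they appear in the cited appendices), and the 2 degrees of freedom used for the comparison are illustrative. The critical values are standard .05 values of the chi-square distribution.

```python
import math

# Standard .05 critical values of the chi-square distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def log_likelihood(counts, probs):
    """Multinomial log-likelihood (up to a constant) of counts given probabilities."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def likelihood_ratio_statistic(counts, restricted_probs, general_probs):
    """Twice the log-likelihood difference between a general and a restricted model."""
    return 2.0 * (log_likelihood(counts, general_probs)
                  - log_likelihood(counts, restricted_probs))


# Hypothetical counts of S, E1, and E2 responses for one question type.
counts = [30, 8, 2]
total = sum(counts)

# "Data-based" model: every empirical probability is free to vary.
saturated_probs = [n / total for n in counts]

# Hypothetical predictions of a fitted theoretical model with fewer parameters.
model_probs = [0.70, 0.22, 0.08]

g2 = likelihood_ratio_statistic(counts, model_probs, saturated_probs)
df = 2  # illustrative value for this sketch
print(f"G2({df}) = {g2:.2f}, reject at .05: {g2 > CHI2_CRIT_05[df]}")
```

In a sufficiency test of this form, failing to reject means the restricted model fits about as well as the data-based model; in a necessity test the same statistic is computed with the perfect-memory model in the restricted role, and rejection is the desired outcome.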
The sufficiency tests for each reading span by condition can also be found in Table 4. As can be seen, in 67 of the 72 cases, the null hypothesis could not be rejected. Exceptions occurred for (a) two color-equivalence statement-absent problems with seventh-graders (significant at p < .01 but not .001) and (b) three number-same problems with fourth-graders (significant at p < .05 but not .01). Despite these exceptions in both the necessity and sufficiency tests, the three- (number-same questions) and four-parameter models were found to be, on average, both necessary (i.e., more than two parameters were needed to account for much of the data) and sufficient (i.e., generally no more than three or four parameters were needed to account for the data). Note that in those cases where this latter conclusion did not hold, the magnitude of the corresponding necessity test was substantially greater than that of the sufficiency test, suggesting that the model is accounting for a considerable proportion of the variance in the data (even though it was not as adequate as the data themselves). Thus, although in some cases the data may be somewhat more complicated than our models depict, it can be reasonably concluded that, within the bounds of statistical tolerance, the three- (number-same problem) and four-parameter models provided an adequate and parsimonious fit to the data from this experiment.
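The degrees of freedom of the goodness-of-fit tests follow from simple bookkeeping: each set of questions contributes 2 degrees of freedom, the sufficiency test has the data degrees of freedom minus the number of estimated parameters, and the necessity test has the number of parameters beyond the two of the perfect-memory model. The Python sketch below works this out; the dictionary layout and names are illustrative assumptions, while the counts of question sets and parameters are those stated in the text.

```python
# Each question set yields proportions of S, E1, and E2 that sum to 1,
# so it contributes 2 degrees of freedom.
DF_PER_QUESTION_SET = 2
PERFECT_MEMORY_PARAMETERS = 2  # reasoning parameters only (1 = e = d)

problems = {
    # Two sets of minor subclass versus class problems were treated independently.
    "number different": {"question_sets": 4, "parameters": 4},
    # No minor subclass versus class number-same problems exist.
    "number same": {"question_sets": 2, "parameters": 3},
    "color superlative": {"question_sets": 3, "parameters": 4},
    "color equivalence": {"question_sets": 3, "parameters": 4},
}

test_df = {}
for name, spec in problems.items():
    data_df = spec["question_sets"] * DF_PER_QUESTION_SET
    test_df[name] = {
        "necessity": spec["parameters"] - PERFECT_MEMORY_PARAMETERS,
        "sufficiency": data_df - spec["parameters"],
    }
    print(name, test_df[name])

# One pair of tests per grade, working memory level, memory load, and problem.
n_tests_per_type = 3 * 3 * 2 * len(problems)
print("tests of each type:", n_tests_per_type)
```

The resulting values agree with the degrees of freedom listed in the note to Table 4 and with the 72 tests of each type reported in the text.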
TABLE 4
χ² Values for the Goodness-of-Fit Tests

                                            Fourth grade               Seventh grade              Ninth grade
Condition                                   Necessity    Sufficiency   Necessity    Sufficiency   Necessity    Sufficiency
Low working memory, statement present
  Number different                          357.59***    6.39          116.94***    5.63          91.70***     6.45
  Number same                               15.74***     0.35          4.56*        1.39          0.40         0.00
  Color superlative                         264.41***    2.80          293.91***    4.66          86.44***     2.43
  Color equivalence                         540.43***    0.38          78.09***     4.59          156.47***    1.65
Low working memory, statement absent
  Number different                          1124.11***   4.30          150.95***    4.73          106.87***    3.59
  Number same                               108.76***    0.30          20.08***     0.35          0.04         0.00
  Color superlative                         1219.62***   5.82          91.37***     5.38          59.78***     5.05
  Color equivalence                         294.83***    1.06          81.32***     3.12          94.79***     0.22
Medium working memory, statement present
  Number different                          73.54***     6.17          183.40***    3.02          62.10***     3.08
  Number same                               16.53***     6.35*         0.13         0.00          0.05         0.01
  Color superlative                         90.54***     2.97          75.71***     3.68          298.24***    2.87
  Color equivalence                         240.51***    1.06          72.39***     0.47          372.03***    0.03
Medium working memory, statement absent
  Number different                          488.67***    2.49          142.17***    9.31          150.45***    7.77
  Number same                               19.78***     4.20*         26.49***     1.05          26.67***     0.50
  Color superlative                         283.18***    0.30          94.70***     4.73          275.70***    4.49
  Color equivalence                         226.33***    5.12          143.22***    11.55**       192.02***    0.94
High working memory, statement present
  Number different                          39.55***     4.70          15.47***     9.12          57.12***     1.68
  Number same                               0.03         0.01          0.13         0.00          11.47***     0.00
  Color superlative                         35.25***     1.16          102.62***    1.52          81.12***     2.42
  Color equivalence                         5.47         2.85          65.79***     3.16          69.45***     5.70
High working memory, statement absent
  Number different                          77.99***     1.51          67.14***     2.81          7.12*        8.08
  Number same                               15.67***     4.04*         13.04***     2.78          11.71***     2.77
  Color superlative                         28.85***     2.91          200.60***    0.90          64.36***     0.29
  Color equivalence                         116.08***    4.37          135.93***    7.12*         74.27***     2.83

Note. The degrees of freedom associated with the number-different, number-same, color-superlative, and color-equivalence necessity tests were 2, 1, 2, and 2, respectively. The degrees of freedom associated with the number-different, number-same, color-superlative, and color-equivalence sufficiency tests were 4, 1, 2, and 2, respectively. *p < .05. **p < .01. ***p < .001.
Hypothesis Testing Next, we can turn our attention to the main business of hypothesis testing. Because the parameter estimates from the fitted models turn out to be identifiable, they can be used directly in testing hypotheses about the theoretical relationships between reasoning and remembering and individual differences in working memory. The models’ parameters are identifiable because there are more data points than parameters and the parameter estimates generated using the maximum likelihood procedure were independent of the initial starting values. The numerical values of these parameters appear in Tables 5, 6, and 7. The change in reasoning as a function of working memory in fourth-graders, apparent in these parameter estimates, is so striking that it merits emphasis before describing the formal
TABLE 5
Estimated Parameter Values for the Choice Model: Fourth Grade

Condition                     e       d       i       s       u
Statement present
  Low working memory
    Number different          .97     .87     .00     .95     .05
    Number same               .97             .05     .88     .07
    Color superlative         .82     .90     .27     .73     .00
    Color equivalence         .81     .84     .01     .90     .09
  Medium working memory
    Number different          .97     .95     .00     .81     .19
    Number same               .96             .00     .86     .14
    Color superlative         .93     .86     .24     .76     .00
    Color equivalence         .87     .75     .10     .90     .00
  High working memory
    Number different          1.00    .87     .00     .48     .52
    Number same               1.00            .15     .35     .50
    Color superlative         .85     .95     .08     .57     .35
    Color equivalence         1.00    .95     .39     .27     .35
Statement absent
  Low working memory
    Number different          .94     .70     .00     .93     .07
    Number same               .97             .02     .98     .00
    Color superlative         .84     .72     .00     1.00    .00
    Color equivalence         .90     .79     .39     .61     .00
  Medium working memory
    Number different          .96     .71     .00     .86     .14
    Number same               .95             .04     .85     .11
    Color superlative         .82     .72     .19     .80     .01
    Color equivalence         .72     .92     .30     .70     .00
  High working memory
    Number different          1.00    .72     .00     .63     .37
    Number same               .88             .00     .71     .29
    Color superlative         .92     .81     .22     .72     .07
    Color equivalence         .94     .67     .00     .94     .06
hypothesis-testing sequence. As can be seen in Table 5 and comparable tables that appear in Rabinowitz et al. (1989) and Howe and Rabinowitz (1996), fourth-graders usually do not demonstrate any understanding of class-inclusion reasoning, the parameter u in the mathematical model, in our task. This is not so with fourth-graders who have high working memory. In fact, fourth-graders with high working memory appear to have a better understanding of class-inclusion reasoning than do ninth-graders with low working memory (see Tables 5 and 7). The three-phase hypothesis-testing sequence begins with an experiment-wise test that, like an omnibus F test, evaluates the null hypothesis that, on average, the numerical estimates of the model’s parameters did not vary statistically across working memory, grade, and memory load. In each case, the null hypothesis was
TABLE 6
Estimated Parameter Values for the Choice Model: Seventh Grade

Condition                     e       d       i       s       u
Statement present
  Low working memory
    Number different          .97     .87     .00     .64     .36
    Number same               .97             .44     .39     .17
    Color superlative         .82     .81     .00     .69     .31
    Color equivalence         .87     .91     .50     .43     .07
  Medium working memory
    Number different          .93     .92     .00     .84     .16
    Number same               1.00            .05     .83     .12
    Color superlative         .92     .92     .13     .73     .14
    Color equivalence         .89     .96     .22     .75     .03
  High working memory
    Number different          .96     .98     .12     .18     .70
    Number same               1.00            .00     .36     .64
    Color superlative         .82     .97     .06     .26     .68
    Color equivalence         .83     1.00    .17     .30     .54
Statement absent
  Low working memory
    Number different          .98     .79     .05     .77     .17
    Number same               .93             .21     .61     .18
    Color superlative         .87     .90     .15     .70     .15
    Color equivalence         .87     .91     .22     .74     .04
  Medium working memory
    Number different          .91     .90     .05     .78     .17
    Number same               .94             .16     .76     .08
    Color superlative         .92     .88     .39     .53     .08
    Color equivalence         .93     .83     .30     .62     .08
  High working memory
    Number different          1.00    .85     .03     .57     .40
    Number same               .95             .02     .73     .25
    Color superlative         .92     .83     .00     .69     .31
    Color equivalence         .86     .85     .04     .78     .18
TABLE 7
Estimated Parameter Values for the Choice Model: Ninth Grade

Condition                     e       d       i       s       u
Statement present
  Low working memory
    Number different          .90     .87     .00     .71     .29
    Number same               1.00            .30     .65     .05
    Color superlative         .75     .95     .01     .68     .31
    Color equivalence         .66     .94     .00     .79     .21
  Medium working memory
    Number different          .98     .99     .03     .40     .57
    Number same               1.00            .00     .57     .43
    Color superlative         .66     .89     .30     .39     .30
    Color equivalence         .73     .84     .02     .53     .45
  High working memory
    Number different          .93     .88     .12     .18     .71
    Number same               .96             .26     .22     .52
    Color superlative         .95     .87     .14     .31     .55
    Color equivalence         .93     .89     .32     .25     .43
Statement absent
  Low working memory
    Number different          .99     .80     .00     .94     .06
    Number same               1.00            .15     .75     .15
    Color superlative         .80     .92     .26     .74     .00
    Color equivalence         .86     .69     .21     .60     .19
  Medium working memory
    Number different          .94     .81     .06     .53     .41
    Number same               .94             .04     .55     .41
    Color superlative         .69     .90     .20     .61     .18
    Color equivalence         .71     .96     .45     .40     .15
  High working memory
    Number different          .97     1.00    .15     .32     .54
    Number same               .95             .04     .39     .57
    Color superlative         .92     .90     .13     .35     .52
    Color equivalence         .85     .96     .28     .29     .43
rejected, with the numerical values being χ²(68) = 371.61, p < .001 (number-different problem); χ²(51) = 238.72, p < .001 (number-same problem); χ²(68) = 250.07, p < .001 (color-different problem); and χ²(68) = 265.21, p < .001 (color-equivalence problem). The next phase involves condition-wise tests that, like t tests, evaluate the null hypothesis that the numerical estimates of the model’s parameters do not vary statistically between pairs of conditions. There was a total of 180 condition-wise tests in the current experiment. The numerical results are presented in Tables 8, 9, and 10. Concerning age effects, it can be seen in Table 8 that age affected performance for all combinations of memory load and working memory. Half of the 72 tests were significant. It should be noted, however, that among the high working memory groups, there was not much change between the fourth and seventh grades (only the number-different, statement-present condition-wise test was significant) and that among the low working memory groups, there was not much change between the seventh and ninth grades (only the color-equivalence, statement-present test was significant). Concerning individual differences in working memory, it can be seen in Table 9 that working memory affected performance for all combinations of age and memory load. Of the 72 tests, 43 were significant. Consistent with the choice accuracy regression analyses, the effects of memory load were somewhat more limited than those of age and working memory. For example, only 2 of the 12 tests involving ninth-graders were significant, and none of the tests involving low memory span seventh-graders was significant. In total, 15 of the 36 tests were significant. Finally, parameter-wise tests are conducted to determine the locus (reasoning and/or remembering) of these differences. Here, for those pairs of conditions that differed significantly, a series of χ²(1) tests were conducted to determine which of the parameters differed reliably between the conditions. Because these tests are both tedious and space-consuming to report, they are typically described in summary form. Consistent with this tradition, we present only those parameter-wise differences that were statistically reliable (p < .05) in Tables 8, 9, and 10.

TABLE 8
Condition-wise Tests: Age Effects

Hypothesis                      Number different      Number same           Color superlative     Color equivalence
                                (all tests χ²[4])     (all tests χ²[3])     (all tests χ²[4])     (all tests χ²[4])
Fourth versus seventh grade: Statement present
  Low working memory            11.65* (s,u)          18.24*** (i,s)        6.07                  7.90
  Medium working memory         3.81                  4.44                  6.23                  16.07** (d,s)
  High working memory           11.74* (s,u)          2.78                  5.82                  9.25
Fourth versus seventh grade: Statement absent
  Low working memory            5.78                  16.13** (i,s,u)       14.41** (d,s)         8.11
  Medium working memory         23.63*** (d)          1.16                  8.36                  15.12** (e)
  High working memory           2.86                  7.15                  2.02                  5.99
Seventh versus ninth grade: Statement present
  Low working memory            3.25                  3.86                  5.82                  9.51* (e,i,s)
  Medium working memory         26.00*** (s,u)        14.85** (s,u)         37.86*** (e,i,s,u)    35.65*** (e,i,s,u)
  High working memory           5.07                  9.78* (i,s)           14.26** (e,u)         13.96** (i)
Seventh versus ninth grade: Statement absent
  Low working memory            2.30                  3.19                  2.81                  8.85
  Medium working memory         14.07** (s,u)         12.93** (s,u)         22.03*** (e,i,s)      25.52*** (e,i,s)
  High working memory           10.66* (d,s,u)        8.80* (s,u)           6.64                  10.52* (i,s,u)
Fourth versus ninth grade: Statement present
  Low working memory            8.76                  3.29                  10.87* (i,u)          11.08* (e)
  Medium working memory         20.11*** (s,u)        15.07** (s,u)         41.44*** (e,s,u)      35.69*** (e,s,u)
  High working memory           10.44* (s,u)          2.38                  4.75                  2.67
Fourth versus ninth grade: Statement absent
  Low working memory            7.10                  8.28* (s,u)           6.12                  2.76
  Medium working memory         12.64* (s,u)          13.08* (s,u)          20.23*** (e,d,s,u)    3.80
  High working memory           14.41** (d,s,u)       3.75                  7.82                  21.34*** (d,i,s,u)

Note. The listed parameters are significantly different, p < .05; e, encoding; d, conditional memory of cues; i, idiosyncratic interpretation; s, subclass–subclass interpretation; u, understanding. *p < .05. **p < .01. ***p < .001.

TABLE 9
Condition-wise Tests: Individual Differences in Speak Span

Hypothesis                                Number different (all tests are χ²[4])
Low versus medium: Statement present
  Fourth grade                            9.18
  Seventh grade                           8.96
  Ninth grade                             9.76* (s,u)
Low versus medium: Statement absent
  Fourth grade                            1.79
  Seventh grade                           12.08* (d)
  Ninth grade                             9.69* (s,u)
Medium versus high: Statement present
  Fourth grade                            12.85* (s,u)
  Seventh grade                           37.36*** (s,u)
  Ninth grade                             8.64
Medium versus high: Statement absent
  Fourth grade                            4.34
  Seventh grade                           22.95*** (s,u)
  Ninth grade                             10.47* (d,u,s)
Low versus high: Statement present
  Fourth grade                            18.87*** (s,u)
  Seventh grade                           21.01*** (d,i,s,u)
  Ninth grade                             10.82* (i,s,u)
Low versus high: Statement absent
  Fourth grade                            8.09
  Seventh grade                           18.31** (s,u)
  Ninth grade                             24.50*** (d,i,s,u)

Number same (all tests are χ²[3]); Color superlative (all tests are χ²[4])
1.81   15.88** (i,s)   10.32* (i,u)   5.77   2.39   6.15   14.38** (i,s,u)   32.37*** (s,u)   16.31*** (i,s)
7.57   10.50* (d,i,u)   7.40   2.01   4.04   6.30

Color equivalence (all tests are χ²[4])
9.69* (d)   9.49* (i,s)   4.14   19.68*** (e)   3.96   11.72* (e,d,i,s)
12.79* (i,s,u)   37.42*** (s,u)   20.97*** (e,i,u)
18.97*** (e,d,i,s,u)   30.45*** (s,u)   11.07* (e,i,s)
2.02   5.11   15.88** (e,u,s)
14.32** (e,d,i,s)   3.12   7.56
20.78*** (s,u)   17.52** (i,u)   15.21** (s,u)
7.71   22.28*** (d,s,u)   10.06* (e,i,s,u)
11.47* (e,i,s,u)   28.91*** (i,u)   13.95** (e,i,s,u)
8.31* (s,u)   1.90   10.46* (i,s,u)
1.43   13.80** (i,u)   16.52** (e,i,s,u)
4.21   1.54   13.33** (d,s,u)
3.04   3.24   2.61

Note. The listed parameters are significantly different, p < .05; e, encoding; d, conditional memory of cues; i, idiosyncratic interpretation; s, subclass–subclass interpretation; u, understanding. *p < .05. **p < .01. ***p < .001.

TABLE 10
Condition-wise Tests: Memory Load Effects (Statement Present versus Absent)

Hypothesis                      Number different      Number same           Color superlative     Color equivalence
                                (all tests χ²[4])     (all tests χ²[3])     (all tests χ²[4])     (all tests χ²[4])
Fourth grade
  Low working memory            37.47*** (d)          7.49                  15.51** (d,i,s)       16.95** (i,s)
  Medium working memory         34.50*** (d)          7.48                  11.33* (d)            15.20** (e,d,i,s)
  High working memory           6.05                  5.63                  6.05                  14.24** (d,i,s,u)
Seventh grade
  Low working memory            0.99                  2.16                  3.94                  5.56
  Medium working memory         2.90                  8.47* (i)             6.62                  11.63* (d,s)
  High working memory           23.17** (d,s,u)       16.45** (s,u)         28.30** (d,s,u)       24.08** (d,s,u)
Ninth grade
  Low working memory            6.79                  0.48                  3.94                  15.37** (e,d,i,s)
  Medium working memory         9.94* (d,s)           4.13                  2.15                  9.09
  High working memory           9.20                  2.48                  0.80                  3.63

Note. The listed parameters are significantly different, p < .05; e, encoding; d, conditional memory of cues; i, idiosyncratic interpretation; s, subclass–subclass interpretation; u, understanding. *p < .05. **p < .01. ***p < .001.

Age effects. Inspection of Table 8 reveals that the probability of correct encoding of dimension cues as same or different (the parameter e) was not significantly different in any of the age comparisons with number problems but did differ in 11 of the 36 age comparisons with color problems. For this reason, separate graphs appear for number and color problems in Fig. 1. The parameter values plotted in these graphs are averaged across working memory and memory load and are based on the estimates that appear in Tables 5, 6, and 7. When interpreting these graphs, one should be cognizant that age effects are not independent of working memory (see condition-wise age effects above) and that general trends, rather than the specific parameter-wise tests that appear in Table 8, are described. Inspection of this figure does reveal a different pattern of performance across age in the number and color problems with both the encoding (e) and idiosyncratic reasoning (i) parameters. Encoding was nearly perfect, e > .95, on the number problems at all ages, whereas a curvilinear relationship across age appeared on the color problems, with the poorest performance occurring at the ninth grade (e < .88 at all grades). On average, more idiosyncratic reasoning or guessing occurred on the color problems (i > .18) than on the number problems (i < .10) at all ages. The average idiosyncratic parameter value was approximately constant across age on the color problems but increased with age on the number problems. The poorer encoding and increased guessing associated with the color problems, as compared to the number problems, reflect the greater linguistic complexity associated with the color questions. The parameters d and u increased and the parameter s decreased across age groups on both number and color problems. Overall, the two graphs suggest that the major developmental change is a switch from subclass–subclass to class-inclusion reasoning as children get older. In comparison to the developmental change in reasoning, memory changes are minimal on these problems. These data and conclusions replicate those reported by Howe and Rabinowitz (1996).

Working memory effects. To highlight the magnitude of the effects associated with working memory at each of the grade levels, parameter values averaged across memory load are presented at each grade in Fig. 2. It can be seen in each of the three graphs that the effects of working memory are much more pronounced on the reasoning parameters s and u than on either memory parameter, e or d, or on the idiosyncratic reasoning parameter, i. The parameter d, the conditional probability of correctly associating the cue with each of the subclasses following accurate encoding, increased from low to medium and from medium to high working memory at each grade level. The patterns of outcomes associated
with the parameters e and i varied across grade. The major change in the reasoning parameters s and u occurred between medium and high working memory groups with fourth- and seventh-graders (s decreased and u increased). Note that for fourth-graders the values of u were .04, .07, and .31 for low, medium, and high working memory groups, respectively. The primary reason why the values of u were near 0 in our earlier studies (Howe & Rabinowitz, 1996; Rabinowitz et al., 1989) is that high working memory children constitute a small percentage of fourth-graders (9% in the current experiment). Thus, their understanding of class-inclusion reasoning minimally affects fourth-grade averages. In contrast to the fourth- and seventh-graders, there were substantial changes in the values of s and u between low and medium working memory groups, as well as between medium and high working memory groups, in ninth-graders (also in college students [Howe et al., 1998]). The different patterns across grades probably reflect an interaction between nonmemorial skills and working memory on class-inclusion reasoning. In the age range studied, high working memory children demonstrate an understanding of class-inclusion reasoning. It appears that nonmemorial skills must approach those obtained by ninth-graders before medium working memory participants, as compared to low working memory participants, demonstrate improved reasoning.

FIG. 1. The mean parameter values for the fourth-, seventh-, and ninth-graders. The memory parameters are e (probability of correct encoding) and d (conditional probability of correctly associating the cues with each of the subclasses). The reasoning parameters are u (probability of understanding), s (probability of subclass rule), and i (probability of idiosyncratic rule or guessing).

FIG. 2. The mean parameter values for low, medium, and high working memory at each grade. The memory parameters are e (probability of correct encoding) and d (conditional probability of correctly associating the cues with each of the subclasses). The reasoning parameters are u (probability of understanding), s (probability of subclass rule), and i (probability of idiosyncratic rule or guessing).

Memory load effects. The effects of manipulating memory load on the model’s parameters tended to be more dependent on grade than on individual differences in working memory (see the condition-wise tests appearing in Table 10). For this reason, parameter values averaged across working memory are presented at each grade in Fig. 3. Because of the wide range of skill levels represented in the current experiment, it is surprising that the data do not deviate markedly from the Howe et al. (1998) finding with college students that the effects of memory load and individual differences in working memory are independent.
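The significance markers attached to the condition-wise statistics in Table 10 (and in Tables 8 and 9) can be checked against the χ² distribution with 3 or 4 degrees of freedom. The following Python sketch is illustrative only and is not the authors' software: the function names `chi2_sf` and `stars` are hypothetical, and the tail probability is obtained from the standard recurrence for the regularized upper incomplete gamma function rather than from any statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 2 (a = 1) or df = 1 (a = 1/2).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def stars(x, df):
    """Map a chi-square statistic to the *, **, *** convention used in the table notes."""
    p = chi2_sf(x, df)
    if p < .001:
        return "***"
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    return ""

# Selected fourth- and seventh-grade entries from Table 10 (statistic, df).
for label, x, df in [("number different, fourth grade, low", 37.47, 4),
                     ("number same, seventh grade, medium", 8.47, 3),
                     ("number different, fourth grade, high", 6.05, 4)]:
    print(label, x, stars(x, df))
```

Because the number-same tests have 3 rather than 4 degrees of freedom, a value such as 8.47 is significant at the .05 level there, whereas the same value would fall short of the criterion for the other three problem types.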
Although there are discrepancies at each age, inspection of the parameter values in Tables 5, 6, and 7 reveals qualitatively similar changes with memory load at each level of working memory. Inspection of Fig. 3 and Tables 5 and 10 reveals that memory load primarily affected the memory parameter d at the fourth grade. As memory load increased, d decreased; that is, fourth-graders were more likely to associate the wrong cue value (e.g., thought there were 7 dogs rather than 3 dogs) with a dimension correctly encoded as different (e.g., number) with the statement absent when the question appeared. A similar, but weaker, trend occurred with seventh- and ninth-graders. Note that the number of significant memory load differences involving d decreased across grades (see Table 10). The effect of memory load on d probably accounts for the four significant memory load effects that were obtained in the choice accuracy regression analysis. Furthermore, consistent with the conjecture based on the choice accuracy regression analyses, significant differences in the memory parameters were more likely to follow significant condition-wise tests involving memory load (80%) than involving either age (44%) or individual differences in working memory (44%), χ²(2) = 6.43, p < .05. Inspection of Fig. 3 also reveals a trade-off between subclass–subclass (s) and class-inclusion (u) reasoning. At all three grades, s increased and u decreased as memory load increased. However, the changes in s and u are statistically significant only with fourth-grade (only with color-equivalence problems) and seventh-grade (all problems) high working memory children (see Table 10). Howe and Rabinowitz (1996) did not find memory load to affect u with their fourth-grade sample. In light of the current data, their failure to find such an effect is not surprising because the effect is restricted to one of the four problem types presented to high working memory children, who constituted only 9% of the current sample. Because high working memory seventh-graders constituted 31% of the current sample, the effect of memory load on u obtained by Howe and Rabinowitz would be expected based on the current findings. The relatively restricted effects of memory load on the performance of the ninth-graders obtained in the current experiment, as compared to that of Howe and Rabinowitz, probably reflect the much smaller number of children involved in each condition-wise test because low, medium, and high working memory children were in different comparisons as well as the fact that only one dimension, as compared to two dimensions, appeared in each statement. Howe and Rabinowitz found memory load effects to be more restricted when information load was low (i.e., one dimension appeared in each statement) with ninth-graders.

FIG. 3. The mean parameter values for statement-present and statement-absent memory loads at each grade. The memory parameters are e (probability of correct encoding) and d (conditional probability of correctly associating the cues with each of the subclasses). The reasoning parameters are u (probability of understanding), s (probability of subclass rule), and i (probability of idiosyncratic rule or guessing).

DISCUSSION

In the current study, we were interested in the relationship among individual differences in memory capacity, memory load, skills, and the development of class-inclusion reasoning. Individual differences in working memory were defined using the speak-span task. Memory load was manipulated by having the statements either present or absent when the questions appeared on the screen. Skills were assumed to be monotonically related to the age of the participants, and some of the relevant skills were estimated using the mathematical model. Consistent with this assumption, older children read statements and answered questions more quickly than did younger children.
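The comparison of proportions reported in the memory load results (80% versus 44% versus 44%, χ²(2) = 6.43) can be reproduced with a Pearson test of homogeneity. The Python sketch below is a hedged reconstruction, not the authors' computation: the function name `homogeneity_chi2` is hypothetical, and the counts of 12, 16, and 19 are assumptions inferred from the reported percentages and from the reported numbers of significant condition-wise tests (15, 36, and 43).

```python
import math

def homogeneity_chi2(successes, totals):
    """Pearson chi-square for equal proportions across groups (df = groups - 1)."""
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for k, n in zip(successes, totals):
        exp_yes = n * pooled          # expected tests with a memory-parameter difference
        exp_no = n * (1.0 - pooled)   # expected tests without one
        stat += (k - exp_yes) ** 2 / exp_yes + ((n - k) - exp_no) ** 2 / exp_no
    return stat

# Significant condition-wise tests for memory load, age, and speak span (from the text).
totals = [15, 36, 43]
# ASSUMED counts with a significant memory parameter (e or d): about 80%, 44%, 44%.
with_memory = [12, 16, 19]

stat = homogeneity_chi2(with_memory, totals)
p = math.exp(-stat / 2.0)  # exact upper-tail probability when df = 2
print(round(stat, 2), round(p, 3))  # 6.43 0.04
```

Under these assumed counts the statistic matches the reported value, which suggests that the percentages were computed over significant condition-wise tests rather than over all tests.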
Likelihood ratio tests and regression analyses were used to evaluate the relationships. All three variables affected class-inclusion reasoning. On average, reasoning improved as skills (i.e., age) and memory capacity increased and memory load decreased. The findings were consistent with our earlier class-inclusion work (Howe & Rabinowitz, 1996; Howe et al., 1998; Rabinowitz et al., 1989) and extended the database to individual differences in memory capacity across a considerable age span. Three issues that seem particularly germane to the current findings (the utility of the speak-span test, determinants of developmental changes in class-inclusion reasoning, and educational implications) are discussed below.

The Utility of the Speak-Span Test

Three criteria (monotonic increases in scores as a function of age, predictive validity, and construct validity) to assess the utility of the speak-span task as a definition of working memory were noted. In general, the task proved useful in all three respects. The correlation between age and speak-span score was positive, and scores increased monotonically across grades. With regard to predictive validity, with age partialled out, speak span was a significant predictor of choice latencies with class-inclusion questions and of choice accuracy in 7 of the 8 analyses in which the regression coefficients were significant. Furthermore, speak-span scores were positively correlated with choice accuracy for all problem types. Using the mathematical model, 43 of the 72 speak-span condition-wise tests were significant. It is worth emphasizing that the analyses involving speak-span scores were not restricted to low and high scores, as has been typical in language comprehension research using a reading-span measure (e.g., Just & Carpenter, 1992). Thus, with fourth-, seventh-, and ninth-graders, the entire range of speak-span scores was useful in predicting class-inclusion behaviors.

With regard to construct validity, most of the significant relationships obtained with the speak-span scores were consistent with expectations in that (a) speak-span scores increased with age, (b) the memory parameter d tended to increase as speak-span scores increased, and (c) reasoning improved with increasing speak-span scores (i.e., class-inclusion understanding [u] increased and the use of subclass–subclass reasoning [s] decreased). The latter finding characterized only the change from medium to high speak-span scores in fourth- and seventh-graders.

Two of the findings associated with the speak-span scores were surprising. First, in earlier work, there was little evidence that 10-year-olds engage in class-inclusion reasoning when information is presented only verbally (e.g., Howe & Rabinowitz, 1996; Piaget, 1921; Rabinowitz et al., 1989). In fact, Howe and Rabinowitz (1996) found memory load effects to be restricted to the memory parameter d. With the exception of the color problems with the statement absent, high speak-span fourth-graders clearly used class-inclusion reasoning in the current study. Second, although we had no prior knowledge about the relationship between speak-span scores and choice latencies, the fact that age was negatively correlated while speak-span scores were positively correlated with class-inclusion question choice latencies was unexpected. Apparently, older children made class-inclusion choices more rapidly than did younger children, but higher speak-span children were able to inhibit their initial responses and think about the difficult class-inclusion questions before choosing.

In general, we have been very skeptical about the value of capacity measures for both empirical and theoretical reasons (Howe & Rabinowitz, 1990). The results obtained in the current study with 9- to 16-year-olds using the speak-span task, and in an earlier study with college students using the reading-span task (Howe et al., 1998), dampen our skepticism. In addition, Towse et al. (1998) recently found that a variety of capacity measures are intercorrelated and “reflect common properties” (p. 213). Although a great deal of additional work is required to understand the mechanisms associated with these measures, it is becoming more apparent to us that the related ideas of capacity/working memory deserve consideration in developmental analysis.
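The second surprising finding, that age and speak span relate to choice latency in opposite directions even though they are positively related to each other, may be easier to see with a first-order partial correlation. The Python sketch below applies the standard textbook formula; the function name `partial_corr` and all three input correlations are hypothetical values chosen for illustration and are not estimates from the current study.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

# HYPOTHETICAL zero-order correlations (x = speak span, y = choice latency, z = age).
r_span_latency = 0.10   # weak positive relation in the raw data
r_span_age = 0.50       # speak span increases with age
r_age_latency = -0.30   # older children choose more quickly

# Removing age strengthens the positive span-latency relation in this example,
# because age pushes latency down while span pushes it up.
r_partial = partial_corr(r_span_latency, r_span_age, r_age_latency)
print(round(r_partial, 3))
```

In this illustration, age acts as a suppressor: the raw relation between span and latency understates the relation that appears once age is held constant.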
188
RABINOWITZ, HOWE, AND SAUNDERS
Determinants of Developmental Changes in Class-Inclusion Reasoning

We have used Norman and Shallice’s (1986) willed attention model as an aid to understanding the interdependence of memory and reasoning apparent in class-inclusion performance and have argued that this interdependence is characteristic of reasoning and problem solving in general. Although their model is not sufficiently detailed to account for all of the changes in the relationship of memory and reasoning across age groups (see Howe & Rabinowitz, 1996), their assumptions are consistent with the hypothesis that the development of both skill and working memory capacity should be important determinants of age-correlated changes in performance. In the typical developmental study, skill level and working memory capacity are confounded. Thus, age-correlated changes in performance might reflect either or both of these variables. In the current study, we attempted to separate skill from working memory capacity by using the speak-span task as a definition of working memory. It was assumed that age represented skill development and that speak-span scores represented capacity. Furthermore, some of the relevant skills were estimated using the mathematical class-inclusion model. Although there certainly would have been some conceptual advantages in assessing skill levels for all of the sub-skills thought to be important in the current task (e.g., syntactic, semantic, reading), these were outweighed by (a) the impracticality of measuring the large number of sub-skills thought to be important; (b) the unavailability of sufficient knowledge to generate a complete list of relevant sub-skills; and (c) the hypothesis that age, an omnibus variable, would be a useful developmental marker for all of the relevant sub-skills. The contributions of age, speak span, and memory load were assessed using both regression and likelihood ratio tests.
The advantage of regression is that it is possible to assess the contribution of each of the variables with the effects of the other variables removed statistically. Regression, however, is indifferent to the underlying psychological processes that account for performance. The importance of the mathematical model is that the relationship between the hypothetical processes is specified and can be evaluated (see Batchelder & Riefer, 1999). Both types of data analyses produced outcomes that are consistent with the assumption that changes in both skills and working memory capacity underlie developmental change in class-inclusion reasoning. However, the effects of the two variables were not independent. Ninth-graders (see Fig. 3) and college students (Howe et al., 1998), but not fourth- and seventh-graders, with low and medium working memories did manifest meaningful differences in class-inclusion reasoning. By contrast, meaningful reasoning differences characterized medium and high working memory participants at all ages tested thus far. It should be emphasized that if Winer (1980) correctly concluded that 10-year-olds have the necessary skills to solve class-inclusion problems, then the comparisons among fourth-, seventh-, and ninth-graders reflect a continuum of expertise (i.e., how quickly and accurately each skill is implemented), not the absence of necessary skills in fourth-graders and the presence of such skills in the older children.
The mechanisms that account for the interaction between skill level and working memory need to be explored.

Cognition, Education, and the Representativeness of Class-Inclusion Problems

When class-inclusion questions are presented, participants are asked to compare some feature of a class to one of its subclasses. Because such questions are asked infrequently but are similar to frequently requested comparisons of subclasses, it is important to consider whether the information obtained from studying class inclusion is representative of problem solving or is task specific. In our earlier work, we argued that the task is representative of problem solving and that the findings are relevant to a theory of instruction. If the quality of reasoning depends on memory load, whether it be experimentally manipulated or a consequence of individual differences, then reasoning should be facilitated by reducing the memory demands of the task. As can be seen in Figs. 2 and 3, the current findings are consistent with this conclusion. Furthermore, it appears that experimentally manipulated memory load and individual differences in working memory independently influence the class-inclusion reasoning of fourth-, seventh-, and ninth-graders as well as college students (Howe et al., 1998). If the reasoning–remembering relationship apparent in our class-inclusion work is not domain specific, then it should be evident in any body of knowledge that is hierarchically organized. In particular, this relationship should play an important role in both teaching and understanding mathematical performance. Reasoning/Problem difficulty should be monotonically related to the load carried in working memory. Not only would the amount of information specified in the problem determine the load, but the number of principles that need to be recalled or reconstructed, and the number of activities that need to be completed, also would be determinants.
From our perspective, the larger the number of skills that become automatic and the fewer activities that need to be completed, the more expert the performance. Thus, it should be more efficient to solve complex problems by (a) remembering rather than reasoning about basic principles, especially if the basic principles are overlearned and the memory retrieval is automatic; and (b) performing basic arithmetic operations mentally rather than having to rely on a calculator. If arithmetic facts are automatically retrieved, then in most instances doing mental calculations while solving problems should not require the redeployment of attention associated with the use of a calculator. Four recent studies (Ashcraft & Kirk, 2001; Campbell & Xue, 2001; Haverty, 1999; Klein & Bisanz, 1999) and our work in attempting to teach the laws of exponents to university students with poor skills in algebra are relevant to these hypotheses and support the claim that the interdependence of remembering and reasoning is not domain specific. In a university population, Ashcraft and Kirk (2001) demonstrated that one of the ways in which mathematical anxiety interferes with mathematical reasoning is by reducing working memory. The correlation between mathematical anxiety and computational span was −.40 (p < .01), with a word-based measure of working memory partialled out. Klein and Bisanz (1999) found that 4-year-old children’s error rates on nonverbal arithmetic problems were closely related to the maximum number of units that needed to be held in working memory to solve each of the problems (r² = .88). Haverty (1999) trained seventh-graders to generate either the 17 or 19 multiplication tables but not both the 17 and 19 tables. Following their mastery of these facts, the children were presented with inductive reasoning problems at a variety of difficulty levels by providing them with tables, each of which contained six x,y pairs, and requiring them to describe the mathematical relationship between the variables represented in each table. The prior learning of the relevant multiplication facts facilitated solving the most difficult problems, for example, y = 17(x + 1). Thus, it appears that the automatic retrieval of number facts facilitated fairly abstract reasoning. Campbell and Xue (2001) demonstrated that differential amounts of practice over more than a decade facilitate arithmetic skills. They presented 72 students registered in undergraduate and graduate programs with 360 arithmetic questions. Participants were required to add three one- or two-digit numbers, divide two- or three-digit numbers by single-digit numbers, subtract two-digit numbers from two-digit numbers, and multiply two-digit numbers by one-digit numbers. The number of these problems that participants successfully completed in 15 min was predicted (R² = .59, p < .001) by reported calculator use before entering the university (b = −.29, p = .005) and mean reaction time to answer simple arithmetic questions (e.g., 4 × 3) (b = −.70, p < .001). Presumably, early practice in single-digit arithmetic led to faster access to basic facts, which transferred to more complex arithmetic, whereas access to calculators reduced the number of opportunities to practice more complex arithmetic.
Both of these possibilities are consistent with the hypothesis that practice automates access to basic facts, reduces working memory load, and facilitates reasoning.

In our work (May, Rabinowitz, & Mantyka, 2001), a computer program was developed to teach university students the laws of exponents. Earlier efforts to remediate the algebraic deficiencies of these students were generally successful but failed with the laws of exponents. Both speed and accuracy were required of the students to advance in the program. As the students progressed through the program, principles that were already mastered were mixed in new problem sets, and speed and accuracy needed to be maintained. The students experiencing the program performed with much greater accuracy on tests of the laws of exponents than did conventionally tutored students, demonstrating that either the retrieval or automatic retrieval of basic algebraic principles facilitates performance on complex problems. Thus, with the evidence currently available, the reasoning–remembering relationship apparent in class-inclusion data also characterizes mathematical problem solving.

CONCLUSIONS

Three classes of models of historical interest have been offered to account for developmental changes in class-inclusion reasoning: the Piagetian model based on changes in the structure of reasoning (e.g., Inhelder & Piaget, 1969), the neo-Piagetian models based on changes in resource availability (e.g., Case, 1985), and the information processing models based on changes in underlying skills (e.g., Trabasso et al., 1978). The first two classes of models are usually associated with developmental stages, whereas information processing models are usually associated with continuous developmental change. We found that changes in both resource availability and skills seem to account for developmental changes in class-inclusion reasoning. Both sources of change seem to vary in a continuous manner with age. However, skill level and resource availability seem to interact, as the class-inclusion reasoning of low and medium resource-level participants is similar in both fourth- and seventh-graders but is different in ninth-graders and university students. In general, the findings are consistent with the assumptions made by Norman and Shallice (1986) in their resource model. But as we have argued (Howe & Rabinowitz, 1996; Howe et al., 1998), additional assumptions need to be added to account for some aspects of the age-correlated changes. To the extent that the developmental changes in class-inclusion reasoning (Howe & Rabinowitz, 1996; Rabinowitz et al., 1989; current study) are representative of the cognitive changes that occur across most problem-solving tasks, developmental researchers should vary both age and individual differences in working memory in their studies, and developmental theorists should incorporate individual differences in both capacity and skill in their models.

APPENDIX
Words Presented in the Speak-Span Task

adult        Africa       airplanes    alcoholic        animal
art          astronaut    basketball   beach            blind
body         Canada       carpenter    cavemen          centimetre
chair        change       chipmunks    complain         cone
cub          diamonds     different    dinosaurs        dollar
driftwood    earth        eat          elephant         excuse
exercise     famous       fat          feeling          flowers
food         forest       fossil       geese            gills
glass        gold         grown-ups    healthy          hibernate
history      hobby        hockey       hole             hot
hungry       hunting      iron         jungle           kitten
large        learn        listening    lungs            medicine
messy        migration    mistake      money            Mount-Everest
mouse        museum       music        music            nest
noise        Ontario      painting     park             picture
planet       plants       poison       Prime-Minister   province
sad          school       season       secret           serious
shiver       silence      size         snake            snowflakes
spider       stars        steal        stomach          stories
storm        strange      summer       summer           team
temper       tiny         Toronto      tough            wallet
whales       winter       worm         year             zoo

Note. Due to clerical error, the words music and summer appeared twice in the list.
REFERENCES
Ashcraft, M. H., & Kirk, E. P. (2001). The relationships among working memory, math anxiety, and performance. Journal of Experimental Psychology: General, 130, 224–237.
Batchelder, W. H., & Riefer, D. M. (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review, 6, 57–86.
Brainerd, C. J., & Reyna, V. F. (1995). Autosuggestibility in memory development. Cognitive Psychology, 28, 65–101.
Campbell, J. I. D., & Xue, Q. (2001). Cognitive arithmetic across cultures. Journal of Experimental Psychology: General, 130, 299–315.
Campbell, R. L. (1991). Does class inclusion have mathematical prerequisites? Cognitive Development, 6, 169–194.
Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic Press.
Chapman, M. (1987). Piaget, attentional capacity, and the functional implications of formal structure. In H. W. Reese (Ed.), Advances in child development and behavior (Vol. 20, pp. 289–334). New York: Academic Press.
Dagenais, Y. (1973). Analyse de la cohérence entre les groupements d’addition des classes, de multiplication des classes et d’addition des relations asymétriques [Cluster analyses of the groupings addition of classes, multiplication of classes, and addition of asymmetric relations]. Unpublished doctoral dissertation, University of Montreal.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Halford, G. S. (1984). Can young children integrate premises in transitivity and serial order tasks? Cognitive Psychology, 16, 65–93.
Haverty, L. A. (1999). The importance of basic number knowledge to advanced mathematical problem solving. Unpublished doctoral dissertation, Carnegie Mellon University.
Hodkin, B. (1987). Performance analysis in class inclusion: An illustration with two language conditions. Developmental Psychology, 23, 683–689.
Howe, M. L., & Rabinowitz, F. M. (1990). Resource panacea? Or just another day in the developmental forest. Developmental Review, 10, 125–154.
Howe, M. L., & Rabinowitz, F. M. (1991). Gist, another panacea? Or just the illusion of inclusion? Developmental Review, 11, 305–316.
Howe, M. L., & Rabinowitz, F. M. (1996). Reasoning from memory: A lifespan inquiry into the necessity of remembering when reasoning about class inclusion. Journal of Experimental Child Psychology, 61, 1–42.
Howe, M. L., Rabinowitz, F. M., & Powell, T. L. (1998). Individual differences in working memory and reasoning–remembering relationships in solving class-inclusion problems. Memory & Cognition, 26, 1089–1101.
Inhelder, B., & Piaget, J. (1969). The early growth of logic in the child: Classification and seriation. New York: Norton.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122–149.
Klein, J. S., & Bisanz, J. (1999, April). Preschoolers doing arithmetic: The concepts are willing, but the working memory is weak. Poster session presented at the biennial meeting of the Society for Research in Child Development, Albuquerque, NM.
Logan, G. D. (1988). Towards an instance theory of automatization. Psychological Review, 95, 492–527.
Markman, E. M. (1978). Empirical versus logical solutions to part–whole comparison problems concerning classes and collections. Child Development, 49, 168–177.
May, S., Rabinowitz, F. M., & Mantyka, D. (2001). Teaching the rules of exponents: A resource based approach. In Proceedings of the Symposium about Mathematical Understanding. College of Education, University of Saskatchewan, Saskatoon, Saskatchewan.
Müller, U., Sokol, B., & Overton, W. F. (1999). Developmental sequences in class reasoning and propositional reasoning. Journal of Experimental Child Psychology, 74, 69–106.
Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-regulation: Advances in research (Vol. 4, pp. 1–18). New York: Plenum.
Pascual-Leone, J. (1987). Organismic processes for neo-Piagetian theories: A dialectical causal account of cognitive development. International Journal of Psychology, 22, 531–570.
Piaget, J. (1921). Essai sur quelques aspects du développement de la notion de partie chez l’enfant [Essay of some aspects of the development of the concept of part in children]. Journal de Psychologie, 18, 449–480.
Piaget, J. (1952). The child’s conception of number. New York: Norton.
Rabinowitz, F. M., Howe, M. L., & Lawrence, J. A. (1989). Class inclusion and working memory. Journal of Experimental Child Psychology, 48, 379–409.
Reyna, V. F. (1991). Class inclusion, the conjunction fallacy, and other cognitive illusions. Developmental Review, 11, 317–336.
Shipley, E. F. (1979). The class-inclusion task: Question form and distributive comparisons. Journal of Psycholinguistic Research, 8, 301–331.
Siddal, J. N., & Bonham, D. J. (1974). Optimization subroutine package. Unpublished materials, Department of Mechanical Engineering, McMaster University, Hamilton, Ontario.
Towse, J. N., Hitch, G. J., & Hutton, U. (1998). A reevaluation of working memory capacity in children. Journal of Memory and Language, 39, 195–217.
Trabasso, T., Isen, A. M., Dolecki, P., McLanahan, A. G., Riley, C. A., & Tucker, T. (1978). How do children solve class-inclusion problems? In R. S. Siegler (Ed.), Children’s thinking: What develops? (pp. 151–180). Hillsdale, NJ: Erlbaum.
Winer, G. A. (1980). Class-inclusion reasoning in children: A review of the empirical literature. Child Development, 51, 309–328.
Received October 4, 1999; revised August 16, 2001