JOURNAL OF EXPERIMENTAL CHILD PSYCHOLOGY 48, 379-409 (1989)
Class Inclusion and Working Memory

F. MICHAEL RABINOWITZ, MARK L. HOWE, AND JOAN A. LAWRENCE

Memorial University of Newfoundland
Three experiments were conducted to investigate the general issue of the relationship between memory and reasoning and the specific issue of subskills used in answering class-inclusion questions. Semantic skills of first-grade children were investigated in the Pilot Experiment. In Experiment 1, an attempt was made to determine if fourth graders, seventh graders, and college students mimicked the first graders' tendency to appropriately encode the superordinate class, but answer class-inclusion questions as though only subclass comparisons were required. A mathematical model and a computer-based methodology were developed for this purpose. In contrast to previous research, in Experiment 1 fourth and seventh graders showed no understanding of class-inclusion logic. Because of the additional memory demands imposed by the questions used in Experiment 1, a third experiment was conducted to evaluate if memory load determined the quality of class-inclusion reasoning. The results obtained across the three experiments were interpreted as reflecting the need for a new conceptualization of the class-inclusion task. Performance seems to be dependent on subjects' abilities to integrate relevant subskills, rather than on deficient reasoning or missing subskills, consistent with a resource-limited, willed-attention, working-memory model. © 1989 Academic Press, Inc.
Special thanks are due to Mrs. Margaret Rumsey and Mr. William Whelan, principals of Holy Cross Primary School and St. Paul's Elementary School, respectively; to Mrs. Geraldine Roe, Associate Superintendent of Curriculum and Instruction of the Roman Catholic School Board of St. John's, for making it possible to collect the data; and to Charles Brainerd, Carole Menig-Peterson, Judy Tudiver, and two anonymous reviewers for their suggestions about the manuscript. The research was supported by Natural Sciences and Engineering Research Council Canada Grants A0304 and A2017 to the first author and A3334 to the second author. Requests for reprints, more detailed descriptions of the Pilot Experiment and the analyses of variance reported in Experiments 1 and 2, and a listing of the computer program should be addressed to either F. Michael Rabinowitz or Mark L. Howe, Department of Psychology, Memorial University of Newfoundland, St. John's, Newfoundland, Canada A1B 3X9.

THE LIMITED RESOURCES HYPOTHESIS

The hypothesis that central resources used in information processing are limited has gained in importance in the last decade and often serves as a critical concept in the construction of cognitive theories (e.g., Baddeley, 1986; Case, 1985). However, Navon (1984, 1985) and Hirst and
Kalmar (1987) provide equally satisfactory nonresource hypotheses. Furthermore, Brainerd and Kingma (1984, 1985) empirically challenged the resource hypothesis with data reflecting that short-term memory and reasoning are independent in several classical developmental tasks such as class inclusion and conservation. As argued elsewhere (Howe & Rabinowitz, 1989), not only are these alternative theoretical ideas potentially equally satisfactory, they are complementary rather than competitive. Classical factorial manipulations are not sufficient to disentangle the theoretical alternatives. It is only through scaling, measurement, and modeling activities that these theoretical issues can be broached. In this paper, we illustrate these approaches with class inclusion.

A FRAMEWORK FOR MODELING CLASS INCLUSION
The standard class-inclusion question (e.g., Are there more dogs or more animals?) was used by Piaget (1952) to assess the child's understanding of both category membership and number. Subsequent work (see Trabasso, Isen, Dolecki, McLanahan, Riley, & Tucker, 1978, and Winer, 1980, for reviews) suggests that many skills (e.g., linguistic, logical, memorial, syntactical) are needed to answer class-inclusion questions. Although young children may indeed lack some of the skills, it is unlikely that 10-year-olds, 13-year-olds, and college students share these deficiencies. Yet, subjects of these ages do not perform perfectly in class-inclusion experiments.

The alternate theoretical framework adopted in this paper is based on Norman and Shallice's (1986) discussion of willed and automatic control of behavior. It is hypothesized that class-inclusion performance deteriorates as the load on working memory increases. This deterioration primarily reflects the manner in which the subjects interpret the class-inclusion question rather than faulty memory. If this hypothesis is correct, developmental changes in class inclusion are, at least in part, a consequence of how automatically subjects can perform the necessary subskills rather than whether they can perform them at all, given unlimited time.

PILOT EXPERIMENT
Two hypotheses relevant to the construction of the mathematical model, concerning the ways in which semantic factors might affect class-inclusion performance, were investigated in the Pilot Experiment. First, the child might not be able to decode the class-inclusion question even though he or she understands every word (see Shipley, 1979). If so, chance performance should be obtained both on the standard class-inclusion problem, involving the superordinate class and the major subclass, and on a variant in which the superordinate class and the minor subclass are compared. Second, the child might interpret the "or" exclusively and treat the standard question as a request to compare the two subclasses (see Piaget, 1952). In this case, the child would always answer the standard question incorrectly because the major subclass is compared with the superordinate class, but would answer the variant involving the minor subclass correctly.

We investigated these hypotheses with a population of 6-year-old children. The mean number of correct responses was 4.75 (out of a possible 5.00) when the questions involved comparison of the minor subclass with the major one, 3.88 when the comparison was between the minor subclass and the superordinate class, and 1.67 when the comparison was between the major subclass and the superordinate class. These differences yielded a significant main effect of type of question, F(2, 44) = 58.95, p < .001. Scheffé follow-up analyses confirmed that all pairwise differences were significant, S(2, 44) > 2.98, ps < .025. Hodkin (1987, personal communication) found similar question-type effects using a different methodology with 4-, 5-, 6-, and 8-year-olds. Another interesting finding was obtained from the children who were asked "How many (superordinate-class name) are there?" immediately following errors on the class-inclusion questions. Eight children made a total of nine errors. The other 16 children never erred.

EXPERIMENT 1

In the Pilot Experiment, the children's answers to the class-inclusion questions reflected an exclusive or interpretation. However, inasmuch as they almost always correctly stated the number of items in the superordinate class immediately following these incorrect class-inclusion answers, it appears that they appropriately encoded the superordinate class. We attempted to determine whether the discrepancy between the manner in which the superordinate class and the subclasses are encoded and the way the class-inclusion question is answered only characterizes young children. To investigate this question, mathematical models were developed and color-based, as well as number-based, subclass-subclass, minor-subclass vs superordinate-class, and class-inclusion problems were used.

Fourth graders, seventh graders, and college students were selected as subjects in Experiment 1 for two reasons. First, they could read computer-generated verbal statements. These stimuli permitted precise control of stimulus sequences and measurement of choice accuracy, reading latency, and choice latency. The latter was expected to be a sensitive index of the manner in which the questions were decoded. Second, subjects of this age had performed skillfully in other experiments. This was important here because in Experiment 1 the task information load was higher than that in earlier work (information about
both the color and the number of subclass members was presented and queried) and because the computer test was novel and perhaps difficult.

Age Trends and Working Memory

In most of the studies reviewed by Winer (1980), fourth-grade and older children answered over 50% of the class-inclusion questions correctly. He cited several studies in which the performance of 10+-year-old children was nearly perfect. In other studies performance of 10+ children was much poorer. Some information is available on adult performance. Both Sinnott (1975) and Denney and Cornelius (1975) found 30-year-olds to perform better than 60- to 70-year-olds on class-inclusion questions. Denney and Cornelius (1975) found that 85% of the 30-year-olds performed perfectly. Sinnott (1975) also found excellent performance in the 30-year-old group, but only when using geometric stimuli. When verbal statements about people were used, performance dropped to 63% correct.

Markman (1978) found that children younger than 11 years tended to answer standard number-based class-inclusion questions by reasoning empirically (e.g., they would compare three roses to five flowers), while older children and adults often appreciated that the solution depended on logical necessity rather than the information presented (i.e., there would always be more flowers than roses or tulips because "flowers" is the superordinate class). It may be that most children perform poorly on class-inclusion problems because they attempt to hold specific information in working memory while reasoning about class relationships. Furthermore, the relatively poor performance of Sinnott's (1975) adult subjects when presented verbal statements about people might reflect adults' attempts to hold empirical information in working memory while reasoning about some class relationships. Thus, there may be a trade-off between memory demands and class reasoning at all points in the developmental spectrum.

Color Problems

Including color, as well as number, information about subclass members was motivated by our interest in determining whether some variants of the class-inclusion problem might prove to be more difficult than others. Color and number class-inclusion questions differ in several ways. In addition to the syntactic and semantic differences discussed below, color problems do not involve counting and require both correct reasoning and retention of cue-subclass pairings.

Lawrence (1980) introduced a version of the class-inclusion problem in which either color or number could serve as the relevant dimension. For example, the subject was presented with pictures of two red tomatoes and two green apples. The number-based class-inclusion question was
"Are there the same number of tomatoes as things to eat?" The color-based class-inclusion question was "Is the reddest tomato the same color as the reddest thing to eat?" Note that the color-based question is syntactically and semantically more complex. In addition to the equivalence judgment required in the number problem, two superlative judgments were called for in the color-based question. In Experiment 1, we expected that subjects would take longer to answer color than number questions because of the additional syntactic and semantic complexity associated with the color questions. On the basis of Shipley's (1979) semantic analysis, it was predicted that subjects would answer the subclass-subclass comparisons more quickly than the comparisons involving the superordinate class. No difference in choice latencies was expected when comparing the minor-subclass vs superordinate-class to the class-inclusion questions because these two comparison types do not differ in semantic or syntactic complexity.

Mathematical Models

In order to better understand the mechanisms involved in class inclusion, as well as how these mechanisms develop, it is necessary to secure a method of formal measurement that generates estimates of the hypothesized underlying processes. For this purpose, we constructed a methodology and a conceptual model to diagnose subjects' understanding of class inclusion and to evaluate the manner in which the subclasses were encoded at test. The methodology involved presenting a statement followed by a question and three possible answers. Each statement consisted of numerical and color information about items in two subclasses. The form of the statements was "There are n1 c1 x1's and n2 c2 x2's." The n's refer to number, the c's to color, and the x's to item type. An example of one of the statements is "There are three brown dogs and two white cats." One of 11 types of questions representing the combination of relevant dimension (number or color), number of items in the two subclasses (same or different; the color associated with each subclass was always different), and type of comparison (subclass-subclass, minor-subclass vs superordinate-class, class inclusion) followed each statement. It was impossible to present the 12th and missing minor-subclass vs superordinate-class number-equal question because there is not a minor subclass unless the number of items in each subclass is different. When the number of items in the subclasses was the same, equivalence questions were used for both number and color dimensions; otherwise superlative judgments were required. The form of the number-different questions was "Are there more x1's or more x2's?" (e.g., Are there more cats or more animals?). The form of the number-same questions was "Are there the same number of x1's as x2's?" (e.g., Are there the same number of dogs
as cats?). The form of the color-different questions was "Is the c-est x1 c-er than the c-est x2?" (e.g., Is the brownest cat browner than the brownest animal?). The form of the color-same questions was "Is the c-est x1 the same color as the c-est x2?" (e.g., Is the brownest dog the same color as the brownest cat?).

The same three alternative answers, randomly ordered, followed each number-relevant question: more y1's, more y2's, and same number. A different set of three alternatives, randomly ordered, followed each color-relevant question: y1's c-er, y2's c-er, and same color. Like the x's in the questions, the y's stand for either the subclasses or the relevant superordinate class, and the c-ers stand for a superlative color label (e.g., redder). Thus, following the statement about dogs and cats, the number-relevant class-inclusion choices would have been more dogs, more animals, and same number. The color-relevant subclass-subclass choices would have been dogs browner, cats browner, and same color. Three dependent variables were of interest: number of correct choices, reading latencies, and choice latencies.
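To make the unit construction concrete, the following sketch generates one number-different class-inclusion unit from the templates above. It is illustrative only, not the authors' original Microsoft Basic program; the small item pool and the function name are our assumptions.

import random

# Illustrative pool of (superordinate, major subclass, minor subclass)
# triples and colors; the real experiment used a larger counterbalanced set.
POOL = [("animals", "dogs", "cats"), ("jewels", "rubies", "diamonds")]
COLORS = ["brown", "white", "red", "green"]

def make_number_different_class_inclusion_unit():
    superordinate, major, minor = random.choice(POOL)
    c1, c2 = random.sample(COLORS, 2)     # subclass colors always differed
    statement = f"There are three {c1} {major} and two {c2} {minor}."
    question = f"Are there more {major} or more {superordinate}?"
    choices = [f"more {major}", f"more {superordinate}", "same number"]
    random.shuffle(choices)               # choice order randomized per unit
    return statement, question, choices

statement, question, choices = make_number_different_class_inclusion_unit()
print(statement)
print(question)
for label, choice in zip("ABC", choices):
    print(f"{label}: {choice}")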
Choice-Latency Model
Mathematical models were developed to account for the choice and choice-latency data. The chronometric choice-latency model was designed to account for individual-subject data. Because Eq. (1), based on variables directly manipulated, adequately accounted for choice latencies, further conceptual development was not attempted:

ln(cl + 1) = a1 ln(seq + 1) + a2 n + a3 d + a4 q1 + a5 q2 + a6 dq1 + a7 dq2 + a8 ln(rl + 1) + a9.   (1)
The a's are parameters estimated using standard linear regression, cl is choice latency, seq represents the sequential order in which the 48 statements were presented, n represents the numerosity of the subclasses (different = 0, same = 1), d represents dimension (number = 0, color = 1), the q's represent the three question types (subclass-subclass: q1 = q2 = 0; minor-subclass vs superordinate-class: q1 = 0, q2 = 1; class inclusion: q1 = 1, q2 = 0), the dq's represent the interaction between dimension and question type, and rl is reading latency.
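As an illustration of how Eq. (1) can be fit to one subject's 48 trials, the sketch below estimates the a's by ordinary least squares. The simulated predictor and latency values are stand-ins for real data; any standard regression routine would serve.

import numpy as np

# Fit Eq. (1) for a single subject by ordinary least squares.
# The 48 trials below are simulated stand-ins for real data.
rng = np.random.default_rng(1)
trials = 48
seq = np.arange(1, trials + 1)        # presentation order
n = rng.integers(0, 2, trials)        # numerosity: 0 = different, 1 = same
d = rng.integers(0, 2, trials)        # dimension: 0 = number, 1 = color
qtype = rng.integers(0, 3, trials)    # 0 = ss, 1 = class inclusion, 2 = mssc
q1 = (qtype == 1).astype(float)
q2 = (qtype == 2).astype(float)
rl = rng.uniform(5.0, 20.0, trials)   # reading latencies (s)
cl = rng.uniform(5.0, 25.0, trials)   # choice latencies (s)

# Columns are the eight predictors of Eq. (1) plus a constant for a9.
X = np.column_stack([np.log(seq + 1), n, d, q1, q2,
                     d * q1, d * q2, np.log(rl + 1), np.ones(trials)])
y = np.log(cl + 1)
a, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"],
               a.round(3))))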
Choice Model

The choice model was developed to characterize the manner in which groups of subjects represented the information in the statements and interpreted the questions when choosing among the three possible answers to each question. The model is consistent with Brainerd and Kingma's (1985, pp. 210-211) generalized working memory analysis of childhood cognition paradigms. It should be emphasized that the two
memory-related parameters in the model, e and d, are estimates of the manner in which information was encoded at retrieval (i.e., when questions were answered). Although these estimates are influenced by the way information was stored when the statements were presented and by the loss between storage and retrieval, they do not directly reflect these processes.

Several findings obtained in the Pilot Experiment influenced the construction of the model. Recall that performance was not perfect on the subclass-subclass problems (95% correct), suggesting that some forgetting probably occurred in the Pilot Experiment. Subjects were correct only 78% of the time on the minor-subclass vs superordinate-class questions. Compared with the subclass-subclass questions, the poorer performance on the minor-subclass vs superordinate-class questions could not have resulted from exclusive or interpretations because such interpretations would always yield correct answers. Therefore, it appears that subjects sometimes either guess the answer or idiosyncratically interpret questions involving a comparison of superordinate class and subclass. For convenience, guessing an answer to questions involving the superordinate class and either subclass without attempting an interpretation and idiosyncratic interpretations of these questions will be collectively referred to as idiosyncratic interpretations in the mathematical model. Note that if the subject either forgets relevant information or interprets the class-inclusion question idiosyncratically, he or she might fortuitously answer the class-inclusion question correctly. Finally, performance on class-inclusion questions (33% correct) was poorer than performance on minor-subclass vs superordinate-class questions, suggesting that exclusive or interpretations of class-inclusion questions were used.

Parameters. Different sets of equations were constructed for the number-same and number-different questions, while one set of equations was sufficient for the color-relevant questions because different colors were always associated with each of the two subclasses appearing in a statement. Three parameters were used to estimate the way questions involving the superordinate class and either subclass were interpreted (i = idiosyncratic, s = subclass-subclass, u = understanding; also see Hodkin, 1987). Since these interpretations are treated as mutually exclusive and exhaustive in the model, only two degrees of freedom are lost in their estimation,

1 = i + s + u.   (2, nd, ns, c)
Note that abbreviations follow the equation numbers in order to specify the appropriate reference(s) for each equation: number different (nd), number same (ns), and color (c). It was assumed that subjects always interpreted subclass-subclass questions correctly. The parameter e represented the probability of correctly encoding the
values on the relevant dimension as same or different on the test, while 1 - e represented falsely encoding the relevant dimension as same or different. The parameter d, which was irrelevant to the number-same questions, represented the conditional probability of remembering the cue values associated with each subclass when the cues were correctly encoded as different. The conditional probability of associating each of the two cue values with the wrong subclass was 1 - d. Thus, four free parameters were estimated for the number-different and color problems, while only three free parameters were estimated for the number-same problems. The parameters and associated definitions are summarized in Table 1.

TABLE 1
DEFINITION OF THE PARAMETERS IN THE CHOICE MODEL

Encoding parameters
  e   Probability of correctly encoding the dimensional cues as same or different.
  d   Probability of associating the correct cue with each of the two subclasses if the relevant dimension is accurately encoded as different.
Interpretation parameters
  u   Probability of understanding and accurately interpreting questions involving comparison of superordinate class and subclass.
  s   Probability of subclass-subclass interpretations of questions involving comparison of superordinate class and subclass.
  i   Probability of idiosyncratic interpretations of questions involving comparison of superordinate class and subclass.

Data space. In Experiment 1, the model was applied to, and parameters were estimated for, data broken down by grade (fourth, seventh, and college), dimension (number or color), and numerosity of items in a statement (same, different). Thus, four sets of parameters were estimated for subjects at each grade level. Three data types were defined for each question type: S = success, E1 = error associated with correctly encoding the values on each subclass as same or different, and E2 = remaining error usually associated with incorrectly encoding the values on each subclass as same or different. For example, the E1 error associated with the number-different class-inclusion question involving dogs and animals would be "more dogs," while the E2 error would be "same number." The E1 error associated with the color-different subclass-subclass questions involving dogs and cats would be "cats browner," while the E2 error would be "same color." Different equations were developed for the subclass-subclass (ss), minor-subclass vs superordinate-class (mssc), and major-subclass vs superordinate-class or class-inclusion (ci) questions. These equations appear in Appendix A.
Method

Subjects
The subjects were 10 males and 10 females at each of three levels: Grade 4, Grade 7, and college. Mean ages (age ranges) in months were 116 (97 to 133), 154 (146 to 178), and 244 (210 to 292), respectively. The children were recruited through the Roman Catholic School Board of St. John's, while the college students were volunteers paid $1.75 for about 20 min of participation.

Apparatus and Materials
An Exidy Sorcerer computer, programmed in Microsoft Basic, was used to control the presentation of the verbal materials and to record reading times, decision times, and choices. A monitor (22.5 x 27 cm) was used to display the materials. The subjects responded by pushing one of the three buttons marked A, B, or C. The buttons were mounted on a metal box (13 x 18 x 5.5 cm) and were interfaced to the computer via the parallel port.

The materials were organized into 48 units. Each unit contained a statement (e.g., There are three brown dogs and two white cats.), a question (e.g., Are there more animals or more dogs? or Is the brownest animal browner than the brownest dog?), and three alternatives to choose from when answering each question (e.g., same number, more dogs, more animals or same color, dog browner, animal browner). In Experiment 1, a particular set of descriptors (e.g., three brown dogs, two white cats) was associated with the same question type (e.g., class inclusion) for all subjects. The order in which the two descriptors appeared in a statement was randomized for each subject. A descriptor always consisted of a number, color, and noun. The numbers were always "three" and "two" preceding nonequivalence questions (e.g., Are there more animals or more dogs? or Is the brownest animal browner than the brownest dog?). With the exception noted in the following paragraph, both numbers were "four" preceding equivalence questions (e.g., Are there the same number of dogs as animals? or Is the brownest dog the same color as the brownest animal?). The colors and nouns were roughly counterbalanced across the 11 question types. The nouns appearing in a particular statement always belonged to the same superordinate class. Superordinate classes and nouns were chosen to represent categories and exemplars familiar to the subject (e.g., jewels: rubies and diamonds). Both the order in which the units appeared and the order of choices associated with each question were randomized for every subject. The order in which the two nouns appeared was counterbalanced across question type, so that the "correct" noun appeared first in half the questions (order of mention). Each problem type occurred twice in the 48 units and represented the factorial combination of question type (subclass-subclass, minor-subclass vs superordinate-class, class inclusion),
dimension (number or color), numerosity of the subclasses (different or same), and order of mention. When the numerosity of the subclasses is the same, it is not possible to construct a minor-subclass vs superordinate-class equivalence comparison with number as the relevant dimension. Therefore, it was necessary to have different numbers associated with the subclasses. The numbers three and two were used. Using the above dogs and cats descriptors, the minor-subclass vs superordinate-class number equivalence question would have been "Are there the same number of cats as animals?" The minor-subclass vs superordinate-class color equivalence problem would have been "Is the brownest cat the same color as the brownest animal?"

Procedure
Subjects were tested individually. The instructions appeared on the video monitor and contained the following information: there would be 48 questions, each question would be preceded by a statement containing the information needed to answer the question, and the questions should be answered by pushing the correct button. If the subject had any questions about the instructions, the experimenter answered them. The subject pressed any button and the first statement appeared. The subject controlled the amount of time the statement was visible by pressing any button when he or she completed reading the statement. This button press terminated a clock (reading latency) and cleared the screen. One second later the question and choices appeared. The subject indicated which choice he or she thought to be correct by pushing the appropriate button. Both the buttons and the choices were labeled A, B, or C. The choice response terminated a clock (choice latency), cleared the screen, and was followed in 1 s by the next statement.
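The trial sequence maps directly onto a simple event loop with two clocks. The sketch below is a schematic reconstruction, with console input standing in for the three-button box; it is not the original Microsoft Basic program.

import time

# One trial: the reading clock runs while the statement is visible, and the
# choice clock runs while the question and labeled choices are visible.
def run_trial(statement, question, choices):
    print(statement)
    start = time.monotonic()
    input()                          # any button press ends the statement
    reading_latency = time.monotonic() - start
    time.sleep(1.0)                  # 1-s interval before the question
    print(question)
    for label, choice in zip("ABC", choices):
        print(f"{label}: {choice}")
    start = time.monotonic()
    answer = input("Press A, B, or C: ").strip().upper()
    choice_latency = time.monotonic() - start
    time.sleep(1.0)                  # 1-s interval before the next statement
    return answer, reading_latency, choice_latency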
Results

Because the descriptions generated with the mathematical models were of primary interest, analyses of variance of the reading latencies, choice latencies, and choice data are selectively described in Experiments 1 and 2. In both experiments, the statistical tests of interest were powerful because of the large number of data points associated with the within-subjects effects (in each experiment, 60 subjects generated 48 data points for each dependent variable). In order to reduce the possibility of Type I errors, a .01 significance level was adopted for the group data. Only the choice-latency regression analyses based on single-subject data are reported at the .05 level. In the analyses of variance of Experiment 1, grade and sex were between-subjects factors while dimension, numerosity, question type, and order of mention were within-subjects factors.
The clock used to measure latencies failed with one of the male college students. Cell means were substituted for the missing data.

Reading Latencies

Subjects' reading latencies were longer when the numbers associated with the two subclasses were different (mean = 12.22 s) rather than the same (11.53 s), F(1, 54) = 11.79, reflecting the sensitivity of the reading latency measure. Thus, subjects spent more time studying, and presumably attempting to store, the statements containing the greater amount of information. Unfortunately, the descriptors appearing in the statements were of differential reading difficulty. Of particular concern was that differential latencies were associated with question type, F(2, 108) = 11.49. The reading latencies were shortest if subclass-subclass questions followed the statements (11.25 s), of intermediate length if class-inclusion questions followed (12.03 s), and longest if minor-subclass vs superordinate-class questions followed (12.35 s). In order to ascertain whether the different reading latencies associated with the descriptors affected choice performance, two regression models were developed. In both regression models, reading latency was used as a predictor variable. The results obtained with the choice-latency model (see Eq. (1)) are reported below. A similar model was constructed to predict choice data. Choices were scored 2 if correct, 1 if the error was consistent with the coding of cues as different or the same, and 0 otherwise. The partial correlation between choice score and reading latency (with all the remaining predictor variables partialed out) was not significant for any of the 60 subjects. The linear correlation between choice score and reading latency averaged across 59 subjects was -.04. Thus, there was no apparent relationship between reading latency and choice data.

Choice Latencies

Analysis of variance. The Grade x Dimension x Numerosity x Question interaction, F(4, 108) = 5.40, was significant. This interaction primarily reflects different patterns associated with age on the color-same problems. College students took 4.65 s longer to answer minor-subclass vs superordinate-class than class-inclusion questions, while fourth (2.80 s) and seventh (2.05 s) graders took longer to answer the class-inclusion questions. This finding is the first indicant suggesting that college students find minor-subclass vs superordinate-class color questions difficult to interpret. There were a number of interesting lower order findings. Consistent with Shipley's (1979) semantic analysis of class inclusion, subjects took longer to answer the minor-subclass vs superordinate-class (15.54 s) and class-inclusion (14.41 s) questions than they did to answer subclass-subclass (11.95 s)
questions, F(2, 108) = 35.11. In addition, the choice latencies associated with color questions (16.57 s) were considerably longer than those associated with number questions (11.36 s), F(1, 54) = 128.45. Although this trend is apparent at all grades, it increases in magnitude with grade, F(2, 54) = 7.37. Although latencies get shorter as a function of grade with number questions, the opposite occurs with color questions.

Regression model. The regression model (see Eq. (1)) did a good job of accounting for the variability in individual subjects' choice latencies. The multiple regression coefficients were significant, R > .56, F(8, 39) > 2.19, p < .05, for 56 of 59 subjects and approached significance, p < .07, for 2 of the remaining 3 subjects. Analyses of variance were conducted on each of the beta weights with Grade as the between-subjects factor. These findings complement those obtained in the analysis of variance of choice latencies: subjects' latencies became shorter across training (sequence effect); longer latencies were associated with color than with number relevant questions (dimension effect), and this difference increased with Grade (Grade x Dimension interaction); numerosity minimally affected choice latencies; longer latencies were associated with minor-subclass vs superordinate-class and class-inclusion questions than with subclass-subclass questions (Question 1 and Question 2 effects), particularly with the number dimension questions since the beta weights associated with the D x Q1 and D x Q2 interaction terms were negative; and reading latencies minimally affected choice latencies. Thus, as was the case for choice score, differential reading latencies were not correlated with choice latencies.

Choice Data

Analysis of variance. Each correct response was scored "2," errors consistent with appropriate same-different encoding of the dimension relevant to the question (E1 errors) were scored "1," while the remaining errors (E2 errors) were scored "0." The means relevant to the significant Grade x Dimension, F(2, 54) = 5.55, Grade x Question, F(4, 108) = 9.07, and Dimension x Numerosity x Question, F(2, 108) = 31.64, interactions appear in Table 2. Inspection of this table reveals that the performance of fourth and seventh graders was quite similar. Without further research, it is not possible to determine if the superior performance of the college students, F(2, 54) = 34.01, reflects differences in age or intelligence quotient. The Grade x Dimension interaction reflects that college students, but not fourth or seventh graders, performed more poorly on color than on number questions. Subjects at all grades performed most accurately on subclass-subclass questions and poorest on class-inclusion questions, F(2, 108) = 153.77. The Grade x Question interaction reflected that the college students showed a smaller decrement on class-inclusion questions, compared to minor-subclass vs superordinate-class questions, than did the younger subjects. The Dimension x Numerosity x Question interaction is difficult to interpret without the aid of the mathematical model.

TABLE 2
MEANS AND STANDARD DEVIATIONS OBTAINED IN THE CHOICE ANALYSIS REPRESENTING THREE SIGNIFICANT INTERACTIONS IN EXPERIMENT 1

Age x dimension
                                        Fourth        Seventh       College       Collapsed
Number                                  0.98 (0.82)   1.07 (0.83)   1.65 (0.61)   1.24 (0.82)
Color                                   0.96 (0.80)   1.03 (0.80)   1.32 (0.77)   1.10 (0.80)
Collapsed                               0.97 (0.81)   1.05 (0.81)   1.49 (0.71)   1.17 (0.81)

Age x question
                                        Fourth        Seventh       College       Collapsed
Subclass vs subclass                    1.46 (0.66)   1.62 (0.54)   1.75 (0.51)   1.61 (0.58)
Minor subclass vs superordinate class   1.11 (0.73)   1.18 (0.75)   1.49 (0.70)   1.26 (0.75)
Class inclusion                         0.34 (0.57)   0.35 (0.56)   1.23 (0.78)   0.64 (0.77)

Dimension x numerosity x question
                         Subclass vs    Minor subclass vs      Class
                         subclass       superordinate class    inclusion
Numerosity different
  Number                 1.49 (0.64)    1.31 (0.74)            0.86 (0.83)
  Color                  1.62 (0.54)    1.08 (0.72)            0.57 (0.73)
  Collapsed              1.55 (0.59)    1.19 (0.74)            0.72 (0.72)
Numerosity same
  Number                 1.88 (0.32)    1.41 (0.70)            0.45 (0.68)
  Color                  1.45 (0.68)    1.24 (0.78)            0.67 (0.77)
  Collapsed              1.67 (0.58)    1.33 (0.75)            0.56 (0.74)

Note. Standard deviations appear in parentheses.

Parameter estimation. In order to use the mathematical choice model, four parameter sets were estimated for subjects at each grade level. For each problem type (Dimension x Numerosity), estimates were based on the observed frequencies of successes, E1 errors, and E2 errors for each question type (subclass-subclass, minor-subclass vs superordinate-class, and class inclusion). Since

1 = P(S) + P(E1) + P(E2),   (3, nd, c, ns)
two degrees of freedom are associated with the data for each question type. For each type of color problem, the four parameters were estimated using a data set containing six degrees of freedom, two per question type. Since there were no minor-subclass vs superordinate-class number-
same questions, the three number-same parameters were estimated using a data set containing four degrees of freedom. As number-different minor-subclass vs superordinate-class problems were used in lieu of the impossible minor-subclass vs superordinate-class number-same problems, the four number-different parameters were estimated using a data set containing eight degrees of freedom (the two sets of minor-subclass vs superordinate-class questions were treated independently).

Two different parameter estimation procedures were used. The first procedure was a nested fit (Rabinowitz, Grant, & Dingley, 1984), while the second involved a simplex method used to generate maximum likelihood estimates (Siddal & Bonham, 1974). Across Experiments 1 and 2, 90 independent parameter estimates were obtained using each procedure. The maximum discrepancy between estimation procedures for any single parameter was .15, while 67 of the estimates differed by less than .025. The consistency of the parameter estimates across fitting procedures is one indicant of the good agreement of model and data (also see goodness-of-fit discussions to follow). Excepting one parameter set in Experiment 2, for which the simplex method was inappropriate because an empirical proportion was zero and a logarithm could not be calculated, all parameters reported in the paper are maximum likelihood estimates. (Note that maximum likelihood procedures are preferable because they incorporate sophisticated goodness-of-fit and hypothesis-testing mechanisms; Brainerd, Howe, & Kingma, 1982; Theios, Leonard, & Brelsford, 1977.)
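The simplex routine of Siddal and Bonham (1974) is a direct-search minimizer whose modern counterparts are readily available. The sketch below shows the general shape of such a fit: a multinomial log-likelihood over the observed S, E1, and E2 frequencies is maximized with scipy's Nelder-Mead simplex. The predicted-probability function is a placeholder standing in for the Appendix A equations (only the class-inclusion case sketched earlier is filled in), and the counts are illustrative, not data from the paper.

import numpy as np
from scipy.optimize import minimize

def predicted_probs(params, question_type):
    e, d, u, s = params
    i = 1.0 - u - s                   # Eq. (2)
    if question_type != "ci":
        raise NotImplementedError("only the class-inclusion case is sketched")
    # Same bookkeeping as the earlier class-inclusion sketch.
    return np.array([u + s * e * (1 - d) + i / 3,
                     s * e * d + i / 3,
                     s * (1 - e) + i / 3])

def neg_log_likelihood(params, data):
    e, d, u, s = params
    if not (0 <= min(params) and max(params) <= 1 and u + s <= 1):
        return np.inf                 # keep the simplex inside the unit cube
    ll = 0.0
    for question_type, counts in data.items():
        p = predicted_probs(params, question_type)
        ll += float(np.sum(np.array(counts) * np.log(np.clip(p, 1e-12, None))))
    return -ll

data = {"ci": (12, 25, 3)}            # illustrative S, E1, E2 frequencies
fit = minimize(neg_log_likelihood, x0=np.array([0.9, 0.7, 0.1, 0.7]),
               args=(data,), method="Nelder-Mead")
print(fit.x.round(3))                 # estimates of e, d, u, s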
Choice model. The necessity and sufficiency tests of the model are summarized in Table 3, while maximum likelihood estimates of the parameters appear in Table 4.

TABLE 3
GOODNESS-OF-FIT TESTS FOR THE CHOICE MODEL IN EXPERIMENT 1

                       Necessity test        Sufficiency test
Fourth grade
  Number different     chi2(2) = 620.19**    chi2(4) = 0.02
  Number same          chi2(1) = 11.55*      chi2(1) = 2.07
  Color different      chi2(2) = 99.43**     chi2(2) = 5.80
  Color same           chi2(2) = 125.43**    chi2(2) = 1.90
Seventh grade
  Number different     chi2(2) = 132.07**    chi2(4) = 10.67
  Number same          chi2(1) = 15.86*      chi2(1) = 2.27
  Color different      chi2(2) = 49.01**     chi2(2) = 11.03*
  Color same           chi2(2) = 223.01**    chi2(2) = 2.00
College students
  Number different     chi2(2) = 32.91**     chi2(4) = 9.50
  Number same          chi2(1) = 0.04        chi2(2) = 1.33 (a)
  Color different      chi2(2) = 39.03**     chi2(2) = 5.47
  Color same           chi2(2) = 82.78**     chi2(2) = 10.07*

(a) This test is based on a two-parameter model.
* p < .01. ** p < .001.

TABLE 4
PARAMETER VALUES ESTIMATED FOR THE CHOICE MODEL IN EXPERIMENT 1

                       e      d      u      s      i
Fourth grade
  Number different     .89*   .67*   .00*   .99*   .00*
  Number same          .91*          .00*   .92*   .09
  Color different      .93    .78*   .00*   .56*   .44
  Color same           .86    .79*   .00*   .66*   .34*
Seventh grade
  Number different     .97    .75*   .00*   .68*   .32*
  Number same          .92*          .00*   .81*   .19
  Color different             .84*   .00*          .40
  Color same                  .77*   .13*          .04*
College students
  Number different     .99    .88    .67    .26    .07
  Number same          .99           .49    .37    .14
  Color different      .89    .94    .36    .21    .43
  Color same           .84    .92    .50    .35    .16

Note. The encoding parameters are e (correctly encoding the cues on the relevant dimension as same or different) and d (associating the correct cue value with each subclass if the relevant dimension is accurately encoded as different), while the interpretation parameters are u (understanding), s (subclass-subclass), and i (idiosyncratic). The parameter d does not apply to the number-same problems; remaining blank cells are illegible in the source.
* The parameter value is significantly different from comparable parameter estimates for the college students, chi2(1)s > 6.64, p < .01.

The model appears to be identifiable since (a) there are more data points than parameters; (b) as noted earlier, nesting and maximum likelihood procedures generate convergent values; and (c) the estimates generated by the maximum likelihood procedure are independent of the initial values assigned to the parameters. The necessity test, which is used to evaluate whether a simpler model will adequately account for the data, involves comparing the three- (number-same questions) or four-parameter model with a two-parameter model in which encoding is assumed to be perfect (i.e., e = d = 1). As can be seen in Table 3, the simpler model is rejected in all instances except the number-same questions with college students. The estimated value of e for this condition was .99, which does not differ reliably from 1. Two of the 12 sufficiency tests, which are used to evaluate whether a more complex model is required, were significant at the .01, but not the .001, level, reflecting that the choice model was not as adequate as a data-based model in these instances. Note, however, that the magnitude of the corresponding necessity tests is substantially greater than that of the two significant sufficiency tests, suggesting that the model is accounting for
a substantial proportion of the variance in the data. (Also, the summed residual squared deviations generated by the nested fit were <.03 in both instances.)

A three-step sequence of standard likelihood-ratio tests was conducted to determine if the parameter estimates differed across grade levels for the four basic question types (see Brainerd et al., 1982). All groupwise tests, which were used to evaluate whether the numerical estimates of the parameters differed across grade levels (analogous to an omnibus F test), were significant, chi2(6 or 8) > 49, ps < .001. Following this, a series of conditionwise tests was conducted in order to evaluate whether the numerical estimates of the parameters differed between a pair of conditions (analogous to a t test). None of the conditionwise tests involving fourth and seventh graders were significant, while all conditionwise tests comparing college students with either fourth or seventh graders were significant, chi2(3 or 4) > 32, ps < .001. Finally, parameterwise tests were used to evaluate differences between conditions for each of the individual parameters. The parameter estimates obtained with fourth and seventh graders that were significantly different from comparable estimates obtained with the college students are starred in Table 4.
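A conditionwise test of this kind compares the summed maximized log-likelihood when each grade level receives its own parameter set against the log-likelihood when one set is forced on both; twice the difference is referred to chi-square with degrees of freedom equal to the number of equated parameters. A schematic check with placeholder log-likelihoods (not values from the paper):

from scipy.stats import chi2

# Conditionwise likelihood-ratio test, schematically. The log-likelihoods
# below are placeholders, not estimates from the paper.
maxLL_separate = -412.6   # fourth grade and college students fit separately
maxLL_pooled = -436.9     # the same data fit with one shared parameter set
df = 4                    # e, d, u, s equated across the two grades

G2 = 2.0 * (maxLL_separate - maxLL_pooled)
p = chi2.sf(G2, df)
print(f"chi-square({df}) = {G2:.2f}, p = {p:.2g}")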
Inspection of Table 4 reveals a number of interesting patterns. Perhaps most striking is that neither fourth nor seventh graders evidence an understanding of class inclusion. For these subjects, all estimates of u were 0 except the .13 obtained by the seventh graders with color-same questions. These findings are inconsistent both with Inhelder and Piaget's (1964) assertion that class inclusion is acquired during the concrete operational period (i.e., between ages 7 and 11) and with empirical findings (reviewed by Winer, 1980) that subjects usually answer half of the class-inclusion questions correctly by age 10. One might argue that the inconsistency reflects the advantages associated with using a mathematical model for diagnostic purposes. Although the estimates of u were not influenced by correct class-inclusion answers based on faulty encoding or idiosyncratic interpretation, the exclusion of these false positives only accounts for a small proportion of the inconsistency. If a standard correct-wrong scoring procedure is used, the fourth graders, seventh graders, and college students were correct on 18, 18, and 61%, respectively, of the class-inclusion questions.

The younger subjects and college students also differ markedly in their use of subclass-subclass interpretations. Even though all subjects apparently distinguish between subclass-subclass questions and questions involving the superordinate class (choice latency analyses), younger subjects answer superordinate-class questions as if they only involved subclasses. The discrepancy between the choice latency and the choice data obtained with the fourth and seventh graders in Experiment 1 mirrors the discrepancy between the manner in which the first grade children quantified
the superordinate class and answered class-inclusion questions in the Pilot Experiment.

Another consistent difference between the younger subjects and college students appeared in the estimates of d. If a dimension was correctly encoded as different, then college students were more likely than the younger subjects to associate the correct cue value with each subclass. The e and i parameters were irregularly related to grade. College students more accurately encoded the number dimension, but not the color dimension, as same or different. Apparently, adults are more likely than children to encode information numerically, rather than by color, when given the choice. The developmental difference in encoding accounts for the Grade x Dimension interaction obtained in the analysis of variance of the choice data. The developmental trends associated with the estimates of idiosyncratic interpretation are complex and a matter of speculation.

The parameter values help clarify the Dimension x Numerosity x Question interaction obtained in the analysis of the choice data. First consider the pattern of findings (see Table 2) when numerosity was different. The better performance associated with color than with number on the subclass-subclass questions is a consequence of the fact that total encoding (e x d) was more accurate with color than with number for fourth (.73 vs .60) and seventh graders (.81 vs .73), and only slightly less accurate (.84 vs .87) for college students. The poorer performance with color than with number on minor-subclass vs superordinate-class questions reflects that all subjects were more likely to use idiosyncratic interpretations with color-different than with number-different questions. The poorer performance with color than with number on class-inclusion questions reflects different processes in the children and the college students. For the children, cue reversals (d) were more likely with number than with color and, therefore, subclass-subclass interpretations of the class-inclusion questions were associated with more correct number answers. For college students, understanding of class inclusion (u) was poorer with color-different than with number-different questions.

A different pattern of findings was obtained when the numerosities associated with each subclass were the same. At all grade levels, subjects were more likely to encode the cues on the color dimension as "the same" than the cues on the number dimension as "different." These differential encoding (e) error rates, augmented by the fact that no group of subjects perfectly associated the relevant color cue with each of the subclasses (d), resulted in higher color than number error rates on the subclass-subclass comparisons. The differential encoding (e) error rates also account for the differences obtained with class-relevant questions. When the color cues were encoded as same, subjects would have tended to answer class-relevant questions with "same color," an error with minor-subclass vs superordinate-class questions and a correct response with class-inclusion questions.

Discussion
In Experiment 1, we wanted to know (1) if older subjects would encode the superordinate-class information and answer class-inclusion questions discrepantly, as did the first graders in the Pilot Experiment, and (2) whether, using the three question types, a mathematical model could be developed which would generate quantitative diagnostic information about skills relevant to class inclusion. Use of such a model illustrated that (1) fourth graders and seventh graders failed to demonstrate any understanding of class inclusion; (2) college students, but not fourth and seventh graders, were poorer at encoding color than number information; and (3) class-inclusion errors resulted from two types of encoding errors (faulty encoding of the dimensional cues as same or different, 1 - e, and associating the cue value with the wrong subclass, 1 - d), as well as idiosyncratic (i) and subclass-subclass (s) interpretations. Furthermore, the model, along with methodological improvements such as recording reading and choice latencies, reflected that there was a probable discrepancy in the way subjects older than first graders encoded superordinate-class information (choice latency data) and answered class-inclusion questions.

EXPERIMENT 2
In Experiment 2, we attempted to find out why subjects seemed to discriminate the superordinate class and subclasses, but failed to interpret and answer class-inclusion questions accurately. Given that most investigators report that children 10 years of age and older answer at least 50% of class-inclusion questions correctly (see Winer, 1980), the fact that fourth and seventh graders demonstrated no evidence of understanding class inclusion in Experiment 1 provided a clue. The most obvious difference between the standard number-based class-inclusion task and the task used in Experiment 1 is that color information was presented in addition to number information. A second difference is that correct answers on the standard task are sometimes based on the recognition of logical necessity (Markman, 1978). In Experiment 1, the subjects had to remember the specific number and color information to correctly answer all question types except number questions involving the superordinate class. Because the statement provided no clue to the question type which would follow, it is likely that subjects considered empirical information on all questions, producing a trade-off between memory and reasoning. A third difference is that in most standard experiments stimuli were available for inspection when class-inclusion questions were presented. In contrast, in Experiment 1 the subjects had to
remember the information in the prior statement when answering a question. All of these differences increased the memory load of the subjects in Experiment 1 compared to that of the subjects experiencing the standard task. Because the memory parameter estimates obtained in Experiment 1 suggested that the relevant information was usually available to all the subjects (see Table 4), the hypothesis tested in Experiment 2 was that the manner in which class-inclusion questions are interpreted depends on the memory load. As memory load increases, so does the likelihood of an erroneous subclass-subclass interpretation. College students were used as subjects to test this hypothesis because it is reasonable to assume that they have mastered class-inclusion relevant skills, including those related to logic.

Method

Subjects
The subjects were 30 female and 30 male college students. Each subject was quasi-randomly assigned to one of the three experimental groups, so that there were an equal number of males and females in each group. The mean age of the subjects was 271.5 months, with a range from 213 to 397 months. Each subject was paid $2.00 for participating in the experiment.

Apparatus and Materials
An IBM PC computer, programmed in Turbo Basic, was used to control the presentation of the verbal materials and to record reading times, decision times, and choices. A monitor (22.5 x 27 cm) was used to display the materials. The subjects responded by pushing one of the three keys marked 1, 2, or 3 on the number keypad.

As in Experiment 1, the materials were organized into 48 units. Each unit contained a statement (e.g., There are n1 brown dogs and n2 white cats.), a question (e.g., Are there more animals or more dogs? or Is the brownest animal browner than the brownest dog?), and three alternatives to choose from when answering each question (e.g., same number, more dogs, more animals or same color, dog browner, animal browner). A descriptor always consisted of a number, color, and noun. The randomizations described below were independently generated for each subject. The values of n1 and n2, which ranged from 4 to 9, were randomly selected for each unit. Of course, n1 = n2 for the equivalence questions, while n1 and n2 were different for the nonequivalence questions. Although the color, subclass, and superordinate-class names forming each unit were the same as those used in Experiment 1, color-subclass pairings were randomly determined for each unit. The question type, the order
in which the descriptors were presented in both the statements and the questions, and the order of the three alternative choices associated with each question also were randomized for each unit. It was hoped that this complete randomization procedure would reduce the possibility of repeating the Experiment 1 finding that differential reading latencies were associated with question type.

Each of the possible 11 problem types determined by question type, numerosity, and dimension, plus an additional number-different minor-subclass vs superordinate-class problem, appeared once in successive blocks of 12 problems. This methodological change made it possible to examine performance changes across the test session using analysis of variance. The additional number-different minor-subclass vs superordinate-class questions were substituted for the impossible-to-construct number-same minor-subclass vs superordinate-class questions.

Design
The three experimental treatments were used to manipulate memory load. This was accomplished by varying the availability of the statement when the question was presented and by inserting an interpolated task between the statement and the question. The relevant statement remained visible when the questions were presented to the subjects in the "No Load" condition. The treatment in the "Load" condition was identical to that used in Experiment 1. Subjects in the "Detection" condition experienced a detection trial interpolated between each statement-question pair.

Procedure
Except for the changes in the way the material was randomized and responses were made (keyboard instead of button box), the procedure used for the Load subjects was identical to that used in Experiment 1. The keyboard was constantly monitored in the computer program so that responses occurring in inappropriate intervals were ignored (almost always associated with not releasing a key quickly enough), and disallowed responses were corrected (i.e., choice responses other than 1, 2, or 3). A trial in the Load condition was defined by (a) the presentation of a statement which remained on the video monitor until a key was pressed; (b) a question and related choices which appeared 1 s later and remained on the screen until a choice was made; and (c) a 1-s blank interval followed by the next statement. The No Load subjects experienced exactly the same procedure except the relevant statement remained on the screen until the question that followed was answered. The detection task was described in the instructions that appeared on the screen prior to the appearance of the first statement for each subject in the Detection condition. The subject was to press "1" if a "y" ap-
peared on the left side, "2" if a y appeared on the right side, or "3" if a y appeared on both the left and the right sides of the screen. Consistent with the values of up to five random variables, the y could appear anywhere on the left, right, or both sides of the screen. The timing of events for each trial in the Detection condition was (a) a statement appeared on the screen until a key was pressed, (b) the screen went blank for 1 s, (c) "READY" appeared in the center of the screen for 100 ms with the letters arranged vertically to divide the screen into left and right halves, (d) the screen went blank for 500 ms, (e) y appeared on the screen for 20 ms and immediately was followed by "1 = left, 2 = right, 3 = both," (f) if the subject did not respond within 2 s then "You took more than two seconds. Please try to respond within two seconds." appeared on the screen for 2 s, (g) either following a legitimate response or following the offset of the respond-within-2-s message, the screen went blank for 1 s before the question appeared, and (h) after the subject answered the question, the screen went blank for 1 s before the next statement appeared. Thus, for the subjects in the Detection condition, the time between the offset of the statement and the onset of the question varied between 2.62 (plus detection-response latency time) and 6.62 s.
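The 2.62- and 6.62-s bounds follow from summing the fixed event durations in steps (b) through (g); a quick check (the variable names are ours):

# Fixed events between statement offset and question onset (seconds),
# from steps (b)-(g) above.
blank_after_statement = 1.0    # (b)
ready_display = 0.100          # (c)
blank_before_y = 0.500         # (d)
y_display = 0.020              # (e)
blank_before_question = 1.0    # (g)

fixed = (blank_after_statement + ready_display + blank_before_y
         + y_display + blank_before_question)
minimum = fixed                # immediate detection response
maximum = fixed + 2.0 + 2.0    # full 2-s window, then the 2-s reminder
print(minimum, maximum)        # 2.62 6.62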
Results

As in Experiment 1, a .01 significance level was used for all analyses involving group data. Only the choice-latency regression analyses based on single-subject data are reported at the .05 level. In the analyses of variance, condition and sex were between-subjects factors, while dimension, numerosity, question type, and blocks (four blocks of 12 problems: Problems 1-12, 13-24, 25-36, and 37-48) were within-subjects factors.

Reading Latencies

Unfortunately, even though the descriptors associated with the question types were randomized independently for every subject, reading latencies were longest when minor-subclass vs superordinate-class questions followed (8.00 s), of intermediate length when class-inclusion questions followed (7.50 s), and shortest when subclass-subclass questions followed (7.03 s), F(2, 108) = 5.38. However, reading latencies were again not related to choice behavior. As in Experiment 1, reading latencies were longer when the numbers associated with the two subclasses were different (7.93 s) rather than the same (7.10 s), F(1, 54) = 12.12. However, this effect was not obtained with the No Load subjects, who spent approximately the same amount of time reading number-same (3.84 s) and number-different (3.78 s) statements. (The Condition × Numerosity interaction approached significance, F(2, 54) = 3.86, p < .05.)

Across Experiments 1 and 2, subjects seemed to adjust their study time according to the task demands. Since the statements remained available to the No Load subjects when the questions appeared, it was not necessary to spend more time studying, and presumably attempting to store, the statements containing the greater amount of information. The flexible way in which subjects adjusted their study time is highlighted by the condition differences, F(2, 54) = 31.90. No Load subjects spent an average of only 3.81 s reading the statements, compared to 7.91 and 10.82 s spent by Load and Detection subjects, respectively.

Choice Latencies

Analysis of variance. The Dimension × Blocks, F(3, 162) = 11.46, and Dimension × Question, F(2, 108) = 8.76, interactions were higher order to the main effects of dimension, F(1, 54) = 138.13, blocks, F(3, 162) = 48.45, and question, F(2, 108) = 21.34. The means relevant to all the significant effects appear in Table 5 and are consistent with comparable findings obtained with the college students in Experiment 1. Choice latencies decreased across blocks approximately in a logarithmic fashion, were considerably longer with color than with number questions, and exhibited a different pattern across question types with number and color questions.
TABLE 5
Choice Latency Means (s) and Standard Deviations Representing All Significant Effects in Experiment 2

Dimension × Blocks interaction and related main effects

                    Block 1         Block 2         Block 3         Block 4         Dimension means
Number              11.55 ( 7.14)    9.19 ( 6.46)    8.07 ( 5.50)    7.50 ( 6.66)    9.08 ( 6.65)
Color               19.64 (14.68)   13.52 ( 8.35)   12.15 ( 8.17)   11.75 (10.39)   14.27 (11.18)
Block means         15.59 (12.23)   11.35 ( 7.77)   10.11 ( 7.26)    9.63 ( 8.98)

Dimension × Question interaction and related main effects

                    Subclass vs     Minor subclass vs       Class           Dimension
                    subclass        superordinate class     inclusion       means
Number               7.49 ( 4.49)    9.86 ( 6.63)            9.88 ( 8.05)    9.08 ( 6.65)
Color               13.13 ( 9.72)   16.33 (12.19)           13.34 (11.22)   14.27 (11.18)
Question means      10.31 ( 8.08)   13.09 (10.33)           11.61 ( 9.91)

Note. Standard deviations appear in parentheses.
When number was the relevant dimension, subclass-subclass latencies were shorter than latencies associated with the other question types (see Table 5). When color was the relevant dimension, minor-subclass vs superordinate-class latencies were longer than the latencies associated with the other question types. Inconsistent with Shipley's (1979) semantic analysis, college students made the class-inclusion color judgment as rapidly as the subclass-subclass color judgment.

Regression model. The regression model (see Eq. (1)) again did a good job of accounting for the variability in individual subjects' choice latencies. The multiple regression coefficients were significant, R > .36, F(8, 39) > 2.19, p < .05, for 55 of the 60 subjects. The results obtained in the β-weight analyses complement those obtained in the analysis of variance of choice latencies: Subjects' latencies became shorter across training (sequence effect); longer latencies were associated with color than with number-relevant questions, although the differential was smaller in the No Load condition (dimension effect); and numerosity and reading latencies minimally affected choice latencies. As in the regression analysis in Experiment 1, the β weights associated with class-inclusion and minor-subclass vs superordinate-class questions suggest that latencies for these question types were longer than subclass-subclass latencies in the Load condition. Similar β weights characterized the performance of the subjects in the Detection condition, but not in the No Load condition. However, the No Load subjects did show markedly larger β weights for minor-subclass vs superordinate-class color questions (D × Q2).
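Eq. (1) is specified earlier in the paper and is not reproduced here, so the sketch below is only a hedged illustration of the single-subject analyses: an ordinary-least-squares fit of one subject's 48 choice latencies. The particular predictor set (sequence, dimension, numerosity, reading latency, and two question-type contrasts with their dimension interactions) is our reconstruction from the effects named in the text; with eight predictors and 48 problems it does, at least, reproduce the F(8, 39) degrees of freedom reported above.

```python
import numpy as np

def fit_subject(latency, sequence, dimension, numerosity, reading, q1, q2):
    """OLS fit of one subject's choice latencies (all inputs are length-48
    arrays). Returns the beta weights and the multiple correlation R."""
    X = np.column_stack([
        np.ones_like(latency),           # intercept
        sequence,                        # trial sequence (practice effect)
        dimension,                       # number (0) vs color (1)
        numerosity,                      # same (0) vs different (1)
        reading,                         # reading latency for the statement
        q1, q2,                          # class-inclusion and minor-subclass vs
                                         # superordinate contrasts (subclass-
                                         # subclass as the baseline)
        dimension * q1, dimension * q2,  # D x Q interaction terms
    ])
    beta, *_ = np.linalg.lstsq(X, latency, rcond=None)
    fitted = X @ beta
    r = np.corrcoef(fitted, latency)[0, 1]  # multiple correlation R
    return beta, r
```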
Choice Data

Analysis of variance. The condition means were correctly ordered by our limited working-memory hypothesis (1.72, 1.62, and 1.49 for the No Load, Load, and Detection subjects, respectively), F(2, 54) = 5.29. We expected choice performance to be stable across trials. Thus, the significant blocks effect was surprising (means of 1.54, 1.60, 1.65, and 1.65 for Blocks 1, 2, 3, and 4, respectively), F(3, 162) = 6.18. However, performance changes across blocks were small. The Dimension × Numerosity × Question interaction, F(2, 108) = 13.42, highlights the importance of the mathematical model in generating an explanation for a complex, unexpected data set. Inspection of the number-relevant means in Table 6 reveals the usual monotonic decline in performance across subclass-subclass, minor-subclass vs superordinate-class, and class-inclusion questions. The color-same data are particularly surprising. Not only did performance on class-inclusion questions exceed performance on minor-subclass vs superordinate-class questions, but it also exceeded performance on subclass-subclass questions.

TABLE 6
Means Representing the Dimension × Numerosity × Question Interaction Obtained in the Choice Analysis in Experiment 2

                    Subclass vs     Minor subclass vs       Class           Dimension
                    subclass        superordinate class     inclusion       means

Numerosity different
Number              1.83 (0.44)     1.69 (0.54)             1.55 (0.60)     1.69 (0.54)
Color               1.65 (0.70)     1.45 (0.85)             1.54 (0.58)     1.55 (0.72)
Question means      1.74 (0.60)     1.57 (0.72)             1.55 (0.59)

Numerosity same
Number              1.95 (0.29)     1.81 (0.49)             1.51 (0.53)     1.76 (0.49)
Color               1.50 (0.84)     1.25 (0.92)             1.60 (0.58)     1.45 (0.81)
Question means      1.73 (0.66)     1.53 (0.79)             1.55 (0.56)

Note. Standard deviations appear in parentheses.
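Because each cell in Table 6 is based on the same number of problems, the question and dimension means are simply unweighted averages of the cell means. A minimal sketch using the numerosity-different block (values from the table; the dictionary names are ours):

```python
# Cell means from the numerosity-different block of Table 6; the margins
# reported in the table are unweighted averages of these cells.
number = {"ss": 1.83, "ms": 1.69, "ci": 1.55}
color  = {"ss": 1.65, "ms": 1.45, "ci": 1.54}

question_means = {q: (number[q] + color[q]) / 2 for q in number}
# {'ss': 1.74, 'ms': 1.57, 'ci': 1.545}  -> 1.74, 1.57, 1.55 as tabled

dimension_means = {
    "number": sum(number.values()) / 3,  # 1.69
    "color":  sum(color.values()) / 3,   # 1.55 (1.547 before rounding)
}
```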
Although different aspects of the Dimension × Numerosity × Question interaction are emphasized here, the interpretation of performance on color-same problems is the same as that offered in Experiment 1. Consider the parameter estimates presented in Table 8. Note that, with the exception of the e parameter, no obvious pattern of parameter differences is correlated with number-relevant and color-relevant questions. Subjects were more likely to erroneously encode the dimensional cues as same or different (e parameter) with color-based than with number-based questions, particularly so following number-same statements (i.e., for color-same questions). Averaging the e values associated with color-same questions over the three treatment groups suggests that subjects made encoding errors (i.e., they encoded the color cues as same rather than different) on 27% of these questions. These encoding errors are associated with wrong "same-color" answers on both subclass-subclass and minor-subclass vs superordinate-class questions, but correct same-color answers on class-inclusion questions.

Choice model. As can be seen in Table 7 (Necessity Test column), the simpler two-parameter model is rejected in all instances except the number-same and number-different problems with the No Load condition. The estimated value of e and d (the memory parameters) in these cases was .99, which does not differ reliably from 1. The mathematical model was sufficient to account for all of the data except that involving color-same problems in the Load condition, p < .01, but not at .001. Note, however, that as in Experiment 1, the magnitude of the corresponding necessity test is substantially greater than that of the significant sufficiency test, suggesting that the model is accounting for a substantial proportion of the variance. (Also, the summed residual squared deviations generated by the nested fit were < .03.)

TABLE 7
Goodness-of-Fit Tests for the Choice Model in Experiment 2

                      Necessity test          Sufficiency test
No Load
  Number different    χ²(2) = 0.11            χ²(6) = 15.96^a
  Number same^b       -                       χ²(2) = 1.64
  Color different     χ²(2) = 114.67**        χ²(2) = 4.85
  Color same          χ²(2) = 156.18**        χ²(2) = 6.93
Load
  Number different    χ²(2) = 74.54**         χ²(4) = 12.59
  Number same         χ²(1) = 8.64*           χ²(1) = 2.77
  Color different     χ²(2) = 56.31**         χ²(2) = 7.76
  Color same          χ²(2) = 92.72**         χ²(2) = 9.78*
Detection
  Number different    χ²(2) = 170.64**        χ²(4) = 12.03
  Number same         χ²(1) = 15.01*          χ²(1) = 1.17
  Color different     χ²(2) = 224.79**        χ²(2) = 4.95
  Color same          χ²(2) = 216.74**        χ²(2) = 3.02

^a This test is based on a two-parameter model.
^b Parameter values were obtained using a nested fit.
* p < .01.
** p < .001.
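The necessity and sufficiency tests in Table 7 are nested likelihood-ratio comparisons. The sketch below shows only that logic; the function names are ours, the fitted G² values stand in for the maximum-likelihood fits (or, for some cells, iterative fits) actually used, and the two-parameter model is the restriction in which the memory parameters e and d are fixed at 1.

```python
# Schematic of the nested likelihood-ratio logic behind Table 7.
# observed: response counts over (correct, E1, E2) for each question type;
# expected: the counts implied by a fitted model.
from math import log

def g_squared(observed, expected):
    """Likelihood-ratio misfit statistic, G^2 = 2 * sum O * ln(O / E)."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

def necessity_test(g2_two_parameter, g2_full, extra_params):
    """Does the full model beat the two-parameter restriction (e = d = 1)?
    A significant difference rejects the simpler model, as in the Necessity
    column; df = number of extra free parameters."""
    return g2_two_parameter - g2_full, extra_params

def sufficiency_test(g2_full, n_cells, n_params):
    """Is the full model's residual misfit within chance limits?
    df = independent response cells minus free parameters."""
    return g2_full, n_cells - n_params
```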
The same three-step sequence of likelihood-ratio tests used in Experiment 1 was employed to determine if the parameter estimates differed across conditions for the four question types determined by dimension and numerosity. All groupwise tests were significant, χ²(5, 6, or 8) > 30, p < .001. None of the conditionwise follow-up tests between the Load and the Detection groups was significant, although the color-different test approached significance, χ²(4) = 12.63, p < .05. All conditionwise tests comparing the No Load group with either the Load or the Detection groups were significant, χ²(3 or 4) > 32, p < .001. The parameter estimates obtained with the Load and Detection groups that were significantly different from comparable estimates obtained with the No Load group are starred in Table 8.
TABLE 8
Estimated Parameter Values for the Choice Model in Experiment 2

                                    Parameters
                      e       d       u       s       i
No Load
  Number different    .99     .99     .67     .15     .18
  Number same^a       .99     -       .73     .23     .04
  Color different     .82     .96     .71     .25     .03
  Color same          .71     .94     .79     .21     .00
Load
  Number different    .99     .80*    .39*    .52*    .10*
  Number same         .98     -       .44*    .51*    .06
  Color different     .92*    .90     .43*    .39*    .18*
  Color same          .78     .96     .39*    .42*    .19*
Detection
  Number different    .95     .71*    .31*    .59*    .10*
  Number same         .93     -       .34*    .66*    .00
  Color different     .76     .89     .26*    .73*    .01
  Color same          .65*    .93     .31*    .45*    .24*

Note. The encoding parameters are e (correctly encoding the cues on the relevant dimension as same or different) and d (associating the correct cue value with each subclass if the relevant dimension is accurately encoded as different), while the interpretation parameters are u (understanding), s (subclass-subclass), and i (idiosyncratic).
^a The parameter estimates in this row were obtained using an iterative fitting procedure. All of the other parameters are maximum likelihood estimates.
* The parameter value is significantly different from comparable parameter estimates in the No Load condition, χ²(1)s > 6.64, p < .01.
Recall that the hypothesis of interest in Experiment 2 was that as memory load increases, so does the likelihood of subclass-subclass interpretations of questions involving the comparison of a subclass with the superordinate class. Inspection of Table 8 reveals a pattern of parameter findings consistent with this hypothesis. Even though memory load was varied, only 4 of the 14 possible memory-parameter comparisons between the No Load group and either the Load or the Detection group were significant. Furthermore, one of these significant effects (e parameter, color-different problem) reflected more accurate same-different encoding of the color dimension in the Load than in the No Load group. The only memory results consistent with the memory manipulation involved the d parameter (an estimate of the accuracy with which subjects associated the correct cue value with each subclass) with number-different questions. The parameter value for No Load subjects exceeded that for either Load or Detection subjects.

In contrast to the small and inconsistent effects of memory load on the memory parameters, the effects of memory load on the interpretation parameters were large and consistent. All comparisons involving the u and s parameters between the No Load group and either the Load or Detection group were significant. Furthermore, the values of the u and s parameters were monotonically related to memory load for each of the four problem types. These findings are inconsistent with the Brainerd and Kingma (1985) finding that memory and reasoning are independent processes in the class-inclusion task. Instead, the quality of class-inclusion reasoning depended directly on the memory load. It would appear that the discrepant conclusions resulted from either (1) the sensitivity of the mathematical model compared to simple conditional-probability analysis or (2) Brainerd and Kingma's use of a different, rather than the same, question form to assess memory and reasoning.
CONCLUSIONS

A number of methodological innovations were introduced in Experiments 1 and 2. These include computer presentation, statements containing color as well as number information, and nonstandard color variants of the class-inclusion question. It is possible that some feature(s) of these innovations force subjects to use processes (such as reasoning about empirical information rather than perceiving logical necessity) that would be unnecessary in the standard class-inclusion task. This possibility certainly merits further empirical scrutiny. However, the consistency of the findings across the Pilot Experiment (in which standard class-inclusion methodology was used) and Experiments 1 and 2 leads us to argue that the processes identified in this paper generalize across a wide age range and many variants of the class-inclusion question.

Subskills and Methodology

Using a computer to present the materials and gather the data represents a substantial improvement over conventional class-inclusion methodology. Not only was it possible to control and randomize the presentation of statements, questions, and choices, but it was also possible to measure reading and choice latencies. In Experiments 1 and 2, these data enhanced our understanding of subskills important in class inclusion. The flexible study skills of college students were reflected in the reading-latency differentials associated with condition and numerosity in Experiment 2. It is important, for both applied and theoretical reasons, to study and understand the development of study skills that are basic to many cognitive tasks.

Working Memory and the Reasoning/Remembering Relationship

Norman and Shallice (1986) sketched a working-memory model in which the supervisory attentional mechanism is the limited resource. According to Norman and Shallice, deliberate attentional resources are employed in tasks that involve planning or decision making, trouble shooting, ill-learned or novel sequences, dangerous or technically difficult components, and the inhibition of strong habitual responses. Because subjects are rarely presented with questions involving the comparison of a superordinate class and a subclass, it appears that all the attention-demanding task characteristics outlined by Norman and Shallice, with the possible exception of trouble shooting, characterize class inclusion. Therefore, based on their proposal, it would be expected that adding a
memory retrieval task (which is assumed to be a strategic, attention-demanding task) to tasks already involved in class inclusion would further spread the limited attentional resource. If one also assumes that subjects give priority to accurate memory retrieval, then the attentional resources spent on this task are not available for reasoning, and the obtained poorer reasoning performance would be expected.

Developmental implications. The Norman and Shallice theory is inherently developmental. This follows from their assumptions that if task performance becomes more automatic, then less attentional supervision is required, and that automatization occurs through learning and practice. One of the obvious consequences of aging is that it provides many occasions for learning and practice. The subskills necessary to answer linguistically presented class-inclusion questions successfully are probably all available by age seven or eight. It is also likely that subjects' competence on these tasks (i.e., the degree to which the tasks become automatic) improves with practice. Therefore, on the basis of these assumptions and the Norman and Shallice model, it is predicted that class-inclusion skills should improve across a considerable age span rather than appear abruptly in 5- to 7-year-olds. Furthermore, it may be the case that if a simple enough version of the class-inclusion task is constructed (i.e., one that requires the mastery of a limited set of subskills available to the very young child), infants and young children might not appear to be deficient in class-inclusion reasoning.

It appears to us that the development of class-inclusion reasoning can be conceived within a working-memory model in which the limited resource is willed attention. We noted earlier that it is difficult to operationalize the working-memory/limited-resource constructs that dominate the current literature (also see Salthouse, 1988), but if a formal measurement technique, such as the one provided here, is available to diagnose subprocesses, then it may be possible to delineate the construct. Only after working memory has been changed from a heuristic device to an operational construct can its explanatory value be assessed. Given the prominence of working-memory explanations in cognitive-developmental analysis (e.g., Bjorklund, 1987; Brainerd & Kingma, 1984, 1985; Case, 1985), it would appear that theoretical progress is contingent on the development of formal methods.

APPENDIX A: THE CHOICE MODEL EQUATIONS

Subclass-subclass questions. The probability of correctly answering a subclass-subclass question, if the cues associated with each subclass were different, is equal to the probability of correctly encoding the cue values as different multiplied by the conditional probability of correctly associating cue values and subclasses,

P(S_nd,c) = [ed].                                              (4,nd,c)

An E1 subclass-subclass error would accurately reflect labeling the cues as different, but reversing the association of cues and subclasses,

P(E1_nd,c) = [e(1 - d)].                                       (5,nd,c)

An E2 subclass-subclass error would reflect labeling the cues as same rather than different,

P(E2_nd,c) = [(1 - e)].                                        (6,nd,c)

If the same cue was associated with each subclass, then the probability of correctly answering a subclass-subclass question is

P(S_ns) = [e].                                                 (7,ns)

In this case, E1 and E2 errors would be indistinguishable, and both would result from encoding the cues as different,

P(E1_ns) = P(E2_ns) = [(1 - e)/2].                             (8,ns)

Minor-subclass vs superordinate-class questions. Minor-subclass vs superordinate-class number-equal questions cannot be constructed. A minor-subclass vs superordinate-class number-different or color question can be answered correctly in a number of ways. If the subclasses are appropriately encoded as different and the cue values are remembered, then subclass-subclass and correct interpretations always result in correct answers. The logic following either type of encoding error is different for number-different and color questions. If a subject understands class inclusion, then, as long as neither of the subclasses is empty, the relative numerosity of subclasses is irrelevant. The superordinate class is always larger than either subclass, and the minor-subclass vs superordinate-class number-based question will always be answered correctly. On the other hand, if cue values are reversed or treated as the same with the minor-subclass vs superordinate-class color-based question, then the subject who understands class inclusion will always respond same color and make an E2 error. For example, if cats are erroneously encoded as brown rather than white, then the brownest animal is the same color as the brownest cat. For all questions involving superordinate-class comparisons with a subclass, idiosyncratic interpretations generate correct answers one-third of the time, E1 errors one-third of the time, and E2 errors one-third of the time, independent of the way information was encoded. Thus, "i/3" multiplies the probabilities associated with each of the possible encodings in all of the equations that follow:

P(S_nd) = [ed(u + s + i/3) + e(1 - d)(u + i/3) + (1 - e)(u + i/3)],    (9,nd)
P(S_c) = [ed(u + s + i/3) + e(1 - d)i/3 + (1 - e)i/3].                 (10,c)

Note that in each of the equations that appear below, as well as in Eqs. (9) and (10), the terms are organized so that the term reflecting correct encoding (ed for the number-different and color problems, e for the number-same problems) appears first, while the term reflecting reversed same-different encoding (1 - e for all problems) appears last. In the minor-subclass vs superordinate-class number-different and color problems, E1 errors occur following subclass-subclass interpretations if the cue values associated with each subclass are reversed,

P(E1_nd,c) = [edi/3 + e(1 - d)(s + i/3) + (1 - e)i/3].                 (11,nd,c)

In the minor-subclass vs superordinate-class number-different and color problems, E2 errors (i.e., same number or same color choices) occur following same encoding and subclass-subclass interpretations. In addition, as explained earlier, E2 errors will occur in color problems following any encoding error and a correct class-inclusion interpretation,

P(E2_nd) = [edi/3 + e(1 - d)i/3 + (1 - e)(s + i/3)],                   (12,nd)
P(E2_c) = [edi/3 + e(1 - d)(u + i/3) + (1 - e)(u + s + i/3)].          (13,c)

Class-inclusion questions. For all problem types, correct encoding of the cues followed by the correct interpretation of the class-inclusion question results in a correct answer. Since the number of items is irrelevant, as long as neither subclass is empty, in number-based class-inclusion problems, understanding class inclusion always results in correct responding in number-different and number-same problems. In number-different problems, subclass-subclass interpretations of the class-inclusion question result in correct answers if the cues are correctly encoded as different and associated with the wrong subclasses. In the number-same problems, subclass-subclass interpretations of the class-inclusion question result in correct answers half the time if the cues erroneously are encoded as different, because it is assumed that each of the two cues is equally likely to be encoded as the more numerous. In the color problems, either correct or subclass-subclass interpretations of the class-inclusion question result in correct answers (i.e., same color) following encoding of the cues as the same. Therefore,

P(S_nd) = [ed(u + i/3) + e(1 - d)(u + s + i/3) + (1 - e)(u + i/3)],    (14,nd)
P(S_c) = [ed(u + i/3) + e(1 - d)i/3 + (1 - e)(u + s + i/3)],           (15,c)
P(S_ns) = [e(u + i/3) + (1 - e)(u + s/2 + i/3)].                       (16,ns)

The equation for E1 errors is identical for number-different and color problems with class-inclusion questions. For all problem types, if the cues are correctly encoded, E1 errors follow subclass-subclass interpretations,

P(E1_nd,c) = [ed(s + i/3) + e(1 - d)i/3 + (1 - e)i/3],                 (17,nd,c)
P(E1_ns) = [e(s + i/3) + (1 - e)i/3].                                  (18,ns)

For number-different problems with class-inclusion questions, E2 errors (i.e., same number) result from encoding the cues as the same and subclass-subclass interpretations. For color problems, E2 errors (i.e., "class y-er") follow correctly encoding the cues as different, reversing the cues associated with the two subclasses, and either correct or subclass-subclass interpretations of the class-inclusion question. Note that the E2 class-inclusion color error differs from all other types of E2 errors in that it cannot follow same-different encoding errors (i.e., same-color encoding, 1 - e) unless the encoding error is associated with an idiosyncratic interpretation (i/3). For number-same problems, E2 errors (i.e., "subclass larger") occur half the time following erroneously encoding the cues as different and subclass-subclass interpretations. Therefore,

P(E2_nd) = [edi/3 + e(1 - d)i/3 + (1 - e)(s + i/3)],                   (19,nd)
P(E2_c) = [edi/3 + e(1 - d)(u + s + i/3) + (1 - e)i/3],                (20,c)
P(E2_ns) = [ei/3 + (1 - e)(s/2 + i/3)].                                (21,ns)
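To make the bookkeeping in Eqs. (4)-(21) concrete, the following is a minimal sketch that transcribes the equations directly. The function and variable names are ours, and the only added assumption is the constraint, implicit in the model, that u + s + i = 1; under it, the three response probabilities for each question and problem type sum to 1.

```python
# Direct transcription of the choice-model equations (Eqs. 4-21).
# e, d are encoding parameters; u, s, i are interpretation parameters
# (understanding, subclass-subclass, idiosyncratic); see the note to Table 8.
# Each function returns (P(correct), P(E1), P(E2)) for one problem type:
# "nd" = number different, "ns" = number same, "c" = color.

def subclass_subclass(e, d, problem):
    if problem in ("nd", "c"):
        return (e * d,          # (4,nd,c) correct
                e * (1 - d),    # (5,nd,c) E1: cues reversed
                1 - e)          # (6,nd,c) E2: cues encoded as same
    return (e,                  # (7,ns)
            (1 - e) / 2,        # (8,ns) E1
            (1 - e) / 2)        # (8,ns) E2

def minor_vs_superordinate(e, d, u, s, i, problem):
    # Number-equal versions of this question cannot be constructed.
    e1 = e*d*(i/3) + e*(1 - d)*(s + i/3) + (1 - e)*(i/3)                 # (11,nd,c)
    if problem == "nd":
        p = e*d*(u + s + i/3) + e*(1 - d)*(u + i/3) + (1 - e)*(u + i/3)  # (9,nd)
        e2 = e*d*(i/3) + e*(1 - d)*(i/3) + (1 - e)*(s + i/3)             # (12,nd)
    else:  # color
        p = e*d*(u + s + i/3) + e*(1 - d)*(i/3) + (1 - e)*(i/3)          # (10,c)
        e2 = e*d*(i/3) + e*(1 - d)*(u + i/3) + (1 - e)*(u + s + i/3)     # (13,c)
    return p, e1, e2

def class_inclusion(e, d, u, s, i, problem):
    if problem == "nd":
        p = e*d*(u + i/3) + e*(1 - d)*(u + s + i/3) + (1 - e)*(u + i/3)  # (14,nd)
        e1 = e*d*(s + i/3) + e*(1 - d)*(i/3) + (1 - e)*(i/3)             # (17,nd,c)
        e2 = e*d*(i/3) + e*(1 - d)*(i/3) + (1 - e)*(s + i/3)             # (19,nd)
    elif problem == "c":
        p = e*d*(u + i/3) + e*(1 - d)*(i/3) + (1 - e)*(u + s + i/3)      # (15,c)
        e1 = e*d*(s + i/3) + e*(1 - d)*(i/3) + (1 - e)*(i/3)             # (17,nd,c)
        e2 = e*d*(i/3) + e*(1 - d)*(u + s + i/3) + (1 - e)*(i/3)         # (20,c)
    else:  # number same: d plays no role
        p = e*(u + i/3) + (1 - e)*(u + s/2 + i/3)                        # (16,ns)
        e1 = e*(s + i/3) + (1 - e)*(i/3)                                 # (18,ns)
        e2 = e*(i/3) + (1 - e)*(s/2 + i/3)                               # (21,ns)
    return p, e1, e2

# Sanity check: with u + s + i = 1, each probability triple sums to 1.
if __name__ == "__main__":
    e, d, u, s, i = 0.9, 0.8, 0.6, 0.3, 0.1
    for prob in ("nd", "c", "ns"):
        assert abs(sum(subclass_subclass(e, d, prob)) - 1) < 1e-12
        assert abs(sum(class_inclusion(e, d, u, s, i, prob)) - 1) < 1e-12
    for prob in ("nd", "c"):
        assert abs(sum(minor_vs_superordinate(e, d, u, s, i, prob)) - 1) < 1e-12
```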
REFERENCES
Baddeley, A. (1986). Working memory. Oxford: Oxford Univ. Press.
Bjorklund, D. F. (1987). How age changes in knowledge base contribute to the development of children's memory: An interpretive review. Developmental Review, 7, 93-130.
Brainerd, C. J., Howe, M. L., & Kingma, J. (1982). An identifiable model of two-stage learning. Journal of Mathematical Psychology, 26, 263-293.
Brainerd, C. J., & Kingma, J. (1984). Do children have to remember to reason? A fuzzy-trace theory of transitivity development. Developmental Review, 4, 311-377.
Brainerd, C. J., & Kingma, J. (1985). On the independence of short-term memory and working memory in cognitive development. Cognitive Psychology, 17, 210-247.
Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic Press.
Denney, N. W., & Cornelius, S. W. (1975). Class inclusion and multiple classification in middle and old age. Developmental Psychology, 11, 521-522.
Hirst, W., & Kalmar, D. (1987). Characterizing attentional resources. Journal of Experimental Psychology: General, 116, 68-81.
Hodkin, B. (1987). Performance analysis in class inclusion: An illustration with two language conditions. Developmental Psychology, 23, 683-689.
Howe, M. L., & Rabinowitz, F. M. (1989). On the uninterpretability of dual-task performance. Journal of Experimental Child Psychology, 47, 32-38.
Inhelder, B., & Piaget, J. (1964). The early growth of logic in the child. New York: Norton.
Lawrence, J. A. (1980). Class inclusion: Question order, question type, and training. Unpublished senior honors thesis, Memorial University of Newfoundland.
Markman, E. M. (1978). Empirical versus logical solutions to part-whole comparison problems concerning classes and collections. Child Development, 49, 168-177.
Navon, D. (1984). Resources: A theoretical soup-stone? Psychological Review, 91, 216-234.
Navon, D. (1985). Attention division or attention sharing? In M. I. Posner & O. S. Marin (Eds.), Attention and performance XI (pp. 133-146). Hillsdale, NJ: Erlbaum.
Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness and self-regulation: Advances in research (Vol. 4, pp. 1-18). New York: Plenum Press.
Piaget, J. (1952). The child's conception of number. New York: Norton.
Rabinowitz, F. M., Grant, M. J., & Dingley, H. L. (1984). Fit: An iterative parameter-estimation function in LISP. Behavior Research Methods, Instruments, and Computers, 16, 307-314.
Salthouse, T. A. (1988). The role of processing resources in cognitive aging. In M. L. Howe & C. J. Brainerd (Eds.), Cognitive development in adulthood: Progress in cognitive development research (pp. 185-239). New York: Springer-Verlag.
Shipley, E. F. (1979). The class-inclusion task: Question form and distributive comparisons. Journal of Psycholinguistic Research, 8, 301-331.
Siddall, J. N., & Bonham, D. J. (1974). Optimization subroutine package. Department of Mechanical Engineering, McMaster University.
Sinnott, J. D. (1975). Everyday thinking and Piagetian operativity in adults. Human Development, 18, 430-443.
Theios, J., Leonard, D. W., & Brelsford, J. W. (1977). Hierarchies of learning models that permit likelihood ratio comparisons. Journal of Experimental Psychology: General, 106, 213-225.
Trabasso, T., Isen, A. M., Dolecki, P., McLanahan, A. G., Riley, C. A., & Tucker, T. (1978). How do children solve class-inclusion problems? In R. S. Siegler (Ed.), Children's thinking: What develops? (pp. 151-180). Hillsdale, NJ: Erlbaum.
Winer, G. A. (1980). Class-inclusion reasoning in children: A review of the empirical literature. Child Development, 51, 309-328.

RECEIVED: May 23, 1988; REVISED: April 20, 1989.