Personality and Individual Differences, Vol. 16, No. 3, pp. 433-445, 1994
Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0191-8869/94 $6.00 + 0.00
Pergamon

ERROR ANALYSIS OF RAVEN TEST PERFORMANCE

LINDA B. L. VODEGEL MATZEN, MAURITS W. VAN DER MOLEN and AD C. M. DUDINK

Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands
(Received 14 April 1993)

Summary: Two studies were performed concerned with error analyses of matrix analogy problems. In the first study, information concealed in the incorrect response alternatives of the Standard Progressive Matrices (SPM) was used to find out what kinds of errors are committed when children (n = 1655, age range 8.5-12.5 years) make incorrect response choices. The error analysis of the SPM showed that omitting solution rules is a major cause of incorrect responses and that post-hoc error classification of the alternatives was problematic. In the second study, Experimental Progressive Matrices (EPM) were constructed, based on five solution rules, with an a priori notion regarding variation in rule complexity. Response alternatives were constructed so that the number and kinds of rules omitted could be deduced from incorrect choices. Children (n = 200, age range 8.5-12.5 years) completed the paper and pencil versions of the SPM and EPM. Both tests yielded the same test scores. Errors were most often due to omitting one rule. Lower scoring children were particularly apt to omit complex rules. Further development of the EPM is needed to obtain a Raven-like test which can give insight into how a child acquires a particular test score.
Raven Progressive Matrices (RPM) is a test which provides a measure of the ability to educe relationships and correlates (Raven, Court & Raven, 1986). The test is used all over the world for both research and diagnostic purposes, and is regarded as a reliable measure of g (e.g. Carpenter, Just & Shell, 1990; Marshalek, Lohman & Snow, 1983). The items of the RPM are all analogy problems which can have any of three different formats: an incomplete pattern, a 2 x 2 matrix, or a 3 x 3 matrix. With the first format the idea is to infer the appearance of the missing piece in the pattern by continuing the perceived pattern into the blank area. With the latter two formats the matrix consists of 4 or 9 entries, arranged in 2 rows and 2 columns or 3 rows and 3 columns. The bottom right-hand entry is missing, and its appearance should be inferred from the relationships that exist row- and column-wise between the remaining entries. Regardless of the format, each problem has a set of response alternatives. One of the alternatives depicted below the problem represents the missing entry; the others are distracters.

The RPM was not constructed with the a priori notion of providing information as to how a particular test score is accumulated. Only the correct choices are taken into account, and no information is extracted from the incorrect choices. However, analysis of the errors made on a test provides the means of utilizing the test performance of a S as a measure of the (mis)understandings individual Ss possess concerning analogy problems (Siegel, 1983). Examples of post-hoc attempts to obtain information from the incorrect choices made by Ss can be found in both the Advanced Progressive Matrices (APM) and the Coloured Progressive Matrices (CPM) sections of the RPM manual (Raven, Court & Raven, 1983, 1984). Both analyses are concerned with the identification of different error types as they are represented in the distracters. Results indicate that incomplete solutions (i.e.
choosing a distractor representing a partly correct solution) and repetition errors (i.e. choosing a distractor which represents an entry adjacent to the bottom right-hand entry) occur most often. Examples of post-hoc error analyses of the RPM outside the manual are few and far between (e.g. Jacobs & Vandeventer, 1970; Thissen, 1976; Vejleskov, 1968; Watts, 1985). The Standard Progressive Matrices (SPM) section of the manual (Raven, Court & Raven, 1988) does not contain an error analysis as presented for the CPM and APM. The first aim of the present study is therefore a replication of such an error analysis with the SPM. Answers to the following questions will be sought: (i) are children inclined to make the same types of errors on the SPM as adults are on the APM? and (ii) are lower scoring children liable to commit specific types of errors more often than higher scoring children? Unfortunately neither the manual section of the
CPM nor the one of the APM provides information as to how problems arising from post-hoc coding were tackled, i.e. distracters fitting into more than one error type category, or uneven representation of the different error types over the items. Nor are any data available on inter-rater reliability. These problems will therefore also be addressed in the first study presented, in order to determine whether they can be solved satisfactorily. The second aim will be to perform an error analysis purged of all the above mentioned problems associated with the post-hoc nature of the analysis, and to perform a more extensive examination as to what constitutes particular error types. For this purpose a second study is reported on the performance of children on a set of new matrix items. The distracters of the new items are labelled a priori as to the errors they represent.

STUDY 1: ERROR ANALYSIS OF THE SPM
Method

Subjects

Children (n = 1655) in the age range 8.5-12.5 years (M = 10.5) participated with the consent of their caregivers and teachers. All children spoke Dutch and attended schools in and around major cities in The Netherlands.

Error analysis

The distracters of the SPM items were coded as to the error types they represented, according to the same categories as defined in the APM section of the manual. A distractor was coded as an incomplete correlate error if part of the relevant features of the correct solution were present in the distractor. A distractor was coded as a wrong principle error if it was entirely made up of features represented in the problem matrix, but no parts of the correct solution were present in the distractor. The third APM error type, confluency of ideas, turned out to be a sub-category of the first error type, incomplete correlate. A combination of all features represented in the 8 entries of the matrix problem, as is the definition of the confluency of ideas error, implies that the features of the correct solution will also be included. Consequently confluency of ideas was not included in coding the SPM distracters. The fourth error type, repetition, was coded when the distractor chosen was an exact copy of 1 of the 3 adjacent entries of the missing entry. Among the distracters of the SPM some were found which could not be classified according to the set of 4 error types defined for the APM. They had in common that they harboured features not to be found in the problem matrix. Because of this, a fifth error type was introduced: additional elements. The distracters which suited more than one error category were allotted to the first encountered fitting category. Incomplete correlate errors are proof of an attempt in the right direction, thereby being closest to the correct solution. Wrong principle errors are the result of an attempt to distinguish a relationship between the matrix entries, albeit an incorrect one. Thus,
both these error types reflect an attempt to take all the information available in the 8 entries of the matrix into account. Repetitions and additional elements, however, both seem to lack any attempt at finding the relationships between all 8 matrix entries. Repetition errors are slightly better than additional element errors, because the former are in keeping with the original features of the matrix problem, whereas the latter contain unidentifiable elements. Thus, by allotting multi-interpretable distracters to the first encountered fitting category, Ss choosing a particular distractor are given the benefit of the doubt as to the extent of their exhibited deficient problem solving behaviour.

Items of sets A and B of the SPM were not included in the analyses, because of an incongruous format. The 36 items of sets C, D, and E of the SPM are all of the same nature as those of the APM: a 3 x 3 matrix analogy problem, accompanied by 8 response alternatives. Besides, the particular age group involved in this study was expected to have a low error rate on sets A and B (Table SPM IV, p. SPM25, Raven et al., 1988). The purpose of the study, an analysis of incorrect responses, made sets A and B unsuitable.

Two raters worked at classifying the incorrect SPM alternatives according to error type. Agreement between both raters was 71.53%. This is an indication that the error categories were not as exclusive as was intended, and that indeed the post-hoc categorization is problematic. In
those cases where both raters disagreed with one another, the categorization of the first rater was decisive.

The 4 different error types were unevenly represented over the 36 items. The incomplete correlate error was found to be represented in 32 out of 36 items, the wrong principle error in 33, the repetition error in 21, and the additional elements error in 18. This necessitated a correction of the incidence of each error type for each individual child. Correct alternatives chosen by each child and the incidence of the 4 error types were noted. Each of the 4 error type incidences was subsequently multiplied by 36/x (x being the number of items in which the particular error type was represented), to obtain a corrected incidence for 36 items. The 4 corrected error type incidences and the correct alternative choice incidence were added up to obtain a total choice incidence. Finally the relative choice incidence of each error type and correct alternative choice was calculated by dividing each of the 4 corrected error type incidences and the correct alternative choice incidence by the total choice incidence and multiplying it by 100.

Data collection

The data were collected over the course of 3 years in aid of several different research projects, using the Raven SPM as a measure of general cognitive ability. The children completed the paper and pencil SPM in a classroom setting. A test score was computed for each child over all 60 test items. Subsequently children were classified into one of three possible groups of intellectual ability, making use of the 1979 smoothed summary norms (Table SPM IV, p. SPM25, Raven et al., 1988): above average (a test score at or above the 75th percentile point, n = 644), average (a test score between the 75th and 25th percentile points, n = 695) and below average (a test score at or below the 25th percentile point, n = 316). All effects reported are significant with P < 0.01, unless reported otherwise.
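A minimal Python sketch may make the incidence correction concrete. The per-child counts below are invented for illustration; only the 36/x rescaling and the percentage step are taken from the procedure described above.

```python
# Corrected and relative choice incidences, as described in the text:
# raw error counts are rescaled by 36/x (x = number of items in which the
# error type is represented) and then expressed as percentages of the total.

N_ITEMS = 36

# Number of items in which each error type is represented (from the text).
ITEMS_WITH_TYPE = {
    "incomplete correlate": 32,
    "wrong principle": 33,
    "repetition": 21,
    "additional elements": 18,
}

def relative_incidences(error_counts, correct_count):
    """Per-child relative choice incidences, in percent.

    error_counts : raw number of choices per error type (hypothetical data)
    correct_count: number of correct alternative choices
    """
    corrected = {
        err: count * N_ITEMS / ITEMS_WITH_TYPE[err]
        for err, count in error_counts.items()
    }
    total = sum(corrected.values()) + correct_count
    relative = {err: 100 * c / total for err, c in corrected.items()}
    relative["correct"] = 100 * correct_count / total
    return relative

# Invented counts for one child:
rel = relative_incidences(
    {"incomplete correlate": 8, "wrong principle": 5,
     "repetition": 2, "additional elements": 1},
    correct_count=20,
)
```

By construction the five percentages sum to 100, so children who encountered the error types in different numbers of items remain comparable.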
Results

The error analysis of the SPM was performed in order to answer two questions: (i) how often are the various types of errors committed? and (ii) do children with different analogy problem solving abilities commit different types of errors? Before addressing these questions it should be noted that the error rate of the SPM is influenced by age as well as individual differences in ability level. Some of the analyses reported below were therefore performed with age as covariate.

As expected, the correct alternative choice incidence in sets C, D and E was indeed highest for the above average ability level group (M = 63.92%, SD = 9.88), followed by the average level (M = 44.72%, SD = 8.92), followed by the below average level (M = 24.18%, SD = 12.93). A linear contrast analysis was defined for ability level with an ANCOVA on the proportion of correct alternative choices, with ability level (3) as the between-S factor, and age as covariate. This analysis yielded the anticipated, significant main effect for ability level, F(2, 1649) = 35.96, with a linear decrease of the proportion of correct alternative choices with decreasing ability level.

The purpose of the present study was to find out whether the SPM error analysis would yield the same result as previous error analyses of the APM, i.e. an overall high incidence of the incomplete correlate error and a high incidence of the repetition error for lower scoring children. If so, a main effect for error type and an interaction effect between error type and ability level are to be expected when performing an ANCOVA with ability level (3) as the between-S factor, error type (4) as the within-S factor, age as covariate, and relative choice incidence as dependent variable. The ANCOVA yielded a significant main effect for error type, F(3, 4947) = 25.36.
Post-hoc pairwise comparison indicated that the incomplete correlate error occurs most often (M = 20.94%, SD = 8.65), followed by the wrong principle error (M = 16.72%, SD = 7.75). The repetition error (M = 6.92%, SD = 8.62) and the additional elements error (M = 7.15%, SD = 6.27) occurred equally often and least of all types of errors. The interaction effect between error type and ability level was also established, F(6, 4947) = 25.22. In the barchart depicted in Fig. 1 it can be seen that the origin of this interaction is to be found with the incomplete correlate and the repetition error. To confirm this suggestion a linear contrast for ability level was defined in 4 separate ANCOVAs for each of the 4 error types. All linear contrasts were significant, F(1, 1649) = 1495.11, F(1, 1649) = 126.07, F(1, 1649) = 738.01, and F(1, 1649) = 220.43, for respectively the incomplete correlate, the wrong principle, the repetition, and the additional elements error. Although the
incidence of all error types increased with the decrease in ability level, the F values for the linear contrasts of the incomplete correlate and the repetition error are greatest. This indicates that the incidence of both these error types increases most markedly compared to the other 2 error types (wrong principle and additional elements).

Conclusions

Present results show that children commit the incomplete correlate error most often, followed by the wrong principle error, the repetition error and the additional elements error. The incomplete correlate error and the repetition error were found to discriminate best between children of different ability levels. These results coincide largely with the results of the analysis presented in the APM section of the manual; i.e. the incomplete correlate error occurs most often, and the repetition error discriminates well. It therefore seems plausible to suggest that adults and children show the same pattern in types of errors committed.

A task analysis of the Raven APM performed by Carpenter et al. (1990) yielded similar results regarding partially correct solutions. Five solution rules were identified with which the greater part of the APM problems could be solved. Analyses of eye movements and verbal protocols of college students (representing the upper end of the distribution of Raven scores), engaged in solving the APM, showed that rules were educed one at a time, with elementary rules being produced first and complex rules last. From the verbal protocols it appeared that 50% of the errors on problems involving multiple rules were due to failing to mention one or more of the relevant solution rules. Also, problems concerned with multiple rules occasioned higher error rates with less able Ss than with more able Ss. This was particularly so when complex rules were involved. The less able often failed to mention the complex rules, before resorting to inspection of the alternatives.
The ability to keep track of multiple rules was therefore thought to be a major source of individual differences in performance level on the APM. The frequent occurrence and discriminative qualities of the incomplete correlate error support the findings of Carpenter et al. (1990). From the occurrence of an incomplete correlate error it can be concluded that the distractor chosen is a reflection of a genuine attempt to find the solution to a given matrix problem. The correct solution, however, is only found in part, because of an apparent inability to manage a large set of solution rules in memory. This seems to be particularly the case with lower scoring Ss.

It should be noted that, along with the high incidence of the incomplete correlate error, lower scoring children have a relatively high incidence of repetition errors. The occurrence of a repetition error indicates that the row- and/or column-wise deduction of the relationships between the elements of the matrix has failed. The S seems to switch strategies altogether. Apparently the whole matrix provides too much information. Instead, attention is focused on the immediate surroundings of the missing element, resulting in a repetition error. From
Fig. 1. Barchart, with means and standard deviations (inside columns), displaying the incidence of alternative choices resulting from incomplete correlate, wrong principle, repetition and additional elements errors, for above average, average and below average ability children.
the perpetration of an incomplete correlate error, however, it follows that the whole matrix problem has been taken into account. Together the current results suggest that both higher and lower scoring children set out to view the whole matrix problem, in order to educe the solution rules. With simple problems they might succeed completely or in part, resulting respectively in the choice of the correct alternative or of a distractor representing an incomplete correlate error. As the problems get more complicated, viewing the complete matrix might lead to such a profusion of information that either the wrong hypotheses are generated altogether, resulting in a wrong principle error, or a complete strategy shift occurs, resulting in repetition or additional element errors. Carpenter et al. have experimentally established that lower scoring Ss are different from higher scoring Ss in that the former experience more problems in systematically organizing information in working memory. This limitation will cause lower scoring Ss to perceive the matrices presented as being thick with information in an earlier stage of the test. They will therefore also be compelled to switch strategies in an earlier stage of the test. This may account for the relatively high instances of the repetition error with lower ability children. The definitions of the error type categories as applied in the present study were stricter than those applied in the APM manual. In spite of stricter definitions, classification of distracters according to error type still poses a problem, as indicated by the relatively low inter-rater reliability of 71.53%. This implies that these problems must also exist with the APM. The low inter-rater reliability, ensuing from the post-hoc nature of the analysis, also calls for cautious interpretation of the results found in this study. 
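The 71.53% agreement figure above is plain percentage agreement between the two raters over the distractor classifications. A short sketch with invented classifications; the Cohen's kappa function is our addition for comparison (it corrects for chance agreement) and does not appear in the paper:

```python
# Inter-rater agreement over distractor classifications. Labels are
# hypothetical: IC = incomplete correlate, WP = wrong principle,
# RE = repetition, AE = additional elements.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Raw percentage agreement, as reported in the study."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement (our addition, not from the paper)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Invented classifications of 8 distracters by two raters:
a = ["IC", "WP", "IC", "RE", "AE", "WP", "IC", "RE"]
b = ["IC", "WP", "WP", "RE", "AE", "WP", "IC", "AE"]
```

With four categories in use, a raw agreement of about 72% corresponds to a considerably lower chance-corrected value, which underlines the caution urged above.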
In the next section we therefore present a study concerning the design and application of a set of items which have been newly constructed with the a priori idea that the distracters should harbour information as to the inabilities of children regarding matrix analogy problems. The incomplete correlate error is the focus of attention, because of its frequent occurrence and discriminating qualities. Besides, a further examination of this error type allows one to generalize Carpenter et al.'s findings to children in a wide ability level range. The distracters were constructed so that they could all be regarded as incomplete correlate errors, albeit differing with respect to the kind and number of solution rules omitted.

STUDY 2: INCOMPLETE PROBLEM SOLUTIONS
First and foremost we wanted to find out whether the findings of Carpenter et al. (1990) could also be established for children in a wide ability range. The aim was to see whether children are more likely to make an erroneous alternative choice when a problem involves multiple rules. Furthermore we wanted to establish whether specific solution rules cause more problems than others, and whether this was particularly the case for lower scoring children. Carpenter et al. also attempted to obtain a ruling on this point, but were hampered by the post-hoc nature of their analysis. In their study they made two comparisons: (i) the item error rates with the kinds of rules involved in an item, and (ii) the total test score, i.e. the ability level, with the error rate on items involving specific kinds of rules. However, they were hindered by the fact that besides the number and kinds of rules involved in an item another factor, so-called correspondence finding, influenced the item error rate. Correspondence finding is needed when an item is concerned with two or more rules. If this is the case, it is necessary to determine which figural elements or attributes in the 3 entries in a row or column are governed by the same rule. The exact influence of this last factor on item error rate is harder to pin down than the influence of the clear-cut number and kinds of rules. Thus, in attempting to find out to what extent specific kinds of rules influenced the item error rate and the total test score of a S, this third factor had a confounding effect.

A partial replication of the study by Carpenter et al. on the SPM would have been appropriate if only the post-hoc analysis problems did not exist for the SPM. The solution to this problem is an obvious one: present children with newly constructed test items, which resemble the items of the Raven SPM in appearance and difficulty.
The new test (from now on referred to as the EPM; Experimental Progressive Matrices) should consist of items based on a priori definitions of the number and kinds of rules involved and the errors represented in each distractor. For the APM, Carpenter et al. derived 5 solution rules needed to solve the APM items: (1) constant in a row, (2) quantitative pairwise progression, (3) addition or subtraction, (4) distribution of three values, and (5) distribution of 2 values. All 5 of these solution rules are represented in the three examples of newly constructed matrix analogy problems depicted in Figs 2, 3 and 4. The matrix problem depicted in Fig. 2 illustrates the first 2 rules: rule 1, the same shape occurs throughout a row, but changes down a column; and rule 2, a quantitative increment occurs in the number of small triangles between adjacent entries. Figure 3 depicts a matrix problem which pictures the third rule: rule 3, the line from the first column or row is added to another line in the second column or row to produce the matrix entry of crossed lines in the third. Finally, Fig. 4 shows a matrix problem which represents a combination of rules 1, 4 and 5: rule 1, the direction of the line in each entry is the same in each row, but alters down a column; rule 4, 3 different black shapes are distributed through a row or column; and rule 5, 2 of 3 figural entries in a row or column are framed by a square, whereas the third is not.

Fig. 2. Example of a matrix problem with 2 rules involved; constant in a row and quantitative pairwise progression (alternative 3 is correct).

Empirically it was established by Carpenter et al. that the item error rate increased when the items involved an increasing number of these rules. It was also established that items harbouring the first of the 5 solution rules resulted in the lowest error rates, whereas items in need of application
of the last of the set of 5 rules showed the highest error rates. However, no straightforward order was established for the remaining 3 rules, intermediate in difficulty.

Fig. 3. Example of a matrix problem with 1 rule involved; addition or subtraction (alternative 4 is correct).

Fig. 4. Example of a matrix problem with 3 rules involved; constant in a row, distribution of 3 values, and distribution of 2 values (alternative 7 is correct).

Nevertheless, some theoretical speculation can be made regarding the contribution of the 3 intermediate rules to item difficulty. According to Carpenter et al. successfully solving a matrix problem is associated with educing one rule at a time. This implies that it must be possible to order the 5 solution rules according to the priority they are given for deduction. Largely on the basis of empirical evidence Carpenter et al. suggested the following hierarchy: 1, constant in a row; 2, quantitative pairwise progression; 3, distribution of 3 values; 4, figure addition or subtraction; and 5, distribution of 2 values. Going on the assumption that obvious, simple rules are educed first, whereas complex rules are educed last, it might be suggested that the above hierarchy coincides with the hierarchy of difficulty in rule kind. Thus, according to Carpenter et al. the level of difficulty of an item is determined by at least 2 factors: (a) the kind of solution rules involved, and (b) the number of rules involved*. In striving for a set of items which matched the SPM in difficulty, and at the same time keeping the source of the item difficulty in mind, both well defined factors, number and kind of rule, were used in the item construction of the EPM. The hypotheses to be tested with regard to item difficulty were therefore 2-fold; it was expected that (i) the average error rate of items would increase going from items involved with 1 rule to items involved with multiple rules, and (ii) the average error rate of items would increase going from items involved with simple rules to items involved with complex rules.

Carpenter et al. have experimentally established that lower scoring Ss are different from higher scoring Ss in that the former experience more problems in keeping track of multiple rules.
That is, lower scoring Ss lag behind in the ability to generate sub-goals in working memory, record the attainment of sub-goals, and set new sub-goals as others are attained. Thus, it is a difference in the ability to infer and apply more than one rule which separates the 'good' from the 'bad' in matrix analogy problem solving. Moreover, according to Carpenter et al. the difference between the good and the bad is also linked with the ability to educe rules that do not contain corresponding figures or attributes in all 3 columns. As can be seen in the examples in Figs 3 and 4, this is the case with the addition or subtraction rule and with the distribution of 2 values rule. It can therefore be expected that these 2 solution rules will cause those with low ability levels more problems than those with high ability levels. So, the hypotheses regarding individual differences in omissions made in the rule deduction and application process are 2-fold; we expected children with a high error rate on the EPM, compared to those with a lower error rate, to have a high incidence of choosing response alternatives which (i) represent omissions of multiple solution rules and (ii) represent omissions of complex rules (addition or subtraction and distribution of 2 values).

In summary, the following results are expected: (i) equal difficulty of the SPM and EPM, i.e. no difference between the SPM and EPM mean error rates, and a high correlation between both; (ii) an increasing mean item error rate with an increasing number of solution rules, and increasingly more complex rules, involved in an item; (iii) interactions between the ability level of Ss and the number and kinds of rules omitted in the solution process.

*The exact influence of the correspondence finding factor on item difficulty is harder to pin down than the influence of the number and kind of rules. Accordingly the correspondence finding factor was not included in designing the set of new items. This was done by making the figural elements governed by 1 rule distinctly different in appearance from the figural elements governed by another rule, thus omitting any misleading cues to correspondences.

Method

Subjects
The Ss were 200 children in the age range 8.5-12.5 years (M = 10.7) who participated on a voluntary basis, with the consent of their teachers and caregivers. The number of boys and girls was approximately equal.

Procedure
Fifty-two matrix analogy problems were designed, making use of the 5 solution rules defined by Carpenter et al. (1990) for the APM. The first 7 items needed 1 rule to be applied in order to find the correct solution. The matrix problem depicted in Fig. 3 is an example of a 1 rule item. The following 22 items were all 2 rule items, like the one in Fig. 2. Next, 23 items were designed with 3 rules each relating the 9 entries to one another, like the one in Fig. 4. In designing the items the 5 solution rules were systematically varied.

All 52 items were allotted 8 response alternatives: 1 correct alternative, and 7 distracters. The 7 distracters of each item could only be made truly meaningful if an item involved at least 2 rules. In the case of a 2 rule item, failing to infer and apply either one of the rules, or both rules, will most probably result in the S choosing a distractor. Looking at the response alternatives of the matrix problem in Fig. 2, it can be seen how these 3 different kinds of 'failing' are represented in the alternatives. Alternatives 1, 4, 6, and 8 are a reflection of failing to infer and apply one of the necessary rules. Alternatives 1 and 6 reflect an omission of the constant in a row rule, whereas the quantitative pairwise progression rule has been inferred and applied correctly. Alternatives 4 and 8 are equivalent to the opposite; the quantitative pairwise progression rule has been omitted, while the constant in a row rule has been inferred and applied correctly. Alternatives 2, 5, and 7 are examples of both rules having been omitted. Alternative 3 is the only alternative reflecting the correct inference and application of both rules. Along the same lines the response alternatives of a 3 rule item can be accounted for, albeit with an extra possibility of failing to apply all 3 solution rules. Alternative 1 of the 3 rule matrix problem in Fig. 4 is an example of the omission of all 3 necessary rules. Alternatives 2, 4, and 8 reflect the omission of various combinations of 2 rules. Alternatives 3, 5, and 6 are each an example of omitting 1 of the 3 necessary rules, leaving number 7 to represent the correct response alternative.

The response alternatives of 1 rule items are not all meaningful in the way described above for 2 and 3 rule items. Omitting the 1 necessary rule does not provide enough material to construct 7 meaningful distracters. Each 2 rule item was allotted four 1 rule omission distracters (2 examples for each rule to be applied), three 2 rule omission distracters, and 1 correct alternative. Each 3 rule item was allotted three 1 rule omission distracters (1 example for each rule to be applied), three 2 rule omission distracters (1 example for each possible combination of 2 of the 3 necessary rules), one 3 rule omission distractor, and 1 correct alternative. The allocation of an alternative number (1 through 8) to the correct alternative and the various distracters was done at random for each item.

First the children were administered the paper and pencil version of the Raven SPM. The instruction and completion of the test took place in a classroom setting, in accordance with the instructions given in the manual (Raven et al., 1988). Children were left to complete the test at their own pace. For each item they had to write down their choice made from the response alternatives. In a next test session, a few days later, the children were presented with the EPM. The EPM was administered along the same lines as the SPM, with the exception that explaining
the test was restricted to the first item of the EPM, because the children already had the experience of completing the SPM. Because a subset of the children, viz. those scoring high and low on the SPM, were to take part in another experiment, the order of presentation, first the SPM followed by the EPM, was the same for all children.

Results
Because the seven 1 rule items of the EPM were not allotted 8 truly meaningful response alternatives, the analyses presented below were done on the 2 and 3 rule items only, unless stated otherwise. All effects reported are significant with P < 0.01, unless reported otherwise.

EPM items
Our first objective was to establish whether the construction of the EPM items was successful, by establishing the reliability of the EPM. On the whole the reliability of the EPM was high. Split-half reliability (dividing the EPM into odd and even numbered items) was 0.90, both for all 52 items and for the 45 two and 3 rule items. Furthermore we verified whether the SPM and EPM differed in mean error rate, and whether the SPM and EPM correlated with each other. Items from sets A and B of the SPM were excluded from the analyses comparing the SPM and the EPM, because set A is concerned with pattern completion, and set B contains 2 x 2 matrix analogies with 6 response alternatives. Items from sets C, D and E of the SPM, however, are of the same format as those presented in the EPM: 3 x 3 matrix analogies with 8 response alternatives. To determine whether the EPM and SPM were comparable as to their mean level of difficulty, an ANOVA was performed with error rate as dependent variable and test kind (EPM vs SPM) as within S factor. No significant effect was found, which indicated that EPM and SPM items were on average equal in difficulty; M = 39.51%, SD = 19.14 for the EPM, and M = 39.52%, SD = 14.06 for the SPM. A simple regression analysis with SPM error rate as regressor and EPM error rate as dependent variable yielded a significant linear trend, F(1, 195) = 238.10, with 55% (r = 0.74) of the variance in EPM error rate explained by the error rate on the SPM. From these results it can be concluded that the SPM and EPM are equal in difficulty. It was predicted that item difficulty would increase with an increasing number of solution rules and with increasingly difficult rules involved in an item. In order to establish the influence of the number and kind of rules on item difficulty, each item was assigned an item weight on the basis of the number and the kinds of rules involved in that particular item.
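The item weighting introduced here can be sketched in a few lines. The per-rule point values are those postulated in the text (1 point for constant in a row up to 5 points for distribution of 2 values); reading the item weight as the sum of these contributions, and the names `RULE_POINTS` and `item_weight`, are our assumptions, not the authors' notation.

```python
# Sketch of the item-weight computation. Point values follow the
# postulated hierarchy of rule difficulty; summing them is our reading.
RULE_POINTS = {
    "constant in a row": 1,
    "quantitative pairwise progression": 2,
    "distribution of 3 values": 3,
    "addition or subtraction": 4,
    "distribution of 2 values": 5,
}

def item_weight(rules):
    """Weight of an item, given the kinds of rules it involves."""
    return sum(RULE_POINTS[r] for r in rules)

# A 2 rule item combining constant in a row with quantitative pairwise
# progression (like the item of Fig. 2) receives weight 1 + 2 = 3.
print(item_weight(["constant in a row", "quantitative pairwise progression"]))
```

Because the weight reflects both how many and which rules an item involves, it is the natural regressor for the 63% of error-rate variance reported below.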
The contribution of a particular kind of rule to the item weight depended upon its place in the postulated hierarchy of difficulty. Thus the 5 solution rules made the following contributions: constant in a row, 1 point; quantitative pairwise progression, 2 points; distribution of 3 values, 3 points; addition or subtraction, 4 points; and distribution of 2 values, 5 points. Two simple regression analyses were performed on all 52 items, to determine how much of the variance in item error rate was explained by the number of rules in an item and by the item weights. Carpenter et al. found that 45% of the variance in error rate was accounted for by the number of rules involved in an APM item. Present results are in agreement with this finding, i.e. 47% of the variance in item error rate was explained by the number of rules involved in an item. The item weight explained a substantially greater part of the variance in error rate: 63%. Note that both the number and the kind of rules determine the item weights. The 63% variance explained by item weight is therefore not independent of the 47% explained by the number of rules.

Individual differences in error and omission rate
As with the first study, percentiles were used for the ability level division, albeit that percentiles were now calculated for the EPM scores (over 45 items). The ability level division was as follows: above average scoring Ss (at or above the 75th percentile, n = 52), average scoring Ss (between the 75th and 25th percentile, n = 96) and below average scoring Ss (at or below the 25th percentile, n = 52). As with the SPM, percentiles were calculated for each whole and half year age group in order to remove the influence of age on determining the percentile points. The last analyses were performed to determine what error patterns were to be distinguished, i.e. how often children omitted rules, and in particular how often each specific kind of rule was omitted.
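As a sketch, the percentile division might look as follows. The 25th/75th cut-offs and the within-age-group computation come from the text; the function name, the example scores, and the interpolation method are our choices, not specified by the authors.

```python
from statistics import quantiles

def ability_level(score, scores_same_age_group):
    """Classify an EPM score against its own (half-)year age group."""
    # 25th and 75th percentile cut-offs, computed within the age group
    q1, _, q3 = quantiles(scores_same_age_group, n=4, method="inclusive")
    if score >= q3:
        return "above average"
    if score <= q1:
        return "below average"
    return "average"

group = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39]  # hypothetical EPM scores
print(ability_level(38, group))  # above average
print(ability_level(24, group))  # average
```

Computing the cut-offs separately for each age group, as here, is what removes the influence of age on the percentile points.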
Besides establishing the overall error patterns, we were interested to know whether children of different ability levels displayed different error patterns. Thus, the following analyses are concerned with the response alternatives of the EPM items. They were performed on 2 and 3 rule items only, because 1 rule items did not have 8 truly meaningful alternatives. For each child a record was made of which alternative was chosen for each of the 45 items. Each distractor choice was scored on the number of rules omitted (1 or multiple), and the kind(s) of rules omitted (any of the 5 different kinds of rules). This yielded an omission score for 1 rule omission, for multiple rule omissions, and for omitting each of the 5 solution rules. In addition, a total test score on correct alternative choices was calculated over the 45 EPM items. An ANCOVA with ability level (3) as the between S factor, age as covariate and total test score as dependent variable confirmed that the total test score decreased with decreasing ability level; above average (M = 37.73, SD = 3.52), followed by average (M = 26.91, SD = 4.72), followed by below average (M = 17.06, SD = 4.12). It goes without saying that a main effect for ability level can also be found for the omission score, it being the counterpart of the total test score. In reporting the results of subsequent analyses this main effect will therefore not be explicitly reported. The subsequent analysis was aimed at determining whether a significant difference in mean omission score existed between 1 and multiple rule omissions. This was done overall and for each of the 3 ability levels. For this purpose an ANCOVA was performed with omission score as dependent variable, ability level (3) as the between S factor, number of rules omitted (2) as the within S factor, and age as covariate.
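The scoring of distractor choices described above can be sketched as follows. We assume each chosen alternative comes annotated with the set of rules it omits (empty for the correct alternative); the names and the data layout are our illustration, not the authors' materials.

```python
from collections import Counter

# Hypothetical sketch of the omission scoring: one entry per answered
# item, holding the set of rules the chosen alternative omits.
def omission_scores(choices):
    scores = Counter()
    for omitted_rules in choices:
        if not omitted_rules:        # correct choice: nothing omitted
            continue
        key = "1 rule" if len(omitted_rules) == 1 else "multiple rules"
        scores[key] += 1             # 1 vs multiple rule omissions
        for rule in omitted_rules:   # tally each kind of rule omitted
            scores[rule] += 1
    return scores

choices = [{"constant in a row"},                    # 1 rule omitted
           {"addition or subtraction",
            "distribution of 2 values"},             # 2 rules omitted
           set()]                                    # correct response
print(omission_scores(choices)["1 rule"])  # 1
```

These tallies are exactly the "1 rule", "multiple rules", and per-rule omission scores entered into the ANCOVAs reported below.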
A main effect for number of rules omitted, F(1,194) = 22.45, was due to a decrease in mean omission score going from omitting 1 rule (M = 13.83, SD = 6.08) to omitting multiple rules (M = 4.01, SD = 4.20). Although no interaction effect was found, Fig. 5 shows that lower scoring children omit multiple rules more often than higher scoring children do. The lower scoring children even show a higher incidence of multiple rule omissions than the higher scoring children do of 1 rule omissions. This confirms the findings of Carpenter et al. that lower ability Ss have problems applying multiple rules, and thus tend to get mixed up when compelled to do so. However, the absence of an interaction effect indicates that the increase in omission score with decreasing ability level is the same for 1 and multiple rule omissions. Subsequent analyses were concerned with the mean omission scores of the 5 different kinds of rules. The following questions were to be answered: (i) does the omission score increase in accordance with the theoretical hierarchy of rule difficulty, as postulated above? and (ii) do the addition or subtraction rule and the distribution of 2 values rule discriminate better between the 3 ability groups than the other 3 rules do?

Fig. 5. Barchart, with means and standard deviations (inside column), displaying the omission scores of 1 and multiple rule omissions, for above average, average and below average ability children.

In searching for an answer to both questions an ANCOVA was performed with omission score as dependent variable, ability level (3) as the between
S factor, rule kind (5) as the within S factor, and age as covariate. The first question postulated above was confirmed by the main effect found for rule kind, F(4,776) = 4.49. Contrast analysis indicated that indeed a linear increase in mean omission score exists in the expected direction: constant in a row (M = 2.80, SD = 2.52), quantitative pairwise progression (M = 3.32, SD = 2.85), distribution of 3 values (M = 4.24, SD = 3.06), addition or subtraction (M = 5.87, SD = 4.34), and distribution of 2 values (M = 6.20, SD = 3.70). The interaction effect between ability level and rule kind, F(8,776) = 3.51, confirmed that some rules discriminate better between the 3 ability levels than others do. However, not both interactions between rule kind and the Helmert contrasts for ability level were significant. The only significant interaction was between rule kind and Helmert comparison 1 (above average vs average/below average), F(4,776) = 17.34, which means that some rules discriminate better between above average children on the one hand and average and below average children on the other. The interaction between rule kind and Helmert comparison 2 (average vs below average) was not significant, indicating that the omission scores of all 5 solution rules increase at the same rate going from average to below average Ss.

Fig. 6. Barchart, with means and standard deviations (inside column), displaying the omission scores of constant in a row, quantitative pairwise progression, distribution of 3 values, addition or subtraction, and distribution of 2 values, for above average, average and below average ability children.

Figure 6 suggests that, as expected, the omission scores of the addition or subtraction rule and the distribution of 2 values rule increase most of all with decreasing ability level. This was supported by the fact that no interaction effects were found between ability level and rule kind when ANCOVAs were run separately for the 2 difficult solution rules (addition or subtraction and distribution of 2 values) and the 3 easier solution rules (constant in a row, quantitative pairwise progression, and distribution of 3 values).
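The Helmert comparisons reported above contrast each ability level with the pooled remaining, lower-scoring levels: comparison 1 sets above average against average/below average, comparison 2 sets average against below average. A minimal sketch of this coding follows; the coefficient values are the standard Helmert scheme, and the group means used in the example are hypothetical, not the reported data.

```python
# Standard Helmert contrast coefficients for 3 ordered groups (our
# illustration of the comparisons named in the text).
HELMERT_1 = {"above average": 1.0, "average": -0.5, "below average": -0.5}
HELMERT_2 = {"above average": 0.0, "average": 1.0, "below average": -1.0}

def contrast_value(group_means, coefficients):
    """Weighted sum of group means under a contrast; zero means no
    difference in the direction the contrast tests."""
    return sum(coefficients[g] * m for g, m in group_means.items())

means = {"above average": 2.1, "average": 4.0, "below average": 6.5}  # hypothetical
print(contrast_value(means, HELMERT_1))  # 2.1 - (4.0 + 6.5)/2, i.e. about -3.15
```

A markedly non-zero value on comparison 1 but not on comparison 2 corresponds to the pattern reported above: the rule-kind effect separates the above average children from the two lower groups, which do not differ from each other in rate of increase.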
GENERAL DISCUSSION
The results from both studies presented above show that analysis of the responses to multiple choice analogy problems provides the means to obtain instant and direct knowledge as to the inabilities of a S. Unfortunately the original Raven test was not designed with this purpose in mind. As a result, problems arise when attempting to perform a post-hoc error analysis of the SPM, as became apparent in the first study. The second study showed that a simple solution to these problems is to design a test which maintains the merits of the Raven test, albeit with a priori definitions regarding the errors represented in the response alternatives. That the design of the EPM was successful is shown in several ways. First of all, a comparison of the mean error rates of the EPM and SPM, and the correlation between both, indicate a match in difficulty. Ideally the correlation between the EPM and SPM should equal the test-retest correlation of the SPM. The SPM manual reports test-retest correlations ranging from 0.55 up to 0.93, obtained from studies using different samples varying in size, nationality, age range and special characteristics (e.g. deaf Ss) (Raven et al., 1988). Unfortunately none are provided for a sample such as used in the present study. However, taking into account that the
correlation between the EPM and SPM, r = 0.74, by far exceeds the lowest test-retest correlation reported for the SPM, it can be accepted as satisfactory. Secondly, the number of rules involved in the EPM items yielded approximately the same explained variance in error rate for children as the one found by Carpenter et al. (1990) for undergraduates. This is an indication that the way in which the 5 solution rules were used for constructing the EPM harmonizes with the way in which these same 5 solution rules are applied in the Raven APM. Furthermore, it was established that the quality as well as the quantity of the rules involved in an item contributed to the item difficulty. Taking both factors into account, by means of the item weight, yielded a substantial explained variance in error rate, viz. 63%. Thirdly, the construction of distracters informative with regard to the omissions made in the solution process also proved to be successful. By means of the omission scores obtained from the distracters, the theoretical expectations were confirmed. That is, the deduced hierarchy of difficulty of the 5 solution rules was established (constant in a row, quantitative pairwise progression, distribution of 3 values, addition or subtraction and distribution of 2 values). Furthermore it was demonstrated that a difference in the power to discriminate among Ss of various ability levels exists between solution rules with corresponding elements in all 3 entries of a row or column (constant in a row, quantitative pairwise progression, and distribution of 3 values) on the one hand, and solution rules with corresponding elements in 2 out of 3 entries only (addition or subtraction and distribution of 2 values) on the other. The latter 2 solution rules were particularly difficult for lower scoring children. The results obtained from the EPM coincide with the findings of Carpenter et al. concerning the way in which multiple rule deductions are handled during processing.
Carpenter et al. amassed evidence from various sources to suggest that low scoring Ss try to confirm multiple solution rules in parallel. This may result in a working memory overload and the selection of a response alternative on the basis of an incomplete analysis of the variation contained in the entries of the problem matrix. High scoring Ss, on the other hand, were found to confirm solution rules in a serial manner. They thereby minimized the chance of a memory overload, and a subsequent premature termination of the solution process. The present results provide further support for these findings. In the case of a distractor choice, children preferred a distractor representing the omission of 1 rule to those representing multiple rule omissions. Omitting 1 rule is a clear indication of incomplete problem solving with both 2 and 3 rule items, because at least 1 or 2 rules have been applied correctly. Omission of multiple rules, by contrast, is less likely to be the result of an incomplete problem solution. Omission of 2 or 3 rules might just as well represent a completely different error type, e.g. application of a wrong principle. That omission of 1 rule occurred most often, particularly with less skilled reasoners, thus again provides a link between less skilled reasoners and incomplete problem solving, supposedly associated with premature termination of the solution process. It must be kept in mind that the incomplete problem solution is one of several possible types of errors which can be observed on the Raven SPM test. This is clearly indicated by the results of the first study. In the second study the focus has been on the incomplete problem solution error. Incorporating the other error types in the response alternatives in a similar fashion might prove fruitful. In this way it might be possible to devise test material with the merits of the Raven, i.e.
a reliable measure of analytic ability, and the extra advantage of providing information concerning the (in)competences of individual Ss. Various methods of identifying processes of problem solving behaviour may benefit from this information. The cognitive origins of individual differences in analogy problem solving ability are researched in a multitude of ways. Correlational methods, training studies, and unravelling the solution process into distinct cognitive components are all examples of indirectly establishing individual differences in cognition. With these approaches sidelong accomplishments are the focus of attention, and not the direct performance on a test measuring analogy problem solving ability. These sidelong accomplishments vary from the performance on some extra task (as with the cognitive correlates approach, e.g. Keating & Bobbitt, 1978), to the increase in performance level from pre- to post-training test sessions (the cognitive training approach, e.g. Malloy, Mitchell & Gordon, 1987), to the reaction times associated with solving analogy problems (the cognitive components approach, e.g. Goldman & Pellegrino, 1984). More structured test material with informative response alternatives may be of use in these kinds of studies. The cognitive correlates approach might benefit through systematically
relating cognitive processes to differences in occurrence frequencies of specific error types or omission rates of specific solution rules. Analysis of the distracters chosen provides the cognitive training approach with easily accessed information as to the specific inabilities of Ss, thus supplying virtually instant knowledge as to what a S should be trained on. The computer based approach may analyse the qualitative differences (intellectual operations, types of errors made, and kinds of rules omitted) in the performance by man and by machine. And finally the cognitive components approach might find use in linking specific error types, and inabilities to educe specific solution rules, with deficits regarding specific cognitive processes.

Acknowledgements-We would like to thank Wilma Schepers and Ellen de Boer for their assistance in the data collection.
REFERENCES

Carpenter, P. A., Just, M. A. & Shell, P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven progressive matrices test. Psychological Review, 97, 404-431.
Goldman, S. R. & Pellegrino, J. W. (1984). Deductions about inductions: analyses of developmental and individual differences. In Sternberg, R. J. (Ed.), Advances in the psychology of human intelligence (Vol. 2) (pp. 149-197). Hillsdale, NJ: Erlbaum.
Jacobs, P. I. & Vandeventer, M. (1970). Information in wrong responses. Psychological Reports, 26, 311-315.
Keating, D. P. & Bobbitt, B. C. (1978). Individual and developmental differences in cognitive processing components of mental ability. Child Development, 49, 155-167.
Malloy, T. E., Mitchell, C. & Gordon, O. E. (1987). Training cognitive strategies underlying intelligent problem solving. Perceptual and Motor Skills, 64, 1039-1046.
Marshalek, B., Lohman, D. F. & Snow, R. E. (1983). The complexity continuum in radex and hierarchical models of intelligence. Intelligence, 7, 107-127.
Raven, J. C., Court, J. H. & Raven, J. (1983). Manual for Raven's progressive matrices and vocabulary scales: Section 4, Advanced progressive matrices. London: Lewis.
Raven, J. C., Court, J. H. & Raven, J. (1984). Manual for Raven's progressive matrices and vocabulary scales: Section 2, Coloured progressive matrices. London: Lewis.
Raven, J. C., Court, J. H. & Raven, J. (1986). Manual for Raven's progressive matrices and vocabulary scales: Section 1, General overview. London: Lewis.
Raven, J. C., Court, J. H. & Raven, J. (1988). Manual for Raven's progressive matrices and vocabulary scales: Section 3, Standard progressive matrices. London: Lewis.
Siegel, R. S. (1983). Information processing approach to development. In Mussen, P. H. (Ed.), Handbook of child psychology (Vol. 1) (pp. 129-211). New York: John Wiley.
Thissen, D. M. (1976). Information in wrong responses to the Raven progressive matrices. Journal of Educational Measurement, 13, 201-214.
Vejleskov, H. (1968). An analysis of Raven matrices responses in fifth grade children. I. Scandinavian Journal of Psychology, 9, 177-186.
Watts, W. J. (1985). An error analysis on the Raven's using Feuerstein's deficient cognitive functions. The Alberta Journal of Educational Research, 31, 41-53.