Psychometric issues in “intelligent testing” using the null hypothesis approach


VICTOR L. WILLSON, HARRISON C. STANTON, JR., AND ARTURO OLIVAREZ, JR.
TEXAS A & M UNIVERSITY

ABSTRACT:

The analysis and interpretation of intelligence subtest profiles was developed primarily by A.S. Kaufman. He attempted to interpret the learning styles of children from apparent extreme scores, defined in terms of the difference between the mean score of a subtest composite and the mean of the global score or factor from which the composite subtests were taken. Kaufman used a uniform difference value of 3 points, making many simplifying assumptions about the subtests. In this article these assumptions are examined. The difference variable is defined, and its variance, reliability, and standard error are derived. It can then be shown that two different subtests will in general not have the same characteristics and that, consequently, a 3-point difference will select quite disparate proportions of a normal distribution. Two examples are provided to illustrate the calculations and to demonstrate this difference in the proportion selected under a normal distribution. Although Kaufman’s concept of score interpretation is not discredited, the implication for practitioners is that each subtest defined for learning styles must be separately examined and validated for prevalence in a population.

Direct all correspondence to: Victor Willson, School of Business Administration, Texas A & M University, College Station, TX 77840.

Learning and Individual Differences, Volume 1, Number 2, 1989, pages 247-254. Copyright © 1989 by JAI Press, Inc. All rights of reproduction in any form reserved. ISSN: 1041-6080.

A.S. Kaufman provided practitioners with a major conceptual advance in intelligence test score interpretation with the publication of his text Intelligent Testing With the WISC-R (Kaufman 1979). Kaufman linked performance on WISC-R subtests to specific learning styles through score profile analysis. This contrasted with the practice prevailing at that time, which relied on global scores or, when subtests were considered at all, on a more subjective interpretation of subtest performance. A major part of Kaufman’s advice on intelligent testing was based on the examination of patterns of extreme subtest scores. He defined extremity as a deviation on a subscale of three or more scaled-score points (mean = 10, sd = 3) from the average of the subtests in the component to which the subscale belongs (i.e., Verbal or Performance). Kaufman made a great simplifying assumption: the 3-point difference would be associated with approximately similar prevalence for each subscale.

In the intervening decade several papers have challenged this assumption. Naglieri (1982) took issue with Kaufman’s 3-point rule for profile interpretation, and Silverstein (1984) derived the standard deviation (sd) of the difference between a single subtest and the mean of the subtests of the global score from which it was drawn. Silverstein (1984, 1985) also examined both the theoretical and empirical distributions of difference scores for scales on the Vineland Adaptive Behavior Scales. Empirical and theoretical values were quite similar at various percentile points between the sixteenth and the first. Much work has been done on the validity and reliability of part scores (e.g., Willson & Reynolds 1984a, b, 1985), so that an analytical evaluation of Kaufman’s criterion can be undertaken for any number of subtests. An empirical analysis with major intelligence tests is being explored concurrently by the authors in other works. Because both analytical and empirical evaluations of Kaufman’s criterion can be made, the specifics of the 3-point rule can be detailed.

It should be noted that discrepancies between Kaufman’s conclusions and the work presented here do not discredit the practice of linking a child’s learning style to particular profile patterns. Instead, more complex representations and models for investigating profile patterns will be required.

PSYCHOMETRIC THEORY OF DISCREPANCY

The theory of discrepancy scores has been investigated in great detail over the last decade in the context of learning disabilities. Most recently Reynolds (1984) and Willson and Reynolds (1984b) have described various models used in practice and recommended the regression model as the most reasonable for estimating differences between achievement and intelligence scores. These models assume separate achievement and intelligence measures. Part-whole scores were investigated by Willson and Reynolds (1984b, 1985) in the context of screening tests consisting of subtests derived from entire batteries. Both of these psychometric literatures can be employed to examine Kaufman’s difference criterion. The difference between a single subtest $x_1$ and the average $M$ of the $k$ subtests in the component from which it is drawn is

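$$D = x_1 - \frac{1}{k}\sum_{i=1}^{k} x_i = x_1 - M \tag{1}$$

The variance of D, given equal subtest variances, is

$$\sigma_D^2 = \sigma_{x_1}^2 + \frac{\sigma^2}{k^2}\Big(k + \sum_{i=1}^{k}\sum_{j \neq i}\rho_{x_i x_j}\Big) - 2\rho_{x_1 M}\,\sigma_{x_1}\sigma_M \tag{2}$$

where the middle term is the variance $\sigma_M^2$ of the component mean. This equation is the same as that reported by Silverstein (1984). The correlation between the subtest and its composite is

$$\rho_{x_1 M} = \frac{\sigma_{x_1}\big(1 + \sum_{j \neq 1}\rho_{x_1 x_j}\big)}{k\,\sigma_M} \tag{3}$$

The reliability of D is

$$\rho_{DD'} = \frac{\rho_{x_1 x_1'} + \rho_{MM'} - 2\rho_{x_1 M}}{2 - 2\rho_{x_1 M}} \tag{4}$$

This result allows us to calculate a standard error for the difference D:

$$\sigma_{e.D} = \sigma_D\sqrt{1 - \rho_{DD'}} \tag{5}$$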
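The quantities in Equations (2) through (5) can be checked numerically. The following Python sketch is offered only as an illustration under stated assumptions: the function names and the equicorrelated three-subtest matrix are hypothetical and were chosen so that the arithmetic is easy to verify by hand, whereas the reliability inputs (0.85, 0.94, 0.82) and the standard deviation 1.726 are the values reported for the Information subtest in Table 1.

```python
import math

def global_sd(sd, corr):
    """SD of the mean M of k subtests sharing a common SD (middle term of Eq. 2)."""
    k = len(corr)
    off_diag = sum(corr[i][j] for i in range(k) for j in range(k) if i != j)
    return math.sqrt(sd**2 / k**2 * (k + off_diag))

def part_whole_corr(sd, corr, idx=0):
    """Correlation between subtest idx and the mean M (Eq. 3)."""
    k = len(corr)
    others = sum(corr[idx][j] for j in range(k) if j != idx)
    return sd * (1 + others) / (k * global_sd(sd, corr))

def difference_sd(sd, corr, idx=0):
    """SD of D = x_idx - M (square root of Eq. 2)."""
    s_m = global_sd(sd, corr)
    r_xm = part_whole_corr(sd, corr, idx)
    return math.sqrt(sd**2 + s_m**2 - 2 * r_xm * sd * s_m)

def difference_reliability(r_xx, r_mm, r_xm):
    """Reliability of D (Eq. 4)."""
    return (r_xx + r_mm - 2 * r_xm) / (2 - 2 * r_xm)

def difference_se(sd_d, rel_d):
    """Standard error of D (Eq. 5)."""
    return sd_d * math.sqrt(1 - rel_d)

# Hypothetical example: 3 subtests, SD 3, every intercorrelation 0.5.
toy_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
toy_sd_d = difference_sd(3.0, toy_corr)

# Information subtest inputs as reported in Table 1.
info_rel = difference_reliability(0.85, 0.94, 0.82)
info_se = difference_se(1.726, info_rel)
print(round(toy_sd_d, 3), round(info_rel, 3), round(info_se, 3))
```

For the hypothetical matrix the variance of D works out to 3, and the Table 1 inputs reproduce the reported reliability of 0.417 and standard error of 1.318.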

These parameters are estimable from estimates of the part-whole correlations and allow one to determine, for each subtest, the probability of a difference of 3 points or more under a null hypothesis of no difference and an assumption of normality. For subtests having equal standard deviations, as is true for tests such as the WISC-R and K-ABC, the standard deviation of the difference depends entirely on the number of subtests in the composite, the number of subtests in the global score, and the subtest intercorrelations. It follows directly that a 3-point difference will not be exceeded by the same percentage of subjects for subtests that correlate differently with their composite mean. This will be true of any score difference, even one composed of a sum (or mean) of subtests differenced from its appropriate global score. Although the formulations are complicated, they are derivable in general and are given below in reduced form for the case of two subtests $x_1$ and $x_2$ drawn from a component with $k$ subtests (the Appendix gives a general form for the standard deviation and for the composite-global correlation):

$$D = (x_1 + x_2)/2 - M = \bar{X} - M \tag{6}$$

$$\sigma_D^2 = \tfrac{1}{4}\big(\sigma_{x_1}^2 + \sigma_{x_2}^2 + 2\rho_{x_1 x_2}\sigma_{x_1}\sigma_{x_2}\big) + \sigma_M^2 - 2\rho_{\bar{X} M}\,\sigma_{\bar{X}}\sigma_M \tag{7}$$

$$\rho_{\bar{X} M} = \frac{\rho_{x_1 M}\sigma_{x_1} + \rho_{x_2 M}\sigma_{x_2}}{\sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + 2\rho_{x_1 x_2}\sigma_{x_1}\sigma_{x_2}}} \tag{8}$$

$$\rho_{DD'} = \frac{\rho_{\bar{X}\bar{X}'} + \rho_{MM'} - 2\rho_{\bar{X} M}}{2 - 2\rho_{\bar{X} M}} \tag{9}$$
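The two-subtest case can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the subtests are assumed to share a common standard deviation, and the numeric inputs (subtest sd = 3, Mazes-Coding correlation 0.21, global-mean sd 2.082, composite-global correlation 0.760) are the values as best read from Table 2. With equal standard deviations, the composite-global correlation of Equation (8) reduces to the sum of the two part-whole correlations divided by the square root of 2 plus twice the subtest intercorrelation.

```python
import math

def composite_global_corr(r1m, r2m, r12):
    """Eq. (8) for equal subtest SDs: correlation of the two-subtest mean with M."""
    return (r1m + r2m) / math.sqrt(2 + 2 * r12)

def composite_difference_sd(sd, r12, sd_m, r_cm):
    """Square root of Eq. (7) for two subtests sharing a common SD."""
    # Variance of the mean of the two subtests.
    var_c = 0.25 * (2 * sd**2 + 2 * r12 * sd**2)
    return math.sqrt(var_c + sd_m**2 - 2 * r_cm * math.sqrt(var_c) * sd_m)

# Inputs as best read from Table 2 (Mazes and Coding against the Performance mean).
sd_d = composite_difference_sd(sd=3.0, r12=0.21, sd_m=2.082, r_cm=0.760)
print(round(sd_d, 3))
```

Under these inputs the result agrees with the tabled standard deviation of 1.548 to within rounding.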

The standard error of D in this case is of the same form as Equation (5). These formulas characterize the theoretical distribution of difference scores D for a normal population and allow estimation of the percentage of scores in a given sample expected to exceed Kaufman’s value of 3 points. This percentage depends on the correlation matrix of subtest scores.

Using the correlations reported by Kaufman (1979) for the WISC-R, the calculations of Equations (1) to (5) and the predicted number of scores exceeding 3.0 are given in Table 1 for a single subtest, Information, which is a verbal ability subtest. Because all subtests and the mean of the subtests have the same mean, 10, the theoretical mean difference between any subtest and its component mean is zero. The standard deviation of the difference, given as the square root of Equations (2) and (7) in the previous derivations, provides the distributional information needed, under normality, to calculate the number of cases exceeding the value of 3 recommended by Kaufman. For Information the standard deviation of the difference is 1.726. This would result in 8 cases per 100 exceeding a difference of 3 (in absolute value, because Kaufman does not distinguish positive from negative differences). Examining the empirical distribution of differences from the norm sample (Kaufman 1979) yields an empirical estimate of 7.7 cases per 100.

TABLE 1
Calculation of a Single Subtest’s Discrepancy Statistics for the Information Subtest of the WISC-R

$$\sigma_D^2 = \sigma_{x_1}^2 + \sigma_M^2 - 2\rho_{x_1 M}\sigma_{x_1}\sigma_M = 9 + (1/36)(189) - 2(0.82)(3)\sqrt{189/36} = 2.978, \quad \sigma_D = 1.726 \tag{2}$$

$$\rho_{x_1 M} = 0.82^{*} \tag{3}$$

$$\rho_{DD'} = \frac{\rho_{x_1 x_1'} + \rho_{MM'} - 2\rho_{x_1 M}}{2 - 2\rho_{x_1 M}} = \frac{0.85 + 0.94 - 2(0.82)}{2 - 2(0.82)} = 0.417 \tag{4}$$

$$\sigma_{e.D} = 1.726\sqrt{1 - 0.417} = 1.318 \tag{5}$$

Thus a 3-point difference represents a z-score of 3/1.726 = 1.739, with a theoretical prevalence of 8 scores per 100 exceeding 3 points.
*Note: This is the tabled value from Kaufman.

The Paper and Pencil Skill composite, composed of the Mazes and Coding subtests of the WISC-R Performance scale, is analyzed in Table 2, representing a case with two subtests. The standard deviation of the difference is calculated to be 1.548, resulting in an expected 5 of 100 persons exceeding the 3-point value. The empirical number is 5.4 per 100.

TABLE 2
Calculation of a Paired Subtest’s Discrepancy Statistics for the Paper and Pencil Skill Composite of the WISC-R, Composed of Coding and Mazes Subscales Compared with Performance Scales

$$D = (MZ + CD)/2 - M = \bar{X} - M \tag{6}$$

$$\sigma_D^2 = \tfrac{1}{4}\big(\sigma_{MZ}^2 + \sigma_{CD}^2 + 2\rho_{MZ,CD}\sigma_{MZ}\sigma_{CD}\big) + \sigma_M^2 - 2\rho_{\bar{X} M}\sigma_{\bar{X}}\sigma_M = \tfrac{1}{4}\big(9 + 9 + 2(0.21)(9)\big) + (2.082)^2 - 2(2.333)(2.082)(0.760) = 2.397, \quad \sigma_D = 1.548 \tag{7}$$

$$\rho_{\bar{X} M} = \frac{\rho_{MZ,M}\sigma_{MZ} + \rho_{CD,M}\sigma_{CD}}{\sqrt{\sigma_{MZ}^2 + \sigma_{CD}^2 + 2\rho_{MZ,CD}\sigma_{MZ}\sigma_{CD}}} = 0.760 \tag{8}$$

$$\rho_{DD'} = \frac{\rho_{\bar{X}\bar{X}'} + \rho_{MM'} - 2(0.760)}{2 - 2(0.760)} = 0.313 \tag{9}$$

$$\sigma_{e.D} = 1.548\sqrt{1 - 0.313} = 1.283 \tag{5}$$

Thus a 3-point difference represents a z-score of 3/1.548 = 1.937, with a theoretical prevalence of 5 scores per 100 exceeding 3 points.

Table 3 provides the theoretical and empirical percentages of the WISC-R norm sample for a minimum 3-point difference on the various subtests and composites that Kaufman discussed in his text. These results show great variability in the theoretical percentages exceeding 3 points, from virtually 0 to almost 25%, while the theoretical and empirical percentages are generally quite similar.

TABLE 3
Theoretical and Empirical Prevalence Rates for Subtests and Composites of the WISC-R

Subtest or Composite           Difference SD   Theoretical Percentage   Empirical Percentage

1. Verbal Composites
INFORMATION                    1.726            8                        7.7
SIMILARITIES                   1.803            9.7                     10.2
ARITHMETIC                     2.02            13.6                     12.3
VOCABULARY                     1.601            6.1                      6.6
COMPREHENSION                  1.950           12.4                     10.2
DIGIT SPAN                     2.421           21.5                     20.8
VERBAL COMPREHENSION           0.772            0.1                      0.1
VERBAL CONCEPTUALIZATION       0.993            0.3                      0.3
ACQUIRED KNOWLEDGE             0.829            0.4                      0.1
MEMORY                         0.993            0.3                      0.3
ABSTRACT THINKING              1.174            1.1                      1.5
FUND OF INFORMATION            1.154            1.0                      1.1
MENTAL ALERTNESS               1.565            5.5                      4.7
EXTENT OF OUTSIDE READING      0.912            0.1                      0.2

2. Performance Composites
PICTURE COMPLETION             2.143           16.2                     15.4
PICTURE ARRANGEMENT            2.199           17.4                     16.0
BLOCK DESIGN                   1.979            9.7                      8.9
OBJECT ASSEMBLY                1.993           13.1                     13.3
MAZES                          2.283           19.0                     19.3
CODING                         2.587           24.6                     23.8
PERCEPTUAL ORGANIZATION        0.507            0.0                      0.0
SPATIAL ABILITY                1.054            0.4                      0.5
CONVERGENT PRODUCTION          1.546            5.2                      5.3
HOLISTIC PROCESSING            1.377            2.9                      3.1
INTEGRATED BRAIN FUNCTION      0.680            0.0                      0.0
PAPER AND PENCIL               1.548            5.2                      5.4
PLANNING ABILITY               1.413            3.4                      3.0
VISUAL-MOTOR COORDINATION      0.470            0.0                      0.0
VISUAL ORGANIZATION            1.404            3.2                      2.7
V-P OF MEANINGFUL STIMULI      1.104            0.3                      0.1
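The theoretical percentages in Table 3 follow from the difference standard deviations under the normality assumption. The Python sketch below illustrates that step; the function name `prevalence_per_100` is hypothetical, the two-tailed computation reflects the use of absolute differences described in the text, and the four standard deviations are copied from Table 3.

```python
from statistics import NormalDist

def prevalence_per_100(sd_diff, cutoff=3.0):
    """Two-tailed percentage of a normal population with |D| >= cutoff,
    given a mean difference of zero and difference SD sd_diff."""
    z = cutoff / sd_diff
    return 200.0 * (1.0 - NormalDist().cdf(z))

# Difference SDs taken from Table 3.
table3_sd = {
    "INFORMATION": 1.726,
    "DIGIT SPAN": 2.421,
    "CODING": 2.587,
    "PAPER AND PENCIL": 1.548,
}
for name, sd in table3_sd.items():
    print(f"{name:18s} {prevalence_per_100(sd):5.1f}")
```

The same fixed 3-point cutoff corresponds to a different z-score for every row, which is why the expected prevalence ranges so widely across subtests and composites.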

DISCUSSION

The examples and the complete set of data for the WISC-R composites demonstrate what the formulas also say: Kaufman’s 3-point rule for noting important differences is not generalizable across the various subtests and composites defined by Kaufman for the WISC-R, or more recently by Kaufman and Kaufman (1983) and Kamphaus and Reynolds (1987) for the K-ABC. These results do not invalidate the procedure advocated by Kaufman to examine children’s learning styles in terms of performance patterns on specific subtests of standardized intelligence tests. Nevertheless, each subtest defined by Kaufman or others should be empirically and theoretically evaluated. The simplifying assumption made by Kaufman in his initial work does not hold in practice.

APPENDIX

Theorem I

General form of the standard deviation of the difference D between a composite mean C with p subtests and a global mean M containing k subtests (subtests have a standard deviation s).

$$\sigma_D = \sqrt{\sigma_C^2 + \sigma_M^2 - 2\rho_{CM}\,\sigma_C\sigma_M}$$

where

$$\sigma_C^2 = (1/p)s^2 + (2/p^2)s^2\sum_{j=1}^{p}\sum_{i>j}\rho_{x_i x_j}$$

and

$$\sigma_M^2 = (1/k)s^2 + (2/k^2)s^2\sum_{j=1}^{k}\sum_{i>j}\rho_{x_i x_j}$$

Proof: The proof follows directly from the result of the variance of a difference found in any basic statistics book.

Theorem II

General form of the correlation between a composite C with p subtests $x_i$ and its global score M with k subtests (subtests all have a standard deviation equal to s).

$$\rho_{CM} = \frac{s\sum_{i=1}^{p}\rho_{x_i M}}{\sqrt{p s^2 + 2s^2\sum_{j=1}^{p}\sum_{i>j}\rho_{x_i x_j}}} = \frac{\sum_{i=1}^{p}\rho_{x_i M}}{\sqrt{p + 2\sum_{j=1}^{p}\sum_{i>j}\rho_{x_i x_j}}}$$

Proof: Begin with Equation (8) in the text. Create a variable $X = (1/3)(x_1 + x_2) + x_3/3$. Insert the two parts of X into Equation (8) and reduce. This results in the proper equation for 3 subtests. Create a new variable X equal to the sum of p - 1 subtests divided by p plus subtest $x_p$ divided by p. Insert into Equation (8) and reduce.
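The general case can also be checked numerically by a different but algebraically equivalent route: rather than evaluating the closed forms of the two theorems, the sketch below writes D as a weighted sum of the k subtests and computes its variance directly from the correlation matrix. This is offered as a cross-check under stated assumptions, not as the authors' derivation; the function name and the equicorrelated example matrix are hypothetical, and all subtests are assumed to share the standard deviation s.

```python
import math

def difference_sd_general(sd, corr, composite_idx):
    """SD of D = C - M, where C averages the subtests in composite_idx and
    M averages all k subtests; every subtest has the same SD."""
    k = len(corr)
    p = len(composite_idx)
    # D is a weighted sum of subtests: 1/p - 1/k inside the composite, -1/k outside.
    w = [(1.0 / p if i in composite_idx else 0.0) - 1.0 / k for i in range(k)]
    var_d = sum(w[i] * w[j] * corr[i][j] * sd**2
                for i in range(k) for j in range(k))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(var_d, 0.0))

# Hypothetical example: 3 subtests, SD 3, every intercorrelation 0.5.
toy_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(round(difference_sd_general(3.0, toy_corr, {0}), 3))     # single subtest
print(round(difference_sd_general(3.0, toy_corr, {0, 1}), 3))  # two-subtest composite
```

Because the weights already encode the overlap between the composite and the global mean, the composite-global correlation of Theorem II never has to be computed separately in this formulation.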

ACKNOWLEDGMENT:

Paper presented at the American Educational Research Association Annual Meeting, New Orleans, April 1988.


REFERENCES

Kamphaus, R.W., & C.R. Reynolds. (1987). Clinical and research applications of the K-ABC. Circle Pines, MN: American Guidance Service.

Kaufman, A.S. (1979). Intelligent testing with the WISC-R. New York: Wiley Interscience.

Kaufman, A.S., & N.L. Kaufman. (1983). Kaufman Assessment Battery for Children: Administration and scoring manual. Circle Pines, MN: American Guidance Service.

Naglieri, J.A. (1982). “Interpreting the profile of McCarthy Scales Indexes.” Psychology in the Schools, 19, 49-51.

Reynolds, C.R. (1984). “Critical measurement issues in learning disabilities.” The Journal of Special Education, 18, 451-476.

Silverstein, A.B. (1984). “New formulas for evaluating the abnormality of test score differences.” Journal of Psychoeducational Assessment, 2, 79-82.

Silverstein, A.B. (1985). “Unusual differences between domain standard scores on the Vineland Adaptive Behavior Scales: Estimated versus empirical values.” Journal of Psychoeducational Assessment, 3, 291-293.

Willson, V.L., & C.R. Reynolds. (1984a). “Another look at evaluating aptitude-achievement discrepancies in the diagnosis of learning disabilities.” The Journal of Special Education, 18, 477-488.

Willson, V.L., & C.R. Reynolds. (1984b). “Regression effects on part scores based on whole-score selected samples.” Educational and Psychological Measurement, 44, 95-99.

Willson, V.L., & C.R. Reynolds. (1985). “Constructing short forms from composite tests: Reliability and validity.” Educational and Psychological Measurement, 45, 453-468.