Methods of setting standards of performance on criterion referenced tests

© 1989 Pergamon Press

A standard (also known as a passing score, cut score, or pass-fail point) is a form of decision rule applied to test scores. People who score at or above the standard are classified in one way, and people who score below the standard are classified in a different way. Score distributions on most tests are reasonably continuous; there are people at almost all levels of the score distribution. The standard or passing score splits the continuous score distribution into two parts. Providing a justification for splitting the score distribution at some particular point rather than at a higher or lower one is a difficult task. If all the people who deserved to pass obtained perfect scores and if all of the people who deserved to fail obtained scores below the chance level, setting standards would be a trivial task. The fact that the scores of people who deserve to pass and the scores of people who deserve to fail often overlap makes standard setting a difficult, frustrating and ultimately subjective task.
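The decision rule and the effect of overlapping scores can be illustrated with a minimal sketch. The scores, the `deserves_to_pass` labels, and the function names are hypothetical and chosen only for illustration; in practice nobody knows for certain who "deserves" to pass, which is exactly why the task is subjective.

```python
from dataclasses import dataclass


@dataclass
class TestTaker:
    score: int              # observed test score
    deserves_to_pass: bool  # hypothetical label, not observable in practice


def classify(score: int, standard: int) -> str:
    """Decision rule: at or above the standard passes, below it fails."""
    return "pass" if score >= standard else "fail"


def classification_errors(takers, standard):
    """Return (false_passes, false_fails) produced by a given standard."""
    false_passes = sum(
        1 for t in takers
        if classify(t.score, standard) == "pass" and not t.deserves_to_pass
    )
    false_fails = sum(
        1 for t in takers
        if classify(t.score, standard) == "fail" and t.deserves_to_pass
    )
    return false_passes, false_fails


# Invented data in which the two groups overlap between 58 and 64.
takers = [
    TestTaker(48, False), TestTaker(55, False),
    TestTaker(60, False), TestTaker(64, False),
    TestTaker(58, True), TestTaker(62, True),
    TestTaker(67, True), TestTaker(75, True),
]

for standard in (57, 60, 65):
    print(standard, classification_errors(takers, standard))
```

With these invented scores, raising the standard trades false passes for false fails, and no standard classifies everyone correctly.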

There is no rigorous, objective way to set standards. Every method of setting standards depends on subjective, human judgment at some stage of the process. There is no way to prove that any particular standard is correct because the "correctness" of a standard depends on one's values. No matter where a standard has been set, some people will argue that it is absurdly low, making a mockery of education and endangering the public. Others will argue that the same standard is unfairly high, depriving students and society of what rightfully belongs to them. Because all standards depend on subjective value judgments and because different people have different values, it is impossible to avoid controversy when setting standards on an important test. There is no "true" standard that a good method will find. Standards are constructed, not discovered, and the resulting standard will cause errors of classification.

Popular Methods of Setting Standards

Unfortunately, the most widely used methods of setting standards are not the most appropriate ones. The most common methods of setting standards are those based on tradition ("The passing score is 70 percent because the passing score has always been 70 percent") or on power ("The passing score is 80 percent correct because I say so"). More recently, a method of setting standards based on negotiation has become popular. A committee of judges is asked to set a standard and the result is usually a compromise between those who want to set a high standard to maintain "quality" and those who are concerned that such standards are "unrealistically" high. The problem with such popular methods is that they fail to take into account the difficulty of a particular set of test questions used for a particular purpose.

Defensible Methods of Setting Standards

The methods that follow are not perfect. They are based on the subjective judgments of fallible human beings. However, they focus the judges' attention on the appropriate factors and pool the judgments across a number of people to obtain more reliable estimates of the resulting standard. Judgments can be made about the questions in tests or about the people who take the tests. Portions of the descriptions of standard setting methods have been adapted from the manual Passing Scores by Livingston and Zieky (ETS, 1982).

Methods Based on Judgments of Questions

The three methods to be described are known as the Angoff (1971), Ebel (1972) and Nedelsky (1954) methods, after the people who first described them in print. The three methods are based on the concept of the "borderline" test-taker. This test-taker is the one whose knowledge and skills are on the borderline between the people who deserve to pass and the people who deserve to fail. These methods are based on the idea that, since the test-takers who deserve to pass tend to earn higher scores than those who deserve to fail, the standard should be the score that would be expected from a person whose skills are on the borderline.
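The idea of "the score that would be expected" from a borderline test-taker can be made concrete with a small sketch. It follows the commonly described form of Angoff's method, in which a judge estimates, for each question, the probability that a borderline test-taker would answer correctly; the expected number-correct score is then the sum of those probabilities. This article does not spell out that procedure, so the function name and the probabilities below are assumptions made for illustration only.

```python
def expected_borderline_score(item_probabilities):
    """Expected number-correct score for a borderline test-taker.

    Each entry is one judge's estimate of the probability that a
    borderline test-taker answers that question correctly.
    """
    for p in item_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(item_probabilities)


# Hypothetical estimates from one judge for a five-question test.
judge_estimates = [0.9, 0.75, 0.5, 0.6, 0.25]
score = expected_borderline_score(judge_estimates)
print(f"Expected borderline score: {score:.2f} of {len(judge_estimates)}")
```

Because the estimates are tied to specific questions, a harder set of questions yields lower probabilities and therefore a lower expected score.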

The judgments these methods require are made in terms of the specific questions on the test. The methods are relatively convenient and can be applied either before or after the test is administered. In addition, the process of making judgments about test questions focuses the judges' attention closely on the content of the test. Most important, the necessary data (judgments about test questions) can nearly always be obtained.

Each of these methods consists of five basic steps:

1. Select the judges;
2. Define "borderline" knowledge and skills;
3. Train the judges in the use of the method;
4. Collect judgments;
5. Combine the judgments to choose a passing score.

The first two steps are the same for all methods. The remaining steps differ. A detailed description of the three methods is presented in the articles referred to above.
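The final step, combining the judgments, can be sketched as follows. The article says only that judgments are pooled across judges to obtain a more reliable estimate. Taking the mean of the judges' individual passing scores is an assumption made here for illustration, and the median is shown as one alternative. The judge labels and scores are invented.

```python
from statistics import mean, median, stdev


def combine_judgments(judge_scores):
    """Pool the passing scores proposed by individual judges.

    Returns the mean and median as candidate standards, plus the
    standard deviation as a rough indication of disagreement.
    """
    scores = list(judge_scores.values())
    if len(scores) < 2:
        raise ValueError("at least two judges are needed to pool judgments")
    return {
        "n_judges": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "stdev": stdev(scores),
    }


# Hypothetical passing scores, one per judge, on a 50-question test.
judge_scores = {"judge_a": 31.0, "judge_b": 34.5, "judge_c": 29.5, "judge_d": 33.0}
summary = combine_judgments(judge_scores)
print(summary)
```

A large spread among the judges may suggest that they do not share a common picture of the borderline test-taker.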

The second step is to have the judges discuss and define what is meant by "borderline" performance. It is crucial that all the judges understand what the test measures and how the test scores are to be used. Then the judges should be asked to describe a person whose knowledge and skills would represent the borderline between acceptable and unacceptable levels of the knowledge and skills the test measures. The judges may find it convenient to describe the performance of specific people they have worked with, whom they would classify as "borderline".

References

Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.), Educational Measurement. Washington, D.C.: American Council on Education, pp. 514-515. (Source document for Angoff's method).

Ebel, R.L. (1972). Essentials of Educational Measurement. Englewood Cliffs, N.J.: Prentice-Hall, pp. 492-494. (Source document for Ebel's method).

Livingston, S.A. and Zieky, M.J. (1982). Passing Scores. Princeton, N.J.: Educational Testing Service. (Portions of this paper were adapted from the manual).

Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, Vol. 14, No. 1, pp. 3-19. (Source document for Nedelsky's method).

The Author

MICHAEL ZIEKY is a Principal Measurement Specialist at Educational Testing Service in Princeton, New Jersey. His major responsibilities include test development, quality assurance, and education of professional staff members.