FILLER

FILLER


Intraclass correlation coefficients (ICCs) allow the researcher to calculate a reliability coefficient in these studies. One criterion that must be satisfied in order to use the ICC is that the behavior to be rated must be normally distributed. ICCs are computed with repeated-measures analysis of variance. ICCs also can be used for test-retest reliability and internal consistency reliability. An advantage of ICCs is that if the judges are selected randomly from a larger group of potential raters, the researcher can generalize the interrater reliability beyond the specific judges who took part in the reliability study.

Kappa. A method of calculating ICCs when the data are nominal is the κ statistic, which can be computed with two or more raters. The κ statistic also corrects for chance agreement. For an excellent review of ICC-type methods, including κ, see Bartko and Carpenter (1976).
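To make these computations concrete, the following is a minimal sketch (the function names and toy data are our own illustration, not part of the original column). It computes one common two-way random-effects form of the ICC directly from the repeated-measures ANOVA mean squares, and the two-rater form of κ as chance-corrected agreement:

```python
import numpy as np

def icc_2_1(ratings):
    """Two-way random-effects ICC (absolute agreement, single rater).
    `ratings` is an (n_subjects, k_raters) array of scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from the repeated-measures ANOVA decomposition.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # raters
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def cohens_kappa(rater1, rater2):
    """Two-rater κ: observed agreement corrected for the agreement
    expected by chance, given each rater's category base rates."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    p_obs = (r1 == r2).mean()                  # observed agreement
    p_exp = sum((r1 == c).mean() * (r2 == c).mean()
                for c in np.union1d(r1, r2))   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical example: 6 children rated by 3 judges, plus two
# raters assigning a nominal yes/no code to 4 cases.
scores = [[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2], [10, 5, 6], [6, 2, 4]]
print(icc_2_1(scores))
print(cohens_kappa(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
```

Note that the two-rater form of κ shown here is the simplest case; extensions for more than two raters exist, as the column notes.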

Generalizability Theory, Item Response Theory, and Reliability

The methods that we have discussed to assess reliability are based on classical test theory. A major problem with classical test theory is that measurement error is treated as a single entity, which does not give the researcher the information necessary to improve the instrument. Generalizability theory, an extension of classical test theory, allows the investigator to estimate the different components of measurement error more precisely. In its simplest form, this theory partitions the variance that makes up an obtained score into variance components, such as variance attributable to the participants, to the judges (observers), and to the items. Item response theory allows the researcher to separate test characteristics from participant characteristics. It differs from both classical test theory and generalizability theory by providing information about reliability as a function of ability, rather than averaging over all ability levels. Nunnally and Bernstein (1994) and Strube (2000) provide more complete discussions of these topics.
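To make the partitioning concrete, the following is an illustrative sketch in standard notation (the symbols are ours, not the column's). For the simplest fully crossed persons-by-raters design, generalizability theory decomposes observed-score variance into components and forms a generalizability coefficient, while item response theory lets measurement precision vary with ability:

```latex
% Persons (p) crossed with raters (r); \sigma^2_{pr,e} is the
% person-by-rater interaction confounded with residual error.
\[ \sigma^2_X = \sigma^2_p + \sigma^2_r + \sigma^2_{pr,e} \]

% Generalizability coefficient for relative decisions when scores
% are averaged over n_r raters: the error term shrinks as raters
% are added, which shows how to improve the measurement.
\[ E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r} \]

% Item response theory: the test information function I(\theta) is
% the sum of the item informations, so the standard error of
% measurement depends on ability \theta rather than being a single
% averaged value.
\[ I(\theta) = \sum_i I_i(\theta), \qquad
   SE(\theta) = \frac{1}{\sqrt{I(\theta)}} \]
```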

SUMMARY ON RELIABILITY METHODS

We have discussed different methods of assessing reliability. If one uses a standardized instrument, reliability indices should have been established and published in an instrument manual, often referred to in the original journal publication. The method section of a research article also should provide evidence about the reliability of the instrument. Although each method of assessing reliability gives some measure of consistency, the methods are not interchangeable. To say simply that an instrument is reliable has relatively little meaning; each statement of reliability should specify the type(s) of reliability, the strength of the reliability coefficient, and the types of subjects used. Before using an instrument, an investigator should evaluate its reliability and how that reliability was established. Measurement reliability is a necessary but not sufficient prerequisite for measurement validity, the topic of the next column.

REFERENCES

Bartko JJ, Carpenter WT (1976), On the methods and theory of reliability. J Nerv Ment Dis 163:307–317

Nunnally JC, Bernstein IH (1994), Psychometric Theory, 3rd ed. New York: McGraw-Hill

Strube MJ (2000), Reliability and generalizability theory. In: Reading and Understanding More Multivariate Statistics, Grimm LG, Yarnold PR, eds. Washington, DC: American Psychological Association, pp 23–66

The next article in this series appears in the June 2001 issue: Measurement Validity • George A. Morgan, Jeffrey A. Gliner, and Robert J. Harmon
