Handbook of Statistics, Vol. 26 ISSN: 0169-7161 © 2007 Elsevier B.V. All rights reserved DOI: 10.1016/S0169-7161(06)26044-9

Future Challenges to Psychometrics: Validity, Validity, Validity

Neal Kingston

Any realtor will tell you the three most important factors contributing to the value of a property are location, location, and location. Similarly, the three most important factors contributing to the value of a testing program are validity, validity, and validity. Unfortunately, it is relatively easy to develop sophisticated models that help us reduce estimation error by a few percent, and so the vast majority of recent psychometric research has focused on more accurately modeling and assessing the same constructs we have always measured rather than breaking new ground for better assessments. What is needed is more attention to the confluence of psychometrics, cognitive psychology, developmental psychology, learning theory, curriculum, and the other theoretical bases of our assessment blueprints. It is not enough to develop parsimonious models – we need models that support appropriate actions. For example, most truly diagnostic tests currently require individual administration – we need strong psychometrics to support group-administered tests that not only provide accurate diagnoses but also provide valid prescriptions leading to demonstrable educational improvement.

Snow and Lohman (1989) presented the implications of cognitive psychology for educational measurement, but there has been little groundswell in the last decade and a half to address the issues they raised. A review of the table of contents of Volume 69 (2004) of Psychometrika showed one pertinent article (De La Torre and Douglas, 2004) out of 34. A similar review of Volume 42 (2005) of the Journal of Educational Measurement showed one pertinent article (Gorin, 2005) out of 17. Susan Embretson and Kikumi Tatsuoka (to name but two) have produced important bodies of research in this area, but much more is needed.

Stephen Jay Gould (1989) posited that the Burgess Shale demonstrates a much greater diversity of life in the Cambrian Period than exists today, and that many families and species of that period were evolutionary dead ends with no living progeny. Although his theory is controversial within evolutionary biology, a parallel with item types is easier to demonstrate. In the early part of the 20th century there were far more item types used in large-scale assessment than are used currently. The need for efficient scoring (originally by hand using stencils, but eventually by high-speed optical scanning machines) led to an evolutionary dead end for many item types that might have effectively measured many constructs of interest. Although there has been a resurgence in the use of constructed-response items in large-scale assessment in the last decade (including but not limited to direct measures of writing), most tests today look very much like tests from a half century ago.

The use of computers allows for the efficient administration and scoring of a large variety of item types – including those that previously were seen as evolutionary dead ends, as well as new item types that are much more dynamic in nature. One noteworthy example is the work of Bejar and Braun (1999) on the simulation of architectural practice and the knotty problems associated with scoring such novel item types. Much work in this area has also taken place in the field of medical testing, for example, Clauser et al. (1997).

One challenge we face in all of this stems from the depth and breadth of knowledge required for a quantum advance in the field of educational testing. It is sufficiently challenging for most of us to keep up with one narrow field, let alone with many. Collaboration within interdisciplinary teams is another, perhaps more reasonable, approach. Regardless of the challenges, the potential positive impact from advances in these areas is immense.

References

Bejar, I.I., Braun, H.I. (1999). Architectural simulations: From research to implementation. Final report to the National Council of Architectural Registration Boards (RM-99-2). Educational Testing Service, Princeton, NJ.

Clauser, B.E., Margolis, M.J., Clyman, S.G., Ross, L.P. (1997). Development of automated scoring algorithms for complex performance assessments: A comparison of two approaches. Journal of Educational Measurement 34, 141–161.

De La Torre, J., Douglas, J.A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika 69 (3), 333–353.

Gorin, J.S. (2005). Manipulating processing difficulty of reading comprehension questions: The feasibility of verbal item generation. Journal of Educational Measurement 42 (4), 351–373.

Gould, S.J. (1989). Wonderful Life: The Burgess Shale and the Nature of History. Norton and Company, New York.

Snow, R.E., Lohman, D.F. (1989). Implications of cognitive psychology for educational measurement. In: Linn, R.L. (Ed.), Educational Measurement, 3rd ed. American Council on Education/Macmillan, New York, pp. 263–331.