INTELLIGENCE 6, 231-240 (1982)

EDITORIAL

Towards New Ways of Assessing Intelligence*

EARL HUNT
The University of Washington

Whether for good or evil, intelligence testing is a major application of psychological knowledge. Any application of science should be reviewed periodically to ensure that it is based on the best science and makes use of the best technology. This article is such a review. It contains proposals for new intelligence tests. The proposals are based on three developments of the past 15 years: substantial advances in our understanding of cognition, the development of new techniques in technical psychometrics, and the tremendous explosion in the field of electronic gadgetry.

The proposals here are different in kind from proposals that my colleagues and I made some 10 years ago (Hunt, Frost, & Lunneborg, 1973). Hunt et al. argued that research on individual differences should be connected to a particular set of ideas about human information processing that were well developed by the late 1960s. We carefully did not propose the development of a "new intelligence test" as an exercise in applied research. The present proposal, by contrast, is an outline of an applied research project that might conceivably be carried out in the near future. Although it has been reported since publication of our earlier paper (and certainly includes the ideas of many people outside our own group!), the proposal is also based on work that has been done in other fields of psychology and psychometrics since about 1970.

Two classes of proposals will be made: leaps and steps. The leaps are major changes in the sort of information that might be used to assess a person's mental abilities. Leaps, then, will deal with areas of intellect that are not assessed, or are assessed only nominally, by present-day testing techniques. The steps are modifications that could be made in present testing methods, in order to make them more efficient, without markedly changing the content of the psychological functions being evaluated.
Both steps and leaps are necessary to progress, but the immediate implications of a step are always much more apparent than the implications of a leap. Given the progress in psychology from 1960 to 1980, there should be both steps and leaps until 2000. Since this metaphor is becoming unstable, it is perhaps best to leave it and continue the argument by an examination of history.

*Preparation of this paper was supported by the Office of Naval Research, Contract N00014-80-0631; Earl Hunt and Marcy Lansman, Principal Investigators. I would like to thank Marcy Lansman and Robert Sternberg for their comments on an earlier version of this paper, and for their general comments on this topic.


Any method of assessment outside of witchcraft must be a compromise among three things: the science on which the method is based, the ability of the assessment method to achieve practical success as a measuring device, and the technology available to support the measuring process.

Consider the birth of the intelligence test itself. The early anthropometrists believed that mental competence was directly tied to physical characteristics, and devoted considerable effort to developing a technology for measuring brain size. These measures grossly failed the test of practical prediction (Gould, 1981). Galton tried to improve on anthropometric measures by measuring performance that he thought was indicative of the state of the nervous system. This is an example of a step, and it, too, failed. Binet (who had participated in Galton's attempts) then took a leap; he redefined intelligence as a more global capacity that (a) increased as people developed toward adulthood, and (b) manifested itself in broad, complex tasks that the culture defined as cognitive. As is well known, Binet's procedures passed the test of practical prediction.

But what happened next? The new tests and their many successors became the definition of intelligence. A very conservative tradition arose. A test is a measure of intelligence to the extent that it correlates with already accepted tests. That tradition needs some careful examination. Conservatism has a great deal to commend it to applied psychology, including education. Over the years practitioners have developed both intuitions and a body of statistical knowledge about the meaning of scores on various widely used tests. Any step intended to improve assessment should maintain contact with this knowledge. This is especially true if new tests that result from the step are to be used to make decisions that affect people's lives. The introduction of adaptive testing techniques provides a good example.
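The essential mechanics of adaptive testing, choosing each item from the examinee's responses so far in order to converge on a maximum level of competence, can be sketched in a few lines. The following is a minimal illustrative staircase rule, not the procedure of any particular testing program, and the simulated examinee is invented: each correct response raises the difficulty of the next item, each error lowers it, and the step size shrinks at every reversal.

```python
def adaptive_test(answer_item, n_items=20, start=0.0, step=1.0):
    """Staircase item selection: raise the difficulty after a correct
    response, lower it after an error, and halve the step at each
    reversal so the sequence converges on the examinee's level."""
    difficulty = start
    last_correct = None
    for _ in range(n_items):
        correct = answer_item(difficulty)
        if last_correct is not None and correct != last_correct:
            step /= 2  # a reversal narrows the search
        difficulty += step if correct else -step
        last_correct = correct
    return difficulty

# A simulated examinee whose true competence level is 3.2: items at
# or below that difficulty are answered correctly, harder ones missed.
estimate = adaptive_test(lambda d: d <= 3.2)
```

Twenty items suffice here because each reversal halves the step; a conventional test with fixed items would need many more, spread over the whole difficulty range, to pin down the level as precisely.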
In adaptive testing, items are presented under real time computer control. The items are selected on the basis of responses to previous items, and an attempt is made to locate, quickly, a person's maximum level of competence on the psychological function being tested. An adaptive test differs from a conventional test in that the items presented to a person are chosen during the testing session itself, rather than being chosen at some time prior to the examination. (In this sense, individual tests, such as are used in clinical practice, have an adaptive component.) However, the same estimate of ability ought to be obtained by an adaptive and a conventional test.

Although the conservative strategy of defining intelligence by the present day test is frequently applied in scientific research as well as in applied psychology, it has less to recommend it there. Consider, for example, the typical challenge to many of the newer approaches to evaluating individual mental competence: "The correlation with the intelligence test was only .3." Such a statement is clearly an adoption of Boring's (1923) amazing statement that "intelligence is what the test measures," and begs the point that the new evaluation method may have excellent theoretical justification as an important measure of mental capacity quite apart from its correlation with any test now in use. The purpose of a leap in the study of individual differences is to expand the definition of individual mental competence. Indeed, if a leap produced a test that had a correlation of 1 with previous tests, it would be a leap straight into the air. One does not cover ground that way.

How should research intended to expand our ideas of intelligence be judged? The research must be successful in locating dimensions of individual variation that are interpretable within the context of a theory about cognition. The psychometric criteria of reliability and internal consistency are always applicable. Once such dimensions have been demonstrated, it is appropriate to ask whether the technology exists for incorporating the measures into our assessment procedures, and, if it does, to ask what the practical implications of the incorporation would be. These are the issues to be addressed.

THE NEW DEVELOPMENTS

What new developments make expansion of assessment possible? It will hardly be a surprise that one of them is the development of microcomputers. From the viewpoint of the examinee, a standard station in the 1990s should consist of a high resolution color display (considerably better than today's television screens), a keyboard, and some form of analog motor response device, such as a joystick. This is the sort of station that would be used to test children from age 10 upwards, and would certainly be expected by examinees participating in large scale screening exercises, such as the Scholastic Aptitude Test (SAT) or the Armed Services Vocational Aptitude Battery (ASVAB). Given the role of the Atari Company in our society, it seems reasonable to assume that the average 14-year-old will be able to manipulate such a terminal with a minimum of instruction. More advanced testing stations will probably be available for use in individual testing.
They may expand on the basic testing station in three ways: by the attachment of physiological measuring devices to the examinee, by providing a lightweight cap to monitor eye movements and (perhaps) pupil dilation, and by providing more complex response apparatus. Some form of tracking device is an obvious example. In specialized situations a person might be asked to manipulate several implements to solve a problem (e.g., put together something akin to a toy truck) while the computer monitored the process. As such stations will certainly be specialized for particular situations, little will be said about them here.

Each testing station will be part of a network consisting of similar stations and a very large mass storage device. Networking is important because the computer will be used to tailor the test to the examinee and the purpose of the examination. To do this the computer must have access to a data base defining potential questions and testing procedures. None of these ideas are "blue sky": the required technology either exists or could be made to exist today. The problem is strictly one of bringing the equipment down to a reasonable cost, say, $500 in 1982 dollars. This will be easy. There is a harder question. What would we do with the equipment?

ANTICIPATED STEPS IN TESTING

During the last 10 years several psychological tests presently in use have been subjected to intensive analysis. This research, which is generally referred to as "componential analysis," was pioneered by Sternberg (1977a, 1977b; 1980), although others have been involved (Pellegrino & Glaser, 1979). Componential analysis isolates the information processing steps required to solve various types of problems. There are basically two ways to do this. One is to present problems in stages, and to observe the time required by the examinee in each phase. (This was the original method used by Sternberg.) An alternative method is to select problems that stress one or the other component stage.

Pellegrino, Cantoni, and Solter's (1981) modification of the Minnesota Form Board test provides an excellent example. In this test examinees must fit geometric patterns together in a specified configuration. It is a sort of simplified jigsaw puzzle. By varying the location of patterns in the initial configuration, Pellegrino et al. manipulated the amount of movement "in the mind's eye" that was required to place a particular pattern into its final slot. For instance, a pattern could be placed upside down relative to its orientation in the final configuration. The complexity of the pieces could be manipulated independently of their orientation. By analyzing errors and reaction times on individual items, Pellegrino et al. could obtain a picture of the difficulty that a person had with each component of the spatial task. Thus a test such as that developed by Pellegrino et al. is simply a better examination of "spatial ability" than such stalwarts as the original Form Board test or the Primary Mental Abilities "spatial" test.

Componential analyses have been conducted of tests of logical reasoning, inductive reasoning, and spatial visualization. This work was motivated by a concern for individual differences.
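The arithmetic behind such an analysis of errors and reaction times is ordinary linear regression of latencies on item parameters. The sketch below is illustrative only: the item parameters, the simulated examinee, and the millisecond costs are invented, not taken from Pellegrino et al.'s materials.

```python
import numpy as np

# Hypothetical form-board items: number of rotations needed to orient
# a piece, and a 0/1 flag for piece complexity (both invented here).
rotations = np.array([0, 1, 2, 3, 0, 1, 2, 3])
complexity = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Simulated latencies for one examinee: 400 ms base time, 150 ms per
# rotation, 250 ms extra for complex pieces, plus measurement noise.
rng = np.random.default_rng(0)
latency = 400 + 150 * rotations + 250 * complexity + rng.normal(0, 10, 8)

# Regressing latency on the item parameters recovers the examinee's
# cost for each processing component.
X = np.column_stack([np.ones(8), rotations, complexity])
base, rotation_cost, complexity_cost = np.linalg.lstsq(X, latency, rcond=None)[0]
```

Comparing `rotation_cost` across examinees would indicate who is slowed by mental rotation specifically, rather than by the task as a whole.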
There is also a very large body of research that could be used to construct componential analyses of other types of tests, although the original research was not motivated by a concern for individual differences. Detailed models of the process of story comprehension have been developed by Kintsch and his co-workers (see, especially, Kintsch & van Dijk, 1978). These models could be used to construct paragraph comprehension tests in which individual items tested different aspects of the comprehension process. Vocabulary tests could be constructed to evaluate a person's ability to respond to specific dimensions of word use, as in Gentner's (1975) analysis of the use of verbs of transfer by young children. Models of arithmetical reasoning (Brown & Burton, 1978; Hitch, 1978) could be used to construct tests that go beyond testing the level of a person's skill to testing the manner in which the skills are used. There exist today psychological models of complex processes that could be used to improve virtually every scale in the present battery of intelligence tests. The only advantage that the present tests have over such improved tests is the existence of "normative data," much of which is in need of updating anyway. Why not conduct the updating with better tests? There is really no reason why the step of redesigning items, within the framework of the present tests, should not begin immediately.

Computer controlled testing will make new item formats possible. Testing of spatial-visual reasoning can be expanded to include reasoning about moving objects. This is important because many of the performances that we wish to predict by use of these tests are situations in which movement is involved. A somewhat more ambitious possibility involves reasoning about the large scale environment. Given a terminal with a color display, it would be possible to give a person a brief visual "ride" through an imaginary town, and then ask questions about the location of items in it.

Verbal tests can be extended to include testing of comprehension of speech. This would give the examiner an opportunity to evaluate discrepancies between language comprehension in the listening and reading modes. While such discrepancies are typically small in normal populations, there are clinically important situations in which either spoken comprehension exceeds written comprehension (as in dyslexia), or the reverse, as may occur in old age (Cohen, 1981).

Reaction time analysis can be combined with selective presentation of information in order to reveal a good deal about the strategy a person is using on a particular problem. The best examples are found when information is presented bit by bit, and the examinee has to adjust to the information as it arrives. Several studies, in very different fields, have noted that one of the characteristics of good problem solvers is that they take a substantial amount of time to deal with the information that defines a problem, and then proceed to the solution, whereas poorer problem solvers move through the initial information quickly, in order to examine the alternative answers available (Bloom & Broder, 1950; Dillon & Stevenson-Hicks, 1981; Sternberg, 1979).
Given ingenuity, touch sensitive screens that make it easy for examinees to ask for hints and partial information, and reaction time measurement, it should be possible to design tests that tell us how, as well as how well, people respond to the problems now presented on intelligence tests.

Although the applied phase of research on all these topics can begin now, the difficulty of that phase should not be underestimated. Our present-day tests have been subjected to a sort of Darwinian selection that has ensured their psychometric properties. In particular, a great deal of care has been taken to ensure that reference tests that are supposed to be markers for a factor are reliable and unidimensional (Ekstrom, French, & Harman, 1979). Psychometric unidimensionality does not ensure that only a single psychological function is being tested, but it does ensure that, at least, all items on a test are being approached by use of a strategy that emphasizes each of the underlying functions in roughly the same way (Whitely, 1980). This is an important point. If a test is to be used to make a statement about people's relative standing on some underlying dimension of ability, there must be some assurance that the test does indeed measure the same function in all people. Regardless of the theoretical purity of new tests, the prosaic but necessary studies must be done to ensure that they do meet appropriate psychometric criteria for internal consistency.

LEAPS IN SOME DIRECTIONS

Suppose that all steps were to be taken to improve present assessment procedures as much as possible. What expansions of content areas would be advisable? Listing possible new measures could degenerate into a researcher's Christmas list. Only three extensions will be presented here. No doubt others could make a good case for different extensions. That is part of my point. In order to break out of the mold of thinking of intelligence as what the test measures, numerous expansions of testing should be considered. The three areas to be described now are my particular choices. I hope others will be proposed and debated.

Computerized testing brings the technology of experimental psychology into psychometrics. The last 20 years have seen a great deal of expansion in this field, under the general rubric of "Human Information Processing." This could mean almost anything; a somewhat more descriptive rubric is "real time information processing." This term stresses the study of those psychological functions involved in making rapid decisions in situations where the environment is very demanding. The prototype example of a task with a high real time demand is the Choice Reaction Time (CRT) task, in which a person must choose one of n responses literally in a split second. The real time demands of such tasks are substantially greater than those of even a timed psychometric test, where the examinee can control at least the extent of pauses between items. On the other hand, the psychometric test has more real time demands than, say, doing homework or preparing a report. Since all our thinking ultimately has to be done in real time, some real time information processing abilities must underlie all intellectual functioning, including the taking of intelligence tests.
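Administering a CRT item on a computer-controlled station reduces to timestamping a stimulus and the response to it. A minimal sketch follows; the `respond` callback is invented for illustration, standing in for whatever keyboard or response-panel input a real testing station would read.

```python
import random
import time

def choice_rt_trial(alternatives, respond):
    """Present one of n alternatives and time the examinee's choice.
    Returns (was_correct, reaction_time_in_seconds)."""
    stimulus = random.choice(alternatives)
    start = time.perf_counter()
    answer = respond(stimulus)  # the examinee's choice
    reaction_time = time.perf_counter() - start
    return answer == stimulus, reaction_time

# A perfectly accurate simulated respondent:
correct, rt = choice_rt_trial(["a", "b", "c", "d"], lambda s: s)
```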
Therefore, individual abilities in the execution of such real time processes as word identification or elementary choice making should be correlated with test scores. Indeed they are, but the correlations are in the .3 to .4 range, only slightly higher than the correlations found between psychometric test scores and measures of "real world" performance outside of the academic environment (Jencks et al., 1979). Elsewhere I have argued that real time processing abilities should be regarded as supporting functions for complex thought. That is, a person must have certain facilities for symbol identification, short term memory, choice behavior, and the like in order to do well on complex tests, but these abilities are usually not the limiting factors on test performance (Hunt, 1980). Thus, there is little purpose in using the cumbersome RT measures as substitutes for present tests, although in specialized cases they may serve as useful adjuncts. (See, for instance, Cohen's [1979, 1981] analyses of how deterioration of underlying information processing capacities may affect verbal comprehension in old age.)


What is far more interesting is the use of the technology of experimental psychology to determine a person's ability to deal with situations involving information overload. The abilities involved have variously been called "time sharing," the ability to "split attention," or the ability to "concentrate." While the picture is by no means clear, there is enough information to indicate that individual differences in the ability to control attention can be measured. There also are bits and pieces of information in the literature that indicate that such abilities may be important in extra-laboratory decision making that involves time pressure (Cooper & Regan, 1982). The usual examples cited are driving cars, flying airplanes, or acting as an air traffic controller.

To indicate how these abilities might be tested in a psychometric situation, I shall describe, briefly, an as yet unpublished study in our own laboratory (Poltrock, Lansman, & Hunt, 1981). Participants had to detect the occurrence of one of a preselected set of signals, e.g., by pressing a button whenever the letter "a" was flashed on a computer screen. The detection task could be done under one of three conditions. In the "single" condition, the participant simply watched the screen as letters appeared on it, and indicated when a target letter appeared. In the "split attention" condition, two letters appeared, one on each side of the screen. The subject now had to scan two possible information channels for the appearance of a target. In the "selective attention" condition the subject had to watch for the appearance of a letter in one channel, while disregarding letters in the other channel. Thus the experiment, as described so far, contained a base line condition, a condition in which attention had to be split over two channels, and a condition in which attention had to be concentrated on one channel while information on another channel was shut out.
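Whether performance in such conditions reflects one pool of attentional resources or several is, at bottom, a question about the correlations among the resulting scores. The toy simulation below (all loadings, sample sizes, and the factor correlation are invented for illustration) shows the telltale pattern: when two correlated, modality-specific factors underlie the tasks, scores correlate more strongly within a modality than across modalities.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # simulated examinees

# Two correlated latent abilities, e.g. auditory and visual attention.
factor_corr = np.array([[1.0, 0.6], [0.6, 1.0]])
latent = rng.multivariate_normal([0.0, 0.0], factor_corr, size=n)

def three_tasks(ability):
    # baseline, divided, and selective attention tasks in one modality,
    # each loading 0.8 on that modality's factor plus unique noise
    return [0.8 * ability + rng.normal(0.0, 0.6, n) for _ in range(3)]

auditory_scores = three_tasks(latent[:, 0])
visual_scores = three_tasks(latent[:, 1])

within = np.corrcoef(auditory_scores[0], auditory_scores[1])[0, 1]
across = np.corrcoef(auditory_scores[0], visual_scores[0])[0, 1]
```

A single-factor model would force `within` and `across` to be equal; the gap between them is what allows a one-factor account to be rejected.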
The entire experimental design was replicated in the auditory mode; words were presented either to one or both ears, and the subject had to listen for targets under the base line, divided attention, or selective attention conditions. Performance was very well described by a model that contained two factors, auditory and visual attention, with identical loadings on each factor for each of the three tasks in the appropriate mode. The factors themselves were correlated rather highly (.6), but a single factor model of attentional resources could be rejected decisively. Evidently there is no "a" analogous to Spearman's "g," but there may be an "h" (for hearing) and an "l" (for looking).

To what extent does the pattern of performance in these simple attentional tasks predict performance in more complex situations? We do not know. What we do know, though, is that the control of attention is part of any reasonable definition of intelligence, and that the new technology makes possible more complex evaluations of attentional control than were possible using either the paper and pencil format or the human interaction characteristic of traditional intelligence tests. What we must do is learn how to use the technology to advantage.

In our studies, response production was almost trivial. All the participant had to do was press a button. Similarly trivial responses are required in conventional psychometric testing. The advent of microcomputer controlled testing terminals makes possible the evaluation of individual differences in the ability to think while making complex motor responses. This is a virtually untapped area of research. Traditionally, the making of responses has been considered "motor control," an area almost remote from "intelligence." It is not at all clear why this should be so. A complete evaluation of mental competence ought to consider pure mental performance, the ability to make detailed responses, and the interaction between the two. Many tasks that ought to require intelligence, such as driving an automobile, involve precisely such an interaction.

Finally, let us consider a very different expansion of mental assessment. An implicit assumption of most modern testing is that people who think well, or poorly, in one situation will perform similarly in another. To some extent this must be true, or the tests would not have any predictive value. On the other hand, there is ample evidence that one of the most frequent causes of failure in problem solving is the failure to see a particular situation as one in which some form of reasoning is appropriate. The situation is epitomized by the well known finding that people may demonstrate syllogistic reasoning about familiar material, yet fail to solve analogous problems presented in abstract notation (Wason & Johnson-Laird, 1972). Cross cultural research has shown that this is quite a widespread finding. It is perhaps a consequence of our own cultural setting, as Western academics, that we tend to think of "thinking" as a set of abstract algorithms that can be specialized for different content areas. In fact, the more natural human mode may be to regard thinking as a set of mental skills that are tied to specific content areas. This idea is certainly not new to psychometricians. It is prominent in Guilford's (1967) structure of intellect theory, and in his criticism of studies that assume that reasoning is a relatively content free process (Guilford, 1982).
An alternative to arguing either for or against the generality of "reasoning abilities" is to regard the tendency to apply abstract reasoning to problems as itself a dimension of intellectual performance. Intuitively, people seem to vary in the extent to which they think of mental skills as general algorithms or as procedures appropriate in a limited context. Can this intuition be verified, and can the ability be measured?

Somewhat surprisingly, the same principle applies to the use of knowledge. More than 30 years ago, Bloom and Broder (1950) observed that poor college students would give up on a question if they did not know the answer directly, while better students would reorganize their knowledge in order to answer the question. The difference between the good and poor students was not so much in the knowledge they had but in the way that they used it. Similar findings have been observed by Potts et al. (1981) in a more controlled setting.

Given modern programming techniques, it should be possible to construct a very large set of analogically similar problems, drawn from different content fields. A person could be asked to indicate fields in which he or she felt competent, be tested in those fields, and then progressively be retested in unfamiliar content areas. The testing should not be limited to determining the extent to which a person either has knowledge of or reasons about a given content area. After having noted a deficiency, the testing program should provide a variety of hints and analogies, in order to evaluate the examinee's ability to acquire new ways of approaching problems. Certainly flexibility of thinking should be a part of our definition of intelligence.
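The testing procedure just described reduces to a simple loop: present an analogous problem from each content field, and when the examinee fails, count how many hints are needed before the reasoning transfers. The interface below is entirely hypothetical; `solve` and `give_hint` stand in for whatever problem-presentation machinery a real testing station would use.

```python
def assess_transfer(problems_by_field, solve, give_hint, max_hints=3):
    """For each content field, record how many hints the examinee
    needs before solving an analogically similar problem (0 means
    the reasoning transferred unaided)."""
    hints_needed = {}
    for field, problem in problems_by_field.items():
        hints = 0
        while not solve(problem, hints) and hints < max_hints:
            give_hint(problem, hints)
            hints += 1
        hints_needed[field] = hints
    return hints_needed

# Simulated examinee: solves the familiar problem at once, and the
# unfamiliar one only after two hints.
profile = assess_transfer(
    {"familiar": "easy", "unfamiliar": "hard"},
    solve=lambda problem, hints: problem == "easy" or hints >= 2,
    give_hint=lambda problem, hints: None,
)
```

A profile that is flat across fields would suggest reasoning treated as a general algorithm; one that jumps in unfamiliar fields would suggest content-bound skills.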

THE PROBLEMS THAT REMAIN

After taking steps and leaps in new directions, a useful precaution is to ask yourself what you have stepped over and leaped around. Put another way, are there any areas of intellectual endeavor that are not tapped by present tests, and that are not likely to be tapped by the tests that the new technology will make possible? The new technology is suited for the examination of intellectual functioning that is more time bound than the present test. What about those functions that are less time bound? Neither the old nor the new tests would tap individual differences in the ability to organize a coherent attack on problems that may extend over hours, days, or even months. Such differences may be extremely important. Darwin spent 20 years mulling over the results of the voyage of the Beagle. Outside of academic life, how often does a person have to be bright "within the next hour"?

We tend to lump individual differences in long term intellectual endeavor under the catchall category "motivation." This simply will not do. Personal accounts and histories indicate that the willingness to expose oneself to ideas, and the ability to see analogies in situations that are, on the surface, dissimilar, are very important parts of intelligence. It is not at all clear that these traits can be evaluated in any one or two hour interview. Computers are not the answer to this problem.

In spite of this reservation, objective, standardized cognitive assessment procedures can be both improved and expanded by combining computerization with the considerable recent progress in cognitive psychology. Minor improvements, steps, are obvious. Major expansions, leaps, are also possible. Instead of describing a test battery, I have tried to describe a test "armory," an approach to testing that could be tailored both to the person and the situation.
Constructing the necessary tests and determining their psychometric properties would be a substantial but manageable research project, probably requiring a coordinated effort by several laboratories. However, this is a technological problem rather than a scientific one. It would be possible to plan such a project today, and to realize it within five years. All that is required is the commitment of time and money.

REFERENCES

Bloom, B., & Broder, L. The problem solving processes of college students. Chicago, IL: University of Chicago Press, 1950.
Boring, E. G. Intelligence as the tests test it. The New Republic, June 6, 1923, 35-37.
Brown, J. S., & Burton, R. R. Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 1978, 2, 155-192.
Cohen, G. Language comprehension in old age. Cognitive Psychology, 1979, 11, 412-429.
Cohen, G. Inferential reasoning in old age. Cognition, 1981, 9, 59-72.
Dillon, R. F., & Stevenson-Hicks, R. Effects of item difficulty and method of test administration on eye scan patterns during analogical reasoning. Technical Report #1, Contract N66001-80-C-0467 (Office of Naval Research). Department of Psychology, Southern Illinois University, 1981.
Ekstrom, R. B., French, J. W., & Harman, H. H. Cognitive factors: Their identification and replication. Multivariate Behavioral Research Monographs, 1979, 79-2, 1-84.
Gentner, D. Evidence for the psychological reality of semantic components: The verbs of possession. In D. A. Norman and D. E. Rumelhart (Eds.), Explorations in cognition. San Francisco, CA: Freeman, 1975.
Gould, S. The mismeasure of man. New York: Norton, 1981.
Guilford, J. P. The nature of human intelligence. New York: McGraw-Hill, 1967.
Guilford, J. P. Cognitive psychology's ambiguities: Some suggested remedies. Psychological Review, 1982, 89, 48-59.
Hitch, G. The role of short term working memory in mental arithmetic. Cognitive Psychology, 1978, 10, 302-323.
Hunt, E. Intelligence as an information processing concept. British Journal of Psychology, 1980, 71, 449-474.
Hunt, E., Frost, N., & Lunneborg, C. E. Individual differences in cognition: A new approach to intelligence. In G. Bower (Ed.), Advances in learning and motivation, Vol. 7. New York: Academic Press, 1973.
Jencks, C., et al. Who gets ahead? The determinants of economic success in America. New York: Basic Books, 1979.
Kintsch, W., & van Dijk, T. A. Towards a model of text comprehension. Psychological Review, 1978, 85, 363-394.
Pellegrino, J. W., Cantoni, W., & Solter, A. Speed, accuracy, and strategy differences in spatial processing. Proceedings of the Psychonomic Society Meetings, 1981, Paper 124.
Poltrock, S. E., Lansman, M., & Hunt, E. Automatic and controlled processes in auditory detection. Technical Report #9, Contract N00014-80-0631 (Office of Naval Research). Department of Psychology, University of Washington, 1981.
Potts, G. R., Keeler, R. A., & Rovley, C. J. Factors affecting the use of world knowledge to complete a linear ordering task. Journal of Experimental Psychology: Human Learning and Memory, 1981, 7, 254-256.
Sternberg, R. J. Component processes in analogical reasoning. Psychological Review, 1977, 84, 353-378. (a)
Sternberg, R. J. Intelligence, information processing, and analogical reasoning: The componential analysis of human abilities. Hillsdale, NJ: Lawrence Erlbaum Associates, 1977. (b)
Sternberg, R. J. The nature of mental abilities. American Psychologist, 1979, 34, 214-230.
Sternberg, R. J. Sketch of a componential subtheory of human intelligence. Behavioral and Brain Sciences, 1980, 3, 573-584.
Wason, P. C., & Johnson-Laird, P. N. Psychology of reasoning: Structure and content. London: Batsford, 1972.
Whitely, S. E. Latent trait models in the study of intelligence. Intelligence, 1980, 4, 97-132.