The Journal of Pediatrics, December 1969
Some pitfalls in interpretation of the IQ

It is often assumed that I.Q. scores are precise measures of intelligence that remain stable over time unless dramatic interventions occur, and that they afford accurate prediction of other behavior. On the basis of a review of some of the literature, the nature of psychological measurement, and clinical experience, more tenable interpretations of such data are offered. I.Q. determinations are limited in their usefulness in predicting individual behavior. Interpretations and recommendations are best couched in terms of probability of success and should be founded on more than one score.
Ira M. Steisel, Ph.D. PHILADELPHIA, PA.
Three frequent reasons why a pediatrician or psychiatrist seeks to know a child's intelligence quotient (I.Q.) are: (1) to aid in making a differential diagnosis; (2) as a basis for formulating a plan of management, and (3) as an assessment of progress following treatment. For example, it is quite reasonable to ask whether the poor performance of a child in school is the result of limited intellect or is the result of any of a number of other psychological or physical reasons. When the clinical findings include or suggest mental retardation, determination of the I.Q. helps to establish the child's functioning
From the St. Christopher's Hospital for Children and the Department of Psychiatry (Child Psychology), School of Medicine, Temple University. Supported in part by Grant No. HD 01437 from the National Institute of Child Health and Human Development, United States Public Health Service. Present address: Psychological Clinic, Rutgers University, New Brunswick, N. J. 08903.
level. The I.Q. often is helpful in the formulation of a plan of management for the child. When alternate possibilities of treatment are available, such as institutional placement, special tutoring, and the like, a measure of a child's intellectual abilities may be a deciding factor in choosing one or the other. Finally, it is common to use the I.Q. as one index of the effectiveness of a therapeutic regimen, e.g., the effects of the low-phenylalanine diet in the treatment of phenylketonuria [1-3].

Although such uses of I.Q. scores are appropriate, a number of misconceptions about these values are common and must be dispelled if the data are to be utilized and interpreted reasonably. Intelligence is commonly regarded as a fixed quantity, and the I.Q. is often thought to be a direct measure or representation of this amount. Changes in scores over a period of time are believed to signify solely intellectual gains or losses that the child has made. Both precision and universality are attributed to the numerical values, so that a child with an I.Q. of 110 on one test is not only thought to be brighter than a child with an I.Q. of 100 on the same test, but is expected to achieve the identical value on a different test.

Vol. 75, No. 6, part 1, pp. 969-976

This paper will review some of the published literature, cite clinical and research reasons why these assumptions are dubious or incorrect, indicate more tenable interpretations that may be made of I.Q. values, and make some recommendations for the referring physician.

The comments made here apply both to tests administered to a number of people simultaneously (group tests) and to those in which the examiner and child are in a one-to-one situation. Group tests present additional complications: If the child is to read the instructions, he will be penalized if he has a specific reading disability; the social situation may in some children have a debilitating effect, while in other children it may be facilitating; and observations of the child's behavior, which may be of clinical significance, are not easily made.

As Stott and Ball [4] point out, psychologists believe that intelligence results from an interplay between hereditary and environmental factors. Although some psychologists are inclined to emphasize genetic variables as having major significance and others emphasize environmental ones, the consensus is that the permanent or temporary change of either of these groups of variables would result in the alteration of the intelligence of an individual.

Basically, one can conceive of an intelligence test as an instrument that samples some behavior of the child and from which one or more numerical values can be derived. The obtained scores reflect how the particular person compares with those of the group on which the test was standardized. On the basis of known or presumed relationships between such data and other different measures, a diagnosis may be corroborated and some prediction as to future behavior may be made.
Thus, as long as it is known that children who earn a score below 30 on a certain test will not be able to care for their personal needs, acquire much in the way of
vocabulary, etc., not only can one make a prediction that a particular child earning such a score will be limited in these spheres, but impressions gleaned from other sources (past history, physical examination, etc.) will be confirmed. A distinction must be made, however, between the score that a child achieves on a test (usually quoted as an I.Q.) and the intellectual resources that are inferred from it. The obtained value is certainly determined in part by the "basic stuff" of a child (his intelligence) and one can generalize from the result to other situations; yet, there is a host of other known and, undoubtedly, unknown variables that determine what the numerical values will be. Cognizance must be taken of such variables in the interpretation and utilization of results because they may modify or limit the inferences that are to be made.
EXTRA-INTELLECTUAL FACTORS

For expository purposes, one may categorize the extra-intellectual factors that influence test scores into three groups: (1) variables in the person; (2) differences among tests of intelligence; and (3) mathematical (statistical) properties of the tests.

Variables in the person. These may be either permanent or temporary in nature. Sensory defects, particularly visual and auditory, as well as neural ones may adversely affect the obtained I.Q. or certain abilities [5]. A person whose sensory or neural apparatus is not intact would not only have difficulty in comprehending and/or responding to certain tasks but also would have been denied exposure to appropriate experiences that are prerequisite for success on some of the items. Such accentuating factors may be most crucial in those instances in which the defect is a minimal one and/or can be hidden from the examiner. For example, an individual with a high-frequency hearing loss may be particularly adept at keeping this information from others; yet, his performance is bound to suffer. Cultural and subcultural variations, as well as regional and sex differences, are also found. The work of Lesser and associates [6]
has shown that abilities can vary as a function of sex, ethnic, and social class differences. Anastasi [7, 8] cites a number of examples in which cultural and subcultural practices can alter (either in an upward or downward direction) the patterns of skills tapped by various tests in current use. The data of Kennedy and associates [9] show that Negroes from the southeastern part of this country obtain sizeably lower scores on the Stanford-Binet (Form L-M) than those of the normative (white) group. Wechsler [10, 11] intentionally excluded Negroes from his normative groups, as have other test constructors [2]. In addition, an item on the Wechsler adult scale, standardized primarily on residents of New York City, is particularly difficult for individuals from that part of the country: identifying the tail as the part that is missing from a picture of a pig. Yet, otherwise seemingly intellectually limited individuals in rural Iowa readily recognize that "The hog ain't got no tail."

Those who have had experience with the disorganized and distant psychotic patient, the hyperactive-inattentive child, or an indecisive, ruminative one recognize that obtaining an accurate or precise measure can be a formidable task. Furthermore, Goldfarb [12] and Spitz [13] have shown that children who early in their lives have been deprived of adequate parenting frequently have impairment of their subsequent achievement.

Recent research [14] has indicated that individuals may be differentiated according to what are termed "cognitive styles," which are fairly stable preferences that people have for organizing perceptions and categorizing concepts. They are qualitatively different from each other, and although uncorrelated with level of intelligence, they do enter into the solution of problems. Yet, qualitative differences in intellect such as these and others will not be reflected in I.Q. scores.

Included under temporary variables would be those specific to the testing situations or items on the test.
The taking of a test frequently stimulates anxiety or other adverse emotional reactions [15]. The coming to or being in a hospital and the seeing of a doctor,
especially in those instances when the child has been poorly prepared or totally unprepared, may have deleterious effects on the child's performance. Also, there undoubtedly are emotional reactions that are specific to certain examiners. Different testers relate to patients in different ways and bring out a variety of reactions: Some may elicit fear or anger while others may produce enthusiasm and a desire to do well. A Negro child who has had little contact with white adults, except as authoritarian and punitive individuals who represent "the foe," might very well remain distant and uncommunicative in the presence of a white tester, and his performance will suffer [16].

Specific test items may arouse anxiety and blocking. A number of years ago, a patient told of being tested when 10 years of age by a young woman who was a friend of the family. Asked to enumerate as many different words as he could in one minute, he found that what he thought of were "dirty" words which he dared not mention and, therefore, felt compelled to discard. As a result of the interference of these unacceptable thoughts, which forced him into silence, the boy failed the item.

The physical well-being of a child at the time of testing is crucial. If he is fatigued, reacting to drugs, out of contact by virtue of petit mal attacks, etc., his scores may very well be influenced. A child who is examined while hospitalized is coping with the problems brought about by his illness, the separation from his family, the medications he has received as well as the various procedures carried out during his stay; these probably will affect his performance on a test of this type.

Although individuals may be aware that negative influences on I.Q. scores may be brought about by such situational factors, it is not readily recognized that the converse can also occur, i.e., that there are instances in which inflated scores may appear.
From the standpoint of psychological measurement, a person's true score is defined as the average of the scores he would receive on a
large (or infinite) number of repeated examinations with a particular test. The obtained values, it is assumed, would distribute themselves in a normal fashion around this mean. Half of the time, then, his earned scores would be above and the remainder of the time, they would be below this average.

Differences among tests. Intelligence tests vary in a number of dimensions so that it would be remarkable, indeed, if the identical numerical value were obtained on two of them. They differ in the skills required to perform as well as the mode of responding.* The Peabody Picture Vocabulary Test [20] requires nothing more of a child than that he signify which of several pictures conveys the meaning of a word. The Stanford-Binet tests [21], largely requiring verbal replies, contain a variety of items tapping verbal comprehension, memory, vocabulary, etc. The Wechsler Intelligence Scale for Children [11], on the other hand, includes not only questions that assess verbal skills, but also tasks that involve manual abilities. Tests vary in the breadth of the skills tapped, the age of the groups to which they are applicable, the range of possible scores, the population on which they are standardized, and the degree to which they fulfill the various statistical criteria that test constructors have set forth as the hallmarks of a well-designed test [22]. Thus, the results obtained on one cannot be thought to be equivalent to those derived from another.

It is assumed that when a person is given an intelligence test he derives from the same population as those on whom the test was standardized; this is the reference group with which his performance will be compared.
*There are psychologists who assume that intelligence is a general characteristic, i.e., the same skill influences performance on tests irrespective of the task at hand. However, Spearman [17], the father of this viewpoint, also wrote of special abilities which may be unrelated to the general factor as well as to one another. Other workers, such as Thurstone [18] and Guilford [19], are inclined to enumerate a variety of special abilities not related to each other. Guilford also includes qualitative aspects which, of course, are not revealed when only a numerical value is quoted.

Users of tests and those responsible for decisions based on the data may lose sight of this restriction, which at times may lead
to ludicrous situations. A brief screening test that is routinely used in a number of institutions is one that was devised by Raven [23]. Examination of the test manual reveals that it was standardized on some 600 children in Scotland; to the best of my knowledge, no norms based on children this side of the Atlantic have been published. It may be an interesting academic exercise to compare a child from the ghetto with Scottish children, but the significance for his academic career is certainly unclear.

It is not commonly known that the range of I.Q.'s on some tests is comparatively narrow, while for others it is rather broad. Thus for the Wechsler Intelligence Scale for Children, the talent that is measured varies from a score of approximately 45 to 155 [11]. At the same time, the I.Q.'s on the Stanford-Binet range from 30 to 170 [21]. There are clearly, then, a number of instances in which a child administered these two tests will not be able to achieve the identical value on both of them.

Another artifact of the Wechsler Intelligence Scale for Children raises additional questions regarding its usefulness as a discriminating instrument at certain times. It is possible for a child either to respond to no questions on this test or to respond incorrectly to all problems posed and yet achieve some credit. Thus, a child between the ages of 5-0 and 5-3 who obtains zero credit on each of 10 subtests will, nonetheless, earn the following I.Q.'s: verbal, 57; performance, 54; and full scale, 51.

The structure of the Stanford-Binet test may not only make it impossible to compare children varying in age or ability level in terms of specific functions (perceptual-motor, memory, etc.), but also makes it difficult to compare the same child's performance on two separate occasions; they may not have been tested with the same items.
On this test, one of the criteria used for placing an item at a particular age level was that (usually) 50 per cent of the children of that age in the standardization group had correctly answered it. Some of the same items are repeated throughout the test, but many are not. Thus, data necessary to evaluate losses or gains in specific intellectual functions would not be available although they might very well be desirable. In contrast, a test like that of Wechsler permits such comparisons.
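The two points made in this section, that a score has meaning only relative to the standardization group and that tests differ in the range of I.Q.'s they can report, can be illustrated with a small sketch. It is offered only as an illustration under stated assumptions: the function names, the raw score, and both sets of norms are hypothetical; the conversion to a scale with a mean of 100 and a standard deviation of 15 is a common convention assumed here rather than taken from any test manual; and clipping at the range limits is an assumed simplification. The ranges are the approximate ones quoted in the text.

```python
# Illustrative sketch only: all norms and raw scores below are hypothetical.

def deviation_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score relative to a normative group.

    The result depends on the mean and standard deviation of the group
    on which the test was standardized, not on the raw score alone.
    """
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


def clip_to_range(score, low, high):
    """Limit a score to the range a given test can report (assumed rule)."""
    return max(low, min(high, score))


# Approximate reportable ranges quoted in the text.
WISC_RANGE = (45, 155)
STANFORD_BINET_RANGE = (30, 170)

raw = 22  # hypothetical raw score for one child

# The same performance compared with two hypothetical standardization groups.
score_vs_group_a = deviation_score(raw, norm_mean=25, norm_sd=5)
score_vs_group_b = deviation_score(raw, norm_mean=19, norm_sd=5)

# A hypothetical very high score cannot be reported identically on both tests.
very_high = 165
on_wisc = clip_to_range(very_high, *WISC_RANGE)
on_binet = clip_to_range(very_high, *STANFORD_BINET_RANGE)

print(round(score_vs_group_a), round(score_vs_group_b))
print(on_wisc, on_binet)
```

In this sketch the same raw score falls below the mean of one hypothetical group and above the mean of the other, which is the situation that arises when a child is compared with norms drawn from a population other than his own.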
Mathematical (statistical) properties. Psychological measurement is, by and large, still gross. In the main, two characteristics are lacking which would make exact mathematical statements possible: an absolute zero point and equality of units (i.e., the distance between two adjacent values is the same at any part of the scale). One can say that 40 inches is twice the length of 20 inches because, in the case of such measurement, there is a meaningful zero point and there is a constant distance between any two neighboring points on the scale. The difference between 39 inches and 40 inches is identical to that between 1 and 2 inches. Such is not the case with intelligence and, therefore, one cannot say how much brighter one person is than another on the basis of their respective I.Q.'s. Statements on the order of "A is more intelligent than B," however, are permissible and correct [24, 25].

Of course, measurement of various characteristics is subject to errors as the result of the effects of irrelevant chance factors. This is also true of intelligence test results, although frequently they are reported and quoted as though variation were the exception rather than the rule. If the error is small, this will not matter in most instances. But if the obtained I.Q. is utilized to decide whether a child does receive special treatment or is excluded, then it is of significance. Admission to special classes is often based on some specific I.Q., and state laws indicate a precise value that cannot be exceeded if a child is to be admitted to an institution for retardates.

If an intelligence test is a good predictor of some other measure, the correlation between the two measures (e.g., one intelligence test versus another such test) will approach 0.80. Some limitations should be kept in mind, however. A correlation coefficient that is as large as this does signify that, for the group, those who did achieve a high
score on one of the measures will also tend to do well on the other measure. The relationship for the group is well founded, but it is less certain for any particular individual. Also, a correlation between two measures on the order of 0.80 indicates a common core between the two measures for 64 per cent of the cases.* There remain 36 per cent of the subjects whose scores cannot be accounted for by this relationship. In addition, correlations between two measures do not specify what the score on either one is. Thus, there may be a constant difference between the two measures for each child (with a higher I.Q. on one of them) and yet the correlation will be unaffected. This leads to the distinct possibility that although the ranking of a group of children on two separate tests will be somewhat comparable, there may be a systematic difference in the scores.

The mere fact of having had a particular intelligence test previously may alter the scores that are earned the second time [26-28]. I [27] have found not only that various parts of an intelligence test are affected differently in a retest situation, but also that greater increases occur when the prior evaluation is relatively recent. Parents or referring physicians may not know of or inform the examiner of such prior experience; indeed, they may desire an "objective" evaluation when they are unhappy and angry at results obtained previously which have led to a recommendation unacceptable to them.

Another artifact of repeated measurement which handicaps prediction is the phenomenon of regression to the mean: Those at the extremes of a distribution of scores will not maintain their position on re-evaluation, but rather will tend to converge towards the average of the new distribution. On retesting, the extremely low-scoring children will increase their I.Q.'s and those scoring quite high will decrease as both approach the mean of the new distribution of I.Q. values.
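Three of the statistical points above (the 64 per cent figure obtained from a correlation of 0.80, the fact that a constant difference between two measures leaves the correlation unchanged, and regression to the mean) can be reproduced in a short simulation. What follows is a minimal sketch and not an analysis of any real data: it assumes that each obtained score is an underlying "true" score plus independent chance error, and the sample size, the two standard deviations, the 7-point offset, and the cut-offs of 80 and 120 are hypothetical values chosen so that the correlation between two testings comes out near 0.80. The quantity computed as 100 times the square of the correlation is conventionally read as the percentage of variance the two measures share.

```python
# Illustrative simulation only: the parameters below are hypothetical and were
# chosen so that the correlation between two testings is close to 0.80.
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 20000
TRUE_SD = 12.0   # spread of the assumed underlying "true" scores
ERROR_SD = 6.0   # spread of chance error on any single testing

true_scores = [rng.gauss(100.0, TRUE_SD) for _ in range(N)]
first = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]
second = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]

# 1. Correlation and coefficient of determination (100 times r squared).
r = pearson(first, second)
determination_pct = 100.0 * r ** 2

# 2. A constant difference between the two measures leaves r unchanged.
second_shifted = [s + 7.0 for s in second]
r_shifted = pearson(first, second_shifted)

# 3. Regression to the mean: groups selected for extreme first scores
#    move toward the average on the second testing.
low = [(a, b) for a, b in zip(first, second) if a < 80.0]
high = [(a, b) for a, b in zip(first, second) if a > 120.0]
low_first = mean([a for a, _ in low])
low_second = mean([b for _, b in low])
high_first = mean([a for a, _ in high])
high_second = mean([b for _, b in high])

print(f"r = {r:.2f}, coefficient of determination = {determination_pct:.0f} per cent")
print(f"r with a constant 7-point difference = {r_shifted:.2f}")
print(f"low group:  first {low_first:.1f}, second {low_second:.1f}")
print(f"high group: first {high_first:.1f}, second {high_second:.1f}")
```

Under these assumptions nothing about the simulated children changes between the two testings, yet the low-scoring group gains and the high-scoring group loses on retest, which is why such shifts need not reflect any change in underlying ability.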
*This value is calculated by multiplying 100 times the square of the correlation coefficient, and is termed the coefficient of determination.

The questions of stability of the I.Q. score and the relation between different tests of intelligence were investigated some 20 years ago by Honzik and associates [29]. A group of 252 children was retested over a 16-year period at specified age levels (from age 2 to 18). Their data indicated that the mean I.Q.'s varied comparatively slightly (from a low of 118.2 to a high of 123.2). At the same time, 9 per cent of the group had changes of 30 or more points, over half (58 per cent) of the group had changes of 15 or more I.Q. points, and 35 per cent had a variation of as much as 20 I.Q. points during the school years. Some of the individuals had consistent upward or downward trends over this period of time with changes reaching as much as 50 I.Q. points.

Essentially the same results have been reported more recently by Sontag and associates [30], who evaluated 140 children from ages 2½ to 12 using the Stanford-Binet tests throughout. Their data corroborated the instability of I.Q. scores of children over a number of years and emphasized the uniqueness of the intelligence curves over time for their various subjects. These curves were highly idiosyncratic; no typical curve could be identified for the group, and any attempt to predict what a child would earn at a second point on the basis of an earlier score was hazardous.

A more subtle problem also arises from making and conveying impressions from limited I.Q. test data, as demonstrated by recent research [31, 32]. In the study of Rosenthal and Jacobson [31], teachers were given the names of children whose better academic potential was presumably latent and inferred from prior testing. Such identification, in actuality, was implanted by the experimenters to determine the effects of such "knowledge" on subsequent achievement. The children so specified did, indeed, demonstrate significantly greater I.Q. increases on retesting than did their matched controls about whom nothing had been conveyed to the teachers.
The inference that as a result of an induced expectation of greater brightness, teachers treat children differently and thereby foster better learning has been confirmed in a recent report of Beez [32]. If teachers
expect better learning, they lay down the conditions for academic improvement (including that on intelligence tests) and the children respond. From these studies, one might expect that, within limits, teachers' and parents' behavior will be modified by their expectation of brightness or dullness in their children. In turn, they stimulate (or inhibit) better or poorer performance in those with whom they are in contact.

CONCLUSIONS AND RECOMMENDATIONS
What has been indicated thus far is that the score on an intelligence test is influenced by a number of variables. An individual may show losses or gains over time but these may not reflect any change in the assumed underlying ability. Test results are imprecise; the identical value need not appear on repeated use of the same test and is less likely to result if different tests are used. Individual prediction is an uncertain undertaking because of the instability of the scores as well as the overlapping of distributions. Estimates concerning the fate of groups are much more feasible. One can use such scores and speak with some assurance regarding the probability of success in school performance and, to a lesser extent, in some occupations [2]. Yet, Anastasi [8] has pointed out that the prediction from the Stanford-Binet test is better for the more verbal courses than the less verbal ones and is less applicable to college than to elementary school performance. Intelligence tests do not measure creativity [33] or diligence, work habits, and fortuitous circumstances [34], all of which enter into success in various occupations and in school performance. Most assuredly one's achievement in "life" cannot be ascertained from an intelligence quotient.

Certain recommendations emerge as the result of this analysis. It is best if the child who is to be given an intelligence test is comparatively at ease. This may require an explanation of what is to happen and the purpose of the evaluation. Furthermore, it is wise that the testing not be inserted among
a host of procedures and experiences that might be upsetting. If the child is receiving medication, it would be better if its effects were dissipated when he is examined.

The score has comparatively little meaning by itself, but if it is put into the context of a thorough medical examination and a detailed statement of the person's medical and social history, a well-trained psychologist can use it to come to reasonable conclusions about a person. This is especially so when the psychologist also supplies his clinical acumen, knows what he is to predict, and administers additional tests. With these, he can state some probability of success in some endeavors. Information about prior testing also helps the psychologist come to a decision. When screening procedures are used by the pediatrician, acquaintance with the test's limitations is essential.

Perhaps most important is that the referring physician recognize that such scores are imprecise and that changes over time may be an artifact of intelligence testing rather than treatment that has been instituted. They may reflect differences between separate tests or demonstrate the highly unique curve of a child's development. At any rate, caution is in order in discussing results with parents and others involved in a child's care and, when possible, long-term decisions should be deferred until there is repeated corroboration of the findings.

Special thanks are due to a number of individuals whose comments and questions were most helpful during the preparation of this paper: Miss Elizabeth Baker, C. Jack Friedman, Ph.D., William Mark, M.A., and Jules Spotts, Ph.D. Dr. Friedman worked on an earlier version, but the paper as it now appears is the sole responsibility of the author.

REFERENCES
1. Berman, P. W., Waisman, H. A., and Graham, F. K.: Intelligence in treated phenylketonuric children: A developmental study, Child Develop. 37: 731, 1966.
2. Knox, W. E.: An evaluation of the treatment of phenylketonuria with diets low in phenylalanine, Pediatrics 26: 1, 1960.
3. Hsia, D. Y-Y.: Phenylketonuria: Human biochemical genetics, Pediatrics 38: 173, 1966.
4. Stott, L. E., and Ball, R. S.: Infant and preschool mental tests: Review and evaluation, Monogr. Soc. Res. Child Develop. 30: No. 3, 1965.
5. Santostefano, S.: Psychologic testing in evaluation and understanding organic brain damage and the effects of drugs in children, J. PEDIAT. 62: 766, 1963.
6. Lesser, G. S., Fifer, G., and Clark, D. H.: Mental abilities of children from different social-class and cultural groups, Monogr. Soc. Res. Child Develop. 30: No. 4, 1965.
7. Anastasi, A.: Psychological tests: Uses and abuses, Teachers College Record 62: 389, 1961.
8. Anastasi, A.: Psychological testing, ed. 3, New York, 1968, The Macmillan Company.
9. Kennedy, W. A., Van De Riet, V., and White, J. C.: A normative sample of intelligence and achievement of Negro elementary school children in the southeastern United States, Monogr. Soc. Res. Child Develop. 28: No. 6, 1963.
10. Wechsler, D.: The measurement of adult intelligence, ed. 3, Baltimore, 1944, The Williams & Wilkins Company.
11. Wechsler, D.: Manual for Wechsler Intelligence Scale for Children, New York, 1949, Psychological Corp.
12. Goldfarb, W.: Effect of psychological deprivation in infancy and subsequent stimulation, Am. J. Psychiat. 102: 18, 1945.
13. Spitz, R.: Hospitalism: An inquiry into the genesis of psychiatric conditions in early childhood, Psychoanalyt. Study Child 1: 53, 1945.
14. Kagan, J., Moss, H. A., and Sigel, I. E.: Psychological significance of styles of conceptualization, in Wright, J. C., and Kagan, J., editors: Basic cognitive processes in children, Monogr. Soc. Res. Child Develop. 28: No. 2, Chap. 5, 1963.
15. Sarason, S. B., Hill, K. T., and Zimbardo, P. G.: A longitudinal study of the relation of test anxiety to performance on intelligence and achievement tests, Monogr. Soc. Res. Child Develop. 29: No. 7, 1964.
16. Guidelines for testing minority group children, J. Soc. Issues 20: 129, 1964.
17. Spearman, C.: The abilities of man, New York, 1927, The Macmillan Company.
18. Thurstone, L. L.: Vectors of mind: Multiple factor analysis for the isolation of primary traits, Chicago, 1935, University of Chicago Press.
19. Guilford, J. P.: Three faces of intellect, Am. Psychologist 14: 469, 1959.
20. Dunn, L.: Expanded manual for Peabody Picture Vocabulary Test, Minneapolis, 1965, American Guidance Service.
21. Terman, L. M., and Merrill, M. A.: Stanford-Binet Intelligence Scale, Boston, 1960, Houghton Mifflin Company.
22. Standards for educational and psychological tests and manuals, Washington, D. C., 1966, American Psychological Association.
23. Raven, J. C.: Guide to using the coloured progressive matrices, London, 1960, H. K. Lewis.
24. Money, J.: Intellect, brain, and biologic age: Introduction, in Cheek, D. B., editor: Human growth: Body composition, cell growth, energy and intelligence, Philadelphia, 1968, Lea & Febiger, Publishers, pp. 535-540.
25. Tyler, L. E.: The psychology of human differences, ed. 3, New York, 1965, Appleton-Century-Crofts.
26. Steisel, I. M.: The relation between test and retest scores on the Wechsler-Bellevue Scale (Form I) for selected college students, J. Genet. Psychol. 79: 155, 1951.
27. Steisel, I. M.: Retest changes in Wechsler-Bellevue scores as a function of the time interval between examinations, J. Genet. Psychol. 79: 199, 1951.
28. Reger, R.: Repeated measurements with the W.I.S.C., Psychological Reports 11: 418, 1962.
29. Honzik, M. P., Macfarlane, J. W., and Allen, L.: The stability of mental test performance between two and eighteen years, J. Exper. Educ. 17: 309, 1948.
30. Sontag, L. W., Baker, C. T., and Nelson, V. L.: Mental growth and personality development: A longitudinal study, Monogr. Soc. Res. Child Develop. 23: No. 2, 1, 1958.
31. Rosenthal, R., and Jacobson, L. F.: Pygmalion in the classroom, New York, 1968, Holt, Rinehart and Winston.
32. Beez, W. V.: Influence of biased psychological reports on teacher behavior and pupil performance, in Proceedings of the 76th Annual Convention of the American Psychological Association, Washington, D. C., 1968, American Psychological Association, pp. 605-606.
33. Getzels, J. W., and Jackson, P. W.: Creativity and intelligence: Explorations with gifted students, New York, 1962, John Wiley & Sons, Inc.
34. Woodring, P.: Are intelligence tests unfair? Saturday Review 49: 79, 1966.