
Computers in Human Behavwr, Vol. 2, pp. 77-83, 1986

0747-5632/86 $3.00 + .00 Copyright © 1986 Pergamon Journals Inc.

Printed in the U.S.A. All rights reserved.

Perceived Validity of Computer- Versus Clinician-Generated MMPI Reports

L. Michael Honaker
Florida Institute of Technology and National Computer Systems, Washington, DC

Vicki Schwartz Hector and Thomas H. Harrell
Florida Institute of Technology

Abstract -- Psychology graduate students and practicing psychologists were asked to rate the accuracy of interpretive reports for the Minnesota Multiphasic Personality Inventory (MMPI) that were labeled as generated by either a computer or a licensed clinician. There was no difference between the accuracy ratings for computer-generated reports and those for clinician-generated reports, although both groups tended to rate reports containing a purposefully inaccurate statement as less valid than reports without the inaccurate statement. Also, experienced clinicians tended to perceive reports labeled as computer-generated as less useful and less comprehensive than the same reports labeled as clinician-generated. The results do not support the claim that computer reports may be seen as more credible than is warranted.

Requests for reprints should be sent to L. Michael Honaker, National Computer Services, 1101 30th Street N.W., Suite 500, Washington, DC 20007.

The proliferation of computer programs for psychological test interpretation has led to increased concern about their use (e.g., Matarazzo, 1983, 1986). One particular concern has been the possibility that practitioners may assign more credibility to a computer-generated report than is warranted. There are several reasons why uncritical acceptance of computer-generated interpretations might occur. Klett and Pumroy (1971) suggest that overconfidence in computer reports may result from people being so impressed by the capability of computers that this "awe" generalizes to any output of the computer. Similarly, Graham (1977) suggests that people do not understand the limitations of computers and may associate computerization with objective, scientific "facts." Finally, Butcher (1978) and Adair (1978) implicate misleading advertising by computer services as an additional factor that may lead to inaccurate assessment of a computer narrative's accuracy.

The potential overestimation of computer accuracy becomes critical in light of the general lack of research support for the validity of interpretive statements (Butcher, 1978; Matarazzo, 1986; Moreland, 1985). Also, the validation research that has been done is frequently of questionable quality and often does not control for biases of the rater and other confounding variables (Lanyon & Goodstein, 1982, pp. 219-237).


Thus, it is possible that practitioners may be placing uncritical confidence in statements of questionable validity. This situation becomes even more problematic when the user of the program does not carefully evaluate the statements in light of other test data and/or information about the examinee (Rodgers, 1972).

Whatever the consequences of and reasons for overacceptance of computer interpretations might be, there is no direct empirical evidence that practitioners do, in fact, overestimate the computer's accuracy. Several studies have shown that computer-generated reports are perceived as accurate (e.g., Lachar, 1974a) and as equal to, or better than, non-computer reports (e.g., Bringmann, Balance, & Giesbrecht, 1972; Webb, Miller, & Fowler, 1970). However, since different computer and non-computer reports were compared, it is unclear whether the accuracy ratings resulted from the knowledge that the reports were computer-generated, from greater accuracy of the statements, or both.

The goal of the present study was to examine whether readers of test reports do in fact overevaluate the accuracy of computer-generated statements. "Accurate" or "inaccurate" MMPI reports were labeled as either computer- or clinician-generated. The reports were then rated by psychology graduate students or practicing clinicians for their accuracy, utility, and comprehensiveness. It was hypothesized that reports labeled as produced by a computer would be rated as more accurate, useful, and complete than those identified as clinician-generated. Also, it was expected that graduate students would attribute more accuracy, usefulness, and completeness to the computer reports than would practicing clinicians.

METHOD

Subjects

Sixty-seven graduate students in psychology and 272 practicing psychologists were solicited as subjects. Of those solicited, 57 graduate students (85.07%) and 79 psychologists (29.04%) returned completed forms, yielding a total response rate of 40.12%. Graduate students were drawn from a psychology doctoral training program, and all had previously taken at least one course that included MMPI interpretation. Practicing psychologists were obtained from a listing in the Directory of Internship Programs in Professional Psychology and from a list of individuals who had participated in MMPI workshops. When the materials were sent to a facility, the director of the facility was asked to have "the staff MMPI expert" participate as a subject. Practicing psychologists were solicited by mail, and the graduate students were approached in person during a school meeting.

Subjects were asked to rate their level of training in MMPI and personality assessment, and their level of experience with computerized assessment, on separate 7-point scales (1 = none; 4 = moderate; 7 = extensive). Rated level of training was significantly higher for practicing clinicians (M = 4.97, SD = 1.04) than for graduate students (M = 3.68, SD = 0.92), F(1,134) = 56.65, p < .001. Also, clinicians indicated more computer experience (M = 3.56, SD = 1.62) than students (M = 2.30, SD = 1.22), F(1,134) = 24.79, p < .001.


Finally, the clinicians reported significantly more years of assessment experience (M = 11.03, SD = 6.69) than graduate students (M = 2.12, SD = 1.73), F(1,134) = 95.83, p < .001.

Materials and Procedure

Four packets were used in the study. The first page of each packet was a cover letter requesting the subject's participation. The letter described the research project as an "examination of the accuracy of certain MMPI interpretations." The next page included instructions asking the subject to rate the accuracy of the attached MMPI profile interpretations on a 7-point scale (1 = "totally inaccurate"; 7 = "totally accurate") and described the source (i.e., computer- or clinician-generated) of the interpretations. Attached were three MMPI profiles presented on separate pages. On each page, the profile appeared in the top half of the paper. Immediately beneath the profile, a brief (4- to 6-sentence) interpretive paragraph was presented. All paragraphs were typed to ensure that the presentation of stimulus materials was equated across conditions. At the bottom of the page appeared a solid line with seven sequentially numbered vertical hatch marks. The phrases "totally inaccurate" and "totally accurate" appeared under the hatch marks numbered 1 and 7, respectively. Subjects rated the accuracy of the paragraph by circling the number or corresponding hatch mark.

The three MMPI profiles had clear high-point codes of 1-3, 2-4, or 7-8. These profiles were selected because of their relatively frequent occurrence in clinical populations (e.g., Dahlstrom & Welsh, 1960; Lachar, 1974b) and thus the greater likelihood that clinicians/students had previous experience with the profile and its interpretation. For each profile, the two high-point scales (a) were the only scales graphed above the 70 T-score line, (b) were separated by a maximum of 4 T-score units, and (c) were at least 15 T-score units above all other scales.

Each individual packet included either "accurate" interpretations of all three profiles or "inaccurate" interpretations. The interpretations were derived from Lachar's (1974b) automated interpretation system. An accurate paragraph was simply a verbatim copy of Lachar's interpretive statements for the two-point elevation. Inaccurate paragraphs were developed by replacing an accurate statement with a statement that was a major contradiction to the profile type (e.g., for the 1-3 profile, "Due to a high level of discomfort, these individuals are frequently highly motivated clients" was substituted for the statement "Repression and denial make psychiatric intervention difficult"). The profiles were placed in each packet in a random order in an attempt to control for possible order effects. Instructions that clearly identified the packets as generated by either a computer program or a licensed doctoral-level psychologist were randomly attached to the accurate and inaccurate packets. The final page of each packet was a questionnaire assessing the perceived utility of the interpretations and the subject's level of training and experience in assessment. Subjects responded to each of the following statements on a 7-point scale (1 = totally disagree; 4 = moderately agree; 7 = totally agree): (a) "These reports would be helpful in diagnosis," (b) "These reports would be helpful in treatment planning," and (c) "These reports are comprehensive in their descriptions."
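For readers who want to see the 2 (validity) x 2 (source) packet structure concretely, the following Python fragment is a minimal sketch of how such packets could be assembled. It is an illustrative reconstruction, not the authors' materials (the actual packets were paper); the function name and dictionary structure are hypothetical.

    # Illustrative sketch of the packet design described above; everything
    # here (names, structure) is hypothetical, not the authors' materials.
    import random

    PROFILES = ["1-3", "2-4", "7-8"]  # the three high-point codes used in the study

    def build_packet(validity, source):
        """One packet: all three profiles in random order, all interpretations
        of one validity type, labeled with one source (computer or clinician)."""
        return {
            "validity": validity,                           # "accurate" / "inaccurate"
            "source": source,                               # label given in the instructions
            "profile_order": random.sample(PROFILES, k=3),  # random order per packet
        }

    # The four packet types: validity crossed with source label
    packets = [build_packet(v, s)
               for v in ("accurate", "inaccurate")
               for s in ("computer", "clinician")]
    for p in packets:
        print(p)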


RESULTS

A multivariate analysis of variance yielded a significant difference in accuracy ratings across the three profiles, F(1,134) = 4.32, p < .02. Accordingly, univariate analyses were performed on each profile separately. The obtained accuracy ratings are summarized in Table 1.

A Group (student vs. psychologist) x Validity (accurate vs. inaccurate) x Source (computer vs. clinician) analysis of variance (ANOVA) of profile 1-3 yielded a significant main effect for Validity, F(1,128) = 8.43, p < .02, and a significant Group x Validity interaction, F(1,128) = 4.63, p < .05. Examination of the means indicates that practicing psychologists gave inaccurate interpretations significantly lower ratings than accurate paragraphs, t(77) = 4.41, p < .01; there was no significant difference for student ratings, t(55) = .31. An ANOVA of profile 2-4 resulted in a significant Validity effect, F(1,128) = 7.32, p < .01. Accurate interpretations were rated significantly higher than inaccurate reports by both students, t(55) = 2.53, p < .05, and professionals, t(77) = 2.04, p < .05. An ANOVA of profile 7-8 failed to yield any significant main effects or interactions. A summary of F-ratios associated with main effects and interactions of Source for each profile is shown in Table 2. It is worth noting that Source failed to produce a main effect or interaction that even approached significance for any of the profiles.

Rated diagnostic and treatment-planning utility, as well as comprehensiveness ratings, are summarized in Table 3. An ANOVA for diagnostic utility yielded a significant Group x Source interaction, F(1,128) = 7.94, p < .01. Practicing psychologists rated computer-generated reports as less diagnostically useful than clinician-generated reports, t(77) = 2.77, p < .01. An ANOVA of treatment-planning utility resulted in a significant main effect for Group, F(1,128) = 17.42, p < .01. Students gave significantly higher ratings for all reports than did the professionals, t(134) = 4.33, p < .01.

Table 1. Means and Standard Deviations of Accuracy Ratings for Each Profile

                          Students                              Clinicians
                Accurate          Inaccurate          Accurate          Inaccurate
               CL      CO        CL      CO          CL      CO        CL      CO
Profile      (N=13)  (N=15)    (N=12)  (N=17)      (N=17)  (N=18)    (N=25)  (N=19)
1-3:  M       5.08    5.50      5.08    5.29        5.38    5.70      4.72    4.37
     (SD)    (1.26)  (0.82)    (1.38)  (1.45)      (0.76)  (0.67)    (1.41)  (1.43)
2-4:  M       5.54    5.47      4.38    5.09        5.62    5.25      5.06    5.03
     (SD)    (0.78)  (0.92)    (1.46)  (1.28)      (0.96)  (1.02)    (1.28)  (1.02)
7-8:  M       4.92    4.83      4.16    4.50        5.06    4.75      5.20    4.90
     (SD)    (1.38)  (1.38)    (1.35)  (1.30)      (1.13)  (1.09)    (1.14)  (0.88)

Note. CL = Clinician report; CO = Computer report. Ratings are based on a 7-point scale (1 = inaccurate; 7 = accurate).


Table 2. F-Ratios Associated with Main Effects and Interactions of Source (i.e., Computer vs. Clinician)

                                   MMPI Profile
Variable                        1-3     2-4     7-8
Source                          0.38    0.01    0.53
Source x Group                  0.68    1.85    1.10
Source x Validity               1.30    1.84    0.19
Source x Group x Validity       0.29    0.33    0.25

Note. Degrees of freedom for these F-ratios are (1,128). All values are ns.

Table 3. Means and Standard Deviations of Utility Ratings

                                        Students                              Clinicians
                              Accurate          Inaccurate          Accurate          Inaccurate
                             CL      CO        CL      CO          CL      CO        CL      CO
                           (N=13)  (N=15)    (N=12)  (N=17)      (N=17)  (N=18)    (N=25)  (N=19)
Useful in diagnosis:
  M                         4.62    4.83      4.38    4.79        5.12    4.19      4.38    3.87
  (SD)                     (0.77)  (0.75)    (1.23)  (1.30)      (0.96)  (1.02)    (0.92)  (1.25)
Useful in treatment planning:
  M                         5.00    4.77      4.58    4.46        4.59    3.83      3.98    3.55
  (SD)                     (0.82)  (0.73)    (1.24)  (1.13)      (1.06)  (0.94)    (1.08)  (1.47)
Comprehensive:
  M                         2.85    2.90      2.83    3.18        2.85    2.14      2.70    2.13
  (SD)                     (1.07)  (1.23)    (1.45)  (1.19)      (1.16)  (0.82)    (1.30)  (1.09)

Note. CL = Clinician report; CO = Computer report. Ratings are based on a 7-point scale (1 = Totally disagree; 7 = Totally agree).

Evaluation of comprehensiveness ratings produced a significant Group main effect, F(1,128) = 6.39, p < .01, and a Group x Source interaction, F(1,128) = 4.24, p < .05. Professionals rated computer-generated reports as less comprehensive than did students, t(67) = 3.69, p < .01, and saw them as less comprehensive than clinician interpretations, t(77) = 2.63, p < .01.
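As a reading aid, here is a minimal Python sketch of the 2 (Group) x 2 (Validity) x 2 (Source) between-subjects ANOVA design reported above, using pandas and statsmodels. The ratings are simulated placeholders, not the study's data; only the factor structure and approximate cell sizes mirror the design.

    # Hedged sketch of the Group x Validity x Source ANOVA design; the ratings
    # below are simulated placeholders, not the study's data.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(0)
    cells = [(g, v, s)
             for g in ("student", "clinician")
             for v in ("accurate", "inaccurate")
             for s in ("computer", "clinician_authored")]

    # ~15 subjects per cell, roughly matching the cell sizes in Table 1;
    # simulated 1-7 ratings with a small advantage for accurate reports
    rows = [{"group": g, "validity": v, "source": s,
             "rating": float(np.clip(rng.normal(5.0 if v == "accurate" else 4.6, 1.2), 1, 7))}
            for g, v, s in cells for _ in range(15)]
    df = pd.DataFrame(rows)

    # Full-factorial model; anova_lm reports F-ratios for all main effects
    # and interactions, analogous to Table 2
    model = ols("rating ~ C(group) * C(validity) * C(source)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))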

DISCUSSION

The present results fail to support the claim that computer-generated interpretations are assigned more credibility than is warranted. Although both students and practicing clinicians tended to rate accurate reports as more valid than inaccurate reports on two of the three profiles, there were no differences in accuracy ratings as a function of computer versus clinician labels for any of the reports. As expected, graduate students did perceive the computer interpretation as being more complete than did the clinicians. However, they did not see the computer interpretations as being more accurate or more useful.


Experienced clinicians did tend to perceive the computer-generated reports as less useful and less comprehensive than the clinician reports. These findings may suggest that not only do psychologists not overevaluate the accuracy of the computer, but they actually may be more critical of computer output. Perhaps psychologists have learned to expect more from a computer. Therefore, a computer report must have considerably more utility and be much more complete in order to be perceived as more useful and comprehensive than a report provided by a clinician.

An interesting finding is that although accurate reports were rated as more valid than inaccurate reports for two of the profiles, the mean accuracy ratings for both accurate and inaccurate reports across the profiles tended toward the accurate side of the rating scale (i.e., were greater than 4 on the 7-point scale). Also, there was no difference in the perceived accuracy of the accurate and inaccurate reports for one of the profiles (7-8). This result suggests that one completely erroneous statement in an interpretation does not seriously detract from the perceived accuracy of the report, even though the statement may have far-reaching intervention implications. Several of the practicing clinicians did take note of the statement by crossing it out and/or writing comments such as "nonsense," "not true," etc. However, many still gave accuracy ratings of 5 or 6 when asked to evaluate the total paragraph. Future investigations should evaluate how many or what proportion of inaccurate statements must occur in an interpretation before it is perceived as seriously inaccurate.

The reader should also note that all the participants in this study had at least some training in MMPI interpretation and thus could critically evaluate the interpretive paragraphs. It is possible that other consumers of computer reports, without specific training in interpretation, may put more credence in a computer report than in a clinician report. For example, physicians and psychiatrists who use computer services may do so because of greater faith in computer output. This possibility needs to be explored further.

Finally, the study's results should be considered in light of the following methodological limitations. First, the participation rate of practicing clinicians was considerably lower than that shown by graduate students. Although the response rate is consistent with other mail questionnaire return rates (e.g., Bouhoutsos, Gorp, Krupp, & Schag, 1985) and subjects were solicited from throughout the United States, the low clinician response rate may attenuate the generalizability of the current results. Direct interaction with potential participants, such as that used in the present study to obtain graduate students, may help to increase the participation rate in future research efforts. Second, although instructions clearly labeled the reports as computer- or clinician-generated, subjects may not have clearly perceived the computer reports as computer-generated. Interpretive paragraphs for all reports were presented in a "typed" format in an attempt to equate the presentation of the stimulus materials across conditions. However, most actual computer-generated reports are printed on dot-matrix printers, which have a distinctive appearance that identifies the report as output from a computer. The absence of dot-matrix print may in some way have weakened the association between the report and the "authority" of the computer. Third, since each subject received either all accurate or all inaccurate interpretations, it is possible that response sets may have developed which affected the accuracy ratings.


The failure to find a consistent pattern of differences in accuracy ratings across the profiles suggests that systematic response sets did not occur. Also, the random presentation of the profiles within each packet was used in an attempt to control for possible order effects. However, an appropriate evaluation of whether response sets or other order effects were present is not possible with the current design. In future studies, the use of a single profile or, if multiple profiles are employed, a counterbalanced presentation order would serve as a more adequate control for and evaluation of potential order confounds.

REFERENCES

Adair, F.L. (1978). [Re Minnesota Multiphasic Personality Inventory] Computerized scoring and interpreting services. In O.K. Buros (Ed.), Eighth Mental Measurements Yearbook. Highland Park, NJ: Gryphon.

Bouhoutsos, J.C., Gorp, W.G., Krupp, G.J., & Schag, D.S. (1985). The professional school controversy: Attitudes of APA members. The Clinical Psychologist, 38, 56-59.

Bringmann, W.G., Balance, W.D., & Giesbrecht, C.A. (1972). The computer vs. the technologist: Comparison of psychological reports on normal and elevated MMPI profiles. Psychological Reports, 31, 211-217.

Butcher, J.N. (1978). [Re Minnesota Multiphasic Personality Inventory] Computerized scoring and interpreting services. In O.K. Buros (Ed.), Eighth Mental Measurements Yearbook. Highland Park, NJ: Gryphon.

Dahlstrom, W.G., & Welsh, G.S. (1960). An MMPI handbook: A guide to use in clinical practice and research. Minneapolis: University of Minnesota Press.

Graham, J.R. (1977). The MMPI: A practical guide. New York: Oxford University Press.

Klett, C.J., & Pumroy, D.K. (1971). Automated procedures in psychological assessment. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 2). Palo Alto, CA: Science and Behavior Books.

Lachar, D. (1974a). Accuracy and generalizability of an automated MMPI interpretation system. Journal of Consulting and Clinical Psychology, 42, 267-273.

Lachar, D. (1974b). The MMPI: Clinical assessment and automated interpretation. Los Angeles, CA: Western Psychological Services.

Lanyon, R.I., & Goodstein, L.D. (1982). Personality assessment (2nd ed.). New York: Wiley.

Matarazzo, J.M. (1983, July 22). Computerized psychological testing. Science, 221, 323.

Matarazzo, J.M. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14-24.

Moreland, K.L. (1985). Validation of computer-based test interpretation: Problems and prospects. Journal of Consulting and Clinical Psychology, 53, 816-825.

Rodgers, D.A. (1972). Minnesota Multiphasic Personality Inventory. In O.K. Buros (Ed.), Seventh Mental Measurements Yearbook. Highland Park, NJ: Gryphon.

Webb, J.T., Miller, M.L., & Fowler, R.D. (1970). Extending professional time: A computerized MMPI interpretation service. Journal of Clinical Psychology, 26, 210-214.