Computers in Human Behavior, Vol. 9, pp. 441-449, 1993. Printed in the U.S.A. All rights reserved.
0747-5632/93 $6.00 + .00
Copyright © 1993 Pergamon Press Ltd.
Comparing Paper-Pencil and Computer-Based Versions of the Strong-Campbell Interest Inventory

Timothy R. Vansickle
Center for Education and Work, ACT

Jerome T. Kapes
Texas A&M University
Abstract - The equivalence of the mode of administration was studied using the paper-pencil (PP) and computer-based (CB) versions of the Strong-Campbell Interest Inventory (SCII); the results provide evidence of increased reliability and decreased administration time for the CB version.
One of the major concerns surrounding computer-based (CB) testing has been expressed in the American Psychological Association's (APA) (1986) Guidelines for Computer-Based Tests and Interpretations, which specifically states that evidence of the equivalence of the two forms of a test must be provided by the publisher and, further, that the users of CB tests and interpretations be aware of any inequalities resulting from the mode of administration. Additionally, the American Association for Counseling and Development (AACD) (1988) expressed in their Ethical Standards a recognition that computer-based tests and interpretations are not inherently valid and require the user to validate the software prior to use with clients or release for commercial distribution. Butcher (1987) further points out that there are several issues to be considered when using computers in psychological assessment. One of the most important is equivalence of form, which includes the problem of establishing criteria for validating computerized versions of paper-pencil (PP) tests. Although Butcher does not specifically state that the APA guidelines be used as criteria for validating CB forms, they do provide a starting point.

Requests for reprints should be addressed to Timothy R. Vansickle, Center for Education and Work, P.O. Box 168, Iowa City, IA 52243.
The purpose of this research was to examine the equivalence of PP and CB versions of the Strong-Campbell Interest Inventory (SCII). This study focused on the equivalence of mode of administration for a specific interest inventory; however, much of the research conducted in the past has been conducted with personality and cognitive instruments and has yielded conflicting results. Over 15 years ago, Lushene, O'Neil, and Dunn (1974) completed an equivalency study of the Minnesota Multiphasic Personality Inventory (MMPI) and found no significant differences. More recent studies have shown that the use of computers in assessment does not produce results that are significantly different from conventional test administrations with a variety of populations, including (a) gifted and behavior-problem children (Katz & Dalby, 1981b), (b) geriatric patients (Carr, Wilson, Ghosh, Ancill, & Woods, 1982), (c) physically handicapped persons (Wilson, Thompson, & Wylie, 1982), and (d) mental health patients (Katz & Dalby, 1981a). However, an equivalency study on the California Psychological Inventory (CPI) (Scissons, 1976) revealed that CB administration yielded increased efficiency by reducing counselor time, but also yielded some differences in scale scores. Another study, by Watts, Baddeley, and Williams (1982), investigated the difference between PP and CB versions of Raven's Matrices, found that the CB version scores were significantly different from the PP version, and stated that separate norms needed to be established for the CB version. With the rapid growth of computer technology and its impact on career counseling instruments, some recent studies have been conducted with several interest inventories. O'Shea (1987) completed a study comparing the PP and CB administrations of the Harrington-O'Shea Career Decision-Making System (CDM) with college freshmen and found no statistically significant differences.
Another recent study (Reardon & Loughead, 1988) compared the CB and PP versions of the Self-Directed Search (SDS) using a counterbalanced design and found no statistical differences for the mode of administration. However, in a study by Vansickle, Kimmel, and Kapes (1989) comparing four combinations of mode of administration (PP/PP, PP/CB, CB/PP, and CB/CB), differences in the reliabilities that favored the computer-based SCII were found (median test-retest coefficients: PP/PP = .84, PP/CB = .84, CB/PP = .89, and CB/CB = .92). The study reported here is a replication and extension of the previous Vansickle et al. research. It was the purpose of this research to answer the following questions concerning the equivalence of PP and CB modes of administration for the SCII:

1. What are the correlations between the two administrations for all scale scores?
2. Is there a difference between the correlations for combinations of all scale scores?
3. Is there a difference between the D scores for combinations of all scale scores?
4. Is there a difference between hit rates for all scale scores?
5. Is there a difference among the means for each of four sets of scale scores?
6. Is there a difference among the variances for each of four sets of scale scores?
7. Is there a difference among the times required to complete the instruments?

METHOD

Sample
The sample contained 52 students enrolled in two sections of Educational Psychology (EPSY) 101, “Improvement of Learning,” a three-credit elective
course at Texas A&M University during the spring semester of 1988. The purpose of EPSY 101 is to help those individuals enrolled make better use of their study time, provide models for efficient learning, and help students clarify academic and career goals. As part of this course, a career exploration section is incorporated in which the Strong-Campbell Interest Inventory is administered.

Instrumentation
Two versions of the Strong-Campbell Interest Inventory were used in the current study. Both the PP and CB versions are published by National Computer Systems (NCS), form T325 (1985 4th ed. rev.). This form contains 325 items using a like, indifferent, or dislike response set. The SCII produces a profile divided into three major parts: general occupational themes (6 scales), basic interest scales (23 scales), and occupational scales (119 scales). The profile report also contains two "special scales," Academic Comfort and Introversion-Extroversion, which were included in the study. All scores on the SCII profile report are T scores, which have a mean of 50 and a standard deviation of 10 (Hansen & Campbell, 1985). Furthermore, the profile is organized using the six Holland (1973) personality types.

Procedure
The current study was conducted using a 2-week test-retest design over the middle 6 weeks of the spring 1988 semester. All subjects from two intact classes of EPSY 101 were randomly assigned to either Group 1, PP, or Group 2, CB. The additional two groups used in the previous study (PP/CB and CB/PP) were not included in the present study so that the PP and CB groups could be studied more intensively. The EPSY 101 classes used the SCII as part of the occupational and academic exploration portion of the course, with the interest inventory administered and discussed approximately in the middle of the semester. Because of their participation in this study, all results and discussion of occupational and academic goals were withheld until completion of the second administration. The paper-pencil SCII was group-administered during the regularly scheduled class periods, while the computer-based SCII was administered by appointment. To ensure the least possible subject mortality due to missed administration times, subjects in the CB group were allowed to select the day and time for taking the SCII. Thus, the actual period of time between the first administration of the SCII and the second ranged from 13 to 19 days. Also, each individual's time to completion was recorded for both the test and retest. Neither group lost any subjects during the study; however, the CB group had 2 students forced into the section by academic advisors, yielding an N = 27, while the PP group had an N = 25.
RESULTS
To answer the first research question, Pearson product-moment correlations were computed between the scale scores of the first and second administration for each group. All of the 150 test-retest coefficients were significant beyond the p < .001 level, as would be expected. The test-retest coefficients for the PP group ranged from .47 to .97, while the coefficients for the CB group ranged from .77 to .99.
The second research question concerned the differences between the test-retest coefficients for the two modes of administration. The mean and median test-retest coefficients were computed for scale score combinations and were combined to form five logical categories which were based on the SCII's score report and include special scales (2 scales), general occupational themes (6 scales), basic interest scales (23 scales), occupational scales (119 scales), and total scales (150 scales). The mean and median test-retest coefficients are presented in Table 1. Since the SCII produces very high reliability coefficients, a modified chi-square statistic (k test of significance; Hakstian & Whalen, 1976) was used to test the significance of the differences between the two groups. From the table it can be seen that all of the combinations yielded larger test-retest coefficients for the CB version; however, only for the comparisons where there were 23 or more scales (6 of 10 comparisons) were the differences large enough that they could be expected to have occurred by chance less than 5% of the time if, in fact, there were no differences in the population from which these coefficients were obtained. The absence of statistical significance in the four remaining comparisons appears to be due both to smaller differences between the respective coefficients and to a lesser degree of power in the k test when a small number of coefficients is used to compute the medians or means. From an examination of the individual reliability coefficients for each of the 150 scales of the SCII, which constitute the raw data for the analysis reported at the bottom of Table 1, all but 11 (93%) yielded larger test-retest values for the CB version.

The third research question concerned differences between the groups' D scores (D = the absolute value of the difference between the first and second administrations) for the combinations of scale scores.
The 150 scales of the SCII were analyzed by computing 150 sets of D scores over all individuals, for each group. The means and standard deviations for these D scores were computed for each scale for each group. Further analysis of the mean D scores used an independent t test to analyze differences between groups. The results of these analyses yielded 140

Table 1. Median and Mean Test-Retest Coefficients and k Test of Significance for the SCII by Group and Scale Scores

SCII Scales                               Paper-Pencil   Computer-Based   k Test
                                          (N = 25)       (N = 27)
Special Scales (2 scales)
  Median                                  .90            .96              3.45
  Mean                                    .91            .97              2.44
General Occupational Themes (6 scales)
  Median                                  .91            .94              0.88
  Mean                                    .91            .94              0.88
Basic Interest Scales (23 scales)
  Median                                  .83            .94              5.96*
  Mean                                    .81            .93              5.51*
Occupational Scales (119 scales)
  Median                                  .86            .96              8.73*
  Mean                                    .83            .94              6.18*
Total All Scales (150 scales)
  Median                                  .86            .96              8.75*
  Mean                                    .83            .94              6.19*

*Significant at the p < .05 level.
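The k test of Hakstian and Whalen (1976) is a specialized chi-square statistic for comparing sets of reliability coefficients, and its exact computation is not reproduced here. As a simpler stand-in (not the k test itself), the familiar Fisher r-to-z transformation can illustrate how two independent correlations are compared; the sketch below applies it to the median total-scale coefficients from Table 1.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """z statistic for two independent Pearson correlations via
    Fisher's r-to-z transformation (not the Hakstian-Whalen k test)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Median total-scale test-retest coefficients from Table 1:
# PP .86 (N = 25) versus CB .96 (N = 27).
z = compare_correlations(0.86, 25, 0.96, 27)
# |z| > 1.96 corresponds to p < .05, two-tailed
```

Even this coarse comparison favors the CB coefficients, consistent with the k-test results in Table 1.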
(93%) differences favoring the computer-based SCII and only 10 favoring the PP version. Furthermore, 34 differences exceeded a chance occurrence at the .05 level, all of which favored the CB version. Even if one chose to interpret these results conservatively and allow for some chance differences among the 34 which reached statistical significance at the 5 in 100 rate (given 150 separate t tests), at least 26 differences, all favoring the computer-based SCII, exceeded a chance occurrence.

The fourth research question concerned the differences between hit rates and high point codes. Two sets of hit rates were computed for the 150 scales of the SCII. The cutoff points were set at T scores of 45 and 40, with a T score of 40 being the more traditional decision point, while a T score of 45 is a more conservative measure. A positive hit was counted when a scale score on both administrations was greater than the cutoff, while a negative hit was counted when a scale score on both administrations was equal to or less than the specified cutoff. Total hits were computed by summing the number of positive and negative hits. The hit rates for each cutoff by group are presented in Table 2. When a conservative cutoff of T = 45 was applied, the CB group achieved more positive hits than did the PP group, and furthermore, the total hits favored the CB group by 5 out of 100 (.90 vs. .95). Similar findings occurred when a cutoff of T = 40 was applied to the hit rate data (.89 vs. .94).

In another form of hit rate analysis, the data for each group were scored for high point codes as they would be in a counseling setting. The general occupational themes of the SCII were converted to 3-point codes and scored in two categories, exact hits or any-order hits, for each individual's two administrations. The frequency data collected were then analyzed by first comparing exact hits versus all other possibilities, and then exact and any-order hits versus nonhits.
The data for exact hits versus all other possibilities of 3-point codes by group are presented in Table 3. The resultant significant chi-square indicates that the computer-based SCII provided more consistent 3-point codes than did the paper-pencil SCII. However, the data for both exact and combination hits versus nonhits for 3-point codes by group (Table 3) yielded differences that were not statistically significant. The trend remains, however, that the computer-based SCII produced more hits than did the paper-pencil SCII regardless of coding combination.

The fifth research question concerned the differences among the means. The means and standard deviations for each administration for each group were computed for the 150 scales of the SCII. Further analysis was conducted by computing separate independent t tests for the first administrations of both groups and the second administrations of both groups, and dependent t tests for the first and second
Table 2. Hit Rates by Group for All Scale Scores (N = 150 Scales)

                          Paper-Pencil   Computer-Based
                          (N = 25)       (N = 27)
Cutoff 45
  Positive Hits > 45      .16            .23
  Negative Hits ≤ 45      .74            .72
  Total Hits              .90            .95
Cutoff 40
  Positive Hits > 40      .25            .33
  Negative Hits ≤ 40      .64            .61
  Total Hits              .89            .94

Note. Hit rates are reported as proportions.
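The hit-rate computation described in the text can be sketched as follows. The T scores here are simulated and pooled over all subject-scale pairs for simplicity, so the proportions are only illustrative, not the study's values.

```python
import numpy as np

def hit_rates(first, second, cutoff):
    """Proportions of positive, negative, and total hits for a given
    T-score cutoff, pooled over all (subject, scale) pairs.

    Positive hit: both administrations above the cutoff.
    Negative hit: both administrations at or below the cutoff.
    Total hits: positive plus negative (the two are disjoint).
    """
    pos = (first > cutoff) & (second > cutoff)
    neg = (first <= cutoff) & (second <= cutoff)
    return pos.mean(), neg.mean(), (pos | neg).mean()

# Simulated T scores (not the study's data): 25 subjects x 10 scales
rng = np.random.default_rng(2)
t1 = rng.normal(50, 10, size=(25, 10))
t2 = t1 + rng.normal(0, 4, size=(25, 10))
p45, n45, total45 = hit_rates(t1, t2, cutoff=45)
```

The same call with `cutoff=40` reproduces the second analysis in Table 2.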
Table 3. Hit Rates for 3-Point Codes for the SCII

                               Paper-Pencil (N = 25)   Computer-Based (N = 27)
                               Freq      %             Freq      %             Chi-square
Exact Hits                     5         20            14        52            5.68*
Combination and Nonhits        20        80            13        48
Exact and Combination Hits     18        72            24        89            2.38
Nonhits                        7         28            3         11

*Chi-square with 1 df significant at the p < .05 level.
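The 3-point-code analysis can be sketched as follows. The `three_point_code` helper and the sample theme scores are hypothetical illustrations, but the 2 × 2 chi-square, applied to the exact-hit frequencies in Table 3, reproduces the reported value of 5.68.

```python
def three_point_code(theme_scores):
    """Top three Holland themes, highest score first (a 3-point code)."""
    order = sorted(theme_scores, key=theme_scores.get, reverse=True)
    return tuple(order[:3])

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 frequency table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical theme T scores for one individual's two administrations.
# An exact hit requires the same code in the same order; an any-order
# hit requires only the same three themes.
t1 = three_point_code({"R": 55, "I": 62, "A": 48, "S": 40, "E": 45, "C": 58})
t2 = three_point_code({"R": 56, "I": 60, "A": 50, "S": 41, "E": 44, "C": 61})
exact = t1 == t2             # same order required
any_order = set(t1) == set(t2)  # order ignored

# Table 3's exact-hits comparison: [exact, other] rows for PP and CB
chi2 = chi_square_2x2([[5, 20], [14, 13]])
```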
administration of each group. The results of these 450 t tests produced only a total of 17 differences that reached or exceeded the .05 level. Since 23 differences could be expected to occur by chance at the .05 level, there is no reason to believe that the two versions of the SCII yielded any real differences. However, because it may be useful to replicate these findings and be sure that certain scales are not affected by mode of administration, the scales that did yield differences at the .05 level are reported in Table 4. For the first administration, only 10 of the 150 scales of the SCII were significantly different, with all of the means in the CB group being larger. The results for the second administration yielded 7 scales that were found to be statistically different. Four of these were also different on the first administration, and all but one of the differences favored the CB version. The results of the dependent t tests yielded no statistical differences between means from the first to the second administration.

The sixth research question concerned the difference among the variances. The data were further analyzed by computing independent Fmax tests between the PP and CB groups for the first and second administrations. The results of this analysis indicated that one of the occupational scales from the first administration, Accountant, was significantly different (Fmax = 2.69, p < .05). Furthermore, this same scale along with one other occupational scale was found to be statistically different on the second administration (Accountant, Fmax = 2.94, p < .01; Occupational Therapist, Fmax = 2.63, p < .05). A dependent t test for variances was computed on all of the scales between the first and second administrations. This analysis yielded no statistical differences.

The final research question addressed by this study concerned differences in time of administration. Each administration was timed, and each individual's time to complete the instrument was recorded.
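The Fmax comparison of group variances reduces to a ratio of the larger to the smaller sample variance. A minimal sketch follows, using simulated scores (not the study's data) in which one group is deliberately given a wider spread.

```python
import numpy as np

def f_max(a, b):
    """Variance-ratio (Fmax) statistic: the larger of the two unbiased
    sample variances divided by the smaller."""
    va, vb = np.var(a, ddof=1), np.var(b, ddof=1)
    return max(va, vb) / min(va, vb)

# Illustrative scores: two groups with clearly unequal spread
rng = np.random.default_rng(1)
pp = rng.normal(50, 10, 25)
cb = rng.normal(50, 17, 27)
ratio = f_max(pp, cb)
# The ratio is then referred to an F table with the appropriate df
```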
The means and standard deviations for time of administration by group are presented in Table 5. The time of administration for the SCII significantly favored the CB version on both administrations. Furthermore, the time of administration for the retest of the CB group was significantly less (t = -3.83, p < .001), while the PP group showed no difference in time of administration from Time 1 to Time 2.
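The independent comparison of administration times can be sketched with a pooled-variance t statistic. The completion times below are hypothetical, chosen only to mimic the pattern of a slower PP group; they are not the study's raw data.

```python
import math

def independent_t(a, b):
    """Pooled-variance independent-samples t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical completion times in minutes
pp_times = [31, 29, 33, 30, 28, 32, 31, 30]
cb_times = [22, 21, 24, 23, 20, 22, 25, 21]
t = independent_t(pp_times, cb_times)  # positive: PP slower than CB
```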
DISCUSSION
One of the problems facing psychologists, counselors, and educators is that of the rapid growth of technology in the area of psychological testing without the benefit
Table 4. Significant Independent t Tests for the SCII by Group and Scale

                                 Paper-Pencil (N = 25)   Computer-Based (N = 27)
Administration/Scales            M        SD             M        SD             t
First Administration
 General Occupational Themes
  Investigative                  43.24    9.61           48.37    9.01           -1.98*
  Conventional                   46.52    10.31          54.11    10.58          -2.61*
 Basic Interest Scales
  Mathematics                    42.36    9.22           51.26    10.29          -3.27**
  Medical Science                44.16    12.29          50.67    10.92          -2.01*
  Merchandising                  47.00    11.97          54.52    10.30          -2.41*
 Occupational Scales
  Dietitian                      29.64    12.51          35.81    9.08           -2.02*
  Purchasing Agent               35.12    13.06          43.74    13.10          -2.37*
  Realtor                        31.56    14.45          39.93    14.82          -2.06*
  IRS Agent                      33.76    12.55          40.93    13.43          -1.98*
Second Administration
 General Occupational Themes
  Investigative                  42.12    10.84          48.93    9.60           -2.38*
  Conventional                   46.00    10.49          53.44    11.78          -2.41*
 Basic Interest Scales
  Mathematics                    42.44    10.05          50.93    10.77          -2.93*
  Medical Science                44.76    12.43          51.81    10.22          -2.22*
  Sales                          51.84    11.06          58.37    11.90          -2.05*
 Occupational Scales
  Optometrist                    28.28    12.40          35.15    11.71          -2.05*
  Artist, Fine                   23.12    12.67          15.07    12.29          2.32*

*t significant at the p < .05 level. **t significant at the p < .01 level.
of prior research as to the effects of the technology on the constructs being measured. Many professional associations have expressed concerns about computer-based tests and interpretations in their guidelines and ethical standards. As with all use of tests, it is the practitioner who is ultimately responsible for the proper use and interpretation of the results.

Table 5. Independent and Dependent t Tests for Time of Administration for the SCII by Group and Test-Retest

                           Paper-Pencil (N = 25)   Computer-Based (N = 27)
                           M        SD             M        SD               Ind. t
First Administration       30.76    5.65           22.37    5.72             5.96*
Second Administration      30.08    6.09           17.88    2.92             9.07*
Dependent t                -1.58                   -3.83*

Note. All times recorded in minutes.
*t significant at the p < .0001 level.
The current study, as a replication and extension of the previous Vansickle et al. (1989) findings, provides strong empirical evidence that computer-based testing has advantages and that equivalence of form for the SCII is somewhat established. The advantages are that the computer version appears to be more reliable and takes less time, while it is equivalent in the sense that it does not yield scores that are different. Given these findings, there is every reason for the practitioner to be assured that the CB version of the SCII can be interpreted using the PP version's normative data, and that the results are more reliable and stable when using the CB version. Furthermore, the CB version is significantly faster to administer than is the PP version.

The study also raises a number of new questions that need to be answered. Among these are:

1. Do these findings generalize to other computer-based instruments?
2. Are certain scales on an interest inventory affected more than others when PP instruments are converted to CB mode?
3. Do the findings of greater reliability and speed generalize to other instruments, both affective and cognitive (i.e., personality, aptitude, and achievement)?
4. What is it about CB instruments that makes them more reliable and faster?

The use of the computer-based SCII instead of the paper-pencil version can be recommended because of both increased reliability and shorter administration time. However, the equivalence of mode of administration should not be assumed for any computer-based instrument unless there is evidence presented by the test author or publisher. Furthermore, if no evidence exists for equivalence, users should conduct a study using local data.
REFERENCES

American Association for Counseling and Development. (1988). Ethical standards of the American Association for Counseling and Development. Journal of Counseling and Development, 67(1), 4-8.

American Psychological Association. (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.

Butcher, J. N. (1987). The use of computers in psychological assessment: An overview of practices and issues. In J. N. Butcher (Ed.), Computerized psychological assessment (pp. 3-14). New York: Basic Books.

Carr, A. C., Wilson, S. L., Ghosh, A., Ancill, R. J., & Woods, R. T. (1982). Automated testing of geriatric patients using a microcomputer-based system. International Journal of Man-Machine Studies, 17, 297-300.

Hakstian, A. R., & Whalen, T. E. (1976). A k-sample significance test for independent alpha coefficients. Psychometrika, 41, 219-231.

Hansen, J. C., & Campbell, D. P. (1985). Manual for the SVIB-SCII (4th ed.). Stanford, CA: Stanford University Press.

Holland, J. L. (1973). Making vocational choices: A theory of careers. Englewood Cliffs, NJ: Prentice-Hall.

Katz, L., & Dalby, J. T. (1981a). Computer and manual administration of the Eysenck Personality Inventory. Journal of Clinical Psychology, 37, 587-588.

Katz, L., & Dalby, J. T. (1981b). Computer-assisted and traditional assessment of elementary-school-aged children. Contemporary Educational Psychology, 6, 314-322.

Lushene, R. E., O'Neil, H. F., & Dunn, T. (1974). Equivalent validity of a completely computerized MMPI. Journal of Personality Assessment, 38, 353-361.

O'Shea, A. J. (1987, April). A comparison of microcomputer and paper-pencil career interest inventory administration. Paper presented at the annual meeting of the American Association for Counseling and Development, New Orleans.
Reardon, R., & Loughead, T. (1988). A comparison of the paper-and-pencil and computer versions of the Self-Directed Search. Journal of Counseling and Development, 67, 249-252.

Scissons, E. H. (1976). Computer administration of the California Psychological Inventory. Measurement and Evaluation in Guidance, 9, 24-30.

Vansickle, T. R., Kimmel, C., & Kapes, J. T. (1989). Test-retest equivalency of the computer-based and paper-pencil versions of the Strong-Campbell Interest Inventory. Measurement and Evaluation in Counseling and Development, 22, 88-93.

Watts, K., Baddeley, A., & Williams, M. (1982). Automated tailored testing using Raven's matrices and the Mill Hill vocabulary test: A comparison with manual administration. International Journal of Man-Machine Studies, 17, 331-344.

Wilson, S. L., Thompson, J. A., & Wylie, G. (1982). Automated psychological testing for the severely physically handicapped. International Journal of Man-Machine Studies, 17, 291-296.