Computers in Human Behavior 17 (2001) 111–124 www.elsevier.com/locate/comphumbeh

Developing a computerized test of perceptual/clerical speed

S. Parks, A. Bartlett, A. Wickham, B. Myors *

Department of Psychology, Macquarie University, Sydney, Australia 2109

* Corresponding author. E-mail address: [email protected] (B. Myors).

Abstract

Paper-and-pencil tests of perceptual/clerical speed have been used to select clerical staff for decades; however, clerical work in the modern office environment involves the use of personal computers, which did not exist when these tests were originally developed. A need therefore exists to update the established predictors of clerical work behaviors. One simple solution is to computerize the traditional pencil-and-paper tests, and this option was investigated in the current study, in which a computerized perceptual/clerical speed test (canceling t's and e's) was compared to a paper-and-pencil form. Twenty different versions of the test were developed, varying type of text, feedback and user friendliness. Results from 43 participants demonstrated the presence of a very strong common factor of speediness among all forms of the test, in addition to a second factor capturing unique method variance. Although closely related, the two forms were not equivalent. Type of text and the presence of feedback were also found to affect performance. Implications of these findings for the creation of new predictors of job performance are discussed. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Computerized testing; Speeded tests

1. Introduction

Tests of perceptual/clerical speed and accuracy have been widely used as predictors of basic office and administrative skills for many years (Murphy & Davidshofer, 1998, pp. 412–413). Traditionally administered in paper-and-pencil


format, these tests are characterized by items of uniformly low levels of difficulty, enabling all participants to score 100% if given sufficient time (Gregory, 1996). However, a strict time limit is imposed, with the dependent variable being the number of items successfully completed, reflecting the speed with which the test taker worked through the items. Typical speeded tests include comparing pairs of numbers or names, substituting pairs of symbols from a list, or canceling particular letters on a page.

With the advent of office technology during the last twenty years, it is not clear that these paper-and-pencil tests have maintained their predictive validity; certainly they have lower face validity in the modern office environment. One response to this problem is to develop computerized versions of these tests. Besides greater relevance to the modern work environment, the practical benefits of computerized testing are numerous. Advantages include ease of standardization and administration (Bunderson, Inouye & Olsen, 1989), immediate and more accurate scoring (Pryor, 1989), reduced testing time, the ability to provide immediate feedback (Wise & Plake, 1989), and the ability to collect test-taking time and response latency information (Olsen, Maynes, Slawson & Ho, 1989). Thus, it will become increasingly important to determine whether computerized tests produce results equivalent to traditional modes of testing. The advantages of computerized testing are reduced if such tests are less valid than their conventional paper-and-pencil counterparts.

Strict equivalence is a necessity if paper-and-pencil norms are to be used to interpret the computerized results; however, if new norms are developed for the computerized versions, strict equivalence is unnecessary. In order for tests to be considered equivalent, Van de Vijer and Harsveld (1994, p. 852) state that the "rank orders of scores of individuals must closely approximate each other and the means, dispersions and shapes of the score distributions should be approximately the same". Although strict equivalence may be difficult to achieve, convergent validity (i.e. high correlation between the two modes of presentation) may be attainable.

Several studies have examined the equivalence of paper-and-pencil and computerized tests. Wise and Wise (1987) found a non-significant effect for mode of presentation when using achievement tests, indicating that participants performed equally well irrespective of whether they experienced the traditional or computerized format. Similarly, Ward, Hooper and Hannafin (1989) compared a computerized and traditional multiple choice test and found no significant differences between the test performance of the two groups. However, Chin, Donn and Conry (1991) used science achievement tests and noted that scores on the paper-and-pencil and computerized versions were not equivalent: participants who received the computerized version of the test scored significantly higher. In contrast, Lee, Moreno and Sympson (1986) found that participants who received the paper-and-pencil version of an arithmetic test scored significantly higher than those who were given the computerized version. Consistent with this finding, a review by Mazzeo and Harvey (1988) highlighted that for the majority of studies reviewed, the mean scores for tests administered on the computer were lower, but not significantly so, compared to scores on paper-and-pencil tests.

1.1. Speeded tests

Mazzeo and Harvey (1988) also noted that findings concerning non-speeded tests (e.g. knowledge or ability tests) seemed to differ from those of speeded tests. They argued that responding on computer facilitated performance on speeded tasks. In a meta-analysis investigating this issue, Mead and Drasgow (1993) found a cross-mode correlation of 0.72 between paper-and-pencil and computerized versions of speeded tests, compared to a correlation of 0.97 for timed power tests of knowledge and ability.

Greaud and Green (1986) found that participants were significantly faster on speeded tests administered by computer compared to paper-and-pencil mode. This occurred despite there being no significant differences in reliability between the computer and paper-and-pencil versions. Van de Vijer and Harsveld (1994) also found participants were faster to respond to computerized versions of a battery of speeded tests. Most pertinent to the current article is the statement by Van de Vijer and Harsveld (1994, p. 858) that "simple clerical tests are more affected by computerization as compared to complex tests". One reason for this may be the simple nature of speeded tasks, with slight differences between the methods of response leading to large differences in scores. This was demonstrated by Greaud and Green (1986), who found that between-mode correlations ranged from 0.28 to 0.61, indicating that speeded tasks did in fact depend heavily on mode of presentation. For these reasons, it is important to establish the construct validity of any computerized speed test.

1.2. Age

Studies of speeded and computerized tests implicate a number of other variables that need to be taken into account. A significant negative correlation between age and speed measures is a robust finding that has been well documented in the literature (Rabbitt & Goward, 1994; Salthouse, 1985; Stankov, 1994). As such, studies of speeded tasks need to control for age-related effects.

1.3. Anxiety

Level of anxiety is also likely to play a role in the tasks used in the current study. Results by Heinssen, Glass and Knight (1987) and Llabre, Clements, Fitzhugh, Lancelotta, Mazzagatti and Quinones (1987) provide evidence that participants with high levels of computer anxiety perform more poorly on computerized versions of tests compared to paper-and-pencil versions. In order to provide a measure of anxiety, Heinssen et al. (1987) developed the Computer Anxiety Rating Scale (CARS) and demonstrated that higher levels of computer anxiety reduced participants' confidence in their ability to perform on computerized tests. Wise, Barnes, Harvey and Plake (1989) found that participants with high levels of computer anxiety exhibited higher levels of post-test anxiety than those with low levels of computer anxiety. However, there was no difference between scores when the test was administered by the computer compared to the traditional mode.

1.4. Feedback

Another variable that has been found to influence performance on many tasks is feedback. Two types of feedback have been identified in the literature, internal feedback and external feedback, and each has been shown to be important. Internal feedback refers to immediate changes that occur in the environment as a result of responding, such as seeing a stroke through a letter immediately after crossing it out, whereas external feedback refers to evaluative information received after the event, such as a compliment from an instructor. A great deal of research has documented the facilitating effects of external feedback (Goodman, 1998; Morris & Fulmer, 1976), but relatively few studies have examined internal feedback.

Unless individually administered, the level of external feedback in paper-and-pencil tests is usually fairly low because they are not scored until after the test taker has finished. In contrast, computerized tests can be scored as the participant progresses through the items, with feedback provided as a matter of routine. Wise and Plake (1989) suggested that this would allow participants to monitor their own test performance and do better on the task. Morris and Fulmer (1976) confirmed this suggestion, finding that feedback had a positive effect on performance, with participants reporting lower levels of anxiety from pre-exam to post-exam. Yet, Wise, Plake, Pozehl, Barnes and Lukin (1989) found that for participants who first received difficult items, along with item feedback and a running total of the number of items correct or incorrect, anxiety levels increased; however, this did not affect performance. Similarly, Wise, Plake, Eastman, Boettcher and Lukin (1986) found that when items were presented in order of difficulty, feedback had little or no effect on test performance.

Rocklin and Thompson (1985) have proposed a curvilinear relationship between test anxiety and performance, with feedback serving to accentuate the effects of success or failure. They suggested that for difficult tests, feedback raises participants' anxiety levels and reduces performance, while for easy tests it lowers anxiety levels and facilitates performance. The conflicting findings regarding the dynamics of feedback suggest that it may be another factor that affects the equivalence of paper-and-pencil and computerized tests. In general, however, we would expect to find a facilitatory effect in relatively easy speeded tasks.

1.5. The current study

The above review indicates that similarities between paper-and-pencil and computer versions of tests cannot be taken for granted. This is especially true of speeded tests, in which the psychomotor aspects of performance under the two modes of presentation differ. The effect of using a computer input device such as a keyboard or mouse, as opposed to a pen or pencil, is not clear. As such, the current study aimed to determine the conditions under which computerization produced the highest convergent validity between a paper-and-pencil test of perceptual/clerical speed and a computerized version.

The study used a letter cancellation, or checking, task which involved participants canceling all the t's and e's printed on a page or presented on the computer screen. Considering the suggestions of Corcoran (1966) and Reicher (1969) that the content and context of a stimulus may affect speed of processing, the current study manipulated word content. The effects of feedback and the user friendliness of the computer interface were also investigated.

2. Method

2.1. Participants

A sample of 43 participants, aged between 16 and 59 years (mean=30.5, S.D.=12.9), was recruited to take part in the study. These participants were drawn from the friends, acquaintances and work mates of the authors in the hope that they would be more representative of the general population than a student sample. Participants were informed that the purpose of the study was to compare their performance on a computerized version of the task with their performance on a more traditional paper-and-pencil version. Twenty-five participants were male and 18 were female. Twelve participants described their occupation as "engineer or computer programmer"; 18 were classified as "other professionals" such as accountants, teachers or sales agents; 10 were students and three were engaged in home duties. On average, participants had between two and five years experience using a computer, although nine said they were infrequent users. Word processing was the most common task performed (58%) followed by data analysis (12%). Seventy-nine percent of participants said that they were experienced in using a mouse.

2.2. Materials

Stimuli consisted of 18 lines of text printed on paper or presented on the computer screen. In paper-and-pencil mode, participants had to cross out as many t's or e's as they could within a one minute time limit using a pen. In the computerized mode, participants had to move the mouse cursor and click on as many t's or e's as they could on the screen within the time limit. Participants were instructed to ignore the case of the letters.

Four different forms of the test were constructed for paper-and-pencil mode by varying the type of text to be searched. Text varied between poetry, normal prose, non-words and random letters. The poetry consisted of several stanzas from Douglas Stuart's poem `Rutherford' (Stuart, 1973), which relates the story of the splitting of the atom in the early part of the 20th century. Normal prose and non-words were taken from paragraphs in Microsoft Encarta's entry on Chemistry (Microsoft, 1994). Non-words were created by randomly rearranging the vowels in each word. These were subsequently checked to ensure that the rearrangement did not produce any real words. Random letters were produced by putting normal text through the ROT-13 encryption system.
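Purely as an illustration (the original stimuli were prepared by the authors, not with this code), the following Python sketch shows one way the two transformed text conditions described above could be generated; the function names and sample sentence are hypothetical.

import codecs
import random

VOWELS = "aeiouAEIOU"

def make_nonwords(text: str, seed: int = 0) -> str:
    """Randomly rearrange the vowels within each word ('non-words' condition)."""
    rng = random.Random(seed)
    words = []
    for word in text.split(" "):
        vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
        vowels = [word[i] for i in vowel_positions]
        rng.shuffle(vowels)
        chars = list(word)
        for pos, v in zip(vowel_positions, vowels):
            chars[pos] = v
        words.append("".join(chars))
    return " ".join(words)

def make_random_letters(text: str) -> str:
    """Pass normal text through ROT-13 ('random letters' condition)."""
    return codecs.encode(text, "rot_13")

if __name__ == "__main__":
    sample = "The discovery of the electron changed chemistry"
    print(make_nonwords(sample))        # vowels shuffled within each word
    print(make_random_letters(sample))  # every letter rotated 13 places

Non-words generated this way would still require the manual check described above, since a vowel shuffle can occasionally reproduce a real word.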

The computerized versions were constructed in a 4×2×2 factorial arrangement: four types of text, as described above; the presence or absence of internal feedback; and arrow mouse versus block mouse. Internal feedback involved the correctly clicked t or e lighting up in yellow. In the non-feedback conditions, the target letter did not change color. The arrow mouse versus block mouse distinction can also be thought of as a "user friendliness" manipulation. The arrow mouse condition involved the text being presented on the screen in graphics mode, in which the mouse cursor looked like an arrow and moved smoothly across the screen. This is the familiar Windows mouse pointer. In the block mouse condition, text was presented in text mode and the mouse cursor was a solid block of color that jerked noticeably as it moved between lines and columns. This was the older style DOS mouse pointer. Note that the jerkiness of this pointer is due to the low resolution of the 80 column by 25 line text-mode display rather than to computer processing speed.

Prior to completing the experimental tasks, participants were administered the CARS (Heinssen, Glass & Knight, 1987) and a short demographic questionnaire that also contained questions about their computer familiarity and use.
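The following is a hypothetical sketch, in Python rather than the DOS environment actually used, of the core scoring logic implied by this description: a click counts only if it lands on a not-yet-cancelled target, correctly clicked letters are flagged for highlighting in the feedback conditions, and the score is the number of targets checked. All names and structures are illustrative assumptions, not the authors' code.

from dataclasses import dataclass, field

@dataclass
class Target:
    x: int          # left edge of the letter cell on screen
    y: int          # top edge
    w: int = 8      # assumed cell width
    h: int = 16     # assumed cell height
    hit: bool = False

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

@dataclass
class CancellationTrial:
    targets: list            # bounding boxes of every t and e on screen
    feedback: bool = True    # feedback condition: highlight correct clicks
    clicks: list = field(default_factory=list)   # (x, y) log for scan-path analysis

    def register_click(self, px: int, py: int) -> bool:
        """Return True if the click lands on a not-yet-cancelled target."""
        self.clicks.append((px, py))
        for t in self.targets:
            if not t.hit and t.contains(px, py):
                t.hit = True
                if self.feedback:
                    pass  # here the letter would be redrawn in yellow
                return True
        return False

    @property
    def score(self) -> int:
        """Dependent variable: number of targets correctly checked."""
        return sum(t.hit for t in self.targets)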

S. Parks et al. / Computers in Human Behavior 17 (2001) 111±124

117

2.3. Design

There were 20 conditions altogether: four paper-and-pencil conditions and 16 computer conditions. Each participant performed all tests, with the order of presentation randomized between subjects to control for order effects. The computer prompted the experimenter to administer a pencil-and-paper condition at random and took care of the timing for these conditions. The dependent variable was the total number of t's and e's correctly checked within the 1-min time limit. Previous unpublished research in our laboratory found a high correlation between the number of t's checked on computer in 1 and 2 min (r=0.82). Therefore we considered 1 min would be sufficient time to measure perceptual/clerical speed in each condition of the current study.

Testing was carried out on a range of computers at the authors' or participants' homes or places of work. The speed of the computers used ranged between a 50 MHz 80486 and a 233 MHz Pentium. Since all tests were run in raw DOS mode, even the slowest computer was more than fast enough to accurately time the 1-min intervals in each condition (Myors, 1999). Timing was performed using the standard PC system timer, which has an accuracy of 55 ms.

3. Results

Participants had a mean CARS score of 38.4 (S.D.=8.5). This was significantly lower than that reported by Heinssen, Glass and Knight (1987) for their development sample (t(42)=-4.00, P<0.05), indicating that our sample had relatively low levels of computer anxiety, possibly reflecting the proliferation of microcomputers since the CARS was originally developed.

3.1. Analysis of mean effects

Table 1 shows the means and standard deviations of the total number of correctly checked t's and e's in each condition. It is readily apparent from Table 1 that participants were faster in the paper-and-pencil conditions compared to the computerized conditions. Comparing the average score from the paper-and-pencil conditions with the average score from the computerized conditions reveals this to be the case (t(42)=13.28, P<0.05). This finding differs from that of Greaud and Green (1986), described above.

Table 1
Descriptive statistics for the total number of t's and e's correctly checked in each condition

Condition                                Mean     S.D.

Paper-and-pencil tests
  Poetry                                 63.74    15.27
  Normal prose                           72.80    14.31
  Non-words                              53.21     9.04
  Random letters                         32.40     9.52

Computerized tests
  Poetry
    Block mouse, feedback                40.93    12.43
    Block mouse, non-feedback            45.80    13.40
    Arrow mouse, feedback                48.50    15.12
    Arrow mouse, non-feedback            43.65    12.62
  Normal prose
    Block mouse, feedback                46.90    12.45
    Block mouse, non-feedback            40.72    11.90
    Arrow mouse, feedback                44.63    12.34
    Arrow mouse, non-feedback            39.70    11.88
  Non-words
    Block mouse, feedback                39.30    11.32
    Block mouse, non-feedback            36.30    11.14
    Arrow mouse, feedback                33.72    10.50
    Arrow mouse, non-feedback            37.47     9.51
  Random letters
    Block mouse, feedback                26.95     8.85
    Block mouse, non-feedback            28.02     8.50
    Arrow mouse, feedback                29.02     9.57
    Arrow mouse, non-feedback            29.07     7.91
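As a minimal sketch only (the reported analyses were run in SPSS; the data layout, column names and function below are assumptions), the paired comparison of the two modes could be computed as follows.

import pandas as pd
from scipy import stats

def compare_modes(scores: pd.DataFrame,
                  paper_cols: list[str],
                  computer_cols: list[str]) -> None:
    """scores: one row per participant, one column per condition."""
    paper_mean = scores[paper_cols].mean(axis=1)       # average over 4 paper conditions
    computer_mean = scores[computer_cols].mean(axis=1) # average over 16 computer conditions
    t, p = stats.ttest_rel(paper_mean, computer_mean)  # paired t-test across participants
    r = paper_mean.corr(computer_mean)                 # cross-mode convergent validity
    print(f"t({len(scores) - 1}) = {t:.2f}, p = {p:.4f}, cross-mode r = {r:.2f}")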

All subsequent analyses of mean effects were carried out using the GLM repeated measures procedure of the SPSS statistical package, with age and CARS score used as covariates. A significant effect for type of text was found within the paper-and-pencil conditions, with participants being significantly faster at checking poetry compared to non-words (F(1, 40)=9.33, P<0.05) and at checking non-words compared to random letters (F(1, 40)=26.99, P<0.05). Poetry and normal prose did not differ significantly.

A significant main effect for type of text was also found in the computer conditions. In this case, checking poetry was significantly faster than checking normal prose (F(1, 40)=14.58, P<0.05), checking poetry was also faster than checking non-words (F(1, 40)=30.50, P<0.05), and checking non-words was significantly faster than checking random letters (F(1, 40)=58.26, P<0.05). Neither the main effect for feedback nor that for mouse type was significant.

Interactions were examined on a post hoc basis and found to be significant for type of text by mouse type (F(3, 38)=3.22, P<0.05) and type of text by feedback (F(3, 38)=4.38, P<0.05). Examination of the marginal means revealed that the block mouse had a slightly enhancing effect for normal prose and non-words and a slightly debilitating effect in the poetry condition. This interaction is difficult to interpret but was not particularly strong. The type of text by feedback interaction was much more evident, however: internal feedback had an enhancing effect for normal prose, but no effect on any of the other types of text.

3.2. Analysis of correlations

Given that this research is primarily motivated by the search for convergent validity, intercorrelations between the 20 conditions are perhaps most pertinent to the main aims of this paper. In particular, we were looking for the condition, or conditions, among the computerized tests that correlated most highly with one or all of the paper-and-pencil tests. If all the conditions were sufficiently similar, we would also expect a general "speediness" factor to emerge.

The correlation between the average score from the four paper-and-pencil tests and the average of the 16 computerized tests was moderately high, 0.68, and statistically significant. This is fairly close to Mead and Drasgow's (1993) meta-analytic correlation of 0.72 between paper-and-pencil and computerized tests of speed, but it is not high enough for the forms to be considered parallel. The intercorrelations among the four paper-and-pencil tests ranged between 0.68 and 0.84 with an average of 0.75, and the intercorrelations among the 16 computerized tests ranged between 0.77 and 0.94 with an average of 0.87.

The reason we studied so many variations in condition was to determine which combination of independent variables produced the strongest association between the two modes of presentation. In this regard, the highest raw correlation between a paper-and-pencil test and a computerized test was 0.70, between poetry presented on paper and normal prose with an arrow mouse and no feedback presented by computer. The next largest correlation was 0.69, for the same computer condition and normal text printed on paper.

To examine the relationships among the different conditions more closely, principal components analysis was used. This is in line with Mead and Drasgow's (1993, p. 457) call for a greater use of factor analytic methods in investigating computerization issues. Although our sample size was relatively small, we can interpret this analysis because the components revealed were so strong and clear (Stevens, 1996, p. 372).
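A minimal sketch of this kind of analysis is given below, assuming the 43 x 20 matrix of condition scores is available as a NumPy array; it is not the authors' analysis script. Applying pc_loadings to scores residualized with partial_out (on age and CARS) would correspond to the partialled columns of Table 2.

import numpy as np

def pc_loadings(scores: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal components of the covariance matrix; returns the correlations
    (loadings) between each condition and the first n_components components."""
    cov = np.cov(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                  # sort components by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    comp_scores = (scores - scores.mean(axis=0)) @ eigvecs[:, :n_components]
    # loading = correlation between each original variable and each component
    return np.array([[np.corrcoef(scores[:, j], comp_scores[:, k])[0, 1]
                      for k in range(n_components)]
                     for j in range(scores.shape[1])])

def partial_out(scores: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Residualize each condition score on the covariates (e.g. age, CARS)."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return scores - X @ beta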

The lowest correlation in the matrix was 0.31 and the average was 0.76. Bartlett's test of sphericity indicated that the matrix was non-null (χ²(190)=13332.7, P<0.05).

Principal components analysis of the covariance matrix among the 20 conditions revealed the presence of a general speediness factor, but there was also a second principal component with an eigenvalue greater than one. The first principal component accounted for 79% of the variance and the second accounted for only 9% of the variance, for a total of almost 90% of the variance explained. The smallest communality was 0.72. Component loadings are shown in Table 2. All loadings on the first principal component are highly positive, indicating high convergent validity overall. We interpret this very clear common factor as perceptual/clerical speed. It is also evident that the first principal component reflects the computer tests more than the paper-and-pencil tests, which load more strongly on the second principal component.

Table 2
Principal components analysis

Condition                                1st PC    2nd PC    1st PC(a)  2nd PC(a)

Paper-and-pencil tests
  Poetry                                 0.778     0.524     0.778      0.736
  Normal prose                           0.742     0.601     0.828      0.718
  Non-words                              0.632     0.565     0.666      0.754
  Random letters                         0.574     0.635     0.745      0.707

Computerized tests
  Poetry
    Block mouse, feedback                0.954    -0.125     0.933      0.240
    Block mouse, non-feedback            0.944    -0.177     0.916      0.162
    Arrow mouse, feedback                0.894    -0.286     0.836      0.046
    Arrow mouse, non-feedback            0.958    -0.118     0.938      0.229
  Normal prose
    Block mouse, feedback                0.952    -0.140     0.927      0.202
    Block mouse, non-feedback            0.933    -0.121     0.911      0.156
    Arrow mouse, feedback                0.944    -0.131     0.915      0.215
    Arrow mouse, non-feedback            0.951    -0.068     0.927      0.222
  Non-words
    Block mouse, feedback                0.955    -0.125     0.937      0.161
    Block mouse, non-feedback            0.902    -0.213     0.855      0.049
    Arrow mouse, feedback                0.922    -0.047     0.902      0.252
    Arrow mouse, non-feedback            0.944    -0.110     0.924      0.285
  Random letters
    Block mouse, feedback                0.921    -0.003     0.911      0.266
    Block mouse, non-feedback            0.901    -0.063     0.880      0.189
    Arrow mouse, feedback                0.891     0.038     0.869      0.319
    Arrow mouse, non-feedback            0.887    -0.076     0.855      0.198

(a) Age and anxiety partialled.

Thus, even though there is considerable common variance among the 20 conditions, there remains a small but identifiable residue of method variance. The last two columns in Table 2 show the component loadings after partialling out the effects of computer anxiety and age. These controls had little impact on our interpretation of what was happening in the data, however.

3.3. Analysis of scan paths

In the computer conditions, the computer kept track of the location (x and y coordinates) of each correctly clicked t or e. This allowed us to analyze the actual scan paths taken by participants in their search for the target letters. As a precursor to this analysis, a computer program was written to "play back" participants' performance by overlaying their scan path onto the original text. Visual inspection of these data revealed that two basic strategies were employed. Some participants took an orderly approach in which their scan paths followed a natural reading of the text, i.e. each screen was scanned left to right, top to bottom. In some cases the usual retrace from the end of one line to the beginning of the next was replaced by a backwards scan of the following line. This may have been slightly more efficient than the right-to-left saccade. Other participants used what might be described as a "random walk" strategy, in which their scan path wandered around the screen vertically as much as it did horizontally.

We concluded that a simple count of the number of changes in the x and y coordinates would differentiate between these two strategies. Close inspection of the data revealed that it was extremely rare to find a line change without an associated column change (i.e. invariably the target letters were in different columns even when many of them were on the same line). That is, line and column changes did not appear to be independent, so we computed an additional column change measure in which we only counted a column change if it was not accompanied by a line change. It was hoped that this would unconfound the line and column change indicators, and we denoted this new variable column*. In addition to the above measures, we also computed the total distance traveled by the scan path ("string length") in each condition.
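To make these measures concrete, the following sketch (not the original DOS analysis program) computes the four scan-path indices from the logged coordinates of successive correct clicks; the function name and data layout are assumptions.

import math

def scan_path_measures(clicks: list[tuple[int, int]]) -> dict:
    """clicks: (column, line) coordinates of correctly clicked targets, in order."""
    column_changes = 0      # any change in x between successive clicks
    line_changes = 0        # any change in y between successive clicks
    column_star = 0         # column change NOT accompanied by a line change
    string_length = 0.0     # total Euclidean distance traveled
    for (x0, y0), (x1, y1) in zip(clicks, clicks[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx != 0:
            column_changes += 1
        if dy != 0:
            line_changes += 1
        if dx != 0 and dy == 0:
            column_star += 1
        string_length += math.hypot(dx, dy)
    return {"column": column_changes, "line": line_changes,
            "column*": column_star, "string_length": string_length}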

Correlations between the four scan-path measures and the total number of t's and e's found in each condition are shown in Table 3. As can be seen, all correlations are positive, indicating that longer scan paths or more coordinate changes are, not surprisingly, associated with finding more t's and e's. Non-feedback, poetry and normal prose seem to be associated with more column* changes, and random text seems to be associated with more line changes. This suggests that poetry and normal prose were more likely to induce the orderly strategy, whereas random letters were more likely to result in a random walk. Raw column changes are almost perfectly associated with performance, but this probably overstates the importance of horizontal scanning. Column* changes provide a better basis for comparing the relative strengths of searching vertically or horizontally, and the higher correlations for column* changes indicate that horizontal scanning is superior to vertical scanning. This means that, in general, the orderly strategy was superior to the random walk strategy.

Table 3
Correlations between performance and measures of scan path

Condition                                Column     Line       String     Column
                                         changes    changes    length     changes(a)

Poetry
  Block mouse, feedback                  0.998      0.586      0.599      0.813
  Block mouse, non-feedback              0.998      0.327      0.549      0.906
  Arrow mouse, feedback                  0.995      0.453      0.649      0.887
  Arrow mouse, non-feedback              0.996      0.478      0.585      0.934

Normal prose
  Block mouse, feedback                  0.999      0.465      0.618      0.900
  Block mouse, non-feedback              0.999      0.546      0.791      0.984
  Arrow mouse, feedback                  0.996      0.443      0.544      0.837
  Arrow mouse, non-feedback              0.999      0.493      0.648      0.914

Non-words
  Block mouse, feedback                  0.995      0.466      0.596      0.796
  Block mouse, non-feedback              0.993      0.372      0.602      0.917
  Arrow mouse, feedback                  0.999      0.568      0.657      0.803
  Arrow mouse, non-feedback              0.989      0.529      0.616      0.797

Random letters
  Block mouse, feedback                  0.995      0.601      0.557      0.721
  Block mouse, non-feedback              0.998      0.600      0.606      0.851
  Arrow mouse, feedback                  0.998      0.664      0.547      0.830
  Arrow mouse, non-feedback              1.000      0.493      0.725      0.923

Mean                                     0.998      0.590      0.609      0.831

(a) Column changes are those not accompanied by a line change.

4. Discussion

The main finding of this research was that paper-and-pencil and computerized versions of the letter cancellation task show reasonably high convergent validity, but they cannot be deemed parallel or equivalent. Evidence for this conclusion comes from the correlation between average performance on the paper-and-pencil tests and the computer tests, and from the high, positive loadings of all conditions on the first principal component. Clearly, however, there was a second principal component closely associated with the paper-and-pencil tests, indicating that the two modes of presentation generate unique method variance in addition to their common variance.

Although there were 16 different versions of the computerized test, no single condition stood out as especially related to the paper-and-pencil versions. Similarly, no version was especially unrelated to the paper-and-pencil conditions. In fact, all computerized versions were more closely related to each other than they were to any

of the paper-and-pencil versions. Mead and Drasgow (1993, p. 453) suggested that motor skills were the most likely cause of discrepancies between paper-and-pencil and computerized modes of presentation, and our analysis concurs with this.

Analysis of the mean effects showed that performance on the letter cancellation task can be affected by a number of simple manipulations. Corcoran's (1966) and Reicher's (1969) suggestion that context influences perceptual processing time was clearly supported by these data. Participants were especially slow to find target letters embedded in random letter sequences, and significant differences were also found between the other types of text in both paper-and-pencil and computer modes of presentation. This could be due to implicit knowledge of the conditional probabilities of sequences of letters in ordinary language. Further, rhyme and meter may have influenced performance in the poetry conditions.

Interestingly, user friendliness (mouse type) was found to have little effect on performance within the computerized conditions. One explanation for this might be the characteristics of the sample: most of our participants indicated a high degree of familiarity with using a mouse. Further, the presence or absence of internal feedback moderated performance on the letter cancellation task. Internal feedback only facilitated performance in searching normal prose. This may be due to the ability of participants to capitalize on their knowledge of the conditional probability of letter sequences in standard English.

A very clear finding was that participants were significantly faster in the paper-and-pencil mode than in the computerized mode. This finding contradicts some previous research on speeded tasks (Greaud & Green, 1986; Van de Vijer & Harsveld, 1994). The probable explanation for this discrepancy is that the previous studies used the keyboard as the input device whereas we used the mouse. Results of the current study suggest that use of the mouse does not facilitate performance on speeded tests. Poorer performance on our computerized tests was probably due to the relative difficulty of maneuvering the mouse compared to using a pen. The task used in this study was designed to measure an individual's checking speed, not their motor ability. Perhaps a light pen or touch screen display would eliminate this problem.

Wise and Plake (1989) stated that if the same time limit was used for both paper-and-pencil and computerized versions of a test, the two versions would not be equivalent due to the fundamentally different speeds required to work with each medium. Our findings support this argument, although we hesitate to recommend that the computerized version of checking should be lengthened to bring the mean scores into line with the paper-and-pencil versions. A better solution would be to develop new norms for the computerized version to reflect the different means.

The current study has demonstrated that a number of factors need to be considered when attempting to obtain convergent validity for computer-based tests of perceptual/clerical speed. The study illustrated that word content has an influence on participants' speed of performance on a letter cancellation task. In computerizing the letter cancellation task, we recommend using normal prose in graphics mode, without feedback. This was a condition that held up well in all the analyses and has good face validity.

This study has further highlighted the importance of using input devices with which participants are familiar. Further research is required to examine the effect of using input devices other than the mouse for this task. While determining the conditions that produce equivalent paper-and-pencil and computerized tests is not as simple as it first appears, this study has aided understanding of the factors required to achieve acceptable levels of convergent validity.

References

Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized educational measurement. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: Macmillan.
Chin, C. H. L., Donn, S., & Conry, R. F. (1991). Effects of computer based tests on the achievement, anxiety, and attitudes of grade 10 science students. Educational and Psychological Measurement, 51, 735–745.
Corcoran, D. W. (1966). Prediction of responses to multidimensional from responses to unidimensional stimuli. Journal of Experimental Psychology, 71, 47–54.
Goodman, J. S. (1998). The interactive effects of task and external feedback on practice performance and learning. Organizational Behavior and Human Decision Processes, 76, 223–252.
Greaud, V. A., & Green, B. F. (1986). Equivalence of conventional and computer presentation of speed tests. Applied Psychological Measurement, 10, 23–34.
Gregory, R. J. (1996). Psychological testing (2nd ed.). Needham Heights: Allyn and Bacon.
Heinssen, R. K., Glass, C. R., & Knight, L. A. (1987). Assessing computer anxiety: development and validation of the Computer Anxiety Rating Scale. Computers in Human Behavior, 3, 49–59.
Lee, J. A., Moreno, K. E., & Sympson, J. B. (1986). The effects of mode of test administration on test performance. Educational and Psychological Measurement, 46, 467–474.
Llabre, M. M., Clements, N. E., Fitzhugh, K. B., Lancelotta, G., Mazzagatti, R. D., & Quinones, N. (1987). The effect of computer-administered testing on test anxiety and performance. Journal of Educational Computing Research, 3, 429–433.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper and pencil cognitive ability tests: a meta-analysis. Psychological Bulletin, 114, 449–458.
Microsoft (1994). Chemistry. Washington: Encarta, Microsoft Corporation.
Morris, L. W., & Fulmer, R. S. (1976). Test anxiety (worry and emotionality) changes during academic testing as a function of feedback and test importance. Journal of Educational Psychology, 68, 817–824.
Murphy, K. R., & Davidshofer, C. O. (1998). Psychological testing: principles and applications (4th ed.). New Jersey: Prentice Hall.
Myors, B. (1999). Timing accuracy of PC programs running under DOS and Windows. Behavior Research Methods, Instruments and Computers, 31, 322–328.
Olsen, J. B., Maynes, D. D., Slawson, D., & Ho, K. (1989). Comparisons of paper administered, computer administered and computerized adaptive achievement tests. Journal of Educational Computing Research, 5, 311–326.
Pryor, R. G. L. (1989). Some ethical implications of computer technology. Bulletin of the Australian Psychological Society, November, 164–166.
Rabbitt, P. M. A., & Goward, L. (1994). Age, information processing speed, and intelligence. The Quarterly Journal of Experimental Psychology, 47A, 741–760.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 275–280.
Rocklin, T., & Thompson, J. M. (1985). Interactive effects of test anxiety, test difficulty and feedback. Journal of Educational Psychology, 77, 368–372.
Salthouse, T. A. (1985). A theory of cognitive aging. New Jersey: Prentice Hall.
Stankov, L. (1994). The complexity effect phenomenon is an epiphenomenon of age related fluid intelligence decline. Personality and Individual Differences, 16, 265–288.


Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). New Jersey: Erlbaum.
Stuart, D. (1973). Selected works. Sydney: Angus and Robertson.
Van de Vijer, F. J. R., & Harsveld, M. (1994). The incomplete equivalence of the paper and pencil and computerized versions of the General Aptitude Test Battery. Journal of Applied Psychology, 79, 852–859.
Ward, T. J., Hooper, S. R., & Hannafin, K. M. (1989). The effect of computerized tests on the performance and attitudes of college students. Journal of Educational Computing Research, 5, 327–333.
Wise, S. L., Barnes, L. B., Harvey, A., & Plake, B. S. (1989). Effects of computer anxiety and computer experience on the computer based achievement test performance of college students. Applied Measurement in Education, 2, 235–241.
Wise, S. L., & Plake, B. S. (1989). Research on the effects of administering tests via computers. Educational Measurement: Issues and Practices, Fall, 5–10.
Wise, S. L., Plake, B. S., Eastman, L. A., Boettcher, L. L., & Lukin, M. E. (1986). The effects of item feedback and examinee control on test performance and anxiety in a computer administered test. Computers in Human Behavior, 2, 21–29.
Wise, S. L., Plake, B. S., Pozehl, B. J., Barnes, L. B., & Lukin, L. E. (1989). Providing item feedback in computer based tests: effects of initial success and failure. Educational and Psychological Measurement, 49, 479–486.
Wise, S. L., & Wise, L. A. (1987). Comparison of computer administered and paper administered achievement tests with elementary school children. Computers in Human Behavior, 3, 15–20.