An investigation of data entry methods with a personal computer



COMPUTERS AND BIOMEDICAL RESEARCH 19, 543-550 (1986)

I. K. CROMBIE AND J. M. IRVING

Cardiovascular Epidemiology Research Unit, Ninewells Hospital and Medical School, Dundee DD1 9SY, Scotland

Received January 30, 1986

The features of three methods of data entry were investigated and a trial of them was carried out using operators with differing amounts of keyboard experience. The first method was a simple system of character entry using a word processing package, and the second and third systems were written in a commercial data base language; one was designed to possess limited intelligence, the other moderate intelligence. The amount of time and computing expertise required to set up each method increased in parallel with its sophistication. The most sophisticated method offered theoretical advantages of reduced error rates and increased keying rates but these were not realized in practice. The limited intelligence method, in which data were entered into a screen image of the record form, was fastest overall and was most popular with all users. This method together with verification by dual keying will provide a convenient, rapid, and high fidelity method of data entry. The simplest method was found to be adequate for short forms but incurred high error rates with longer forms.

© 1986 Academic Press, Inc.

INTRODUCTION

The advent of comparatively inexpensive microcomputers has been followed by the development of specialized software packages for data entry and management (1, 2). Once captured, data can be analyzed on the microcomputer or transferred to a mainframe computer for analysis. The value of such systems for processing medical records is well recognized (3, 4, 5) but the technology has much wider applications. It is now possible for small research groups with limited computing skills to set up their own data entry systems, rather than relying on data processing bureaux (6, 7). The available data entry packages vary in sophistication, and the question is which features are of value: the ideal would be a package which was simple to set up but which provided both high speed and high fidelity of data entry.

This paper presents an investigation of three methods of data entry which differed greatly in the amount of on-line intelligence they provided. The study used several subjects with different levels of experience of data entry. The object was to assess the value of the different features offered by the methods. From these it was hoped to identify the method which was acceptable to, and provided rapid, accurate data entry for, a wide range of users.

METHODS

Data were entered using a Sirius microcomputer, a mid-priced 16-bit business machine. The record form used for the main study (Tables 1-5) was a two-page questionnaire on schoolchildren's smoking habits previously described by Fee et al. (8), comprising 22 fields or 47 characters. To investigate the effect of record length, two additional records were also used (Table 6): one containing 32 fields and 73 characters, and a much longer record of 215 fields and 327 characters.

Three input methods were used.

Method 1 (low intelligence): Data were entered as a string of digits with a separate row for each record using a visual text editor. This method used the commercial word-processing package Wordstar, which allows the entry of any characters in any position on the microcomputer screen. The user can alter any field of any previously keyed record.

Method 2 (limited intelligence): Data were entered onto a screen image of the original record form, each field being completed in sequence with the appropriate number of digits. The software for this method was written using the commercial data base language dBase II for the input of the short record form and the data entry package Datastar for the longer records. The user can alter any field of the current record, and any nonnumeric characters are rejected by the system.

Method 3 (moderate intelligence): Data were entered sequentially in response to screen prompts, with the user being able to modify only the current field. Data validation using the type of range and logic criteria described for a mainframe computer by Wolfenson and Worth (9) was carried out during data entry. The software for this method was also written using the commercial data base language dBase II.
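The difference between the three levels of on-line intelligence can be made concrete with a minimal sketch. It is not the authors' dBase II or Datastar code: the three-field form, its field names, the coding of answers, and the permitted ranges are all invented for illustration, and errors are collected in a list rather than rejected interactively at the keyboard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Field:
    name: str
    width: int                       # number of digits expected
    low: int                         # smallest permitted value
    high: int                        # largest permitted value
    # Logic rule: should this field be asked, given the fields keyed so far?
    applies: Callable[[Dict[str, str]], bool] = lambda record: True


# Hypothetical form: names, codes, and limits are assumptions, not from the paper.
FIELDS = [
    Field("sex", 1, 1, 2),
    Field("smoker", 1, 1, 2),        # assumed coding: 1 = smoker, 2 = nonsmoker
    Field("cigs_per_week", 2, 0, 99, applies=lambda r: r.get("smoker") == "1"),
]


def enter_record(keyed: List[str], level: int) -> Tuple[Dict[str, str], List[str]]:
    """Accept one record at level 1 (low), 2 (limited), or 3 (moderate)."""
    record: Dict[str, str] = {}
    errors: List[str] = []
    values = iter(keyed)
    for field in FIELDS:
        if level == 3 and not field.applies(record):
            continue                 # skip an inappropriate field: nothing is keyed
        text = next(values)
        if level >= 2 and not (text.isdecimal() and len(text) == field.width):
            errors.append(f"{field.name}: character type")
        elif level == 3 and not field.low <= int(text) <= field.high:
            errors.append(f"{field.name}: out of range")
        record[field.name] = text
    return record, errors
```

With these assumed fields, `enter_record(["1", "7", "15"], 2)` reports no errors because level 2 checks character type only, whereas the same keystrokes at level 3 are flagged as out of range. A nonsmoker's record at level 3 needs only two fields to be keyed, which is the skip facility described for method 3.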
Users

The users were all female. For the main study they comprised two research assistants with limited word processing experience (I and II), two secretaries expert at typing and word processing (III and IV), and two experienced data entry clerks (V and VI). For the longer records only two users (I and II) were employed.

RESULTS

I. Features of the Methods

The three data entry methods were chosen to cover the range of on-line intelligence currently available with software packages for microcomputers, and their features are shown in Table 1. The amount of intelligence available ranged from no checking, or checking of character type only, to character type, range, and logical checks.

TABLE 1
ERROR DETECTION AND SCREEN EDITING FACILITIES OF THE DATA ENTRY METHODS

                                        Method 1          Method 2          Method 3
On-line intelligence level              Low               Limited           Moderate
Character type checking                 No                Yes               Yes
Range checking                          No                No/Yes (a)        Yes
Logical checking                        No                No                Yes
Ability to skip inappropriate fields    No                No                Yes
On-screen editing                       All records       Current record    Current field
                                        and fields        only              only

(a) The system written under dBase II permits full range checking but that under Datastar does not.

A theoretical advantage of the most intelligent method is its ability to skip inappropriate fields. In the present study, which used a questionnaire on smoking in schoolchildren, this facility enabled questions on smoking habits to be skipped for those declaring themselves nonsmokers. This brought about a considerable reduction in the number of key strokes required to input the record form. For smokers the saving was only 9.4% but for nonsmokers it was 35.1%, giving an average saving for the whole data set (comprising 31.3% smokers and 0.9% missing) of 26.8%.

One consequence of on-line intelligence, illustrated in Table 1, is that it restricts the amount of on-screen editing which can be carried out. On-screen editing enables the user to amend any errors seen in data already keyed while still in the process of entering new data. This inverse relationship between on-line intelligence and on-screen editing might be expected to influence the observed error rates.

Increased on-line intelligence requires a more sophisticated package which has to be tailored for each type of record form to be input. This is not only more difficult and time consuming, but also requires a greater degree of computing experience.

II. The Induction of Errors

The error rates (expressed as the number of characters in error per 100 forms) produced by each user with each method are shown in Table 2. There is considerable variation between users, with a clear tendency for users to have consistently low (or high) error rates with the different methods. The users have been arranged in order of previous keyboard experience (see Methods) but no relationship is apparent between this and error rate. The differences between the average error rates for each method are small, particularly in comparison with differences between users.

TABLE 2
FREQUENCY OF CHARACTERS IN ERROR (a) OF THE THREE METHODS

User      Method 1    Method 2    Method 3
I            4.6         1.1         3.5
II          14.9         6.0        13.1
III          8.6         4.1         3.6
IV          16.0        21.8        62.4
V            6.3         7.5         4.9
VI          31.4        43.1        16.3

Mean        13.6        14.0        17.3

(a) Error rates calculated as the number of characters in error per 100 record forms.

Some types of keying mistakes can give rise to several errors in the data file. The omission of a single character results in all the subsequent digits in that record being one position out of sequence. For this reason the number of characters in error could give a misleading impression of the error rates. A simple, though approximate, method of estimating the number of mistakes actually made is to count the number of records containing one or more errors. Table 3 gives these data, which again show large differences between individuals, but with the individuals being consistent across methods. The differences between the methods are small, but their rank order as measured by records in error (Table 3) is the inverse of that obtained with characters in error (Table 2).

TABLE 3
FREQUENCY OF RECORDS WITH ERRORS (a) FOR THE THREE METHODS

User      Method 1    Method 2    Method 3
I            2.1         0.1         2.8
II           4.3         5.0         5.7
III          5.0         3.6         3.6
IV          11.7         9.7         6.7
V            4.0         4.4         4.1
VI           8.1        11.2         4.1

Mean         5.9         5.7         4.4

(a) Error rates expressed per 100 record forms.
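The gap between the two error measures of Tables 2 and 3 can be illustrated with a short sketch. It assumes, purely for illustration, that each record is a fixed-width row of digits compared position by position with the correct row; the example digits are invented and the function names are hypothetical.

```python
from typing import List, Tuple


def characters_in_error(truth: str, keyed: str) -> int:
    """Count the positions at which the keyed row differs from the true row."""
    width = max(len(truth), len(keyed))
    pairs = zip(truth.ljust(width), keyed.ljust(width))
    return sum(1 for a, b in pairs if a != b)


def error_rates(truth_rows: List[str], keyed_rows: List[str]) -> Tuple[float, float]:
    """Return (characters in error, records in error), both per 100 records."""
    counts = [characters_in_error(t, k) for t, k in zip(truth_rows, keyed_rows)]
    n = len(counts)
    chars_per_100 = 100 * sum(counts) / n
    records_per_100 = 100 * sum(1 for c in counts if c > 0) / n
    return chars_per_100, records_per_100


# Invented ten-digit record used only to demonstrate the effect.
TRUE_ROW = "1203456789"
SUBSTITUTED = "1203456780"    # one wrong digit at the end
OMITTED = "103456789"         # second digit left out, everything after it shifts
```

Here `characters_in_error(TRUE_ROW, SUBSTITUTED)` is 1 while `characters_in_error(TRUE_ROW, OMITTED)` is 9, although each row contains a single keying mistake. Counting records in error treats both rows alike, which is why the two measures can rank the methods differently.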

III. The Time for Data Entry

The keying rates of the different users follow the pattern of previous experience, with the data clerks (V and VI) fastest, the secretaries (III and IV) slower, and the research assistants (I and II) slightly slower (Table 4). The limited intelligence method (method 2) was fastest overall and for four of the six users; the low intelligence method (method 1) was slightly slower; and the moderate intelligence method (method 3) was much slower. But the process of data entry involves not just keying but also error detection and correction, and any comparison of methods should examine the overall time for each step. Table 5 shows that the times for error detection and correction contribute about 25% of the data entry time but that they are similar for the three methods. Thus keying rates are the major factor determining the rank order of overall data entry times, but the differences are not large, being of the order of 10-15%.

TABLE 4
KEYING TIME (a) FOR THE THREE DATA ENTRY METHODS

User      Method 1    Method 2    Method 3
I           50.9        45.9        54.7
II          42.4        46.2        50.9
III         40.8        30.4        40.1
IV          47.8        45.1        48.7
V           15.8        22.7        28.1
VI          34.1        33.4        53.3

Mean        38.6        37.3        46.0

(a) Times expressed as minutes per 100 records.

IV. The Effect of Record Length

To investigate the effects of record length two additional forms, one 50% longer and the second considerably longer, as described under Methods, were used.

TABLE 5
COMPARISON OF OVERALL DATA ENTRY TIMES (a)

Activity            Method 1    Method 2    Method 3
Keying                38.6        37.3        46.0
Error detection        4.0         4.3         3.7
Error correction      12.2        10.8        10.3

Total                 55.7        52.4        59.9

(a) Times are taken over all six users.
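The totals in Table 5 are the sum of keying, error detection, and error correction times, and the share taken by detection and correction can be recomputed from the three component rows. The sketch below does only that arithmetic; the dictionary layout and function names are illustrative choices, not part of the study.

```python
from typing import Dict

# Component times copied from Table 5.
TABLE_5: Dict[str, Dict[str, float]] = {
    "Method 1": {"keying": 38.6, "detection": 4.0, "correction": 12.2},
    "Method 2": {"keying": 37.3, "detection": 4.3, "correction": 10.8},
    "Method 3": {"keying": 46.0, "detection": 3.7, "correction": 10.3},
}


def overall_time(activities: Dict[str, float]) -> float:
    """Total data entry time: keying plus error detection plus correction."""
    return sum(activities.values())


def non_keying_share(activities: Dict[str, float]) -> float:
    """Fraction of the overall time spent detecting and correcting errors."""
    non_keying = activities["detection"] + activities["correction"]
    return non_keying / overall_time(activities)


for method, activities in TABLE_5.items():
    total = overall_time(activities)
    share = 100 * non_keying_share(activities)
    print(f"{method}: total {total:.1f}, detection and correction {share:.0f}%")
```

Summing the components reproduces the printed totals for methods 2 and 3 to within rounding. For method 1 the components add up to 54.8 rather than the printed 55.7, and the text as received gives no way of telling which figure is at fault.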


TABLE 6
EFFECT OF RECORD LENGTH ON KEYING AND ERROR RATES (a)

                                        Short record            Medium record           Long record
                                     Method 1    Method 2    Method 1    Method 2    Method 1    Method 2
Keying time (min/100 characters)       0.99        0.98        1.29        1.29        1.42        1.21
                                      (1.6) (b)   (1.6)       (2.2)       (2.2)       (2.4)       (2.0)
Error rate (errors/100 characters)     0.21        0.07        0.05        0.30        3.79        0.05

(a) Results based on average of two users.
(b) Figures in brackets are keying rates (characters/second).

It proved impractical to write the input programs for the longer records using the commercial package dBase II, because of limitations in the permissible number of input fields. An alternative package, Datastar, was used for the limited intelligence method (method 2). However, this package does not allow the programming flexibility required for method 3, so that method could not be investigated with the longer records.

A comparison of the input of the three record types by the low and limited intelligence methods (1 and 2) showed that the longer records had a 30% slower keying rate (Table 6). The dramatic finding was the very great increase in the error rate induced by the low intelligence method with the largest form. This finding occurred with both users, who reported difficulties in aligning fields when each record occupied 8 rows on the VDU screen.

DISCUSSION

Microcomputers provide a convenient and inexpensive method for the management of medical data (3-7). Data entry is a time consuming task and, although the importance of accurate data is widely recognized, little research has been carried out comparing different methods of input on a microcomputer. Of the three methods compared in this study, data entry onto a screen image of the record provided the best combination of low error rate and high speed. This method was also the most popular among all users. In the present study the method was set up using a commercial data base language, which involved much time and programming skill. However, equivalent systems can now be set up using packages such as Datastar (MicroPro) or Delta (Compsoft PLC), which are much simpler to use and require a much lower level of computing experience.

The unexpected result of this study was that the theoretical advantages of the most sophisticated method were not realized in practice. Firstly, the on-line error checking of the most intelligent method did not result in a lower error rate in the data files. One explanation for this would be that only a small proportion of the errors could be detected by the range and logic checks. When this was investigated it was found that, of the total errors induced by the methods lacking such checking, only 10% would have been detected by range and logic checks. This finding supports the opinion expressed by Hasman and Chang (10) that range checks leave significant numbers of errors in the data. There would appear to be no simple alternative to dual key verification.

Another factor affecting the observed error rates was the difference in the amount of on-screen editing carried out. The users reported that this occurred most frequently with the simple character entry method, although this could not be quantified. It is possible that this method had an initially high error rate but that this was reduced during data entry by the user.

The second unexpected result was that, although the most sophisticated method required 26.8% fewer data characters (because it could skip inappropriate fields), the keying rate was 20% slower than with the other methods. The users reported that a keying rhythm could not be developed with this method because of the need to skip fields and the consequent variation in the processing of each form. This problem was exacerbated by occasional slow response from the microcomputer when it was required to carry out logical operations, and led to considerable dissatisfaction being expressed with this method.

The data entry system using the word-processing package required minimal computer experience and was more than adequate for the short record form used in this study. But when employed with longer records which occupied more than one or two rows on the VDU screen, this method was slower and produced a substantially higher error rate. A similar effect of record length on keying rate has been found previously (11).
The errors are predominantly due to insertion and deletion of single characters and may arise because of difficulties in lining up fields when records spill over one or more rows. These errors are catastrophic since all subsequent fields are affected.

In conclusion, the screen image or limited intelligence method is the most satisfactory, having a high keying rate and low error rate for a range of form types. It requires only limited computing skill to be used and can easily be set up to permit dual key verification.

ACKNOWLEDGMENTS

We thank the six members of staff for being subjects in these experiments and Professor H. D. Tunstall Pedoe and Dr. W. C. Smith for useful comments on the manuscript. This work was supported by a grant from the Scottish Home and Health Department.

REFERENCES

1. MYERSCOUGH, P. A data base by any other name. Pract. Computing, 50 (April 1984).
2. Anonymous. No need to go in a single file. What Micro, 35 (April 1984).
3. PARENTI, I. V., FERRARI, G., ZOPPI, S., FIACCO, E., BERGER, D., AND CARLO, V. D. A low cost computerized system for data management in a surgical department. Ric. Clin. Lab. 14, 73 (1984).
4. REID, J. A., AND KENNY, G. N. C. Data collection in the intensive care unit. J. Microcomput. Appl. 7, 251 (1984).
5. RAWLS, G. M., DWIGGINS, G. A., AND FEIGLY, C. E. Application of the microcomputer to Occupational Health data management. Amer. Ind. Hyg. Assoc. J. 44, 301 (1983).
6. SLADEN, J. G. The personal computer as a clinical research and teaching tool. Amer. J. Surg. 147, 654 (1984).
7. BURAU, K. D., WOOD, S. M., AND BUFFLER, P. A. Microcomputer-assisted data management in a case-comparison study. Comput. Biomed. Res. 18, 369 (1985).
8. FEE, W. M., BRISCOE, C., CROMBIE, I. K., IRVING, J. M., SMITH, W. C. S., THREAPLETON, L., AND TUNSTALL PEDOE, H. D. Prevalence of cigarette smoking in Dundee schoolchildren in 1964 and 1984. Comm. Med. 7, 283 (1985).
9. WOLFENSON, L. B., AND WORTH, N. An approach to health data validation. Comput. Biomed. Res. 13, 501 (1980).
10. HASMAN, A., AND CHANG, S. C. ADAMO. A data storage and retrieval system for clinical research. Comput. Biomed. Res. 15, 145 (1982).
11. EKLUNDH, K. S., MARMOLIN, H., AND HEDIN, C. E. Experimental evaluation of dialogue types for data entry. Int. J. Man-Mach. Stud. 22, 651 (1985).