Computer-Assisted Data Collection in Multicenter Epidemiologic Research
The Atherosclerosis Risk in Communities Study*

David H. Christiansen, DrPH University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
James D. Hosking, PhD University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Andrew L. Dannenberg, MD, MPH National Heart, Lung, and Blood Institute, NIH, Bethesda, Maryland
O. Dale Williams, PhD University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
ABSTRACT: The Atherosclerosis Risk in Communities (ARIC) Study uses a computer-assisted data collection (CADC) system in which staff at four Field Centers directly record into microcomputers much of the data obtained from the 16,000 study participants during 4 hours of interviews and exams. A pilot study was conducted to evaluate the feasibility of training Field Center staff in the use of a CADC system and to assess study participants' reaction to such a system. When asked to compare CADC to a paper-based system, all five of the pilot study staff members preferred the CADC system. The 16 pilot study participants either had no preference (63%) or preferred CADC (37%). With respect to data quality, no systematic differences between the two methods of data collection were evident in the pilot study. The CADC system required approximately 10% longer for data collection, keying, and editing than the paper-based system took for collection alone. Immediate data entry in a CADC system may improve data quality by eliminating a transcription step and by allowing prompt detection of suspicious values while the participant is still available to provide confirmation or correction. CADC simplifies data collection by automating complex branching questions and can enhance data completeness. The ARIC CADC system is based on commercially available software customized by the study's Coordinating Center. The microcomputer-based CADC system described in this report may serve as the prototype for future epidemiologic studies that collect standardized data on large numbers of participants at a small number of sites.

Address reprint requests to: Dr. David H. Christiansen, Collaborative Studies Coordinating Center, University of North Carolina, Suite 203 NCNB Plaza, 137 E. Franklin St., Chapel Hill, NC 27514.
Received October 6, 1988; revised June 30, 1989.
*The authors have no past or present affiliation with any of the commercial computer hardware and software vendors referenced in this article.
Controlled Clinical Trials 11:101-115 (1990). © Elsevier Science Publishing Co., Inc. 1990
KEY WORDS: Data collection in epidemiologic studies, distributed data management, remote data entry, data quality control
INTRODUCTION

Data management systems used in multicenter studies have evolved over the past two decades as new computer technology has become available (Table 1). Until the late 1970s, a typical data management system involved the collection of data by Field Centers using paper forms that were then mailed to a central coordinating center to be keyed prior to processing on a mainframe computer. The actual data entry and editing were often done weeks to months after data collection. Error correction typically involved time-consuming and labor-intensive exchanges of paper edit messages by mail. In some cases, error correction was not possible because neither the examiner nor the participant was readily available at the time errors were investigated.

In the late 1970s, investigators began to use distributed data entry systems that involved entering data into computers at each Field Center from paper forms [2-11]. Errors detected at the time of data entry, which occurred hours to days after data collection, were easier to correct because the examiner (but not the participant) was likely to be available.

A logical extension of this approach is to record data on computer screens as they are collected. This computer-assisted data collection (CADC) method eliminates the need to handle and store paper forms and removes one transcription step and its associated errors.
Table 1  Evolution of Data Management Systems in Multicenter Studies

                                           Centralized            Distributed            Computer-Assisted
                                           Data Management        Data Management        Data Collection
Date of introduction                                              Late 1970s             Late 1980s
Example                                    LRC-CPPT [1]           SHEP [2,3]             ARIC [17]
Format of data collection                  Paper forms            Paper forms            Computer screens
Location of data entry and editing         Coordinating Center    Field Centers          Field Centers
Interval between data collection
  and editing                              Weeks                  Days                   Seconds
Persons available during data editing:
  Participant                              Absent                 Absent                 Present
  Examiner                                 Absent                 Nearby                 Present
Further, the CADC system can be used to assist the data collection process by automating skip rules, enforcing the completion of required data fields, and editing responses while the participant is present to confirm or correct suspicious values. This type of system has been used for some time for collection of data during telephone interviews [12-15] but only recently has been used to collect data in face-to-face encounters with participants [16].

Three concerns typically cited for the reluctance of researchers to adopt this approach are its impact on the data collection process, data security, and cost effectiveness. Concerns about impact on the data collection process include the possibility that the participants will be bothered by the process, that staff will dislike using the system, or that the data collection process will be more time consuming or difficult. Data security concerns include the need for archival copies of records to replace paper forms as the source documents, the need for an audit trail of all additions, changes, and deletions, and the need for secure backups of these items. Cost effectiveness concerns arise because, in comparison with business systems, research projects are of short duration and generate much more varied and complex data. A CADC system for large multicenter studies must have the power and flexibility to handle dozens of data collection screens with frequent revisions. At the same time, the total system cost (including development and operational costs for hardware, software, and personnel) must be reasonable within the context of a research project of fixed duration.

This article evaluates the impact of a CADC system on the data collection process. It describes the results of a pilot study conducted to evaluate the feasibility of using a CADC system in a large, multicenter study. Data security and cost effectiveness of the CADC system are also discussed.
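The skip-rule and required-field behavior described above can be made concrete with a small sketch. The Python fragment below is a hypothetical simplification written for illustration only: the item names, prompts, and skip target are invented (loosely modeled on the tobacco-use items of the sitting blood pressure form shown later in Figures 2 and 3), and it does not represent the Viking Forms Manager software actually used in ARIC, which was customized in C.

    # Hypothetical sketch of automated skip rules and required fields.
    # Item names and the skip target are illustrative, not the actual ARIC form logic.
    QUESTIONS = {
        "smoked_last_4h": {
            "prompt": "Have you smoked or used chewing tobacco or snuff "
                      "within the last 4 hours? (Y/N)",
            "required": True,
            "valid": {"Y", "N"},
            # Skip rule: a "N" answer jumps past the follow-up item.
            "next": lambda ans: "last_smoked_minutes" if ans == "Y" else "coffee_last_4h",
        },
        "last_smoked_minutes": {
            "prompt": "How long ago did you last smoke (minutes)?",
            "required": True,
            "valid": None,                     # free entry; range edits applied elsewhere
            "next": lambda ans: "coffee_last_4h",
        },
        "coffee_last_4h": {
            "prompt": "Have you had any coffee, tea, or chocolate "
                      "within the last 4 hours? (Y/N)",
            "required": True,
            "valid": {"Y", "N"},
            "next": lambda ans: None,          # end of this screen
        },
    }

    def run_screen(start="smoked_last_4h"):
        """Walk one screen, enforcing required fields and skip rules."""
        answers, item = {}, start
        while item is not None:
            q = QUESTIONS[item]
            ans = input(q["prompt"] + " ").strip().upper()
            if q["required"] and ans == "":
                print("A response is required for this item.")
                continue                       # re-ask the same question
            if q["valid"] is not None and ans not in q["valid"]:
                print("Please answer Y or N.")
                continue
            answers[item] = ans
            item = q["next"](ans)              # the skip rule chooses the next item
        return answers

    if __name__ == "__main__":
        print(run_screen())

The essential point is that the branching logic lives in the form definition itself, so an interviewer can neither be shown an irrelevant question nor leave a required item blank.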
ARIC Study Description

The Atherosclerosis Risk in Communities (ARIC) Study is designed to investigate the etiology and natural history of atherosclerosis and its clinical sequelae [17]. The study consists of two components, Cohort and Community Surveillance. The Cohort component is a prospective epidemiologic study in which a random sample of 4,000 men and women aged 45-64 years are examined and followed in each of four communities: Forsyth County, NC; Jackson, MS; suburban Minneapolis, MN; and Washington County, MD. In the Community Surveillance component, the occurrence of hospitalized myocardial infarction and coronary heart disease death is recorded for all community residents aged 35-74 years.

Data collection for the Cohort component includes a baseline examination, a repeat examination 3 years later, and annual telephone contacts in the intervening years. During the 4-hour baseline and subsequent clinic visits, data are collected by interview, physical examination, arterial ultrasonography, electrocardiography, spirometry, and phlebotomy for lipid, hemostasis, hematology, and clinical chemistry measurements. Six participants per day are examined at Field Centers located in each of the four study communities. Examinations are conducted at several different workstations and are primarily performed in the morning in order to accommodate several procedures that require participants to have fasted overnight. Therefore, each Field Center examines several participants simultaneously, requiring participants to move among workstations in a variety of predetermined sequences.
Figure 1  Floor plan of a typical ARIC Field Center indicating location of microcomputer workstations (IBM PC/XT workstations at the examination and interview stations; IBM PC/AT local database).
CADC System Description

The ARIC CADC system uses stand-alone IBM PC/XT¹ microcomputers for data collection at multiple locations within each Field Center, as shown in Figure 1. The primary data collection software used at each Field Center is Viking Forms Manager,¹ which was adapted for use in ARIC by the study's Coordinating Center. The data entry software was extensively customized using the C programming language.
¹The trade name is used for identification only and does not represent an endorsement by the study or the National Institutes of Health.
Figure 2  Paper version of ARIC sitting blood pressure form (first page).
In addition to collecting and editing the data, the system permits users to enter text in an electronic note log as needed, to confirm out-of-range values as valid, and to indicate status codes when data values are questionable or unresolvable.

The CADC system displays screens that resemble paper forms. For example, Figure 2 shows the first page of the ARIC Sitting Blood Pressure Form and Figure 3 is the corresponding first screen displayed by the CADC program. As data for a field are entered, they are edited by the system. Values failing the edit tests cause an error message to be displayed. The data collector can then correct the value, confirm it, or flag it as "questionable" and in need of further investigation.

A floppy diskette is created for each participant prior to his or her first Field Center visit. The diskette is stored inside the participant's medical folder. Participants carry their folders and diskettes as they move from station to station. At each CADC station, the computer is used to collect, edit, and write data on the diskette. The final stop is a review of data by the Field Center medical staff. At this time, any unresolved questions about data values are reviewed and resolved if possible. After completion of the examination, the data are transferred to the Field Center database maintained on the IBM PC/AT shown in Figure 1. Once a week, all new or modified data are copied from the database to a diskette and mailed to the Coordinating Center.
Figure 3  CADC screen version of ARIC sitting blood pressure form (first screen).
Since the CADC system is designed for data collection without the use of paper forms, several design features are incorporated to ensure completeness of data collection and security of the data. First, data records corresponding to each form are written to multiple files on the participant diskette and on the workstation hard disk as the data are collected. Thus a system failure will only affect the data form currently being entered. In addition, the copy of the data stored on the workstation hard disk provides for the recovery of data should the participant diskette be damaged or lost during an examination. Second, a Field Center database can be restored from the collaborative database at the Coordinating Center. Finally, if the CADC system is nonfunctional for any reason, paper versions of the CADC forms are available for use by the Field Center staff, allowing data collection to continue uninterrupted.

The system creates a file containing a copy of every new, modified, or deleted data record. Each record in this file is marked with the date and time of this version of the record and the access code of the staff member creating it. This file provides a comprehensive audit trail for each record.

Although data are stored at multiple locations, the confidentiality of participant data is maintained by passwords and data encryption [18] that restrict data access at each workstation. In order to use the CADC system, each staff member must have a password and access code that identifies the user and determines which functions the user is allowed to perform. Using this system, study data can be entered, reviewed, or edited only by those staff members whose passwords permit that specific function. To guard against access to the data by persons not authorized for such access, each data record is encrypted before being written to the database and audit trail files. Tables of access codes and passwords are likewise encrypted to deter tampering.
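A short sketch may help make the edit-and-resolve cycle and the audit trail concrete. The Python fragment below is purely illustrative: the field name, range limits, status labels, and record layout are assumptions made for this sketch, and the actual ARIC system stores encrypted records through its customized forms software rather than in structures like these.

    # Hypothetical sketch of field-level range editing, resolution of suspicious
    # values, and audit-trail stamping; all names and limits are illustrative.
    from datetime import datetime

    EDIT_LIMITS = {"room_temp_c": (15.0, 30.0)}     # assumed range for one field

    def edit_value(field, value):
        """Return 'ok' if the value passes its range edit, else 'suspicious'."""
        low, high = EDIT_LIMITS[field]
        return "ok" if low <= value <= high else "suspicious"

    def resolve_suspicious(field, value, action, corrected=None):
        """A suspicious value must be corrected, confirmed as valid, or flagged
        'questionable' before the examiner can move to the next item."""
        if action == "correct":
            return corrected, edit_value(field, corrected)
        if action == "confirm":
            return value, "confirmed"
        return value, "questionable"

    def audit_record(field, value, status, access_code):
        """Each stored version of a record is stamped with the date, time, and
        the access code of the staff member who entered it (the audit trail)."""
        return {"field": field, "value": value, "status": status,
                "entered_at": datetime.now().isoformat(timespec="seconds"),
                "access_code": access_code}

    # Example: a room temperature of 35 C fails the edit and is confirmed as valid.
    status = edit_value("room_temp_c", 35.0)                           # "suspicious"
    value, status = resolve_suspicious("room_temp_c", 35.0, "confirm")
    print(audit_record("room_temp_c", value, status, access_code="TECH01"))

Recording the resolution status alongside the value is what makes it possible to distinguish confirmed from questionable entries, as summarized later in Table 4.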
METHODS
Pilot Study Design

In March 1986 a pilot study of the ARIC CADC system was conducted at the Johns Hopkins Training Center for Public Health Research in Hagerstown, Washington County, Maryland. The pilot study involved a total of five staff members representing the four ARIC Field Centers, including physicians, study coordinators, and other personnel. None had prior experience with CADC systems and few had any prior experience with computers. ARIC Coordinating Center staff set up four microcomputer workstations, prepared and presented the training materials, and collected performance data related to use of the CADC system.

The primary objectives of the pilot study were to determine whether Field Center staff could learn to use the CADC system with limited training effort, to assess study participants' reaction to the system, and to ascertain whether staff and participants preferred the CADC system or a paper-based system. A secondary objective was to compare the two systems with respect to the quality of data collected and the time required for collection.

The pilot study was preceded by 2½ days of training in study interviews and examinations, completion of paper forms, and use of the CADC system. The training session was followed by 2 days of data collection involving 16 adult volunteers (8 men and 8 women) recruited by the Washington County ARIC Field Center staff. Three interviews (Reception, Medical History, and Respiratory Symptoms) and two examinations (Sitting Blood Pressure and Anthropometry) were chosen from the approximately 30 ARIC study procedures.

To simulate the anticipated flow of participants in the ARIC study, participants were scheduled to arrive at half-hour intervals over the 2-day period. Twelve participants were interviewed and examined using the CADC system, while paper forms were used for the remaining four. Each participant then had either the interviews or the examinations repeated using the opposite data collection method. The order of examinations and interviews was varied, as shown in Table 2. An unbalanced design was chosen in order to provide more data with which to evaluate the CADC system, which was one of the primary objectives of the pilot study. The full set of initial and repeated procedures required approximately 2 hours per participant. Interviews and examinations were conducted simultaneously using the microcomputers at the workstations. Times of arrival and completion at each workstation were recorded for each participant.

At several times during the pilot study, the ARIC staff completed questionnaires about the CADC and paper form methods of data collection. All 16 volunteers were given an exit interview by the Coordinating Center staff to ascertain their reaction to the CADC system and to determine whether they preferred one of the data collection methods. The quality of data collected by the two methods was assessed by examining the frequency and manner of resolution of "suspicious" data values, defined as those failing predetermined edit tests.
Table 2  Pilot Study Design and Procedure Sequence

Procedures (Data Collection Method)
Initial                                      Repeated             Number of Participants
Receptionᵃ, interviewᵇ, examᶜ (paper)        Interview (CADC)                2
Reception, exam, interview (paper)           Exam (CADC)                     2
Reception, interview, exam (CADC)            Interview (paper)               6
Reception, exam, interview (CADC)            Exam (paper)                    6
Total participants                                                          16

ᵃReception form.
ᵇMedical history and respiratory symptoms forms.
ᶜAnthropometry and sitting blood pressure forms.
Resolution involved changing the value to one passing the edit, confirming the value was correct despite failing the edit, or labeling the value as "questionable" because the true value could not be determined at that time. Relative quality of the data was also estimated by comparing discrepancies between data collected on paper and those collected using the CADC system, although recording and keying errors were obviously confounded by response and examiner variation.
RESULTS

Impact on Field Center Staff

In questionnaires administered before training, the staff indicated a general willingness to use the CADC system despite a lack of previous computer experience. After the training sessions and pilot study, the staff was positive about their experience with the CADC system, thought the CADC system would work in their Field Centers, and believed the training they received was adequate. In addition, all staff indicated a preference for the CADC system over a paper-based system for interviewing and examining participants, citing faster and more accurate entry of data and less likelihood of erroneously entering or skipping an item. The preferences given for specific aspects of the CADC system are shown in Table 3.
Impact on Participants

When the volunteers were asked if anything about the procedures bothered or distracted them in such a way as to interfere with their ability or willingness to answer questions, 14 of the 16 participants answered "No"; the other two expressed concerns related to not being told their blood pressures following measurement on the random-zero sphygmomanometer. None of the 16 expressed any concerns related to the CADC system. When asked which of the two methods (paper or computer) they preferred, six responded "computer" and ten had "no preference." None of the 16 expressed a preference for the paper forms method of data collection. The majority of the participant comments referred to the fact that the computer was faster and more accurate than paper forms.
Table 3  Staff Preference of Data Collection Method

Aspects of the DES                                                              Paper   Computer   No Preference
Physical layout of forms or screens
Flow (moving from question to question)
Answering multiple choice questions
Correcting multiple choice questions
Answering "fill-in" questions
Correcting "fill-in" questions
Method of skipping irrelevant questions
Identification of "suspicious" values
Method of entering extreme values
Ability to explain extreme values (questionable and problem logs)
Ease of use when interviewing participants (reception, medical and respiratory history forms)
Ease of use when examining participants (sitting blood pressure and anthropometry forms)
Impact on Data Collection
The median times required for data collection at the reception, examination, and interview stations were 5.0, 15.0, and 8.0 minutes for the paper forms and 5.5, 16.0, and 10.0 minutes for the CADC system, respectively. The times for paper forms do not include keying or editing the forms, whereas the times for the CADC method do. The time required to key a participant's paper forms was between 5 and 7 minutes total for all three stations.

Analyses indicated that the paper and CADC methods differed with respect to the resolution of suspicious values, that is, data values that fell outside the predetermined limits. The percentage of suspicious data values was similar for each method, as shown in Table 4. A total of 23 of the 861 (2.7%) data items collected on paper were suspicious, compared to 25 of 1273 (2.0%) for the CADC method. Because the CADC system requires that all suspicious values be resolved at the time of data collection, the suspicious value must either be corrected or labeled as "confirmed" or "questionable" before continuing to the next item. For data collected on paper forms, suspicious values are not identified until after the examination, when the forms are keyed. The CADC method allowed 21 of the 25 suspicious data values (84.0%) to be confirmed at the time of collection, compared with only 1 of 23 (4.3%) confirmed at the time of data entry with the paper system.

As described earlier, a subset of data was collected twice from each participant, once using CADC and once with paper forms. Comparison of the Medical History and Respiratory Symptoms data collected with both methods resulted in 93% agreement, as shown in Table 5. Review of the paper forms and CADC records revealed that 11 of the 27 nonmatches were due to errors in completing the paper forms.
Table 4  Resolution of Suspicious Values Using the Paper and CADC Methods

                                        Data Items    Suspicious Values          Resolution
Form                        Method      Collected     Identified           Confirmed   Questionable
Reception                   Paper            45              1                  1            0
                            CADC             95              5                  5            0
Medical history             Paper           190              3                  0            3
                            CADC            276              4                  1            3
Respiratory symptoms        Paper           221              3                  0            3
                            CADC            326              1                  0            1
Anthropometry               Paper           183             12                  0           12
                            CADC            266             15                 15            0
Sitting blood pressure      Paper           222              4                  0            4
                            CADC            310              0                  0            0
All forms                   Paper           861             23                  1           22
                            CADC           1273             25                 21            4
For the other 16 discrepancies, it was not possible to determine which response was correct since the participants had left the Field Center before the paper forms were keyed.

The comparison of examination data collected both on paper forms and using CADC is complicated by the fact that repeating the examination procedures on the same participant often results in different values. Both technician measurement errors and data entry errors could be expected to contribute to recorded differences for repeated anthropometric, blood pressure, and heart rate measurements. Differences in repeated blood pressure and heart rate measurements could also be due to normal physiologic variation during the 30 minutes between measurements. Although it was generally not possible to separate the sources of error in reviewing the intraparticipant differences observed, we were unable to detect any suggestion of systematic differences between the two data collection methods with respect to data quality.
DISCUSSION

The impact of the CADC system on the staff was positive. All staff expressed a preference for the system over paper forms and all felt confident of their ability to use it after less than 1 week of training and practice. The staff was particularly pleased with features that assisted data collection, such as automated skip rules, rapid calculations, and automated recording of dates and times.
Table 5  Agreement Between Paper Forms and CADC Methods for Interview Forms

Form                        Number of Items    Matches    Nonmatches    Agreement
Medical history                   169             164           5          97%
Respiratory symptoms              194             172          22          89%
Total                             363             336          27          93%
The impact on participants was minimal, with none reporting anxiety or concern over the use of the CADC workstations during data collection. Participants either did not express a preference or preferred the CADC system. In addition, editing responses during collection reduced the number of unresolved data problems and errors in form completion.

Although median data collection times using the CADC method were slightly longer than those for the paper system, the former yielded edited data in machine-readable format, while information from paper forms still had to be keyed and edited. Adding keying time for the paper forms indicates that the paper-based method takes approximately 8% longer than CADC. Considering the additional time needed to edit data from paper forms (while participants are not readily available to confirm suspicious values), the total staff time required for collecting, keying, and editing data on paper is predicted to be even larger. All staff had prior experience with paper forms, but none had previously used a CADC system; further time savings might be expected as the staff gains more experience with the new system.

One important objective in implementing a CADC system is to improve data quality. The elimination of paper forms removes one transcription step, which is a source of error. Automated range and consistency checking, as well as the ability to edit data while the participant is still present, decrease the likelihood of entry of incorrect data into the database. Required entry for selected fields facilitates data completeness.

Both data quality and the efficiency of data collection may be enhanced by several features available with a CADC system. For forms such as the Rose Questionnaire for Angina [19], using automated skip patterns ensures that all appropriate questions are asked but irrelevant questions are avoided. Fields such as the current date and participant identification number can be entered automatically by the CADC system. Calculations, such as those needed to derive corrected blood pressures from readings of a random-zero sphygmomanometer, are more efficiently and accurately performed by the computer than by hand. A CADC system yields relatively "clean" data, reducing the need for further error resolution. The elimination of a transcription step potentially reduces the clerical data entry time required for the study. In addition, a CADC system can provide an automated audit trail indicating all the corrections made since initial data entry.

The costs of implementing and operating a CADC system are different from the costs for a paper-based distributed system. Prior to the ARIC study, the Coordinating Center staff had developed a paper-based distributed data entry system for the Lipid Research Clinics (LRC) CPPT Follow-up Study [20] that required approximately 3 programmer-years of effort over an 11-month period. As an extension of the LRC system, Coordinating Center staff has since developed both a paper-based distributed system for the Studies of Left Ventricular Dysfunction (SOLVD) [21] and the CADC system for the ARIC study. The modifications of the LRC paper-based system for the SOLVD project required approximately 1.9 programmer-years over 10 calendar months, compared with 2.3 programmer-years over the same 10-month period for development of the CADC system for ARIC. Differences in the design of SOLVD (a clinical trial) and ARIC produced numerous differences in the detailed specifications of the two systems that are not directly related to the paper form/CADC comparison.
Nonetheless, we feel that this 20% (0.4 programmer-year) difference is a reasonable estimate of the incremental development effort for a CADC system.

For each ARIC Field Center, a paper-based distributed data entry system would have required two data entry PCs in addition to the three dedicated to ultrasound, pulmonary function, and the Field Center database. As described above, the ARIC CADC system involves five data collection PCs in addition to the three special-use microcomputers at each Field Center. The increased number of workstations also involves some increased training costs for Field Center personnel. Thus, we estimate that the CADC approach required approximately $20,000 in increased personnel cost for development and $36,000 for the 12 additional workstations (3 incremental machines per Field Center).

Balanced against the increased development and equipment costs for a CADC system is the reduction in Field Center staff due to the elimination of keying of the paper forms and resolution of suspicious values. Given the large data volume to be keyed during the 6 years of ARIC data collection, the savings in data entry personnel alone would more than offset the incremental development and hardware costs of the CADC system. For example, if all ARIC cohort data items currently collected using CADC were collected on paper first, approximately 2,300 additional characters would have to be keyed for each of the six participants seen in the Field Center per day. Assuming a keying rate of 4,000 characters per hour, this would generate approximately 7 hours of keying and verifying per day per Field Center (a worked check of this arithmetic appears at the end of the Discussion). Based on current labor costs for data entry, this would result in direct labor costs of over $400,000 across the four Field Centers during the 6 years of Cohort data collection.

The ARIC CADC system was placed in the field in November 1986. Through January 1989, 10,136 participants have been examined at the four ARIC Field Centers using the system. Reactions of both participants and staff have remained positive over this time. The major problem has been that the programming time required to make changes and additions to the data entry screens has caused delays in implementation. Microcomputer hardware and software have functioned as planned, without major breakdowns to date.

Several changes during the next 5-10 years may increase the use of CADC systems in epidemiologic studies. Microcomputer hardware costs will continue to decline, making CADC more feasible for projects collecting a small amount of data at multiple sites. Networking of workstations within a site will also become more feasible as local area network (LAN) software and hardware become more cost effective. Better commercial software would allow more efficient programming ("painting") of data entry screens. Alternative data entry devices (e.g., voice recognition and touch screens) may supplement keyboard data entry. As more powerful portable microcomputers are introduced, the CADC approach will become increasingly applicable at remote field sites, including those in developing countries [22,23]. Finally, researchers, staff, and participants may be expected to become increasingly comfortable with microcomputers as an integral feature of research studies.
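As a rough check on the keying estimate quoted above, the figures stated there combine as follows. The doubling for key-plus-verify entry is an assumption made for this sketch, but it is consistent with the quoted figure of approximately 7 hours per Field Center per day; the dollar estimate additionally depends on data entry wage rates and the number of examination days, so it is not reproduced here.

    # Rough check of the keying-workload figure quoted in the cost discussion.
    chars_per_participant = 2300   # additional characters per participant if keyed from paper
    participants_per_day = 6       # participants examined per Field Center per day
    keying_rate = 4000             # characters keyed per hour

    hours_keying = chars_per_participant * participants_per_day / keying_rate   # about 3.45
    hours_key_and_verify = 2 * hours_keying   # assumed second pass for verification; about 6.9

    print(f"{hours_key_and_verify:.1f} hours of keying and verifying per Field Center per day")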
The ARIC Coordinating Center is supported by NIH contract N01 HC 55015.
REFERENCES
1. The Lipid Research Clinics Program: The Coronary Primary Prevention Trial: Design and implementation. J Chron Dis 32:609-631, 1979
2. Bagniewska A, Black D, Molvig K, Fox C, Ireland C, Smith J, Hulley S, SHEP Research Group: Data quality in a distributed data processing system: The SHEP pilot study. Controlled Clin Trials 7:27-37, 1986
3. Black D, Molvig K, Bagniewska A, Edlavitch S, Fox C, Hulley S, McFate-Smith W: Distributed data processing system for a multicenter clinical trial. Drug Inform J 20:83-92, 1986
4. Karrison T, Meier P: Watching the watchers: Data quality in the PARIS study. Proceedings of the Fourth Annual Symposium on Coordinating Clinical Trials, Chapel Hill, May 1977, NTIS Accession No. PB-289-461
5. Kronmal RA, Davis K, Fisher LD, Jones RA, Gillespie MJ: Data management for a large collaborative clinical trial (CASS: Coronary Artery Surgery Study). Comput Biomed Res 11:553-556, 1978
6. Bill J, Anderson R, O'Fallon J, Silvers A: Development of a computerized cancer data management system at the Mayo Clinic. Int J Biomed Comput 9:477-481, 1978
7. Rasmussen W, Neaton JD: Design, implementation and field experience with the use of intelligent terminals in clinical centers in the Multiple Risk Factor Intervention Trial. Proceedings of the Fifth Annual Symposium on Coordinating Clinical Trials, Arlington, May 1978, NTIS Accession No. PB-289-461
8. Jefferys J, for the HPT Investigative Group: Performance characteristics of the Hypertension Prevention Trial distributed data system. Controlled Clin Trials 4:148, 1983
9. Burau KD, Wood SM, Buffler PA: Microcomputer-assisted data management in a case-comparison study. Comput Biomed Res 18:369-375, 1985
10. Hawkins BS, Singer SW: Design, development and implementation of a data processing system for multiple controlled trials and epidemiologic studies. Controlled Clin Trials 7:89-117, 1986
11. Irving JM, Crombie IK: The use of microcomputers for data management in a large epidemiological survey. Comput Biomed Res 19:487-495, 1986
12. Nicholls WL: Experiences with CATI in a large-scale survey. Proc ASA, Sec on Surv Res Meth 9-17, 1978
13. Palit CD, Sharp H: Microcomputer-assisted telephone interviewing. Sociolog Meth Res 12:169-189, 1983
14. Shanks JM: The current status of computer-assisted telephone interviewing. Sociolog Meth Res 12:119-142, 1983
15. Harlow BL, Rosenthal JF, Ziegler RG: A comparison of computer-assisted and hard copy telephone interviewing. Am J Epidemiol 122:335-340, 1985
16. Birkett NJ: Epidemiologic Programs for Computers and Calculators. Am J Epidemiol 127:684-690, 1988
17. ARIC Investigators: The Atherosclerosis Risk in Communities Study: Design and objectives. Am J Epidemiol 129:687-702, 1989
18. Bosworth B: Codes, Ciphers and Computers. Rochelle Park, NJ: Hayden, 1982
19. Rose GA, Blackburn H, Gillum RF, Prineas RJ: Cardiovascular Survey Methods. Geneva: World Health Organization, 1982
20. Lipid Research Clinics Coronary Primary Prevention Trial Follow-Up Protocol. Collaborative Studies Coordinating Center, Chapel Hill, NC, 1986
21. Protocol: Studies of Left Ventricular Dysfunction (SOLVD) Prevention and Treatment Trials. Collaborative Studies Coordinating Center, Chapel Hill, NC, 1986
22. Gould JB, Frerichs RR: Training faculty in Bangladesh to use a microcomputer for public health: Followup report. Public Health Rep 101:616-623, 1986
23. Bouckaert A, Lechat MF, de Bruycker M, de Kettenis YP, Speeckaert C: Microcomputers for field studies in epidemiology: An experience in southern Italy. Meth Inform Med 22:210-213, 1983
APPENDIX: PARTICIPATING INSTITUTIONS AND PRINCIPAL STAFF

Field Centers
Forsyth Co., NC--University of North Carolina, Chapel Hill: Drs. Gerardo Heiss, James F. Toole, Herman A. Tyroler, L.E. Chambless, and Fredric J. Romm, Ann Rhyne DiSanto, Karen Barr, Betty Barnhardt, Jane Bergsten, Judy Jackson, Jane Jensen, Phyllis Johnson, Jean Marlow, Bonnie Monger, Diane Mooney, Dililah Posey, Dawn Scott, Charles Sofley, Cathy Tatum, Ann Toledo, Alice White, Carmen Woody

Jackson, MS--University of Mississippi: Drs. Richard Hutchinson, Robert Watson, Robert Smith, David Conwill, Seshadri Raju, and William Cushman, Jane Johnson, Drs. Alfredo Figueroa and Allen Thompson, Barbara Davis, Bobbie Alliston, Dr. Herbert Langford, Brenda Asken, Royanne Asken, Faye Blackburn, Clara Bowman, Sandra Bowton, Lisa Fetid, Rose Franklin, Dorothy Hathorn, Roberta Howell, Martha Nelson, Virginia Overman, Stephanie Oxner, Doris Pitts, Gloria Shelton, Brenda Watson, Mattye Watson

Minneapolis, MN--University of Minnesota: Drs. Aaron Folsom, Linda Goldman, Stanley Edlavitch and Ken Cram, Dorothy Buckingham, Gina Trifle, John O'Brien, Katherine Provinzino, Leone Reed, Gail Murton, Azmi Nabulsi, Elizabeth Justiniano, Laura Bartel, Anne Murrill, Marilyn Bowers, Barb Kuehl, Hilare Hamele, Virginia Wyum, Linda Sherman, Donna Ottenstroer

Washington Co., MD--Johns Hopkins University: Drs. Moyses Szklo, George Comstock, Linda Fried and Roger Sanders, Joel Hill, Dr. Robert Rock, Joyce Chabot, Carol Christman, Dorrie Costa, Sunny Harrell, Vicki Hastings, Will Kelly, Beverly Kittel, Trudy Littenburg, Thomas Markam, Joan Nelling, Brenda Price, Ann Seibert, Carvel Wright

Coordinating Center--University of North Carolina, Chapel Hill: Drs. O. Dale Williams, David Christiansen, Lloyd E. Chambless, Jean Burge, Lars Ekelund, James Hosking, William Kalsbeek, Allen Rosman, and Frederick Eckel, Jeffery Abolafia, Dr. Edward Bachmann, I-Yiin Chang, Richard Cohn, Donna Coyne, Connie Hansen, Doris Jones, Dr. Edward Kelly, Joanne Kucharski, Michael Litszinger, Robert McMahon, Alex Melnick, Mark Park, Catherine Paton, Allan Rosen, Debra Rubin-Williams, Julie Smith-Fortune, Robert Thompson, Theresa Wahome, Laurence Wallman, Deborah Weiner, Kiduk Yang, Marston Youngblood

Ultrasound Reading Center--Bowman Gray School of Medicine: Drs. Ralph Barnes, M. Gene Bond, George Howard and Ward A. Riley, Jr., Delilah Cook, Robert Ellison, Gina Enevold, Gregory Evans, Cheryl Fishel, Gina Gladstone, Maureen Goldstein, Barbara Owens, Suzanne Pillsbury, Anne Safrit, Omega Smith, Betsy Vestal, Sharon Wilmoth, Billie Young

Central Hemostasis Laboratory--University of Texas, Houston: Drs. Kenneth Wu, Arthur Bracey, and Keith Hoots, Audrey Papp

Central Lipid Laboratory--Baylor College of Medicine: Drs. Wolfgang Patasch, Spencer Brown, Joel Morrisett, and Louis Smith, Charlie Rhodes, Lynette Rogers, Sarah Helbing

Central Chemistry Laboratory--University of Minnesota: Dr. John Eckfeldt, Mavis Hawkinson

ECG Centers--University of Minnesota: Dr. Richard Crow, Margaret O'Donnell; Dalhousie University: Dr. Pentti Rautaharju, Brian Hoyt

Pulmonary Function Center--Johns Hopkins University: Dr. Melvyn Tockman, Pat Wilkinson, Michele Donithan, Anne Chase

National Heart, Lung and Blood Institute--Drs. A. Richey Sharrett, Project Officer, Paul Sorlie, Andrew Dannenberg, and Millicent Higgins, Betty Nordan