Abstracts
08 PROCESSING OF DATA FORMS USING HANDPRINT RECOGNITION SOFTWARE: AN ACCURACY ANALYSIS
Stephen C. Grubb and M. Marvin Newhouse
The Johns Hopkins University, Baltimore, Maryland

When processing data collection forms using handprint recognition software, it is essential to evaluate the accuracy of the data interpretation in order to adequately estimate the amount of time that must be spent on subsequent visual review or verification of the extracted data. Such evaluation may also expose systematic technical problems as well as problems with individual form layouts or fields. An accuracy analysis was performed on data collected for the Submacular Surgery Trials (SST). NestorReading 3.0 software is used for recognition of handprint and marks on forms that are completed and faxed to a computer from 18 clinical centers. The forms contain mostly numeric and checkmark data responses and were designed for recognition processing. The analysis was based on automatically generated computer logs of all recognition errors corrected during visual data verification and/or after data editing. All incorrect interpretations were considered errors, even when the handprint was poor or a response was crossed out and re-written. Blank fields, if interpreted correctly, were counted as successful interpretations.
• 2801 pages representing 1792 forms and 53,011 fields were processed as of 11/25/97.
• Numeric field error rates ranged from 2.7% for 1-digit fields to 27.7% for 6-digit fields.
• The checkbox field error rate was 3.0%.
• The overall field error rate for our forms was 6.0% (a mean of 1.1 field corrections per page).
Discussion will include updated results, additional analyses, and a summary of factors affecting recognition and verification accuracy, such as software settings, form layout and display, and data edit checking.
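The error rates above are simple tallies from the correction logs. A minimal sketch of that tabulation is shown below, assuming two hypothetical CSV files (fields.csv with one row per processed field and corrections.csv with one row per corrected field); the file and column names are illustrative and are not taken from the SST system.

```python
import csv
from collections import Counter

def load(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

fields = load("fields.csv")            # one row per recognized field: page_id, field_type, n_digits
corrections = load("corrections.csv")  # one row per field corrected during verification or editing

def group(row):
    # Numeric fields are broken down by digit count; other field types stand alone.
    return (row["field_type"], row.get("n_digits") or "")

totals = Counter(group(r) for r in fields)
errors = Counter(group(r) for r in corrections)

for key in sorted(totals):
    rate = 100.0 * errors[key] / totals[key]
    print(f"{key}: {errors[key]}/{totals[key]} fields corrected ({rate:.1f}%)")

pages = {r["page_id"] for r in fields}
print(f"overall error rate: {100.0 * len(corrections) / len(fields):.1f}%; "
      f"{len(corrections) / len(pages):.2f} corrections per page")
```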
09 POTENTIAL BIAS IN REMOTE DATA ENTRY
Jean Maupas, Nathalie Visèle, Jean-Pierre Boissel
Service de Pharmacologie Clinique, Claude Bernard University, Lyon, France

Introduction: Remote data entry (RDE) at the investigator site appears to present several advantages for clinical trial data management: data checks and editing benefit from being performed on-line, so time should be saved and costs reduced during drug development. However, does the gain in cost and time have a trade-off in quality? We carried out several experiments.
Methods: Data entry was performed on-line by investigators as soon as data became available, using paper forms as an intermediate medium. Edit checks were activated and errors resolved during data entry. The paper forms were then mailed to the coordinating center, where double data entry (DDE) was performed. The resulting files were then compared. The investigator was then asked to class each discrepancy as: RDE value correct, paper value correct, or another value correct. Corrections were entered using DDE.
Results: In one trial, 24 centers entered 427,584 items. Of 1,697 (0.4%) inconsistencies, the paper value was confirmed for 1,139 (67.1%), the RDE value for 541 (31.9%), another value was given for 14 (0.8%), and 3 values remained unknown. In another trial, 325 centers entered 239,589 items. Of 4,587 (1.9%) inconsistencies, the paper value was confirmed for 3,620 (79%), the RDE value for 691 (15%), and another value was given for 276 (6%).
Discussion: Discrepancies may enter the system either as modifications made during RDE (e.g., to meet stringent error controls) that are not subsequently reported on the paper form, or as post hoc corrections of the trial documents (e.g., by the CRA). Whatever the cause of the distortion, the evaluation process could have favored the paper copy, as the investigator probably referred to it to establish the "truth". This could be considered a form of the Hawthorne effect.
Conclusion: RDE is promising, but we must remain aware of the specific problems it presents. The final quality of the trial may be degraded rather than improved if either the system is not adequate or the process is not correctly managed.
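The comparison step in Methods amounts to a key-by-key match of the RDE file against the file produced by double data entry of the paper forms. A minimal sketch of that comparison is given below; the file names and column names (patient, field, value) are hypothetical and stand in for whatever structure the trial database actually used.

```python
import csv

def load(path):
    """Return {(patient, field): value} for one data-entry file."""
    with open(path, newline="") as f:
        return {(r["patient"], r["field"]): r["value"] for r in csv.DictReader(f)}

rde = load("rde_entries.csv")   # entered on-line at the investigator site
dde = load("paper_dde.csv")     # double data entry of the mailed paper forms

common = rde.keys() & dde.keys()
discrepancies = [k for k in sorted(common) if rde[k] != dde[k]]

print(f"{len(discrepancies)} of {len(common)} items "
      f"({100.0 * len(discrepancies) / len(common):.1f}%) are inconsistent")

# Each discrepancy is then returned to the investigator, who classifies it as
# "RDE value correct", "paper value correct", or "another value correct";
# tallying those classifications gives the percentages reported in Results.
```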
10 IDENTIFICATION AND PREDICTION OF RESPONDERS TO A THERAPY: A MODEL AND ITS PRELIMINARY APPLICATION TO ACTUAL DATA
Wei Li, Jean-Pierre Boissel, Pascal Girard, Florent Boutitie, Michel Cucherat, Francois Gueyffier and the INDANA group
Clinical Pharmacology Unit, Lyon, France

The effect of a given treatment for a given disease is usually estimated from randomized controlled clinical trials, yielding a single treatment effect averaged over the trial population. In recent years, however, there has been an increasing willingness to individualize therapeutic decisions. The method we report here identifies responders by assessing each individual's probability of an event under each treatment group. We used a treatment-stratified Cox regression model including interactions between treatment and patient covariates, with common regression coefficients for treated and untreated patients except for the special case of a prognostic variable that interacts with treatment. We obtained a reduced model to assess, for each individual, the probability of an event under each treatment. We then used a discriminant function based on the final reduced model, representing the absolute therapeutic effect, to identify the patients to be treated according to different thresholds of clinical efficacy. The number of responders was given by the number of avoided events. The model was explored on the INDANA database (which pools individual patient data from clinical trials of anti-hypertensive drug intervention). Data from 36,444 patients in five randomized controlled trials were included. The results give the number of avoidable events predicted by our method (the number of responders) and the number of patients to treat in order to avoid an event (NPT). Confidence intervals of the absolute therapeutic benefit for each individual were calculated using Monte Carlo simulation and the bootstrap percentile confidence interval method. The power of the model was evaluated by cross-validation, and no statistically significant difference was found between the estimated and observed numbers of responders. On the basis of the results presented in this work, we conclude that tools for predicting individual therapeutic benefit do exist.
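As a rough illustration of the modelling strategy (not the authors' actual code), the sketch below fits a treatment-stratified Cox model in which only one prognostic variable interacts with treatment, then predicts each patient's event probability under both arms to obtain the absolute benefit, the predicted number of avoided events, and the NPT. The data file, the column names (time, event, treat, age, sbp), the prediction horizon, and the use of the lifelines package are all assumptions made for the sake of the example.

```python
# Sketch of the modelling idea, under the assumptions stated above; not the authors' code.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("pooled_trials.csv")          # hypothetical pooled individual-patient data
df["treat_x_sbp"] = df["treat"] * df["sbp"]    # only sbp is allowed to interact with treatment

# Treatment-stratified Cox model: separate baseline hazard per arm,
# common coefficients except for the treatment-by-sbp interaction.
cph = CoxPHFitter()
cph.fit(
    df[["time", "event", "treat", "age", "sbp", "treat_x_sbp"]],
    duration_col="time",
    event_col="event",
    strata=["treat"],
)

HORIZON = 60.0  # prediction horizon (e.g., months); hypothetical

def event_prob(covariates, treat):
    """Predicted event probability at HORIZON with treatment forced to `treat`."""
    x = covariates.copy()
    x["treat"] = treat
    x["treat_x_sbp"] = treat * x["sbp"]
    survival = cph.predict_survival_function(x, times=[HORIZON])
    return 1.0 - survival.iloc[0].to_numpy()

covariates = df[["treat", "age", "sbp", "treat_x_sbp"]]
absolute_benefit = event_prob(covariates, 0) - event_prob(covariates, 1)

avoided_events = absolute_benefit.sum()        # predicted number of responders
npt = len(df) / avoided_events                 # patients to treat to avoid one event
print(f"predicted avoided events: {avoided_events:.0f}, NPT: {npt:.1f}")
```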
11 ANALYSIS OF MULTIVARIATE FAILURE TIME DATA IN A CLINICAL TRIAL: ESTIMATION OF OVERALL TREATMENT EFFECT AND STRENGTH OF CORRELATION
Cédric Mahe and Sylvie Chevret
Hôpital St-Louis, Paris, France

Clustered survival data are frequently encountered in longitudinal studies when there is a natural or an artificial grouping of individuals into a unit. To take into account the dependence of the failure times within the unit as well as censoring, two multivariate generalisations of the Cox proportional hazards model are commonly used: the marginal