Quality Control of Cancer Screening Examination Procedures in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial

Joel L. Weissfeld, MD, MPH, Richard M. Fagerstrom, PhD, and Barbara O'Brien, MPH, for the PLCO Project Team

University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania (J.L.W.); Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland (R.M.F.); and Westat, Inc., Rockville, Maryland (B.O.)

ABSTRACT: Investigators for the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial describe quality control procedures for the digital rectal examination, ovarian palpation examination, transvaginal ultrasound, chest X-ray, and flexible sigmoidoscopy. These cancer screening tests are subjective and difficult to standardize. PLCO quality control procedures aim to measure and, where possible, reduce variation across examiners and screening centers with respect to cancer screening test performance. Initial protocols stressed examiner qualifications, experience, and training; equipment specifications; examination procedures; and definitions for positive tests. The PLCO quality assurance subcommittee developed a final quality assurance plan, which included central approval and registration of PLCO examiners, direct observation of screening test performance during periodic site visits by the National Cancer Institute and coordinating center auditors, periodic analysis of screening test data, and procedures for independently duplicating or reviewing selected examinations. For each modality, the periodic data analyses examine the test-positive and the test-inadequate proportions and aim to identify divergent centers or examiners. Procedures for duplicating examinations specify feasible sample sizes for precise estimates of agreement between examiners, at each center, for each screening test modality, and over a 1-year period. These quality control procedures will help characterize the consistency and reliability of the PLCO cancer screening tests. Control Clin Trials 2000;21:390S-399S © Elsevier Science Inc. 2000

KEY WORDS: Quality control, cancer screening, procedures, data analysis

INTRODUCTION

The integrity of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute (NCI) rests on the cancer

Address reprint requests to: Dorothy Sullivan, Early Detection Research Group, Division of Cancer Prevention, National Cancer Institute, EPN 330, 6130 Executive Blvd., Bethesda, MD 20892-7346 (E-mail: [email protected]). Received March 27, 2000; accepted May 31, 2000.

Controlled Clinical Trials 21:390S-399S (2000). © Elsevier Science Inc. 2000. 655 Avenue of the Americas, New York, NY 10010.

0197-2456/00/$-see front matter. PII S0197-2456(00)00094-5

screening examinations that comprise the intervention. The examination procedures include flexible sigmoidoscopy, transvaginal ultrasound (TVU), chest X-ray, ovarian palpation examination, and digital rectal examination (DRE). These procedures have complicating and distinguishing characteristics. Some require specialized equipment (flexible sigmoidoscopy, TVU, and chest X-ray); others do not. Some produce a permanent record (chest X-ray, TVU); others do not. Some are relatively new (TVU); others are well established in routine medical practice.

The PLCO uses multiple and geographically dispersed screening centers (SCs). In practice, each SC employs multiple examiners. For practical reasons, including the need to control costs, the SCs typically employ nonphysician examiners or, for flexible sigmoidoscopy, nonspecialist physicians. In general, examination results depend strongly on examiner skill and experience. The examination procedures are, by their very nature, subjective. Often, thresholds defining an abnormal examination resist specification. For these reasons, as with most clinical procedures, the PLCO investigators expect substantial variation across examiners and SCs with respect to cancer screening test performance.

Elsewhere, this supplement describes overall trial organization and procedures for protecting data quality [1-3]. This paper treats only quality control procedures for the PLCO cancer screening tests. The PLCO examines whether early detection of PLCO cancers (with DRE, prostate-specific antigen blood testing, chest X-ray, flexible sigmoidoscopy, TVU, ovarian palpation examination, and CA125 blood testing) and treatment reduces cumulative PLCO cancer site-specific mortality. High-quality cancer screening represents one of the conditions needed for detection of a mortality benefit. Also, interpretation of final PLCO results requires assessment of the characteristics and quality of the cancer screening examinations.
If the PLCO shows no mortality benefit, the cancer screening examinations must be shown to have been technically adequate. If the PLCO shows a mortality benefit, the mortality benefit (output) must be interpreted in relation to the character and quality of the intervention (input). Any demonstration of mortality benefit should stimulate public health efforts to promote widespread screening. The PLCO should provide the standards (in terms of the characteristics and quality of the intervention) against which to judge any subsequent attempts to disseminate PLCO cancer screening methods in the general population.

Fortunately, certain features intrinsic to the PLCO design help promote high-quality cancer screening. Specifically, the PLCO limits the number of SCs, and cancer screening at each PLCO SC achieves very high volume. The health services literature consistently shows favorable relationships between volume and medical care outcomes [4-6]. Nevertheless, the PLCO considered and adopted additional specific quality control procedures for the cancer screening examinations.

DEVELOPMENT OF A QUALITY CONTROL PLAN

During the pilot phase, the PLCO steering committee and the organ-specific subcommittees specified technical and equipment needs and the minimal qualifications and training for examiners. The coordinating center (CC) codified these expectations in the PLCO manual of operations and procedures (MOOP).

Each SC used these guidelines to develop a local quality assurance plan, reviewed and approved by the NCI and the CC. Initially, procedures required central submission of curriculum vitae to establish each examiner's identity, credentials, and background. Later, the NCI constituted a PLCO quality assurance subcommittee, which articulated three goals: to measure the quality (acceptability, conformity with external standards, reliability, and accuracy) of PLCO cancer screening examination procedures; to create mechanisms that continually improve performance; and to develop objective and quantitative standards for any subsequent widespread or public health implementation of PLCO cancer screening technologies.

The quality assurance subcommittee used a classic framework to help organize and conceptualize quality [7]. The framework defined structural, process-related, and outcome-related components. Structural criteria included equipment specifications; the training, background, credentials, and experience of primary examiners; and the training, background, credentials, and experience of persons who train and supervise primary examiners. Process-related criteria included written specifications for performing the clinical examinations and measures of the extent to which examinations conform with written protocols. Outcome measures included such items as immediate and delayed complications from screening; participants' subjective experiences (satisfaction) with screening; distribution of clinical findings against normative standards; the extent to which second examiners reproduce observations made by primary examiners (reliability); judgments made by an independent, external, and expert review of primary test results; and concordance between screening test results and subsequent diagnostic evaluations.

The quality assurance subcommittee sought credible, objective, unbiased, methodologically sound, and efficient quality assessment procedures.
The subcommittee also considered procedures enabling continual and concurrent monitoring of quality. For each of the screening test modalities, the subcommittee considered the value and feasibility of the following specialized quality assurance methods: central training of examiners, central certification and registration of examiners, on-site inspections, and central audits [8, 9].

Several practical considerations moderated the ambitions of the quality assurance subcommittee. First, funding limitations prohibited central examiner training. Second, only the screening chest X-ray appeared amenable to central audits. The TVU produces a permanent image available for central and independent review. However, permanent images may miss important sources of variability. The permanent image depends on skillful real-time observations, activities that aim to locate the ovaries and to recognize morphologic abnormality. A negative or normal TVU examination, particularly one that fails to image one or both ovaries, may reflect an inadequate search for and failed real-time recognition of morphologic abnormality. Third, the complexities and dimensions of the overall PLCO effort severely limited the extent to which sophisticated analyses of central data resources could be used to monitor concurrently and continually the performance of individual SCs or of individual examiners within separate SCs.

Therefore, the PLCO found it necessary to depend on the initial selection of participating SCs, principal investigators (PIs), clinical collaborators, and examiners. In principle, this selection represents those academic institutions and investigators committed to high-quality research and

successful PLCO implementation. The PLCO presumed that these institutions and individuals know how to perform flexible sigmoidoscopy and interpret chest X-rays. The final quality control plan, therefore, aims primarily to standardize examination procedures across SCs, to produce uniform and consistent data, and to maintain accountability.

THE FINAL QUALITY CONTROL PLAN (INCORPORATING AND ENHANCING QUALITY ASSURANCE ELEMENTS)

The final quality control plan has the following basic elements: (1) uniform policies, procedures, definitions, and data collection forms; (2) distinction between trainers and examiners; (3) central approval and registration of PLCO trainers and examiners; (4) direct observation of screening test performance during periodic site visits by the NCI; (5) periodic analysis of screening test data; and (6) procedures for independently duplicating or reviewing selected examinations.

The PLCO developed written protocols and data forms that standardize the performance of cancer screening examinations and the reporting of test results. Chapters in the PLCO MOOP define timing of examinations, examination protocols, equipment specifications, participant preparation, definitions for an adequate examination, definitions for a positive screening test result, and minimum examiner qualifications, experience, and training. For each examination procedure, the PLCO developed scannable data forms, which are completed by the examiners. In general, forms record adequacy of examinations, complications, primary findings, interpretations, referral recommendations, and examiner identities. Examiners receive copies of the relevant MOOP chapter and both written and oral instructions on completing PLCO data forms. PLCO procedures include manual and automated editing of data forms. By identifying forms with inconsistent data, these procedures identify examiners who need to be reminded about PLCO definitions or rules for completing forms.

For example, written protocols specify that TVU examiners spend no less than 5 minutes looking for each ovary or the iliac vessels, image both ovaries in two planes, and record the transverse, longitudinal, and anteroposterior diameters of each ovary. The PLCO uses examiners who have performed at least 50-100 TVU examinations and passed the OB/Gyn section of the American Registry of Diagnostic Medical Sonographers certification examination.
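The recorded ovarian diameters also support automated edits against the MOOP's volume-based positivity threshold (an ovary or cyst greater than 10 cc, as described in this section). A minimal Python sketch: the function names are hypothetical, and the prolate-ellipsoid volume formula (π/6 times the three diameters) is an assumption commonly used in pelvic sonography, not a formula stated in this paper.

```python
# Hypothetical sketch of the TVU positivity rule described in the text.
# Assumption: ovarian/cyst volume approximated by the prolate-ellipsoid
# formula pi/6 * L * W * H (the paper records three diameters but does not
# state the exact volume formula used).
import math

def ovarian_volume_cc(long_cm, trans_cm, ap_cm):
    """Approximate ovarian/cyst volume in cc from three diameters in cm."""
    return math.pi / 6.0 * long_cm * trans_cm * ap_cm

def tvu_positive(volume_cc, solid_or_papillary_in_cyst, mixed_solid_cystic):
    """Apply the positivity criteria quoted from the MOOP definition."""
    return (volume_cc > 10.0
            or solid_or_papillary_in_cyst
            or mixed_solid_cystic)

# Example: a 3.2 x 2.5 x 2.0 cm simple ovary is below the 10-cc threshold.
v = ovarian_volume_cc(3.2, 2.5, 2.0)
print(round(v, 1), tvu_positive(v, False, False))
```

In practice such a check would run during the manual and automated form editing described above, flagging forms whose recorded findings and referral codes disagree.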
The MOOP specifies use of 5-7.5-MHz transvaginal probes. The PLCO defines a positive TVU screening test result as any examination that shows an ovary or cyst greater than 10 cc in volume, a uterine adnexal morphologic abnormality consisting of a solid area or papillary projection extending into a cyst cavity, or a uterine adnexal morphologic abnormality with both solid and cystic components [10, 11].

Table 1 lists the minimum qualifications for TVU examiners, flexible sigmoidoscopists, radiology technicians, radiologists, digital rectal examiners, and ovarian palpation examiners. In the PLCO, nonphysician examiners are allowed to perform DRE, ovarian palpation, TVU examination, and flexible sigmoidoscopy. In each case, the PLCO expects local oversight of nonphysician examiners by so-called trainers, physician specialists, or expert technicians. In general, urologists supervise nonphysician digital rectal examiners, gynecologists supervise nonphysician

[Table 1. Minimum qualifications, experience, and training for PLCO examiners (TVU examiners, flexible sigmoidoscopists, radiology technicians, radiologists, digital rectal examiners, and ovarian palpation examiners). The table body is not recoverable from this scan.]
ovarian palpation examiners, gastroenterologists supervise nonphysician flexible sigmoidoscopists, and radiologists, gynecologists, or senior technologists supervise less-senior TVU examiners (see Table 1). The trainers formally attest to each nonphysician examiner's qualifications and prior experience. For some modalities (DRE, ovarian palpation examination, and flexible sigmoidoscopy), persons seeking PLCO certification must complete a minimum and specified number of training examinations under the direct supervision of a PLCO trainer. The training examinations use PLCO procedures, definitions, and forms. The trainers directly observe trainees and repeat training examinations. The trainers use observations made during the training examinations to confirm each examiner's competence.

For accountability, SCs must centrally register each trainer and examiner. SCs record examiner and trainer qualifications on a standardized PLCO data form (the Record of Experience, Credentials, and Training) submitted to the CC, along with licenses or certificates that provide documentary support. The central register contains the identity, unique trial identification number, date of registration, and qualifications of each trainer and examiner.

The NCI and the CC regularly visit each SC to review medical and operational issues. Site visitors directly observe cancer screening examinations and use detailed checklists to determine whether examination procedures conform with written protocols. Site visitors also audit screening examination data collection forms.

The PLCO routinely examines two primary results from cancer screening. For each modality, the PLCO examines the test-positive proportion (the proportion of examinations with findings that define a positive screening test and prompt referral for diagnostic evaluation) and the test-inadequate proportion (the proportion of examinations deemed technically inadequate or deficient).
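The two monitored proportions reduce to a simple grouping computation over the examination records. A minimal sketch, with hypothetical field names standing in for the actual PLCO data items:

```python
# Sketch of the periodic monitoring analysis: test-positive and
# test-inadequate proportions by screening center and exam sequence
# (T0-T5). The record layout below is hypothetical.
from collections import defaultdict

def monitor(exams):
    """exams: iterable of dicts with keys 'center', 'sequence',
    'positive' (bool), and 'adequate' (bool)."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, positive, inadequate
    for e in exams:
        c = counts[(e["center"], e["sequence"])]
        c[0] += 1
        c[1] += e["positive"]
        c[2] += not e["adequate"]
    return {key: {"test_positive": pos / n, "test_inadequate": inad / n}
            for key, (n, pos, inad) in counts.items()}

exams = [
    {"center": "SC1", "sequence": "T0", "positive": True,  "adequate": True},
    {"center": "SC1", "sequence": "T0", "positive": False, "adequate": True},
    {"center": "SC1", "sequence": "T0", "positive": False, "adequate": False},
    {"center": "SC2", "sequence": "T0", "positive": False, "adequate": True},
]
stats = monitor(exams)
print(stats[("SC1", "T0")])  # proportions of 1/3 positive, 1/3 inadequate
```

The same tabulation extends directly to calendar time and examiner identity, the additional grouping dimensions the PLCO plans to monitor.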
Periodically, the PLCO examines cumulative data and calculates these proportions by SC and by examination sequence (T0-T5). Baseline (T0) examinations occur soon after randomization. T1-T5 examinations occur soon after each annual anniversary of randomization. The PLCO plans to monitor the test-positive proportion and the test-inadequate proportion, not only according to SC and sequence, but also according to calendar time and examiner identity. These analyses aim to identify SCs or individual examiners who appear to perform in a nonstandard manner. For assessment of the TVU, additional analyses calculate the proportion of examinations that visualize one or both ovaries, the average ovarian volume, and the proportion of ovaries containing any morphologic abnormality [12, 13]. These analyses have already demonstrated substantial variability with respect to ovarian visualization. These observations stimulated special on-site training in TVU procedures at selected SCs by a PLCO physician expert.

The PLCO implements procedures for duplicating, directly observing, or otherwise reviewing selected examinations. Quality assurance subcommittee guidelines encourage test replication by independent examiners. These guidelines specify selection of digital rectal, ovarian palpation, TVU, or flexible sigmoidoscopy examinations for independent replication by a second examiner at the same participant visit; videotaping of sigmoidoscopic mucosal inspections for later independent review by a second examiner; and selection of chest X-rays for independent second readings. Using these data, the PLCO will

evaluate interexaminer agreement with respect to principal examination findings. Some examination procedures are more often positive than others. Sample-size goals for each screening test modality vary as a result (see Appendix). Using data collected over 12-month intervals, this approach should produce reasonably precise estimates of concordance at each SC for each screening test modality.

DISCUSSION

In general, important quality control issues for multicentered, randomized clinical trials include trial coordination, data collection and management, and standardization of measurement [14, 15]. Quantitative measures of test performance include compliance rates, test positivity, cancer yield, positive and negative predictivity, test sensitivity and specificity, and interval cancer incidence [16]. By studying these measures as a function of calendar time, SC, examiner, and examiner characteristics, the investigators will learn about the quality of the PLCO cancer screening examinations. Also, the screening chest X-rays and flexible sigmoidoscopy videotapes provide opportunities for retrospective assessments, which could uncover quality problems only after the intervention is largely over. Reasonable and measured prospective steps designed to encourage uniformity and to guarantee accountability are therefore essential.

The PLCO interventions, particularly the digital rectal and ovarian palpation examinations, contain qualitative and subjective elements. Thresholds that define abnormal physical findings resist standardization. Substantial interobserver variation is probably unavoidable [17]. No amount of examiner selection, training, or supervision will eliminate the fundamental limitations of these examination procedures. Therefore, some reliance must be placed on the integrity of local investigators and the competence of clinical examiners.

During the design phase of the trial, the PLCO identified quality assurance as a high priority. Each SC PI developed a local quality assurance plan. The NCI reviewed and approved these plans. These quality assurance plans varied in particulars but uniformly addressed examiner credentials, training, and supervision. The goal of the trialwide PLCO quality assurance plan, described above, is to standardize definitions and to guarantee accountability.
Interim data analyses are designed to spot nonuniformity in terms of the test-positive and test-inadequate proportions. Finally, procedures encourage total or partial replication of selected examinations by a second, independent examiner. At a minimum, replicating procedures enforces local oversight and encourages communication between examiners. In addition, the data produced as a result potentially allow estimation of test reliability and, in the extreme, identification of SCs with exceptionally poor reliability. Together, the final PLCO quality assurance plan represents a deliberate effort to balance idealized notions of quality control against very pragmatic considerations.

REFERENCES

1. Prorok PC, Andriole GL, Bresalier RS, et al. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials 2000;21:273S-309S.

2. Hasson MA, Fagerstrom RM, Kahane DC, et al. Design and evolution of the data management systems in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials 2000;21:329S-348S.

3. O'Brien B, Nichaman L, Browne JEH, et al. Coordination and management of a large multicenter screening trial: The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials 2000;21:310S-328S.

4. Hannan EL, O'Donnell JF, Kilburn H Jr, et al. Investigation of the relationship between volume and mortality for surgical procedures performed in New York State hospitals. JAMA 1989;262:503-510.

5. Hosenpud JD, Breen TJ, Edwards EB, et al. The effect of transplant center volume on cardiac transplant outcome: A report of the United Network for Organ Sharing Scientific Registry. JAMA 1994;271:1844-1849.

6. Hughes RG, Hunt SS, Luft HS. Effects of surgeon volume and hospital volume on quality of care in hospitals. Med Care 1987;25:489-503.

7. Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q 1966;44(Suppl):166-206.

8. Baines CJ, McFarlane DV, Miller AB. The role of the reference radiologist: Estimates of inter-observer agreement and potential delay in cancer detection in the National Breast Screening Study. Invest Radiol 1990;25:971-976.

9. Baines CJ, Miller AB, Kopans DB, et al. Canadian National Breast Screening Study: Assessment of technical quality by external review. Am J Roentgenol 1990;155:743-747.

10. DePriest PD, van Nagell JR Jr, Gallion HH, et al. Ovarian cancer screening in asymptomatic postmenopausal women. Gynecol Oncol 1993;51:205-209.

11. DePriest PD, Gallion HH, Pavlik EJ, et al. Transvaginal sonography as a screening method for the detection of early ovarian cancer. Gynecol Oncol 1997;65:408-414.

12. Higgins RV, van Nagell JR Jr, Woods CH, et al. Interobserver variation in ovarian measurements using transvaginal sonography. Gynecol Oncol 1990;39:69-71.

13. Wolf SI, Gosink BB, Feldesman MR, et al. Prevalence of simple adnexal cysts in postmenopausal women. Radiology 1991;180:65-71.

14. Gagnon J, Province MA, Bouchard C, et al. The HERITAGE Family Study: Quality assurance and quality control. Ann Epidemiol 1996;6:520-529.

15. Gassman JJ, Owen WW, Kuntz TE, et al. Data quality assurance, monitoring, and reporting. Control Clin Trials 1995;16(Suppl):104S-136S.

16. Baines CJ, Miller AB, Bassett AA. Physical examination: Its role as a single screening modality in the Canadian National Breast Screening Study. Cancer 1989;63:1816-1822.

17. Smith DS, Catalona WJ. Interexaminer variability of digital rectal examination in detecting prostate cancer. Urology 1995;45:70-74.

18. Fleiss JL. Statistical Methods for Rates and Proportions. New York: John Wiley & Sons; 1981.

19. Kraemer HC, Bloch DA. A note on case-control sampling to estimate kappa coefficients. Biometrics 1990;46:49-59.

APPENDIX

Method for Determining Numbers of Repeat Screening Examinations for Quality Assurance

This appendix describes methods for determining numbers of screening examinations that must be repeated or re-evaluated for quality assurance (QA) purposes. A different examiner performs the initial screening and the QA evaluation, so the desired QA measure is agreement between two examiners. For this purpose, we use a kappa coefficient [18], which we denote by $\kappa$, defined as follows. Let subscript 1 denote a positive screening test result and subscript 0 denote a negative screening test result. Denote by $p_{ij}$ the probability that examiner 1 obtains result $i$ and examiner 2 obtains result $j$, where $i$ and $j$ assume values in $\{0,1\}$. Define the marginal probabilities:

$$p_{i\cdot} = \sum_{j=0}^{1} p_{ij} \qquad \text{and} \qquad p_{\cdot j} = \sum_{i=0}^{1} p_{ij}$$

If we define:

$$\pi_o = \sum_{i=0}^{1} p_{ii} \qquad \text{and} \qquad \pi_e = \sum_{i=0}^{1} p_{i\cdot}\, p_{\cdot i}$$

then:

$$\kappa = \frac{\pi_o - \pi_e}{\pi_{\max} - \pi_e}$$

where $\pi_{\max}$ is the maximum possible value of $\pi_o$ given the marginal probabilities. Note that $\kappa$ measures interexaminer agreement after correcting for the agreement expected if the examiners independently tossed biased coins. This agreement is measured relative to its maximum possible value, resulting in an index whose value lies between $-1$ and $1$.

The estimation of $\kappa$ is straightforward under cross-sectional sampling in which a simple random sample of participants is selected and both examiners perform evaluations, since reasonable estimates of all marginal probabilities are then available. To estimate $\kappa$, we make the assumption that the positivity rates for the two examiners are the same (i.e., $p_{1\cdot} = p_{\cdot 1}$). Call this common positivity rate $P$. Then:

$$\pi_{\max} = 1 \qquad \text{and} \qquad \kappa = \frac{\pi_o - P^2 - (1-P)^2}{2P(1-P)}$$

Let $N$ be the total number of screening examinations re-evaluated, and, for $i$ and $j$ in $\{0,1\}$, let $n_{ij}$ be the number of screening examinations interpreted as being in category $i$ by examiner 1 and in category $j$ by examiner 2. Define:

$$\hat{p}_{ij} = \frac{n_{ij}}{N}$$

Then the maximum likelihood estimator [19] of $\kappa$ is:

$$\hat{\kappa} = \frac{\hat{\pi}_o - \hat{P}^2 - (1-\hat{P})^2}{2\hat{P}(1-\hat{P})}$$

where:

$$\hat{\pi}_o = \hat{p}_{00} + \hat{p}_{11} \qquad \text{and} \qquad \hat{P} = \frac{2\hat{p}_{11} + \hat{p}_{10} + \hat{p}_{01}}{2}$$

The asymptotic variance of this estimator is:

$$\hat{\sigma}^2 = \frac{1}{N}\left[(1-\kappa)^2(1-2\kappa) + \frac{\kappa(1-\kappa)(2-\kappa)}{2P(1-P)}\right]$$

Note that the variance depends on $P$, the underlying positivity rate assumed to be common to both examiners. This rate can be estimated using results from all of the screening examinations, not just those singled out for re-evaluation.
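The estimator and its asymptotic variance translate directly into code. A minimal sketch, assuming the equal-positivity-rate parameterization used in this appendix (function names are ours, not PLCO software):

```python
# Sketch of the appendix's kappa estimator for a 2x2 table of paired
# screening results, assuming both examiners share a common positivity
# rate P (the appendix's working assumption).

def kappa_hat(n00, n01, n10, n11):
    """Maximum likelihood estimate of kappa from paired exam counts n_ij
    (examiner 1 category i, examiner 2 category j)."""
    N = n00 + n01 + n10 + n11
    p00, p01, p10, p11 = (n / N for n in (n00, n01, n10, n11))
    pi_o = p00 + p11                    # observed agreement
    P = (2 * p11 + p10 + p01) / 2       # estimated common positivity rate
    return (pi_o - P**2 - (1 - P)**2) / (2 * P * (1 - P))

def kappa_var(kappa, P, N):
    """Asymptotic variance of the kappa estimator."""
    return ((1 - kappa)**2 * (1 - 2 * kappa)
            + kappa * (1 - kappa) * (2 - kappa) / (2 * P * (1 - P))) / N

# Perfect agreement (all off-diagonal counts zero) yields kappa near 1.
print(kappa_hat(80, 0, 0, 20))
```

With 70 concordant negatives, 10 concordant positives, and 10 disagreements in each direction, the estimator gives $\hat\kappa = 0.375$, illustrating how quickly disagreement pulls kappa below the chance-corrected ceiling.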

Table A-1. Annual Sample Sizes per Screening Center Needed to Detect (One-Sided Test, alpha = 0.05, beta = 0.10) an Inter-Examiner Agreement (kappa) of 0.40 When the Null Value Is 0.80, by Screening Modality

Modality                      N
Digital rectal examination    77
Chest X-ray                  104
Flexible sigmoidoscopy        38
Ovarian palpation            456
Transvaginal ultrasound      164

It is then possible to test the hypothesis $H_0\!: \kappa \geq \kappa_0$ using a normal approximation for the distribution of the statistic $(\hat{\kappa} - \kappa_0)/\hat{\sigma}_0$, where:

$$\sigma = \left[(1-\kappa)^2(1-2\kappa) + \frac{\kappa(1-\kappa)(2-\kappa)}{2P(1-P)}\right]^{1/2}$$

This normal approximation permits computation of the sample size required to detect with power $1-\beta$ a value $\kappa_1$ for $\kappa$ using the one-sided $\alpha$-level test. The sample size is:

$$N = \frac{(z_{1-\beta}\,\sigma_1 + z_{1-\alpha}\,\sigma_0)^2}{(\kappa_0 - \kappa_1)^2}$$

where $z_q$ denotes the $q$th quantile of the standard normal distribution and $\sigma_0$ and $\sigma_1$ correspond to $\kappa = \kappa_0$ and $\kappa = \kappa_1$, respectively.

Using this one-sided statistical test ($\alpha = 0.05$, $\beta = 0.10$) and estimating $P$ with interim PLCO data, we computed sample sizes for distinguishing $\kappa = 0.4$ from $\kappa = 0.8$ as indicators of fair and good agreement, respectively (Table A-1). With one exception, Table A-1 shows the number of examinations each year replicated or reviewed by each PLCO screening center. The sample-size requirements for the ovarian palpation examination reflect a low positivity rate ($P$). Having each screening center replicate 456 ovarian palpation examinations each year was judged unreasonable and burdensome. Therefore, the desired number of replicate ovarian palpation examinations was reduced to 164, the sample-size goal for transvaginal ultrasounds. This smaller sample size decreases the power for the ovarian palpation examination from 0.90 to 0.63.
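The sample-size formula can be evaluated directly with the standard library. A sketch under the appendix's assumptions; the positivity rate $P$ used below is illustrative only, since the paper estimates $P$ from interim PLCO data without reporting modality-specific values:

```python
# Sample size to distinguish kappa1 = 0.4 (fair) from the null kappa0 = 0.8
# (good) with a one-sided alpha = 0.05 test and power 1 - beta = 0.90,
# following the appendix formulas. The value of P below is illustrative,
# not a PLCO estimate.
from statistics import NormalDist

def sigma(kappa, P):
    """Per-examination standard deviation component of the kappa estimator."""
    return ((1 - kappa) ** 2 * (1 - 2 * kappa)
            + kappa * (1 - kappa) * (2 - kappa) / (2 * P * (1 - P))) ** 0.5

def sample_size(kappa0, kappa1, P, alpha=0.05, beta=0.10):
    z = NormalDist().inv_cdf
    num = (z(1 - beta) * sigma(kappa1, P)
           + z(1 - alpha) * sigma(kappa0, P)) ** 2
    return num / (kappa0 - kappa1) ** 2

print(round(sample_size(0.8, 0.4, P=0.10)))
```

Lowering $P$ inflates both $\sigma_0$ and $\sigma_1$ and hence $N$, which is consistent with the table's pattern: the low-positivity ovarian palpation examination required by far the largest replicate sample.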