The Breast (1996) 5, 398-403 © 1996 Pearson Professional Ltd
ORIGINAL ARTICLE
Dual reading in a non-specialized breast cancer screening programme
B. Séradour*, S. Wait†, J. Jacquemier‡, M. Dubuc§
*ARCADES (Association pour la Recherche et le Dépistage des Cancers du Sein), Hôpital la Timone, Marseille, †Private Public Health Consultant, Paris, ‡Recueil Histologique des Pathologies du Sein, Département d’anatomie pathologique, Institut Paoli Calmettes, Marseille, §Laboratoire de Santé Publique, Faculté de Médecine, Marseille, France
SUMMARY. This paper evaluates the contribution of dual reading to cancer detection rates in a non-specialized breast cancer screening programme in the Bouches-du-Rhône district (France). All mammograms were read first by one of 120 field radiologists and secondly by one of six experts at the coordinating centre. All women with at least one positive reading were recalled. Of the 77 236 women screened, 2799 were recalled by the field radiologist and 1770 by the expert. The expert detected an additional 15% (62/427) of cancers. Of these tumours, 49% were smaller than 10 mm and 38% were of grade 1. The marginal cost of dual reading was $4040 per additional cancer detected and amounted to 4.6% of total programme costs. In our decentralized programme, dual reading allows for the detection of good prognosis cancers which are missed by an isolated reader. The effects of dual reading on screening quality greatly outweigh its costs.
INTRODUCTION
The Bouches-du-Rhône breast cancer screening programme was initiated in 1989 as one of 13 district programmes within the French Prevention Department (Fonds National de Prévention) breast cancer screening policy. Despite discrepancies in the chosen number of views, the screening interval and the target age group, all 13 programmes have adopted a non-specialized model of screening using existing radiology facilities as opposed to specialized screening units. Dual reading of films and quality assurance of mammographic units are performed systematically in order to ensure optimal screening quality.
The purpose of this paper is to examine the specific impact of dual reading on the detection of breast cancers during the first round of the Bouches-du-Rhône breast cancer screening programme. In our programme, dual reading involves an initial interpretation of screens by one of 120 radiologists practising in the public or private sector and a second interpretation by one of six expert radiologists. All women for whom at least one reader suggests further assessment are recalled. This system was chosen to achieve optimal screening sensitivity.
There exist few reports in the literature on the variability of mammogram interpretations1-4 or on the potential benefit of dual reading.5-9 The aim of this paper is to assess the added value of dual reading to cancer detection within the framework of a non-specialized screening programme. Sources of observer error and inter-observer variability in the interpretation of mammograms are examined. Lesions missed by the first or second reader are compared with respect to their prognostic factors and histology. Finally, the marginal cost of dual reading is calculated and measured against its effect on cancer detection.
Address correspondence to: S. Wait, Private Public Health Consultant, rue Fondary, 75015 Paris, France
MATERIALS AND METHODS
The Bouches-du-Rhône breast cancer screening programme addresses a target population of 190 075 women aged 50-69 years. Women receive an invitation to attend screening from the sickness insurance fund to which they are affiliated. This invitation features a list of the 120 radiologists participating in the programme and located throughout the district. Women are free to contact the radiologist of their choice. Women who are not affiliated to one of the participating sickness insurance funds do not receive an invitation. It is estimated that this group represents approximately 10% of the target population. In 1990, 60 radiologists participated in the programme. In 1991 this figure rose to 100 and in 1992, by the end of the first round, all 120 radiologists performing mammograms in the district whose equipment was found to be compatible
with the screening programme’s quality standards were involved in the programme. Fifty-five per cent of radiologists are concentrated in Marseille and the remaining 45% cover an area which is essentially semi-rural. Ninety-four per cent are in the private sector and 6% in the public hospital sector. Fifty-eight per cent of mammography units used for screening are General Electric, 17% are Metaltronica, 9% Philips, 5% Siemens, 7% Merate and 4% are other brands. Screen film systems are 70% Kodak, 20% AGFA and 10% a mixture of Fuji, DuPont, Konica and 3M. Participating radiologists differ in their level of training in screening as well as in their yearly throughput of mammograms. Over 60% of mammograms were reported by less than 10% of radiologists in the first round of screening.
Participating radiologists must sign a binding agreement with the programme coordinators whereby they agree to perform single mediolateral oblique view mammograms, to submit their results to the coordinating centre for dual reading and not to divulge any results to the patient directly. They must also agree to partially fund the quality assessment of their equipment. A district-wide quality assurance programme of all mammographic equipment was implemented in 1993 under the supervision of the biomedical engineering department of La Timone hospital in Marseille.
The initial interpretation of mammograms is made by a radiologist in the usual clinical setting (‘field radiologist’). The results are then sent to the coordinating centre where mammograms are sorted for dual reading. Dual reading has been performed using an automated multiviewer since 1993 and two film illuminators before that date. The six radiologists performing dual reading are specialized in breast imaging and each had read at least 12 000 mammograms before entering the programme. Each expert spends on average 1.5-2 hours per week reading 120-200 screens.
The weekly volume of screens read ranges from 720 to 900 (mean, 750). Dual reading is double-blind in that the identity of the first radiologist is unknown to the second reader and vice-versa. The result of the first reading is made available to the expert once interpretation has been made. All six experts are also ‘field radiologists’ in that they regularly perform screening mammograms in their clinical setting. Films are therefore presorted for dual reading by the coordinating centre secretary in order to avoid experts rereading their own films. Screening results are classified into three possible categories: normal (no cancerous lesion), benign lesion not requiring further assessment, and lesions suggestive of cancer (further assessment required). The presence of microcalcifications is noted for all suspicious lesions. Mammograms judged of insufficient technical quality are put aside for technical recall and must be performed by the field radiologists at their own expense.
Women with a normal screening result are asked to return in 3 years for a routine screen whereas women with an indication for further assessment are referred to their treating physician (gynaecologist or GP) for diagnostic work-up. The screening programme plays no role in the diagnosis or treatment options of women with positive-screen results. The outcome of all positive-screen results is monitored at the coordinating centre through patient and physician surveys. Histological information is obtained from a pathology registry held at the Pathology Department of the Institut Paoli Calmettes in Marseille which collects data from 30 pathology laboratories on all breast biopsies performed in the district. The registry covers all age groups and its completeness was estimated at 85% for the first round of screening. By matching the screening programme and the registry databases it is possible to identify all cancerous and benign breast lesions biopsied in the district in the screened, non-attending and non-invited women.
For the purposes of this study, we have classified all detected lesions into three groups: lesions detected both by the field radiologist and the expert (R1+R2+), lesions found positive by the field radiologist but negative by the expert (R1+R2-) and lesions found negative by the field radiologist but positive by the expert (R1-R2+). Using registry data, the results from surgical biopsies of all positive-screen cases and the size, nodal involvement and grade of all tumours detected have been studied with respect to reader interpretations (R1+R2-, R1-R2+ and R1+R2+). Information on interval cancers was only available for those cancers occurring within 12 months following screening. Using this information, the potential contribution of dual reading to sensitivity, specificity and small cancer detection rates has been estimated. The cost of dual reading has been calculated in terms of direct and indirect costs.
Direct costs include personnel time (expert readers, secretarial and data entry staff) and other capital costs. Indirect costs include the cost of assessment of false-positive cases recalled by the expert reader. These costs are based on a survey of all women with a positive-screen result. A questionnaire was mailed to these women or their physician 6 months after screening in order to determine which diagnostic procedures had been performed following screening. Based on these data, the average frequency of each procedure was multiplied by the national tariff to define a standard work-up protocol. This protocol was then applied to the 581 false-positive cases attributable to dual reading. The cost of 63 benign biopsies performed on these women was then added to approximate the indirect medical costs of dual reading. The overall cost of the Bouches-du-Rhône programme was estimated in a previous analysis at $67 per woman screened.10 This estimate did not include any diagnostic or assessment costs and merely covered direct costs incurred at the level of the coordinating centre, radiology clinics and sickness insurance funds.

Table 1  Age distribution of screening attenders

Age¹ (yr)    Screening attenders      Target population       Attendance rate
             n          %             n          %            (%)
50-54        17 554     23.1          46 427     24.4         37.8
55-59        19 650     25.9          48 644     25.6         40.4
60-64        19 847     26.2          48 240     25.4         41.1
65-69        18 819     24.8          46 764     24.6         40.2
Total        75 870     100.0         190 075    100.0        40.6¹

¹For 1366 women, either the age was incorrectly entered into the database or these women were erroneously selected outside of the target age group. For the overall attendance rate, the total number of attenders (77 236) is taken.
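The attendance rates in Table 1 follow directly from the two count columns. The short Python sketch below recomputes them; it assumes only the counts printed in the table, and the variable and function names are illustrative rather than part of the study.

```python
# Counts transcribed from Table 1 (first screening round).
attenders = {"50-54": 17554, "55-59": 19650, "60-64": 19847, "65-69": 18819}
target = {"50-54": 46427, "55-59": 48644, "60-64": 48240, "65-69": 46764}

# All attenders, including those whose recorded age was missing or out of range.
TOTAL_ATTENDERS = 77236


def attendance_rate(n_attenders: int, n_target: int) -> float:
    """Attendance as a percentage of the target population, to one decimal."""
    return round(100 * n_attenders / n_target, 1)


rates = {age: attendance_rate(attenders[age], target[age]) for age in attenders}
overall = attendance_rate(TOTAL_ATTENDERS, sum(target.values()))
unclassified = TOTAL_ATTENDERS - sum(attenders.values())

for age, rate in rates.items():
    print(f"{age}: {rate}%")
print(f"overall: {overall}% ({unclassified} attenders without a usable age)")
```

The overall rate uses all 77 236 attenders rather than the 75 870 with a usable age, as stated in the table footnote.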
RESULTS
The first round of screening was attended by 77 236 women (47% of the invited population). The age distribution of attenders is provided in Table 1. Three thousand four hundred and forty-two (4.4%) women were recalled for diagnostic assessment and 795 surgical biopsies were performed. Information on assessment was incomplete for 7% of women recalled. Four hundred and twenty-seven cancers and 368 benign lesions were found and 5.5 cancers were detected per 1000 women screened. Quality and efficacy indicators for the first round of screening are summarized in Table 2. Recall rates varied from 2.0-3.4% for the six experts and from l&13.0% for the field radiologists. The overall recall rate for field radiologists was 3.6% (n = 2799) and 2.3% for experts (n = 1770). This difference is statistically significant (P
Table 2  Results of the first round (1990-1992) of screening of the Bouches-du-Rhône programme

                                              n          %¹
Target population                             190 075    100.0
Number of women invited                       159 472    83.9
Number of women screened                      77 236     40.6
Number of women recalled                      3442       4.4
Number of women assessed                      3097       4.0
Number of women biopsied                      798        0.1
Cancers detected (per 1000)                   427        5.5
Interval cancers at 0-12 months (per 1000)    51         0.7
Histological properties²:
  Carcinoma in situ                           41         10.0
  Invasive carcinoma                          370        90.0
  Tumour size
    <10 mm                                    99         32.6
    11-20 mm                                  124        40.7
    >20 mm                                    81         26.6
    unknown                                   66
  Grade
    1                                         100        31.5
    2                                         153        48.3
    3                                         30         9.4
    non-gradable                              34         10.7
    grade unknown                             53
  Nodal involvement
    N+                                        75         25.8
    N-                                        216        74.2
    unknown                                   79

¹Recall, technical recall, assessment and biopsy rates are calculated with reference to the number of women screened; histological properties, with reference to the number of invasive cancers with known size, grade or nodal involvement; interval cancers, with reference to the number of women screened.
²Histology is known for 411 out of 427 cancers.
The positive predictive value (PPV) of surgical biopsy was 54.7% for the field radiologist, 63.3% for the expert and 66.8% for R1+R2+ lesions. Table 3 presents quality indicators for three possible screening modalities: mammograms are read by the field radiologist only (non-expert reading only), mammograms are read by the expert only (expert reading only) or mammograms are read by both and all positive cases are recalled (current dual reading system). A system with single reading by a non-expert would recall 3.7% of women, would miss at least 62 cancers and would increase specificity by 0.7% with respect to the current dual reading system. A system with expert reading only would recall 2.3% of women, would miss at least 44 cancers, would cause 9.4% fewer benign biopsies and would increase specificity by 2.1%.
The three groups of lesions differed with respect to tumour size. Of invasive cancers missed by the field radiologist, 48.8% were smaller than 10 mm and 37.8% were of grade 1. The corresponding rates for cancers missed by the expert were 42.4% and 23.5%. For cancers detected by both readers, they were 27.6% and 31.5%. Conversely, 14.6% of R1-R2+ cancers were larger than 20 mm as compared to 18.2% of R1+R2- cancers and 29.3% of R1+R2+ cases. Microcalcifications were present in 19.6% of R1-R2+ cancers as compared to 33% of R1+R2- cancers. The rate of microcalcifications was 32% for cancerous lesions detected by both the field radiologist and the expert.
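The cancer counts quoted for the three reading groups can be cross-checked with simple bookkeeping. The Python sketch below assumes only the figures given in the text (427 cancers in total, 62 missed by the field radiologist, 44 missed by the expert); the names are illustrative.

```python
TOTAL_CANCERS = 427     # cancers detected under the current dual reading system
MISSED_BY_FIELD = 62    # R1-R2+: detected by the expert only
MISSED_BY_EXPERT = 44   # R1+R2-: detected by the field radiologist only

# Cancers detected by both readers (R1+R2+) are what remains.
both_readers = TOTAL_CANCERS - MISSED_BY_FIELD - MISSED_BY_EXPERT

# Totals each reader would have found when reading alone.
field_detected = both_readers + MISSED_BY_EXPERT
expert_detected = both_readers + MISSED_BY_FIELD


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


print(f"R1+R2+: {both_readers}, field alone: {field_detected}, "
      f"expert alone: {expert_detected}")
print(f"added by expert: {share(MISSED_BY_FIELD, TOTAL_CANCERS):.1f}% of all cancers")
print(f"added by field radiologist: {share(MISSED_BY_EXPERT, TOTAL_CANCERS):.1f}% of all cancers")
```

Rounded to whole percentages, the two shares correspond to the contributions of roughly 15% and 10% attributed to the second and first reader respectively.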
Table 3  The relative quality and effectiveness of three screening modalities

                                      Single reading by    Single reading by     Dual reading
                                      field radiologist    expert radiologist    (current system)
No. of mammograms read                76 173¹              77 236                77 236
No. of cases recalled                 2799                 1770                  3442
Recall rate (%)                       3.7                  2.3                   4.4
No. of biopsies                       673                  605                   798
Biopsy rate (%)                       0.9                  0.8                   1.1
No. of benign biopsies                305                  222                   368
No. of false positives                2434                 1385                  3015
Positive predictive value of biopsy   54.7                 63.3                  53.9
No. of cancers detected               365                  383                   427
Carcinoma in situ                     33                   35                    41
Tumour size²:
  <10 mm                              79 (32.1%)           85 (34.6%)            99 (32.5%)
  >20 mm                              75 (30.5%)           75 (30.5%)            81 (26.6%)
Interval cancers at 0-12 months       114                  96                    52
Sensitivity³                          76.2                 80.0                  89.1
Specificity                           96.8                 98.2                  96.1

¹At the beginning of the programme, 63 mammograms were not interpreted by the field radiologist and were sent directly to the expert.
²Tumour size is known for 304 invasive carcinomas. The percentage of tumours <10 mm or >20 mm is calculated on the total number of invasive cancers of known size detected by the field radiologist (n = 246) or the expert (n = 246).
³A preliminary estimate of sensitivity was made based on our knowledge of 12-month interval cancer rates.
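The sensitivity, specificity and biopsy PPV rows of Table 3 can be recomputed from the count rows. The sketch below is a hedged reconstruction, not the authors' stated method: it assumes that sensitivity is cancers detected over cancers detected plus 12-month interval cancers, and that specificity is one minus false positives over women without a known cancer. Under those assumptions the table values are reproduced to one decimal. The `Modality` class is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modality:
    """One column of Table 3 (counts as printed in the table)."""
    name: str
    mammograms: int
    false_positives: int
    biopsies: int
    benign_biopsies: int
    cancers: int
    interval_cancers: int  # known cancers at 0-12 months not detected by this modality

    def sensitivity(self) -> float:
        # Assumption: all known cancers = detected + interval cancers at 0-12 months.
        return 100 * self.cancers / (self.cancers + self.interval_cancers)

    def specificity(self) -> float:
        # Assumption: women without cancer = mammograms read minus all known cancers.
        without_cancer = self.mammograms - (self.cancers + self.interval_cancers)
        return 100 * (1 - self.false_positives / without_cancer)

    def biopsy_ppv(self) -> float:
        # Share of surgical biopsies that were not benign.
        return 100 * (self.biopsies - self.benign_biopsies) / self.biopsies


field = Modality("field radiologist only", 76173, 2434, 673, 305, 365, 114)
expert = Modality("expert only", 77236, 1385, 605, 222, 383, 96)
dual = Modality("dual reading", 77236, 3015, 798, 368, 427, 52)

for m in (field, expert, dual):
    print(f"{m.name}: sensitivity {m.sensitivity():.1f}%, "
          f"specificity {m.specificity():.1f}%, biopsy PPV {m.biopsy_ppv():.1f}%")

# Effect of adding the second reader to a field-radiologist-only system.
sensitivity_gain = dual.sensitivity() - field.sensitivity()
specificity_loss = field.specificity() - dual.specificity()
print(f"sensitivity gain {sensitivity_gain:.1f} points, "
      f"specificity loss {specificity_loss:.1f} points")
```

Because interval cancers beyond 12 months are not yet known, these sensitivities are upper estimates in the same sense as the preliminary figures in the table.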
Carcinoma in situ accounted for 13.3% of R1-R2+ cancers as compared to 13.6% of R1+R2- and 9.1% of R1+R2+ cases.
Table 4 provides a breakdown of the marginal costs of dual reading in our programme. The cost of screening without double reading has previously been estimated at $65.60 (328 French Francs) per woman screened in the first round. The average cost of assessment for a positive-screen woman is $181 without biopsy and $758 with biopsy. The added cost of surveillance was not included in our calculation for lack of precise data on women’s behaviour once they leave the screening programme. The total marginal cost of dual reading was $250 484 for the first round of screening, which translated into $3.24 per woman screened and $586 per cancer detected. A marginal cost of $4040 is required for each additional case of cancer detected through dual reading.

Table 4  Marginal cost components of the dual reading system

Cost components                                        Total cost (US$)¹
Structural costs
  2 film illuminators depreciated over 7 years
Personnel costs
  6 radiologists                                       78 000
  1 secretarial staff                                  29 884
Assessment costs for false-positives (n = 581)
  1.98 GP consultations                                29 863
  0.22 gynaecological consultations                    5112
  0.463 surgical consultations                         7785
  0.988 diagnostic mammograms                          45 899
  0.772 ultrasound                                     13 479
  0.312 thermography²                                  2324
  0.046 fine needle aspiration                         813
  benign biopsies (n = 63)                             36 325
Total costs
  Total cost of dual reading                           250 484
  Marginal cost per woman screened                     3.24
  Share of total programme cost (%)                    4.6
  Cost per extra cancer detected through dual reading  4040

¹Costs in US currency based on a conversion rate of 5 Francs per US dollar.
²Thermography was still used by very few physicians in the initial years of the programme.
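The derived cost figures can be checked against the components listed in Table 4. The Python sketch below assumes only the printed amounts and the counts given in the text (77 236 women screened, 62 additional cancers, 427 cancers in total, 581 false-positive cases). The `residual` it computes is an inference from the printed total, not a figure taken from the source: the amount for the structural line is not legible in the table.

```python
# Assessment costs for the 581 false-positive cases attributable to dual reading (US$).
ASSESSMENT_COSTS = {
    "GP consultations": 29863,
    "gynaecological consultations": 5112,
    "surgical consultations": 7785,
    "diagnostic mammograms": 45899,
    "ultrasound": 13479,
    "thermography": 2324,
    "fine needle aspiration": 813,
    "benign biopsies": 36325,
}
PERSONNEL_COSTS = {"6 radiologists": 78000, "1 secretarial staff": 29884}

TOTAL_COST = 250484       # total marginal cost of dual reading, first round
WOMEN_SCREENED = 77236
EXTRA_CANCERS = 62        # cancers detected by the expert only
ALL_CANCERS = 427
FALSE_POSITIVES = 581

assessment_total = sum(ASSESSMENT_COSTS.values())
per_woman = TOTAL_COST / WOMEN_SCREENED
per_extra_cancer = TOTAL_COST / EXTRA_CANCERS
per_cancer = TOTAL_COST / ALL_CANCERS
per_false_positive = assessment_total / FALSE_POSITIVES

# Inferred, not printed: whatever the listed components leave unexplained.
residual = TOTAL_COST - assessment_total - sum(PERSONNEL_COSTS.values())

print(f"assessment costs: ${assessment_total}")
print(f"per woman screened: ${per_woman:.2f}")
print(f"per additional cancer: ${per_extra_cancer:.0f}")
print(f"per cancer detected: ${per_cancer:.2f}")
print(f"assessment cost per false positive: ${per_false_positive:.2f}")
print(f"unexplained by listed components: ${residual}")
```

The per-cancer and per-false-positive values fall just above the whole-dollar amounts quoted in the text, which suggests that those amounts were truncated rather than rounded.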
DISCUSSION
Systematic dual reading has been recommended in France as a means of optimizing sensitivity in our decentralized breast cancer screening programme. Dual reading is not an integral part of many national protocols such as in the UK and there is considerable debate on the appropriate role and level of experience required of the second reader. In publications on this topic, dual reading alternately involves the double reading of films by equally trained radiologists, the rereading of films by one experienced radiologist in a retrospective audit procedure or the introduction of a reading committee which reaches a consensus decision on the discordant interpretations of two separate readers. These different models of dual reading have been adopted by certain district programmes in France. In the Bouches-du-Rhône, the system chosen was independent reading by two separate radiologists in which the interpretation of both readers is taken at par.
The dual reading system adopted in the Bouches-du-Rhône is unique in two respects. Firstly, it involves a large number of first readers with heterogeneous levels of experience in screening mammography. These 120 radiologists all practise mammography within their individual clinical
settings using equipment of differing quality. Screening throughput ranges from 27 to 750 mammograms per year; however, this does not necessarily reflect overall volume of mammography as certain clinics may specialize in diagnostic radiology yet do relatively little screening. Despite these important discrepancies, recall rates are relatively homogeneous among field radiologists with the exception of a few outliers (four radiologists had recall rates greater than 10% and 12 less than 2%). This can partly be explained by the fact that all participating radiologists received training before entering the programme. A quality-assurance programme has been applied to all radiologists participating in the programme since 1993; however, its introduction occurred after completion of the first round of screening.
The second unique feature of the Bouches-du-Rhône dual reading system is that the expert radiologist’s interpretation is taken as equal to that of the field radiologist. This system was also chosen in the Scottish experience reported by Anderson et al.5 and allows for optimal sensitivity while possibly undermining specificity. However, whereas the first and second reader in the UK setting are of equivalent training, it is assumed that there is a net differential in expertise between our field radiologists and the six expert readers.
The 15% increase in cancer detection attributable to dual reading is comparable to increases reported in other studies: 9% in the Finnish study,6 11% in the Scottish study,5 15% in the Swedish studies7,8 and 5% in the American report.9 It should be noted that, in our programme, the first non-expert reader contributed an additional 10% to cancer detection and the difference in increased detection rates by both readers is not statistically significant. This further underscores the desirability of dual reading even when opposed to a system of single reading by an expert only, as is illustrated in Table 3.
Cancerous lesions missed by either the field or the expert radiologist tend to be of good prognosis and a higher rate of carcinoma in situ is detected by one reader only than by both readers together. Similar findings were reported in the Finnish,6 Swedish7,8 and Scottish studies.5 Only in the Scottish study was there no statistically significant difference in the size of tumours missed by one of the readers with respect to those detected by both readers.
Intra-observer and inter-observer variability among radiologists has been assessed in a number of studies.1-4 The kappa value for overall agreement in our programme is relatively low (0.67) despite a high level of agreement (97.8%). It has been suggested that the kappa may not be the most reliable test for agreement on very heterogeneous samples.11 Improved inter-observer agreement for cancerous as opposed to non-cancerous cases has also been reported in an ad hoc study of 45 screen detected cases in Italy2 as well as in a Canadian review of over 5000 randomly selected mammograms.1 In our second round of screening (1993-95), inter-observer agreement improved for second-time attenders (45%) whereas it remained the same for women attending for the first time (33%). Similarly, the recall rate for second-time attenders is lower in the second round yet it remains at approximately 4% for first-attenders. These results suggest that inter-observer agreement is more contingent on the type of images read than on the actual quality of interpretation. Inter-observer variability on diagnostic work-up recommendations is not a direct outcome of our programme since all diagnostic and treatment strategies are made by the women’s treating physician outside of the scope of the actual screening programme.
Our estimates of interval cancer rates were limited to carcinomas detected in normal-screen women between 0 and 12 months following screening. Based on this definition, dual reading increases sensitivity by at least 12.9% and specificity is decreased by 0.7%. The Scottish study found that a similar dual reading system increased sensitivity by 10.8% and specificity was decreased by 1.8%. At present, insufficient time has elapsed to estimate both interval cancer rates at 24 or 36 months after screening and the number of cancerous lesions detected at the second round which were missed at the first round. The true effect of dual reading on sensitivity may only be fully appreciated once these data are available.
Dual reading added 4.6% to the total cost of the first round of screening. Medical costs for women unnecessarily recalled because of dual reading amounted to $141 600 ($243 per false-positive, $1.83 per woman screened). Our cost of $758 per false-positive with biopsy compares with a Swedish estimate of 7537 Swedish Crowns per false-positive case in women aged 50-64 years.12 False-positives represent a small share of our total programme costs given our relatively low recall rate (4.4%).
However, this cost would be greatly inflated in programmes where recall rates exceed 10% and it is thus of primary concern for programmes with inappropriate training of radiologists and inadequate quality of mammographic equipment. The marginal cost of dual reading was $3.24 per woman screened. Anderson et al.5 estimated the added cost at £13 773 for a screening population one-third the size of our programme. Although our total cost ($250 484) is significantly higher than the UK estimate, we also assume that our ongoing programme cost is at least 60% higher due to the decentralized model of our screening system.13,14
There has been significant alarm lately among breast cancer screening programmes with regard to unexpectedly high interval cancer rates.15 As a result, attention is being given to dual reading and double-view mammography as possible means of increasing the small cancer detection rate and thereby improving the odds of achieving mortality reduction targets. In his study comparing dual reading with double views, Thurfjell8 found that the contribution of dual reading to cancer detection rates was more significant than that of double views. Further research is needed in this area to determine which screening modality can provide the most important gain in small cancer detection rates at a reasonable cost. Double-view mammography is not performed in our programme; however, dual reading has contributed at least 15% to cancer detection and 20% to the detection of small cancers. Double-view mammography could be considered as a means of increasing specificity. Given that dual reading adds less than 5% to programme costs, we conclude that the marginal cost of dual reading is justified in terms of its benefit to our screening programme.
Acknowledgements
We would like to thank the expert radiologists responsible for dual reading as well as all radiologists participating in the screening programme. We are also grateful to Dr Jean-Pierre Giordanella and Dr Rémy Didelot of the Doria Centre d’Examens de Santé as well as the entire staff of ARCADES for their important contribution to the breast cancer screening programme in the Bouches-du-Rhône. Finally, our thanks go to Professor Lucien Piana, President of ARCADES, for his valuable comments on previous versions of this manuscript.
References
1. Baines C J, McFarlane D V, Miller A B. The role of the reference radiologist. Estimates of inter-observer agreement and potential delay in cancer detection in the national breast screening study. Investigative Radiol 1990; 25: 971-976.
2. Ciccone G, Vineis P, Frigerio A, Segnan N. Inter-observer and intra-observer variability of mammogram interpretation: a field study. Eur J Cancer 1992; 28A(6-7): 1054-1058.
3. Elmore J G, Wells C K, Lee C H, Howard D H, Feinstein A R. Variability in radiologists’ interpretations of mammograms. N Engl J Med 1994; 331: 1493-1499.
4. Kopans D B. The accuracy of mammographic interpretation (editorial). N Engl J Med 1994; 331: 1521-1522.
5. Anderson E D C, Muir B B, Walsh J S, Kirkpatrick A E. The efficacy of double reading mammograms in breast screening. Clinical Radiology 1994; 49: 248-251.
6. Anttinen I, Pamilo M, Soiva M, Roiha M. Double reading of mammography screening films - one radiologist or two? Clinical Radiology 1993; 48: 414-421.
7. Thurfjell E L, Lernevall K A, Taube A A. Benefit of independent double reading in a population-based mammography screening program. Radiol 1994; 191: 241-244.
8. Thurfjell E. Mammography screening. One versus two views and independent double reading. Acta Radiol 1994; 35: 345-350.
9. Bird R E. Professional quality assurance for mammography screening programmes. Radiol 1990; 177: 587.
10. Lancry P J, Wait S. Evaluation économique des dépistages de masse du cancer du sein. A propos de cinq programmes expérimentaux français. (Economic analysis of breast cancer screening: an analysis of 5 French pilot projects). Paris: CREDES Report No 997, 1993.
11. Feinstein A R, Cicchetti D V. High agreement but low kappa. I. The problems of two paradoxes. II. Resolving the paradoxes. J Clin Epidemiol 1990; 43: 543-558.
12. Lidbrink E, Elfving J, Frisell J, Jonsson E. Neglected aspects of false-positive findings in mammography screening. Abstract. Fourth International Cambridge Conference on Breast Cancer Screening. Cambridge, 1995.
13. Lancry P J, Fagnani F. Evaluation économique du dépistage systématique des cancers du sein et du col de l’utérus. Paris: COMAC-HSR, Paris.
14. van Ineveld B M, van Oortmarssen G J, de Koning H J, Boer R, van der Maas P J. How cost-effective is breast cancer screening in different EC countries? Eur J Cancer 1993; 29A(12): 1663-1668.
15. Woodman C B J, Threlfall A G, Boggis C R M, Prior P. Is the three year breast screening interval too long? Occurrence of interval cancers in NHS screening programme’s north western region. BMJ 1995; 310: 224-226.