Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: findings of a systematic review


The Breast (2001) 10, 455–463 © 2001 Harcourt Publishers Ltd doi:10.1054/brst.2001.0350, available online at http://www.idealibrary.com

REVIEW

J. Dinnes,1,2 S. Moss,3 J. Melia,3 R. Blanks,3 F. Song1,4 and J. Kleijnen1

1NHS Centre for Reviews and Dissemination, University of York, UK; 2Wessex Institute for Health Research and Development, University of Southampton, UK; 3Cancer Screening Evaluation Unit, Institute of Cancer Research, Surrey, UK; 4Department of Public Health and Epidemiology, University of Birmingham, UK

Address correspondence to: J. Dinnes, Senior Research Fellow, Wessex Institute for Health Research and Development, University of Southampton, Mailpoint 728, Boldrewood, Southampton SO16 7PX, UK. Tel.: +44 (0)23 8059 5631; Fax: +44 (0)23 8059 5639; E-mail: [email protected] Received: 6 February 2001 Revised: 7 May 2001 Accepted: 16 May 2001 Published online: 22 August 2001

SUMMARY. There is a lack of direct evidence on the effectiveness of double reading of breast screening mammograms within the context of national screening programmes, even though about half of the countries that use mammography screening have implemented double reading. A systematic review was conducted to compare double reading with single reading of mammograms for screening accuracy, patient outcomes and costs. We searched an extensive range of electronic databases, scanned the bibliographies of studies and contacted experts. Data extraction and quality assessment were undertaken independently by two reviewers. Estimates of diagnostic accuracy were calculated for those studies with follow-up to identify interval cancers. Only 10 cohort studies met the inclusion criteria and reported extractable data on the effectiveness of double compared to single reading. The mix of methodologies meant that few conclusions could be drawn about the effect of double reading independent of number of views, or effects on size and type of tumours detected. Overall, double reading increases the cancer detection rate by 3–11 per 10,000 women screened and has a variable impact on recall rates depending on the recall policy used. The benefit could be mainly in the detection of small cancers, and could be greatest where two readers have different strengths and weaknesses, or where readers are less experienced. Double reading can improve accuracy as compared with single reading. In particular, double reading by consensus or arbitration achieves an increase in cancer detection rate together with a reduction in the rate of women recalled for assessment. Further research should quantify the relative benefit from double reading according to recall policy and number of mammographic views, and estimate the impact on patient outcome. © 2001 Harcourt Publishers Ltd

INTRODUCTION

Early trials of breast cancer screening indicated a significant reduction (25–30%) in breast cancer mortality,1,2 leading to the introduction of national breast screening programmes in a number of Western countries. Although the specifics of each individual programme vary, all are under pressure to achieve target cancer detection rates and to maintain standards in terms of population coverage. A survey conducted by the International Breast Cancer Screening Network (IBCSN) that assessed the breast cancer screening programmes in 21 countries found that about half had implemented independent double reading of mammograms.3

Most attempts to improve screening programmes focus either on increasing the uptake and coverage of screening or on improving the screening process itself. There is evidence that cancer detection rates in the UK have improved in recent years.4 This is likely to be due to a variety of factors, including changes in film density, increasing expertise in mammogram reading over time, and the introduction in 1995 of two-view mammography at all prevalent screens. The latter decision was made on the basis of evidence from a randomized controlled trial (RCT) showing that the use of two-view mammography reduces recall and increases cancer detection at prevalent screens.5 A further decision that two views should be performed at all screens has recently been announced.6

A similar trial of double reading has not yet been conducted. However, in the UK some form of double reading [unilateral recall, recall by consensus or recall by arbitration (see Box 1)] has been used by about 88% (72/82) of breast screening units.7 The impact of double reading on recall and cancer detection is likely to depend on both the number of mammographic views and the recall policy used. Unilateral recall will increase the recall rate compared with that of a single reader, as the second reader will recall mammograms passed as normal by the first. Recall by consensus or arbitration can either increase or decrease recall and cancer detection depending on the proportion of mammograms requiring consensus or arbitration that are subsequently returned to routine recall.

This paper summarizes the results of a systematic review of the effect of double reading of mammograms in terms of screening accuracy, patient outcomes (from longer-term follow-up and possible adverse effects of screening) and costs.
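The three recall policies named above (and defined in Box 1) differ only in what happens when the two readers disagree. The following is a minimal illustrative sketch of that logic, not a description of any screening unit's procedure. The names `Policy`, `recall_decision` and the `resolve` callback are hypothetical; `resolve` stands in for the consensus discussion or the arbitrating third reader.

```python
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    UNILATERAL = "unilateral"
    CONSENSUS = "consensus"
    ARBITRATION = "arbitration"


def recall_decision(
    reader_a: bool,
    reader_b: bool,
    policy: Policy,
    resolve: Optional[Callable[[bool, bool], bool]] = None,
) -> bool:
    """Return True if the woman is recalled for assessment.

    reader_a / reader_b: each reader's own recall recommendation.
    resolve: hypothetical callback representing the consensus discussion
             or the third reader; it is consulted only on disagreement.
    """
    if reader_a == reader_b:
        # Readers agree, so the policy makes no difference.
        return reader_a
    if policy is Policy.UNILATERAL:
        # Either reader recommending recall is sufficient.
        return True
    if resolve is None:
        raise ValueError("consensus/arbitration requires a resolve callback")
    return resolve(reader_a, reader_b)
```

Under this sketch, unilateral recall can never recall fewer women than either reader alone, whereas consensus or arbitration can return discordant mammograms to routine recall, which is why the direction of the effect on recall depends on the policy.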

METHODS

The review was carried out between April 1991 and July 1999. A broad literature search was conducted using an extensive range of electronic databases (see Appendix for details). For inclusion in the review, studies had to be RCTs of single versus double reading, population-based cohort studies of breast screening programmes, or case-control studies of single versus double reading. Mammogram reading had to be undertaken by two people; studies including only automated methods of mammogram reading were excluded. At the very minimum, studies had to present sufficient data to enable recall rate and cancer detection rate (see Box 1) to be extracted or calculated for single reading and double reading of the same sample of mammograms. Additional studies that provided only summary estimates of test accuracy (including case-control studies) are presented in the full report.8 Papers were screened for inclusion independently by two reviewers (JD and JM), and any disagreements resolved by consensus.

Outcomes assessed

No studies were identified that compared the impact of screening using double and single reading on breast cancer mortality. Comparison of screening sensitivity requires complete data on interval cancers and would ideally exclude ‘true interval’ cancers not detectable on the screening mammogram. However, epidemiological studies that estimate screening sensitivity generally include all interval cancers. In practice, only limited data on interval cancers were identified in this review, and these did not permit further sub-classification. Our comparisons therefore focused largely on the impact of double reading on recall and cancer detection rates as estimated from cohort studies.

Data extraction and quality assessment

Relevant study data were extracted and an assessment of study quality was undertaken independently by two reviewers (JD and JM). Any discrepancies were resolved by discussion or consultation with one or more additional reviewers (JK or SM).

Data analysis

The impact of double reading on the number of women recalled and number of cancers detected per 10,000 women screened was calculated, and studied according to potential modifying factors such as number of mammographic views, screening round, age range and the qualifications and experience of the reader of the mammogram. The sensitivity and specificity of single and double reading were estimated only for those studies with follow-up to identify interval cancers. Because of the heterogeneity of the included studies with respect to the factors described above, formal statistical pooling of the data was not performed. However, some crude comparisons of data on cancer detection from studies using either single or two-view screening were undertaken. Other statistical estimates of test accuracy (positive and negative predictive value, likelihood ratios and the diagnostic odds ratio) are available in the full report and were not felt to add to the interpretation of the available data in this article.8
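The main summary measure described above, the change in women recalled or cancers detected per 10,000 women screened, is simple arithmetic. The sketch below shows one way to compute it. The function names and all counts are hypothetical and chosen only for illustration; they are not taken from any study in the review.

```python
def per_10000(count: int, screened: int) -> float:
    """Rate per 10,000 women screened."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 10_000 * count / screened


def change_per_10000(single: int, double: int, screened: int) -> float:
    """Double-reading rate minus single-reading rate, per 10,000 screened."""
    return per_10000(double, screened) - per_10000(single, screened)


# Hypothetical counts for the same sample of mammograms read both ways.
screened = 20_000
recall_change = change_per_10000(single=900, double=1_100, screened=screened)
cancer_change = change_per_10000(single=110, double=120, screened=screened)

print(f"recall: {recall_change:+.1f} per 10,000")
print(f"cancers detected: {cancer_change:+.1f} per 10,000")
```

A positive value means double reading recalled (or detected) more per 10,000 women screened than single reading; a negative recall value is what would be expected where consensus or arbitration returns discordant films to routine recall.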

RESULTS

Of the 165 retrieved papers, 10 cohort studies provided sufficient data for this evaluation.9–18 The authors of three studies supplied additional data not provided in the published papers.12–14 Table 1 provides an overview of pertinent study characteristics and quality criteria. The studies were all carried out within population-based national, regional or pilot screening programmes, with relatively large sample sizes, and included women in a similar age range. Seven studies were described by authors as prospective9–11,15–17,19 and three studies were retrospective.12,13,18 Programmes used either single- or two-view mammography or both, and double reading was performed with a range of recall policies. Most of the studies were conducted at least partially during prevalent round screening. No data were available for incident round screening only, so it was not possible to investigate the effect of the length of screening interval on cancer detection rates. In all studies, the readers were aware that double reading was being carried out, and in four studies either the second reader was not blinded to the report of the first reader or this was not specified.9,12,13,15 Three studies provided follow-up information regarding interval cancers; the length of follow-up ranged from 11 to 36 months, and in each case the information was obtained from national cancer registries. Only one of these studies distinguished between true interval cancers and false negative results.17

Table 1 Study characteristics and selected quality criteria

Reference (Country) | Sample size | Age range (years) | Recall policy | Screening round | Appropriate gold standard?* (FU period) | Blinded | No. of views | Mammogram readers
Seradour et al., 1997 (16), France | 95 967 | 50–69 | Unilateral | Prevalent/Incident | Partial (11 months) | Yes | Single | Radiols
Ciatto et al., 1995 (11), Italy | 18 817 | 50–70 | Unilateral | Prevalent/Incident | No | Yes | Single/Two | Radiols
Deans et al., 1998 (12), UK | 258 003 | 50–64 | Unilateral | Prevalent/Incident | No | No | Single/Two | Radiols
Anderson et al., 1994 (9), UK | 31 146 | 50–64 | Unilateral | Prevalent | No | No | Single | Radiols
Williams et al., 1995 (18), New Zealand | 5659 | 50–64 | Consensus | Prevalent | No | Yes | Two | Radiols
Anttinen et al., 1993 (10), Finland | 15 457 | 50–59 | Consensus | Prevalent | Partial (24 months) | Yes | Two | Radiols
Renaud et al., 1991 (15), France | 17 228 | 50–65 | Arbitration | Prevalent | No | NS | Single | Radiols
Pauli et al., 1996 (14), UK | 17 202 | 50–64 | Mixed | Prevalent/Incident | No† | Yes | Single/Two | Radiol & radiogs
Warren et al., 1995 (17), UK | 33 734 | 50–64 | Mixed | Prevalent | Yes (36 months) | Yes | Single/Two | Radiols
Leivo et al., 1999 (13), Finland | 95 243 | 50–64 | Mixed | Prevalent/Incident | No | No | Two | Radiols

The type of views (MDL, CAD or NS) and the scoring system (two to five categories, or NS) were also recorded for each study.
NS – not specified; MDL – mediolateral view (side to side); CAD – craniocaudal view (top to bottom).
*‘Yes’ indicates where true interval cancers were distinguished from false negative and occult cancers; ‘Partial’ indicates where all interval cancers were included; ‘No’ indicates no follow-up to identify interval cancers.
†18-month follow-up data were available, but not in a format suitable for use in the review.

RECALL RATES AND CANCER DETECTION RATES

Recall policy

Figures 1 and 2 show the recall and cancer detection rates respectively for single and double reading, grouped by the type of recall policy. As anticipated, double reading with unilateral recall increased the number of women recalled (by between 38 and 149 per 10,000 women screened), whereas consensus or arbitration policies, or a mix of the two, decreased recall (by between 61 and 269 per 10,000 women screened). In the study that used either unilateral (increasing recall) or consensus (decreasing recall) policies, there was an overall increase in recall rate.13 In every study, double reading increased the cancer detection rate (Fig. 2), with increases ranging from +2.9 to +11.2 (median +4.4) per 10,000 women screened. Insufficient evidence was available to detect any pattern in cancer detection according to recall policy.

Fig. 1 Recall rates for single and double reading. SR: single reading; DR: double reading.

Fig. 2 Cancer detection rates for single and double reading. SR: single reading; DR: double reading.

Results by cancer size (or stage) at detection

The five studies presenting some information on the size and/or stage of cancers detected suggest that double reading might identify higher proportions of small or early stage cancers (Table 2). In each study the cancers detected only by one of the radiologists included a larger proportion of small cancers (≤1 cm) or Stage 0 or 1 cancers compared with those detected by two readers.

Table 2 Results by cancer size or stage at detection

Author, year | Cancer size or stage | Cancers detected by both readers | Cancers detected by only one reader
Anderson et al., 1994 (9) | ≤1 cm | 24% (45/191) | 29% (6/21)
 | >1 or ≤2 cm | 40% (76/191) | 24% (5/21)
 | >2 cm | 21% (40/191) | 24% (5/21)
 | DCIS | 15% (29/191) | 24% (5/21)
 | Lymphoma | 0% (1/191) | 0% (0/21)
Seradour et al., 1997 (16) | ≤1 cm | 25% (127/506) | Field radiologist: 33% (18/54); Expert radiologist: 32% (25/78)
 | >1 or ≤2 cm | 31% (155/506) | Field radiologist: 28% (15/54); Expert radiologist: 27% (21/78)
 | >2 cm | 20% (99/506) | Field radiologist: 17% (9/54); Expert radiologist: 13% (10/78)
 | In situ | 10% (51/506) | Field radiologist: 15% (8/54); Expert radiologist: 9% (7/78)
 | Missing | 14% (74/506) | Field radiologist: 7% (4/54); Expert radiologist: 19% (15/78)
Warren et al., 1995 (17) | ≤1 cm | 14% (36/261) | 27% (9/33)
 | >1 or ≤2 cm | 40% (105/261) | 39% (13/33)
 | >2 cm | 26% (69/261) | 15% (5/33)
 | DCIS | 18% (46/261) | 18% (6/33)
 | Missing | 2% (5/261) | 0
Ciatto et al., 1995 (11) | Stage 0 to I | 71% (89/125) | 82% (9/11)
 | Stage II+ | 29% (36/125) | 18% (2/11)
Leivo et al., 1999 (13) | Stage 0 | 17% (31/181) | 40% (8/20)
 | Stage I | 48% (86/181) | 35% (7/20)
 | Stage II | 22% (39/181) | 10% (2/20)
 | Stage III | 11% (19/181) | 15% (3/20)
 | Stage IV | 2% (4/181) | 0% (0/20)

Incidence of interval cancers and impact on screening accuracy

Three of the four studies that provided follow-up data beyond the initial screening gave breakdowns for single and double reading.10,16,17 The number of interval cancers detected as a percentage of the total number of women screened increased with longer follow-up. One study also provided a breakdown of interval cancers by size; almost half (32/67) were >2 cm and more than one-third (26/67) were between 1 and 2 cm.17 The breakdown was not given according to whether the cancers were true-interval or false-negative. These data on interval cancers have also been used to demonstrate the difference in sensitivity and specificity between single and double reading (Table 3). Sensitivity increases in each case, whereas specificity decreases with unilateral recall16 and increases with consensus10 or mixed recall.17

Table 3 Sensitivity and specificity of double reading

Author, year | Reading | Sensitivity* | Specificity*
Seradour et al., 1997 (16) | Single reading | 440/569 = 77.3% | 92 939/95 398 = 97.4%
 | Double reading | 506/569 = 88.9% | 91 571/95 398 = 96.0%
 | Difference | +11.6% | −1.4%
Anttinen et al., 1993 (10) | Single reading | 63.5/70 = 90.7% | 14 943/15 387 = 97.1%
 | Double reading | 68/70 = 97.1% | 15 070/15 387 = 97.9%
 | Difference | +6.4% | +0.8%
Warren et al., 1995 (17)† | Single reading | 240/290 = 82.8% | 31 354/33 444 = 93.8%
 | Double reading | 270/290 = 93.1% | 32 291/33 444 = 96.6%
 | Difference | +10.3% | +2.8%

*Where blinded recommendations were available for more than one reader, the mean of the two readers was taken.
†True interval cancers distinguished from false negative and occult cancers.

Number of mammographic views

None of the identified studies allowed a direct comparison of the effect of double reading depending on whether single- or two-view mammography is used. Studies using only single-view mammograms demonstrated increases in detection ranging from 4.4 (16) to 6.9 (15) per 10,000, whereas for the three studies using only two-view mammograms, cancer detection was increased by between 3.0 and 4.4 per 10,000.10,13,18

Reader qualifications

Limited information is available on the effect of reader qualifications. In one study which teamed a radiographer with a radiologist,14 double reading recalled fewer women than the radiographer (median −153.8 per 10,000 women screened) but more than recommended by the radiologist (+20 per 10,000). Cancer detection increased for both comparisons, but by a larger differential for double reading compared to radiographer single reading. Similarly, in a study that employed field radiologists and expert radiologists,16 double reading resulted in larger increases in recall compared to the recommendations of the expert single reader (+209.7 per 10,000) than the field radiologist (+89.2 per 10,000). Again, double reading also had a greater impact on cancer detection compared to single reading by the field radiologists (8.1 extra cancers detected) than compared to single reading by the experts (5.6 extra cancers detected).

ECONOMIC EVALUATIONS

The use of double reading has an impact on costs as well as benefits. Additional costs will be incurred from reading the same mammograms twice (plus the additional time costs of consensus or arbitration), and as a result of changes in the recall rate (increasing or decreasing the number of women referred for further assessment). Benefits could be gained or lost through the effect on both the cancer detection rate (which may ultimately translate into a reduction in breast cancer mortality), and on the false positive rate, through the impact on patient morbidity.

Four high-quality economic evaluations, all based on cohort studies included in this review, were identified.13,16,20,21 Two UK studies produced estimates of the incremental cost per additional cancer detected by double reading with unilateral recall within a similar range, from £1162 to £2221 (excluding patient costs).20,21 A similar estimate was identified in a study conducted in France (FF 21,838). Underlying these figures, however, are differences in both the recall and cancer detection rates and in the additional screening and assessment costs used. The analysis of Leivo et al. of a mixed recall policy (unilateral and consensus) produced an estimate of US$25,523 per extra cancer detected.13 The latter result is significantly higher than the other estimates, mainly because of differences in the costs included. In one of the UK studies double reading with consensus recall was found to be both more effective and less costly than a single reading policy.21 Difficulties in transferring the results of cost-effectiveness analyses from one setting to another, and particularly from one country to another where there can be significant variation in the organization of services, complicate the interpretation of these variations.

DISCUSSION

The findings of this review suggest that double reading results in the detection of additional cancers compared to single reading, and has a variable impact on recall rates depending on the policy used. Double reading with consensus reduces recall rates and increases specificity, whereas unilateral recall increases recall rates. However, the difference between the effect on cancer detection rates cannot be quantified from these results. The additional benefit of double reading could be mainly in the detection of small cancers. It has not been possible to identify the specific influences of the number of views used or the impact of the screening round(s) in which the studies were conducted. Although somewhat limited, there is evidence to suggest that the impact of double reading compared with single reading could also be greater where the single reader is less experienced. This issue is complicated by the possibility that the same reader can have different

recall rates depending on the identity of the other reader and whether he/she is single reading, double reading with unilateral recall, or double reading with consensus. It was not possible to evaluate this because in every case the reader was aware of the double reading protocol. Perhaps the most important influence on the effectiveness of double reading is the use of single- or two-view mammography. Double reading is likely to be more effective where single-view mammography is used, and there is some evidence to support this from this review; however, the data are insufficient to quantify the difference. There is indirect evidence that double reading may be cost-effective even where two-view mammography is practised. An analysis by Johnston and Brown,22 based on an indirect comparison of reading practices at different breast screening units in the UK, estimated the effect of introducing an additional view and/or double reading from a baseline of single view with single reading (not a commonly practised policy). The authors concluded that two-view mammography could be more cost-effective than single view, but that priority should be given to introducing double reading at those units currently practising single-view mammography and single reading.22 The original study on which these calculations were based was not included in the present review because it was a comparison of programmes that had selected to perform single or double reading and one or two views. However, the conclusions were broadly in agreement with those from this review. Unfortunately we found no studies reporting on the impact of double reading on patient outcomes in terms of morbidity or mortality. However, an increase in recall is likely to increase the number of women undergoing unnecessary assessment.
This has both cost implications and a potential impact on physical and psychological morbidity, although it may be that the psychological impact of a false positive result is transient and short-lived. A further consequence of using double reading is the delay in delivery of screening results. However, there is some (limited) evidence that both women23 and physicians24 prefer the delayed results from double reading. Cost-effectiveness estimates have been produced which lie within the range of what may be considered to be ‘cost-effective’25,26 and furthermore, consensus recall may even be cost-saving compared to single reading alone, recalling fewer women and detecting more cancers.21 We were unable to identify any RCTs of single versus double reading. Although it has been possible, to some extent, to extract the relevant data from the identified cohort studies, the lack of prospective design leaves several methodological issues outstanding, in particular
the use of an appropriate gold standard. No study had sufficient follow-up to allow any estimate of the impact on mortality, and few studies provided information on interval cancers, such that it has not been possible to evaluate fully the impact of double reading on the incidence of true interval cancers. Further research is required to quantify the relative benefit from double reading according to recall policy and number of mammographic views and to estimate the impact on patient outcomes. Whereas an RCT might be considered the gold standard for such an evaluation, any such trial would need to be extremely large to show increases in cancer detection of the order suggested by this review. An economical approach would be to randomize only those mammograms where single and double reading recall decisions differed; but it is likely to be difficult not to recall women when this is recommended by double reading. Further cohort studies might confirm the results from the studies reviewed here and decrease the influence of potential publication bias. Alternatively, a controlled observational study using programmes currently carrying out single reading could be conducted. The crucial question that remains to be answered by further research is the impact of double reading on mortality, morbidity and quality of life. In conclusion, the available evidence suggests that double reading of mammograms can be cost-effective, improving sensitivity by detecting more breast cancers, and either increasing or decreasing specificity, as compared to single reading. In particular, consensus double reading has the potential to reduce the number of women recalled for assessment and should be the preferred protocol where double reading is employed. 
However, further research is required on the effectiveness and cost-effectiveness of double reading in relation to the number of mammographic views used, and the qualifications and experience of mammogram readers, from which to recommend changes to current screening programmes. The latter points are of particular relevance in the UK given the planned introduction of two views at all screens and possible changes in the workforce used to read and report mammograms.

Acknowledgements

This review was commissioned by the UK Department of Health R&D Division. We would like to thank Lisa Mather and Christine Ellis for assisting with the literature search and Drs Regina Pauli (Roehampton Institute, UK), Hilary Deans (Scottish Breast Screening Programme, UK) and Tiina Leivo (University of Helsinki, Finland) for the provision of additional data from their studies.


BOX 1. GLOSSARY

Unilateral recall: where a woman is recalled if either reader recommends it, without further consultation.

Consensus recall: where the two radiologists disagree on a recall recommendation, they will confer in order to reach agreement.

Recall by arbitration: where the two radiologists disagree on a recall recommendation, a third reader will usually meet with the original two readers in order to reach agreement.

Recall rate: the proportion of women who are recalled for further assessment, e.g. if 600 women are recalled for further investigation after screening 10,000 women, the recall rate is 6%.

Cancer detection rate: the number of breast cancers detected divided by the total number of women screened. For example, if 60 cancers are detected after screening 10,000 women, the cancer detection rate is 6/1000.

Interval cancers: cancers detected in the screening interval following a negative screen. Can be classified as: ‘true interval’ where the tumour was not visible on the original screening mammogram; ‘false-negative’ where in retrospect the tumour was detectable; and ‘occult’ where even at diagnosis the tumour cannot be detected mammographically.

Prevalent round: first round of screening to be undertaken; large pool of prevalent undiagnosed disease.

Incident round: subsequent screening rounds; current UK national policy is for 3-year intervals between screening rounds.

Sensitivity: the proportion of those with breast cancer who are recalled for assessment. For example, if there are a total of 100 women with breast cancer in all women screened, and 95 of these 100 women with breast cancer are recalled, then the sensitivity is 95%.

Specificity: the proportion of those without breast cancer who are not recalled for further assessment. For example, if there are a total of 10,000 women without breast cancer, and 9800 of these women are not recalled for further investigation, then the specificity is 98%.

Gold standard: a method, procedure or measurement that is widely conceived to be the best available, against which the interventions of interest should be compared. In this review, the appropriate gold standard can be provided only by follow-up to the next screening round with true interval or occult cancers excluded.
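The sensitivity and specificity definitions in Box 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the worked numbers from Box 1 together with the Seradour et al. counts reported in Table 3.

```python
def sensitivity(recalled_with_cancer: float, total_with_cancer: float) -> float:
    """Proportion of women with breast cancer who are recalled."""
    return recalled_with_cancer / total_with_cancer


def specificity(not_recalled_without_cancer: float,
                total_without_cancer: float) -> float:
    """Proportion of women without breast cancer who are not recalled."""
    return not_recalled_without_cancer / total_without_cancer


# Worked numbers from Box 1.
box_sensitivity = sensitivity(95, 100)        # 0.95
box_specificity = specificity(9_800, 10_000)  # 0.98

# Seradour et al. (Table 3): single versus double reading.
sens_diff = sensitivity(506, 569) - sensitivity(440, 569)
spec_diff = specificity(91_571, 95_398) - specificity(92_939, 95_398)

print(f"sensitivity difference: {100 * sens_diff:+.1f} percentage points")
print(f"specificity difference: {100 * spec_diff:+.1f} percentage points")
```

With these counts the sensitivity difference is positive and the specificity difference is negative, the pattern reported for unilateral recall in Table 3.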

References

1. Tabar L, Fagerberg C J, Gad A et al. Reduction in mortality from breast cancer after mass screening with mammography. Randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet 1985; 1: 829–832.
2. Shapiro S, Venet W, Strax P et al. Ten to fourteen year effect of screening on breast cancer mortality. J Natl Cancer Inst 1982; 69: 349–355.
3. Ballard-Barbash R, Klabunde C, Paci E et al. Breast cancer screening in 21 countries: delivery of services, notification of results and outcomes ascertainment. Eur J Cancer Prev 1999; 8: 417–426.
4. Blanks R G, Moss S M. Breast cancer screening sensitivity in the NHSBSP: recent results and implications. Breast 1999; 8(6): 301–302.
5. Wald N J, Murphy P, Major P et al. UKCCCR multicentre randomised controlled trial of one and two view mammography in breast cancer screening. BMJ 1995; 311: 1189–1193.
6. Department of Health. The NHS Cancer Plan. London: Department of Health, 2000.
7. Gerard K, Brown J, Johnston K. UK breast screening programme: how does it reflect the Forrest recommendations? J Med Screen 1997; 4(1): 10–15.
8. NHS Centre for Reviews and Dissemination. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening – a systematic review. York: University of York; in press.
9. Anderson E D, Muir B B, Walsh J S et al. The efficacy of double reading mammograms in breast screening. Clin Radiol 1994; 49(4): 248–255.
10. Anttinen I, Pamilo M, Sovia M et al. Double reading of mammography screening films – one radiologist or two? Clin Radiol 1993; 48(6): 414–422.
11. Ciatto S, Del Turco M R, Morrone D et al. Independent double reading of screening mammograms. J Med Screen 1995; 2(2): 99–101.
12. Deans H E, Everington D, Cordiner C et al. Scottish experience of double reading in the National Breast Screening Programme. Breast 1998; 7(2): 75–79.
13. Leivo T, Salminen T, Sintonen H et al. Incremental cost-effectiveness of double reading mammograms. Breast Canc Res Treat 1999; 54: 261–267.
14. Pauli R, Hammond S, Cook J et al. Radiographers as film readers in screening mammography: an assessment of competence under tests and screening conditions. Br J Radiol 1996; 69: 10–14.
15. Renaud R, Schaffer P, Gairard B et al. Principles and first results of the European program of breast cancer screening in the Bas-Rhin. Bull Acad Natl Med 1991; 175(1): 129–147.
16. Seradour B, Wait S, Jacquemier J et al. Modalities of reading of detection mammographies of the programme in the Bouches-du-Rhône. Results and costs 1990–1995. J Radiol 1997; 78(1): 49–54.
17. Warren R M, Duffy S W. Comparison of single reading with double reading of mammograms, and change in effectiveness with experience. Br J Radiol 1995; 68(813): 958–962.
18. Williams S M, Doyle T C A, Chartres S et al. Impact of independent double reading of mammograms from the inception of a population-based breast cancer screening programme. Breast 1995; 4(4): 282–288.
19. Pauli R, Hammond S, Cooke J et al. Comparison of radiographer/radiologist double film reading with single reading in breast cancer screening. J Med Screen 1996; 3(1): 18–22.
20. Cairns J, Van der Pol M. Cost-effectiveness of non-consensus double reading. Breast 1998; 7(5): 243–246.
21. Brown J, Bryan S, Warren R. Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ 1996; 312: 809–812.
22. Johnston K, Brown J. Two view mammography at incident screens: cost effectiveness analysis of policy options. BMJ 1999; 319: 1097–1102.
23. Hulka C A, Slanetz P J, Halpern E F et al. Patients’ opinion of mammography screening services: immediate results versus delayed results due to interpretation by two observers. AJR Am J Roentgenol 1997; 168(4): 1085–1089.
24. Slanetz P J, Moore R H, Hulka C A et al. Physicians’ opinions on the delivery of mammographic screening services: immediate interpretation versus double reading. AJR Am J Roentgenol 1996; 167(2): 377–379.
25. Laupacis A, Feeny D, Detsky A S et al. How attractive does a new technology have to be to warrant adoption and utilization? Tentative guidelines for using clinical and economic evaluation. Can Medical Assoc J 1992; 146: 473–481.
26. Drummond M, Mason J, Torrance G. Cost-effectiveness league tables: think of the fans. Health Policy 1995; 31: 231–238.

APPENDIX: DETAILS OF LITERATURE SEARCH Databases included MEDLINE, CINAHL, DHSS, BIOSIS, Embase, BIDS, CancerLit, NHS EED, CCTR, Dissertation Abstracts, PASCAL, Conference Papers Index, SIGLE, Health Star and EconLit. The strategy was designed using text words, such as ‘(double or dual) near reading’ combined with ‘breast or mammogra*’, as it was thought that the papers would not be well indexed. Two reviewers screened the search results (JD and FS or JK). The selected studies were retrieved and their reference lists scanned for additional relevant publications, supplemented by contact with experts to identify any further literature.