ARTICLE IN PRESS The Breast (2006) 15, 503–509
THE BREAST www.elsevier.com/locate/breast
ORIGINAL ARTICLE
Audit of negative assessments in a breast-screening programme in women who later develop breast cancer—implications for survival
P.C. Allgood (a), S.W. Duffy (b,*), R. Warren (c), G. Hunnam (d)
(a) QA Reference Centre for Cancer Screening for Eastern Region, Wolfson Institute of Preventive Medicine, Charterhouse Square, London EC1M 6BQ, UK
(b) Cancer Research UK, Department of Epidemiology, Mathematics and Statistics, Wolfson Institute of Preventive Medicine, Charterhouse Square, London EC1M 6BQ, UK
(c) Addenbrooke's Hospital, Cambridge
(d) King's Lynn, QA radiologist for Eastern Region
Received 13 July 2005; received in revised form 26 September 2005; accepted 1 October 2005
KEYWORDS Breast cancer; Screening; Mortality; Fatality; Core biopsy; Guidelines
Summary
The aim of this study was to examine observed short-term survival, to estimate future survival, and to assess the impact on survival of amending procedures to avoid false negatives in women recalled for further assessment because of a suspicious screening mammogram. Between the start of screening in the seven centres in the East Anglian region on 1 April 1989 and 31 December 1999, 503 493 women from a total population of 2.2 million were screened, 25 346 were recalled for assessment and 3689 were diagnosed with breast cancer. Of the 21 657 women given a negative result at these assessments, 193 women were subsequently diagnosed with 194 breast tumours at the site previously assessed. These women were followed up for survival, with survival analysis adjusting for host and tumour attributes. We also predicted long-term survival using the pathological features of the tumours diagnosed. From previous estimates of tumour progression rates, we estimated the reduction in incidence of advanced tumours and the potential saving of lives had unsatisfactory assessments been carried out within guidelines. There were 17 deaths, 15 of them in women who had unsatisfactory assessments. Five-year survival was estimated at 93% (95% CI: 88–97%) for deaths from breast cancer and 91% (95% CI: 86–95%) for deaths from all causes. Women with positive nodes and/or larger tumours had significantly worse survival. Predicted 20-year survival for women with unsatisfactory assessments was 66% (35 deaths), and we predicted a potential saving of 7–9 lives (a 14–18% reduction in expected fatality within this special tumour population) had the original assessments been carried out within current guidelines. This retrospective audit of a small and special tumour population shows a potential reduction in breast cancer deaths of 18%, had current guidelines been available for the original assessments. Increased use of percutaneous biopsy in recent years should address the problem.
© 2005 Elsevier Ltd. All rights reserved.

*Corresponding author. Tel.: +44 20 7014 0252; fax: +44 20 7014 0258.
E-mail address: [email protected] (S.W. Duffy).
0960-9776/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.breast.2005.10.002
Introduction
In screening for disease, some false negative tests are inevitable. The likelihood of false negative tests is taken into account in the public health policy decision to implement a screening programme. Nevertheless, a false negative may have important consequences for the individual concerned. In women who have a suspicious mammographic finding and are recalled for further assessment, it is possible in principle to avoid false negative results almost completely. While audit of the original screening radiology is often practised,1–3 audit of the assessment process for suspicious screens is less common.4 In our companion paper, we summarised the results of an audit of cancers occurring in women previously assessed with negative findings.5 The major shortcomings in assessment practice identified in the audit were failure to perform percutaneous biopsy and suboptimal percutaneous biopsy practice, such as failure to repeat inadequate biopsies. It is likely that practice has improved considerably since the period studied in this audit, since practice was observed to improve systematically with time during the audit period. In addition, the purpose of the audit was non-judgemental: the main aim was to identify the cases likely to have been missed at assessment, the probable reasons for this, and the consequent targets for monitoring, training and practice. The latter were consistent with the 2001 guidelines for assessment.6 Here we examine observed short-term survival, estimate future survival using the attributes of the tumours diagnosed, and estimate the likely consequences of amending procedures to avoid false negatives at assessment. These estimates may be helpful in reinforcing educational messages for our own screening programme and others. They also give an estimate of the benefit of good assessment procedures.
Methods
The details of the audit of cancers diagnosed in women with a previous negative assessment are described in our companion paper.5 Assessments were judged to be satisfactory or not by comparing the procedures carried out at the original assessment with those outlined in the 2001 guidelines.6 These include specific recommendations with respect to further mammography, clinical examination, ultrasound, percutaneous biopsy and surgical biopsy. Further details are given in our companion paper.5 Between the start of screening in the seven centres in the East Anglian region on 1 April 1989 and 31 December 1999, 503 493 women from a total population of 2.2 million were screened, 25 346 were recalled for assessment and 3689 were diagnosed with breast cancer. Of the 21 657 women given a negative result at these assessments, 193 women were subsequently diagnosed with 194 breast tumours at the site previously assessed. Of these, 139 assessments were judged to be unsatisfactory. The original assessments took place between the implementation of screening in the East Anglian region at the beginning of April 1989 and a last assessment date of 31 December 1999. Tumours were diagnosed up to 31 December 2000. Follow-up for survival was available to 1 December 2003. The duration of follow-up after diagnosis ranged from 6.4 months to 13 years 9 months. Women still alive on 1 December 2003 were treated as censored at that date in the survival analysis. Survival was estimated using the Kaplan–Meier method, and the effects of tumour and host attributes on survival were assessed using Cox regression.7 We estimated long-term survival based on the size and node status of the tumours diagnosed, using the results of a survival analysis of 2468 tumours diagnosed during a trial of breast screening in Sweden.8–10 For two-sample comparisons of continuous quantities, the Wilcoxon rank sum test was used.
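The Kaplan–Meier (product-limit) estimate mentioned above multiplies, at each distinct death time, the conditional probability of surviving past that time among the women still at risk; women censored at a given time (for example, alive on 1 December 2003) count as at risk at that time and then leave the risk set. The following is a minimal pure-Python sketch under those conventions. The function names and the six-woman dataset are hypothetical illustrations, not the audit data or the software used in the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each woman (e.g. years from diagnosis)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Censored observations tied at t are still counted as at risk here.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: estimated survival at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy data: follow-up in years; 1 = died, 0 = censored (alive at end of follow-up).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t} y: S(t) = {s:.3f}")
print(f"S(2.5) = {survival_at(curve, 2.5):.3f}")
```

For real analyses an established survival package would normally be used instead (for example, `KaplanMeierFitter` in the Python lifelines library), which also supplies confidence intervals such as those quoted for 5-year survival.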
In addition, we used estimated rates of progression of disease with respect to tumour size and node status from the same Swedish trial to estimate the distribution of size and node status which would have been observed in the invasive cases if the tumours had been detected at the original assessment, based on the results of Markov process models.10,11 Details of the estimation are given in Appendix A. From these results we further estimated the predicted survival which
would accrue from improving the assessment process to eliminate a proportion of the false negatives at assessment. Our procedures can therefore be summarised as follows:
(1) Take observed long-term survival by tumour size and node status from an external source.
(2) Apply (1) to the observed size and node status distributions in our tumour population, to predict long-term survival.
(3) Identify a putative set of cases likely to have been missed at assessment, for which improved assessment procedures might have been expected to result in detection of the case.
(4) Using external estimates of tumour progression with respect to size and node status, estimate the likely distribution of these factors if the tumours had been diagnosed at the original assessment.
(5) Predict the long-term survival from (4) and compare it with that predicted from the actual size and node status distribution.
The third step requires a judgement as to which cases were missed because of suboptimal assessment, and which would have been diagnosed with better assessment procedures. Clearly, cases for which the assessment was judged satisfactory in the audit do not fall into this category. Cases which might have been detected with better assessment procedures must come largely or entirely from those where the original assessment was judged to be unsatisfactory. We therefore estimated the predicted benefit of improved assessment under two assumptions: first, that all of the tumours arising after assessments
judged to be unsatisfactory would have been detected at the assessment with improved procedures; and second, that only those tumours from this group which were ≥20 mm in size at subsequent presentation would have been detected with improved assessment.

Results
Table 1 shows numbers of tumours by size and node status, ages at original assessment and at diagnosis, deaths from breast cancer and deaths from other causes, according to whether the original assessment was retrospectively judged to be satisfactory under the 2001 guidelines6 (referred to hereafter as 'assessment quality' for brevity). Overall, there were more deaths in women with unsatisfactory assessments. Age at assessment and age at diagnosis did not vary substantially by assessment quality (satisfactory or unsatisfactory). In the 194 tumours as a whole, 5-year survival from breast cancer was estimated as 93% (95% CI: 88–97%). For deaths from all causes, the figure was 91% (95% CI: 86–95%). Because of the small numbers of deaths, we use deaths from all causes in the analyses below. Table 2 shows the results of Cox regression, giving hazard ratios for tumour size, node status and assessment quality. The effect of assessment quality was not statistically significant, but was suggestive of better survival for women who had satisfactory assessments. The effects of size and node status on survival were in the expected direction. Women with positive node status had significantly poorer survival than women with no nodal involvement, and although the association between survival and dichotomised tumour size was not significant, there was a significant trend of increasing hazard ratio with increasing tumour size in millimetres, after adjustment for the other factors (P = 0.02). Figure 1 shows survival by size and node status. Table 3 shows the 20-year survival rates by size and node status, estimated from the Swedish study data. Applying these rates to the invasive cases in Table 1, we predict that 20-year survival in our invasive cases overall will be 66% (50 breast cancer deaths). The corresponding figures for those with satisfactory and unsatisfactory assessments are 67% (15 deaths) and 66% (35 deaths), respectively.

Table 1 Tumour attributes, ages at assessment and diagnosis, and numbers of deaths, by assessment quality.
Quantity | Satisfactory | Unsatisfactory
Tumours | 55 | 139
DCIS, n (%) | 9 (17) | 34 (25)
Invasive <20 mm, node negative, n (%) | 19 (35) | 48 (35)
Invasive <20 mm, node positive, n (%) | 5 (9) | 14 (10)
Invasive ≥20 mm, node negative, n (%) | 15 (28) | 22 (16)
Invasive ≥20 mm, node positive, n (%) | 6 (11) | 20 (14)
Average age at diagnosis (years) | 60.2 | 60.6
Average age at assessment (years) | 57.3 | 57.8
Average time from assessment to diagnosis (years) | 2.9 | 2.8
Breast cancer deaths | 2 | 12
Other cause deaths | 0 | 3
All cause deaths | 2 | 15

Table 2 Effects on survival time until death from any cause: Cox's proportional hazards model.
Factor | Category | Crude HR (95% CI) | Adjusted HR (95% CI) (a) | Adjusted P-value
Age at assessment (mean age/month) | Continuous | 1.07 (0.99–1.16) | 1.08 (0.99–1.8) | 0.08
Correct (assessment satisfactory) | No | Reference | Reference | 0.17
 | Yes | 0.36 (0.82–1.57) | 0.37 (0.08–1.74) |
Time of assessment | 1989–1993 | Reference | Reference | 0.86
 | 1993–1996 | 1.07 (0.32–3.58) | 1.43 (0.37–5.60) |
 | 1996–1999 | 1.11 (0.32–3.89) | 1.13 (0.27–4.80) |
Size of tumour | <20 mm | Reference | Reference | 0.10
 | ≥20 mm | 2.36 (0.86–6.51) | 2.57 (0.8–8.26) |
Node status of tumour | Node negative | Reference | Reference | 0.04
 | Node positive | 3.94 (1.44–10.74) | 3.29 (1.03–10.48) |
Grade of tumour | Grade 1+2 | Reference | Reference | 0.22
 | Grade 3 | 3.47 (1.24–9.77) | 2.13 (0.66–6.91) |
(a) Invasive tumours only, adjusted for all the other variables in the model.

[Figure 1 Kaplan–Meier all-cause survival curves for invasive tumours by size and nodal status (N = 149). Panel title: Kaplan-Meier survival estimates, by size and nodes; curves: node −ve <20 mm, node +ve <20 mm, node −ve ≥20 mm, node +ve ≥20 mm; x-axis: analysis time (years).]

Table 3 Twenty-year survival by size and node status observed in 2316 invasive cancers in the Swedish Two-County Trial.
Size and node status | 20-year survival (%) | 95% CI
Invasive <20 mm and node negative | 86 | 82–89
Invasive <20 mm and node positive | 62 | 54–70
Invasive ≥20 mm and node negative | 60 | 52–67
Invasive ≥20 mm and node positive | 29 | 23–36
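Step (2) of the procedure, applying the Table 3 rates to the invasive tumours in Table 1, is simple arithmetic: each size/node category contributes n × (1 − 20-year survival) expected deaths. The sketch below (dictionary and function names are hypothetical) reproduces the figures quoted in the text, assuming, as the analysis does, that the Swedish 20-year survival rates transfer unchanged to this tumour population.

```python
# 20-year survival proportions by (size, node status), from Table 3.
SURVIVAL_20Y = {
    ("<20 mm", "node negative"): 0.86,
    ("<20 mm", "node positive"): 0.62,
    (">=20 mm", "node negative"): 0.60,
    (">=20 mm", "node positive"): 0.29,
}

# Invasive tumour counts by assessment quality, from Table 1.
SATISFACTORY = {
    ("<20 mm", "node negative"): 19,
    ("<20 mm", "node positive"): 5,
    (">=20 mm", "node negative"): 15,
    (">=20 mm", "node positive"): 6,
}
UNSATISFACTORY = {
    ("<20 mm", "node negative"): 48,
    ("<20 mm", "node positive"): 14,
    (">=20 mm", "node negative"): 22,
    (">=20 mm", "node positive"): 20,
}


def expected_deaths(counts):
    """Sum over categories of n * (1 - 20-year survival)."""
    return sum(n * (1.0 - SURVIVAL_20Y[cat]) for cat, n in counts.items())


def predicted_survival(counts):
    """Predicted 20-year survival for the whole group of tumours."""
    return 1.0 - expected_deaths(counts) / sum(counts.values())


combined = {cat: SATISFACTORY[cat] + UNSATISFACTORY[cat] for cat in SURVIVAL_20Y}
for label, counts in [("satisfactory", SATISFACTORY),
                      ("unsatisfactory", UNSATISFACTORY),
                      ("all invasive", combined)]:
    print(f"{label}: {expected_deaths(counts):.1f} deaths, "
          f"{100 * predicted_survival(counts):.1f}% 20-year survival")
```

This gives about 14.8 and 35.0 expected deaths for the satisfactory and unsatisfactory groups (67% and 66% survival) and 49.9 overall; the overall survival computed this way is about 66.5%, which the text reports, with 50 deaths, as 66%.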
Chen et al.10 and Chen11 estimated the instantaneous rate of progression from node negative to node positive cancer as 0.2315 per year. This corresponds to around 21% of node negative tumours progressing to node positive within one year if left untreated. The corresponding rate of progression from a tumour smaller than 2 cm to one of size 2 cm or more was estimated as 0.2727 per year, which implies that 24% of tumours smaller than 2 cm would be expected to grow to 2 cm or larger within one year. In our first analysis, we assumed that an improvement in practice could have led to detection at the original assessment of the tumours whose assessments were unsatisfactory according to the 2001 guidelines, but not of those whose assessments were judged to be satisfactory. Using the methods in Appendix A, we estimated the size and node status of the tumours which would have been observed had the former group been detected at the original assessment. Table 4 shows the results of applying the 20-year survival rates in Table 3 to the actual observed size and node status distribution, and to the distribution expected if the tumours with unsatisfactory original assessments had been diagnosed at the original assessment. This gives an expected number of deaths over 20 years of 41, compared with 50 from the observed distribution: a saving of 9 lives, or an 18% reduction in expected fatality in this special group of tumours. In our companion paper we estimated that 45% (59 cases) of those excluded because of missing information would also be potentially false negative assessments. Assuming 67% survival for these, we would expect 19 deaths, of which 18% (3 deaths) could have been avoided if diagnosis had taken place at the original assessment. In our second analysis, we assumed that only the tumours of size 20 mm or more at eventual presentation would have been detected at the original assessment with improved procedures. This would have resulted in the size/node status distribution in Table 5. When the tumours with satisfactory assessments are added to these, this in turn would imply 43 deaths, a 14% reduction in fatality in this tumour population.

Table 4 Expected 20-year deaths from the actual observed size/node status distribution and from that expected if the tumours with unsatisfactory original assessments had been detected at those assessments.
Observed tumours
Size and node status | Assessment satisfactory | Assessment unsatisfactory | Estimated deaths
<2 cm, negative | 19 | 48 | 9.4
<2 cm, positive | 5 | 14 | 7.2
2+ cm, negative | 15 | 22 | 14.8
2+ cm, positive | 6 | 20 | 18.5
Overall | 45 | 104 | 50
Table 4 (continued) Estimated as in Appendix A
Size and node status | Assessment satisfactory | Assessment unsatisfactory | Estimated deaths
<2 cm, negative | 19 | 65 | 11.8
<2 cm, positive | 5 | 13 | 6.8
2+ cm, negative | 15 | 19 | 13.6
2+ cm, positive | 6 | 7 | 9.2
Overall | 45 | 104 | 41

Table 5 Distribution by size and node status in the unsatisfactory-assessment group which would have resulted if only those invasive tumours of size 20 mm or more at actual presentation had been detected at the original assessment.
Size | Node status | Predicted number of tumours
<2 cm | Negative | 59
<2 cm | Positive | 18
2+ cm | Negative | 19
2+ cm | Positive | 8

Discussion
The results reported here suggest that improvement in the assessment process could, in the long run, have reduced deaths from breast cancer by 14–18% in this special population of cancers arising in women previously assessed for a suspicious finding at the site of the subsequently diagnosed cancer. It should be borne in mind that this is a small and very special group, representing only a small proportion (1%)5 of women assessed for positive screening results. However, failure to detect at assessment a tumour which has given rise to a screening recall is a serious failure of the system, and the screening programme has a duty to minimise such failures and their consequent mortality and morbidity. It should be noted that these results date from the start of screening in the UK, when the procedures used and their application had not yet been developed to their present standard of effectiveness. The guidelines used as the benchmark were published in 2001,6 and the last assessment in our study took place by the end of 1999. The tumours with assessments judged to be unsatisfactory had only slightly poorer predicted long-term survival than those with satisfactory assessments (66% compared with 67%), although early observed survival was poorer. They did have poorer lymph node status (33% node positive compared with 24%), but were slightly smaller. This suggests that the probability that a tumour was present at the original assessment was similar for the two groups. The difference between the two groups therefore relates to the quality of assessment rather than to the tumours. Our basic assumption was that if the assessment was satisfactory, there was no prospect
for improved assessment to enhance the process of diagnosis, and this seems reasonable. Our results depend on other assumptions, notably the exponential distribution of times to transition (to node positive or large tumours). This assumption is probably reasonable, judging by the goodness of fit observed in the past.10 Also, the magnitude of the long-term benefits of improvement seems consistent with advancing the time of diagnosis of the tumours by about 3 years. We have not dealt with the possibility of further saving of life from some of the invasive tumours having been in situ at the previous assessment. Thus our results may be conservative, although not greatly so, since there is evidence that detection of ductal carcinoma in situ accounts for only a minority of the benefit of screening.12 Since breast screening was implemented in various countries following the trials of the 1970s and 1980s, practice has evolved steadily. The procedures used at assessment have developed throughout this time. Initially the professionals working in the new service (whose work is reported here) had to work out for themselves how best to assess women recalled because of suspicious mammograms, using the available methods. In our region, assessment initially depended on fine needle aspiration cytology (not available in all our units), magnification mammography, and ultrasound using machines which would now be regarded as crude. Large-scale implementation of screening has driven the development of improved tools for all these procedures, and commercial investment has provided the hardware to support a substantial screening industry. This has been coupled with a research effort reporting the various new techniques and technologies. This research is a tradition of the breast-screening programme in many countries, and makes breast screening an exemplar of evidence-based practice.
Our study demonstrates a potential for improvement in practice. It is likely that many of the improvements required to achieve this potential benefit have already been made. Throughout the region whose work is described here, both skills and hardware have changed considerably. An equivalent audit of the next decade of screening would in all likelihood give different results. In particular, there has been increased use of, and expertise in, percutaneous biopsy, with a documented rise in the use of cytology/core biopsy in women recalled for assessment from 14.3% in 1989–1993 to 30.9% in 1996–1999, as reported in our companion paper.5 This trend is significant (Cuzick's test13 z = 2.52, P = 0.01) and has continued since then,
with 34.3% of all women recalled for assessment in 2002/2003 now undergoing some form of percutaneous biopsy (figure from the East Anglian Breast Screening Units' annual KC62 returns). Nevertheless, the educational message has to be constantly reinforced that failure to diagnose a case of cancer at assessment is likely to cost the patient concerned life-years, as estimated here. Although we have good guidelines and better tools, those who apply them must remain constantly vigilant. This is a reason for continuing professional development based on audit of results, and justifies the continuing quality assurance effort and investment applied to the breast-screening programme.
Acknowledgements
This audit has depended on the help of the staff of the screening units, notably the clerical staff who maintain patient records, the radiographers whose films enable the cases to be diagnosed, and the pathologists who support the assessment procedures. Sara Godward and Warren Carmody from the East Anglian Cancer Registry provided essential patient data. The audit was supported by the resources of the Eastern Region Breast Screening Quality Assurance Service. Radiologists and radiographers participating in the audit: R. Bannon, P. Britton, C. Brown, E. Clark, J. Curtis, A. Demara, E. Denton, B. Evans, C. Eve, A. Freeman, S. Girling, R. Godwin, S. Hibbitt, G. Hick, R. Hiscock, F. Holly Archer, E. Howe, G. Hunnam, G. Hurst, J. Inglis, D. Lamb, B. Millet, J. Mills, D. O'Driscoll, J. Rehman, M. Rimmer, M. Shaw, M. Simmons, R. Sinnatamby, M. Smith, M. Sparks, A. Tate, R. Warren, P. Whelehan, P. Whitear.
Appendix A
To estimate the proportion of cases which were diagnosed at age x and which would have been node negative if diagnosed at the original assessment t years before x, we note that

$$P(N^-, x-t \mid N^-, x) = 1, \qquad P(N^-, x-t \mid N^+, x) = \frac{P(N^+, x \mid N^-, x-t)\, P(N^-, x-t)}{P(N^+, x)}$$

by Bayes' theorem. If $\lambda_0$ is the exponential incidence rate of node negative disease and $\lambda_1$ the exponential rate of transition from node negative to node positive disease, the above is

$$P(N^-, x-t \mid N^+, x) = \frac{(1 - e^{-\lambda_1 t})\,\lambda_0\,\bigl(e^{-\lambda_0 (x-t)} - e^{-\lambda_1 (x-t)}\bigr)/(\lambda_1 - \lambda_0)}{\int_0^x \lambda_0 e^{-\lambda_0 v}\bigl(1 - e^{-\lambda_1 (x-v)}\bigr)\,dv}.$$

After integration and a little algebra,

$$P(N^-, x-t \mid N^+, x) = \frac{(1 - e^{-\lambda_1 t})\,\lambda_0\,\bigl(e^{-\lambda_0 (x-t)} - e^{-\lambda_1 (x-t)}\bigr)}{\lambda_1 - \lambda_0 - \lambda_1 e^{-\lambda_0 x} + \lambda_0 e^{-\lambda_1 x}}. \qquad (1)$$

One other adjustment to the formula is necessary. The probability should be conditional on the tumour being present at time x − t. A good approximation to this is achieved by dividing the above by the probability of a tumour by time x − t, which is

$$1 - e^{-\lambda_0 (x-t)}. \qquad (2)$$

The same equations apply to the probability of having been smaller than 2 cm at time x − t, given that the tumour was of size 2 cm or larger at time x. Chen et al.10 and Chen11 estimated the rate of incidence of node negative disease as 0.00176 and the rate of progression from node negative to node positive as 0.2315 for women aged 50–59 at randomisation in the Swedish Trial (corresponding to 50–66 at diagnosis). The corresponding rates for incidence of tumours <2 cm and progression from <2 cm to 2 cm or more in size were estimated as 0.00175 and 0.2727. From Table 3, for assessments judged unsatisfactory, the average time from assessment to diagnosis of the node positive, <2 cm tumours was 3.2 years, and the average age at assessment was 57.9. This gives t = 3.2 and x = 57.9 + 3.2 = 61.1. Substituting these in Eq. (1) and correcting by division by expression (2), with $\lambda_0 = 0.00176$ and $\lambda_1 = 0.2315$, gives the proportion of such tumours which would have been node negative if diagnosed at the original assessment as 0.39. Thus we would expect 0.39 × 14 = 5.5 of these tumours to have been shifted to node negative if detected at the original assessment. Similarly, for those node negative and of size 2 cm or more, we have x = 61.2, t = 3.2, $\lambda_0 = 0.00175$ and $\lambda_1 = 0.2727$. This gives the proportion which would have been smaller than 2 cm if diagnosed at the original assessment as 0.37. For those node positive tumours of size 2 cm or more, x = 60.1 and t = 3.0. The proportion which would have been node negative at the original assessment is estimated as 0.41, and the proportion which would have been smaller than 2 cm as 0.36. From this, we estimate the probability of having been node negative and smaller than 2 cm at the original assessment as 0.41 × 0.36 = 0.15. The probability of having been node negative but of size 2 cm or more is estimated as 0.41 × (1 − 0.36) = 0.26, and the probability of having been smaller than 2 cm but node positive as (1 − 0.41) × 0.36 = 0.21. Thus, of these 20 tumours one would expect that 0.15 × 20 = 3 would have been both node negative and <2 cm at the original assessment, and so on.
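The appendix calculation can be checked numerically. The sketch below, with hypothetical function names, implements Eq. (1) corrected by expression (2), the one-year progression probabilities quoted in the Results (1 − e^(−rate) under an exponential sojourn time), and the re-allocation of the unsatisfactory-assessment tumours behind Table 5. It assumes exponential transition times, as the text does; the proportions 0.41 and 0.36 for the large node-positive tumours are taken as quoted rather than recomputed, and they are combined by multiplication (that is, treated as independent), as in the worked figures above.

```python
import math


def one_year_progression(rate):
    """P(transition within one year) for an exponential time with this annual rate."""
    return 1.0 - math.exp(-rate)


def p_earlier_state(x, t, lam0, lam1):
    """Eq. (1) divided by expression (2).

    Probability that a tumour found in the later state (node positive, or
    >=2 cm) at age x was still in the earlier state at age x - t, given that
    a tumour was present at x - t. lam0: incidence rate of the earlier state;
    lam1: rate of progression out of it (both per year).
    """
    numer = ((1.0 - math.exp(-lam1 * t)) * lam0
             * (math.exp(-lam0 * (x - t)) - math.exp(-lam1 * (x - t))))
    denom = (lam1 - lam0 - lam1 * math.exp(-lam0 * x)
             + lam0 * math.exp(-lam1 * x))
    present = 1.0 - math.exp(-lam0 * (x - t))  # expression (2)
    return numer / denom / present


def shift_large_tumours(counts, p_small_if_large_neg, p_neg_if_large_pos,
                        p_small_if_large_pos):
    """Second analysis: re-distribute only tumours >=2 cm at presentation.

    counts uses hypothetical keys 'small_neg', 'small_pos', 'large_neg',
    'large_pos'. For large node-positive tumours the node and size shifts
    are multiplied together (independence assumed).
    """
    new = dict(counts)
    moved = p_small_if_large_neg * counts["large_neg"]
    new["large_neg"] -= moved
    new["small_neg"] += moved
    n = counts["large_pos"]
    pn, ps = p_neg_if_large_pos, p_small_if_large_pos
    new["small_neg"] += n * pn * ps
    new["large_neg"] += n * pn * (1 - ps)
    new["small_pos"] += n * (1 - pn) * ps
    new["large_pos"] = n * (1 - pn) * (1 - ps)
    return new


print(f"{one_year_progression(0.2315):.2f}, {one_year_progression(0.2727):.2f}")
print(f"{p_earlier_state(61.1, 3.2, 0.00176, 0.2315):.2f}")  # node positive, <2 cm
print(f"{p_earlier_state(61.2, 3.2, 0.00175, 0.2727):.2f}")  # node negative, >=2 cm
# Unsatisfactory-assessment invasive tumours (Table 1); 0.41 and 0.36 as quoted in the text.
unsat = {"small_neg": 48, "small_pos": 14, "large_neg": 22, "large_pos": 20}
shifted = shift_large_tumours(unsat, 0.37, 0.41, 0.36)
print({k: round(v, 1) for k, v in shifted.items()})
```

Rounded to whole tumours, the re-allocated counts are 59, 18, 19 and 8, the distribution given in Table 5.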
References
1. Burrell HC, Sibbering DM, Wilson AR, Pinder SE, Evans AJ, Yeoman LJ, et al. Screening interval breast cancers: mammographic features and prognosis factors. Radiology 1996;199(3):811–7.
2. Evans AJ, Wilson AR, Burrell HC, Ellis IO, Pinder SE. Mammographic features of ductal carcinoma in situ (DCIS) present on previous mammography. Clin Radiol 1999;54(10):644–6.
3. Burhenne HJ, Burhenne LW, Goldberg F, Hislop TG, Worth AJ, Rebbeck PM, et al. Interval breast cancers in the Screening Mammography Program of British Columbia: analysis and classification. Am J Roentgenol 1994;162(5):1067–71 (discussion 1072–5).
4. Burrell HC, Evans A, Wilson AR, Pinder SE. False-negative breast screening assessment. What lessons can we learn? Clin Radiol 2001;56:385–8.
5. Warren R, Allgood PC, Hunnam G, Godward S, Duffy SW. An audit of breast assessment procedures in women who later develop cancer after a negative result. J Med Screen 2004;11:180–6.
6. Wilson R, Asbury D, Cooke J, Michell M, Patnick J. Clinical guidelines for breast cancer screening assessment. Sheffield: NHSBSP; 2001.
7. Parmar MKB, Machin D. Survival analysis. A practical approach. Chichester: Wiley; 1995.
8. Tabar L, Fagerberg CJ, Gad A, et al. Reduction in mortality from breast cancer after mass screening with mammography. Randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet 1985;i:829–32.
9. Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grontoft O. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin N Am 1992;30:187–210.
10. Chen HH, Duffy SW, Tabar L, Day NE. Markov chain models for the progression of breast cancer Part 1: tumour attributes and the preclinical screen-detectable phase. J Epidemiol Biostat 1997;2:9–23.
11. Chen HH. Mathematical models for progression of breast cancer and evaluation of breast cancer screening. PhD thesis, University of Cambridge, 1995.
12. Duffy SW, Tabar L, Vitak B, Day NE, Smith RA, Chen HHT, et al. The relative contributions of screen-detected in situ and invasive carcinomas in reducing mortality from the disease. Eur J Cancer 2003;39:1755–60.
13. Cuzick J. A Wilcoxon-type test for trend. Stat Med 1985;4:87–90.