Is clinical information important when interpreting breast fine needle aspirates?


The Breast (1998) 7, 340-343 © 1998 Harcourt Brace & Co. Ltd

ORIGINAL ARTICLE

I. A. Robinson
Department of Histopathology, Derbyshire Royal Infirmary, Derby, UK

SUMMARY. Comparison was made between 2 years when breast fine needle aspirates (BrFNAs) were reported without clinical information (no details) and 2 years when relevant information (details) was incorporated into the diagnostic process, to assess the influence on interpretation. Performance year on year was compared. Receiver operating characteristic (ROC) curve analysis was used to assess change in specificity and sensitivity. Likelihood ratios, being the most meaningful clinical measure, were established for each diagnostic group. An average of 770 fine needle aspirates were reported each year. Fewer false positive cases were reported with details, and better absolute sensitivity (66.0% vs 62.0%) and full specificity (63.9% vs 56.2%) were achieved. This improvement was greater than that expected from increasing experience alone (i.e. comparing year 1 with year 2), but the difference failed to reach statistical significance. ROC curves were virtually unchanged. It is difficult to ascertain whether the improved performance is significant; however, the change supports the accepted need for clinical details when reporting BrFNAs.

INTRODUCTION

The triple approach of clinical, radiological and fine needle aspiration assessment of breast lesions is a triage to separate those women who do not need surgery from those who need an excision, either as therapy or occasionally to establish the diagnosis. Breast fine needle aspiration (BrFNA) has been shown to be the best of the three modalities at establishing a malignant diagnosis.1 It is this reliance on BrFNA that has resulted in the dramatic fall in the number of biopsies for benign breast disease.2,3

Intuitively, interpreting BrFNAs with clinical details should increase accuracy and confidence in diagnosis, but there is a theoretical risk that any error in expressed clinical and/or radiological perception may contaminate the cytopathological opinion. Indeed, were the latter the case, it would be preferable that the BrFNA diagnoses were secured in isolation. A strong argument can be made for not allowing a less discerning investigation (clinical and/or radiological evaluation) to influence a better test (BrFNA). The aim of this project was to see if (clinical) details improve performance of experienced cytopathologists by either increasing cancer detection or reducing the rate of equivocal (suspicious or probably benign) diagnoses.

MATERIALS AND METHODS

At the Derbyshire Royal Infirmary, three consultant pathologists report BrFNAs. The specimens are prepared using the Shandon Cytospin method and stained with Haematoxylin & Eosin. Immediate reporting is not practised and pathologists do not take the BrFNAs. For each year of the study, the relevant table was constructed to compare the cytological diagnostic groups of malignant (C5), suspicious of malignancy (C4), abnormal probably benign (C3), benign (C2) and inadequate material (C1) with subsequent histological or clinical outcome. The specific performance parameters for each year were calculated as per the United Kingdom’s National Health Service Breast Screening Programme (UK-NHSBSP) guidelines.4 For the first two years of the study no clinical information (no details) was given to the reporting pathologist beyond patient identification (name, age, etc.), side, specimen type and localization technique as recommended on page 15 in the 1992 UK-NHSBSP guidelines.4 From 1996, pertinent history, clinical findings, imaging characteristics and the requesting clinician’s impression of the possibility of cancer (details) were included on the request form and used when reporting the BrFNA.

Address correspondence to: I. A. Robinson MRCPath, Consultant Histo- and Cytopathologist, The Department of Histopathology, Derbyshire Royal Infirmary, London Road, Derby DE1 2QY, UK. Tel.: +44 (0) 1332 347141 ext. 4537; Fax: +44 (0) 1332 254763; e-mail: dr.robinson@btinternet.com


Fisher exact test analysis was used for comparing identification of true and false positive and negative cases between the years. Chi square tests were used to assess if the frequency of inadequate and/or equivocal diagnoses changed with clinical information.

Likelihood ratios for cancer in BrFNAs reported with and without details were calculated for each given diagnostic group (malignant, suspicious of malignancy, probably benign, and benign). The likelihood ratio of a positive test is the true positive fraction divided by the false positive fraction. It compares the ratio of probabilities of the disease being present or absent for that test result. Likelihood ratios range from zero to infinity. The greater the likelihood ratio, the greater the potential of disease.5

Receiver operating characteristic (ROC) curve analysis was used to calculate the effect of clinical information on the pathologist’s certainty of diagnosis. This is a technique adopted from radar science6 and the difficulty in separating aeroplanes (true positive) from other detected echoes (false positive). An ROC curve is a graph of true and false positive values for a degree of diagnostic certainty.7 In this case we used benign, probably benign, suspicious of malignancy and malignant (C2-C5 categories in the UK-NHSBSP). Thus, any point along an ROC curve corresponds to a likelihood ratio and describes the compromise between true positive and false positive fractions for that degree of observer certainty. Using this method the operation of expert cytopathologists at a higher level of sensitivity and specificity compared to non-experts has been demonstrated.8

The points on the ROC curves were calculated using the following method. First the assumption is made that all observations correspond to the presence of disease, giving a sensitivity of 1 (all cancers identified) but a specificity of 0 (all benign conditions called cancer). Next the cytologically benign cases are removed. The sensitivity and specificity are calculated for the group now composed of cytologically unequivocally malignant cases, probably malignant cases and probably benign cases (C5, C4 and C3 groups). This results in a decrease in sensitivity (fewer cancers identified as the false negatives are eliminated) but dramatically increases specificity. This gives the second set of values. The third set is obtained by calculating the specificity and sensitivity for a group combining the definitely and probably malignant (C5 and C4) groups. The fourth point is calculated for the unequivocally malignant (C5) group. These points represent lessened sensitivity but increased specificity. Finally it is assumed all cases represent benign changes and these give a sensitivity of 0 (no cancers identified) but a specificity of 1 (all benign cases correctly identified).

The plot is constructed by plotting the sensitivity against (1 - specificity). The higher and further left the curve, the better the performance; a diagonal line represents random guessing. Comparing areas beneath the curves can give an estimation of the magnitude of change in performance.

RESULTS

Seven hundred and fifty-nine BrFNAs were reported in 1994, 802 in 1995, 707 in 1996 and 812 in 1997, with a biopsy rate of 44.0% in 1994, 51.0% in 1995, 49.2% in 1996 and 45.2% in 1997. When details were included, there was a slight improvement. The performance figures for each year are given in Table 1.

Fisher’s exact test analysis was used to compare performance in identifying true and false positive cases and true and false negative cases between 1994 and 1995, between 1996 and 1997, and between the pooled periods (1994 and 1995 vs 1996 and 1997) (see Table 2). There was no significant change between 1994 and 1995 or between 1996 and 1997, suggesting one year of extra experience did not result in better performance. There was no significant difference in identification of positive and negative cases with and without details. Improvement in the identification of positive cases with details was evident but failed to reach statistical significance (P = 0.086).

Table 1 Performance of BrFNA over the 4 study years compared with the NHSBSP standards

                        1994    1995    1996    1997    Standard
Absolute sensitivity    65.3    58.7    66.7    65.4    ~60%
Complete sensitivity    86.9    81.2    88.5    84.0    >80%
Biopsy specificity      39.8    39.9    50.0    46.8    No standard
Full specificity        58.8    53.6    64.8    62.9    >50%
PPV C5                  98.1    97.5    100     99.0    >95%
PPV C4                  55.1    50.8    69.4    58.0    No standard
PPV C3                  38.1    39.5    35.5    28.1    No standard
False negative rate     4.9     5.2     3.6     4.2     4%
False positive rate     1.2     1.5     0.0     0.6
Inadequate rate         16.9    29.6    19.8    22.5
Inad from cancer        8.1     13.7    7.8     11.8
Suspicious rate         11.3    13.5    14.1    12.0
Table 2 Comparison of identification of positive and negative cases year on year without (1994/1995) and with (1996/1997) clinical information

Positive cases
                1994    1995        1996    1997        1994/1995   1996/1997
True pos.       160     159         186     204         319         390
False pos.      3       4           0       2           7           2
2-tail P        1.00                0.500               0.086

Negative cases
                1994    1995        1996    1997        1994/1995   1996/1997
True neg.       298     279         271     312         577         583
False neg.      12      14          10      13          26          23
2-tail P        0.689               0.833               0.664
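The two-tailed P values for the positive cases in Table 2 can be checked with a short script. The following is a minimal sketch, not the software used in the study: it assumes the common two-tailed definition of Fisher's exact test (summing every table with the same margins that is no more probable than the observed one), the function name is hypothetical, and the pooled 1996/1997 false positive count is taken as 2, the sum of the yearly values.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows are the two periods, columns are true and false positives (Table 2).
comparisons = {
    "1994 vs 1995": (160, 3, 159, 4),
    "1996 vs 1997": (186, 0, 204, 2),
    "1994/1995 vs 1996/1997": (319, 7, 390, 2),
}
for label, counts in comparisons.items():
    print(f"{label}: two-tailed P = {fisher_exact_two_tailed(*counts):.3f}")
```

Under these assumptions the yearly comparisons agree with the tabulated values of 1.00 and 0.500, and the pooled comparison comes out at about 0.087, close to the 0.086 reported.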


Table 3 Likelihood ratios for the presence of cancer, for malignant, suspicious, equivocal and benign groups without (1994/1995) and with (1996/1997) clinical information

                    1994/1995   1996/1997
C5  Malignant       45.6        195
C4  Suspicious      3.86        5.13
C3  Equivocal       0.64        0.47
C2  Benign          0.05        0.04
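As a rough illustration of how the likelihood ratios in Table 3 could feed into a clinical decision, the sketch below converts a pre-test probability of cancer into a post-test probability using the standard odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The 30% pre-test probability and the function name are assumptions made for the example, not figures from the study.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Likelihood ratios from Table 3: (without details, with details).
table_3 = {
    "C5 malignant": (45.6, 195),
    "C4 suspicious": (3.86, 5.13),
    "C3 equivocal": (0.64, 0.47),
    "C2 benign": (0.05, 0.04),
}

ASSUMED_PRE_TEST = 0.30  # hypothetical clinical suspicion, for illustration only

for group, (lr_without, lr_with) in table_3.items():
    without = post_test_probability(ASSUMED_PRE_TEST, lr_without)
    with_details = post_test_probability(ASSUMED_PRE_TEST, lr_with)
    print(f"{group}: {without:.3f} without details, {with_details:.3f} with details")
```

With this assumed starting point, a C5 report moves the probability of cancer to roughly 95% without details and 99% with details, while a C2 report lowers it to about 2% in both periods.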

Table 3 shows the likelihood ratio of breast cancer for each of the diagnostic groups. This shows an increase in likelihood of cancer in the malignant and suspicious categories and a reduction in the other two diagnostic groups when the aspirate was reported with clinical details. Interestingly, the proportion of cancers reported as either ‘suspicious of malignancy’ or ‘probably benign’ showed no reduction when BrFNAs were reported with details (χ² P = 0.611), but the positive predictive values for aspirates ‘suspicious of malignancy’ were increased and for the ‘probably benign’ group were reduced. The parameter that showed a significant change was a reduction in the inadequate rate (χ² P < 0.001).

The ROC curve graph showed the lines for 1994/1995 and 1996/1997 were virtually superimposed, with no significant difference in the areas below the curve. This is a recognized method of comparing performance. The values for the ROC plots are given in Table 4.
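The area comparison can be approximated directly from the five ROC points per period listed in Table 4. The sketch below is an illustration under stated assumptions: the paper does not say how the areas were computed, so joining the points with straight lines (the trapezoidal rule) is a choice made here, the function name is hypothetical, and no significance test is attempted.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve.

    points: iterable of (1 - specificity, sensitivity) pairs.
    """
    ordered = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(ordered, ordered[1:])
    )


# (1 - specificity, sensitivity) for: all benign, C5, C5+C4, C5+C4+C3, all malignant.
roc_without_details = [(0, 0), (0.011, 0.670), (0.055, 0.897), (0.11, 0.945), (1, 1)]
roc_with_details = [(0, 0), (0.004, 0.714), (0.037, 0.921), (0.103, 0.957), (1, 1)]

print(f"1994/1995 (no details): AUC = {trapezoid_auc(roc_without_details):.3f}")
print(f"1996/1997 (details):    AUC = {trapezoid_auc(roc_with_details):.3f}")
```

On this approximation the two areas are roughly 0.95 and 0.97, which is consistent with the curves lying almost on top of one another.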

Table 4 Receiver operator characteristic curve points for the two groups, 1994/1995 reported without clinical information and 1996/1997 reported with clinical information

                    1994/1995                                   1996/1997
                    Sensitivity  Specificity  1-specificity     Sensitivity  Specificity  1-specificity
All benign          0            1            0                 0            1            0
C5                  0.670        0.989        0.011             0.714        0.996        0.004
C5,4                0.897        0.945        0.055             0.921        0.963        0.037
C5,4,3              0.945        0.890        0.11              0.957        0.897        0.103
All malignant       1            0            1                 1            0            1

DISCUSSION

Comparing studies of cytological performance can be difficult because of the diverse means used to calculate effectiveness.9 Different patient populations and sampling methods10 may also affect performance. Even when the same population is used, as in this study, difficulties arise in establishing whether change is statistically significant. This is due to the multiple parameters used to assess performance, as no single performance parameter can measure the ability of both the aspirator and the cytopathologist.

Accuracy of a result (in this case BrFNA) as a measure of performance needs to be precisely defined, especially if it is to be the basis for clinical decision making. Sensitivity and specificity are the usual means of proving a test’s merit. Sensitivity and specificity are of most use when the data are binary. At first glance, BrFNAs can be categorized as benign or malignant. However, despite the best desires of breast clinicians, a proportion of adequate BrFNAs fall into the indeterminate zone and so results generated from BrFNA should be considered semi-quantitative for analysis.

One needs to be clear about precisely what is being measured when interpreting reports of test performance. Many different terms alluding to performance are used and their meanings can be vague and variable.6 Absolute and complete sensitivities are different measurements with very different clinical relevance, the former representing cancers diagnosed as such and the latter indicating a perceived deviation from normal. Similarly, full and biopsy proven specificity are not the same and have distinct implications as to the impact of cytopathology on clinical practice. After all, the aim of the triple approach is to avoid biopsy of benign disease. Knowledge of the disease prevalence in the test population is also needed for full interpretation. Whilst not directly affecting sensitivity and specificity, prevalence influences positive predictive values and will affect reporting practices.

Likelihood ratios, calculated from the true and false positive fractions, are the more meaningful measurements, especially if the given result is to be incorporated into the clinical decision-making process. However, these measures do not singularly describe diagnostic performance as they depend on an arbitrary selection of a decision threshold. Few diagnostic test results fall neatly into one or other clinically desired outcome, i.e. presence or absence of disease. Tests resulting in a value, for example serum thyroxine, overlap to a greater or lesser extent in populations with and without disease, so a threshold value must be chosen (arbitrarily or empirically) in the full knowledge that a proportion of normal individuals will lie beyond this value. Subjective tests, especially the interpretation of radiology or cellular pathology images, also require the creation of a threshold, this time in the ‘mind’s eye’ of the observer. This threshold is how explicit an image of disease must be before the observer calls it positive. This depends on many factors: the observer’s style (hawk vs dove), training, experience and understanding of the clinical consequences of an erroneous result.

The aim of this study was to assess the influence of clinical information on this threshold. As was expected, the provision of details positively affected the reporting of BrFNAs. Problems were encountered in assessing whether the changes were statistically significant or not. The improved identification of positive cases failed to reach statistical significance at the 5% level using Fisher’s exact test. Not every parameter is of equal importance. The ROC curves capture all the information necessary for valid comparisons of performance. The virtual superimposition of the ROC curves suggests no statistical difference in performance. However, the ROC curve is closely related to likelihood ratios, which are the best measure of the clinical usefulness of a test, and these showed improvements.

The aim of this research was not to compare clinical and/or radiological opinion vs BrFNA. We know cytology opinion is best.1 Few would argue that BrFNAs should be reported in isolation. Recent publications and guidelines are adamant that these details must accompany all requests for BrFNA interpretation.11,12 This study suggests that the necessity for clinical details is overstated. A robust relationship between the provision of clinical details and better diagnosis of cancer is not established. However, the benefits of increased likelihood ratios, better predictive values for ‘suspicious’ and ‘probably benign’ groups and a reduction in false positive cases should not be disregarded. Crucially, this study shows that clinical information does not adversely influence BrFNA performance and there is no evidence to alter current practice.


References

1. Negri S, Bonetti F, Capitano A, Bonzanini M. Pre-operative diagnostic accuracy of fine needle aspiration in the management of breast lesions: comparison of specificity and sensitivity with clinical examination, mammography, echography and thermography in 249 patients. Diagn Cytopathol 1994; 11: 4-8.
2. Bates A, Bates T, Hastrich D et al. Delay in the diagnosis of breast cancer: the effect of the introduction of fine needle aspiration cytology to a breast clinic. Eur J Surg Oncol 1992; 18: 433-437.
3. Green B, Dowley A, Turnbull L, Smith P, Leinster S, Winstanley J. Impact of fine-needle aspiration cytology, ultrasonography and mammography on open biopsy rate in patients with benign breast disease. Br J Surg 1995; 82: 1509-1511.
4. Guidelines for Cytology Procedures and Reporting in Breast Cancer Screening. National Health Service Breast Screening Programme 1992; 22.
5. Raab S. Diagnostic accuracy in cytopathology. Diagn Cytopathol 1994; 10: 68-95.
6. Zweig M, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993; 39: 561-577.
7. Metz C. Basic principles of ROC analysis. Semin Nucl Med 1978; 8: 283-298.
8. Cohen M, Rodgers R, Hayles M et al. Influence of training and experience on fine needle aspiration biopsy of the breast: receiver operating characteristic curve analysis. Arch Pathol Lab Med 1987; 111: 518-520.
9. Wells C. Quality assurance in breast cancer screening: a review of the literature and a report on the UK National Cytology Scheme. Eur J Cancer 1995; 31: 273-280.
10. Robinson I, McKee G. Cytological performance in palpable versus stereotactically sampled breast lesions. The Breast 1996; 5: 415-417.
11. Coghill S. Normal breast cytology and breast screening. In: Gray W, ed. Diagnostic Cytopathology. Edinburgh: Churchill Livingstone, 1995.
12. The Unified Approach to Breast Fine Needle Aspiration Biopsy: A Synopsis. Acta Cytol 1997; 40: 1120-1126.