Likelihood ratios for clinical examination, mammography, ultrasound and fine needle biopsy in women with breast problems

The Breast (1998) 7, 85-89 © 1998 Harcourt Brace & Co. Ltd

ORIGINAL ARTICLE

N. Houssami and L. Irwig*

The Sydney-Square Breast Clinic, MBF Sydney, *Department of Public Health and Community Medicine, University of Sydney, Australia

Address correspondence to: N. Houssami, Medical Director, Sydney Square Breast Clinic, MBF, 97-99 Bathurst Street, Sydney NSW 2000, Australia

SUMMARY. Instead of the conventionally used sensitivity and specificity, we present likelihood ratios (LRs) as a better measure of the accuracy of clinical examination, mammography, ultrasound and fine needle aspiration cytology in women presenting with breast problems. In addition, we point out that previous studies are prone to verification bias in the selection of subjects, and demonstrate methods of avoiding this problem. Both issues are illustrated using a dataset consisting of 7259 women who attended the Sydney-Square Breast Clinic consecutively in 1994, of whom 375 had surgical biopsy and 145 were found to have malignancy. Information on the non-cancer group was based on all women with a benign outcome on surgical biopsy and, to avoid verification bias, a sample of 232 subjects chosen randomly from consecutive attenders. LRs for five levels of each test, categorized from normal to malignant, ranged from 0.43 to 46.41 for clinical examination, from 0.18 to ∞ for mammography, from 0.17 to ∞ for ultrasound, and from 0.009 to ∞ for cytology. LRs calculated using only subjects with a benign outcome on histology differed appreciably from those calculated using the more appropriate random sample of all non-cancers. Future researchers need to avoid verification bias in studies of test accuracy by including a random sample of consecutive subjects rather than only those who proceed to histology. LRs have an advantage over sensitivity and specificity because they provide a measure of test accuracy which allows test results to be reported as several categories. In clinical practice, these LRs are easy to use to calculate the probability of cancer.

INTRODUCTION

The combination of clinical examination, imaging (mammography and ultrasonography) and fine needle biopsy (the so-called triple test) has been widely used in breast assessment. Numerous studies have reported the sensitivities and specificities of these tests.1-18 A difficulty with evaluating and comparing sensitivity and specificity estimates is that they rely on a single threshold for classifying a test result as positive or negative. Yet results of investigations are often measured as categories, rather than a dichotomous outcome of positive or negative. Studies in the literature1-18 use different criteria in their definition of a positive and a negative test result, thus differing in their threshold. We propose that the best measure of test performance is the likelihood ratio (LR). A likelihood ratio is defined as the ratio of the probability of a particular test result in people with the disease to the probability of the same test result in people without the disease.”

LRs can be calculated for each specific result or value of a multicategory (or continuous) test, and have several advantages.19-21 First, they can be generated for several levels of a diagnostic test result and are therefore more stable than sensitivity and specificity when the spectrum of disease changes. Second, they avoid the loss of information caused by dichotomizing a test result. In addition, they can be used to calculate post-test odds of a disease for patients with estimated pretest odds, using a version of Bayes’ theorem. A literature review did not identify any studies that have calculated LRs for the diagnostic components of the triple test, though a recent study by Kerlikowske et al.22 reported LRs for screening mammography.

In determining outcome or true disease status, most studies use histopathology as the reference standard, thus avoiding misclassification. However, estimates of accuracy of a diagnostic test may be biased if the test result influences whether patients proceed to verification by the reference standard; this is known as verification bias or work-up bias.23 Therefore the common practice of including only subjects who proceed to surgery and have a histological diagnosis, as in many of the papers reviewed,3”.9~“~‘5.i7~‘8 may cause verification bias. To avoid any potential for verification bias, a random sample of women presenting with breast problems can be included as a representation of the ‘normal’ population of patients who did not necessarily proceed to surgery. We recognize the possibility that cancers could be misclassified in this random sample, but the probability of this is extremely low because of the high sensitivity achieved by performing several parallel tests.19
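The definition of the LR and its use with Bayes’ theorem, as described in this introduction, can be written compactly as follows. The symbols are notation assumed here for illustration and do not appear in the paper: T is the reported test category, i is one of its levels, D+ and D- denote women with and without cancer, and p is a probability.

```latex
\[
\mathrm{LR}_i = \frac{P(T = i \mid D^{+})}{P(T = i \mid D^{-})},
\qquad
\text{post-test odds} = \text{pretest odds} \times \mathrm{LR}_i,
\qquad
\text{odds} = \frac{p}{1 - p}.
\]
```

An LR above 1 raises the odds of disease, an LR below 1 lowers them, and an LR near 1 leaves them essentially unchanged.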

METHODS

Subjects

Seven thousand two hundred and fifty-nine patients attended the Sydney-Square Breast Clinic, a multidisciplinary breast centre, in 1994. They were generally referred by their general practitioners for assessment of breast symptoms. Subjects included all women who attended for diagnostic evaluation during 1994, of whom 375 had an excision biopsy. As discussed in the introduction to this paper, a random sample of 232 subjects who did not have cancer was chosen from the same patient population. This group is referred to as the ‘random’ sample, whereas the ‘benign biopsy’ group included only patients whose outcome was verified as benign on histopathology.

Medical records of all eligible subjects were reviewed by one of the authors (NH), and the results of tests as reported at time of assessment were retrieved. All reporting was done prior to and without knowledge of final histopathology. Test results were cross-classified against histopathology reports for all subjects who proceeded to surgery.

All subjects had a clinical examination, performed by one of 23 consulting surgeons. Clinical findings were recorded at time of consultation, without knowledge of imaging findings. Clinical and imaging findings were reported on a categorical scale of 1 to 5 (1 - normal; 2 - benign; 3 - indeterminate; 4 - suspicious; 5 - malignant). Women aged 25 years and over had mammography, reported by one of five experienced radiologists, with knowledge of the patient’s presentation. Ultrasound examination was performed when clinical evaluation and/or mammography identified an abnormality, and in women younger than 25 years or pregnant. Ultrasound scanning was performed by one of four qualified sonographers. Lesions were reviewed in real time and correlated with the patient’s symptom by one of six breast physicians, with knowledge of clinical and mammographic findings.

Fine needle aspiration biopsy was performed by the breast physician when a significant abnormality was identified on any test modality, with the majority of cases biopsied under sonographic guidance and using a standard aspiration method. Aspirates were smeared by the physician, and the slides sent off-site to a dedicated teaching hospital laboratory for processing and interpretation. Cases were reported by one of three cytopathologists and coded according to the National Program for the Early Detection of Breast Cancer minimum dataset for reporting fine needle aspirates.24 Clinical and imaging findings were reported at time of assessment without knowledge of cytology or final histopathology.

Data analysis

LRs were calculated for each of the five levels for each test. Two sets of LRs were calculated, one using the ratio of the proportion of malignant to benign cases (as verified by histology) and another using the ratio of the proportion of malignant cases to the random sample. Confidence limits were computed using the method described by Armitage and Berry for calculating confidence intervals for the ratio of two proportions.25 A χ² analysis was performed to compare the benign biopsy and random distributions, after combining the small numbers for the suspicious and malignant categories.

RESULTS

Histopathological reports were obtained for all subjects who underwent surgical biopsy. A total of 391 biopsies was performed on 375 women, and 147 malignant lesions were found in 145 subjects. Breast cancer was found in 2% of women (increasing with age from 0.46% to 3.31%), and the biopsy rate was 5%. There were 106 cases of invasive ductal carcinoma, 20 cases of ductal carcinoma in situ (DCIS) (4 of which were unrelated to the patients’ presentation), 13 cases of invasive lobular carcinoma, and 8 cases of the less common types of breast malignancy.

Tables 1-4 show LRs calculated using all patients who underwent surgical biopsy (malignant to benign), and using the proportion of malignant cases to the random sample, showing that including only subjects who had been verified by surgery does bias the estimates of test accuracy. Comparison of the random and benign biopsy groups showed that their proportions were significantly different for all investigations (P < 0.01).
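The confidence limits for a ratio of two proportions mentioned under Data analysis can be sketched as below. This is a minimal illustration using the common log-scale interval, which is assumed (not confirmed by the text) to be the method the paper cites. The counts 28 of 143 and 1 of 237 are also an assumption: they are back-calculated from the rounded percentages and sample numbers given for a suspicious clinical examination, and the function name `ratio_ci` is hypothetical.

```python
import math


def ratio_ci(x1, n1, x2, n2, z=1.96):
    """Ratio of two proportions (x1/n1) / (x2/n2) with a log-scale interval.

    The standard error of log(ratio) is sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2);
    it is undefined when either count is zero.
    """
    ratio = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Assumed counts: 28/143 malignant and 1/237 'random' subjects coded suspicious.
lr, lower, upper = ratio_ci(28, 143, 1, 237)
print(f"LR = {lr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under these assumed counts the result agrees, to rounding, with the 46.41 (6.38-337.41) reported for a suspicious clinical examination against the random sample. The very wide interval reflects the single subject in the denominator group.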
For all test modalities, there was a consistent trend of increasing LRs for report codes of 1-5, with the increase being greatest for categories 4 and 5. Considering the proportion of malignant to random cases as the correct estimate of the LR (with the corresponding LR using the benign biopsy group in parentheses), the range of LRs was 0.43-46.41 (0.88-16.85) for clinical examination, 0.18-∞ (0.32-∞) for mammography, 0.17-∞ (0.43-25.32) for ultrasound, and 0.009-∞ (0.01-∞) for fine needle cytology. Using the random sample, an indeterminate result (code 3) increased the likelihood of malignancy when reported for clinical, mammographic or sonographic evaluation, with LRs of 6.81, 8.13, and 5.68, respectively. For these results, choosing only subjects who were verified by histology altered measures of test performance, giving LRs of 1.64, 1.02, and 1.25, respectively, all values of around 1.00 neither increasing nor decreasing the odds of disease. A different pattern was observed for cytology, with an LR of 1.77 for an atypical result using the random sample not altering the odds of disease, compared to an LR of 0.19 if the benign biopsy sample is used in the calculation, which reduces the probability of disease. Results reported as suspicious or malignant (code 4 or 5) had high LRs for all tests.

Table 1 Likelihood ratios for clinical examination

Clinical examination   Malignant pathology   Benign pathology   Likelihood ratio        ‘Random’ normal   Likelihood ratio
code - description     proportion (%)        proportion (%)     malignant:benign        proportion (%)    malignant:normal
                                                                (95% CI)                                  (95% CI)
1 - normal             30                    34                 0.88 (0.65-1.20)        70                0.43 (0.33-0.56)
2 - benign             17                    47                 0.37 (0.25-0.55)        26                0.68 (0.45-1.03)
3 - indeterminate      26                    16                 1.64 (1.10-2.45)        4                 6.81 (3.39-13.70)
4 - suspicious         20                    3                  6.74 (3.02-15.03)       0.4               46.41 (6.38-337.41)
5 - malignant          7                     0.4                16.85 (2.18-130.29)     0.4               16.57 (2.14-128.12)
Sample number          n = 143               n = 241                                    n = 237

Table 2 Likelihood ratios for mammography

Mammogram result       Malignant pathology   Benign pathology   Likelihood ratio        ‘Random’ normal   Likelihood ratio
code - description     proportion (%)        proportion (%)     malignant:benign        proportion (%)    malignant:normal
                                                                (95% CI)                                  (95% CI)
1 - normal             13                    40                 0.32 (0.21-0.50)        72                0.18 (0.12-0.28)
2 - benign             9                     40                 0.22 (0.13-0.38)        27                0.34 (0.19-0.59)
3 - indeterminate      14                    14                 1.02 (0.62-1.70)        2                 8.13 (2.85-23.20)
4 - suspicious         31                    6                  5.57 (3.11-9.97)        0                 ∞
5 - malignant          33                    0                  ∞                       0                 ∞
Sample number          n = 146               n = 235                                    n = 226

Table 3 Likelihood ratios for ultrasound

Ultrasound result      Malignant pathology   Benign pathology   Likelihood ratio        ‘Random’ normal   Likelihood ratio
code - description     proportion (%)        proportion (%)     malignant:benign        proportion (%)    malignant:normal
                                                                (95% CI)                                  (95% CI)
1 - normal             7                     15                 0.43 (0.22-0.88)        37                0.17 (0.09-0.34)
2 - benign             5                     68                 0.07 (0.04-0.15)        59                0.08 (0.04-0.18)
3 - indeterminate      18                    15                 1.25 (0.79-1.99)        3                 5.68 (2.24-14.39)
4 - suspicious         49                    2                  23.29 (9.63-56.36)      0.6               75.36 (10.61-535.49)
5 - malignant          21                    0.8                25.32 (6.14-104.35)     0                 ∞
Sample number          n = 141               n = 238                                    n = 154

Table 4 Likelihood ratios for fine needle cytology

Fine needle cytology   Malignant pathology   Benign pathology   Likelihood ratio        ‘Random’ normal   Likelihood ratio
result code -          proportion (%)        proportion (%)     malignant:benign        proportion (%)    malignant:normal
description                                                     (95% CI)                                  (95% CI)
1 - insufficient       1.6                   11                 0.14 (0.03-0.60)        7                 0.24 (0.04-1.37)
2 - benign             0.8                   62                 0.01 (0.002-0.09)       91                0.009 (0.001-0.063)
3 - atypical           4                     22                 0.19 (0.08-0.46)        2                 1.77 (0.21-14.77)
4 - suspicious         23                    5                  4.64 (2.28-9.55)        0                 ∞
5 - malignant          71                    0                  ∞                       0                 ∞
Sample number          n = 124               n = 186                                    n = 44
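The category-specific LRs in the tables can be approximated directly from the column percentages. The sketch below does this for the cytology percentages in Table 4; the function name `category_lrs` is hypothetical. Because it starts from rounded percentages rather than the underlying counts, which the paper does not list, the values differ slightly from the published ones (for example 2.0 rather than 1.77 for an atypical result). An empty comparison category is treated as an infinite LR.

```python
import math


def category_lrs(diseased_pct, reference_pct):
    """LR per category: share of diseased with that result / share of non-diseased."""
    lrs = []
    for d, r in zip(diseased_pct, reference_pct):
        if r > 0:
            lrs.append(d / r)
        elif d > 0:
            lrs.append(math.inf)  # result never seen in the comparison group
        else:
            lrs.append(math.nan)  # result seen in neither group: undefined
    return lrs


codes = ["insufficient", "benign", "atypical", "suspicious", "malignant"]
malignant = [1.6, 0.8, 4, 23, 71]    # Table 4, malignant pathology (%)
random_sample = [7, 91, 2, 0, 0]     # Table 4, 'random' normal (%)
benign_biopsy = [11, 62, 22, 5, 0]   # Table 4, benign pathology (%)

lr_random = category_lrs(malignant, random_sample)
lr_benign = category_lrs(malignant, benign_biopsy)
for code, a, b in zip(codes, lr_random, lr_benign):
    print(f"{code:12s} vs random: {a:8.3f}   vs benign biopsy: {b:8.3f}")
```

Printing both columns side by side makes the effect of the comparison group visible: the same atypical result gives an LR near 2 against the random sample but well below 1 against the benign biopsy group.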

DISCUSSION

Comparing the two sets of LRs for each result category, LRs calculated using our random sample as the denominator are more predictive of the absence of disease for a normal result, and more predictive of malignancy for a suspicious result, than those generated using only cases who had surgery. At the indeterminate level, LRs calculated using the random group increase the odds of malignancy for clinical, mammographic and sonographic examination. When we only include cases verified by surgery, however, the indeterminate report for these tests does not appreciably change the odds of disease. This effect is likely to be due to verification bias, since cases verified by histology are more likely to have had an abnormal result than those who did not undergo surgical biopsy. Therefore, including only cases which had histology in the calculation of test accuracy underestimates the test’s ability to distinguish between those with and without the disease of interest.

The advantages of using LRs for multiple categories as a measure of test accuracy can be shown using our data. To calculate sensitivity and specificity, one needs to decide which results will be considered positive (or negative) despite the test outcome being reported in multiple categories. If indeterminate, suspicious and malignant ultrasound results are counted as positive, then sensitivity is 89% and specificity is 96%. These estimates compare favourably to those reported in the literature;“’ however, measures of test accuracy based on such a dichotomy do not reflect the additional information gained by calculating result-specific LRs for all levels of test outcome. This is easily demonstrated using the ultrasound example, by applying the calculated sensitivity and specificity to Bayes’ theorem, and using the 2% prevalence of breast cancer in this clinic as the prior probability of disease. Post-test probability of breast cancer is about 30% for a patient with a positive test.

Now, using LRs we note that an indeterminate ultrasound result (LR = 5.68) is associated with higher odds of the patient having malignancy, whereas a suspicious result (LR = 75.36) is highly predictive of the disease. LRs can also be used to obtain the likelihood of disease given a particular test result, but to avoid calculations a nomogram can be used, with pretest and post-test odds simply converted to their corresponding probabilities and given as percentages.21 Using the same disease prevalence of 2% and an LR of 5.68 for a patient with an indeterminate ultrasound finding, the post-test probability of breast cancer is about 10% using the nomogram. For a patient with a suspicious ultrasound report (LR = 75.36), the probability of breast cancer is about 60%. Using LRs for multiple categories can therefore provide additional information about the probability of malignancy being present in a patient whose test result is well above the threshold level chosen as positive, or one whose result is just above the threshold value.

The same loss of information occurs if we dichotomize the results of any of the other tests. For cytology, if atypical results are included with suspicious and malignant results in our data then sensitivity exceeds 99% (98% if insufficient aspirates are included and regarded as negative for malignancy) and specificity is 98%. If only suspicious and malignant results are considered positive (many series dichotomize results in this manner) then sensitivity is 95% (94% including insufficients) and specificity is 100%. The accuracy of cytology in this study is also of equal standard to published series,2-11,14,16-18 but another advantage of using LRs is the ability to calculate accuracy for each result category, avoiding the practice of excluding atypical or insufficient results when measuring test performance.

The diagnostic accuracy of the tests employed in assessing women symptomatic of breast disease is best obtained using likelihood ratios, as shown in this paper. In the common situation where results of a test are reported as several categories or levels, LRs can be calculated for the different outcomes, avoiding the need for dichotomy and associated loss of information. LRs may also be of value to clinicians in calculating post-test probability of disease for patients with estimated pretest probability.
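The post-test probabilities quoted in the discussion for ultrasound can be checked with a few lines of arithmetic instead of a nomogram. This is a minimal sketch assuming the 2% clinic prevalence as the pretest probability. The function name `post_test_probability` is hypothetical, and the positive LR for the dichotomized test is derived here as sensitivity / (1 - specificity), a standard identity that the paper does not state explicitly.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


prevalence = 0.02                      # breast cancer prevalence in the clinic
sensitivity, specificity = 0.89, 0.96  # ultrasound, codes 3-5 counted as positive
lr_positive = sensitivity / (1 - specificity)

results = {
    "dichotomized positive (codes 3-5)": lr_positive,
    "indeterminate (code 3)": 5.68,
    "suspicious (code 4)": 75.36,
}
for label, lr in results.items():
    print(f"{label}: {post_test_probability(prevalence, lr):.0%}")
```

The three results come out at roughly 31%, 10% and 61%, in line with the approximate 30%, 10% and 60% given in the text. The single dichotomized figure sits between the two category-specific ones, which is the loss of information the discussion describes.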

References

1. NHS Executive. Improving outcomes in breast cancer: the research evidence. 1996.
2. Negri S, Bonetti F, Capitanio A, Bonzanini M. Preoperative diagnostic accuracy of fine-needle aspiration in the management of breast lesions: comparison of specificity and sensitivity with clinical examination, mammography, echography, and thermography in 249 patients. Diagn Cytopathol 1994; 11: 4-8.
3. Hansell D M, Cooke J C, Parsons C A. The accuracy of mammography alone and in combination with clinical examination and cytology in the detection of breast cancer. Clin Radiol 1988; 39: 150-153.
4. Hermansen C, Poulsen H S, Jensen J et al. Diagnostic reliability of combined physical examination, mammography, and fine-needle puncture (‘triple-test’) in breast tumors. A prospective study. Cancer 1987; 60: 1866-1871.
5. Hermansen C, Poulsen H S, Jensen J et al. Palpable breast tumours: ‘triple diagnosis’ and operative strategy. Results of a prospective study. Acta Chir Scand 1984; 150: 625-628.
6. Dixon J M, Anderson T J, Lamb J, Nixon S J, Forrest A P M. Fine needle aspiration cytology, in relationship to clinical examination and mammography in the diagnosis of a solid breast mass. Br J Surg 1984; 71: 593-596.
7. Ciatto S, Carriaggi P, Bulgaresi P, Confortini M, Bonardi R. Fine needle aspiration cytology of the breast: review of 9533 consecutive cases. The Breast 1993; 2: 87-90.
8. Warwick D J, Smallwood J A, Guyer P B, Dewbury K C, Taylor I. Ultrasound mammography in the management of breast cancer. Br J Surg 1988; 75: 243-245.
9. Thomas J M, Fitzharris B M, Redding W H et al. Clinical examination, xeromammography, and fine-needle aspiration cytology in diagnosis of breast tumours. BMJ 1978; 2: 1139-1141.
10. Vetrani A, Fulciniti F, Di Benedetto G et al. Fine-needle aspiration biopsies of breast masses. An additional experience with 1153 cases (1985 to 1988) and a meta-analysis. Cancer 1992; 69: 736-740.
11. Martelli G, Pilotti S, Coopmans de Yoldi G et al. Diagnostic efficacy of physical examination, mammography, fine needle aspiration cytology (triple-test) in solid breast lumps: an analysis of 1708 consecutive cases. Tumori 1990; 76: 476-479.
12. Edeiken S. Mammography and palpable cancer of the breast. Cancer 1988; 61: 263-265.
13. Sibbering D M, Burrell H C, Evans A J et al. Mammographic sensitivity in women under 50 years presenting symptomatically with breast cancer. The Breast 1995; 4: 127-129.
14. Frable W. Needle aspiration of the breast. Cancer 1984; 53: 671-676.
15. Egeli R A, Urban J A. Mammography in symptomatic women 50 years of age and under, and those over 50. Cancer 1979; 43: 878-882.
16. Kaufman Z, Shpitz B, Shapiro M et al. Triple approach in the diagnosis of dominant breast masses: combined physical examination, mammography, and fine-needle aspiration. J Surg Oncol 1994; 56: 254-257.
17. Di Pietro S, Fariselli G, Bandieramonte G et al. Diagnostic efficacy of the clinical-radiological-cytological triad in solid breast lumps: results of a second prospective study on 631 patients. Eur J Surg Oncol 1987; 13: 335-340.
18. Azzarelli A, Guzzon A, Pilotti S et al. Accuracy of breast cancer diagnosis by physical, radiologic and cytologic combined examinations. Tumori 1983; 69: 137-141.
19. Fletcher R H, Fletcher S W, Wagner E H. Clinical epidemiology. The essentials, 2nd ed. Baltimore: Williams & Wilkins, 1988: 61-67.
20. Irwig L, Tosteson A N A, Gatsonis C et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994; 120: 667-676.
21. Sackett D L, Haynes R B, Guyatt G H, Tugwell P. Clinical epidemiology. A basic science for clinical medicine, 2nd ed. Boston: Little Brown, 1991: 119-124.
22. Kerlikowske K, Grady D, Barclay J, Sickles E A, Ernster V. Likelihood ratios for modern screening mammography. JAMA 1996; 276: 39-43.
23. Jaeschke R, Guyatt G, Sackett D L. Users’ guide to the medical literature. How to use an article about a diagnostic test. Are the results of the study valid? JAMA 1994; 271: 389-391.
24. National Program for the Early Detection of Breast Cancer. Minimum Dataset for Screening and Assessment Services. June 1993.
25. Armitage P, Berry G. Statistical methods in medical research, 3rd ed. Oxford: Blackwell Scientific, 1994: 130-132.