Adjusting Mammography—Audit Recommendations in a Lower-Incidence Taiwanese Population


Chin-Yu Chen, MD, MS(a,b,c,d), Wen-Sheng Tzeng, MD(a), Christopher C. Tsai, MD, MPH(c), Chee-Wai Mak, MD(a), Chia-Hui Chen, MD(a), Mei-Chun Chou, MD(a)

Rationale and Objectives: Medical auditing of screening mammography is crucial to improving the quality of breast cancer care. Audit methodology and recommendations are well documented in the ACR's Breast Imaging Reporting and Data System® (BI-RADS). However, when screening a population with a lower incidence rate of breast cancer, performance recommendations should be adjusted for a better fit.

Materials and Methods: On the basis of known lower breast cancer incidence rates in Taiwan compared with the BI-RADS study populations, the authors investigated a proposed calculation method to adjust the recommendations accordingly. A medical audit of 8,249 consecutive digital mammographic screening examinations was completed. All examinations were done by a hospital-based breast imaging department in Taiwan. Imaging interpretation and medical auditing followed the BI-RADS standards. The results were then compared with those of previous studies as well as the proposed recommendations.

Results: Two of the BI-RADS medical auditing recommendations were adjusted for the Taiwanese population: the positive predictive value (PPV) of the initial screening mammographic examination (PPV1) (changed from 5%-10% to 1.7%-3.4%) and the cancer detection rate (changed from 2-10 per 1,000 to 0.7-3.4 per 1,000). In the medical auditing results, there were 89 biopsies, with 22 breast malignancies detected. PPV1 was 3.1%, PPV2 was 16.2%, and PPV3 was 24.7%. The cancer detection rate was 2.7 per 1,000 screens, with minimal cancer of 50%, node-negative cancer of 71.4%, and a recall rate of 8.5%.

Conclusion: The medical auditing results of this study are consistent with the authors' proposed adjustments to the BI-RADS recommendations for the Taiwanese population. The calculation methods would be generally applicable to other countries or populations to generate their own recommendations for screening mammography.

Key Words: Breast neoplasms, digital mammography, medical audit, breast cancer screening

J Am Coll Radiol 2008;5:978-985. Copyright © 2008 American College of Radiology

(a) Department of Radiology, Chi-Mei Medical Center, Yung-Kang City, Tainan, Taiwan.
(b) Institute of Biomedical Engineering, National Cheng-Kung University, Tainan City, Taiwan.
(c) Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
(d) Department of Radiological Technology, Central Taiwan University of Science and Technology, Taichung City, Taiwan.

Corresponding author and reprints: Chin-Yu Chen, MD, MS, Chi-Mei Medical Center, Department of Radiology, 321 Dong-Fong Road, Tainan 70457, Taiwan; e-mail: [email protected].

© 2008 American College of Radiology 0091-2182/08/$34.00 ● DOI 10.1016/j.jacr.2008.05.009

INTRODUCTION

To maintain a high level of quality in the practice of screening mammography, medical auditing has become an essential task in every screening practice. In the United States, the Mammography Quality Standards Act of 1992 requires the performance of annual medical audits for selected clinical outcomes [1,2]. In Taiwan, with rising concerns over quality control, screening mammography has also become regulated by the Department of Health. In addition to meeting regulatory requirements, medical auditing helps providers measure and improve their performance in detecting occult breast cancers. Medical auditing has been shown to improve interpretation skills as well as the detection rate of minimal breast carcinomas when applied to a mammography practice for at least 2 years [3,4]. Finally, auditing helps providers demonstrate their screening performance to patients, referring physicians, and third-party payers [5].

To date, almost all published large-scale mammographic audit data have been based on Western populations [3-13]. In addition, most studies have been performed with traditional screen-film mammography, with the exception of the Digital Mammographic Imaging Screening Trial (DMIST), which screened with full-field digital mammographic systems from multiple organizations and was coordinated by the DMIST Investigation Group [6].

A standardized medical auditing method was proposed by the ACR in its Breast Imaging Reporting and Data System® (BI-RADS), which details benchmarks for each item in a medical audit [14]. The recommended values come from the Agency for Health Care Policy and Research (AHCPR; now the Agency for Healthcare Research and Quality) [15] and are based on published results in North America. Because the incidence rates of breast cancer are generally lower in Asian countries, the recommended benchmark values should be adjusted to specific Asian populations. According to recent reports by the Department of Health in Taiwan [16], the age-adjusted incidence rate of breast cancer in 2002 was 43.3 per 100,000 women. The corresponding rate in the United States was 129.1 per 100,000 women between 2000 and 2003 [17]. Thus, the incidence rate in Taiwan was about one-third of that in the United States.

In this study, we investigated a proposed calculation method to adjust the BI-RADS/AHCPR recommendations on the basis of different breast cancer incidence rates. We performed a medical audit of 8,249 digital mammographic screening examinations of Taiwanese women and compared the results with published results from North America as well as with benchmarks based on our proposed adjustments for the Taiwanese population.

METHODS

Calculating Methods to Adjust the Medical Auditing Recommendations

The methods we propose here are based on the assumption that screening mammography performs equally well in detecting breast cancers among different populations. With this assumption, we retained the same values for sensitivity (85%) and specificity (90%) documented by the BI-RADS/AHCPR recommendations. Let $I_{US}$ be the breast cancer incidence rate in the United States (129.1 per 100,000) and $I_{TW}$ be the incidence rate in Taiwan (43.3 per 100,000). The ratio $I_{TW}/I_{US}$ is close to one-third. Because no risk assessment protocol was applied before screening mammography, we also assumed that the incidence rate of the screening group is representative of that of the general population. Thus, we could use the population incidence rate as the screening-group incidence rate. The equations for screening-group cancer incidence ($I$) and cancer detection rate (CDR) are thus

\[ I = \frac{TP + FN}{TP + FP + TN + FN} \tag{1} \]

and

\[ CDR = \frac{TP}{TP + FP + TN + FN}, \tag{2} \]

where TP is the number of true-positive results, FP is the number of false-positive results, TN is the number of true-negative results, and FN is the number of false-negative results. With sensitivity defined as TP/(TP + FN) and set to 0.85 by our assumption, FN is proportional to TP. We can thus replace (TP + FN) in the numerator of equation 1 with TP divided by the sensitivity, a constant, so that $I = CDR/\text{sensitivity}$. Thus, the ratio of CDRs between 2 different populations would be the same as the ratio of cancer incidence. To illustrate, we demonstrate the ratio of CDRs between Taiwan and the United States in equation 3. This is the equation we propose to adjust the recommendations for CDR to the Taiwanese population:

\[ \frac{CDR_{TW}}{CDR_{US}} = \frac{I_{TW}}{I_{US}} = \frac{43.3}{129.1} = 0.335. \tag{3} \]

To adjust the recommendation for the positive predictive value (PPV) of the initial screening mammographic examination (PPV1), additional calculations are necessary. Let $W$ be the number in the screening cohort, so that $W = TP + FP + TN + FN$. From the definitions of sensitivity and specificity, equations 4, 5, and 6 are as follows:

\[ \text{sensitivity} = \frac{TP}{TP + FN} = \frac{TP}{I \times W}, \qquad TP = I \times W \times \text{sensitivity}, \tag{4} \]

\[ \text{specificity} = \frac{TN}{TN + FP} = \frac{TN}{(1 - I) \times W}, \qquad TN = (1 - I) \times W \times \text{specificity}, \tag{5} \]

and

\[ FP = (1 - I) \times W - TN = (1 - I) \times W \times (1 - \text{specificity}). \tag{6} \]

Because PPV is defined as TP/(TP + FP), using equations 4 and 6, equation 7 for PPV can be derived:

\[ PPV = \frac{TP}{TP + FP} = \frac{I \times \text{sensitivity}}{I \times \text{sensitivity} + (1 - I) \times (1 - \text{specificity})}. \tag{7} \]

With the same sensitivity and specificity by our assumption, the ratio of PPV1 in Taiwan ($PPV1_{TW}$) to that in the United States ($PPV1_{US}$) would be

\[ \frac{PPV1_{TW}}{PPV1_{US}} = \frac{I_{US} \times I_{TW} \times \text{sensitivity} + (1 - \text{specificity}) \times (1 - I_{US}) \times I_{TW}}{I_{US} \times I_{TW} \times \text{sensitivity} + (1 - \text{specificity}) \times (1 - I_{TW}) \times I_{US}}. \tag{8} \]
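As a numerical sketch of equations 3, 7, and 8, the following Python snippet evaluates both ratios from the incidence rates, sensitivity, and specificity stated above and scales the BI-RADS/AHCPR ranges accordingly. The function and variable names (`ppv`, `scale_range`, `cdr_ratio`, `ppv1_ratio`) are illustrative assumptions, not part of the original method description, and rounding to one decimal place is an assumed convention. The PPV1 ratio computed this way lands close to one-third; small differences in the third decimal place may arise from rounding of the inputs.

```python
# Numerical sketch of equations 3, 7, and 8.
# All function and variable names are illustrative, not taken from the paper.


def ppv(incidence: float, sensitivity: float, specificity: float) -> float:
    """Equation 7: PPV as a function of incidence, sensitivity, and specificity."""
    true_pos = incidence * sensitivity                   # TP / W, from equation 4
    false_pos = (1.0 - incidence) * (1.0 - specificity)  # FP / W, from equation 6
    return true_pos / (true_pos + false_pos)


def scale_range(low: float, high: float, ratio: float) -> tuple[float, float]:
    """Scale a benchmark range by a ratio (assumed rounding: one decimal place)."""
    return round(low * ratio, 1), round(high * ratio, 1)


# Inputs stated in the text.
I_US = 129.1 / 100_000   # incidence, United States
I_TW = 43.3 / 100_000    # incidence, Taiwan
SENSITIVITY = 0.85
SPECIFICITY = 0.90

cdr_ratio = I_TW / I_US  # equation 3
ppv1_ratio = ppv(I_TW, SENSITIVITY, SPECIFICITY) / ppv(I_US, SENSITIVITY, SPECIFICITY)  # equation 8

adjusted_cdr = scale_range(2.0, 10.0, cdr_ratio)    # BI-RADS/AHCPR: 2-10 per 1,000
adjusted_ppv1 = scale_range(5.0, 10.0, ppv1_ratio)  # BI-RADS/AHCPR: 5%-10%

print(f"CDR ratio:  {cdr_ratio:.3f}")
print(f"PPV1 ratio: {ppv1_ratio:.3f}")
print(f"Adjusted CDR range (per 1,000): {adjusted_cdr}")
print(f"Adjusted PPV1 range (%):        {adjusted_ppv1}")
```

Replacing `I_TW` with another population's incidence rate gives the corresponding adjustment for that population under the same equal-performance assumption.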

Inserting the values for $I_{US}$, $I_{TW}$, sensitivity, and specificity into equation 8 results in $PPV1_{TW}/PPV1_{US} = 0.336$. This is the equation we propose to adjust PPV1.

The other items in the medical audit are correlated with the quality and performance of screening mammography rather than with cancer incidence rates, so no further adjustment of the BI-RADS/AHCPR recommendations is necessary. They include the PPV of diagnostic positive results (PPV2), the PPV of biopsy results (PPV3), minimal cancer percentage, node-negative cancer percentage, and the recall rate.

Medical Audit of a Screening Mammography Practice in Taiwan

We retrospectively audited a total of 8,249 consecutive digital mammographic screening examinations performed over a period of 4 years, from 2002 to 2005, in a hospital-based breast imaging department in Taiwan. The candidates came from the Taiwanese population, in which the largest ethnic group is Han Chinese (98%); the remaining 2% is composed of indigenous peoples of Taiwan and immigrants from Southeast Asia. The age of our screening cohort ranged from 40 to 89 years, with a mean age of 54.4 years. The screening facility operated one full-field digital mammographic scanner (Senographe 2000D; GE Medical Systems, Milwaukee, Wisconsin). The site was also accredited for performing mammography by the Taiwan Department of Health.

The screening candidates came from two referral groups: the larger group was from the governmental screening program, consisting of 4,889 examinations (59.3%), and the smaller group consisted of self-referrals, with 3,360 examinations (40.7%). All screening mammograms were interpreted by board-certified radiologists who specialized in breast imaging and were approved to read screening mammograms. All interpretations, reporting, and medical outcome audits followed BI-RADS standards [14]. Four images (bilateral craniocaudal and mediolateral oblique views) were obtained for all examinations. Additional images were taken or ultrasound was performed only when the initial 4 images resulted in a BI-RADS category of 0; that is, the study was incomplete and additional imaging was necessary for evaluation.

The interpretations were categorized as positive if the results were category 0 (need additional imaging evaluation or prior mammograms for comparison), 4 (suspicious abnormality), or 5 (highly suggestive of malignancy); interpretations were categorized as negative for categories 1 (negative), 2 (benign findings), and 3 (probably benign findings). Women with "probably benign findings" were all to receive subsequent follow-up after a 6-month period. For any imaging finding considered to require immediate follow-up within 6 months, the category would be 4 (normally 4a) instead of 3.

A mammography tracking system developed by Chen et al [18] was used for the purpose of medical outcome auditing. To fulfill the requirements of the Mammography Quality Standards Act for medical auditing and outcome analysis, the system was designed to track the positive mammographic results of BI-RADS categories 0, 4, and 5; record final assessment categories; collect pathologic results for all biopsies; correlate pathologic results with the final BI-RADS assessment categories; record any case of breast cancer with FN mammographic results; and generate medical audit data for any period of time, different referral groups, different age groups, and different interpreting radiologists.

The FN studies were defined by cancer diagnosed within one year after negative screening mammographic results, including BI-RADS category 1, 2, or 3. The cancer could be diagnosed either clinically as a symptomatic interval cancer or by abnormal results on a subsequent screening examination performed within one year. The subsequent screening examination could be either ultrasound or mammography, depending on the choice of self-referred women or the policy of the governmental screening programs.
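The positive/negative grouping of BI-RADS categories and the one-year FN definition described above can be sketched as a small classification helper. This is a minimal illustration only: the names `is_positive_screen` and `classify_outcome` are hypothetical, and treating a positive screen followed by a cancer diagnosis within one year as TP is an assumption that mirrors the stated FN definition rather than a rule quoted from this section.

```python
# Illustrative sketch of the outcome classification described in the text.
# Function names are hypothetical; the TP rule is an assumed mirror of the
# stated FN definition (cancer diagnosed within one year of the screen).

POSITIVE_CATEGORIES = {0, 4, 5}   # need additional imaging, suspicious, highly suggestive
NEGATIVE_CATEGORIES = {1, 2, 3}   # negative, benign, probably benign


def is_positive_screen(birads_category: int) -> bool:
    """Return True when the screening interpretation counts as positive."""
    if birads_category in POSITIVE_CATEGORIES:
        return True
    if birads_category in NEGATIVE_CATEGORIES:
        return False
    raise ValueError(f"unexpected BI-RADS screening category: {birads_category}")


def classify_outcome(birads_category: int, cancer_within_one_year: bool) -> str:
    """Label one screening examination as TP, FP, TN, or FN."""
    if is_positive_screen(birads_category):
        return "TP" if cancer_within_one_year else "FP"
    return "FN" if cancer_within_one_year else "TN"


# Example: a category 3 screen followed by a cancer diagnosis within one year
# is counted as a false negative.
print(classify_outcome(3, cancer_within_one_year=True))
```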
There are 2 different governmental screening programs in Taiwan: one offers screening mammography biannually for women aged 50 to 69 years and is reimbursed by the National Health Insurance, and the other is a screening trial directed by the Taiwan Department of Health for women aged 40 to 49 years that offers annual screening alternatively by mammography or ultrasound. Candidates for the governmental screening programs who lived in the same county were referred to our hospital to undergo screening. The TN studies and non-biopsy-proven FP studies were later confirmed by negative results on subsequent screening examinations.

According to the follow-up and outcome-monitoring sections in BI-RADS, the items listed in the basic clinically relevant mammography audit were calculated, including TP results, FP results (FP1, FP2, and FP3), PPV (PPV1, PPV2, and PPV3), the CDR for screening examinations, the percentage of minimal cancers found, the percentage of node-negative invasive cancers found, the abnormal interpretation (recall) rate of screening cases, and sensitivity and specificity. FP1 and PPV1 were calculated on the basis of abnormal findings at screening examinations (ie, initial BI-RADS category 0, 4, or 5). FP2 and PPV2 were calculated on the basis of recommendations for biopsy or surgical consultation (ie, final diagnostic BI-RADS category 4 or 5). FP3 and PPV3 were calculated on the basis of the results of performed biopsies. PPV3 is also known as the positive biopsy rate. Minimal cancer was defined as invasive cancer equal to or less than 1 cm in size, or ductal carcinoma in situ.

In addition to calculating the audit results for the whole cohort, we divided the cohort into smaller groups by age interval (Table 1). We also calculated the percentages of initial screening results in each BI-RADS category and compared them with larger scale data from the Breast Cancer Surveillance Consortium (BCSC) of the United States [19] (Table 2). The initial screening BI-RADS categories were collected from the results of the original screening mammographic examinations. Thus, they comprise all the different categories from 0 to 5. For patients categorized as 0 at the initial screening stage, further imaging studies would be recommended for abnormal or uncertain image findings. Thereafter, the final diagnostic BI-RADS categories were entered into our computerized tracking system after the additional imaging studies were finished.

To compare overall performance, we constructed a table of our medical audit results, accompanied by the results of 4 prior studies from DMIST [6]; the New Mexico Mammography Project [10]; the University of California, San Francisco [20]; and British Columbia [13]. We also included the BI-RADS/AHCPR recommendations and the modified recommendations we proposed for the Taiwanese population (Table 3).

Table 1. BI-RADS medical audit results by cohort and different age groups

| Medical Audit Item | Cohort | Age 40–49 | Age 50–59 | Age >60 |
|---|---|---|---|---|
| Examination number | 8,249 | 2,062 | 4,055 | 2,132 |
| True-positive results | 22 | 3 | 11 | 8 |
| False-positive (FP) results: FP1 | 685 | 216 | 347 | 122 |
| False-positive (FP) results: FP2 | 114 | 38 | 61 | 15 |
| False-positive (FP) results: FP3 | 67 | 22 | 34 | 11 |
| Positive predictive value (PPV): PPV1 | 3.1% | 1.4% | 3.1% | 6.2% |
| Positive predictive value (PPV): PPV2 | 16.2% | 7.3% | 15.3% | 34.8% |
| Positive predictive value (PPV): PPV3 | 24.7% | 12.0% | 24.4% | 42.1% |
| Cancer detection rate (per 1,000) | 2.7 | 1.5 | 2.7 | 3.8 |
| Minimal cancer percentage | 50% | 100% | 54.6% | 25.0% |
| Node-negative cancer percentage | 71.4% | 100% | 75.0% | 50% |
| Recall rate | 8.5% | 10.6% | 8.8% | 6.1% |
| Sensitivity | 81.5% | 100% | 73.3% | 88.9% |
| Specificity | 91.7% | 89.5% | 91.4% | 94.3% |

Note: BI-RADS = Breast Imaging Reporting and Data System.

Table 2. Distribution of screening cohort in BI-RADS categories

| Category* | Our Screening Results in Taiwan, n (%) | Data From Breast Cancer Surveillance Consortium [19], n (%) |
|---|---|---|
| 0 | 672 (8.1%) | 176,922 (7.7%) |
| 1 | 6,798 (82.4%) | 1,552,941 (67.8%) |
| 2 | 661 (8.0%) | 488,543 (21.3%) |
| 3 | 83 (1.0%) | 42,346 (1.8%) |
| 4 | 24 (0.3%) | 9,540 (0.4%) |
| 5 | 11 (0.1%) | 1,244 (0.1%) |
| Total | 8,249 | 2,271,536 |

Note: BI-RADS = Breast Imaging Reporting and Data System.
*BI-RADS assessment categories [14]: 0 = need additional imaging evaluation and/or prior mammograms for comparison; 1 = negative; 2 = benign findings; 3 = probably benign finding, initial short-interval follow-up suggested; 4 = suspicious abnormality, biopsy should be considered; 5 = highly suggestive of malignancy, appropriate action should be taken.

RESULTS

Proposed Medical Audit Recommendations for the Taiwanese Population

With the modifications we proposed in equations 3 and 8, only 2 items from the BI-RADS/AHCPR recommendations needed to be adjusted for the different cancer incidence rates: the CDR and PPV1. According to our calculations for the Taiwanese population, the CDR should be adjusted from 2 to 10 per 1,000 down to 0.7 to 3.4 per 1,000, and PPV1 should be adjusted from 5% to 10% down to 1.7% to 3.4%. Recommendations for the other items remained the same. Our adjusted recommendations are listed in the rightmost column in Table 3.

Table 3. Comparison of medical audit results with outcomes from other studies and AHCPR recommendation in BI-RADS [14, 15]

| Audit Measure | BI-RADS AHCPR Recommendation | British Columbia Outcome [13] | UCSF Outcome [20] | New Mexico Mammography Project [10] | DMIST, BI-RADS Score on Digital Mammography [6] | Our Medical Audit Results, Taiwan | Adjusted Recommendation for Taiwan |
|---|---|---|---|---|---|---|---|
| PPV1 (screen) (%) | 5–10 | 5.8 | 9.3 | 4.3 | 5 | 3.1 | 1.7–3.4 |
| PPV2 (diagnosis) (%) | 25–40 | 34 | 33.5 | 16.9 | - | 16.2 | 25–40 |
| Minimal cancer* (%) | >30 | 44.1 | 56.7 | 49 | 57.8† | 50 | >30 |
| Node positivity (%) | <25 | 13.8 | 11 | 18.3 | 25.6‡ | 28.6 | <25 |
| Cancers per 1,000 screens | 2–10 | 6.2 | [missing in source] | 5 | 4.3§ | 2.7 | 0.7–3.4 |
| Recall rate (%) | <10 | 6.8 | 5.2 | 11.4 | 8.6 | 8.5 | <10 |
| Sensitivity (%) | >85 | 84 | 89.5 | 79.9 | 70 | 81.5 | >85 |
| Specificity (%) | >90 | 92.9 | 95.2 | 90.5 | 92 | 91.7 | >90 |

Note: AHCPR = Agency for Health Care Policy and Research; BI-RADS = Breast Imaging Reporting and Data System; DMIST = Digital Mammographic Imaging Screening Trial; PPV = positive predictive value; UCSF = University of California, San Francisco.
*Minimal cancer is invasive cancer equal to or less than 1 cm or ductal carcinoma in situ.
†Minimal cancer was secondarily acquired by summation of stages Tis, T1mic, T1a, and T1b on digital mammography in DMIST.
‡Node positivity was secondarily acquired by applying the equation (N1 + N2)/(N0 + N1 + N2) on digital mammography in DMIST.
§Cancer detection rate was secondarily acquired by (number of cancers detected on digital mammography)/(number with verified cancer status screened by digital mammography) in DMIST.

Medical Audit Results

The results of our medical audit are shown in Table 1. The first column gives the results of all 8,249 examinations. There were 89 biopsies, with 22 breast malignancies detected, 11 of which were minimal cancers, including 7 cases of ductal carcinoma in situ and 4 cases of early invasive ductal carcinoma (<1 cm, T1aN0). We calculated PPV1 of 3.1%, PPV2 of 16.2%, and PPV3 of 24.7%. The CDR was 2.7 per 1,000 screening examinations. Minimal cancers constituted 50% of all cancers detected by screening. Node-negative invasive cancers were found to be 71.4%. The overall recall rate (ie, examinations with initial BI-RADS categories of 0, 4, or 5) was 8.5%.

In total, 5 FN results were revealed later and recorded in our tracking system. One was a cluster of microcalcifications initially categorized as probably benign, with malignancy later diagnosed during the 6-month follow-up study. Another was a small invasive ductal carcinoma revealed by subsequent ultrasound screening within one year. The remaining 3 FN studies presented as interval cancers, which were clinically detected within one year. Thus, we had sensitivity of 81.5% and specificity of 91.7%. Under the definition of BI-RADS, an FN result is defined as having a tissue diagnosis of breast cancer within one year after a negative mammographic examination result (ie, BI-RADS category 1, 2, or 3).
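The cohort-level audit measures reported above can be recomputed from the counts in Table 1 and the 5 FN results. The sketch below uses the definitions given in the Methods; the `AuditCounts` class and `audit_measures` function are illustrative names, and deriving TN as the remainder of the cohort (screens minus TP, FP1, and FN) is an assumption that every examination received one of the four labels.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Raw counts for one audit group (names are illustrative)."""
    screens: int  # number of screening examinations
    tp: int       # true-positive results
    fp1: int      # positive screens (category 0, 4, or 5) without cancer
    fp2: int      # biopsy recommended (final category 4 or 5) without cancer
    fp3: int      # biopsy performed, benign result
    fn: int       # cancers diagnosed within one year of a negative screen


def audit_measures(c: AuditCounts) -> dict[str, float]:
    """Compute the basic audit measures as percentages (CDR per 1,000)."""
    # Assumption: TN is whatever remains after TP, FP1, and FN are removed.
    tn = c.screens - c.tp - c.fp1 - c.fn
    return {
        "PPV1 (%)": 100 * c.tp / (c.tp + c.fp1),
        "PPV2 (%)": 100 * c.tp / (c.tp + c.fp2),
        "PPV3 (%)": 100 * c.tp / (c.tp + c.fp3),
        "CDR (per 1,000)": 1000 * c.tp / c.screens,
        "Recall rate (%)": 100 * (c.tp + c.fp1) / c.screens,
        "Sensitivity (%)": 100 * c.tp / (c.tp + c.fn),
        "Specificity (%)": 100 * tn / (tn + c.fp1),
    }


# Whole-cohort counts from Table 1, with the 5 FN results from the text.
cohort = AuditCounts(screens=8249, tp=22, fp1=685, fp2=114, fp3=67, fn=5)
measures = audit_measures(cohort)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the recomputed values agree with the reported cohort figures to within rounding.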
Across the age groups in the right columns of Table 1, the PPVs and the CDR were highest in the oldest group (aged >60 years), with 6.2% for PPV1, 34.8% for PPV2, 42.1% for PPV3, and 3.8 per 1,000 for the CDR. They were lowest in the group aged 40 to 49 years, at 1.4% for PPV1, 7.3% for PPV2, 12.0% for PPV3, and 1.5 per 1,000 for the CDR. Other noticeable results were a higher recall rate of 10.6% in the youngest group (40-49 years) and lower recall rates of 8.8% and 6.1% in the 2 older groups.

BI-RADS Categories of the Screening Cohort

In Table 2, the number and percentage of examinations in each BI-RADS category in our screening mammography cohort are shown in the left column. There were 672 examinations in category 0 in the initial screening results, which constituted 8.1% of the cohort. Normally, these women would need additional imaging studies for abnormal or uncertain image findings. Unfortunately, 174 of them did not complete the studies; they constituted 2.0% of the cohort and were marked as "lost to follow-up." This situation is a common problem in real-world mammographic screening. We had initial positive results of 8.5% in categories 0, 4, and 5, which also represents the recall rate. Our initial positive results in categories 4 and 5 were 0.4%.

A recently published study with larger scale data from the BCSC is listed in the right column for comparison [19]. Although some difference is apparent between categories 1 (negative) and 2 (benign), we obtained similar results when we combined these 2 categories (90.4% for our study and 89.1% for the BCSC study). Because these 2 categories are both considered negative clinically, it is reasonable to combine them for comparison. Overall, we had similar percentages in each category compared with the BCSC's results.

Comparison of Medical Audit Results

In Table 3, results from our medical audit of screening mammography are listed together with those of 4 prior studies in North America, the BI-RADS/AHCPR recommendations, and the adjusted recommendations we proposed for Taiwanese women. Our PPV1, PPV2, and CDR were generally lower than those of the other studies in North America. However, both PPV1 and the CDR fit our adjusted recommendations. Although sensitivity was not as high as the recommended 85%, it was still comparable with that of the other studies. The recall rate, specificity, and minimal cancer percentage all fell within the range of recommendations. However, our node positivity was over the recommended margin.

DISCUSSION

The Proposed Adjustment of the BI-RADS/AHCPR Recommendations

Because of different breast cancer incidence rates between Taiwan and the United States, we proposed calculation methods to adjust the original recommendations of BI-RADS and the AHCPR (Table 3).
We proposed changing two items, PPV1 and the CDR, because the difference in breast cancer incidence rates affects both. The other items in the medical audit are related to the performance of mammography, and recommendations for them are not expected to change, assuming equal performance of screening mammography. Although PPV3 is not included in Table 3 and has not been addressed in the BI-RADS/AHCPR recommendations, we could assume that its recommendation is equal to that for PPV2. Ideally, if all candidates diagnosed in BI-RADS categories 4 and 5 underwent tissue diagnosis, the positive results in diagnosis and in performed biopsies would be the same. With the same denominators, PPV2 and PPV3 would have equal values.

The Effectiveness of Screening Mammography for the Taiwanese Population

In comparison with the BI-RADS/AHCPR recommendations [15] as well as results from DMIST [6], the New Mexico Mammography Project [10], the University of California, San Francisco [20], and British Columbia [13] (Table 3), the PPV1 of 3.1% from our screening cohort satisfies the adjusted recommendation we proposed (from 1.7% to 3.4%). As expected, it was lower than that of the other 4 studies (range, 4.3%-9.3%) as well as the initial recommendation of 5% to 10%. Thus, our actual medical audit result for PPV1 and our proposed modification of PPV1 are consistent.

However, our PPV2 of 16.2% was below the recommended value of 25% to 40%. Note that a similar PPV2 of 16.9% was reported by the New Mexico project [10] (Table 3). Given our PPV3 of 24.7%, the main reason for our unsatisfactory PPV2 could be a low biopsy completion rate of only 61.4%. That means that 38.6% of the candidates for whom we suggested biopsy did not follow through. Some of them switched to other hospitals for second opinions. Others were simply lost to follow-up, and some decided to take herbal medicines instead of modern treatment. To handle this problem, an accessible national cancer registry would be helpful for tracking the results. The enhancement of public education would be one way to improve public awareness and compliance. The relatively low biopsy rate is a notable problem in a real-world mammographic screening practice.

Other results in our study more closely fit the desired values. The CDR of 2.7 per 1,000 is an example. It not only satisfies the initial BI-RADS/AHCPR recommendation of 2 to 10 per 1,000 but also falls within our modified recommendation of 0.7 to 3.4 per 1,000. Although the first screening mammographic examination should have a higher CDR (prevalent cancers) than subsequent screening examinations (incident cancers), we did not separate the calculations, because our cohort of 8,249 was not large enough to prevent possible bias. Our minimal cancer percentage of 50% and recall rate of 8.5% both also fit the recommendations. The node positivity percentage of 28.6% is a little above the recommended margin. These factors could mainly be used to address the quality and performance of our screening mammography, with the CDR serving "as a useful measure of the effectiveness of screening mammography" [1].
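The comparison of our audit results with the adjusted recommendations can be sketched as a simple range check. The values below are taken from the cohort results and the adjusted recommendations for Taiwan; the dictionary layout, the `within` helper, and the use of inclusive bounds are illustrative simplifications rather than part of the BI-RADS audit procedure.

```python
# Illustrative range check of audit results against the adjusted
# recommendations for Taiwan. Names and inclusive bounds are assumptions.

# (lower bound, upper bound); None means the side is unbounded.
ADJUSTED_RECOMMENDATIONS = {
    "PPV1 (%)": (1.7, 3.4),
    "PPV2 (%)": (25.0, 40.0),
    "Minimal cancer (%)": (30.0, None),
    "Node positivity (%)": (None, 25.0),
    "Cancers per 1,000 screens": (0.7, 3.4),
    "Recall rate (%)": (None, 10.0),
    "Sensitivity (%)": (85.0, None),
    "Specificity (%)": (90.0, None),
}

AUDIT_RESULTS = {
    "PPV1 (%)": 3.1,
    "PPV2 (%)": 16.2,
    "Minimal cancer (%)": 50.0,
    "Node positivity (%)": 28.6,
    "Cancers per 1,000 screens": 2.7,
    "Recall rate (%)": 8.5,
    "Sensitivity (%)": 81.5,
    "Specificity (%)": 91.7,
}


def within(value, low, high):
    """Return True when value lies inside the (possibly one-sided) range."""
    return (low is None or value >= low) and (high is None or value <= high)


outside = [
    measure
    for measure, (low, high) in ADJUSTED_RECOMMENDATIONS.items()
    if not within(AUDIT_RESULTS[measure], low, high)
]

for measure, (low, high) in ADJUSTED_RECOMMENDATIONS.items():
    status = "outside" if measure in outside else "within"
    print(f"{measure}: {AUDIT_RESULTS[measure]} ({status} recommendation)")
```

The measures flagged as outside the recommendations by this check are PPV2, node positivity, and sensitivity, matching the exceptions discussed in the text.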

Thus, the effectiveness of screening mammography in the Taiwanese population is acceptable according to our results. Although we did not have large randomized controlled trials to prove decreasing mortality resulting from screening mammography, we were at least able to detect a satisfactory percentage of early breast cancers, which tends to reduce the mortality rate. Overall, we calculated a recall rate of 8.5%, sensitivity of 81.5%, and specificity of 91.7%, which fit or were close to the margins of the medical audit recommendations.

There is still some debate in Taiwan and other Asian countries regarding the appropriateness of using mammography in breast cancer screening. Some argue that the higher breast tissue density in Asian women decreases the ability of mammography to detect cancer, resulting in lower detection rates than in Western countries. The lower PPV1 and CDR in our medical audit are well explained by, and can be corrected for, the difference in breast cancer incidence rates. With similar sensitivity and specificity, and a good minimal cancer detection percentage in our results, we still conclude that screening mammography can perform almost equally well in the Taiwanese population.

In our study, we experienced 5 FN results. These patients' cancers were later detected either by subsequent ultrasound screening, as lesions without microcalcifications, or as interval cancers. Ultrasound has been reported to be better at detecting occult breast masses in dense breasts [21,22]. However, the feasibility of applying ultrasound to mass screening for breast cancer is not yet known. We do not know whether ultrasound achieves the same CDR and minimal cancer percentage in the same population, nor do we know if it can reduce cancer mortality. Empirically, microcalcifications, a major feature of early breast cancers, are much more difficult to detect by ultrasound than by mammography.

Analysis of Different Age Subgroups

According to our results in Table 1, among the groups of different age intervals, the oldest group (aged more than 60 years) had the highest PPVs (PPV1 of 6.2%, PPV2 of 34.8%, and PPV3 of 42.1%) and the highest CDR (3.8 per 1,000). These results indicate better performance of mammography in the oldest group. However, the percentage of minimal cancers detected in the oldest age group is not adequate: at 25%, it is less than the generally desired 30%. That means that although cancer is more easily found in older women, it is less likely to be in an early stage. In addition, specificity is also highest in the oldest age group (94.3%). This finding is compatible with experience. Breast tissue in older women is less dense, making us more confident in interpreting mammographic results as negative. According to the description in BI-RADS, dense fibroglandular tissue could lower the sensitivity of mammography or even obscure a lesion.

CONCLUSION

Recommendations in medical auditing for screening mammography should be adjusted for populations or countries with different breast cancer incidence rates. We proposed calculation methods to modify the recommended PPV1 and CDR according to the known difference in cancer incidence rates between Taiwan and Western countries. These modifications are crucial in applying governmental regulation of mammography quality standards on the basis of medical audit results. The methods of adjustment we describe can also be applied to other countries with breast cancer incidence rates known to be different. We conclude that mammographic screening for breast cancer performs well in our population. More data are necessary to evaluate the feasibility of screening younger women.

ACKNOWLEDGMENTS

We would like to thank Robert A. Greenes, MD, PhD, Lucila Ohno-Machado, MD, PhD, and all our colleagues in the Decision Systems Group at Brigham and Women's Hospital for their guidance and support.

REFERENCES

1. Linver MN, Osuch JR, Brenner RJ, et al. The mammography audit: a primer for the Mammography Quality Standards Act (MQSA). AJR Am J Roentgenol 1995;165:19-25.
2. Monsees BS. The Mammography Quality Standards Act: an overview of the regulations and guidance. Radiol Clin North Am 2000;38:759-71.
3. Linver MN, Paster SB, Rosenberg RD, et al. Improvement in mammography interpretation skills in a community radiology practice after dedicated teaching course: 2-year medical audit of 38,633 cases. Radiology 1992;184:39-43.
4. Schmidt F, Hartwagner KA, Spork EB, et al. Medical audit after 26,711 breast imaging studies: improved rate of detection of small breast carcinomas (classified as Tis or T1a,b). Cancer 1998;83:2516-20.
5. Sickles EA, Ominsky SH, Sollitto RA, et al. Medical audit of a rapid-throughput mammography screening practice: methodology and results of 27,114 examinations. Radiology 1990;175:323-7.
6. Pisano ED, Gatsonis C, Hendrick E, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005;353:1773-83.
7. Dee KE, Sickles EA. Medical audit of diagnostic mammography examinations: comparison with screening outcomes obtained concurrently. AJR Am J Roentgenol 2001;176:729-33.
8. Robertson CL. A private breast imaging practice: medical audit of 25,788 screening and 1,077 diagnostic examinations. Radiology 1993;187:75-9.
9. Tabar L, Vitak B, Chen HH, et al. The Swedish two-county trial twenty years later. Radiol Clin North Am 2000;38:625-51.
10. Rosenberg RD, Lando JF, Hunt WC, et al. The New Mexico Mammography Project: screening mammography performance in Albuquerque, New Mexico, 1991 to 1993. Cancer 1996;78:1731-9.
11. Thurfjell E. Population-based mammography screening in clinical practice: results from the prevalence round in Uppsala county. Acta Radiol 1994;35:487-91.
12. Lynde JL. Low-cost screening mammography: results of 21,141 consecutive examinations in a community program. South Med J 1993;86:338-43.
13. Burhenne LJ, Burhenne HJ, Kan L. Quality-oriented mass mammography screening. Radiology 1995;194:185-8.
14. American College of Radiology. ACR BI-RADS - mammography. 4th ed. Reston, Va: American College of Radiology; 2003.
15. Bassett LW, Hendrick RE, Bassford TL, et al. Quality determinants of mammography. Clinical Practice Guideline No. 13. AHCPR Publication No. 95-0632. Rockville, Md: Agency for Health Care Policy and Research; 1994.
16. Department of Health, Taiwan. Health and vital statistics. Taiwan: Department of Health; 2004.
17. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat database: incidence - SEER 17 regs public-use, Nov 2005 Sub (2000-2003), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2006, based on the November 2005 submission.
18. Chen CY, Mak CW, Chen CH, et al. Using Microsoft® Office in BIRADS mammographic outcome monitoring. Chin J Radiol 2005;30:341-6.
19. Jiang Y, Miglioretti DL, Metz CE, et al. Breast cancer detection rate: designing imaging trials to demonstrate improvements. Radiology 2007;243:360-7.
20. Sickles EA. How to conduct an audit. In: Kopans DB, editor. Categorical course in breast imaging. Oak Brook, Ill: Radiological Society of North America; 1995:81-91.
21. Wu GH, Chen LS, Chang KJ, et al. Evolution of breast cancer screening in countries with intermediate and increasing incidence of breast cancer. J Med Screen 2006;13:23-7.
22. Kolb TM, Lichy J, Newhouse JH. Occult cancer in women with dense breasts: detection with screening US - diagnostic yield and tumor characteristics. Radiology 1998;207:191-9.