Intraindividual Comparison of Two Methods of Volumetric Breast Composition Assessment
Constanze Schmachtenberg, Sophie Hammann-Kloss, Ulrich Bick, Florian Engelken
Rationale and Objectives: To compare the results of two software-based methods, Quantra and Volpara, for volumetric breast composition assessment.
Materials and Methods: Four hundred forty-five normal, bilateral, two-view, digital mammograms were included. Breast volume (BV), fibroglandular tissue volume (FTV), and percent density (PD) were measured using both methods and compared. Deming regression was performed to obtain linear equations for mapping the results of one software on the other.
Results: The median and quartile ranges of both methods agreed well for BV but were different for FTV and PD, with Quantra showing much higher values of FTV and PD. The correlation of results obtained by both methods for BV, FTV, and PD was 0.99, 0.91, and 0.94, respectively. Intraclass correlation in the assignment of quartiles of BV, FTV, and PD was 0.96, 0.86, and 0.90, respectively. Both methods showed a similar association of FTV and PD with patient age and similar left-to-right correlation. Mapping of results onto each other using linear equations removed the systematic differences.
Conclusions: Although Quantra and Volpara use different models for analysis of volumetric breast composition and produce different nominal results of FTV and PD, both methods are highly correlated and show very good to excellent agreement in quartile assignment of all parameters measured. Both methods show a similar association with patient age and similar reproducibility. Both methods can be mapped onto each other using the equations suggested.
Key Words: Mammary glands; human; mammography; automatic data processing; breast density.
©AUR, 2015
Acad Radiol 2015; 22:447–452. From the Department of Radiology, Charité–Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany (C.S., U.B., F.E.); and Department of Radiology, Ärztlicher Dienst, Evangelisches Geriatriezentrum Berlin gGmbH, Berlin, Germany (S.H.K.). Received July 31, 2014; accepted December 6, 2014. Address correspondence to: C.S. e-mail: constanze.[email protected] ©AUR, 2015 http://dx.doi.org/10.1016/j.acra.2014.12.003
Mammographic breast density is a strong predictor of breast cancer (1,2). Recent studies indicate that it may also be a useful parameter for assessing the response to tamoxifen (3,4) and for estimating the prognosis of breast cancer patients (5,6). Breast density also strongly affects the sensitivity of screening mammography and may be used as a variable to individualize screening regimens in the future (7).
Visual breast density assessment, which is used in most clinical and many research settings, is fast but is limited by only moderate inter-reader and intra-reader agreement (8–10). Semiquantitative methods were developed to improve reproducibility, but are time consuming because they require reader interaction (11). More recently, several methods for automated volumetric breast composition assessment have been introduced, which provide fast and highly reproducible quantification of breast density and absolute volumes of breast tissue components (12,13). Different models are used to derive the volumes of breast tissue components from digital mammogram raw data, and these produce very different results (14–16). This potentially limits the applicability of the results of research studies using either technique.
Whereas the Quantra software (Hologic Inc., Bedford, MA) uses the acquisition parameters of the mammogram together with a model of x-ray attenuation of different breast tissues, the Volpara software (Matakina, Wellington, New Zealand) uses "relative physics," in which features of the mammogram being analyzed are used for calibration (14). Individual software algorithms have been compared to visual assessment of breast density (15,17,18) and to magnetic resonance imaging (MRI) as a reference standard (14,19). However, owing to the different models used, it is not clear how the results obtained with different software packages can be compared to each other.
The aim of this study was to perform an intraindividual comparison of two volumetric breast composition assessment software methods, Quantra and Volpara, 1) to determine whether there is a linear or a more complex relationship between the results obtained with both methods and 2) to determine how results from one software package can be compared to those from the other.
SCHMACHTENBERG ET AL
Academic Radiology, Vol 22, No 4, April 2015
Figure 1. Example of a mammogram (left side only). The results of the volumetric breast composition analysis were as follows: Quantra: breast volume (BV), 593 cc; fibroglandular tissue volume (FTV), 89 cc; percent density (PD), 15%; Volpara: BV, 519 cc; FTV, 67 cc; PD, 13%.
MATERIALS AND METHODS
This retrospective study was approved by the local institutional review board.
Volumetric Breast Composition Analysis Software
Raw image data of all four views were analyzed with Quantra 2.0 and Volpara Research, version 1.4.3. The software determined breast volume (BV), fibroglandular tissue volume (FTV), and breast percent density (PD). Unless stated otherwise, BV, FTV, and PD of both sides were averaged. An example of a mammogram (left side only) and the results of the volumetric breast composition analysis are shown in Figure 1.
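As a rough plausibility check on how the three parameters relate, the following Python sketch recomputes PD from the volumes given in the Figure 1 caption. It assumes that PD is simply FTV divided by BV, expressed as a percentage; the text does not spell out this formula, and the function name `percent_density` is introduced here for illustration only.

```python
def percent_density(ftv_cm3: float, bv_cm3: float) -> float:
    """Percent density, assuming PD = 100 * FTV / BV (an assumption, see lead-in)."""
    if bv_cm3 <= 0:
        raise ValueError("breast volume must be positive")
    return 100.0 * ftv_cm3 / bv_cm3


# Volumes from the Figure 1 caption (left breast only).
quantra_pd = percent_density(89, 593)
volpara_pd = percent_density(67, 519)
print(round(quantra_pd), round(volpara_pd))  # prints: 15 13
```

Under this assumption, the rounded values match the PD of 15% (Quantra) and 13% (Volpara) reported in the caption. The software may compute PD per view before averaging, so agreement on other cases could be less exact.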
Image Data
We analyzed 445 bilateral two-view (craniocaudal and mediolateral oblique) mammograms from a database of patients who were subjects in a longitudinal study on changes in breast composition with aging (20). The mammograms were acquired on the same mammography unit (GE Senographe 2000D, General Electric Company, Fairfield, CT) at our institution between August 2000 and December 2009. Patients with a history of breast surgery or objects in the projection area were not included in the database. The initial mammogram of each patient was used in the current analysis. Both sides and both views were included. Images were reviewed visually for positioning errors. Minor errors (such as suboptimal depiction of the inframammary fold) were tolerated, whereas sides with major positioning errors were excluded. The minimum normal mammographic follow-up was 2 years. The age range was 28–80 years.
Statistical Analysis
On scatter plots of the results of both methods, lines of best fit using linear, quadratic, cubic, and logarithmic models were drawn. No relevant improvement of goodness of fit (r²) was found for any of the nonlinear models; therefore, linear correlation was assumed. For assessing the correlation of breast composition parameters with patient age and of breast composition parameters of the left and right breasts, the Spearman correlation coefficients were calculated. For assessing correlation of BV, FTV, and PD quartiles, the intraclass correlation coefficient (ICC) for both methods was calculated
INTRAINDIVIDUAL COMPARISON OF TWO METHODS
TABLE 1. Median and Quartile Ranges of Both Softwares

                       Quartile 1   Quartile 2   Quartile 3   Quartile 4   Median
BV (cm³)    Quantra    107–430      431–603      604–855      856–1618     604
            Volpara    98–405       406–563      564–770      771–1565     563
FTV (cm³)   Quantra    16.5–60      61–87        88–131       132–501      87.0
            Volpara    16.0–41      42–55        56–80        81–246       55.4
PD (%)      Quantra    3.0–9.7      9.8–15.0     15.1–23.5    23.6–69.0    15.0
            Volpara    2.3–6.8      6.9–10.8     10.9–16.7    16.8–37.3    10.8

BV, breast volume; FTV, fibroglandular tissue volume; PD, percent density.
using a two-way mixed-effects model for single measures in IBM SPSS Statistics, version 21. For method comparison, the Deming analysis was performed. The coefficient of variation was estimated as follows:

$$C_v = \frac{s}{m} \qquad (1)$$

where $s$ is the sample standard deviation and $m$ is the sample mean.
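The method comparison described in this section can be sketched in Python along the following lines. This is a minimal illustration, not the analysis actually run for the study. The closed-form Deming slope is the standard textbook expression. The error-variance ratio `delta` is not reported in the text, so the value 1.0 (orthogonal regression) is a placeholder assumption. The function names and the sample values are invented for illustration and are not study data.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Equation 1: sample standard deviation divided by sample mean."""
    return stdev(values) / mean(values)


def deming_fit(x, y, delta=1.0):
    """Fit y = intercept + slope * x by Deming regression.

    delta is the assumed ratio of the measurement-error variance of y to
    that of x. The study does not report it; 1.0 is a placeholder.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope


def relative_errors(x, y, intercept, slope):
    """Relative error of each mapped x value against the observed y value."""
    return [(intercept + slope * xi - yi) / yi for xi, yi in zip(x, y)]


# Synthetic FTV values in cm3, invented for illustration (not study data).
quantra_ftv = [35.0, 62.0, 88.0, 131.0, 210.0, 340.0]
volpara_ftv = [30.0, 44.0, 55.0, 79.0, 118.0, 170.0]

intercept, slope = deming_fit(quantra_ftv, volpara_ftv)
errors = relative_errors(quantra_ftv, volpara_ftv, intercept, slope)
print(f"Volpara FTV ~ {intercept:.1f} + {slope:.2f} * Quantra FTV")
print(f"SD of relative errors after mapping: {stdev(errors):.2f}")
print(f"CV of Quantra FTV: {coefficient_of_variation(quantra_ftv):.2f}")
```

Unlike ordinary least squares, Deming regression allows for measurement error in both variables, which suits a comparison in which neither method is the reference standard.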
RESULTS
There was good agreement of median and quartile ranges for BV with both methods. In contrast, the results were significantly different for FTV and PD, with Volpara showing a much lower spread in results than Quantra, leading to generally lower nominal values. However, minimum values agreed well. The ICC between Quantra and Volpara in assigning patients to density and volume quartiles ranged from 0.86 (FTV) to 0.96 (BV; Table 1). Figure 2 shows the agreement of quartile classification of both software packages for each parameter. Agreement was highest for BV and lowest for FTV and was lower in the middle two quartiles than at the extremes. Disagreement of more than one quartile was rare for FTV and PD and absent for BV.
Both software packages showed a similar association of breast composition with patient age (Fig 3). The correlation between PD and age was 0.38 for both methods, whereas the correlation between FTV and age was 0.30 and 0.27 for Volpara and Quantra, respectively (P < .001 for all). The correlations between the left and the right breasts for Volpara and Quantra were 0.97 and 0.96, respectively, for BV; 0.91 and 0.87, respectively, for FTV; and 0.95 and 0.90, respectively, for PD.
The Deming regression of the results of Quantra and Volpara (Fig 4) showed linear correlation between the two software algorithms for each volumetric breast composition parameter. Spearman correlation coefficients of the results were 0.99 for BV, 0.91 for FTV, and 0.94 for PD. Figures 4b and 4c show data points with large nominal discrepancies at high FTV and PD. However, because the relative error of these data points was comparable to many others, we did not consider them outliers.
We mapped the results of one software package onto the other using the linear equations obtained from the Deming regression. Following mapping, the relative errors in FTV and PD were normally distributed, whereas for BV, they were minimally skewed toward higher Volpara results. The standard deviations of the relative errors were 0.07, 0.23, and 0.22 for BV, FTV, and PD, respectively.
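The quartile-based comparison can be sketched in Python as below. Several points are assumptions rather than details taken from the text. Quartile cut points are computed separately for each method from its own sample. Values equal to a cut point fall into the lower quartile. The ICC uses the consistency form of the two-way mixed, single-measures model, because the text does not say whether consistency or absolute agreement was used. The function names and the PD values are invented for illustration.

```python
from collections import Counter
from statistics import mean, quantiles


def quartile_labels(values):
    """Label each value 1-4 using cut points from the sample itself."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]


def quartile_disagreement(labels_a, labels_b):
    """Share of cases whose labels differ by 0, 1, 2, or 3 quartiles."""
    counts = Counter(abs(a - b) for a, b in zip(labels_a, labels_b))
    return {d: counts[d] / len(labels_a) for d in range(4)}


def icc_consistency(ratings_a, ratings_b):
    """Two-way mixed, single-measures ICC (consistency form) for two methods."""
    n, k = len(ratings_a), 2
    rows = list(zip(ratings_a, ratings_b))
    grand = mean(list(ratings_a) + list(ratings_b))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (ratings_a, ratings_b))
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("no variation in the ratings")
    return (ms_rows - ms_error) / denom


# Synthetic PD values in percent, invented for illustration (not study data).
quantra_pd = [4.1, 8.0, 12.5, 15.2, 19.9, 24.0, 31.0, 45.0]
volpara_pd = [3.0, 7.1, 8.2, 11.5, 10.4, 17.0, 21.5, 28.0]

labels_q = quartile_labels(quantra_pd)
labels_v = quartile_labels(volpara_pd)
print(quartile_disagreement(labels_q, labels_v))
print(f"ICC of quartile labels: {icc_consistency(labels_q, labels_v):.2f}")
```

Because each method is ranked against its own distribution, the quartile labels are unaffected by the systematic difference in nominal FTV and PD values between the two methods.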
DISCUSSION
In this study, we compared two software applications for assessing volumetric breast composition. Although the results obtained with both methods differ in magnitude, Quantra and Volpara show excellent correlation in all parameters. The correlation is linear and the results can be calibrated to produce similar readings. The results of both software packages show a similar correlation with patient age.
Although both software packages agreed well on BV and on the minimum FTV and PD, Quantra produced a wider range of results, with a maximum FTV and PD that were nearly twice as high as with the Volpara software. These discrepancies reflect differences in the models used for the estimation of FTV by the software algorithms. Possible sources of variation are the methods used to exclude skin as well as the way in which the breast edge, where the skin is not in contact with the compression paddle, is included.
Despite nominal differences, the correlation of software results was nearly perfect for BV and excellent for FTV and PD. The goodness of fit did not improve with nonlinear models, and the relationship was therefore assumed to be linear. This matters because it makes it unlikely that the different approaches to internal calibration produce nonlinear systematic error, such as deviation in results at either end of the breast density spectrum. Linear correlation means that the results produced by each software package can easily be converted (mapped) to those of the other. The remaining discrepancies between both software packages followed a random
Figure 2. Comparison of quartile classification for (a) breast volume (BV), (b) fibroglandular tissue volume (FTV), and (c) percent density (PD). The black bars indicate complete agreement, whereas the gray bars indicate disagreement by one or two categories. ICC, intraclass correlation coefficient.
Figure 3. Correlation of volumetric breast composition (a) mean percent density (PD) in 1-year age cohorts, (b) mean fibroglandular tissue volume (FTV) in 1-year age cohorts. Note that the data points represent mean values in each 1-year age group and represent single individuals at the extremes of age. The regression lines were derived from the raw data.
distribution for FTV and PD. Relative errors in mapped data >14% for BV and >46% for FTV and PD were rare. With respect to forming quartiles of BV, FTV, and PD, both methods showed a high ICC. Both methods showed a similar correlation with patient age. This is consistent with the assumption that both methods are equally useful for the assessment of breast composition for the purpose of breast cancer risk stratification. However, further direct testing of this assumption is necessary using databases of mammograms containing breast cancer cases. Correlation of the left and right breasts was slightly higher for Volpara, possibly indicating minimally higher reproducibility. One previous study determined the association of volumetric PD with age and found a correlation of 0.20, which is lower than in the present study (21). A possible reason for
the stronger correlation with age found in this study is the wider age range of the subjects. It is of interest to compare the agreement of the software packages tested to the interobserver agreement of visual breast density assessment. A number of studies reported interobserver agreement of the Breast Imaging Reporting and Data System (BI-RADS) four-category rating system for mammographic density. The quoted interobserver agreements range from 0.54 (kappa) (9) to 0.65–0.84 (weighted kappa) (8,10,22). However, the use of kappa statistics for this purpose (>2 categories, ordinal categories) has been criticized (23). Among studies using the ICC, interobserver agreement on breast density ranged from 0.71 to 0.77 (10,13). The two software packages analyzed in this study showed an agreement (ICC) of 0.90 for PD in the quartile-based comparison; therefore,
Figure 4. Deming regression of the results of Quantra and Volpara for (a) breast volume (BV), (b) fibroglandular tissue volume (FTV), and (c) percent density (PD).
even when results from these different methods have to be compared, the agreement is superior to visual assessment by human readers using the BI-RADS system.
Wang et al. (19) recently compared both software tools with MRI and single-energy x-ray absorptiometry, another volumetric density method. They found that Volpara correlated more closely with MRI for the assessment of PD, whereas the Quantra results were in better agreement with MRI in terms of absolute volumes. They noted moderate to substantial agreement of both software tools with MRI and with each other. However, the study was limited by the use of unilateral craniocaudal (CC) views only. This increases the effect of positioning errors on the result and may affect the performance of the Volpara software, which relies on reference points of fat for its calculation; these are more difficult to find on CC than on mediolateral oblique (MLO) views. Also, the number of patients was much lower than in the present study.
MRI is often considered the gold standard for assessing breast composition. However, even in MRI, variation in total breast volume measurement may be introduced by difficulties in drawing the line between mammary and truncal adipose tissue (19). When assessing breast composition, it is therefore not an easy task to define the ground truth. For most clinical and research applications, however, consistency of the measurements is more important than accuracy.
We did not compare volumetric breast density analysis to visual assessment of breast density, for several reasons. First, this has been the subject of a number of other studies (15,17,18). Second, visual assessment of breast density shows large variation between different readers and may vary across groups of different readers, for example, readers from different institutions (24). The results of comparisons of visual and volumetric breast density assessment are therefore difficult to interpret.
Furthermore, it may be questioned whether visual assessment represents a useful outcome for comparison and evaluation of automated breast density assessment methods. Tools to objectively measure breast density and composition were developed to improve consistency, not to simulate visual assessment. Much of the research on the association between breast density and breast cancer is based on semiautomated techniques such as interactive thresholding. Automated techniques such as volumetric breast composition analysis take this approach further by allowing larger patient numbers to be analyzed in research studies while at the same time facilitating the application of research findings to clinical practice. However, efforts toward standardizing the results of
different methods are required, which was the purpose of this study.
The software tools analyzed in this study are useful for objectifying breast composition analysis for the purpose of estimating sensitivity of mammography as well as breast cancer risk prediction. The results of our study represent an important basis for further research by showing that the results of both methods are easily comparable. If necessary, the suggested linear equations can be used for mapping the results of both methods onto each other. An alternative is the use of quartiles for categorization. Both methods show very good agreement in the quartile-based analysis and appear preferable to human readers even when results from different software products have to be compared.
Our study has several limitations. Most importantly, the association of the results of both methods with breast cancer risk was not tested, which is a necessary next step. Second, we did not correlate the results of volumetric analysis with other imaging modalities, such as MRI, which could have served as an additional reference. Third, we only used images from one mammography platform for comparison. We cannot exclude that platform-specific factors may affect how the software results compare on units from other manufacturers.
In conclusion, the results of Volpara and Quantra for volumetric breast composition analysis differ in magnitude but are highly correlated. Using volume or density quartiles, both methods show very good to excellent agreement, which is superior to the reported interobserver agreement of human readers using the BI-RADS four-category scale.
ACKNOWLEDGMENTS
Our department has received equipment support from Hologic, Inc. and Matakina International Limited.
REFERENCES
1. Assi V, Warwick J, Cuzick J, et al. Clinical and epidemiological issues in mammographic density. Nat Rev Clin Oncol 2012; 9(1):33–40.
2. McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev 2006; 15(6):1159–1169.
3. Cuzick J, Warwick J, Pinney E, et al. Tamoxifen-induced reduction in mammographic density and breast cancer risk reduction: a nested case-control study. J Natl Cancer Inst 2011; 103(9):744–752.
4. Li J, Humphreys K, Eriksson L, et al. Mammographic density reduction is a prognostic marker of response to adjuvant tamoxifen therapy in postmenopausal patients with breast cancer. J Clin Oncol 2013; 31(18):2249–2256.
5. Maskarinec G, Pagano IS, Little MA, et al. Mammographic density as a predictor of breast cancer survival: the Multiethnic Cohort. Breast Cancer Res 2013; 15(1):R7.
6. Sandberg ME, Li J, Hall P, et al. Change of mammographic density predicts the risk of contralateral breast cancer: a case-control study. Breast Cancer Res 2013; 15(4):R57.
7. Drukteinis JS, Mooney BP, Flowers CI, et al. Beyond mammography: new frontiers in breast cancer screening. Am J Med 2013; 126(6):472–479.
8. Bernardi D, Pellegrini M, Di Michele S, et al. Interobserver agreement in breast radiological density attribution according to BI-RADS quantitative classification. Radiol Med 2012; 117(4):519–528.
9. Ciatto S, Houssami N, Apruzzese A, et al. Categorizing breast mammographic density: intra- and interobserver reproducibility of BI-RADS density categories. Breast 2005; 14(4):269–275.
10. Ooms EA, Zonderland HM, Eijkemans MJ, et al. Mammography: interobserver variability in breast density assessment. Breast 2007; 16(6):568–576.
11. Byng JW, Boyd NF, Fishell E, et al. The quantitative analysis of mammographic densities. Phys Med Biol 1994; 39(10):1629–1638.
12. Engelken F, Singh JM, Fallenberg EM, et al. Volumetric breast composition analysis: reproducibility of breast percent density and fibroglandular tissue volume measurements in serial mammograms. Acta Radiol 2014; 55(1):32–38.
13. Singh JM, Fallenberg EM, Diekmann F, et al. Volumetric breast density assessment: reproducibility in serial examinations and comparison with visual assessment. Rofo 2013; 185(9):844–848.
14. van Engeland S, Snoeren PR, Huisman H, et al. Volumetric breast density estimation from full-field digital mammograms. IEEE Trans Med Imaging 2006; 25(3):273–282.
15. Ciatto S, Bernardi D, Calabrese M, et al. A first evaluation of breast radiological density assessment by QUANTRA software as compared to visual classification. Breast 2012; 21(4):503–506.
16. Jeffreys M, Warren R, Highnam R, et al. Initial experiences of using an automated volumetric measure of breast density: the standard mammogram form. Br J Radiol 2006; 79(941):378–382.
17. Gweon HM, Youk JH, Kim JA, et al. Radiologist assessment of breast density by BI-RADS categories versus fully automated volumetric assessment. AJR Am J Roentgenol 2013; 201(3):692–697.
18. Seo JM, Ko ES, Han BK, et al. Automated volumetric breast density estimation: a comparison with visual assessment. Clin Radiol 2013; 68(7):690–695.
19. Wang J, Azziz A, Fan B, et al. Agreement of mammographic measures of volumetric breast density to MRI. PLoS One 2013; 8(12):e81653.
20. Hammann-Kloss JS, Bick U, Fallenberg E, et al. Volumetric quantification of the effect of aging and hormone replacement therapy on breast composition from digital mammograms. Eur J Radiol 2014; 83(7):1092–1097.
21. Skippage P, Wilkinson L, Allen S, et al. Correlation of age and HRT use with breast density as assessed by Quantra. Breast J 2013; 19(1):79–86.
22. Redondo A, Comas M, Macia F, et al. Inter- and intraradiologist variability in the BI-RADS assessment and breast density categories for screening mammograms. Br J Radiol 2012; 85(1019):1465–1470.
23. Chmura Kraemer H, Periyakoil VS, Noda A. Kappa coefficients in medical research. Stat Med 2002; 21(14):2109–2129.
24. Sauber N, Chan A, Highnam R. BI-RADS breast density classification - an international standard? Poster Presentation presented at: European Congress of Radiology. Vienna, Austria 2013 Mar 7-11.