ARTICLE IN PRESS
JID: JJBE
[m5G;February 11, 2020;19:54]
Medical Engineering and Physics xxx (xxxx) xxx
Contents lists available at ScienceDirect
Medical Engineering and Physics journal homepage: www.elsevier.com/locate/medengphy
CT-based internal density calibration for opportunistic skeletal assessment using abdominal CT scans
Andrew S. Michalski a,b, Bryce A. Besler a,b, Geoffrey J. Michalak a,b, Steven K. Boyd a,b,∗
a Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
b McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, Canada
Article info
Article history: Received 9 May 2019; Revised 16 January 2020; Accepted 26 January 2020; Available online xxx
Keywords: Internal density calibration; CT opportunistic screening; Osteoporosis; Finite element analysis
Abstract
CT-based opportunistic skeletal assessment complements current osteoporosis diagnosis. Quantitative assessment by internal density calibration overcomes the limitations of phantom-based calibration. We sought to establish and validate an internal calibration technique using abdominal CT scans and to establish the reproducibility (precision) of three density calibration techniques. Ten full-body cadavers were CT scanned at the spine and pelvis with a calibration phantom. Internal calibration was performed using in-scan tissue references to derive a voxel-specific calibration. Skeletal health was assessed by bone mineral density (BMD) and finite element (FE) failure load. Three independent users measured intra-exam precision by manual tissue selection. To verify the results, ten subjects were imaged using an abdominal imaging protocol. Internal calibration performed equivalently to gold-standard phantom-based calibration in the cadaver spine and hip. Internal calibration BMD precision in the spine was 7 mg/cc (4.9%) and FE precision was 163 N (7.2%), whereas phantom-based precision was 3 mg/cc (1.8%) and 77 N (3.8%). Internal calibration hip BMD and FE precision were 11 mg/cc (5.3%) and 84 N (6.0%), whereas phantom-based precision was 2 mg/cc (1.3%) and 30 N (3.4%). Using the abdominal imaging protocol, internal calibration performed comparably to phantom-based calibration. Internal calibration provides BMD and FE outcome precision within 7.2% for opportunistic skeletal health assessment. © 2020 IPEM. Published by Elsevier Ltd. All rights reserved.
Abbreviations: ACR, American College of Radiology; BMD, bone mineral density; CVRMS, root-mean-square coefficient of variation; DXA, dual X-ray absorptiometry; FE, finite element; HUs, Hounsfield units; KUB, kidneys, urinary tract and bladder; ROI, region of interest; SDRMS, root-mean-square standard deviation.
∗ Corresponding author at: McCaig Institute for Bone and Joint Health, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada. E-mail address: [email protected] (S.K. Boyd).
https://doi.org/10.1016/j.medengphy.2020.01.009 1350-4533/© 2020 IPEM. Published by Elsevier Ltd. All rights reserved.
Please cite this article as: A.S. Michalski, B.A. Besler and G.J. Michalak et al., CT-based internal density calibration for opportunistic skeletal assessment using abdominal CT scans, Medical Engineering and Physics, https://doi.org/10.1016/j.medengphy.2020.01.009
1. Introduction
Osteoporosis is typically asymptomatic until a fragility fracture occurs. Dual X-ray absorptiometry (DXA) is the standard modality for osteoporosis screening and diagnosis. However, there is a gap in skeletal health screening, as many individuals who fracture are not diagnosed as osteoporotic [1]. To narrow this treatment gap, computed tomography (CT) is well suited to aid osteoporosis screening by assessing bone mineral density (BMD) and finite element (FE) failure load. CT-FE failure load analysis has been thoroughly validated [2–5] and can determine bone strength at the spine and hip [3,6–8]. Clinical guidelines for osteoporosis CT-BMD have been established by the American College of Radiology (ACR) [9]. Similarly, CT-FE bone strength thresholds for identifying individuals with low bone strength have been defined using equivalent DXA-based thresholds [10]. However, population-wide CT-FE osteoporosis screening is not available, in part because a normal reference population of healthy young subjects has not been characterized. CT-based opportunistic screening, a secondary analysis of clinical CT scans, is an ideal basis for developing such a healthy, young, sex-specific reference population.
CT-based skeletal health assessment requires density calibration to convert Hounsfield units (HUs) to equivalent bone density. Traditionally, in experimental settings, a density calibration phantom is imaged within the scan field-of-view. However, phantom-based calibration is rarely used clinically, because it requires additional time and clinical resources and produces image artifacts. Another technique, asynchronous density calibration [11,12], scans the calibration phantom separately. Because this requires additional phantom imaging, its clinical applicability is limited. Alternatively, internal density calibration uses in-scan regions of interest (ROIs) to determine a voxel-specific density calibration. Internal calibration removes the additional clinical resources and time needed for a calibration scan, as the subject serves as the basis for the calibration. Additionally, the International Society for Clinical Densitometry has defined the development and validation of internal calibration techniques as a priority [13] for moving the field towards clinically implemented opportunistic skeletal health assessment.
Internal density calibration techniques have been in development for a number of years. However, these techniques, whether commercially available or custom-made, rely on assumed ground truth values for each calibration tissue. Weaver et al. determined ground truth values of −69 mg/cc for fat and 77 mg/cc for muscle and then used these values for internal density calibration [14]. Similarly, with a commercially available system, subcutaneous fat and paraspinal muscle ROIs are selected and used to derive the BMD measurement [15,16]. Alternatively, ROIs of air and blood or fat have been paired with reference values of equivalent BMD to determine a scan-specific density calibration [17]. Despite strong results supporting the validation of these internal calibration techniques, it is uncertain whether these reference values are applicable across imaging protocols for opportunistic screening. Therefore, a more scan-specific approach to density calibration is needed to reduce reliance on ground truth reference values.
To eliminate the need for a calibration phantom in opportunistic CT screening, we sought to establish an internal density calibration approach that does not rely on ground truth reference values. We hypothesized that BMD and FE outcomes would be equivalent using internal density calibration and phantom-based calibration.
Our first aim was to develop and validate an internal density calibration method in a cadaveric model at the spine and the hip and compare it to an in-scan calibration phantom. This internal calibration approach removes the assumption of ground truth reference values for specific tissues that other internal calibration techniques, described above, make. Our second aim was to compare asynchronous calibration, as a clinical alternative, with phantom-based calibration. Our third aim was to establish the short-term reproducibility of our calibration techniques to understand measurement variability. Finally, to demonstrate clinical implementation of opportunistic assessment, we aimed to validate our internal density calibration technique in vivo using a widely used clinical CT protocol.
2. Materials and methods
2.1. Cadaveric imaging
Ten full-body cadavers (6 males, 4 females, 81.8 ± 10.7 years) were acquired from the Body Donation Program at the University of Calgary. The spine and pelvis regions of each cadaver were scanned separately using a research-dedicated clinical CT scanner (Revolution GSI, GE Healthcare, Waukesha, WI, USA). Each scan field-of-view included a density calibration phantom (Model 3 CT Calibration Phantom, Mindways Inc., Austin, TX, USA). The spine region was scanned using a standard helical imaging protocol (120 kVp, 0.625 mm slice thickness, 2.5 mm slice interval, standard-type reconstruction kernel) with exposure set by automatic exposure control. Similarly, the pelvis region was scanned using a standard helical imaging protocol (120 kVp, 0.625 mm slice thickness, 5 mm slice interval, standard reconstruction kernel) with exposure set by automatic exposure control.
2.2. In vivo CT imaging
Clinical imaging protocols of the abdomen/pelvis were reviewed for potential use in opportunistic skeletal assessment, and a general abdominal CT imaging protocol was selected because it includes both the lumbar spine and the proximal femur, the relevant osteoporotic screening sites. Ten male subjects (62.3 ± 10.4 years) were recruited, provided informed consent, and were imaged using a CT protocol of the kidneys, urinary tract and bladder (KUB). The KUB imaging protocol (120 kVp, 1.25 mm slice thickness, −0.8 mm slice interval, standard-type reconstruction kernel) extended from above the kidneys to below the lesser trochanter of the femur. The density calibration phantom was included in the scan for validation comparisons. The University of Calgary Conjoint Health Research Ethics Board approved all prospective scanning procedures.
2.3. Density calibration techniques
All scans were performed with a density calibration phantom within the scan field-of-view. Additionally, after each subject was scanned, an asynchronous density calibration scan was performed according to standardized methods [18]. For both phantom (Fig. 1A) and asynchronous (Fig. 1B) calibrations, the density calibration rods were manually extracted, and a linear relationship was used to convert HUs to equivalent K2HPO4 density. Internal density calibration was performed using a custom method developed with the Visualization Toolkit (version 6.3.0) and NumPy (version 1.11.0). From each scan, tissue ROIs for adipose, air, blood, cortical bone, and skeletal muscle were manually sampled from the scan field-of-view, as depicted in Fig. 1C. To reduce the influence of variations in tissue HUs across the scan field-of-view, three ROIs per tissue were placed on separate image slices adjacent to the bone of interest (L4 vertebra or proximal femur), and the mean HUs of each tissue were determined from the histogram aggregated across its 2D slice samples.
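Both calibration paths in this section reduce to simple fits. The sketch below uses Python and NumPy. First, it fits the linear HU-to-K2HPO4 relationship from phantom rod samples. Second, it selects a scan effective energy by maximizing the coefficient of determination between the tissue ROI mean HUs and tabulated attenuation values at each candidate energy, the step described in the next paragraph.

This is a minimal sketch under stated assumptions, not the authors' implementation:
- The function names are hypothetical.
- The rod densities and the attenuation table are made-up illustrative numbers, not phantom specifications or NIST data.
- The example tissue HUs follow the approximate expected values given in the next paragraph.
- The caller is assumed to supply whichever attenuation quantity is correlated, for example NIST mass attenuation coefficients.

The two-component mass fraction model [21] is not sketched.

```python
import numpy as np


def phantom_calibration(rod_hu, rod_density):
    """Least-squares line: density = slope * HU + intercept, from phantom rods."""
    slope, intercept = np.polyfit(np.asarray(rod_hu, dtype=float),
                                  np.asarray(rod_density, dtype=float), 1)
    return slope, intercept


def apply_calibration(hu, slope, intercept):
    """Convert HUs (scalar or image array) to equivalent K2HPO4 density."""
    return slope * np.asarray(hu, dtype=float) + intercept


def r_squared(x, y):
    """Coefficient of determination of a simple linear fit between x and y."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


def estimate_effective_energy(tissue_hu, attenuation_table, energies):
    """Pick the candidate energy whose tabulated attenuation best explains the HUs.

    tissue_hu:         (n_tissues,) mean HU of each reference-tissue ROI
    attenuation_table: (n_energies, n_tissues) assumed reference attenuation values
    energies:          (n_energies,) candidate effective energies, e.g. in keV
    """
    scores = np.array([r_squared(row, tissue_hu) for row in attenuation_table])
    best = int(np.argmax(scores))  # energy with the highest R^2
    return energies[best], scores


# Hypothetical phantom: rod HUs and nominal rod densities (mg/cc), illustrative only.
slope, intercept = phantom_calibration([0.0, 100.0, 200.0], [0.0, 75.0, 150.0])
print(apply_calibration([50.0, 150.0], slope, intercept))

# Tissues ordered air, adipose, blood, cortical bone, muscle; made-up attenuation rows.
energies = np.array([40, 60, 80])
table = np.array([
    [0.00, 0.30, 0.22, 0.50, 0.23],
    [0.00, 0.18, 0.204, 0.44, 0.206],
    [0.05, 0.10, 0.19, 0.35, 0.30],
])
tissue_hu = np.array([-1000.0, -100.0, 20.0, 1200.0, 30.0])
best_energy, scores = estimate_effective_energy(tissue_hu, table, energies)
print(best_energy, scores)
```

In this synthetic example only the 60 keV row is an exact linear transform of the HUs, so it attains the highest R². With real data, the curve of R² against energy would be inspected around its maximum, as in Fig. 1D.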
Expected tissue HUs for each ROI were approximately −100 for adipose, −1000 for air, 20 for blood, 1200 for cortical bone and 30 for skeletal muscle, as depicted in Fig. 1D. Each polygon ROI had a minimum area of 10 mm² on each slice for each tissue. Using mass attenuation coefficients obtained from the National Institute of Standards and Technology (www.nist.gov) [19], the scan effective energy was estimated by correlating the ROI mean HUs with the corresponding tissue mass attenuation coefficients at each candidate energy and selecting the energy that maximized the coefficient of determination [20], as shown in Fig. 1D. Once the scan effective energy was determined, HUs were converted to equivalent total mass attenuation values. A two-component mass fraction model [21] was then used to determine voxel-specific mass attenuation and equivalent K2HPO4 density, producing the density-calibrated image used for quantitative analysis.
Fig. 1. Density calibration methods for quantitative CT analysis. (A) Phantom-based calibration uses the phantom located in the green box. Each calibration rod is sampled from the image to determine the linear conversion between HUs and equivalent density. (B) Asynchronous calibration uses the phantom, located in the green box and imaged under a quality assurance phantom, scanned separately from the subject. Calibration rods are sampled for density conversion. (C) In-scan reference tissues (adipose [green], air [yellow], blood [red], cortical bone [blue] and skeletal muscle [purple]) are sampled adjacent to the bone of interest for internal calibration. (D) HUs and mass attenuation coefficients for each tissue (open circles) are correlated at each effective energy (EE). The scan effective energy is the one that maximizes the coefficient of determination across all effective energies (black diamond). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
2.4. Quantitative bone analysis
The bone of interest (left proximal femur or L4 vertebra) was manually segmented in each CT image using ITK-SNAP (www.itksnap.org) [22]. Integral bone mineral density (BMD), reported in mg/cc of K2HPO4, was measured using each of the three density calibration techniques. FE analysis estimated bone failure load using FAIM (v8.0, Numerics88 Solutions Ltd., Calgary, Canada). FE analysis of L4 was performed in a standard compression model [2,4], which adds virtual PMMA caps to the superior and inferior faces of the vertebral body. A 1 mm compression was applied to the superior PMMA cap, while the inferior PMMA cap was constrained along the axis of compression. Proximal femur FE was performed in a standard sideways fall loading configuration [5,23], which adds virtual PMMA caps to the greater trochanter and femoral head. A 1 mm compression was applied to the greater trochanter PMMA cap, while the femoral head PMMA cap was constrained along the axis of compression and the distal femoral diaphysis was constrained along the longitudinal axis. All models used linear isotropic material properties assigned with a density-elastic modulus power-law relationship [24]. The density-calibrated images were first manually aligned to the loading configuration, based on a predefined sideways fall or vertebral compression standard, and visual inspection of the boundary conditions confirmed agreement between the standard and the aligned image. Images were interpolated with a cubic spline, resampled to 1 mm isotropic voxels and meshed directly to linear hexahedral elements. Using previously published mechanical test data [5], our finite element model was validated to estimate bone failure load by the effective strain criterion [25], using a critical volume of 7% and a critical strain of 1.1%.
2.5. Measurement reproducibility
Reproducibility was measured for each density calibration technique in the cadaveric subjects at both the spine and hip. For each calibration technique (phantom-based, asynchronous and internal), three users manually selected the ROIs appropriate for that technique. Inter-operator and intra-operator precision were determined for each calibration technique by root-mean-square averaging of the outcome standard deviations across repeated ROI placements [26]. BMD and estimated failure load precision are reported as root-mean-square standard deviation (SDRMS, absolute units) and root-mean-square coefficient of variation (CVRMS, %).
2.6. Statistical analyses
All statistical analyses were performed using R (v3.2.4, The R Foundation for Statistical Computing, Vienna, Austria). This study used 10 cadaveric subjects and 10 in vivo subjects, which met the degree-of-freedom requirement for our reproducibility analysis (a minimum of 27 degrees of freedom is recommended) [26]. Linear regression and Bland-Altman analyses were performed to compare asynchronous or internal calibration with the gold-standard phantom-based density calibration method. Linear regression slopes were tested for a statistical difference from one, and intercepts for a statistical difference from zero. Measurement bias was assessed with Bland-Altman plots from the mean difference and the estimated 95% agreement intervals [27]. Mean percent differences were calculated as the absolute difference between the two measures divided by the mean of the two measures. Statistical significance was set at an alpha level of 0.05. All grouped data are presented as mean and standard deviation (SD).
3. Results
Cadaveric spine linear regression and Bland-Altman metrics for BMD and FE failure load are given in Table 1, and individual data are plotted in Fig. 2. Linear regression slopes for both asynchronous (slope = 1.01, p = 0.88) and internal calibration (slope = 0.97, p = 0.48) BMD were not statistically different from one. Bland-Altman analysis showed a small mean absolute difference for internal calibration (3 mg/cc, 3.1%), whereas asynchronous calibration showed no estimated bias. For failure loads, slopes were not statistically different from one (asynchronous slope = 0.96, p = 0.27; internal slope = 1.08, p = 0.22), and no bias was estimated for either asynchronous or internal calibration relative to phantom calibration. Cadaveric femur analysis metrics are shown in Table 2 and plotted in Fig. 3. When comparing the asynchronous and phantom calibration techniques, regression slopes for both BMD (slope = 1.09, p = 0.01) and failure load (slope = 1.19, p < 0.0001) were statistically different from one. Additionally, an estimated bias of 17 mg/cc (8.8%) was determined for BMD and 210 N (16.4%) for FE failure load. The coefficient of determination of both regressions was 0.99, suggesting that the regression lines closely represent the data and that asynchronous calibration introduces an inherent systematic difference. For the comparison of internal and phantom calibration, regression slopes were not statistically different from one (BMD slope = 1.00, p = 0.99; FE failure load slope = 1.09, p = 0.57), and no bias was estimated for either BMD or FE failure load. Although the mean difference in failure load between phantom and internal calibration was only −40 N, the mean percent difference was 17%, because percent differences are computed from absolute differences, so positive and negative differences do not cancel. For all linear regression comparisons, the intercepts were not statistically different from zero.
Table 1 Statistical analysis for cadaveric spine BMD and CT-FE failure load.
Metric | BMD (mg/cc) Phantom-Asynchronous | BMD (mg/cc) Phantom-Internal | FE Failure Load (N) Phantom-Asynchronous | FE Failure Load (N) Phantom-Internal
R2 | 0.98 | 0.98 | 0.99 | 0.97
Y-Intercept | 0.31 | 8.54 | 88.16 | −237.88
Slope | 1.01 | 0.97 | 0.96 | 1.08
Intercept p-value | 0.97 | 0.29 | 0.31 | 0.14
Slope p-value | 0.88 | 0.48 | 0.27 | 0.22
Mean Absolute Difference | 2 | 3 | 2 | −57
Mean Percent Difference (%) | 2.8 | 3.1 | 5.3 | 8.0
95% Agreement Interval | −2, 5 | 0, 6 | −63, 67 | −160, 46
Percent Difference Range (%) | 0.1, 6.1 | 0.9, 8.0 | 0.4, 13.1 | 0.1, 22.6
Bland-Altman 95% agreement intervals are presented as lower and upper limits. Range values are presented as minimum and maximum. R2 = linear regression coefficient of determination. P-values less than 0.05 are statistically significant and are denoted by ∗.
Table 2 Statistical analysis for cadaveric hip BMD and CT-FE failure load.
Metric | BMD (mg/cc) Phantom-Asynchronous | BMD (mg/cc) Phantom-Internal | FE Failure Load (N) Phantom-Asynchronous | FE Failure Load (N) Phantom-Internal
R2 | 0.99 | 0.89 | 0.99 | 0.86
Y-Intercept | 0.50 | −11.21 | −13.44 | −166.53
Slope | 1.09 | 1.00 | 1.19 | 1.09
Intercept p-value | 0.93 | 0.69 | 0.62 | 0.50
Slope p-value | 0.01∗ | 0.99 | <0.0001∗ | 0.57
Mean Absolute Difference | 17 | −11 | 210 | −40
Mean Percent Difference (%) | 8.8 | 8.3 | 16.4 | 17.0
95% Agreement Interval | 13, 21 | −23, 2 | 16, 129 | −222, 142
Percent Difference Range (%) | 5.4, 14.0 | 0.9, 19.5 | 8.6, 25.1 | 3.5, 29.3
Bland-Altman 95% agreement intervals are presented as lower and upper limits. Range values are presented as minimum and maximum. R2 = linear regression coefficient of determination. P-values less than 0.05 are statistically significant and are denoted by ∗.
Fig. 2. Comparison of calibration techniques from cadaveric subjects at the spine. Linear regression and Bland-Altman analysis compare (A) asynchronous-phantom calibration BMD, (B) internal-phantom calibration BMD, (C) asynchronous-phantom calibration FE failure load and (D) internal-phantom calibration FE failure load. Regression lines and the corresponding line equations are shown in each plot. Bland-Altman plots display the mean difference (solid line) and 95% agreement intervals (dashed lines). R2 is the linear regression coefficient of determination. x̄ is the mean difference between measures.
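The comparisons in Section 2.6 and Tables 1 and 2 can be sketched in a few lines. There are three parts:
- an ordinary least-squares fit of the test calibration against the phantom calibration, with t statistics for slope = 1 and intercept = 0;
- Bland-Altman mean difference and 95% limits of agreement (mean ± 1.96 SD);
- the mean percent difference defined in Section 2.6.

This is a hedged NumPy sketch, not the authors' R code, and the function names are hypothetical. Two-sided p-values are left out to avoid extra dependencies. They would follow from the t statistics with n − 2 degrees of freedom, for example as 2 * scipy.stats.t.sf(abs(t), df).

```python
import numpy as np


def regression_against_identity(reference, test):
    """OLS fit test = slope * reference + intercept, with t statistics for
    H0: slope == 1 and H0: intercept == 0 (df = n - 2)."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(test, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    se_slope = s / np.sqrt(sxx)
    se_intercept = s * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": np.corrcoef(x, y)[0, 1] ** 2,
        "t_slope_vs_1": (slope - 1.0) / se_slope,
        "t_intercept_vs_0": intercept / se_intercept,
        "df": n - 2,
    }


def bland_altman(reference, test):
    """Mean difference (test - reference) and 95% limits of agreement."""
    d = np.asarray(test, dtype=float) - np.asarray(reference, dtype=float)
    mean_d = d.mean()
    sd_d = d.std(ddof=1)  # sample SD of the paired differences
    return mean_d, mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d


def mean_percent_difference(reference, test):
    """Mean of |test - reference| divided by the pair mean, in percent."""
    a = np.asarray(reference, dtype=float)
    b = np.asarray(test, dtype=float)
    return 100.0 * np.mean(np.abs(b - a) / ((a + b) / 2.0))


# Hypothetical paired BMD values (mg/cc), for illustration only.
phantom = [100.0, 110.0, 120.0, 130.0]
internal = [102.0, 112.0, 118.0, 134.0]
print(bland_altman(phantom, internal))
print(mean_percent_difference(phantom, internal))
```

Because mean_percent_difference uses absolute differences while bland_altman keeps their sign, a small mean difference can coexist with a large mean percent difference. This is the pattern seen for hip internal calibration FE failure load (−40 N, 17%).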
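The SDRMS and CVRMS values reported in Tables 3 and 4 pool per-subject standard deviations across repeated ROI placements in the root-mean-square sense (Section 2.5). Here is a minimal sketch. It assumes a subjects-by-repeats array layout, which is our assumption for illustration and not the authors' data format. The function names are hypothetical.

The sketch also returns the usual m(n − 1) degrees-of-freedom count. In addition, it computes a least significant change with the conventional 2.77 factor, which is approximately 1.96 × √2 at 95% confidence. This is one common way of deriving the least significant change mentioned in the Discussion.

```python
import numpy as np


def rms_precision(measurements):
    """Root-mean-square precision from repeated measurements.

    measurements: (n_subjects, n_repeats) array, e.g. one row per bone and one
    column per operator (inter-operator) or per repeat (intra-operator).
    Returns (SD_RMS in measurement units, CV_RMS in percent, degrees of freedom).
    """
    m = np.asarray(measurements, dtype=float)
    sd = m.std(axis=1, ddof=1)   # per-subject sample standard deviation
    cv = sd / m.mean(axis=1)     # per-subject coefficient of variation
    sd_rms = np.sqrt(np.mean(sd ** 2))
    cv_rms = 100.0 * np.sqrt(np.mean(cv ** 2))
    dof = m.shape[0] * (m.shape[1] - 1)
    return sd_rms, cv_rms, dof


def least_significant_change(precision, factor=2.77):
    """Least significant change at 95% confidence between two measurements."""
    return factor * precision


# Hypothetical BMD values (mg/cc): three bones, each measured by three operators.
bmd = [[150.0, 155.0, 148.0],
       [120.0, 118.0, 125.0],
       [180.0, 176.0, 183.0]]
sd_rms, cv_rms, dof = rms_precision(bmd)
print(sd_rms, cv_rms, dof, least_significant_change(sd_rms))
```

Root-mean-square pooling weights subjects with larger scatter more heavily than a plain mean of standard deviations would. For that reason it is the usual choice for reporting short-term precision.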
Inter-operator and intra-operator reproducibility were measured at the spine and hip in the cadaveric samples; the data are presented in Tables 3 and 4. Internal calibration had the largest inter-operator variability of all calibration techniques at both the spine and the hip. In the spine, internal calibration BMD had an SDRMS of 7 mg/cc (CVRMS = 4.9%) and FE failure load an SDRMS of 163 N (CVRMS = 7.2%). For phantom-based calibration, we report a BMD SDRMS of 3 mg/cc (CVRMS = 1.8%) and an FE failure load SDRMS of 77 N (CVRMS = 3.8%). For asynchronous calibration, we report a BMD SDRMS of 3 mg/cc (CVRMS = 2.2%) and an FE failure load SDRMS of 85 N (CVRMS = 4.0%). As in the spine, internal calibration in the hip had the largest measurement variability: BMD had an SDRMS of 11 mg/cc (CVRMS = 5.3%) and FE failure load an SDRMS of 84 N (CVRMS = 6.0%). For phantom-based calibration, we report a BMD SDRMS of 2 mg/cc (CVRMS = 1.3%) and an FE failure load SDRMS of 30 N (CVRMS = 3.4%). For asynchronous calibration, we report a BMD SDRMS of 4 mg/cc (CVRMS = 2.1%) and an FE failure load SDRMS of 52 N (CVRMS = 3.8%). Intra-operator precision results followed the same trends as the inter-operator precision results. The internal calibration measurement variability was an SDRMS of 89 N (CVRMS = 4.0%) for spine FE failure load and an SDRMS of 67 N (CVRMS = 4.9%) for hip FE failure load.
In vivo analysis outcomes are presented in Table 5 and Fig. 4. Neither BMD regression slope was statistically different from one (asynchronous slope = 1.13, p = 0.22; internal slope = 0.96, p = 0.59). The internal FE failure load slope (slope = 1.11, p = 0.39) was not statistically different from one. However, the asynchronous FE failure load slope (slope = 1.30, p = 0.01) was statistically different from one. The linear regression coefficients of determination for BMD (R2 = 0.95) and FE failure load (R2 = 0.96) again suggest little scatter in the data and an inherent systematic difference between asynchronous and phantom calibration in the femur analysis. Additionally, Bland-Altman analysis identified a BMD bias of 38 mg/cc (13.6%) for asynchronous calibration and 12 mg/cc (4.3%) for internal calibration. For FE failure load, asynchronous calibration had a bias of 933 N (27.8%), and internal calibration had a bias of 469 N (12.8%).
Table 3 Inter-operator reproducibility analysis for cadaveric spine and hip BMD and CT-FE failure load.
Technique | Spine BMD SDRMS (mg/cc) | Spine BMD CVRMS (%) | Spine FE Failure Load SDRMS (N) | Spine FE Failure Load CVRMS (%) | Femur BMD SDRMS (mg/cc) | Femur BMD CVRMS (%) | Femur FE Failure Load SDRMS (N) | Femur FE Failure Load CVRMS (%)
Phantom | 3 | 1.8 | 77 | 3.8 | 2 | 1.3 | 30 | 3.4
Asynchronous | 3 | 2.2 | 85 | 4.0 | 4 | 2.1 | 52 | 3.8
Internal | 7 | 4.9 | 163 | 7.2 | 11 | 5.3 | 84 | 6.0
SDRMS = root-mean-square standard deviation, CVRMS = root-mean-square coefficient of variation.
Table 4 Intra-operator reproducibility analysis for cadaveric spine and hip BMD and CT-FE failure load.
Technique | Spine BMD SDRMS (mg/cc) | Spine BMD CVRMS (%) | Spine FE Failure Load SDRMS (N) | Spine FE Failure Load CVRMS (%) | Femur BMD SDRMS (mg/cc) | Femur BMD CVRMS (%) | Femur FE Failure Load SDRMS (N) | Femur FE Failure Load CVRMS (%)
Phantom | 9 | 4.2 | 289 | 7.3 | 2 | 1.0 | 33 | 2.7
Asynchronous | 1 | 0.4 | 14 | 0.8 | 1 | 0.3 | 10 | 0.7
Internal | 4 | 2.6 | 89 | 4.0 | 13 | 6.6 | 67 | 4.9
SDRMS = root-mean-square standard deviation, CVRMS = root-mean-square coefficient of variation.
Fig. 3. Comparison of calibration techniques from cadaveric subjects at the pelvis. Linear regression and Bland-Altman analysis compare (A) asynchronous-phantom calibration BMD, (B) internal-phantom calibration BMD, (C) asynchronous-phantom calibration FE failure load and (D) internal-phantom calibration FE failure load. Regression lines and the corresponding line equations are shown in each plot. Bland-Altman plots display the mean difference (solid line) and 95% agreement intervals (dashed lines). R2 is the linear regression coefficient of determination. x̄ is the mean difference between measures.
4. Discussion
CT-based opportunistic screening requires density calibration to perform quantitative analysis. We validated an internal density calibration technique for both spine and pelvis CT scans, quantifying BMD and FE estimated failure load. In the cadaveric analyses, we confirmed that internal calibration performs equivalently to traditional phantom-based calibration. As expected, the internal density calibration BMD and FE failure load reproducibility errors were consistently larger than those of our phantom-based calibration. With an in vivo abdominal CT scanning protocol, we found strong agreement between the internal and phantom-based calibration approaches for our skeletal health outcomes.
Using internal calibration, our CT-based skeletal analysis provides a full volumetric BMD characterization and an FE estimated bone failure load. Consistent with other studies [15,17,28,29], we found that internal calibration using in-scan ROIs is valid at both the lumbar spine and proximal femur. Our findings are consistent with previous reproducibility studies that used single-slice BMD analysis [15,29] and followed ACR guidelines [9] for skeletal analysis. Mueller et al. reported an internal calibration BMD coefficient of variation of 5.3% in the spine [15], and we report a similar coefficient of variation of 4.9%. However, compared with studies performing full volumetric BMD and FE analysis, our measurements showed larger variability. Lee et al. reported BMD reproducibility of less than 1 mg/cc at both the spine and femur [17], whereas we report SDRMS values of 7 mg/cc at the spine and 11 mg/cc at the femur. Lee et al.'s FE analyses reported an SDRMS of 30 N in the spine and 20 N in the femur [17], whereas we report an SDRMS of 163 N in the spine and 84 N in the femur. In absolute terms, our spine BMD precision error is sevenfold larger (7 mg/cc versus 1 mg/cc). However, this increase may not be clinically relevant, as the ACR CT BMD thresholds separating osteoporosis from normal trabecular bone health are 40 mg/cc apart. Furthermore, based on our reproducibility data, we can define the least significant change [30] for our analyses to detect clinically relevant bone changes in future studies.
Although the absolute differences in BMD may not be clinically significant, the differences in FE failure load may be. In our in vivo results, the FE failure load mean absolute difference between the phantom and internal calibration techniques was 469 N (12.8%), whereas the BMD mean absolute difference was only 12 mg/cc (4.3%). This suggests that small BMD differences can propagate into the large differences in estimated failure load we observed. One example is a maximum percent difference of 26.9% between phantom and internal calibration outcomes for one femur. A possible explanation is the use of a power-law relationship to determine a voxel-specific elastic modulus. The sideways fall FE model of the hip is known to be sensitive to the density-elastic modulus relationship [31]. Because the modulus scales with a power of density, element stiffness grows disproportionately (super-linearly, not exponentially) as voxel density increases, contributing to an increased estimated failure load. This effect could potentially be mitigated simply by modifying this FE model parameter. Nevertheless, it is unclear whether these large percent differences in estimated failure load are clinically significant, which warrants further investigation.
We employed an abdominal imaging protocol relevant for opportunistic osteoporosis assessment and used it to validate our internal calibration technique. Other studies have used abdominal scans, generally CT colonography examinations [12,28,29], for opportunistic assessment. The KUB imaging protocol is a more general imaging protocol of the abdomen, rather than one specific to the colon.
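The amplification argument above can be made concrete for a density-elastic modulus power law E = a·ρ^b. A relative density change δ multiplies the modulus by (1 + δ)^b, which for small δ is approximately a relative change of b·δ.

This is a minimal sketch with a hypothetical exponent b = 2. The actual constants of the relationship used in the FE models [24] are not given here. The 4.3% value merely echoes the in vivo mean BMD bias for illustration, since a mean bias is not a uniform voxel-wise change. The sketch shows element-level modulus amplification only, not failure load, which also depends on geometry and the failure criterion.

```python
def power_law_modulus(density, a, b):
    """Elastic modulus E = a * density**b; a and b are hypothetical constants."""
    return a * density ** b


def modulus_ratio(relative_density_change, b):
    """E(rho * (1 + d)) / E(rho); independent of rho and of the constant a."""
    return (1.0 + relative_density_change) ** b


b = 2.0  # hypothetical exponent, for illustration only
for d in (0.01, 0.043, 0.10):
    change = modulus_ratio(d, b) - 1.0
    print(f"density +{d:.1%} -> modulus +{change:.1%}")
# density +1.0% -> modulus +2.0%
# density +4.3% -> modulus +8.8%
# density +10.0% -> modulus +21.0%
```

Doubling density multiplies the modulus by 2^b, a power-law (polynomial) growth rather than an exponential one. The exponent b therefore sets how strongly calibration errors in density are magnified in stiffness.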
By using a consistent imaging acquisition and single imaging protocol, we improve our measurement precision [32]. Our validation of skeletal assessment using this generalized protocol allows for an increased number of usable abdominal scans for development of a CT-BMD and CT-FE reference database and future screening. In addition to not requiring a calibration phantom in the scan field-of-view, internal density calibration has multiple advantages over phantom and asynchronous density calibration. Variations in how a subject lies on top of a density phantom can introduce artifacts due to air pockets between the body and the phantom, or the phantom can cause beam hardening and scatter [33,34]. Similarly, for the asynchronous calibration, the subject’s body mass
index is assumed to be the same across all subjects. Body mass index influences image HU values. Larger amounts of soft tissue preferentially absorb lower-energy X-ray photons, which causes beam hardening and contributes to HU variability. Beam hardening could also affect the HU measurements of an in-scan calibration phantom, leading to calibration errors. By sampling tissues adjacent to the bone of interest, our internal calibration accounts for this HU variability and reduces the influence of beam hardening and scatter [35]. A limitation of this study is the lack of automated analysis. Manual input reduces the reproducibility of each technique because the manually selected ROIs differed for each subject. These small differences in ROI placement increase the variation in the observed results. Furthermore, our sample size of ten cadavers and ten in vivo subjects is small. In addition, the in vivo cohort consisted solely of men. Because men are typically larger than women, increased beam hardening may have influenced our results. Additional limitations arise from the cadaveric model used to develop and validate the technique. Tissues are known to change shortly after death (e.g., blood coagulation, tissue dehydration), so we imaged the cadavers within 48 h of death to minimize this effect [36]. With the advent of internal density calibration, clinically acquired CT images become readily available for opportunistic screening. Current internal density calibration approaches rely on assumed tissue-specific ground truth values. We developed an internal calibration technique that does not rely on such assumptions and validated it at the osteoporosis-relevant sites of the spine and hip. Failure load estimated with internal calibration was not statistically different from that estimated with phantom calibration at either the spine or the hip.
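The least significant change mentioned in the discussion can be derived from reproducibility data. This sketch makes the following assumptions:

* Precision is expressed as the root-mean-square standard deviation (RMS-SD) across subjects, in the manner of Gluer et al. [26].
* The LSC uses the common 95%-confidence form LSC = 1.96·√2·precision ≈ 2.77·precision.

The generalized LSC for different systems (Shepherd and Lu [30]) is not implemented. The repeated measurements in the example are hypothetical. The precision values are those reported for internal calibration in the abstract.

```python
"""Sketch: RMS-SD precision and least significant change (LSC).

Assumes each inner list holds repeated measurements of one subject (for
example, three users' manual analyses of the same scan).
"""
import math
import statistics


def rms_sd(repeats_per_subject: list[list[float]]) -> float:
    """Root-mean-square of the per-subject sample standard deviations."""
    variances = [statistics.variance(r) for r in repeats_per_subject]
    return math.sqrt(sum(variances) / len(variances))


def rms_cv_pct(repeats_per_subject: list[list[float]]) -> float:
    """Root-mean-square coefficient of variation across subjects, in percent."""
    cvs = [statistics.stdev(r) / statistics.mean(r) for r in repeats_per_subject]
    return 100 * math.sqrt(sum(cv * cv for cv in cvs) / len(cvs))


def least_significant_change(precision: float, z: float = 1.96) -> float:
    """LSC = z * sqrt(2) * precision (about 2.77 * precision at 95%)."""
    return z * math.sqrt(2) * precision


if __name__ == "__main__":
    # Hypothetical BMD values (mg/cc): three users x three subjects.
    demo = [[140.0, 146.0, 151.0], [98.0, 104.0, 95.0], [171.0, 165.0, 178.0]]
    print(f"RMS-SD = {rms_sd(demo):.1f} mg/cc, RMS-CV = {rms_cv_pct(demo):.1f}%")

    # Internal-calibration precision reported in the abstract.
    reported = {"spine BMD (mg/cc)": 7, "spine FE (N)": 163,
                "hip BMD (mg/cc)": 11, "hip FE (N)": 84}
    for name, precision in reported.items():
        print(f"{name}: LSC ~ {least_significant_change(precision):.0f}")
```

Under these assumptions, the 7 mg/cc spine BMD precision of internal calibration corresponds to an LSC of roughly 19 mg/cc.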
Precision errors were within 7.2% at both the spine and hip. In vivo validation with the KUB imaging protocol showed that it captures the skeletal sites appropriate for screening. With this protocol, internal calibration differed from phantom calibration by a mean of 4.3% for BMD and 12.8% for FE failure load. Using analysis methods similar to those presented here, clinical guidelines are being established for phantom-based CT analysis [9] and FE analysis [10,37]. However, clinical data currently lack a population reference database against which individual subjects can be compared. By leveraging the clinical availability of the KUB protocol and our internal density calibration technique, we have established the foundations to build a population
Please cite this article as: A.S. Michalski, B.A. Besler and G.J. Michalak et al., CT-based internal density calibration for opportunistic skeletal assessment using abdominal CT scans, Medical Engineering and Physics, https://doi.org/10.1016/j.medengphy.2020.01.009
Table 5
Statistical analysis for in vivo hip BMD and CT-FE failure load.

Linear regression
                              Y-intercept   Intercept p-value   Slope   Slope p-value   R2
BMD (mg/cc)
  Phantom-Asynchronous               4.75                0.85    1.13            0.22   0.95
  Phantom-Internal                  23.96                0.29    0.96            0.59   0.95
FE Failure Load (N)
  Phantom-Asynchronous∗             63.73                0.83    1.30            0.01   0.96
  Phantom-Internal                 107.00                0.80    1.11            0.39   0.91

Agreement
                              Mean absolute   95% agreement   Mean percent     Percent difference
                              difference      interval        difference (%)   range (%)
BMD (mg/cc)
  Phantom-Asynchronous        38              33, 42          13.6             11.3, 17.9
  Phantom-Internal            12              8, 16           4.3              0.2, 7.5
FE Failure Load (N)
  Phantom-Asynchronous∗       933             754, 1113       27.8             21.5, 33.8
  Phantom-Internal            469             289, 650        12.8             3.1, 26.9

Bland-Altman 95% agreement intervals are presented as lower and upper limits. Range values are presented as minimum and maximum. R2 = linear regression coefficient of determination. P-values less than 0.05 are statistically significant, denoted by ∗.
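The difference statistics summarized in Table 5 can be sketched as follows. The sketch makes three assumptions:

* Differences are computed as alternative calibration minus phantom calibration.
* Percent differences are taken relative to the phantom value.
* The Bland-Altman interval is the conventional mean difference ± 1.96 SD [27].

This section does not state exactly how the Table 5 intervals were computed, so the sketch may not reproduce them. The example inputs are hypothetical.

```python
"""Sketch: paired agreement statistics between two calibration techniques."""
import statistics


def agreement_summary(phantom: list[float], other: list[float]) -> dict[str, float]:
    """Summarize agreement of `other` with the phantom-calibrated reference."""
    if len(phantom) != len(other) or len(phantom) < 2:
        raise ValueError("need paired samples of equal length (n >= 2)")
    diffs = [o - p for p, o in zip(phantom, other)]
    abs_diffs = [abs(d) for d in diffs]
    # Percent difference relative to the phantom value (an assumption).
    pct_diffs = [100 * abs(d) / p for d, p in zip(diffs, phantom)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    return {
        "mean_difference": mean_d,
        # Conventional Bland-Altman limits of agreement.
        "loa_lower": mean_d - 1.96 * sd_d,
        "loa_upper": mean_d + 1.96 * sd_d,
        "mean_absolute_difference": statistics.mean(abs_diffs),
        "mean_percent_difference": statistics.mean(pct_diffs),
        "percent_difference_min": min(pct_diffs),
        "percent_difference_max": max(pct_diffs),
    }


if __name__ == "__main__":
    # Hypothetical hip BMD values (mg/cc) for five subjects.
    phantom_bmd = [250.0, 281.0, 302.0, 265.0, 318.0]
    internal_bmd = [261.0, 290.0, 318.0, 271.0, 333.0]
    for key, value in agreement_summary(phantom_bmd, internal_bmd).items():
        print(f"{key}: {value:.2f}")
```

The sketch reports both signed differences (for Bland-Altman) and absolute differences (for the mean absolute and percent columns). A systematic offset and a typical magnitude of disagreement can therefore be read separately.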
Fig. 4. Comparison of calibration techniques in in vivo subjects at the pelvis. Linear regression and Bland-Altman analysis compare (A) asynchronous and phantom calibration BMD, (B) internal and phantom calibration BMD, (C) asynchronous and phantom calibration FE failure load, and (D) internal and phantom calibration FE failure load. Regression lines and the corresponding line equations are shown in each plot. Bland-Altman plots display the mean difference (solid line) and 95% agreement intervals (dashed lines). R2 is the linear regression coefficient of determination; x̄ is the mean difference between measures.
database, and a path for integrating CT-based opportunistic skeletal health assessment into the clinical workflow.
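The regression panels of Fig. 4 and the regression columns of Table 5 report an intercept, a slope, R2 and p-values. A minimal ordinary least-squares sketch follows. It assumes that the alternative calibration is regressed on phantom calibration. It also assumes that the p-values test the intercept against 0 and the slope against 1 (identity). This is a plausible reading rather than a stated method. The data are hypothetical.

```python
"""Sketch: OLS comparison of an alternative calibration against phantom calibration."""
import math
import statistics


def compare_calibrations(x: list[float], y: list[float]) -> dict[str, float]:
    """Regress y (alternative) on x (phantom); test intercept = 0 and slope = 1."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples of equal length (n >= 3)")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    s2 = sse / (n - 2)  # residual variance
    if s2 == 0:
        raise ValueError("perfect fit: t statistics are undefined")
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": 1 - sse / sst,
        "t_intercept_vs_0": intercept / se_intercept,
        "t_slope_vs_1": (slope - 1) / se_slope,
        "df": n - 2,
    }


if __name__ == "__main__":
    # Hypothetical FE failure loads (N) for five femurs.
    phantom_fe = [2900.0, 3300.0, 3650.0, 4100.0, 3450.0]
    internal_fe = [3250.0, 3720.0, 4150.0, 4560.0, 3800.0]
    for key, value in compare_calibrations(phantom_fe, internal_fe).items():
        print(f"{key}: {value:.3f}")
```

Two-sided p-values follow from a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. The sketch uses only the standard library so that it runs without extra dependencies.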
Declaration of Competing Interest
Author S.K.B. is a co-owner of Numerics88 Solutions, whose software was used for the FE analysis in this study (there was no monetary benefit). The other authors have no conflicts of interest.

Acknowledgments
This research was supported by the Natural Sciences and Engineering Research Council (NSERC), RGPIN 261693-2013. The University of Calgary Conjoint Health Research Ethics Board approved all scanning procedures (REB 15-1301 and REB 16-0014).

References
[1] Schuit SC, van der Klift M, Weel AE, de Laet CE, Burger H, Seeman E, et al. Fracture incidence and association with bone mineral density in elderly men and women: the Rotterdam study. Bone 2004;34:195–202.
[2] Crawford RP, Cann CE, Keaveny TM. Finite element models predict in vitro vertebral body compressive strength better than quantitative computed tomography. Bone 2003;33:744–50.
[3] Imai K, Ohnishi I, Yamamoto S, Nakamura K. In vivo assessment of lumbar vertebral strength in elderly women using computed tomography-based nonlinear finite element model. Spine (Phila Pa 1976) 2008;33:27–32. doi:10.1097/BRS.0b013e31815e3993.
[4] Keaveny TM. Biomechanical computed tomography-noninvasive bone strength analysis using clinical computed tomography scans. Ann N Y Acad Sci 2010;1192:57–65. doi:10.1111/j.1749-6632.2009.05348.x.
[5] Nishiyama KK, Gilchrist S, Guy P, Cripton P, Boyd SK. Proximal femur bone strength estimated by a computationally fast finite element analysis in a sideways fall configuration. J Biomech 2013;46:1231–6. doi:10.1016/j.jbiomech.2013.02.025.
[6] Faulkner KG, Cann CE, Hasegawa BH. Effect of bone distribution on vertebral strength: assessment with patient-specific nonlinear finite element analysis. Radiology 1991;179:669–74. doi:10.1148/radiology.179.3.2027972.
[7] Melton LJ, Riggs BL, Keaveny TM, Achenbach SJ, Hoffmann PF, Camp JJ, et al. Structural determinants of vertebral fracture risk. J Bone Miner Res 2007;22:1885–92. doi:10.1359/jbmr.070728.
[8] Nishiyama KK, Ito M, Harada A, Boyd SK. Classification of women with and without hip fracture based on quantitative computed tomography and finite element analysis. Osteoporos Int 2014;25:619–26. doi:10.1007/s00198-013-2459-6.
[9] American College of Radiology. ACR–SPR–SSR practice parameter for the performance of musculoskeletal quantitative computed tomography (QCT). Reston, Virginia, United States: American College of Radiology; 2018.
[10] Kopperdahl DL, Aspelund T, Hoffmann PF, Sigurdsson S, Siggeirsdottir K, Harris TB, et al. Assessment of incident spine and hip fractures in women and men using finite element analysis of CT scans. J Bone Miner Res 2014;29:570–80. doi:10.1002/jbmr.2069.
[11] Bauer JS, Henning TD, Müeller D, Lu Y, Majumdar S, Link TM. Volumetric quantitative CT of the spine and hip derived from contrast-enhanced MDCT: conversion factors. Am J Roentgenol 2007;188:1294–301. doi:10.2214/AJR.06.1006.
[12] Pickhardt PJ, Bodeen G, Brett A, Brown JK, Binkley N. Comparison of femoral neck BMD evaluation obtained using Lunar DXA and QCT with asynchronous calibration from CT colonography. J Clin Densitom 2015;18:5–12. doi:10.1016/j.jocd.2014.03.002.
[13] Engelke K, Lang T, Khosla S, Qin L, Zysset P, Leslie WD, et al. Clinical use of quantitative computed tomography-based advanced techniques in the management of osteoporosis in adults: the 2015 ISCD official positions-part III. J Clin Densitom 2015;18:393–407. doi:10.1016/j.jocd.2015.06.010.
[14] Weaver AA, Beavers KM, Hightower RC, Lynch SK, Miller AN, Stitzel JD. Lumbar bone mineral density phantomless computed tomography measurements and correlation with age and fracture incidence. Traffic Inj Prev 2015;16(Suppl 2):S153–60. doi:10.1080/15389588.2015.1054029.
[15] Mueller DK, Kutscherenko A, Bartel H, Vlassenbroek A, Ourednicek P, Erckenbrecht J. Phantom-less QCT BMD system as screening tool for osteoporosis without additional radiation. Eur J Radiol 2011;79:375–81. doi:10.1016/j.ejrad.2010.02.008.
[16] Therkildsen J, Thygesen J, Winther S, Svensson M, Hauge E-M, Bottcher M, et al. Vertebral bone mineral density measured by quantitative computed tomography with and without a calibration phantom: a comparison between 2 different software solutions. J Clin Densitom 2018;21:367–74. doi:10.1016/j.jocd.2017.12.003.
[17] Lee DC, Hoffmann PF, Kopperdahl DL, Keaveny TM. Phantomless calibration of CT scans for measurement of BMD and bone strength-Inter-operator reanalysis precision. Bone 2017;103:325–33. doi:10.1016/j.bone.2017.07.029.
[18] Brown JK, Timm W, Bodeen G, Chason A, Perry M, Vernacchia F, et al. Asynchronously calibrated quantitative bone densitometry. J Clin Densitom 2017;20:216–25. doi:10.1016/j.jocd.2015.11.001.
[19] White DR, Wilson IJ, Griffith RV. Report 46. J Int Comm Radiat Units Meas 2016;os24:NP-NP. doi:10.1093/jicru/os24.1.Report46.
[20] Millner MR, Payne WH, Waggener RG, McDavid WD, Dennis MJ, Sank VJ. Determination of effective energies in CT calibration. Med Phys 1978;5:543–5. doi:10.1118/1.594488.
[21] Genant HK, Boyd D. Quantitative bone mineral analysis using dual energy computed tomography. Invest Radiol 1977;12:545–51.
[22] Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 2006;31:1116–28. doi:10.1016/j.neuroimage.2006.01.015.
[23] Michalski AS, Edwards WB, Boyd SK. The influence of reconstruction kernel on bone mineral and strength estimates using quantitative computed tomography and finite element analysis. J Clin Densitom 2017;22:219–28. doi:10.1016/j.jocd.2017.09.001.
[24] Keller TS. Predicting the compressive mechanical behavior of bone. J Biomech 1994;27:1159–68.
[25] Pistoia W, van Rietbergen B, Lochmuller EM, Lill CA, Eckstein F, Ruegsegger P. Estimation of distal radius failure load with micro-finite element analysis models based on three-dimensional peripheral quantitative computed tomography images. Bone 2002;30:842–8.
[26] Gluer CC, Blake G, Lu Y, Blunt BA, Jergas M, Genant HK. Accurate assessment of precision errors: how to measure the reproducibility of bone densitometry techniques. Osteoporos Int 1995;5:262–70.
[27] Giavarina D. Understanding Bland Altman analysis. Biochem Med (Zagreb) 2015;25:141–51. doi:10.11613/BM.2015.015.
[28] Fidler JL, Murthy NS, Khosla S, Clarke BL, Bruining DH, Kopperdahl DL, et al. Comprehensive assessment of osteoporosis and bone fragility with CT colonography. Radiology 2016;278:172–80. doi:10.1148/radiol.2015141984.
[29] Pickhardt PJ, Lee LJ, del Rio AM, Lauder T, Bruce RJ, Summers RM, et al. Simultaneous screening for osteoporosis at CT colonography: bone mineral density assessment using MDCT attenuation techniques compared with the DXA reference standard. J Bone Miner Res 2011;26:2194–203. doi:10.1002/jbmr.428.
[30] Shepherd JA, Lu Y. A generalized least significant change for individuals measured on different DXA systems. J Clin Densitom 2007;10:249–58. doi:10.1016/j.jocd.2007.05.002.
[31] Helgason B, Gilchrist S, Ariza O, Vogt P, Enns-Bray W, Widmer RP, et al. The influence of the modulus-density relationship and the material mapping method on the simulated mechanical response of the proximal femur in sideways fall loading configuration. Med Eng Phys 2016;38:679–89. doi:10.1016/j.medengphy.2016.03.006.
[32] Bligh M, Bidaut L, White RA, Murphy WA Jr, Stevens DM, Cody DD. Helical multidetector row quantitative computed tomography (QCT) precision. Acad Radiol 2009;16:150–9. doi:10.1016/j.acra.2008.08.007.
[33] Imamura K, Fujii M. Empirical beam hardening correction in the measurement of vertebral bone mineral content by computed tomography. Radiology 1981;138:223–6. doi:10.1148/radiology.138.1.7470214.
[34] Merritt RB, Chenery SG. Quantitative CT measurements: the effect of scatter acceptance and filter characteristics on the EMI 7070. Phys Med Biol 1986;31:55–63. doi:10.1088/0031-9155/31/1/005.
[35] Boden SD, Goodenough DJ, Stockham CD, Jacobs E, Dina T, Allman RM. Precise measurement of vertebral bone density using computed tomography without the use of an external reference phantom. J Digit Imaging 1989;2:31–8.
[36] Tavichakorntrakool R, Prasongwattana V, Sriboonlue P, Puapairoj A, Pongskul J, Khuntikeo N, et al. Serial analyses of postmortem changes in human skeletal muscle: a case study of alterations in proteome profile, histology, electrolyte contents, water composition, and enzyme activity. Proteomics Clin Appl 2008;2:1255–64. doi:10.1002/prca.200800051.
[37] Adams AL, Fischer H, Kopperdahl DL, Lee DC, Black DM, Bouxsein ML, et al. Osteoporosis and hip fracture risk from routine computed tomography scans: the fracture, osteoporosis, and CT utilization study (FOCUS). J Bone Miner Res 2018;33:1291–301. doi:10.1002/jbmr.3423.