Original Investigation
Deep Learning Reconstruction at CT: Phantom Study of the Image Characteristics
Toru Higaki, PhD, Yuko Nakamura, MD, PhD, Jian Zhou, PhD, Zhou Yu, PhD, Takuya Nemoto, MS, Fuminari Tatsugami, MD, PhD, Kazuo Awai, MD, PhD
Abbreviations
CT: Computed tomography
FBP: Filtered back projection
Hybrid-IR and h-IR: Hybrid iterative reconstruction
MBIR: Model-based iterative reconstruction
DLR: Deep-learning reconstruction
HU: Hounsfield unit
SD: Standard deviation
NPS: Noise power spectrum
MTF: Modulation-transfer function
MO: Machine observer
Objectives: Noise, commonly encountered on computed tomography (CT) images, can impact diagnostic accuracy. To reduce the image noise, we developed a deep-learning reconstruction (DLR) method that integrates deep convolutional neural networks into image reconstruction. In this phantom study, we compared the image noise characteristics, spatial resolution, and task-based detectability on DLR images and images reconstructed with other state-of-the-art techniques.
Methods: We scanned a phantom harboring cylindrical modules with different contrast on a 320-row detector CT scanner. Phantom images were reconstructed with filtered back projection, hybrid iterative reconstruction, model-based iterative reconstruction, and DLR. The standard deviation of the CT number and the noise power spectrum were calculated for noise characterization. The 10% modulation-transfer function (MTF) level was used to evaluate spatial resolution; task-based detectability was assessed using the model observer method.
Results: On images reconstructed with DLR, the noise was lower than on images subjected to other reconstructions, especially at low radiation dose settings. Noise power spectrum measurements also showed that the noise amplitude was lower, especially for low-frequency components, on DLR images. Based on the MTF, spatial resolution was higher on model-based iterative reconstruction images than on DLR images; however, for lower-contrast objects, the MTF on DLR images was comparable to that on images reconstructed with other methods. The machine observer study showed that at reduced radiation-dose settings, DLR yielded the best detectability.
Conclusion: On DLR images, the image noise was lower, and high-contrast spatial resolution and task-based detectability were better than on images reconstructed with other state-of-the-art techniques. DLR also outperformed other methods with respect to task-based detectability.
Key Words: Phantoms; imaging; neural networks; X-ray computed tomography; machine learning; artificial intelligence. © 2019 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Acad Radiol 2020; 27:82–87
From the Department of Diagnostic Radiology, Hiroshima University, 1-2-3 Kasumi, Minami-ku, Hiroshima 734-8551, Japan (T.H., Y.N., F.T., K.A.); Canon Medical Research USA, Vernon Hills, IL, United States (J.Z., Z.Y.); Canon Medical Systems, Otawara, Tochigi, Japan (T.N.). Received May 6, 2019; revised June 21, 2019; accepted September 8, 2019. Implications for patient care: N/A (Phantom study). Address correspondence to: T.H. e-mail: [email protected]
https://doi.org/10.1016/j.acra.2019.09.008
INTRODUCTION
Computed tomography (CT) image reconstruction has evolved from filtered back projection (FBP) to hybrid iterative and model-based iterative reconstruction (h-IR, MBIR). As the image noise is lower and artifacts are fewer on h-IR and MBIR images than on FBP images, h-IR and MBIR are widely used to reduce radiation exposure and improve diagnostic ability (1–4). Although they improve the image quality significantly, especially on low-dose CT studies, the image texture, spatial resolution, and object detectability remain unsatisfactory (5–8). Therefore, we developed an image reconstruction method we call “deep learning reconstruction (DLR)” (9,10). Deep learning is one of the major strategies for computer-aided diagnosis. Its value for organ recognition (segmentation), lesion detection, and lesion characterization has been explored (11), and its ability to detect and reduce the image noise has been demonstrated (12–16). Unlike conventional noise reduction methods, which involve a trade-off between spatial resolution and noise reduction (6,17), deep learning lowers the image noise and increases spatial resolution simultaneously (9). In this phantom study we examined the noise characteristics, spatial resolution, and task-based detectability of DLR images.
Academic Radiology, Vol 27, No 1, January 2020
DEEP LEARNING RECONSTRUCTION AT CT
MATERIALS AND METHODS
Principles of Deep Learning Reconstruction
DLR is a CT image reconstruction method that applies a deep convolutional neural network to improve the image quality (9,10). The teaching data used for DLR training are high-quality CT images reconstructed with MBIR whose parameters are adjusted to obtain the best image quality. In the clinical setting, the MBIR processing time must be as short as possible to maintain throughput; for reconstructing the teaching dataset, however, the processing time is not an issue. The DLR algorithm has been commercialized as the Advanced intelligent Clear-IQ Engine (AiCE) by Canon Medical Systems Corp. (CMSC).
CT Acquisition and Image Reconstruction
To evaluate the image quality, a cylindrical, 200-mm diameter phantom harboring 8 cylindrical modules filled with different concentrations of diluted iodine contrast medium (CM) was scanned on a 320-row detector CT instrument (Aquilion ONE GENESIS edition, CMSC, Otawara, Japan). The base of the phantom was created with a 3D printer and made of acrylic material. The contrast between the acrylic phantom base and the CM modules was 400, 200, 100, 50, 20, 10, 5, and −60 Hounsfield units (HU). The CM concentration in the CM modules was 17.25, 9.66, 5.87, 3.98, 2.84, 2.46, 2.27, and 0.00 mgI/ml. The diameter of the cylindrical modules was 10 mm (Fig. 1). The X-ray tube voltage was 120 kV; the X-ray tube current was 300, 250, 200, 130, 100, 80, 60, 40, and 20 mA. The CTDIvol for each tube current was 18.7, 15.6, 12.5, 8.1, 6.2, 5.0, 3.7, 2.5, and 1.2 mGy. We defined doses of 10 mGy or more as high dose and doses of less than 5 mGy as reduced dose; doses of 5.0, 6.2, and 8.1 mGy represented the medium dose. The detector configuration was 0.5 mm × 80 rows, and the rotation time was 1.0 sec/rotation. The images were subjected to FBP (FC13), h-IR (AIDR 3D Standard FC13, CMSC), MBIR (FIRST BODY Standard, CMSC), and DLR (Advanced intelligent Clear-IQ Engine BODY Standard, CMSC). The scan protocol is shown in Table 1.
Quantitative Image Quality Evaluation
The standard deviation of the CT attenuation number on homogeneous regions in our phantom was used as the overall image noise indicator. The noise power spectrum (NPS) was the index for the image noise characteristics. The NPS, determined with standard Fourier techniques, is presented in radially rebinned one-dimensional graphs (18,19). The task-based modulation transfer function (MTF), determined with a circular edge method (20), was the index for spatial resolution. The task contrast for the MTF was 50, 100, and 200 HU. When measuring the MTF, we generated an ensemble average of test-object images to eliminate the effect of noise on the edge spread function. For each test object, the edge of the cylindrical sensitometry object on this composite image was used to generate the edge spread function, which was then differentiated into a line spread function and transformed into an MTF using standard Fourier methods. To quantify the MTF, the value of 10% MTF (MTF10%), i.e., the spatial frequency at which the MTF curve falls to 0.1, was measured. To assess detectability, we recorded the task-based detectability index dʹ (21) at a contrast of 50, 100, and 200 HU. The targeted object size for the detectability calculation was 3.0–5.0 mm. A summary of the image evaluation is shown in Table 2.
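The circular-edge MTF10% measurement described under Quantitative Image Quality Evaluation can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.4 mm pixel size, the 0.1 mm radial bin width, the synthetic disk with a Gaussian-blurred edge, and the noise level are all assumptions chosen only so the example runs on its own.

```python
import math
import numpy as np

def radial_esf(image, center_xy, pixel_mm, r_min_mm, r_max_mm, bin_mm):
    """Oversampled edge spread function (ESF) across a circular edge."""
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    r = np.hypot(x - center_xy[0], y - center_xy[1]) * pixel_mm
    keep = (r >= r_min_mm) & (r < r_max_mm)
    n_bins = int(round((r_max_mm - r_min_mm) / bin_mm))
    idx = np.minimum(((r[keep] - r_min_mm) / bin_mm).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    r_sum = np.bincount(idx, weights=r[keep], minlength=n_bins)
    v_sum = np.bincount(idx, weights=image[keep], minlength=n_bins)
    filled = counts > 0
    # Use the mean radius actually sampled in each bin, then resample uniformly.
    radii = r_min_mm + (np.arange(n_bins) + 0.5) * bin_mm
    esf = np.interp(radii, r_sum[filled] / counts[filled],
                    v_sum[filled] / counts[filled])
    return radii, esf

def mtf10_from_esf(esf, dr_mm, n_fft=4096):
    """ESF -> LSF (derivative) -> MTF (|FFT|) -> frequency where MTF = 0.1."""
    lsf = np.gradient(esf, dr_mm)
    mtf = np.abs(np.fft.rfft(lsf, n=n_fft))   # zero-padded for finer sampling
    mtf /= mtf[0]
    freqs = np.fft.rfftfreq(n_fft, d=dr_mm)   # cycles/mm
    below = np.nonzero(mtf < 0.1)[0]
    if below.size == 0:
        raise ValueError("MTF never falls below 0.1 within the sampled band")
    k = below[0]
    # Linear interpolation between the two samples that bracket MTF = 0.1.
    return freqs[k - 1] + (0.1 - mtf[k - 1]) * (freqs[k] - freqs[k - 1]) / (mtf[k] - mtf[k - 1])

def synthetic_disk(shape, center_xy, pixel_mm, radius_mm, sigma_mm, contrast_hu):
    """Hypothetical test object: a disk whose edge is blurred by a Gaussian."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    r = np.hypot(x - center_xy[0], y - center_xy[1]) * pixel_mm
    erf = np.vectorize(math.erf)
    return contrast_hu * 0.5 * (1.0 - erf((r - radius_mm) / (sigma_mm * math.sqrt(2.0))))

# Assumed demo values: 100 repeated noisy images of a 10-mm disk at 100 HU.
rng = np.random.default_rng(0)
center = (31.3, 32.6)
clean = synthetic_disk((64, 64), center, 0.4, 5.0, 0.5, 100.0)
stack = clean + rng.normal(0.0, 5.0, size=(100, 64, 64))
ensemble = stack.mean(axis=0)                 # ensemble average suppresses noise
radii, esf = radial_esf(ensemble, center, 0.4, 2.0, 8.0, 0.1)
mtf10 = mtf10_from_esf(esf, 0.1)
print(f"MTF10% = {mtf10:.3f} cycles/mm")
```

For the Gaussian edge blur assumed here (standard deviation 0.5 mm), the analytic MTF10% is sqrt(ln 10 / 2) / (π · 0.5 mm), about 0.68 cycles/mm, which gives a simple sanity check for the pipeline.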
Figure 1. Phantom for evaluating image characteristics.
TABLE 1. Summary of CT Scan Protocol
X-ray tube voltage [kV]: 120
X-ray tube current [mA] and radiation exposure CTDIvol [mGy]:
High dose: 300 mA, 18.7 mGy; 250 mA, 15.6 mGy; 200 mA, 12.5 mGy
Medium dose: 130 mA, 8.1 mGy; 100 mA, 6.2 mGy; 80 mA, 5.0 mGy
Reduced dose: 60 mA, 3.7 mGy; 40 mA, 2.5 mGy; 20 mA, 1.2 mGy
Rotation time [sec/rotation]: 1.0
Detector configuration: 0.5 mm × 80-row
Image reconstruction: FBP (FC13); Hybrid IR (AIDR 3D FC13); MBIR (FIRST Body standard); DLR (AiCE Body standard)
TABLE 2. Summary of Image Quality Evaluation Method
Overall noise: Standard deviation (SD) of the CT attenuation number on a homogeneous region
Noise characteristics: Noise power spectrum (NPS) curve
Spatial resolution: 10% modulation transfer function (MTF10%)
Detectability: Task-based detectability index dʹ
RESULTS
In all graphs (Figs. 2–5), actual measurement data are identified by dots; the spline-interpolated curve is a solid line.
Image Noise
As shown in Figure 2, the overall image noise was highest on FBP images at all X-ray tube currents. The h-IR and DLR curves were similar, but the DLR curve was always lower than the h-IR curve. At high radiation doses, the noise was lowest on MBIR images; however, at low doses it was higher than on h-IR and DLR images. At a high radiation dose (15.6 mGy), the noise component was smallest on MBIR images (Fig. 3a) at all noise frequency bands except for the low-frequency domain. At 6.2 mGy (medium dose), the NPS on DLR images tended to be lower than on the other images even in the low-frequency domain (Fig. 3b). The NPS was similar at 2.5 mGy (low dose) and 6.2 mGy; it tended to be lowest on DLR images (Fig. 3c).
Figure 2. Overall image noise. (Color version of figure is available online.)
Spatial Resolution
As shown in Figure 4, the MTF10% on FBP images was constant at all radiation doses and task contrast settings. At high radiation doses (18.7, 15.6, 12.5 mGy), the MTF10% of the low-contrast object (50 HU) was the same on h-IR, MBIR, and DLR images. These MTF10% values decreased in parallel in the low radiation dose range, especially below 5 mGy. As the contrast setting increased, so did the MTF10% on all but FBP images; it was highest on MBIR images.
Task-based Detectability
As the task contrast increased, so did the detectability index dʹ of MBIR and DLR images (Fig. 5). At low radiation doses (3.7, 2.5, 1.2 mGy), dʹ was highest on DLR images at all task contrast settings. At doses above 8.1 mGy, dʹ was highest on MBIR images.
Representative Phantom Image
Figure 6 shows representative phantom images scanned at 2.5 mGy. The image noise is lowest on the DLR image, the texture is preserved, and the object boundary is sharper than on the other images.
DISCUSSION
For images to be highly diagnostic, the noise level, the noise characteristics, and the spatial resolution must be improved.
Figure 3. Noise power spectrum of images scanned at different radiation doses. (Color version of figure is available online.)
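Radially rebinned NPS curves of the kind plotted in Figure 3 are commonly estimated from a set of homogeneous-region ROIs with a two-dimensional Fourier transform. The sketch below shows one conventional way to do this; the function names, ROI size, pixel size, bin count, and the white-noise test data are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def noise_power_spectrum(rois, pixel_mm):
    """2D NPS in HU^2 mm^2 from a stack of noise-only ROIs (n_roi, ny, nx)."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Remove the mean of each ROI so that only the noise remains.
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    dft = np.fft.fft2(detrended, axes=(1, 2))
    return (pixel_mm * pixel_mm / (nx * ny)) * np.mean(np.abs(dft) ** 2, axis=0)

def radial_average(nps2d, pixel_mm, n_bins):
    """Rebin the 2D NPS into a 1D curve over radial spatial frequency."""
    ny, nx = nps2d.shape
    fx = np.fft.fftfreq(nx, d=pixel_mm)
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    fr = np.hypot(*np.meshgrid(fx, fy)).ravel()
    edges = np.linspace(0.0, 0.5 / pixel_mm, n_bins + 1)  # up to Nyquist
    idx = np.digitize(fr, edges) - 1
    keep = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[keep], weights=nps2d.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

# Assumed demo data: 50 ROIs of white noise with SD = 10 HU, 0.4 mm pixels.
rng = np.random.default_rng(1)
rois = rng.normal(0.0, 10.0, size=(50, 64, 64))
nps2d = noise_power_spectrum(rois, 0.4)
freqs, nps1d = radial_average(nps2d, 0.4, 16)
for f, p in zip(freqs, nps1d):
    print(f"{f:.3f} cycles/mm  {p:.2f} HU^2 mm^2")
```

With this normalization, the NPS integrated over frequency equals the pixel variance, so the overall noise SD and the NPS curve are two views of the same measurement: the SD gives the total noise magnitude and the NPS shows how it is distributed over spatial frequency.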
Figure 4. Task-based spatial resolution (MTF10%). (Color version of figure is available online.)
Figure 5. Task-based detectability (dʹ). (Color version of figure is available online.)
Figure 6. Representative images acquired at 2.5 mGy, 40 mAs.
Our comparison of FBP, h-IR, MBIR, and DLR showed that DLR featured better noise properties than the other reconstruction methods, especially at reduced radiation dose. On the other hand, at high radiation exposure, the image quality of MBIR images, including their noise properties and spatial resolution, was superior.
The overall image noise was lowest on DLR images acquired at reduced radiation doses (3.7, 2.5, 1.2 mGy). Low-contrast detectability, especially important in abdominal CT studies, is particularly affected by the image noise (22). We found that DLR images were less noisy in our phantom study; consequently, DLR may outperform the other reconstruction methods for abdominal CT studies. As high-dose radiation exposure should be avoided, good noise characteristics at reduced radiation dose are of clinical importance.
Like the overall noise, the NPS curves were lowest on DLR images acquired at doses lower than 8.1 mGy. In the low-frequency domain, the NPS curve of MBIR images was relatively high at all radiation doses. When noise reduction methods are applied, reducing low-frequency noise tends to be difficult and the proportion of low-frequency components on NPS curves tends to increase. An increase in the low-frequency noise components results in coarsening of the image texture (23); the “oil-painting-” or “plastic-like” appearance (24) compromises the detection of small lesions. However, as demonstrated by the NPS curve of DLR images, even low-frequency components were effectively suppressed. We think that the NPS curves of DLR images, especially with respect to the low-frequency components, were superior because the teaching images were scanned at relatively high radiation doses and reconstructed with MBIR whose parameters were adjusted to obtain the best image quality.
The MTF10% value of FBP images was almost constant at all X-ray tube currents and task contrasts. This means that the spatial resolution of FBP images does not depend on the radiation dose or the contrast of the target object (25,26). The MTF10% value of h-IR images, on the other hand, depends on the radiation dose because h-IR applies a noise reduction filter to FBP. If the radiation dose is high, only weak noise filtering is needed, so the MTF10% of FBP and h-IR will be almost the same. If the radiation dose is low, a strong noise filter will blur the image and decrease the MTF10%. The MTF10% of MBIR reflects both the radiation exposure and the task contrast; it increased as the radiation dose and the task contrast increased. At a task contrast of 50 HU, the MTF10% value of MBIR images was slightly better than that of h-IR and DLR images. When the task contrast was as high as 100 or 200 HU, the MTF10% of MBIR was significantly improved and superior to that of the other reconstruction methods because MBIR is a reconstruction method that obtains an accurate reconstructed image in the iterative projection process (24,25). This allows the visualization of fine objects and is particularly useful for chest imaging, CT angiography, and the evaluation of implanted stent-grafts (26–30).
The assessment of task-based detectability is based on the noise level, noise characteristics, and spatial resolution.
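To make concrete how the noise level, noise characteristics, and spatial resolution combine into a single detectability index, the sketch below computes dʹ for a non-prewhitening (NPW) model observer and a uniform disk task. The choice of the NPW observer is an assumption, since the text does not state which observer model was used, and the Gaussian MTF shape, the flat NPS, and all numeric values are hypothetical placeholders rather than measured data.

```python
import numpy as np

def npw_detectability(contrast_hu, diameter_mm, mtf, nps, pixel_mm=0.1, n=256):
    """d' of a non-prewhitening observer for a uniform disk task.

    mtf and nps are callables of radial frequency (cycles/mm); nps returns
    HU^2 mm^2. d'^2 = [sum |W|^2 MTF^2]^2 / sum |W|^2 MTF^2 NPS.
    """
    coords = (np.arange(n) - n // 2) * pixel_mm
    xx, yy = np.meshgrid(coords, coords)
    disk = contrast_hu * (np.hypot(xx, yy) <= diameter_mm / 2.0)
    # Task function W: Fourier transform of the object, scaled to HU mm^2.
    task = np.fft.fft2(disk) * pixel_mm ** 2
    f = np.fft.fftfreq(n, d=pixel_mm)
    fr = np.hypot(*np.meshgrid(f, f))
    df = 1.0 / (n * pixel_mm)                      # frequency sample spacing
    signal = np.abs(task) ** 2 * mtf(fr) ** 2
    numerator = (signal.sum() * df ** 2) ** 2
    denominator = (signal * nps(fr)).sum() * df ** 2
    return np.sqrt(numerator / denominator)

def mtf_model(fr):
    """Hypothetical Gaussian-shaped MTF."""
    return np.exp(-(fr / 0.6) ** 2)

def nps_flat(fr):
    """Hypothetical white NPS of 16 HU^2 mm^2."""
    return np.full_like(fr, 16.0)

def nps_half(fr):
    """Same spectral shape with half the noise power."""
    return np.full_like(fr, 8.0)

for contrast in (50.0, 100.0, 200.0):
    d_prime = npw_detectability(contrast, 5.0, mtf_model, nps_flat)
    print(f"contrast {contrast:5.0f} HU  d' = {d_prime:.1f}")
```

In this simplified model, dʹ scales linearly with contrast when the MTF is fixed, and halving the noise power raises dʹ by a factor of √2. Any additional contrast dependence, such as that reported for MBIR and DLR, would enter through a contrast-dependent MTF.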
Detectability on FBP and h-IR images was not strongly dependent on the task contrast because their MTF was not related to the task contrast. Detectability on h-IR images was better than on FBP images because the image noise was lower. Detectability on MBIR and DLR images was task-contrast dependent because their spatial resolution was affected by the task contrast. For scans acquired at high radiation doses, the image noise was lowest on MBIR images; consequently, detectability was highest on MBIR scans. In addition, MBIR yielded high spatial resolution. At medium and reduced radiation dose settings, DLR provided the highest detectability because the low-frequency noise component was effectively reduced. The ability to detect objects whose contrast ranges from 50 to 200 HU on medium- or reduced-dose CT studies is useful for soft tissue examinations such as abdominal dynamic CT studies.
Our findings are limited because we used only phantoms. To confirm the superior lesion detectability on DLR images, clinical studies that include the analysis of receiver operating characteristic curves are needed.
In conclusion, our phantom study showed that DLR improved the image quality of CT scans, especially those performed at medium or reduced radiation doses. While MBIR exhibited the highest performance when images were acquired at high radiation doses, high-dose settings are generally undesirable. Since DLR can reduce the image noise on reduced-dose CT scans, especially with respect to low-frequency noise components, it may improve the quality of CT images targeted at soft tissues. On the other hand, MBIR yielded high spatial resolution, a property useful for chest imaging, CT angiography, and the evaluation of stents.
REFERENCES
1. Shuman WP, Chan KT, Busey JM, et al. Standard and reduced radiation dose liver CT images: adaptive statistical iterative reconstruction versus model-based iterative reconstruction-comparison of findings and image quality. Radiology 2014; 273(3):793–800.
2. Kuo Y, Lin YY, Lee RC, et al. Comparison of image quality from filtered back projection, statistical iterative reconstruction, and model-based iterative reconstruction algorithms in abdominal computed tomography. Medicine 2016; 95(31):e4456.
3. Patino M, Fuentes JM, Singh S, et al. Iterative reconstruction techniques in abdominopelvic CT: technical concepts and clinical implementation. AJR Am J Roentgenol 2015; 205(1):W19–W31.
4. Jensen K, Martinsen AC, Tingberg A, et al. Comparing five different iterative reconstruction algorithms for computed tomography in an ROC study. Eur Radiol 2014; 24(12):2989–3002.
5. Minamishima K, Sugisawa K, Yamada Y, et al. Quantitative and qualitative evaluation of hybrid iterative reconstruction, with and without noise power spectrum models: a phantom study. J Appl Clin Med Phys 2018; 19(3):318–325.
6. Millon D, Vlassenbroek A, Van Maanen AG, et al. Low contrast detectability and spatial resolution with model-based iterative reconstructions of MDCT images: a phantom and cadaveric study. Eur Radiol 2017; 27(3):927–937.
7. Euler A, Stieltjes B, Szucs-Farkas Z, et al. Impact of model-based iterative reconstruction on low-contrast lesion detection and image quality in abdominal CT: a 12-reader-based comparative phantom study with filtered back projection at different tube voltages. Eur Radiol 2017; 27(12):5252–5259.
8. Nishizawa M, Tanaka H, Watanabe Y, et al. Model-based iterative reconstruction for detection of subtle hypoattenuation in early cerebral infarction: a phantom study. Jpn J Radiol 2015; 33(1):26–32.
9. Tatsugami F, Higaki T, Nakamura Y, et al. Deep learning-based image restoration algorithm for coronary CT angiography. Eur Radiol 2019; 29(10):5322–5329. doi:10.1007/s00330-019-06183-y.
10. Akagi M, Nakamura Y, Higaki T, et al. Deep learning reconstruction improves image quality of abdominal ultra-high-resolution CT. Eur Radiol 2019. doi:10.1007/s00330-019-06170-3.
11. Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. Radiographics 2017; 37(7):2113–2131.
12. Kang E, Min J, Ye JC. A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Med Phys 2017; 44(10):e360–e375.
13. Wolterink JM, Leiner T, Viergever MA, et al. Generative adversarial networks for noise reduction in low-dose CT. IEEE Trans Med Imaging 2017; 36(12):2536–2545.
14. Wu D, Kim K, El Fakhri G, Li Q. Iterative low-dose CT reconstruction with priors trained by artificial neural network. IEEE Trans Med Imaging 2017; 36(12):2479–2486.
15. Higaki T, Nishimaru E, Nakamura Y, et al. Radiation dose reduction in CT using deep learning based reconstruction (DLR): a phantom study. Electron Present Online Syst Eur Soc Radiol 2018: C-1656. doi:10.1594/ecr2018/C-1656.
16. Higaki T, Nakamura Y, Tatsugami F, et al. Improvement of image quality at CT and MRI using deep learning. Jpn J Radiol 2019; 37(1):73–80.
17. Jensen CT, Telesmanich ME, Wagner-Bartak NA, et al. Evaluation of abdominal computed tomography image quality using a new version of vendor-specific model-based iterative reconstruction. J Comput Assist Tomogr 2017; 41(1):67–74.
18. Kijewski MF, Judy PF. The noise power spectrum of CT images. Phys Med Biol 1987; 32(5):565–575.
19. International Commission on Radiation Units and Measurements. ICRU Report No. 87: radiation dose and image-quality assessment in computed tomography. J ICRU 2012; 12(1):1–149.
20. Richard S, Husarik DB, Yadava G, et al. Towards task-based assessment of CT performance: system and object MTF across different reconstruction algorithms. Med Phys 2012; 39(7):4115–4122.
21. Samei E, Richard S. Assessment of the dose reduction potential of a model-based iterative reconstruction algorithm using a task-based performance metrology. Med Phys 2015; 42(1):314–323.
22. Suess C, Kalender WA, Coman JM. New low-contrast resolution phantoms for computed tomography. Med Phys 1999; 26(2):296–302.
23. Ehman EC, Yu L, Manduca A, et al. Methods for clinical evaluation of noise reduction techniques in abdominopelvic CT. Radiographics 2014; 34(4):849–862.
24. Geyer LL, Schoepf UJ, Meinel FG, et al. State of the art: iterative CT reconstruction techniques. Radiology 2015; 276(2):339–357.
25. Stiller W. Basics of iterative reconstruction methods in computed tomography: a vendor-independent overview. Eur J Radiol 2018; 109:147–154.
26. Fujita M, Higaki T, Awaya Y, et al. Lung cancer screening with ultra-low dose CT using full iterative reconstruction. Jpn J Radiol 2017; 35(4):179–189.
27. Higaki T, Tatsugami F, Fujioka C, et al. Visualization of simulated small vessels on computed tomography using a model-based iterative reconstruction technique. Data Brief 2017; 13:437–443.
28. Hirata K, Utsunomiya D, Kidoh M, et al. Tradeoff between noise reduction and inartificial visualization in a model-based iterative reconstruction algorithm on coronary computed tomography angiography. Medicine 2018; 97(20):e10810.
29. Tatsugami F, Higaki T, Sakane H, et al. Diagnostic accuracy of in-stent restenosis using model-based iterative reconstruction at coronary CT angiography: initial experience. Br J Radiol 2018; 91(1082):20170598.
30. Yokomachi K, Tatsugami F, Higaki T, et al. Neointimal formation after carotid artery stenting: phantom and clinical evaluation of model-based iterative reconstruction (MBIR). Eur Radiol 2019; 29(1):161–167. doi:10.1007/s00330-018-5598-5.