Effects of MRI image normalization techniques in prostate cancer radiomics


Physica Medica 71 (2020) 7–13

Contents lists available at ScienceDirect

Physica Medica journal homepage: www.elsevier.com/locate/ejmp

Original paper

Effects of MRI image normalization techniques in prostate cancer radiomics

Lars J. Isaksson a,⁎, Sara Raimondi b, Francesca Botta c, Matteo Pepa a, Simone G. Gugliandolo a, Simone P. De Angelis b, Giulia Marvaso a, Giuseppe Petralia d,e, Ottavio De Cobelli d,f, Sara Gandini b, Marta Cremonesi g, Federica Cattani c, Paul Summers e, Barbara A. Jereczek-Fossa a,d

a Division of Radiotherapy, European Institute of Oncology IRCCS, via Ripamonti 435, Milan, Italy
b Department of Experimental Oncology, European Institute of Oncology IRCCS, via Ripamonti 435, Milan, Italy
c Medical Physics Unit, European Institute of Oncology IRCCS, via Ripamonti 435, Milan, Italy
d Department of Oncology and Hemato-oncology, University of Milan, via Ripamonti 435, Milan, Italy
e Division of Radiology, European Institute of Oncology IRCCS, via Ripamonti 435, Milan, Italy
f Department of Urology, IEO European Institute of Oncology IRCCS, via Ripamonti 435, Milan, Italy
g Radiation Research Unit, IEO European Institute of Oncology IRCCS, Via Ripamonti 435, 20141, Milan, Italy

Article info

Keywords: Image normalization; Prostate cancer; MRI; Radiomics

Abstract

The variability of intensities in MRI scans is a fundamental impediment to quantitative MRI analysis. Intensity values depend not only on acquisition parameters but also on the subject and the body region being scanned. This creates a need for image normalization techniques that ensure intensity values are consistent within tissues across different subjects and visits. Many intensity normalization methods have been developed and proven successful for the analysis of brain pathologies, but evaluation of these methods for images of the prostate region is lagging. In this paper, we compare four different normalization methods on 49 T2-w scans of prostate cancer patients: 1) the well-established histogram normalization, 2) the generalized scale normalization, 3) an extension of generalized scale normalization called generalized ball-scale normalization, and 4) a custom normalization based on healthy prostate tissue intensities. The methods are compared qualitatively and quantitatively in terms of the behavior of intensity distributions as well as their impact on radiomic features. Our findings suggest that normalization based on prior knowledge of healthy prostate tissue intensities may be the most effective way of obtaining the desired properties of normalized images. In addition, the histogram normalization method outperforms the generalized scale and generalized ball-scale methods, which have proven superior for other body regions.

Substudy of AIRC IG-13218 phase II trial.
⁎ Corresponding author. E-mail address: [email protected] (L.J. Isaksson).
https://doi.org/10.1016/j.ejmp.2020.02.007
Received 16 August 2019; Received in revised form 15 January 2020; Accepted 7 February 2020
1120-1797/ © 2020 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Radiomics is the science of mining useful quantitative data from medical images, typically CT, PET, or MRI, that is not directly apparent to clinical practitioners. The general idea is that important "hidden" information can be retrieved by calculating mathematical quantities, known as radiomic features, from the images. These features can then be used to predict clinical outcomes such as survival or response to treatment, and can thus serve clinicians as decision-making, guidance, or diagnostic tools [1–7]. Radiomic features are, however, sensitive to acquisition/reconstruction parameters and post-processing methods (see [8] for a comprehensive review of this topic), and can even vary between consecutive acquisitions of the same patient [9]. This issue is exacerbated for MRI because MRI intensity values are arbitrary and lack a physical interpretation [10,11]. Indeed, the arbitrary scale of MRI intensities is a well-known problem for all quantitative analysis of MRI images, not only in radiomic studies.

An intuitive solution to the problem of intensity standardization might be to adjust the contrast and brightness of the images, or simply to rescale the intensity range, but in practice this does not give the intensities a tissue-specific meaning [10,12]. To address this, a method called histogram normalization was developed by Nyúl and Udupa in 1999 [10] and further refined in [13]. The method matches quantile landmarks of the image intensity histograms via a nonlinear transformation. First, a mean landmark point is learned from the intensity histograms of a training set of images. Second, the intensities of the original images are transformed such that the landmark point(s) of all images coincide at the learned mean landmark point. The result is a set of new images in which the variation of the intensities of the different tissue types across patients is reduced [14]. Although relatively old, the method is still in use today, and the idea of matching quantiles of the image histograms forms the basis of more recently developed alternatives. Two such alternatives are the generalized scale and generalized ball-scale normalizations [15,12]. Both use novel ways of obtaining image landmarks but the same nonlinear transformation proposed by Nyúl & Udupa. Additional methods have been developed exclusively for brain images, such as White-Stripe normalization [16] and RAVEL [11].

While several normalization techniques have proven useful for the analysis of brain pathologies, standardized image normalization in quantitative prostate cancer analysis remains virtually nonexistent (cf. for instance [17–21]). This prevents direct comparison of different studies, since the calculated quantities are intensity dependent. Furthermore, certain tools, such as some segmentation algorithms, rely on assumptions that may be violated by poor or absent normalization [22,16]. The aim of this study was therefore to analyze how different image intensity normalization techniques compare with each other, both in terms of desired properties of the intensity distributions and in terms of the behavior of the resulting radiomic features, in the context of prostate MRI.

2. Materials & methods

In this work, we compare and evaluate four different image intensity normalization methods: histogram normalization [10,13] (Method I), generalized scale normalization [15] (g-scale, Method II), generalized ball-scale normalization [15] (gb-scale, Method III), and a custom normalization method for the prostate (Method IV). The custom prostate normalization method works in much the same way as histogram normalization, but extracts the landmark points from reference intensities within the healthy prostate instead of from the whole image. More detailed descriptions of the normalization methods are presented in the supplementary material.

2.1. Data set

This study was conducted on axial T2-w pre-radiotherapy MRI scans of 49 patients with organ-confined prostate cancer. A clinical summary of the patient characteristics is presented in Table 1. All images were acquired using a 1.5 T scanner (AvantoFit, Siemens Healthineers, Erlangen, Germany) with slice thickness 3 mm, pixel spacing 0.59 × 0.59 mm, echo time 118 ms, and repetition time 3780 ms. The prostate, urethra, and dominant intraprostatic lesion (DIL) were identified and manually contoured on each image slice in a radiotherapy treatment planning system (Eclipse, Varian Medical Systems, Inc.) by three radiologists with 4 to 8 years of experience in prostate cancer MRI. The patients were participants in a phase II, prospective, single-arm, single-center clinical trial in which patients with low- to intermediate-risk prostate cancer (PCa) undergo ultra-hypofractionated radiotherapy with a simultaneous integrated boost (IMRT-SIB). Treatment is delivered every other day, with 36.25 Gy in 5 fractions to the whole prostate gland and 37.5 Gy in 5 fractions to the DIL. Details of the study can be found in [23]. The study was approved by the Institutional Ethics Committee and conducted according to the Declaration of Helsinki/Tokyo and Good Clinical Practice guidelines. Each patient gave written informed consent for participation in the original study, for the treatment, and for the use of their anonymized data for research and educational purposes.

Table 1
Summary of prostate cancer characteristics for the study cohort.

Characteristic      Number of patients
PSA                 6.47 (3.07)*
T-stage
  cT1               7 (14%)
  cT2               42 (86%)
Gleason Score
  3+3               26 (53%)
  3+4               17 (35%)
  4+3               6 (12%)
ECE score
  1                 3 (6%)
  2                 12 (24%)
  3                 15 (31%)
  4                 19 (39%)
PIRADS score
  2                 1 (2%)
  3                 4 (8%)
  4                 27 (55%)
  5                 17 (35%)
Risk class
  High              8 (16%)
  Low               41 (84%)
*Mean (standard deviation)

2.2. Evaluation criteria

2.2.1. Intensity-based evaluation

The four normalization methods described above were independently applied to the full set of 49 3D images of the patient cohort, creating five different data sets (including the raw images). The full set of images was used for normalization training in all methods. The goals of the normalization procedure are to 1) give intensities similar distributions for the same tissue type within and across patients, and 2) give intensities a common interpretation across locations within the same tissue type. These two crucial goals have been highlighted among the seven principles of image normalization (SPIN) [16], proposed as an impetus to standardize research in quantitative imaging. The other five principles are secondary in our scenario, as they relate either to reproducibility (which is outside the scope of this article) or to properties that are intrinsically fulfilled by the normalization algorithms at hand (such as preserving the rank of intensities and not allowing any loss of information).

To evaluate how the intensity distributions of different tissues behave, we isolated the intensities of healthy prostate tissue and cancerous tissue. To quantify the similarity within tissue types across patients we used two measures: the standard deviation of the normalized mean intensity (NMI), a quantity previously used in image normalization comparisons [10], and the coefficient of variation (cv) among all healthy and all cancerous voxel intensities. The NMI for a patient p and tissue type t (here either healthy or cancerous prostate tissue) is defined on the MRI volume as

$$\mathrm{NMI}_{p,t} = \frac{\mu_{p,t}}{s_{2,p} - s_{1,p}} \tag{1}$$

where μ is the mean intensity, and s1 and s2 are the 2nd and 98th percentile intensity values on the standardized image scale. The coefficient of variation is defined over the collection of all intensity values for a specific tissue type t among all patients as cv_t = σ_t / μ_t, where σ and μ represent the standard deviation and the mean, respectively.

We also compared the difference between the distributions of healthy and cancerous tissue. Algorithms for segmenting prostate regions of interest (ROIs) are likely to benefit from a greater difference between the intensity distributions of different tissue types. Additionally, an increased distribution difference may affect radiomic features in a way that increases the predictive power of the radiomic model. This was quantified in three ways. Firstly, by measuring the expected
value of the normalized difference of means between healthy and cancerous intensities (a quantity we will call NDM). In other words, if I_h^i is the multiset of all healthy intensity values and I_c^i is the multiset of all cancerous intensity values for patient i, then

$$\langle \mathrm{NDM} \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \langle I_h^i \rangle - \langle I_c^i \rangle \right|}{\langle I_h^i \uplus I_c^i \rangle} \tag{2}$$

is the average NDM over N patients, where ⟨·⟩ denotes the mean and ⊎ the multiset union. This quantity may be interpreted as a distance between the centres of the two distributions. Secondly, we tested whether the change in NDM from before to after normalization is statistically significant using the Wilcoxon signed-rank test. Lastly, the Jeffreys divergence (or simply J-divergence) between the healthy and cancerous tissue distributions was computed over all MRI volumes taken together, in a fashion similar to [22]. The J-divergence between two probability distributions p and q is the sum of their relative entropies (also termed Kullback-Leibler divergences) with respect to each other:

$$J(p, q) = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{q(x)}\,dx + \int_{-\infty}^{\infty} q(x) \log\frac{q(x)}{p(x)}\,dx \tag{3}$$

In contrast to the relative entropy, this measure is symmetric, which makes it suitable as a distance measure.

2.2.2. Radiomic-based evaluation

Radiomic features were extracted from the whole prostate region (excluding the urethra) using the IBEX [24] software with parameters optimized for the data set¹. All available features were calculated for every image volume, resulting in 1702 features for each patient. Redundant features, that is, features with identical values for all patients or duplicated features, were removed from this set, leaving 1058 features for analysis. To investigate how feature behaviour depends on normalization, the concordance correlation coefficient (CCC) was analyzed for each feature before and after normalization. The CCC between two sets x and y is roughly a measure of reproducibility and is defined as

$$\mathrm{CCC} = \frac{2 s_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} \tag{4}$$

for standard deviations σ, means μ and covariance s_xy. Features were considered unchanged if their CCC was at least 0.8, indicating a very strong correlation as suggested, e.g., in [25]. The normalization methods were then evaluated in terms of the number of robust features.

Feature group membership was also taken into consideration in the radiomic analysis. The IBEX radiomic features are inherently separated into ten different groups based on their definitions: gradient orient histogram (GOH), gray level co-occurrence matrix (GLCM, in 2.5 and 3 dimensions), gray level run length matrix (GLRLM, in 2.5 dimensions), intensity direct (ID), intensity histogram (IH), intensity histogram Gauss fit (IHGF), neighbour intensity difference (NID, in 2.5 and 3 dimensions), and shape features. A dimension of 2.5 means that the matrix (e.g. the run length matrix) is calculated by summing up the matrices calculated separately on each two-dimensional slice. Features within a single group generally share similar interpretations, which can be an important aspect when conducting radiomic analysis from a clinical point of view.

3. Results

3.1. Qualitative assessment

Image intensity histograms of the different normalization methods are shown in Fig. 1. Each row represents a different normalization technique, with unnormalized images at the top, and the columns correspond to histograms for the whole image, healthy prostate tissue, and cancer tissue, respectively. Normalization methods I and III are almost identical due to very similar landmark values. Overall, normalization methods I and III appear to co-align histogram peaks very well, whereas normalization methods II and IV tend to have a larger variability at higher intensity values (as seen in the histograms of the full images). Normalization method II seems to worsen the alignment altogether. We note that the abnormal-looking histogram peaks right after the landmark point in normalization method IV come from highly inhomogeneous images. Apart from normalization method II, the healthy tissue looks very similarly distributed across the methods, with slightly more co-aligned peaks after normalization. Cancerous tissue intensity distributions are also very similar across methods (excluding normalization method II), but with much higher variation within methods than healthy tissue. Apart from normalization method II, it is hard to notice any difference at all for cancerous tissue across methods. To visualize the difference in intensity distributions between healthy and cancerous prostate tissue more readily, Fig. 2 shows the mean histograms (with confidence intervals) of the collections of histograms in the second and third columns of Fig. 1. In this ensemble view there is only a minor skew between the healthy and cancerous intensities, and the difference is somewhat larger prior to normalization.

3.2. Intensity-based evaluation

The results for both the standard deviation of the NMI and the coefficient of variation from the intra-tissue analysis (Table 2) follow the same pattern: prostate-specific normalization (method IV) yields the most consistent (least varying) intensities for both healthy and cancerous tissue, followed by normalization methods I and III, which gave identical results. Method II performed worse than no normalization at all.

In the comparison of distances between healthy and cancerous tissue intensity distributions (Table 3), prostate-specific normalization (method IV) produced the largest separation between healthy and cancerous tissue intensity populations, both in terms of mean NDM and J-divergence. Normalization methods I and III, on the other hand, more consistently produced an increased difference after normalization, as evidenced by the lower p-values of their mean NDMs (this stems from the fact that 36 of the 49 normalized images had an increased difference after normalization with methods I and III, as opposed to 31 images with method IV). Generalized scale normalization (method II) actually increased the similarity of tissues in terms of mean NDM, so its low p-value is largely irrelevant, since the study focused on a decrease in similarity. After normalization method IV, method II produced the greatest J-divergence, followed by normalization methods I and III.
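The evaluation quantities defined in Section 2.2 (Eqs. (1)–(4) and the CCC ⩾ 0.8 robustness count) can be sketched in plain Python as follows. This is an illustrative sketch, not the code used in the study. Several choices are assumptions: the flat-list data layout, linear-interpolation percentiles, population moments, the sample standard deviation for SD(NMI), a 32-bin shared-histogram estimate of the J-divergence with a small epsilon, and all function names. The Wilcoxon signed-rank test is not reimplemented; a library routine such as `scipy.stats.wilcoxon` could be applied to the paired per-patient NDM values before and after normalization.

```python
# Sketch of the evaluation quantities of Section 2.2 (Eqs. (1)-(4)).
# Assumptions (not from the paper): every tissue/image is a flat list of voxel
# intensities, percentiles use linear interpolation, moments are population
# moments, and the J-divergence is estimated on a shared 32-bin histogram with a
# small epsilon so that empty bins do not produce log(0).
import math
import statistics


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def nmi(tissue, image):
    """Eq. (1): mean tissue intensity divided by the 2nd-98th percentile range of the image."""
    return statistics.fmean(tissue) / (percentile(image, 98) - percentile(image, 2))


def sd_nmi(tissues, images):
    """Spread of the NMI of one tissue type across patients (sample standard deviation)."""
    return statistics.stdev([nmi(t, img) for t, img in zip(tissues, images)])


def pooled_cv(tissue_per_patient):
    """cv_t = sigma_t / mu_t over all voxels of one tissue type, pooled across patients."""
    pooled = [v for patient in tissue_per_patient for v in patient]
    return statistics.pstdev(pooled) / statistics.fmean(pooled)


def mean_ndm(healthy_per_patient, cancer_per_patient):
    """Eq. (2): mean over patients of |<I_h> - <I_c>| / <I_h ⊎ I_c>."""
    # h + c concatenates the two lists, i.e. the multiset union of the intensities.
    ndms = [abs(statistics.fmean(h) - statistics.fmean(c)) / statistics.fmean(h + c)
            for h, c in zip(healthy_per_patient, cancer_per_patient)]
    return statistics.fmean(ndms)


def j_divergence(p_samples, q_samples, bins=32, eps=1e-12):
    """Eq. (3) on a shared histogram: sum over bins of (p - q) * log(p / q)."""
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    width = (hi - lo) / bins or 1.0  # guard against identical constant inputs

    def hist(samples):
        counts = [0] * bins
        for v in samples:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(samples) + eps for c in counts]

    p, q = hist(p_samples), hist(q_samples)
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q))


def ccc(x, y):
    """Eq. (4): concordance correlation coefficient; NaN when undefined."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.pvariance(x, mx), statistics.pvariance(y, my)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    denom = vx + vy + (mx - my) ** 2
    return 2 * sxy / denom if denom else math.nan


def count_unchanged(raw, normalized, threshold=0.8):
    """Count features (dict: name -> per-patient values) with CCC >= threshold."""
    return sum(1 for name in raw if ccc(raw[name], normalized[name]) >= threshold)


if __name__ == "__main__":
    healthy = [[100.0, 110.0, 120.0], [200.0, 220.0, 240.0]]
    cancer = [[60.0, 70.0], [120.0, 140.0]]
    print(round(mean_ndm(healthy, cancer), 4))  # 0.4891
    print(round(ccc([1, 2, 3], [2, 4, 6]), 4))  # 0.3636
```

The two printed values illustrate a design point. NDM is a ratio of means, so the second patient (a rescaled copy of the first) contributes the same value. The CCC of Eq. (4), by contrast, drops well below 0.8 for a pure rescaling even though the Pearson correlation would be 1. This is consistent with many intensity-dependent features failing the robustness threshold after normalization (Table 4).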

3.3. Radiomic-based evaluation

Fig. 3 illustrates the change in feature values caused by the different normalization methods. In general, features are less affected by the histogram and gb-scale normalizations. The numbers of unchanged features for the different methods are presented in Table 4, and the full list of unchanged features is given in the supplementary material. Eighty-nine (89) features had a CCC very close to zero, which occurs if either variable has a very small variance or if the correlation between them is close to zero. These features were primarily in the categories IntensityDirect (44 out of 49) and IntensityHistogram (17 out of 17).

¹ The following parameters were modified: Neighbour Intensity Difference 2.5: RangeMax = 8000, Nbins = 100. Intensity Histogram Gauss Fit: RangeMin = 1, RangeMax = 60000, RangeFix = 1. Intensity Histogram: RangeFix = 0, Nbins = 100. Intensity Direct: ThresholdHigh = 8000. Gray Level Run Length Matrix 2.5: GrayLimits = [], NumLevels = 100. Gray Level Cooccurrence Matrix 2.5: GrayLimits = [], NumLevels = 100. Gray Level Cooccurrence Matrix 3: GrayLimits = [], NumLevels = 100. Empty brackets ("[]") indicate that the input is left empty.

Fig. 1. Intensity histograms of the full images, healthy prostate tissue, and cancerous tissue for the different normalization methods. The top row shows unnormalized histograms, and the following rows show normalization methods I to IV, respectively. Vertical blue lines in the left column indicate the histogram landmark point (μ_s) on the standard scale. Intensities have been grouped into 32 bins to smooth the histograms. The vertical axes in the last two columns have been scaled to favor visual comparison between columns.

4. Discussion

Our results support the hypothesis that image normalization, well established in brain MRI, is also valid for MRI of the prostate, as demonstrated by the significant changes it produces in intensity distributions and in the vast majority of radiomic features. In contrast to most other normalization-oriented studies, we have focused on a single homogeneous data set acquired within a single institution. This is akin to the working situation in clinical practice, and our results demonstrate that image normalization is no less important in such cases.

The method that most often resulted in smaller coefficients of variation and standard deviations of values within tissues (taken to be desirable as an indication of higher intra-tissue similarity) was prostate normalization (method IV), which normalizes images according to pixel intensities within healthy prostate tissue. This method was followed by a tie between histogram normalization (method I) and gb-scale normalization (method III). While method IV also provided the greatest overall separation between the distributions of healthy and unhealthy prostate tissue, methods I and III resulted in more individual subjects showing an increased difference between healthy and unhealthy prostate tissue intensity distributions, and for these two methods the increase was statistically significant. A significant reduction in the separation of the distributions was found for method II, which would make the distinction between healthy and cancerous prostate tissue harder. Methods I and III arguably had the "best" performance in terms of the number of features not affected by normalization. It should be noted, however, that this way of comparing radiomic performance is not intrinsically decisive; we merely based this assumption on the fact that a purely random image transformation would leave very few features concordant with the unnormalized features. Ideally, the optimal choice of normalization method would be the one that produces the best results in whichever radiomic model is being used.

Fig. 2. Mean histograms of patients' healthy and cancerous prostate tissue constructed from the collection of histograms shown in the second and third columns of Fig. 1. Transparent bands indicate the 95% confidence interval of the mean. (a) Unnormalized images; (b) method I: histogram-normalized images; (c) method II: g-scale normalized images; (d) method III: gb-scale normalized images; (e) method IV: prostate-normalized images.

Table 2
Intensity variation within healthy and cancerous tissue, measured by the coefficient of variation (cv) and the standard deviation of the normalized mean intensity (NMI). A low variation indicates good agreement of intensity values for the specific tissue. Ranks are given in parentheses (lower is better).

                SD(NMI)                  cv
                Healthy     Cancer       Healthy     Cancer
Unnormalized    0.528 (4)   0.522 (4)    0.068 (4)   0.092 (4)
Method I        0.518 (2)   0.489 (2)    0.057 (2)   0.086 (2)
Method II       0.557 (5)   0.493 (5)    0.094 (5)   0.101 (5)
Method III      0.518 (2)   0.489 (2)    0.057 (2)   0.086 (2)
Method IV       0.510 (1)   0.464 (1)    0.032 (1)   0.078 (1)

Table 3
Metrics of separation between healthy and cancerous tissue intensity distributions, measured by the mean normalized difference of means (NDM) and the Jeffreys divergence. Large values indicate good separation of cancerous and healthy tissue intensities. The p-value of the Wilcoxon signed-rank test indicates the significance of the NDM change.

                Mean NDM    Wilcoxon signed-rank test (p-value)    J-divergence (×10^10)
Unnormalized    0.1998      n/a                                     705
Method I        0.2004      0.0027                                  1125
Method II       0.1835      0.0001                                  1165
Method III      0.2004      0.0027                                  1125
Method IV       0.2061      0.0518                                  2206

Fig. 3. Concordance correlation coefficients between radiomic features from unnormalized images and images normalized by methods I, II, III, and IV. Each marker represents the concordance correlation of a single feature (each feature has 49 values, one per patient). Marker symbol and color represent the feature groups according to the legend, with the following abbreviations (in order): gradient orient histogram, gray level co-occurrence matrix (2.5 and 3 dimensions), gray level run length matrix (2.5 dimensions), intensity direct, intensity histogram, intensity histogram Gauss fit, neighbour intensity difference (2.5 and 3 dimensions). The figure visualizes the large changes in feature values caused by normalization as well as the similarity between methods I and III.

Table 4
Numbers of unchanged features (CCC ⩾ 0.8) for the different normalization methods. Omitted categories (GLRLM25, ID, IH, IHGF, NID25, and NID3) had no unchanged features.

              GOH         Shape       GLCM25     GLCM3       Total
Method I      39 (89%)    17 (94%)    24 (8%)    57 (10%)    137 (13%)
Method II     23 (52%)    17 (94%)    21 (7%)    44 (8%)     105 (10%)
Method III    39 (89%)    17 (94%)    24 (8%)    56 (10%)    136 (13%)
Method IV     38 (86%)    17 (94%)    21 (7%)    53 (9%)     129 (12%)

The large changes in radiomic feature values due to normalization make it hard to readily compare or reproduce radiomic research. For instance, to make comparisons independent of normalization, one would have to normalize on the same data set with the exact same parameters (or explicitly use the same landmark points and parameters). The problem of comparison and reproducibility is already well known in MRI-related research due to the inherent variability of MRI acquisition, a problem which normalization itself attempts to counteract. The results presented here suggest that merely applying normalization is not enough to alleviate this issue in radiomics research.

A couple of issues can be pointed out regarding the nature of prostate cancer. Firstly, prostate cancer is often multifocal, meaning that there can be cancerous tissue even outside the DIL. Moreover, there is often non-cancerous disease or inflammation that deviates from healthy tissue. Therefore, what we consider healthy prostate tissue might also include some abnormalities. As a consequence, the measured difference between healthy and cancerous tissue could be deceptively small. However, there is no solution to this issue in the current clinical workflow, as it would require comparing MRIs with prostatectomy specimens, which is not typically done. Secondly, at least one patient in our data set exhibited large benign lesions in the prostate. This undoubtedly affects both the histogram of the MRI volume and the radiomic features of the patient, since benign lesions appear hyperintense in T2-w MRI.

Normalization method II (g-scale normalization) had the worst performance of the evaluated methods, and was even worse than the raw unnormalized images, both in terms of tissue consistency and tissue difference. We attribute this to the low intensity of the landmark point within the largest g-scale region (generally consisting of locally homogeneous regions of dark muscle mass). It should be noted that this method is highly parameter-dependent and that other parameters may produce results in line with the other methods. Our results for this method make it clear that caution should be exercised when performing normalization, and that blindly applying normalization algorithms without concern for the specific application may severely alter, and even invalidate, one's results or conclusions.

A possible improvement over using the median landmark in histogram normalization is to use the decile or quartile histogram values instead, as in [13]. However, the authors of [13] concluded that performance is not significantly improved with these landmarks, and thus we have not included this approach in this paper. Another issue with all histogram-based methods that was not thoroughly discussed in the original publications is the foreground identification procedure. Foreground identification was originally a means of finding the intensity of the first non-background mode of the histogram, which by assumption corresponds to pixels of interest. The procedure was subsequently kept in the updated methods that used the median/quartile/decile quantiles as landmark points instead, and was thus also used in this study. In practice, foreground identification increases the intensity values of the landmark points because it effectively removes low-intensity background pixels. This procedure performs well on brain MRIs, where relevant tissues are dominated by hyperintense structures, but since prostate tissue is dominated by intensity values fairly close to the mean value of the full image (see last row of Fig. 1), relevant intensities may be excluded when extracting the foreground. We therefore suggest not using foreground identification at all when normalizing small field-of-view pelvic MRI images and other images in which the tissues of interest lie in the low intensity region.

One issue that was not evaluated in this study is the effect of image noise. This was addressed in the original publication of the g-scale and gb-scale algorithms [15], but not for histogram normalization. Spatially uniform Gaussian white noise is, however, unlikely to significantly affect the quantile values in the histogram normalization algorithm except at very high noise levels. We believe that the images analysed here are representative of the noise levels typically encountered in clinical settings. A parallel argument can be made for different acquisition parameters. For the effect of noise on the radiomic features themselves, we refer to studies addressing this issue separately, such as [26] or [27].

While the size of the patient cohort in this study is not exhaustive, we believe that it is adequate to provide a good understanding of how the normalization methods behave for typical clinical images of the pelvic region. Indeed, cohorts of other prostate cancer radiomic studies typically include similar numbers (33, 54, and 64 patients in [28], [19], and [29], respectively), and have gone as low as 23 [30]. On the other hand, increasing cohort sizes in prostate cancer radiomic studies should be a legitimate concern for future research in the field.

5. Conclusion

This study has compared four different normalization methods (histogram normalization, generalized scale normalization, generalized ball-scale normalization, and a prostate-specific normalization based on healthy prostate tissue intensities) on pelvic-region T2-w MRI images of prostate cancer patients. We looked at how the different methods performed in terms of desired behaviors of intensity values. Additionally, we looked at how radiomic features are affected by the different normalization methods, which to the best of our knowledge has not yet been discussed in the literature. Our results suggest that normalizing with the average of healthy prostate tissue intensities as a landmark gives the most desirable behavior. If information about healthy tissue is not available, we suggest using one of the quantile (median/quartile/decile) histogram normalization methods without foreground identification. Furthermore, we have demonstrated that image normalization is important even for internally homogeneous data sets, which many other normalization studies have disregarded. Our radiomic evaluation revealed that normalization has a large impact on the vast majority of radiomic features, which could substantially affect the results of radiomic machine learning models. In addition, this behavior makes it particularly hard to compare radiomic studies unless normalization was done with the same methods and parameters. For a more conclusive result in terms of radiomics, it would be useful to investigate how the radiomic features relate to particular clinical endpoints of interest.
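The normalization recommendation above (quantile histogram normalization without foreground identification, or a healthy-tissue landmark when contours are available) can be illustrated with a minimal plain-Python sketch. It is not the implementation used in this study. The percentile bounds pc1 = 2 and pc2 = 98, the standard scale [0, 100], the mean-threshold foreground step, and the function names `landmarks`, `train` and `normalize` are all hypothetical choices made for illustration. Passing a `reference` list (for instance the voxels inside a healthy-prostate contour) replaces the median landmark with the mean of those voxels, which only loosely mirrors method IV.

```python
# Minimal single-landmark histogram normalization in the spirit of Nyúl & Udupa [10,13].
# Hypothetical settings (not the paper's): percentiles pc1 = 2 and pc2 = 98, standard
# scale [s1, s2] = [0, 100], median landmark. A `reference` (e.g. healthy-prostate
# voxels) swaps the landmark for the mean of those voxels, loosely mimicking method IV.
import math
import statistics


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def landmarks(image, reference=None, pc1=2.0, pc2=98.0, foreground=False):
    """Return (p1, mu, p2) for one image given as a flat list of intensities."""
    voxels = image
    if foreground:
        # Crude foreground identification (assumption): keep voxels above the image mean.
        mean = statistics.fmean(image)
        voxels = [v for v in image if v > mean]
    p1, p2 = percentile(voxels, pc1), percentile(voxels, pc2)
    mu = statistics.fmean(reference) if reference is not None else statistics.median(voxels)
    return p1, mu, p2


def train(images, references=None, s1=0.0, s2=100.0, **kw):
    """Learn the mean landmark mu_s on the standard scale from a training set."""
    refs = references if references is not None else [None] * len(images)
    mapped = []
    for img, ref in zip(images, refs):
        p1, mu, p2 = landmarks(img, ref, **kw)
        # Map [p1, p2] linearly onto [s1, s2] and record where the landmark lands.
        mapped.append(s1 + (mu - p1) * (s2 - s1) / (p2 - p1))
    return statistics.fmean(mapped)


def normalize(image, mu_s, reference=None, s1=0.0, s2=100.0, **kw):
    """Piecewise-linear map sending p1 -> s1, mu -> mu_s and p2 -> s2 (assumes p1 < mu < p2)."""
    p1, mu, p2 = landmarks(image, reference, **kw)
    lower = (mu_s - s1) / (mu - p1)
    upper = (s2 - mu_s) / (p2 - mu)
    return [mu_s + (v - mu) * (lower if v <= mu else upper) for v in image]


if __name__ == "__main__":
    img_a = [float(v) for v in range(101)]
    img_b = [2.0 * v + 10.0 for v in img_a]  # same "anatomy", different intensity scale
    mu_s = train([img_a, img_b])
    out_a, out_b = normalize(img_a, mu_s), normalize(img_b, mu_s)
    print(mu_s, max(abs(a - b) for a, b in zip(out_a, out_b)))
```

In the demo, `img_b` is an affinely rescaled copy of `img_a`. After training on both, the two normalized images coincide up to floating-point error, which is the tissue-consistency property the method aims for. `foreground` defaults to `False`, in line with the suggestion to skip foreground identification for small field-of-view pelvic images. Real 3D volumes would simply be flattened before use.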

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially funded by the research grants from the Associazione Italiana per la Ricerca sul Cancro (AIRC): IG-13218 “Short-term high precision RT for early prostate cancer with concomitant boost to the dominant lesion”, registered at ClinicalTrials.gov NCT01913717, approved by IEO S768/113, and IG-14300 “Carbon ions boost followed by pelvic photon radiotherapy for high risk prostate cancer”, and by a research grant from Accuray Inc. entitled “Data collection and analysis of Tomotherapy and CyberKnife breast clinical studies, breast physics studies and prostate study”. The Sponsors did not play any role in the study design, collection, analysis and interpretation of data, nor in the writing of the manuscript, nor in the decision to submit the manuscript for publication. L.J. Isaksson is a PhD student at the European School of Molecular Medicine (SEMM), Milan, Italy.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.ejmp.2020.02.007.

References

[1] Chong Y, Kim JH, Lee HY, Ahn YC, Lee KS, Ahn MJ, et al. Quantitative CT variables enabling response prediction in neoadjuvant therapy with EGFR-TKIs: are they different from those in neoadjuvant concurrent chemoradiotherapy? PLoS One 2014;9(2):e88598.
[2] Fried DV, Tucker SL, Zhou S, Liao Z, Mawlawi O, Ibbott G, et al. Prognostic value and reproducibility of pretreatment CT texture features in stage III non-small cell lung cancer. Int J Rad Oncol Biol Phys 2014;90(4):834–42.
[3] Coroller TP, Agrawal V, Narayan V, Hou Y, Grossmann P, Lee SW, et al. Radiomic phenotype features predict pathological response in non-small cell lung cancer. Radiotherapy Oncol 2016;119(3):480–6.
[4] Vallières M, Kay-Rivest E, Perrin LJ, Liem X, Furstoss C, Aerts HJ, et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Sci Rep 2017;7(1):10117.
[5] Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Commun 2014;5:4006.
[6] Grossmann P, Stringfield O, El-Hachem N, Bui MM, Velazquez ER, Parmar C, et al. Defining the biological basis of radiomic phenotypes in lung cancer. Elife 2017;6:e23421.
[7] Panth KM, Leijenaar RT, Carvalho S, Lieuwes NG, Yaromina A, Dubois L, et al. Is there a causal relationship between genetic changes and radiomics-based image features? An in vivo preclinical experiment with doxycycline inducible GADD34 tumor cells. Radiotherapy Oncol 2015;116(3):462–6.
[8] Traverso A, Wee L, Dekker A, Gillies R. Repeatability and reproducibility of radiomic features: a systematic review. Int J Rad Oncol Biol Phys 2018;102(4):1143–58.
[9] Shiri I, Abdollahi H, Shaysteh S, Mahdavi SR. Test-retest reproducibility and robustness analysis of recurrent glioblastoma MRI radiomics texture features. Iranian J Radiol 2017;Special issue(5):e48035.
[10] Nyúl LG, Udupa JK. On standardizing the MR image intensity scale. Magn Resonance Med 1999;42(6):1072–81.
[11] Fortin JP, Sweeney EM, Muschelli J, Crainiceanu CM, Shinohara RT, Initiative ADN, et al. Removing inter-subject technical variability in magnetic resonance imaging studies. Neuroimage 2016;132:198–212.
[12] Madabhushi A, Udupa JK. New methods of MR image intensity standardization via generalized scale. Med Phys 2006;33(9):3426–34.
[13] Nyúl LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE Trans Med Imaging 2000;19(2):143–50.
[14] Ge Y, Udupa JK, Nyul LG, Wei L, Grossman RI. Numerical tissue characterization in MS via standardization of the MR image intensity scale. J Magn Resonance Imaging 2000;12(5):715–21.
[15] Madabhushi A, Udupa JK, Souza A. Generalized scale: theory, algorithms, and application to image inhomogeneity correction. Computer Vision Image Understanding 2006;101(2):100–21.
[16] Shinohara RT, Sweeney EM, Goldsmith J, Shiee N, Mateen FJ, Calabresi PA, et al. Statistical normalization techniques for magnetic resonance imaging. NeuroImage: Clinical 2014;6:9–19.
[17] Chaddad A, Kucharczyk M, Niazi T. Multimodal radiomic features for the predicting Gleason score of prostate cancer. Cancers 2018;10(8):249.
[18] Khalvati F, Zhang J, Chung AG, Shafiee MJ, Wong A, Haider MA. MPCaD: a multiscale radiomics-driven framework for automated prostate cancer localization and detection. BMC Med Imaging 2018;18(1):16.
[19] Wang J, Wu CJ, Bao ML, Zhang J, Wang XN, Zhang YD. Machine learning-based analysis of MR radiomics can help to improve the diagnostic performance of PIRADS v2 in clinically relevant prostate cancer. Eur Radiol 2017;27(10):4082–90.
[20] Ginsburg SB, Algohary A, Pahwa S, Gulani V, Ponsky L, Aronen HJ, et al. Radiomic features for prostate cancer detection on MRI differ between the transition and peripheral zones: preliminary findings from a multi-institutional study. J Magn Resonance Imaging 2017;46(1):184–93.
[21] Shiradkar R, Podder TK, Algohary A, Viswanath S, Ellis RJ, Madabhushi A. Radiomics based targeted radiotherapy planning (Rad-TRaP): a computational framework for prostate cancer treatment planning with MRI. Radiation Oncol 2016;11(1):148.
[22] Shah M, Xiao Y, Subbanna N, Francis S, Arnold DL, Collins DL, et al. Evaluating intensity normalization on MRIs of human brain with multiple sclerosis. Med Image Anal 2011;15(2):267–82.
[23] Timon G, Ciardo D, Bazani A, Garioni M, Maestri D, De Lorenzo D, et al. Rationale and protocol of AIRC IG-13218, short-term radiotherapy for early prostate cancer with concomitant boost to the dominant lesion. Tumori J 2016;102(5):536–40.
[24] Zhang L, Fried DV, Fave XJ, Hunter LA, Yang J, Court LE. IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics. Med Phys 2015;42(3):1341–53.
[25] Akoglu H. User’s guide to correlation coefficients. Turkish J Emergency Med 2018;18(3):91–3.
[26] Lafata K, Cai J, Wang C, Hong J, Kelsey C, Yin F. Sensitivity of radiomic features to image noise and respiratory motion: SU-F-605-08. Med Phys 2017;44(6).
[27] Pfaehler E, Beukinga RJ, de Jong JR, Slart RH, Slump CH, Dierckx RA, et al. Repeatability of 18F-FDG PET radiomic features: a phantom study to explore sensitivity to image reconstruction settings, noise, and delineation method. Med Phys 2019;46(2):665–78.
[28] Abdollahi H, Mofid B, Shiri I, Razzaghdoust A, Saadipoor A, Mahdavi A, et al. Machine learning-based radiomic models to predict intensity-modulated radiation therapy response, Gleason score and stage in prostate cancer. La Radiologia Medica 2019;124(6):555–67.
[29] Hectors SJ, Cherny M, Yadav KK, Beksaç AT, Thulasidass H, Lewis S, et al. Radiomics features measured with multiparametric magnetic resonance imaging predict prostate cancer aggressiveness. J Urol 2019:10–1097.
[30] Nketiah G, Elschot M, Kim E, Teruel JR, Scheenen TW, Bathen TF, et al. T2-weighted MRI-derived textural features reflect prostate cancer aggressiveness: preliminary results. Eur Radiol 2017;27(7):3050–9.