Medical Image Analysis 16 (2012) 966–975
Contents lists available at SciVerse ScienceDirect
Medical Image Analysis journal homepage: www.elsevier.com/locate/media
MRI to X-ray mammography registration using a volume-preserving affine transformation Thomy Mertzanidou a,⇑, John Hipwell a, M. Jorge Cardoso a, Xiying Zhang a, Christine Tanner b, Sebastien Ourselin a, Ulrich Bick d, Henkjan Huisman c, Nico Karssemeijer c, David Hawkes a a
Centre for Medical Image Computing, University College London, Gower Street, London WC1E 6BT, UK Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, 8092 Zurich, Switzerland Diagnostic Image Analysis Group, Radboud University Nijmegen Medical Centre, P.O. Box 9102, 6500 HC Nijmegen, The Netherlands d Department of Radiology, Charite Universitätsmedizin Berlin, 10117 Berlin, Germany b c
a r t i c l e
i n f o
Article history: Received 2 August 2011 Received in revised form 6 March 2012 Accepted 15 March 2012 Available online 28 March 2012 Keywords: Multimodal registration 2D/3D registration Breast tissue classification Mammography
a b s t r a c t X-ray mammography is routinely used in national screening programmes and as a clinical diagnostic tool. Magnetic Resonance Imaging (MRI) is commonly used as a complementary modality, providing functional information about the breast and a 3D image that can overcome ambiguities caused by the superimposition of fibro-glandular structures associated with X-ray imaging. Relating findings between these modalities is a challenging task however, due to the different imaging processes involved and the large deformation that the breast undergoes. In this work we present a registration method to determine spatial correspondence between pairs of MR and X-ray images of the breast, that is targeted for clinical use. We propose a generic registration framework which incorporates a volume-preserving affine transformation model and validate its performance using routinely acquired clinical data. Experiments on simulated mammograms from 8 volunteers produced a mean registration error of 3.8 ± 1.6 mm for a mean of 12 manually identified landmarks per volume. When validated using 57 lesions identified on routine clinical CC and MLO mammograms (n = 113 registration tasks) from 49 subjects the median registration error was 13.1 mm. When applied to the registration of an MR image to CC and MLO mammograms of a patient with a localisation clip, the mean error was 8.9 mm. The results indicate that an intensity based registration algorithm, using a relatively simple transformation model, can provide radiologists with a clinically useful tool for breast cancer diagnosis. Ó 2012 Elsevier B.V. All rights reserved.
1. Introduction X-ray mammography is the breast imaging modality that is most commonly used for screening, diagnosis and biopsy guidance. During mammography, the breast is extended, compressed and immobilised between two plates, to increase contrast between different tissue types and reduce motion and scatter artifacts. Two views are routinely acquired: Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO). Although the resolution of a typical mammogram is fine [(0.05–0.1) (0.05–0.1)] mm2, superimposition of breast structures and poor contrast between healthy fibro-glandular and tumorous tissues make interpretation difficult. Dynamic Contrast Enhanced MRI is generally indicated for preoperative staging, assessment of chemotherapy and for diagnosis when conventional imaging modalities, i.e. X-ray and ultrasound, are inadequate. In a recent study Benndorf et al. (2010) concluded ⇑ Corresponding author. Address: University College London, Centre for Medical Image Computing, Engineering Front Building Malet Place, London WC1E 6BT, United Kingdom. Tel.: +44 2076790177. E-mail address:
[email protected] (T. Mertzanidou). 1361-8415/$ - see front matter Ó 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.media.2012.03.001
that MRI is ‘‘a worthwhile adjunct to mammography in a non-screening setting’’. In young women with dense breasts and a family history of breast cancer, MRI has been shown to increase sensitivity (Leach et al., 2005) and is currently recommended for annual screening of high risk women, aged between 30 and 49, by the National Institute for Health and Clinical Excellence (NICE October 2006). During an MR acquisition, women lie prone with their breasts pendulous. MRI has the advantage of providing functional and threedimensional information of the breast, producing images with a typical resolution of around [1 1 2] mm3. To fully exploit the complementary information provided by MRI and X-ray, radiologists must be able to localise corresponding findings in each of the images. This is a challenging task for several reasons. Firstly, the two modalities measure different physical properties of the breast and hence the contrast between the different tissue types is disparate. Secondly, as described above, the deformation of the breast is large and complex between MR and X-ray, and finally the images have different dimensionality and resolution. Therefore, a tool that would automatically provide correspondence between the two modalities could aid radiologists in breast cancer diagnosis and management, and provide the
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
enabling technology for routine multimodal Computer-Aided Diagnosis (Yuan et al., 2010) in a clinical context. For these applications an accuracy of 10–15 mm can be valuable to clinicians. An additional potential application is X-ray guided biopsy for lesions that are not visible in ultrasound, although this would require higher accuracy. As findings are poorly visible in X-ray mammograms of women with dense breasts, a registration method, with sufficient accuracy, could provide an indication of the biopsy location for a lesion that is visible in MRI. Development of 2D/3D registration algorithms specifically for breast imaging is limited and has largely focused on feature-based approaches. Marti et al. (2004) and Behrenbruch et al. (2003) matched regions of interest on each mammographic view to the 3D MR volume. Their method generated simulated mammograms, via parallel projections of the MRI, to which the CC and MLO mammograms were warped, by extracting distinctive features in both images. Marti et al. (2004) calculated the corresponding location in the MRI by directly comparing the features extracted from mammograms, with those extracted from MR slices (as opposed to X-rays simulated from the MR volume). Behrenbruch et al. (2003) used a reconstruction technique that takes into consideration the mammographic compression, in order to reconstruct the MR volume using the CC and MLO views and find the corresponding 3D location of a region of interest in the two views. Feature-based approaches are in general computationally efficient, but feature selection from mammograms and calculation of their correspondence are non-trivial tasks and can limit robustness or necessitate manual interaction. Moreover, the selected features are often associated with areas around lesions that are identifiable both in the MRI and on the X-ray mammogram. This renders these methods less suitable for clinical use, as accurate matching is more valuable for cases where lesions are not obvious in the X-ray mammogram (for example dense breasts). Ruiter et al. (2006) followed a patient-specific biomechanical modelling approach to simulate mammographic compression. A Finite Element (FE) model of the patient’s breast was generated from MRI. This was subsequently deformed by applying displacements on the surface nodes, both in the direction of the projection, to account for the compression, and also in the perpendicular direction, to compensate for the anisotropic behaviour of the breast tissue under compression. The magnitude of displacements applied on the breast model nodes was defined by the matching of the contours between the real mammogram and the parallel projection of the deformed MRI. This approach has the advantage of using a physically realistic 3D transformation model to account for the breast deformation between MRI and mammogram. However the manual steps involved in creating a new FE model, on a patient-by-patient basis, make this approach challenging to apply clinically. This includes the problem of selecting appropriate material parameters and boundary conditions to model the distribution of breasts in the general population. To overcome this problem Tanner et al. (2009) proposed a population-based Statistical Deformation Model for multimodal registration using an intensity-based algorithm. Regarding validation, Marti et al. (2004) do not report quantitative results on lesion localisation and Tanner et al. (2009) did not test their method using real data. Behrenbruch et al. (2003) tested their algorithm on 14 cases and obtained errors in the range 4– 10 mm, depending on the MRI voxel resolution. Ruiter et al. (2006) achieved a mean error of 4.3 mm on 6 cases, and in a more recent semi-automated implementation of their approach, reported values of 11.8 ± 6.5 mm and a mean overlap of 69 ± 76% for 11 subjects (CC view only) (Hopp et al., 2011). The registration accuracy cannot be directly compared between the two methods, as the datasets used for validation were different and the variability of breasts and pathologies between patients is great. A common component of these studies is the small number of data sets used.
967
Previously we presented an intensity-based registration framework, that uses a volume-preserving affine transformation model to approximate the breast deformation between the two modalities (Mertzanidou et al., 2010a). As 3D/2D registration is a poorly constrained, ill-posed problem, the optimisation process is prone to terminate in local, rather than global minima. Using an affine transformation model has the advantage of incorporating a low number of degrees of freedom and therefore should provide robust and reproducible results. Moreover an additional benefit of our technique is the potential to provide a computationally tractable and generic solution that can be easily integrated into clinical use and will not depend on human interaction. In our current work we evaluate further our framework on clinical data and show that the achieved accuracy is comparable to patient-specific FE models (Hopp et al., 2011) and could be useful in clinical practice. Our intensity-based registration is driven by the similarity of the real mammogram to the simulated, Digitally Reconstructed Radiograph (DRR) created from the MRI. It is essential, therefore, to have an accurate and automated method to simulate X-ray mammograms via segmentation of the MRI into X-ray attenuation classes (fat and non-fat tissue). For this task we propose the use of an Expectation–Maximisation (EM) method with a Markov Random Field (MRF) regularisation that gives improved segmentation in comparison to pure intensity-based techniques used previously; i.e. manual thresholding (Wei et al., 2004; Hipwell et al., 2007) and fuzzy c-means (Nie et al., 2008). Our method is based on the brain tissue classification method proposed by van Leemput et al. (1999). In the following sections we describe the breast tissue classification method (Section 2) and the registration framework (Section 3). Experiments on simulated and real data are described in Sections 4 and 5 respectively. Finally, Sections 6 and 7 contain the discussion, conclusions and future work. To our knowledge, this is the first MRI to X-ray mammography registration method that is based on the alignment of structure visible in both images and is validated using a large number of clinical cases (113 registrations from 49 patients).
2. Breast tissue classification of MRI and mapping to X-ray attenuation For the breast tissue classification we are using the pre-contrast T1-weighted MR images, with the goal of assigning each voxel to one of the two tissue types: fibro-glandular and fat. The enhanced tumour could be included in the model using a third tissue class (substituting as input, an appropriate volume from the MR contrast sequence). Nevertheless, as the number of classes needs to be pre-defined and the framework should be able to cope equally well with both healthy and malignant cases, we do not use a third tissue type in our model. Therefore, tumour and fibro-glandular structures are assigned the same X-ray attenuation value. According to the experimental work of Johns and Yaffe (1987) the linear attenuation coefficients of tumorous and fibro-glandular tissue are very similar. As in van Leemput et al. (1999), our method integrates an intensity model, a spatial regularisation scheme and bias field inhomogeneity correction in the same framework. The incorporation of spatial information has been shown to improve classification results in the past as it provides robustness to noise and allows anatomical information to be included (Cardoso et al., 2011). Specifically for the breast tissue classification, the MRF regularisation is considered an appropriate choice due to the anatomy of the fibro-glandular tissue. Since breast ducts are connected in a tree-like structure inside the breast, our hypothesis is that the voxels containing glandular tissue are more likely to appear connected to other glandular voxels, rather than isolated inside the fat (and similarly for fat voxels). The intensity model assumes three classes (for glandular, fat tissue and background) and the bias field is modelled using a third
968
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
order polynomial basis function. Instead of considering Gaussian distributed intensities corrupted by a multiplicative bias field, log-transformed intensities are used to make the bias field additive.The model parameters are optimised using an EM algorithm under a Maximum Likelihood formulation. Due to the large variation of glandular structures in the breast across the population, there are no anatomical spatial priors available. Initially, therefore, before classification, any voxel in the breast has equal probability of being glandular or fatty tissue. The intensity model alone can only give accurate results when the different tissue class distributions are well separated. This is not the case for the glandular and fat tissue due to many voxels containing both tissue types (the partial volume effect). The use of an MRF regularisation scheme improves the overall robustness of the model parameter estimation and provides spatial consistency. Voxels are thus classified based also on the current classification of the neighbouring voxels. Further detailed explanations and the model equations can be found in (van Leemput et al., 1999). The MRF regularisation makes the classification more robust to noise and to isolated misclassified voxels (e.g. isolated voxels classified as fat and surrounded by glandular tissue). Instead of estimating the MRF parameters from the image as in (van Leemput et al., 1999), however, we use a two-level MRF with its parameters derived from the anatomical properties of the breast. In the first level, the interclass MRF energy is the same for all classes, thus the MRF only adds global spatial consistency and robustness in the parameter estimation. In the second stage, after the EM converges, the MRF energy matrices are altered in order to include more anatomical knowledge (e.g. the cost of having glandular tissue next to the background is higher than having fat next to the background). Following this the classification is restarted and iterated until convergence. This modification allows an unbiased and robust parameter estimation in the first step, followed by a second step that enforces more anatomical knowledge and topological constraints. The values of the MRF energy matrix are chosen empirically, in order to produce realistic X-ray mammogram simulations. After classifying each voxel in the MR volume into fibro-glandular and fatty tissue, we then calculate an ‘‘effective’’ mono-energetic
X-ray attenuation volume, H. H captures the relative non-linear attenuations of a poly-energetic X-ray spectrum, by fat and fibro-glandular tissue, in a single volume. In this way we can repeatedly simulate DRRs during the iterative registration, using a simple ray-casting and summation of H, without having to recompute the attenuated spectrum for each ray cast. Robinson and Scrimger (1991) demonstrate that the use of a theoretical, effective attenuation and a mono-energetic beam provides good agreement with laboratory measurements. Each voxel H(i) in this volume is given by:
R max N0 ðÞfPF ðiÞlF ðÞ þ P G ðiÞlG ðÞgd HðiÞ ¼ ¼0 R max ¼0 N 0 ðÞd
ð1Þ
where N0() is the X-ray spectrum with respect to photon energy , PFjG(i) is the probability of tissue classes Fat (F) or Glandular (G) for voxel i, given by the EM-MRF classification and lFjG() is the linear attenuation of tissue classes F or G at photon energy . Details of the relevant X-ray parameters, namely the anode type and anode angle, are obtained from the mammogram’s DICOM header and the manufacturer’s X-ray set specifications, respectively. From these the Xray spectrum, N0(), can be estimated using published data (Cranley et al., 1997). Similarly the linear attenuation coefficients, lFjG(), can be obtained from publicly available data published by the National Institute of Standards and Technology (NIST) (Hubbell and Seltzer, 2004). The purpose of this volume is to create an image which can be repeatedly and efficiently projected to simulate a Full-Field Digital Mammogram. Beer’s law (Beer, 1852) describes the absorption of X-ray photons by distance h of a given material with linear, mono-energetic, attenuation coefficient l. For a monochromatic Xray beam this can be expressed as follows:
I ¼ I0 ehl
ð2Þ
where I0 is the incident photon energy and I is the attenuated energy. When viewing an image of X-ray attenuation, such as a mammogram, it is common to ‘‘log invert’’ the raw data, I, so that intensities in the mammogram, IM, reflect the total attenuation of X-rays reaching the detector:
Fig. 1. Comparison of results for calculation of the X-ray attenuation volume: (a) original MRI, (b) X-ray attenuation volume using manual thresholding and subsequent interactive editing, (c) X-ray attenuation volume using the EM-MRF algorithm, (d) simulated X-ray mammogram from the undeformed volume using the manual method, (e) simulated X-ray mammogram using EM-MRF. The two rows correspond to two patients. The red cross indicates the position of a corresponding coordinate in each image. Data provided by the MARIBS UK screening study (Leach et al., 2005).
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
969
Fig. 2. Overview of the 2D/3D registration framework. The processes are illustrated in blue and the data in red. The algorithm is implemented via the Insight Toolkit (ITK) which provides a generic, iterative optimisation loop into which into different transformation, similarity measure and optimisation components can be inserted. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
IM ¼ lnðI0 ehl Þ
ð3Þ hl
¼ lnðI0 Þ lnðe ¼ lnðI0 Þ þ hl
Þ
ð4Þ ð5Þ
Substituting the effective attenuation volume H in Eq. (5) and ignoring the constant ln(I0) term gives:
IM ¼
Z
hmax
HðhÞdh
ð6Þ
and validated for the classification of different tissue types visible in MR images (albeit brain tissue classification) by van Leemput et al. (1999) We believe the use of physical X-ray reference data in the simulation process creates images with intensity characteristics close to that of real X-ray mammograms. However given the lack of an appropriate validation strategy, this assumption must be evaluated, alongside our choice of the other registration components, with reference to the final registration error obtained below.
h¼0
which is simply a ray-casting of H. Fig. 1 shows the results of the EM-MRF algorithm, and for comparison, a histogram-based classification that uses manual thresholding and subsequent interactive editing to remove falsely classified voxels (Mertzanidou et al., 2010b; Hipwell et al., 2007). We can see that the proposed method contains more detail of the glandular tissue and these are then also visible in the projection images. Other advantages over manual thresholding are the fact that this gives reproducible results, it takes less than 1 min to process each breast volume and it requires minimal pre-processing interaction. The only requirement is that the pectoral muscle is pre-segmented from the MRI. This is needed because automated intensity-based segmentation methods are prone to error for this task. This boundary is not well-defined in the majority of the cases, especially when the glandular tissue is very close to the chest wall, or when organs with intensities similar to fat (such as the liver) are adjacent to the rib cage. We use a semi-automated pre-processing method, where the user defines landmarks on the boundary between the pectoral muscle and the breast through which a parametric B-spline surface is subsequently fitted. Quantitative validation of this algorithm is problematic due to the lack of a ground truth classification of the breast tissue. The fine nature of the ductal network, and relatively coarse resolution of the MRI in comparison, mean that the partial volume effect is substantial. For this reason manual delineation of the ductal network to produce such a ground truth data set is impractical. The choice of the EM-MRF approach for this application therefore, was made based on its advantages over manual thresholding discussed above, and also because it has been successfully applied
3. Registration framework An overview of the registration framework is given in Fig. 2. The inputs to the registration pipeline are the real mammogram (fixed/target image) and the X-ray attenuation volume (moving/ source image) with the intensities computed by Eq. (1). During registration, a DRR is generated from the attenuation volume, at the current position, by casting rays to each pixel in the target mammogram, using the current value of the affine transformation. The corresponding pixel intensity in the DRR is computed by bilinearly interpolating each plane of voxels intersecting the ray’s path and summing these intensities. A separate volume transformation and resampling step is therefore avoided by applying the affine transformation directly to the position and direction of each ray cast through the volume. The similarity between the DRR and the target mammogram is then calculated. Based on the value of this metric, the optimiser iteratively updates the 3D affine transformation parameters and the process is repeated until convergence. The 3D affine transformation has twelve parameters; these include translations (tx, ty, tz), rotations (rx, ry, rz), scaling (sx, sy, sz) and shear (shxy, shyz, shxz). We create an individual matrix for each one of these parameters and concatenate them into one matrix T:
T ¼ T translation ðT rotation ðT scaling T shearing ÞÞ
ð7Þ
Instead of optimising the matrix coefficients of T, we optimise the affine parameters defined above. This allows us to easily include the volume preservation constraint as explained below.
970
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
Normalised Cross Correlation is used as the similarity measure, as this was proven to have the best performance in our initial experiments (Mertzanidou et al., 2010b). For the two images, the real mammogram M and the simulated DRR S, the similarity is given by the formula:
PN i¼1 ðM i Si Þ ffi: NCCðM; SÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi PN 2 2 PN M S i¼1 i i¼1 i
ð9Þ
where N is the total number of pixels in the region where the similarity is calculated. This region is a bounding box around the mammogram for the CC view and is specified using a binary mask which excludes the pectoral muscle in the MLO view. For the optimisation process, we use a gradient descent scheme, where each parameter is updated according to the gradient of the similarity measure with respect to the parameters. The value of each parameter p at iteration i is given by:
pi ¼ pi1 þ d Fig. 3. Projection geometry used in the registration, showing the X-ray source, the breast volume V and the projection plane. Before registration the volume V is positioned according to the value of the focal length, f, obtained from the DICOM header of the mammogram (e.g. f = 660 mm for GE Senographe 2000D X-ray sets) and the normal from the detector to the X-ray source is set to coincide with the centre of gravity of the mammogram. The centre of gravity of the masked MR volume is also positioned in the (x, y) plane to lie on this normal.
Our initial experiments (Mertzanidou et al., 2010b) indicated that using an affine transformation, without any constraints, can cause the breast volume to increase during registration. This is due to the 2D/3D registration problem being ill-posed; a single view is insufficient to constrain movement during optimisation along the direction of projection, leading to an expansion of the breast. To avoid such non-physical expansion we have included a volume-preservation constraint, by ensuring that the product of all scaling factors across the three dimensions is unity (sxsysz = 1). This is achieved by constraining the scaling along the direction of the projection, arbitrarily set to the z axis, to be
1 sz ¼ sx sy
ð8Þ
This constraint removes one degree of freedom from the optimisation process, reducing the size of the search space and potentially enhancing the robustness of the registration. A good initial position of the volume before registration is important to reduce the likelihood of the optimisation getting trapped in local minima. The projection geometry used is shown in Fig. 3. The distance between the X-ray source and the detector is extracted from the DICOM header of the Full-Field Digital Mammograms (FFDMs) and is f = 660 mm in all cases. The initial translation of the volume in the direction of the projection (z axis) ensures that the volume is positioned on top of the detector. The volume is also translated in the perpendicular plane (xy plane) such that the centre of mass of the volume is projected onto the centre of mass of the real mammogram.
Gðpi1 Þ step ffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi wðpÞ PP Gðki1 Þ2 k¼1
ð10Þ
wðkÞ
where d = ±1 defines the direction, G(p) is the magnitude of the gradient of the similarity measure with respect to parameter p and w(p) is a scalar weight factor that controls the relative magnitude of the step size step for each parameter. P is the set of parameters p. The step size decreases during optimisation if the direction d changes, indicating that the similarity measure is close to the minimum value. The optimisation terminates when the magnitude of the gradient, for all parameters, is smaller than a pre-defined tolerance value. More details can be found in (ITK, xxxx). For each registration, the algorithm requires around 20 min, on a single core, 64bit machine, with a 2.8 GHz processor. We believe that this can be further improved with a GPU-based ray-casting algorithm, as this part is the most computationally expensive. 4. Experiments on simulated mammograms The goal of this first set of experiments was to evaluate how well the volume-preserving affine transformation can approximate a real mammographic compression in a 2D/3D registration task. More specifically, the images registered were an undeformed MR volume and a simulated X-ray that was created using an MR image of a real compression of the same breast. The data used were a series of real MR compressions of the breast from 8 volunteers in the lateral to medial direction (Tanner et al., 2011). Fig. 4 shows some examples of these compression images (volume sizes: [1 1 2.5] mm3). The ground truth correspondences in this case were estimated by manually picking 3D landmarks between the undeformed and the compressed MRI and calculating a target registration error (TRE). A mean of 12 landmarks were picked for each volunteer distributed across the breast volume, by an imaging scientist (Tanner et al., 2011). The mean TRE for these experiments was reduced to 3.8 mm (with a standard deviation of 1.5 mm) after registration, from an initial 11.5 mm (std 6.6 mm) misalignment. The individual errors
Fig. 4. Coronal slices of the data used for evaluation for 3 volunteers (a–c). From left to right in each image: MRI slice before and after compression. Target, simulated mammograms were generated from the compressed MR volumes and the undeformed MRIs were registered to these images. The registrations were evaluated using corresponding landmarks manually identified in each subjects pair of MR volumes.
971
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975 Table 1 The mean registration error (Euclidean distance in mm) before and after registration for the eight volunteers used in the validation tests on simulated X-ray data. Each individual volume includes also the standard deviation and the maximum error for the landmark positions. v2
v3
v4
v5
v6
v7
v8
Mean
Std
5.8
20.1
17.6
19.8
4.0
8.6
5.9
10.5
11.5
6.6
2.6
6.2
5.9
3.8
1.5
4.0
2.9
3.4
3.8
1.5
1.9
2.4
3.2
1.1
0.6
1.9
1.4
1.8
–
–
6.6
9.9
12.1
5.2
2.6
7.6
5.3
7.2
–
–
9 8
Number of cases
Before reg. (mean) After reg. (mean) After reg. (std) After reg. (max)
v1
Registration Error histogram 10
7 6 5 4 3 2
Number of cases
1
Registration Error histograms
6
0 CC view median
0
5
10
15
20
25
30
35
40
Registration error in mm
4 2 0
0
5
10
15
20
25
30
35
40
Fig. 5. Histogram of the registration errors calculated from 113 registration tasks using corresponding lesions, manually annotated by experienced observers in both the MRI and X-ray images. In red is shown the median value (13.1 mm) and in green the outliers. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Number of cases
Registration error in mm 6 MLO view median
4 2 0
0
5
10
15
20
25
30
35
40
Registration error in mm Fig. 6. Histograms of the registration errors of Fig. 5, displayed individually for the CC and MLO views. Whilst the median error for MLO mammograms is marginally higher than for CC view mammograms (13.5 mm versus 12.9 mm as indicated in red) the distribution of these errors is broadly similar. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
for each volunteer are shown in Table 1. The results indicate that the affine transformation can give a clinically useful accuracy, when registering 3D volumes to DRRs that have been generated using real breast compressions. These are promising results for the suitability of the affine transformation model. Nevertheless the error is expected to be larger when using real X-ray mammograms, as the intensity range and the similarity between the simulated and the target mammogram will change significantly.
The patients had a range of different pathologies. One of the patients had an MR and X-ray compatible clip inserted after breast biopsy. This was used as ground truth correspondence. The rest of the patients had clearly visible findings in both modalities. These were annotated and the annotations were used as ground truth correspondences for validation. For the annotated data, the MR findings were marked using one or multiple spheres, while the X-ray images using either a disc, or more frequently a free-form shape defining the outline of the finding. As it is generally harder to annotate the 3D images accurately, the spheres did not always represent the finding’s actual volume, but were rather centred around it. As a result we consider as the most appropriate error metric the distance between the centres of the annotated regions. In all the results shown below, the registration error is the 2D Euclidean distance between the centres of mass of the X-ray annotation and the projection of the MR annotation, after being deformed with an affine transformation and projected.
5. Experiments on real data In these experiments we used MRIs and FFDMs, both CC and MLO views, that were acquired approximately at the same time point.1 The MRI data were T1-weighted images. The voxel resolution of the MRIs varied, as the images were acquired from three different scanners. The majority had a resolution of either [0.9 0.9 1.0] mm3 or [0.6 0.6 1.3] mm3; whilst one had a resolution of [0.7 0.7 2] mm3. The original resolution of the X-ray mammograms was [0.1 0.1] mm2 for all of the cases apart from one that was [0.085 0.085] mm2; they were subsampled by a factor of 10 for registration to match the MRI resolution and reduce the computational cost associated with the ray-casting. 1
In most of the cases these were acquired the same day, overall within a month.
Fig. 7. Registration result for CC (on the left) and MLO (on the right) mammograms of patient 1, diagnosed with an invasive ductal carcinoma. The X-ray mammogram annotation is shown in red and the projection of the MR annotation in green. The registration error is 8.1 mm for the CC and 14.4 mm for the MLO view. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
972
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
Fig. 8. Registration result for CC (on the left) and MLO (on the right) mammograms of patient 2, diagnosed with ductal carcinoma in situ. The X-ray mammogram annotation is shown in red and the projection of the MR annotation in green. The registration error is 6.8 mm for the CC and 2.6 mm for the MLO view. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 9. Registration result for CC (on the left) and MLO (on the right) mammograms of patient 3. The top row illustrates the raw mammograms and the bottom row the registered DDRs. A magnification view is given for both views on the bottom right corner of the raw mammograms. The evaluation in this case was done using the clip location. The clip is 2 mm long as displayed in the magnification view; the location in the X-ray mammogram is illustrated by the high intensity region (red arrow) and the projection of the clip location in the MR is shown in green (green arrow). The registration error is 9.9 mm for the CC and 7.7 mm for the MLO view. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. Registration result for two cases for which the registration error was high; (left) CC view of patient 4 (error 28.4 mm) and (right) MLO view of patient 5 (error 20.2 mm), both diagnosed with invasive ductal carcinoma. The X-ray mammogram annotation is shown in red and the projection of the MR annotation in green. In both cases the annotated regions are large and hence poorly localised. The corresponding MR volumes and annotations are shown in Fig. 11. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
For validation we have performed in total 113 registration tasks, including both CC (n = 55) and MLO (n = 58) view mammograms. These came from 49 patients, some of which had multiple studies of MRI and X-ray mammogram pairs acquired at different times. The total histogram of the registration errors is given in Fig. 5. The median registration error including all cases is 13.1 mm. As can also be seen in the histogram, for a limited number of cases (n = 3) the registration was not successful for reasons that are further discussed below. We consider these cases outliers. The percentage of the registrations that had an error larger than 30 mm is 2%. Using the median instead of the mean value is more informative of the registration error’s central tendency, as it represents better the value around which the majority of the registration errors are clustered. The outliers affect the computation of the mean error value, while the median is more robust to their presence. Fig. 6 shows the distributions of the registration errors for the CC and MLO view mammograms separately. The histograms and the median values illustrate that overall the algorithm is slightly less accurate for the MLO view, although the difference between the two median values is negligible (12.9 mm for the CC and 13.5 mm for the MLO view). Figs. 7–9 illustrate the results for 3 patients (6 registrations), for which the registration was considered to perform well. In the first two cases it is clear that the appearance of the lesion shape in the two modalities varies significantly. We can also see that even in cases where the error in mm seems rather high (Fig. 7: 14.4 mm and Fig. 9: 9.9 mm), there is still an overlap between the two annotations and the registration result can give a good indication about the location of the lesion on the X-ray. The results of two cases for which the registration did not perform well are shown in Fig. 10. Fig. 11 shows the two corresponding MR volumes to illustrate the difficulty in registering these lesions. As we can see in the figures illustrating the results, the annotation of the projected MR lesion is generally larger than the corresponding annotation in the X-ray mammogram. Overall, the mean radius of the MR lesions is 11.7 mm, while that of the X-ray annotations is 9.9 mm. This small difference in size can be caused by the difficulty in annotating the 3D volumes as discussed above, and the different contrast mechanisms in MRI and X-ray. When the MR lesions are projected into 2D after transformation,
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
Fig. 11. Corresponding MRIs and annotations for the patients illustrated in Fig. 10 for which the registration error was high. Patient 4 (top) and patient 5 (bottom).
973
fatty breasts, with an average of 76% by volume of fatty tissue; the range for all the breast volumes in the dataset is 58–78% (mean: 71%). For the case of very fatty breasts, there is not enough information (i.e. glandular tissue) to drive the registration and subsequently the optimisation is more likely to terminate in local minima. The registration of these cases is of less clinical importance, as for radiologists establishing correspondences between images is more challenging in cases of dense rather than fatty breasts. We also expect the affine transformation assumption to be less accurate for the large-size breasts, as in these cases the breast undergoes large anisotropic deformations that cannot be approximated accurately by our affine transformation model. Moreover, we consider that the cases with large irregularshaped lesions are not suitable for validation, since the contrast mechanisms are different between the two modalities and hence the lesion appearance differs significantly between them; as a result, their centres of mass would not necessarily correspond. One of the main issues that arises from the validation of this alignment task is how we determine ground truth correspondences between the two images. Although in clinics the registration would only be useful to the radiologists in cases where lesions are not easily identified in both modalities, clearly visible lesion cases are used in this study to provide quantitative results. Nevertheless, due to the different nature of the images involved, the appearance of a lesion in a projection X-ray mammogram differs significantly from the enhanced 3D area in the MRI. Moreover it is not always straightforward to identify the lesion boundaries. Consequently, the 2D Euclidean distance could give results that are not representative of the actual correspondence. We believe that the cases that would be best suited for validation are the ones with small lesions or clip data, such as that displayed in Fig. 9. The total number of clinical cases that we originally had access to was 76 patients. From these, we excluded from validation those that had artifacts. For example, we excluded those that had annotation problems (8 cases), either because of lesions that were clearly mis-matched between the MRI and the X-ray mammogram, or because of very large lesion annotations such as those shown in Fig. 12, as the Euclidean distance is not meaningful for these patients. There were also 4 cases excluded for which our breast segmentation carried out before registration failed to produce reasonable results around the chest wall, due to non-clearly defined boundaries between the pectoral muscle and the breast tissue. The rest of the patients that were excluded from validation are patients with breast folding artifacts in the MRI. An example of
the mean radius of these areas is 15.9 mm. This increase in size is expected from the global affine transformation model, since the lesions are not modelled separately as rigid objects. Therefore, although their volume is preserved, an expansion of the breast in the medial–lateral direction (for a CC view) would result in an expansion of the mean radius of the lesion when this is projected into 2D. 6. Discussion Overall, the main characteristic of the cases where the registration did not perform well are patients with large fatty breasts and cases with large, irregular-shaped lesions. In our experiments, the results did not show any correlation between the breast density or size and the registration error. Nevertheless, the analysis of the cases that had errors larger than 25 mm showed that seven out of nine cases were large breasts. Their mean volume size was 46% larger than the mean breast size of all cases (1.55 106 mm3 against 1.06 106 mm3). Moreover, five of these cases were also
Fig. 12. Two MLO view mammograms that were excluded from validation, because the lesion annotation was too large.
974
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
introducing folding would be volume registration between the pre- and the post-contrast image (Rueckert et al., 1999). As our algorithm does not require significant manual interaction, we also ran the registrations for all the above cases. The histogram of the registration errors is shown in Fig. 14. We also display in the same figure the histogram of the cases used for validation and the total histogram for comparison. We can see from the plots that the errors for the cases that contained artifacts are spread equally throughout the total error range and account for all the errors that are larger than 45 mm. Our method can be easily integrated into clinical practice. Results can be presented to radiologists via a linked cursor which enables the user to navigate through the MRI and view the projection of the MR position on the X-ray mammogram. The median registration error extracted from our large dataset could be visualised as a circle of an ellipse of uncertainty around the linked cursor in the X-ray view.
7. Conclusion
Fig. 13. The MR image of one patient with large folding artifacts. This and similar cases were excluded from the validation.
these cases is shown in Fig. 13. We found that for a large number of cases the breast was significantly deformed in the MRI. One reason why folding was so frequent in our dataset is that, according to the protocol of the clinic, women were advised to wear t-shirts during MRI acquisition to limit subtraction artifacts coming from motion of the breast before and after the injection of the contrast agent. As a result, this can cause folding artifacts in the MRI, particularly for large breasts. Our algorithm cannot compensate for folding of the breast and we believe these cases will also be problematic for previously published methods which attempt to deform the breast volume from the prone position to the compression between two plates. Subsequently these cases were excluded from validation. An alternative option for reducing breast motion artifacts without
Registration Error histogram 14 all cases cases with no artifacts cases with artifacts
Number of cases
12 10 8 6 4 2
This study presents a new MRI to X-ray mammography registration framework that uses an improved breast tissue classification technique and an affine transformation model to approximate the breast deformation when compressed between two plates. This is the first method proposed that uses the structures within the breast for alignment, rather than surrogates based on breast outline or nipple position, and has been tested on a meaningfully large number of datasets. Its main advantages are the ability to be easily integrated in clinics and also to provide reproducible results with minimal pre-processing interaction. The only interactive step is selecting landmarks on the pectoral muscle boundary. With an alternative automated method incorporated for this task, such as that recently proposed by Gubern-Merida et al. (2011), the pipeline would be fully automated. The motivation for this work was to assist in the reading of multimodal images and potentially improve breast cancer diagnosis. The experiments on simulated data indicate that a simple affine transformation model can approximate a real breast compression in a 2D/3D registration task with an accuracy of 3.8 mm. These results are promising for the suitability of the affine transformation for this task, however these cases contain limited translation, rotation, shear and more complex non-rigid deformations which we expect from a real mammographic compression. The results of 113 registration tasks on clinical cases show that our algorithm can be applied in clinical practice giving useful accuracy (median 13.1 mm). The results show a comparable accuracy to patient-specific biomechanical modelling (Hopp et al., 2011, mean 11.8 mm). Nevertheless we cannot directly compare the two statistics as the two techniques were tested on different data sets. In future work it would be beneficial to compare our method with the patient-specific biomechanical modelling on the same dataset for a more rigorous comparison, as the accuracy can vary with breast shape, size and the pathology or otherwise used for validation.
Acknowledgements
0 0
10
20
30
40
50
60
70
Registration error in mm Fig. 14. Total histogram of the registration errors computed from all manual annotations, 179 registrations. In red are shown the 66 cases that were excluded from validation due to artifacts. The histogram in green corresponds to the 113 cases used for validation and is the same data displayed in Fig. 5. The total histogram of all 179 cases is shown in blue. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
This work was funded by the European 7th Framework Program, HAMAM, ICT-2007.5.3 and EPSRC Grant EP/E031579/1. M. Jorge Cardoso is supported by the FCT Scholarship (SFRH/BD/ 43894/2008), EPSRC (EP/H046410/1). Sebastien Ourselin received funding from the CBRC Strategic Investment Award (Ref. 168). The authors would like to thank the University College London Hospitals for the MR breast compression data, the Radboud University Nijmegen Medical Centre for the MRI and X-ray
T. Mertzanidou et al. / Medical Image Analysis 16 (2012) 966–975
mammography data with annotations and the Charite Universitätsmedizin Berlin for the clip case used in this study. References Beer, A., 1852. Bestimmung der Absorption des rothen Lichts in farbigen Flüssigkeiten. Annalen der Physik 162, 78–88. Behrenbruch, C., Marias, K., Armitage, P., Moore, N., English, R., Clarke, J., Brady, M., 2003. Fusion of contrast-enhanced breast MR and mammographic imaging data. Medical Image Analysis 7, 311–340. Benndorf, M., Baltzer, P.A.T., Vag, T., Gajda, M., Runnebaum, I.B., Kaiser, W.A., 2010. Breast MRI as an adjunct to mammography: does it really suffer from low specificity? A retrospective analysis stratified by mammographic BI-RADS classes. Acta Radiologica 51, 715–721. Cardoso, M., Clarkson, M., Ridgway, G., Modat, M., Fox, N., Ourselin, S., 2011. The Alzheimer’s disease neuroimaging initiative, load: a locally adaptive cortical segmentation algorithm. Neuroimage 56, 1386–1397. Cranley, K., Gilmore, B., Fogarty, G., Desponds, L., 1997. Catalogue of Diagnostic Xray Spectra and Other Data. Technical Report. The Institute of Physics and Engineering in Medicine. Gubern-Merida, A., Kallenberg, M., Marti, R., Karssemeijer, N., 2011. Fully automatic fibroglandular tissue segmentation in breast MRI: atlas-based approach, in: MICCAI Workshop on Breast Image Analysis, pp. 73–80. Hipwell, J., Tanner, C., Crum, W., Schnabel, J., Hawkes, D., 2007. A new validation method for X-ray mammogram registration algorithms using a projection model of breast X-ray compression. IEEE Transactions on Medical Imaging 26, 1190–1200. Hopp, T., Baltzer, P., Dietzel, M., Kaiser, W., Ruiter, N., 2011. 2D/3D image fusion of X-ray mammograms with breast MRI: visualizing dynamic contrast enhancement in mammograms. International Journal of Computer Assisted Radiology and Surgery, in press. Hubbell, J., Seltzer, S., 2004. Tables of X-Ray Mass Attenuation Coefficients and Mass Energy-Absorption Coefficients from 1 keV to 20 MeV for Elements Z = 1 to 92 and 48 Additional Substances of Dosimetric Interest. Technical Report. Ionizing Radiation Division, Physics Laboratory, NIST. 100 Bureau Drive, M/S 8460 Gaithersburg, MD. ITK, Insight Segmentation and Registration Toolkit.
. Johns, P., Yaffe, M., 1987. X-ray characterisation of normal and neoplastic breast tissues. Physics in Medicine and Biology 32, 675–695. Leach, M., Boggis, C., Dixon, A., Easton, D., Eeles, R., Evans, D., Gilbert, F., Griebsch, I., Hoff, R., Kessar, P., Lakhani, S., Moss, S., Nerurkar, A., Padhani, A., Pointon, L.,
975
Thompson, Warren, R., 2005. Screening with magnetic resonance imaging and mammography of a UK population at high familial risk of breast cancer: a prospective multicentre cohort study (MARIBS). The Lancet 365, 1769–1778. van Leemput, K., Maes, F., Vandermeulen, D., Suetens, P., 1999. Automated modelbased tissue classification of MR images of the brain. IEEE Transactions on Medical Imaging 18, 897–908. Marti, R., Zwiggelaar, R., Rubin, C., Denton, E., 2004. 2D-3D correspondence in mammography. Cybernetics and Systems 35, 85–105. Mertzanidou, T., Hipwell, J., Cardoso, M., Tanner, C., Ourselin, S., Hawkes, D., 2010a. X-ray mammography – MRI registration using a volume-preserving affine transformation and an EM-MRF for breast tissue classification, in: International Workshop on Digital Mammography, pp. 23–30. Mertzanidou, T., Hipwell, J., Tanner, C., Hawkes, D., 2010b. An intensity-based approach to X-ray mammography – MRI registration, in: Proceedings of SPIE Medical Imaging: Image Processing, pp. 7623–106. NICE, 2006. National Institute for Health and Clinical Excellence. Familial Breast Cancer: Quick Reference Guide. Nie, K., Chen, J., Chan, S., Chau, M., Yu, H., Bahri, S., Tseng, T., Nalcioglu, O., Su, M., 2008. Development of a quantitative method for analysis of breast density based on three-dimensional breast MRI. Medical Physics 35, 5253–5262. Robinson, D., Scrimger, J., 1991. Monoenergetic approximation of a polyenergetic beam – a theoretical approach. British Journal of Radiology 64, 452–454. Rueckert, D., Sonoda, L., Hayes, C., Hill, D., Leach, M., Hawkes, D., 1999. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18, 712–721. Ruiter, N., Stotzka, R., Muller, T., Gemmeke, H., Reichenbach, J., Kaiser, W., 2006. Model-based registration of X-ray mammograms and mr images of the female breast. IEEE Transactions on Nuclear Science 53, 204–211. Tanner, C., Hipwell, J., Hawkes, D., 2009. Using Statistical Deformation Models for the Registration of Multimodal Breast Images, in: Proceedings of SPIE Medical Imaging, pp. 72590P–72590P-9. Tanner, C., White, M., Guarino, S., Hall-Craggs, M., Douek, M., Hawkes, D., 2011. Large breast compressions – observations and evaluation of simulations. Medical Physics 38, 682–690. Wei, J., Chan, H., Helvie, M., Roubidoux, M., Sahiner, B., Hadjiiski, L., Zhou, C., Chenevert, S.P.T., Goodsitt, M., 2004. Correlation between mammographic density and volumetric fibroglandular tissue estimated on breast MR images. Medical Physics 31, 933–942. Yuan, Y., Giger, M., Li, H., Bhooshan, N., Sennett, C., 2010. Multimodality computeraided breast cancer diagnosis with FFDM and DCE-MRI. Academic Radiology 17, 1158–1167.