Non-rigid registration of multi-modal images using both mutual information and cross-correlation

Non-rigid registration of multi-modal images using both mutual information and cross-correlation

Available online at www.sciencedirect.com Medical Image Analysis 12 (2008) 3–15 www.elsevier.com/locate/media Non-rigid registration of multi-modal ...

1MB Sizes 0 Downloads 31 Views

Available online at www.sciencedirect.com

Medical Image Analysis 12 (2008) 3–15 www.elsevier.com/locate/media

Non-rigid registration of multi-modal images using both mutual information and cross-correlation A. Andronache *, M. von Siebenthal, G. Sze´kely, Ph. Cattin ETH Zurich – Computer Vision Laboratory, Sternwartstrasse 7, CH – 8092 Zurich, Switzerland Received 1 November 2006; received in revised form 27 May 2007; accepted 6 June 2007 Available online 28 June 2007

Abstract The hierarchical subdivision strategy which decomposes a non-rigid matching problem into numerous local rigid transformations is a very common approach in image registration. While mutual information (MI) has proven to be a very robust and reliable similarity measure for intensity-based matching of multi-modal images, numerous problems have to be faced if it is applied to small-sized images, compromising its usefulness for such subdivision schemes. We examine and explain the loss of MI’s statistical consistency along the hierarchical subdivision. Information theoretical measures are proposed to identify the problematic regions in order to overcome the MI drawbacks. This does not only improve the accuracy and robustness of the registration, but also can be used as a very efficient stopping criterion for the further subdivision of nodes in the hierarchy, which drastically reduces the computational cost of the entire registration procedure. Moreover, we present a new intensity mapping technique allowing to replace MI by more reliable measures for small patches. Integrated into the hierarchical framework, this mapping can locally transform the multi-modal images into an intermediate pseudo-modality. This intensity mapping uses the local joint intensity histograms of the coarsely registered sub-images and allows the use of the more robust and computationally more efficient cross-correlation coefficient (CC) for the matching at lower levels of the hierarchy.  2007 Elsevier B.V. All rights reserved. Keywords: Non-rigid registration; Mutual information; Cross correlation; Hierarchical registration; Intensity mapping

1. Introduction Medical imaging technologies have become indispensable components in most clinical procedures during the last years. The wide availability of different imaging modalities tremendously increased the need for fast and accurate multi-modal image registration methods. Several surveys and textbooks (e.g. Maintz and Viergever, 1998; Hajnal et al., 2001; Hill et al., 2001; Maes et al., 2003; Pluim et al., 2003 and references therein) have already been published providing a broad and general overview of image registration techniques. *

Corresponding author. Tel.: +41 446323202. E-mail addresses: [email protected] (A. Andronache), [email protected] (M. von Siebenthal), [email protected]. ethz.ch (G. Sze´kely), [email protected] (Ph. Cattin). 1361-8415/$ - see front matter  2007 Elsevier B.V. All rights reserved. doi:10.1016/j.media.2007.06.005

Among other, two key elements are to be mentioned in the context of image registration: the measure used to quantify the similarity between the images to be matched, and the spatial transformation that aligns the images. As for the first, the introduction of MI as a similarity measure (Maes et al., 1996; Viola and Wells, 1995) has especially influenced the development of intensity-based image registration due to its inherent ability to deal with multi-modal matching problems (Maes et al., 2003). Regarding the transformations, classical rigid, affine or projective registration techniques can only be used in limited special cases. Non-rigid registration procedures, capable to deal with more localized spatial changes, are therefore in the focus of current research and development. The most accurate methods for non-rigid registration are based on physical models (Hajnal et al., 2001) but they proved to be computationally very expensive. Therefore, various

4

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

simplifications have been investigated based on different heuristics to approximate the underlying physical reality by alternative mathematical models. One possible way to approach the related problems has been proposed by Maintz et al. (1998) implementing elastic registration by windowing the original images and finding local rigid transformations that are elastically interpolated into a continuous deformation field. Later on a hierarchical image subdivision strategy was proposed by Likar and Pernusˇ (2001). It decomposes the non-rigid matching problem into an elastic interpolation of numerous local rigid registrations of sub-images of decreasing size, as shown in Fig. 1. As the local registrations are achieved by maximizing MI, the algorithm can be generally applied both for mono- and multi-modal cases. Unfortunately, the usage of MI for image matching has several drawbacks in connection with either interpolation artifacts or its statistical consistency (see Pluim et al., 2003; Likar and Pernusˇ, 2001; Maes et al., 1997; Pluim et al., 1999). Our investigation of MI’s behavior revealed additional problems when using it as a similarity measure for registering noisy patches showing no clear image structure. In the context of the hierarchical strategy, these drawbacks become increasingly serious during the image subdivision process due to the higher probability to produce more and more structureless image patches and the decreasing number of samples used to compute the two-dimensional (2D) joint intensity histogram. This decrease of MI’s statistical consistency weakens the performance of the entire non-rigid registration and limits the number of levels that can be generated during the hierarchical subdivision. It would be therefore desirable to replace MI with a more robust similarity measure. However, the usage of CC favored by most researchers is restricted to the monomodal case. In the past few years, several methods have been proposed either for estimating a functional relationship between the intensities of images from different modalities or for the direct estimation of similarity measures which integrate this functionality in their definition. For example, the variance of intensity ratios (VIR) criterion presented by Woods et al. (1993) proved to be efficient for matching PET with MR images. In Nikou et al. (1998) an approach

Reference Image

Floating Image

was presented that removed the need for manual segmentation of the original VIR method and extended its applicability to other modality combinations. Another extension of Woods’ VIR criterion called correlation ratio is described in Roche et al. (1998). Later on, in Guimond et al. (2001), an adaptive intensity correction was proposed that combines the correlation ratio with the demons algorithm (Thirion, 1996). A completely different approach for CT-MR cross-registration is described in van den Elsen et al. (1994) which bases on a simple intensity mapping of the original CT image such that bone and air have identical appearance as in an MR image. All the proposed methods, however, lead to the appearance of fake structures within the mapped image, which strongly limits their usability. These ghost features caused by imaging details which are not visible in both modalities lead to ambiguities that result in misregistrations. Starting from the original 2D version of the hierarchical strategy, we improved the algorithm presented by Likar and Pernusˇ (2001) and extended it for volumetric data (Andronache et al., 2005). In this paper we first present an analysis of difficulties emerging when using this hierarchical registration scheme. Subsequently, we propose solutions to overcome these limitations, leading to an increase in robustness, accuracy, and speed of the entire non-rigid registration procedure. Firstly, we propose a new method to detect structureless regions within an image. Based on this method, we developed a reliable stopping criterion for the hierarchical subdivision procedure. Secondly, we propose to switch from MI to CC as a similarity measure at higher levels of the hierarchy, when MI becomes unstable and statistically unreliable. In contrast to the already existing approaches that estimate the functional relationship from one image modality to the other, we propose to build a common intermediate pseudo-modality (Andronache et al., 2006). The intensities in both images are mapped simultaneously into a common contrast space, which is not necessarily one of the two source intensities, but rather a combination of them. Although the transformed images may locally resemble one of the modalities, on an overall scale this is not true. Finally, we also demonstrate that incorporating these solutions in the hierarchical strategy for image registration can drastically reduce its computational complexity. 2. On the behavior of mutual information

1st Level Global rigid registration 2nd Level 4 local rigid registrations 3rd Level 16 local rigid registrations 4th Level

Fig. 1. The hierarchical subdivision scheme.

Even though MI has proven to be a very robust and reliable similarity measure for intensity-based registration of multi-modal images, numerous problems have to be faced if applied to small-size images, compromising its usefulness for subdivision schemes. These problems have been identified in connection with either interpolation artifacts or inherent limitations of the MI definition. These difficulties are strongly coupled with the calculation of parameters for the discrete intensity probability distribution, estimated by histograms. Therefore, they are increasingly disturbing

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

when the size of the sub-images becomes smaller with the successive hierarchical subdivision. This effect is, however, less pronounced for 3D data. The interpolation artifacts have been broadly analyzed in the literature and different solutions have been proposed (Pluim et al., 1999; Ji et al., 2003; Tsao, 2003). In order to minimize their influence, Likar and Pernusˇ artificially increased the number of image samples used for histogram generation by incorporating the global joint distribution as a prior. As a consequence, the statistical reliability of MI is increased. For small patch sizes, however, the use of prior information estimated from the entire image may lead to false maxima in the MI goal function. As another consequence of the successive image splitting, patches of low structural content may appear that often lead to morphologically inconsistent local registrations with a low MI response. Likar and Pernusˇ suggested to identify such patches by thresholding the MI value and to exclude them from the local adjustment process. However, the thresholding of the MI value can lead to two disturbing problems. Firstly, structured sub-images having a low MI value, which still can be registered in a consistent way, will be prevented from becoming properly adjusted. Secondly, according to our observations, structureless patches may be retained as the MI significantly increases when these patches start to overlap a structure in the reference image. It is well known from information theory (Shannon, 1948), that if two signals are statistically independent then their MI is reaching its minimum possible value, namely zero. Therefore, one would expect that by shifting a structureless noisy patch around its initial position, the similarity measure has small response. Surprisingly, preliminary experiments clearly demonstrated that even though MI is small, it starts to increase as soon as the structureless patch

5

overlaps a region of higher structural content, leading to wrong local registrations, that a thresholding technique or a regularization procedure may fail to detect or correct. The problem is even more pronounced in the context of multi-modal image registration when not all tissue details can be seen in all modalities. The example shown in Fig. 2 illustrates how such structureless sub-images can induce significant local misregistrations at a later level of the hierarchy. The experiment was performed on one selected region of interest of a CT-MR matching example shown in Fig. 2a,b. Fig. 2c,d shows the region of interest around the sphenoid sinus in the left temporal bone on the reference CT and the floating MR images, respectively. Fig. 2e demonstrates the problem when all sub-images are undergoing the local registration. Fig. 2f shows the result when only those sub-images having a clear structure are locally registered, while the structureless sub-images remain in their initial position. This behavior can be well demonstrated and explained by experiments performed on 1D signals. Let us consider two signals A and B as depicted in Fig. 3. We generated the reference signal A by adding white noise to a step function. The floating signal B consists of white noise, and is statistically independent of A. Using the basic definition, we can calculate the MI between the two signals as a function of the displacement when the floating signal is translated along the reference signal. The none-zero baseline of MI, clearly identifiable in Fig. 3b, can be explained by a combination of two different effects. One is rooted in the difficulty to achieve full independence between signals represented by a finite number of discrete samples. In the ideal case of statistically independent signals A and B, the entropies would cancel out resulting in zero MI. This is, however, very unlikely for a

Fig. 2. Registration details of the sphenoid sinus in the left temporal bone at the 6th level of the hierarchy where the original floating image is divided in 32 · 32 sub-images of 16 · 16 pixels. (a) The reference CT and (b) the floating MR image. (c) The examined image region in the CT and (d) in the MR, consisting of 3 · 3 sub-images. (e) The final position of each sub-image after the local rigid registration. (f) The result after applying the local rigid registration only to those MR sub-images which have a clear structure, keeping the position of the structureless sub-images unchanged.

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

200 Floating signal B ↓ 100 →

0.3 ↑ Reference signal A

Mutual Inf

Value

6

0.2 0.1 0

0

1.5

2 2.5 3 Sample number

3.5

4

1

2 3 Displacement

4

x 10

4 4

x 10

Fig. 3. One-dimensional experiment showing the behavior of MI in the presence of noise (a) Original test signals: the reference signal A and the floating signal B. (b) The response of MI when the floating signal B is shifted over the reference signal A.

real case scenario. On the other hand it is well known from information theory (Cover and Thomas, 1991) that at the transition from a continuous differential entropy to a discrete entropy there is a systematic bias by an error term log2(D) depending on the size of the quantization bins D used for histogram generation: limðH ðxD Þ þ log2 ðDÞÞ ¼ hðxÞ

ð1Þ

D!0

where H(xD) is the discrete entropy, and h(x) the differential entropy. This theorem only applies to the marginal entropies H(A) and H(B). We are not aware of any results on deriving a similar relation for the joint entropy H(A, B). Clearly, for strictly independent signals A and B the quantization error of the discrete entropy would cancel out. This is, however, not the case if the independency condition is perturbed. To further investigate the problem, numerical experiments have been performed showing the dependency of the entropies on the sample size. The results are shown in Fig. 4. As expected, the CC graph (Fig. 4c) clearly shows that the statistical independence between the two finite, discrete random signals improves with their sample length. On the other hand, Fig. 4a shows a very interesting property of the entropies. While the sum of the marginal entropies H(A) + H(B) is only slightly influenced by the sample length and very quickly reaches the theoretically predicted

10

3. Moran’s I spatial autocorrelation coefficient As discussed previously, a simple thresholding of MI during the matching process does not offer a satisfactory solution to the identified problems. Therefore, a new method is necessary, purely relying on the structural content of the single sub-images, allowing to identify structure-

1

HA+HB ↓

MI

10.5

value for discrete entropy, the joint entropy H(A, B) requires substantially more samples to show a similarly stable behavior. In other words, MI is much more sensitive to deviations from independency between the signals to be matched. Relating this observation to the test signals from Fig. 3 it is obvious, that once the floating signal B starts to overlap the step in A, we get a bi-modal distribution for both H(A) and H(A, B), while H(B) remains constant. The number of available samples then needs to be distributed among two separate intensity clusters for the marginal entropy H(A) and the joint entropy H(A, B). As can be clearly seen from the above graph, the joint entropy H(A, B) decreases much faster than the marginal entropy H(A), thus leading to the observed strong increase of MI. The same behavior has been also noticed when using the normalized MI as defined in Studholme et al. (1999).

↑ HAB

0 3 10

9.5

Entropy

0.5

4

10 Number of samples

5

10

9 0.1

CC

8.5 8 7.5 3 10

4

10 Number of samples

10

5

0

3

10

4

10 Number of samples

5

10

Fig. 4. (a) The dependency of the entropies H(A) + H(B) and H(A, B) on the number of samples in the signal. The corresponding dependency of (b) MI and (c) CC on the number of samples in the signal.

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

7

Fig. 5. Classification of image regions into consistent and inconsistent structures when using the Moran’s I consistency test. (a) The original MR slice, split into 32 · 32 sub-images of 16 · 16 pixels. (b) Sub-images classified as showing consistent anatomical structures. (c) Sub-images classified as having inconsistent structures (noisy or structureless). (d–f) Same as (a–c) for a CT slice. (g–i) Same as (a–c) for a PET image, split into 10 · 14 sub-images of 10 · 10 pixels.

less nodes in the subdivision scheme. These have to be excluded both from the registration chain and from the further hierarchical splitting process. Inspired by point-pattern analysis, where the main goal is to characterize spatial patterns or the variation in the data within a region of interest, we propose to use the spatial autocorrelation coefficient to test for the structural content of the sub-images. Among different statistical tests proposed in the literature to determine whether the data are randomly or regularly distributed or clustered, the Moran I coefficient is favored by most of the analysts because it has highly desirable statistical properties (Lee and Wong, 2001). According to Cliff and Ord (1973), for a dataset X = {xi, i = 1, . . ., N} of mean value EðX Þ ¼ x, Moran’s I is defined as PN xÞ  ðxj  xÞ N i;j¼1 wij  ðxi   I ¼ PN  ð2Þ PN x Þ2 i;j¼1 wij i¼1 ðxi  

where W = {wij} is called the contiguity matrix, representing the connectivity weights, depending on the amount of interaction between the locations i and j, and N stands for the number of observations in the analyzed data. Note, that I is similar to the classical form of an autocorrelation coefficient: the numerator term is a measure of covariance among the {xi} and the denominator term describes the signal variance. It is also obvious that Moran’s I is based upon the cross-products of deviations of the xi from the mean value x. It varies in the interval [1, 1], the values close to the margins of this interval indicating the presence of structure in the spatial distribution while random patterns are characterized by values close to zero. In order to use Moran’s I as an indicator of the structural content in 2D or 3D sub-images, we chose to use a weighting scheme inversely proportional to the Euclidean distance d(Æ, Æ) between the currently inspected pixel at the image coordinates ~i ¼ fi1 ; . . . ; in g 2 Nn¼2;3 and its

8

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

~ 2 Nn has to neighbors. A maximum interaction distance D be selected according to the minimal size of the structures to be detected in the image. Changing the linearized index notation in ((2)) to image coordinates and denoting ~ around ~i with V~D~ ¼ f~ the vicinity of size D j 2 Nn 8 i j ik  jk j 6 Dk and k ¼ 1; . . . ; ng, then the contiguity matrix can be expressed as ( ~ ~ w~i~j ¼ dð~1i;~jÞ 8~ j 2 V~D i n fig W ¼ ð3Þ 0 otherwise Denoting the intensity value of the image voxel located at the spatial position ~i within an image patch A of size ~ 2 Nn by a~i and the mean value by a, the Moran I N becomes P P aÞ  ða~j   aÞ ~ w j  ða~ ~ ~i2N i  ~ j2V~D ~i~ 1 i I ¼P  ð4Þ P ~w ~ aÞ 2 ~ 0j ~ ða~i   ~i2N j2VD ~ 0

It has been shown in Cliff and Ord (1973) that the standard Z value associated to the Moran’s I coefficient is asymptotically normally distributed as the number of data elements N increases. Therefore, the Student’s t-test on the Z value with the significance level of 95%, corresponding to the interval ±1.96, can be used to identify random, i.e. structureless patterns. Fig. 5 shows a few examples of classifying image regions into consistent and inconsistent categories using Moran’s I coefficient to test the structural content of the image. Fig. 5a depicts an MR image which is partitioned into 32 · 32 sub-images of 16 · 16 pixels each. By using a threshold of 1.96 on the standard Z values associated to Moran’s I, the classification is shown in Fig. 5b,c. The benefits of incorporating the consistency test into the hierarchical splitting scheme can be seen in the example shown in Fig. 2, where the proposed test clearly prevented the two middle patches from being pulled towards structures in the reference CT. It is obvious that the Moran test will become less usable with Ultrasound images, e.g, showing speckles that might mimic structure. However, if the modality of the other registration image pair is better behaved, at least the compu-

tational complexity can be reduced. In the worst case that Moran’s I cannot detect the structureless patches for both involved imaging modalities, the registration method will not fail, but perform at least as well as the original approach proposed by Likar and Pernusˇ (2001). 4. Intensity mapping To further increase the robustness and computational efficiency of the registration algorithm, we propose a local intensity mapping that allows the registration algorithm to switch the similarity measure from MI to CC at higher levels in the hierarchy. The mapping strategy bases on the observation that the performance of a registration algorithm will not increase if one of the images contains more structural detail than the other. On the contrary, details visible in only one of the images can lead to ambiguities by inducing misleading optima in the similarity measure. The performance of the registration procedure thus only depends on those image features which exist simultaneously in both modalities. Therefore, we propose to map the images from their initial modalities onto an intermediate pseudo-modality that will show only the common image features and drop additional details prominent in only one of the modalities. As an example, Fig. 6 depicts two corresponding transversal slices from rigidly registered 3D CT and MR volumes of 512 · 512 · 50 voxels of size 0.39 · 0.39 · 0.6 mm3. Obvious differences can be noticed not only in the intensities of most of the structures but also in visible tissue details. The soft tissue details visible in the MR cannot increase the registration performance when no corresponding structure in the CT is present and should therefore be removed by the mapping. The mapping procedure achieves this goal by decreasing the variance of the intensities of the mapped images. Therefore, the pseudo-modality contrast space is built by assigning the intensities from the image with the least conditional variance at each point in the histogram, as shown in the schematic joint histogram in Fig. 7a. These conditional variances can be easily determined using the joint intensity histogram of the already coarsely registered images. As

Fig. 6. Transversal slices of rigidly registered (a) CT and (b) MR acquisitions of the head.

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

9

Fig. 7. Schematic joint histogram showing (a) the mutually exclusive rules of the initialization stage and (b) an example of an ambiguous mapping.

such, the intensities in both images are mapped simultaneously onto this common contrast space, which is not necessarily one of the two source intensities, but rather a combination of them. Although the transformed images may locally resemble one of the modalities, on an overall scale this is not true. The proposed mapping algorithm is demonstrated on a CT/MR image registration but is generally applicable for any combination of modalities. The mapping algorithm is based on the conditional probability distributions of the intensity values in the two modalities (A and B). These distributions are characterized by their first and second order moments estimated from the joint histogram HAB . Accordingly, for each single normalized intensity value a 2 {0,255} in modality A P b2B HAB ða; bÞ  b ð5Þ lA ðaÞ ¼ P b2B HAB ða; bÞ P HAB ða; bÞ  ðb  lA ðaÞÞ2 ð6Þ r2A ðaÞ ¼ b2B P b2B HAB ða; bÞ and likewise for the modality B intensities b 2 {0,255} P a2A HAB ða; bÞ  a ð7Þ lB ðbÞ ¼ P a2A HAB ða; bÞ P HAB ða; bÞ  ða  lB ðbÞÞ2 : ð8Þ r2B ðbÞ ¼ a2A P a2A HAB ða; bÞ The mapping will be basically governed by the mean conditional variance of the competing intensity values. Accordingly, for each modality the intensities will be classified into two multisets (bags), as shown in Fig. 8. The multisets AMap ¼ fða; mAM ðaÞÞg and BMap ¼ fðb; mBM ðbÞÞg will contain those intensities a of A resp. b of B with their corresponding multiplicities mAM ðaÞ or mBM ðbÞ, which will be potentially overwritten during the mapping procedure. APut ¼ fða; mAP ðaÞÞg and BPut ¼ fðb; mBP ðbÞÞg collect intensity values which are the candidates for the intensities forming the resulting pseudo-modality. These multisets are determined by an iterative refinement procedure following an initialization stage. During initialization first the intensity values of modality A are processed. For each ai 2 {0,255}, bk ¼ lai will be investigated as possible target, as schematically shown in Fig. 7. The actual allocation is determined by the following mutually exclusive rules:

Fig. 8. Classification multisets of the mapping procedure showing an ambiguous mapping example.

if rai < rbk ! place ai in AMap

and

bk in BPut

ð9Þ

and

bk in BMap

ð10Þ

and if rai P rbk ! place ai in APut

Similarly for modality B for all bl 2 {0,255}, aj ¼ lbl if rbl < raj ! place bl in BMap

and aj in APut

ð11Þ

and

ð12Þ

and if rbl P raj ! place bl in BPut

aj in AMap

As a result, the multiset AMap contains all the gray values that should be mapped onto the B modality. At the same time, APut contains a mixture of initial gray values which should be retained (conform Eq. (10)), and all the target intensities of the BMap elements (conform to Eq. (11)). It should be noted that the same intensity values can be placed several times in the multisets AMap and APut . The same applies for the BMap and BPut . Intensities that are either only in AMap or APut (respectively, in BMap or BPut ) will be mapped onto the other modality or they will be retained. The corresponding assignment of the gray values in the pseudo-modality is therefore straightforward. All the values in the intersection AMap \ APut (BMap \ BPut ) are, however, ambiguous cases that have to be resolved before final assignment. This situation is shown in Figs. 7b and 8. On this illustrative example the intensity value an should be mapped onto the target bm, while at the same time being target for at least one (but possibly more) different intensities bp1;2;...;k in modality B (note that k ¼ mAP ðan Þ). To resolve this ambiguity we look at the multiplicity mAP ðan Þ of an in the multiset APut and the multiplicity mBP ðbm Þ of bm in BPut . If the multiplicity mAP ðan Þ is bigger than mBP ðbm Þ, then ðan ; mAP ðan ÞÞ is removed from AMap . Otherwise an is mapped onto bm and all the intensities from modality B that had an as a target are diverted to bm instead. The intersection BMap \ BPut is

10

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

processed in an identical fashion. Applying these rules iteratively, all ambiguities are resolved until AMap \ APut ¼ / and BMap \ BPut ¼ /. As at every step the cardinality of one intersection is reduced, the convergence of the procedure is guaranteed. It should be noted, however, that the final result may depend on the processing sequence, which has been chosen arbitrarily. 5. The entire non-rigid registration algorithm We incorporated the enhancements discussed above into the 3D extended version of the hierarchical subdivision scheme described in Likar and Pernusˇ (2001). We will refer to our enhanced registration algorithm as HERA (hierarchical enhanced registration algorithm). In contrast to the original version, our hierarchical strategy partitions only the floating image and does not perform any image reconstruction before completing the registration. The local similarity measure for a partitioned sub-image is calculated from its volume of overlap with the entire reference image. Therefore, only the sub-images on the border of the volume are effected by an eventual change in the overlapping volume. According to our experience, in seldom cases this leads to misregistrations that are subsequently corrected in the regularization step. The Moran information consistency test is applied to every sub-image in order to pass only the structured image patches to the succeeding rigid registration stage. For all the sub-images failing this consistency test, the hierarchical splitting is stopped. In order to avoid discontinuities between neighboring leafs, the transformation parameters used for initialization are calculated by linearly interpolating the transformations of the parent leaf and its surrounding neighbors. After locally registering all the structured image patches, the consistency of the identified individual transformations between neighboring sub-images is ensured by a regularization procedure. In contrast to the regularization procedure as originally proposed by Likar and Pernusˇ (2001), we replaced the optimum distinctiveness test with a geometrical test that ensures a physically consistent positioning of each sub-image relative to its neighbors. More precisely, this test verifies whether the center of each sub-image lies inside a region of confidence. This region of confidence is defined as an ellipsoid around the center of mass of all its adjacent sub-images and with its axes equal to half of the sub-image size in each direction. The identified outliers are corrected by assigning them the average transformation of all their surrounding sub-images (see Andronache, 2006 for more details). The stopping criterion checks whether any sub-image successfully passed the Moran test and therefore can be further subdivided. Otherwise, thin-plate splines (TPS) (Bookstein, 1989) are used to calculate the final deformation field by densely interpolating the regularized local transformations of the sub-images over the whole image domain. The set of control points of the TPS is formed by the centers of all the image patches at the last hierarchical level. The registration

algorithm uses MI at the first levels of the hierarchy, where the partitioned sub-images are relatively large and MI is statistically consistent. As the images are coarsely registered, the proposed intensity mapping can be used to robustly transform the multi-modal registration into a mono-modal registration scenario and thus allowing to use CC to perform the local matches for the subsequent hierarchical levels. Based on our experience we switch from MI to CC when the in-plane size of the divided sub-images reaches 64 · 64 pixels (Fig. 14). 6. Results For all the experiments, the registration algorithm was using the following general setup: (a) the 2D histogram was generated using 256 bins (256 · 256 normalized gray values); (b) the weighting factor used to incorporate the prior into the local joint intensity distribution is equal to the ratio between the volume of the current sub-image undergoing the local registration relative to the whole floating image; (c) the Moran test is using a contiguity matrix with a maximum interaction distance of 3 pixels, resulting in a cube of 7 · 7 · 7 for the three-dimensional (3D) case, taking also the voxel anisotropy into account; (d) the threshold for the magnitude of the standard Z value of Moran’s I coefficient used in the information consistency test was set to 1.96. First, we illustrate the ability of our method to correctly perform the non-rigid registration in difficult situations where both small image distortions and big elastic deformations are present in close proximity within an image. The reference image is a 512 · 512 · 46 CT scan of the head with 0.39 · 0.39 · 0.6 mm3 voxel dimension and the floating image is a 512 · 512 · 28 MR scan with voxels of the size 0.5 · 0.5 · 1.0 mm3. Fig. 9 visualizes details in a region where both the MR susceptibility artifacts (e.g. within the left sphenoid sinus) and the tissue deformations (e.g. the left ear) need to be corrected between the two acquisitions. In order to better visualize the performance of our registration algorithm, the outline of the head and of the left sphenoid sinus is extracted from the floating MR volume and overlaid on the reference CT image after applying a simple rigid registration (Fig. 9c,d) and after performing the full non-rigid registration (Fig. 9e,f). The remaining deviations of the two contours are caused by both the spatial constraints imposed by the regularization of the deformation field and the size of the smallest sub-image 16 · 16 · 8 passing Moran’s test at the last hierarchical level. The advantage of integrating the intensity mapping strategy into the hierarchical registration is exemplified on two representative regions of interest which are marked with white squares on Fig. 6. All patches are of size 64 · 64 · 17 voxels, equivalent to the 4th level of the subdivision. The next experiment was performed with an image pair (upper white squares in Fig. 6) containing rich structural details. Fig. 10 shows sections through the original 3D

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

11

Cross Corr

Mutual Inf

Fig. 9. Result of a CT-MR cross-registration. (a) Transversal and (b) coronal sections of the region of interest in the initial floating MR volume. (c–f) Corresponding sections in the reference CT volume, overlaid with the contours of the head and of the sphenoid sinus after a global rigid (c,d) and the full hierarchical (e,f) registration. Note: The dashed lines mark the position of the cutting planes.

1 0.5 0

0 5 10 Horizontal displacement

1 0 0 5 10 Horizontal displacement

50

0

50 100 150 CT intensities

100

Mutual Inf

100

0

Final mapping MR MR intensities

MR intensities

Final mapping CT

erally fails to find the correct registration position. Fig. 11c,f shows the comparison between MI and CC responses to horizontal translations. While CC remains robust for this 3D region, too, MI shows highly unreliable behavior. Fig. 11 illustrates the ability of the intensity mapping algorithm to remove the additional details which are present in only one image modality. Fig. 11a,b shows the regions marked by the lower square in Fig. 6. In the lower row, the result of the intensity mapping for the CT (Fig. 11d) and the MR (Fig. 11e) patches are shown, clearly

50 0

0

0.2 0.1 0

50 100 150 CT intensities

0.1 0 0 5 10 Horizontal displacement

Final mapping CT 80 60 40 20 0

Final mapping MR MR intensities

MR intensities

patches and their intensity mapped versions. The behavior of MI (on the original) and CC (on the intensity mapped images) is calculated for horizontal displacements up to ±10 pixels. It can be seen that for regions having sufficient structural information, both similarity measures are sufficiently stable for finding the correct registration position. However, this is not the case for the region marked by the lower squares in Fig. 6. While the corresponding CT patch is almost uniform containing only a little part of the skull (not contained by the slice shown in the image), the MR image shows significant contrast within the covered brain tissue. This is a classical case in which MI gen-

Cross Corr

Fig. 10. (a,b) Initial patches showing rich structural details and (d,e) their intensity mapped versions. (c) The response of MI on the original and (f) CC on the intensity mapped images to horizontal translations. (g,h) The 2D histogram overlaid with the mapping function of the CT and MR intensities, respectively.

0 5 10 Horizontal displacement

0

50 100 150 CT intensities

80 60 40 20 0

0

50 100 150 CT intensities

Fig. 11. (a,b) Patches with major differences of tissue contrast in CT and MR. (d,e) The intensity mapped versions of the images. (c) The response of MI (in the multi-modal space) and (f) the response of CC (in the pseudo-modality space) to horizontal translations. (g,h) The 2D histogram overlaid with the mapping function of the CT and MR intensities, respectively.

12

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

demonstrating the desired loss of detail in the MR modality. In order to quantitatively analyze the effects of integrating the Moran test as well as the proposed intensity mapping strategy into the hierarchical non-rigid registration procedure, a multi-modal registration scenario with known deformation fields was used. The selected task consists of recovering the deformation of the liver due to respiration. For this, a pair of T1- and T2-weighted images of the liver were acquired simultaneously during breath hold at inhalation. The two acquisitions can therefore be assumed to be perfectly aligned and having uncorrelated noise. At the same time they represent a good approximation for a multi-modal registration scenario (like CT/MR, e.g.), as the T2 image shows significantly more anatomical details than the T1 (see Fig. 13). These inhalation images were then deformed according to realistic respiratory patterns observed in two individuals using time resolved volumetric MRI (Siebenthal et al., 2007). To give a quantitative example of the applied respiratory motion, we consider a typical deformation between

inhalation

inhalation Registration

Known deformation

deformed

Known deformation

deformed

Fig. 12. The scheme of the validation setup. The initial synchronously acquired T1- and T2-weighted inhalation MR images are deformed using a realistic respiratory deformation D. The original images are then crossregistered and the two deformation fields compared with the ground truth.

inhalation and exhalation. The minimum, average, and maximum voxel displacements in the considered volume are 0.8 mm, 11.1 mm, and 21.6 mm. This motion has a rigid component, which is mainly a translation in superior–inferior direction. After an affine registration the remaining deformation would still have an average and maximum voxel displacement of 4.7 mm and 11.6 mm. From each individual, 5 breathing cycles with different depths and three phases per cycle were selected (mid-exhalation, full exhalation, mid-inhalation), yielding 30 different deformations. Each respiratory deformation was applied to both, the T1 and T2 inhalation images. This resulted in a total number of 60 registration experiments for evaluating the performance of our multi-modal registration algorithm. Fig. 12 shows the cross-modality registration scenario of recovering the known deformation fields D between the original T1 inhalation image and the deformed T2 image or between the T2 inhalation image and the deformed T1 image. Fig. 13 shows transversal and sagittal sections through the original and through an exemplary deformed T2 image, having 256 · 256 · 16 voxels of size 1.4648 · 1.4648 · 8 mm3. The ideal level for switching the goal function from MI to CC depends on two factors. On the one hand, the subdivision levels prior to the switching have to yield a reasonably good registration of the images. Otherwise the joint histogram is not suitable to determine the mapping function. This requires to postpone the switching as far as possible. On the other hand, however, the reliability of MI-based registration decreases as the sample size used to create the histogram is becoming small. The overall registration accuracy can be used to find an optimal compromise between these contradictory requirements. Therefore, we performed registration experiments with the aforementioned liver data, switching the similarity measure at various hierarchical levels. For each experiment, the registration error was estimated as the mean Euclidean

Fig. 13. Transversal and sagittal sections through the MR acquisitions of the liver used in the validation scheme presented in Fig.12. (a,b,d,e) Sections through the original T1- and T2-weighted MR acquisitions of the liver. (c,f) Sections through one of the artificially deformed images (T2-weighted).

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

distance between the known and the recovered positions of all the voxels within the floating volume. The registration error as a function of the switching level is plotted in Fig. 14 for one experiment. This graph clearly demonstrates that the best performance of the algorithm was

13

reached when the intensity mapping was performed at the 3rd hierarchical level, corresponding to a partitioned subimage of size 64 · 64 · 8 voxels. However, although an optimum could be found, the registration error proved to be not very sensitive to the switching level. Accordingly,

Fig. 14. The registration error as a function of the level where the intensity mapping was enabled and MI was switched to CC.

In plane error

Execution time 2000

12 10

time [s]

Error [mm]

1500

1000

8 6 4 2

500

0 (1)

(2)

(3)

(4)

(1)

(2)

Method

(3)

(4)

Method

Out of plane error

Overall error 16

10

14 12 Error [mm]

Error [mm]

8

6

4

10 8 6 4

2 2 0

0 (1)

(2) Method

(3)

(4)

(1)

(2)

(3)

(4)

Method

Fig. 15. Boxplots of the execution time (on a standard PC with a Pentium 4 at 3 GHz) and the average registration error over the 60 experiments of all four methods.

14

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

Table 1 Average registration error and execution time over all 60 experiments of the different registration methods on a standard PC equipped with a Pentium 4 at 3 GHz Registration method

Estimation direction

Registration error mean ± std

Execution time

[mm]

[voxels]

(1) Likar and Pernusˇ

In plane (xy) Out of plane (z) Overall (xyz)

4.08 ± 4.03 3.77 ± 3.63 4.93 ± 4.53

2.79 ± 2.75 0.47 ± 0.45

33 0 0200

(2) MI and no Moran

In plane (xy) Out of plane (z) Overall (xyz)

4.59 ± 4.65 3.53 ± 3.48 6.17 ± 5.41

3.13 ± 3.17 0.44 ± 0.43

23 0 4400

(3) MI and Moran

In plane (xy) Out of plane (z) Overall (xyz)

1.56 ± 1.15 1.19 ± 0.96 2.15 ± 1.22

1.07 ± 0.78 0.14 ± 0.12

20 0 5700

(4) MI ! CC at 3rd level

In plane (xy) Out of plane (z) Overall (xyz)

1.46 ± 1.11 1.13 ± 0.92 2.03 ± 1.18

1.00 ± 0.75 0.14 ± 0.11

9 0 5000

the resulting deformation field is quite robust against the precise selection of the switching level, which justifies using generic heuristic rules in most of the cases. To quantitatively evaluate the potential benefits of the application of Moran test and the intensity mapping, the aforementioned set of 60 experiments was performed using the following registration methods: (1) A 3D version of the method proposed by Likar and Pernusˇ using MI (4 hierarchical levels). (2) HERA without the Moran test using MI (4 hierarchical levels). (3) HERA with the Moran test using MI (6 hierarchical levels). (4) HERA and switching MI to CC at the 3rd level (6 hierarchical levels). For the first two methods (1) + (2), the maximum hierarchical level was set to 4, corresponding to a minimum sub-image size of 32 · 32 · 4voxels (as proposed by Likar and Pernusˇ). For the registration experiments (3) + (4) the Moran test was used as the stopping criterion. For some sub-images the 6th level could be reached. In experiment (4) the goal function was additionally switched from MI to CC at a sub-image size of 64 · 64 · 8 voxels (corresponding to the 3rd level). The recovered deformation fields were then compared to the known ground truth. Fig. 15 shows for all four registration methods the boxplots of the execution times and of the registration errors. Table 1 summarizes the mean and the standard deviation of the registration error of all the experiments. The higher accuracy in the out of plane direction can not be attributed to the registration method, but originates from the relatively small initial out of plane motion from the realistic motion field. The table clearly shows the benefits of incorporating the Moran test for structural content and the intensity mapping algorithm. By comparing the performance of methods (1) and (2) a drastic reduction

of the computational complexity can be noticed due to the absence of the image reconstruction after every level of the hierarchy. Also, it can be noticed that the absence of the optimum distinctiveness test does not influence the registration accuracy. Whereas, the introduction of the Moran test in (3) and (4) brought about a significant improvement in the registration accuracy. However, the computational time did not decrease a lot in (3) due to the overhead needed to estimate Moran’s autocorrelation coefficient. The integration of the intensity mapping in (4) significantly reduced the computational cost and further improved the registration accuracy. The additional speed up of the algorithm can be attributed to the computationally less expensive CC and the faster convergence of the optimization algorithm due to reduced interpolation artifacts. 7. Conclusions and outlook In this paper we extended and improved the hierarchical non-rigid registration algorithm relying on the maximization of MI, proposed by Likar and Pernusˇ. An information consistency test based on Moran’s spatial autocorrelation coefficient has been developed and used to detect and eliminate the structureless sub-images from the registration chain, thus avoiding registration errors caused by structurally meaningless extrema of the MI. Moran’s test results in a very effective stopping criterion for the hierarchical subdivision process not only allowing to eliminate errors introduced by problematic structureless sub-images, but also resulting in a considerable speed up of the entire algorithm (up to a factor of 5, depending on the image content). Moreover, the intensity mapping strategy presented in this paper allows the registration algorithm to switch the image similarity measure from MI to CC, which proved to be more robust. It enables the combination of both similarity measures for multi-modal registration procedures relying on hierarchical subdivision. At the first levels of

A. Andronache et al. / Medical Image Analysis 12 (2008) 3–15

the hierarchy, where the partitions are still relatively large, MI can be used to coarsely register the corresponding patches. After this stage, the images can be transformed to a pseudo-modality using the presented mapping technique and the similarity measure can be switched to the more robust CC. With the proposed hybrid approach, two important properties of these similarity measures can be seamlessly combined in a unique manner, namely the multi-modal capabilities of MI with the robustness of CC. In addition, the computational complexity of the underlying algorithm decreases further by a factor of 2 compared to a registration performed using only MI. We see several possible directions for future research. Currently, the hierarchical binary splitting is ignoring the image content. An adaptive positioning of the splitting boundaries, respecting the anatomical structures present in the region, would not only further reduce the computational load but also improve image registration quality. This adjustment process could also be supported by using Moran’s spatial autocorrelation coefficient. Other issues that should be further investigated are related to the estimation of the deformation field which could integrate prior knowledge about the physical properties of the tissue (e.g. stiffness). As an example, bony structures could impose rigid transformation constraints to the deformation field. Nevertheless, it has to be noted that such an extension would require tissue segmentation or at least the identification of regions with similar elastic properties. This would compromise some advantages of purely intensity-based registrations. Acknowledgement This work has been supported by the CO-ME/NCCR research network of the Swiss National Science Foundation (http://co-me.ch). References Andronache, A., 2006. Multi-modal non-rigid registration of volumetric medical images, Diss. ETH No. 16601, ETH Zurich, Department of Information Technology and Electrical Engineering, Computer Vision Laboratory, April. Andronache, A., Cattin, P., Sze´kely, G., 2005. Adaptive subdivision for hierarchical non-rigid registration of multi-modal images using mutual information. In: Duncan, J.S., Gerig, G. (Eds.), MICCAI (2), vol. 3750. Springer-Verlag GmbH, pp. 976–983. Andronache, A., Cattin, P., Sze´kely, G., 2006. Local intensity mapping for hierarchical non-rigid registration of multi-modal images using the cross-correlation coefficient. In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (Eds.), Biomedical image registration, Lecture Notes in Computer Science, vol 4057. Springer, pp. 26–33. Bookstein, F.L., 1989. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11, 567–585. Cliff, A., Ord, J., 1973. In: Chorley, R.J., Harvey, D.W. (Eds.), Spatial Autocorrelation. Pion Limited, London, ISBN 0-85086-036-9. Cover, T., Thomas, J., 1991. Elements of Information Theory. John Wiley & Sons. Guimond, A., Roche, A., Ayache, N., Meunier, J., 2001. Threedimensional multimodal brain warping using the demons algorithm

15

and adaptive intensity corrections. IEEE Trans. Med. Imaging 20 (1), 58–69. Hajnal, J.V., Hill, D.L., Hawkes, D.J., 2001. Medical Image Registration. CRC Press. Hill, D.L.G., Batchelor, P.G., Holden, M., Hawkes, D.J., 2001. Medical image registration. Phys. Med. Biol., 46. Ji, J.X., Pan, H., Liang, Z.-P., 2003. Further analysis of interpolation effects in mutual information-based image registration. IEEE Trans. Med. Imaging 22 (9), 1131–1140. Lee, J., Wong, D.W., 2001. Statistical Analysiswith ArcView GIS. John Wiley & Sons, Inc., ISBN 0-471-34874-0. Likar, B., Pernusˇ, F., 2001. A hierarchical approach to elastic registration based on mutual information. Image Vision Comput. 19, 33–44. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P. (1996). Multi-modality image registration by maximization of mutual information. In: Proceedings of IEEE Workshop Mathematical Methods in Biomedical Image Analysis, pp. 14–22. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P., 1997. Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16 (2), 187–198. Maes, F., Vandermeulen, D., Suetens, P., 2003. Medical image registration using mutual information. Proc IEEE 91 (10), 1699–1722. Maintz, J., Viergever, M., 1998. A survey of medical image registration. Med. Image Anal. 2 (1), 1–36. Maintz, J., Meijering, E., Viergever, M., 1998. General multimodal elastic registration based on mutual information. In: Kenneth M. Hanson (Ed.), Medical Imaging 1998: Image Processing. Proc. SPIE, vol. 3338, 1998, pp. 144–154. Nikou, C., F. Heitz, Armspach, J., Namer, I. 1998. Single and multimodal subvoxel registration of dissimilar medical images using robust similarity measures. In: SPIE Conference on Medical Imaging. Pluim, J.P.W., Maintz, J.A., Viergever, M.A., 1999. Mutual information matching and interpolation artefacts. Proc SPIE Med. Imaging 3661, 56–65. Pluim, J.P.W., Maintz, J.A., Viergever, M.A., 2003. Mutual-informationbased registration of medical images: a survey. IEEE Trans. Med. Imaging 22, 986–1004. Roche, A., Malandain, G., Pennec, X., Ayache, N., 1998. The correlation ratio as a new similarity measure for multimodal image registration. In: Wells, W.M., Colchester, A., Delp, S., (Eds.), Proceedings of First International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI’98, LNCS, vol. 1496. Springer Verlag, Cambridge, USA, pp. 1115–1124. Shannon, C.E., 1948. A Mathematical Theory of Communication, Technical Report, Bell Laboratories. Siebenthal, M.v., Szekely, G., Gamper, U., Boesiger, P., Lomax, A., Cattin, P., 2007. 4D MR imaging of respiratory organ motion and its variability. Phys. Med. Biol. 52, 1547–1564. Studholme, C., Hill, D., Hawkes, D., 1999. An overlap invariant entropy measure of 3d medical image alignment. Pattern Recognit. 32, 71–86. Thirion, J., 1996. Non-rigid matching using demons. In: International Conference on Computer Vision and Pattern Recognition, pp. 245– 251. Tsao, J., 2003. Interpolation artifacts in multimodality image registration based on maximization of mutual information. IEEE Trans. Med. Imaging 22 (7), 854–864. van den Elsen, P.A., Pol, E.-J.D., Sumanaweera, T.S., Hemler, P.F., Napel, S., Adler, J.R., 1994. Grey value correlation techniques for automatic matching of CT and MR brain and spine images. Visual. Biomed. Comp. 2359, 227–237. Viola, P., Wells, I.W.M., 1995. Alignment by maximization of mutual information. In: ICCV ’95: Proceedings of the Fifth International Conference on Computer Vision. IEEE Computer Society, p. 16. ISBN 0-8186-7042-8. Woods, R.P., Mazziotta, J.C., Cherry, S.R., 1993. MRI-PET registration with automated algorithm. Comput. Assist. Tomogr. 17 (4), 536–546.