Medical Image Analysis 10 (2006) 41–58 www.elsevier.com/locate/media
Localization of anatomical point landmarks in 3D medical images by fitting 3D parametric intensity models Stefan Wo¨rz *, Karl Rohr University of Heidelberg and DKFZ Heidelberg, Department of Intelligent Bioinformatics Systems, Biomedical Computer Vision Group, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany Received 6 May 2004; received in revised form 24 November 2004; accepted 28 February 2005 Available online 13 May 2005
Abstract We introduce a new approach for the localization of 3D anatomical point landmarks. This approach is based on 3D parametric intensity models which are directly fitted to 3D images. To efficiently model tip-like, saddle-like, and sphere-like anatomical structures we introduce analytic intensity models based on the Gaussian error function in conjunction with 3D rigid transformations as well as deformations. To select a suitable size of the region-of-interest (ROI) where model fitting is performed, we also propose a new scheme for automatic selection of an optimal 3D ROI size based on the dominant gradient direction. In addition, to achieve a higher level of automation we present an algorithm for automatic initialization of the model parameters. Our approach has been successfully applied to accurately localize anatomical landmarks in 3D synthetic data as well as 3D MR and 3D CT image data. We have also compared the experimental results with the results of a previously proposed 3D differential approach. It turns out that the new approach significantly improves the localization accuracy. 2005 Elsevier B.V. All rights reserved. Keywords: 3D anatomical point landmarks; 3D parametric intensity models; Model fitting; Subvoxel localization; Automatic 3D ROI size selection; Dominant gradient direction; Automatic model parameter initialization
1. Introduction The localization of 3D anatomical point landmarks is an important task in medical image analysis. Landmarks are useful image features in a variety of applications, for example, for the registration of 3D brain images of different modalities or the registration of images with digital atlases. Examples of 3D anatomical landmarks of the human head are shown in Fig. 1. The current standard procedure is to localize 3D anatomical point landmarks manually. However, this procedure is difficult, time consuming, and error-prone (e.g., Evans et al., 1991; Hill et al., 1991). To improve the current situation it is therefore important to develop automated methods. *
Corresponding author. E-mail addresses:
[email protected] (S. Wo¨rz),
[email protected] (K. Rohr). 1361-8415/$ - see front matter 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.media.2005.02.003
The extraction of point landmarks generally comprises two steps. First, the landmark has to be detected, i.e. it has to be determined whether there is a landmark or not. Second, the landmark has to be localized, i.e. the position has to be estimated. Actually, these two steps are coupled since detecting a landmark also implies that a (probably coarse) position is determined. In previous work on the detection of 3D anatomical point landmarks, 3D differential operators have been proposed (e.g., Thirion, 1996; Rohr, 1997). Note that by applying differential operators a position of the landmark is also determined. In addition, approaches have been proposed (e.g., Frantz et al., 1998) for refining the estimated position from 3D differential operators, and thus allow improved localization of the landmark. In (Frantz et al., 1998), a three-step differential approach has been used for refining the landmark position yielding subvoxel positions. For related approaches on the refined localization
42
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
Fig. 1. Ventricular system of the human brain (left, from (Sobotta, 1988)) and human skull (right, from (Bertolini and Leutert, 1982)). Examples of 3D point landmarks are indicated by black dots.
of edge detectors we refer to Verbeek and Vliet (1994). Recently, an evaluation study focusing on the detection performance of nine different 3D differential operators has been performed by Hartkens et al. (2002). In a validation study, Frantz et al. (1999b) found that the elapsed time spent for 3D landmark extraction can significantly be reduced compared to a manual procedure by using a semi-automatic procedure based on 3D differential operators (Rohr, 1997). 2D differential approaches for extracting point landmarks in 2D medical images have been described, for example, in (Briquer et al., 1993) and (Scott et al., 2003). For other approaches used to extract point landmarks in 2D images, see (Walker et al., 1998) as well as (Likar and Pernusˇ, 1999). While being computationally efficient, differential operators incorporate only small local neighborhoods of an image and are therefore relatively sensitive to noise, which leads to false detections and also affects the localization accuracy. Recently, an approach based on deformable models was introduced for 3D landmark localization (Frantz et al., 2000; Alker et al., 2001). In this approach tip-like and saddle-like anatomical structures are modeled by surface models using quadric surfaces, i.e. ellipsoids with global deformations and hyperboloids of one sheet, respectively. The surface models are fitted to the image data using an edge-based fitting measure. However, the approach requires the detection of 3D image edges as well as the formulation of a relatively complicated fitting measure, which involves the image gradient as well as first-order partial derivatives of the surface model. In this contribution, we describe a new approach for the localization of 3D anatomical point landmarks. In contrast to previous approaches the central idea is to use 3D parametric intensity models of anatomical structures. In comparison to differential approaches, larger image regions and thus semi-global image information is taken into account. In comparison to approaches based on surface models, we directly exploit the intensity information of anatomical structures. Therefore, more a priori knowledge and much more image information is taken into account in our approach to improve the
robustness against noise and to increase the localization accuracy. In addition, a much simpler fitting measure can be used that does not include the image gradient or derivatives of the model. As important classes of 3D anatomical point landmarks, we consider here tiplike, saddle-like, and sphere-like structures. For these different classes of structures, we have developed different models representing the intensity variations in 3D tomographic images. 3D parametric intensity models can be classified as parametric deformable models based on analytic models as well as volumetric deformable models where model fitting is formulated as an optimization problem (e.g., Jain et al., 1996; McInerney and Terzopoulos, 1996). In comparison to general deformable models, for example, energy-minimizing deformable models (snakes, e.g., McInerney and Terzopoulos, 1996), a main characteristic is that our models exhibit a prominent point which uniquely defines the position of the landmark. In addition, in comparison to snakes, active shape models (ASM, e.g., Cootes et al., 1995), and active appearance models (AAM, e.g., Cootes et al., 2001), our models do not require any discrete points for representing the contour (shape) of the model. Related fitting approaches have so far only been proposed for localizing 2D intensity features such as 2D edges, corners, and circular structures (e.g., Nalwa and Binford, 1986; Rohr, 1992; Drewniok and Rohr, 1997). Model-based approaches in general and particularly those for landmark localization require that a suitable size of the region-of-interest (ROI) is chosen. Obviously, the ROI should be large enough to capture enough image information to guarantee a successful localization of the point landmark. On the other hand, if the ROI chosen is too large, it might contain neighboring structures that negatively influence the localization accuracy. Moreover, for the fitting process we also need starting values for the model parameters including a coarse estimate of the landmark position, i.e. our approach addresses the second step in landmark extraction as mentioned above (note that the surface model approach mentioned above also requires a similar parameter
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
initialization). An improper initialization can lead to inaccurate localizations or false fitting results. Often, the ROI size as well as the model parameters are initialized manually. In order to achieve a higher level of automation, it is desirable to automatically initialize the ROI size as well as all model parameters. Work on automatic 3D ROI size selection for landmark localization is rare. Frantz et al. (1999a) presented an approach based on the statistical uncertainty of a differential edge intersection approach. This approach is based on a model of an ideally sharp tip-like landmark. However, the underlying geometric assumption is a tetrahedron, which is an improper approximation of tip-like structures that are typically rounded. This approach has been applied to determine the ROI size in 3D tomographical images, both for relatively sharp and rounded structures. However, this 3D approach has only been applied to select the ROI size for application of a 3D differential operator, not in conjunction with a deformable model. 2D approaches for ROI size selection have been described by Kanade and Okutomi (1994) as well as Lindeberg (1994). We have developed a new approach for the selection of an optimal 3D ROI size for effective fitting of a deformable model. This approach exploits the dominant direction of the image gradient in the neighborhood of the landmark position. In comparison to the statistical uncertainty approach (Frantz et al., 1999a), the new approach (1) can cope with rounded tip-like structures of ellipsoidal shape, (2) can detect more effectively and accurately the optimal ROI size in the presence of typical neighboring structures (e.g., sulci close to the tips of the ventricular system), (3) is more robust against image noise as well as variations of the initial position, and (4) is much simpler and computationally less expensive. In addition, we developed an algorithm to automatically initialize the model parameters. We use differential properties (e.g., gradient and curvature) as well as other properties of the landmark structure in the image. This paper is organized as follows. First, we introduce our 3D parametric intensity models (Section 2) and then we describe the model fitting process (Section 3). In Section 4 we describe a scheme for automatic estimation of an optimal ROI size as well as algorithms for automatic initialization of the model parameters. Section 5 presents experimental results concerning automatic ROI size selection. Finally, experimental results of applying our new model fitting approach to 3D synthetic data and 3D tomographic images of the human head are reported in Section 6.
2. Parametric intensity models for anatomical landmarks
data. These models describe the image intensities of anatomical landmarks in a semi-global region as an analytic function of a certain number of parameters. A main characteristic in comparison to general deformable models is that they exhibit a prominent point which uniquely defines the position of the landmark. By fitting the parametric intensity models to the image intensities we obtain a subvoxel estimate of the landmark position as well as estimates of the other parameters, e.g., the image contrast. As important classes of 3D anatomical point landmarks we consider tip-like, saddle-like, and sphere-like structures. 2.1. Tip-like structures Tip-like structures can be found, for example, within the human head at the ventricular system (e.g., the tips of the frontal, occipital, or temporal horns, see Fig. 1) and at the skull (e.g., the tip of the external occipital protuberance). The shape of these anatomical structures is approximately ellipsoidal. Therefore, to model them we use a (half-) ellipsoid defined by three semi-axes (rx, ry, rz) and the intensity levels a0 (outside) and a1 (inside). We also introduce Gaussian smoothing specified by a parameter r to incorporate image blurring effects. The exact model of a Gaussian smoothed ellipsoid cannot be expressed in analytic form and is thus computationally expensive. To efficiently represent the resulting 3D intensity structure we developed an analytic model as an approximation. This model is based on the GaussRx 2 ian error function UðxÞ ¼ 1 ð2pÞ1=2 en =2 dn and can be written as sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi!! p ffiffiffiffiffiffiffiffiffiffiffi 3 r r r x2 y 2 ðz þ rz Þ2 x y z gEll. ðxÞ ¼ a0 þ ða1 a0 ÞU þ þ 1 ; r2x r2y r2z r ð1Þ
where x = (x, y, z)T. We define the tip of the ellipsoid w.r.t. the positive semi-axis rz as the position of the landmark, which is also the center of the local coordinate system (see Fig. 2 for a sketch). In addition, we include a 3D rigid transform R with rotation parameters (a, b, c) and translation parameters (x0, y0, z0). The translation parameters define the position of the landmark in the 3D image. Moreover, we extend our model to a more general class of tip-like structures by applying a tapering deformation T with the parameters qx and qy, and a bending deformation B with the parameters d (strength) and m (direction), which are defined by (e.g., Solina and Bajcsy, 1990) 0 1 xð1 þ zqx =rz Þ B C TðxÞ ¼ @ yð1 þ zqy =rz Þ A ð2Þ z
Our approach uses 3D parametric intensity models that are fitted directly to the intensities of the image
43
and
44
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
a ventricular horn in a 3D MR image are shown in comparison to images generated by the ellipsoidal model. It can be seen that our model represents the depicted intensity structure fairly well. 2.2. Saddle-like structures
Fig. 2. Sketch of a (half-) ellipsoid with marked landmark position at the tip, local coordinate system, and semi-axes (left) as well as surface normal, principal directions, and principal curvatures at the tip (right).
0
x z2 d cos m
1
B C BðxÞ ¼ @ y z2 d sin m A.
ð3Þ
z This results in our parametric intensity model with a total of 16 parameters: gM;Ellipsoid ðx; pEll. Þ ¼ gEll. ðTðBðRðxÞÞÞÞ;
ð4Þ
pEll. ¼ ðrx ; ry ; rz ; a0 ; a1 ; r; qx ; qy ; d; m; a; b; c; x0 ; y 0 ; z0 Þ. ð5Þ Fig. 3 shows 2D sections of synthetic data generated by the ellipsoidal model with different settings of the deformation parameters. In Fig. 4, orthogonal 2D sections of
Saddle-like structures can be found, for example, within the human head at the zygomatic and nasal bone (see Fig. 1). These structures can be modeled by a bended ellipsoid where the bending is symmetric w.r.t. the center of the ellipsoid. Therefore, we modify (1) such that the center of the local coordinate system is localized at the tip of the ellipsoid w.r.t. the positive semi-axis rx. By restricting the direction of the bending deformation B towards the x-axis, i.e. setting m = 0 in (3), we achieve a saddle-like structure where the curvature of the bending is maximal at the center of the local coordinate system. This defines the position of the landmark. Besides the bending deformation we also apply a 3D rigid transform R. Here, we do not use a tapering deformation. Applying the transformations, we obtain the parametric intensity model (see Fig. 5 for 2D sections as well as a 3D contour plot) with a total of 13 parameters: pSaddle ¼ ðrx ; ry ; rz ; a0 ; a1 ; r; d; a; b; c; x0 ; y 0 ; z0 Þ.
ð6Þ
Fig. 3. 2D sections (32 · 32 voxels) of the ellipsoidal model without deformations, with bending (d = 0.02), with tapering (qy = 0.2), and with both types of deformations (from left to right). The position of the landmark is highlighted.
Fig. 4. Three orthogonal sections (21 · 21 voxels) of the left frontal horn in a 3D MR image (top) and of the ellipsoidal intensity model (bottom). The circles indicate the spherical ROI and the crosses mark the position of the landmark.
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
45
Fig. 5. Three orthogonal 2D sections (19 · 19 voxels) generated by the saddle model and a 3D contour plot of the model (from left to right). The position of the landmark is highlighted.
2.3. Sphere-like structures Sphere-like structures are, for example, human eyes. These structures can be modeled by a sphere with radius R. Fortunately, the exact model of a Gaussian smoothed sphere can be expressed in analytic form (see Kessler et al., 1984) and is given by gSphere ðxÞ ¼ Ur ðR rÞ Ur ðR rÞ r2 ðGr ðR rÞ Gr ðR þ rÞÞ; ð7Þ r pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi where ¼ x2 þ y 2 þ z2 ; Ur ðxÞ ¼ Uðx=rÞ; and Gr ðxÞ ¼ pffiffiffiffiffiffi r1 2 2 ð 2prÞ ex =2r . We define the center of the sphere as the position of the landmark. In addition, we include the intensity levels a0 and a1 as well as a 3D translation t = (x0, y0, z0)T. This results in the parametric intensity model with a total of seven parameters: gM;Sphere ðx; pSphere Þ ¼ a0 þ ða1 a0 ÞgSphere ðx tÞ;
ð8Þ
pSphere ¼ ðR; a0 ; a1 ; r; x0 ; y 0 ; z0 Þ.
ð9Þ
3. Model fitting approach 3.1. Fitting scheme Estimates of the model parameters in (5), (6), and (9) are found by a least-squares fit of the model gM(x, p) to the image intensities g(x) within semi-global ROIs, thus minimizing the objective function X ðgM ðx; pÞ gðxÞÞ2 . ð10Þ
models, the partial derivatives can be found in Appendix A. Note that we do not need to compute the image gradient as is the case with surface models. We need first-order derivatives of the intensity model only for the minimization process, whereas the surface model approach requires second-order derivatives for the minimization. The termination criterion we used for the iterative fitting is the relative difference between the successive values of the objective function utilizing a threshold value of 0.0001. 3.2. Means to improve stability During model fitting, which is an iterative process, it may happen that the minimizer yields an invalid value for a certain parameter, e.g., a negative value for the smoothing parameter r or a semi-axis. We developed two strategies to cope with this problem. The first strategy continues the minimization with the last valid parameter vector where the problematic parameter is not allowed to vary for a few iterations. Normally, after a few iterations, it is safe to activate this parameter again as the overall parameter vector has changed. Rarely, e.g. when using synthetic data, does the first strategy not solve the problem. In this case, the second strategy is applied. With this strategy the last valid parameter vector is modified by slightly changing the value of the problematic parameter towards the invalid value. In our implementation, we use the two strategies alternatingly, always starting with the preferable first strategy (which does not change the parameter values). 3.3. Scale space
x2ROI
The fitting measure does not include any derivatives. This is in contrast to previous fitting measures for surface models used for landmark localization that incorporate the image gradient as well as first-order partial derivatives of the model (e.g., Frantz et al., 2000). For the minimization we apply the Levenberg–Marquardt method (e.g., Press et al., 1993), incorporating first-order partial derivatives of the intensity model w.r.t. the model parameters. The partial derivatives can be derived analytically using the generalized chain rule (e.g., Bronstein and Semendjajew, 1981). For our
To improve the robustness as well as the accuracy of model fitting, we can apply model fitting to different scales j of the 3D image. The scales are generated by Gaussian smoothing of the image using standard deviations sj (see Witkin, 1983). The largest standard deviation of the Gaussian filter, which results in the coarsest scale, depends on the size of the anatomical structure at hand, e.g., for very small or very thin structures only a standard deviation of 0.5 voxels might be advisable whereas for larger structures values of 1 or 2 voxels can be used. Model fitting is first applied to the
46
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
coarsest scale, then to intermediate scales, and finally to the finest scale, i.e. the original unsmoothed image. For each scale j, the model parameters are initialized with the estimated parameters of the previous scale j 1, except for the smoothing parameter r of the model. The initial value rj,init for the smoothing parameter in scale ^j1 of the previous scale j 1, j is the estimated value r adapted by the standard deviations sj1 of the previous scale and sj of the current scale: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ^2j1 s2j1 þ s2j . ð11Þ rj;init ¼ r
adequate rotation and interchange the semi-axes to reverse the changes.
Using smoothed images at the beginning of the model fitting process, when the initial parameters are typically poor, helps to improve robustness, as the influence of the noise is reduced. In addition, it is less likely that the fitting process gets stuck in a local minimum.
4.1. ROI size selection
3.4. Fitting phases Particularly the ellipsoidal model has a relatively large number of parameters, in this case 16 parameters. To further improve the robustness of model fitting, we separated the model fitting process into three different phases. In the first phase, only a subset of the model parameters are allowed to vary in the minimization process (parameters for semi-axes, rotation, and translation). In the second phase, the parameters for the intensity levels and the smoothing are allowed to vary additionally. Finally, the bending and tapering parameters are included in the third phase. Note, that for the saddle model and the spherical model we have fewer parameters than for the ellipsoidal model (13 and 7, respectively). We found that for these two models different phases are not necessary. 3.5. Correction of interchanged axes In the case of the ellipsoidal model, the model occasionally interchanges the z-axis with the x- or y-axis in the first phase, mainly when the model is poorly initialized. This is effectively a rotation of 90 around the center of the ellipsoid with a simultaneous interchange of the sizes of the semi-axes. In very few cases the model changes the orientation of its z-axis, which is a rotation of 180 around the center of the ellipsoid. In principle, these changes do not affect the result of model fitting as an ellipsoid is invariant under a renaming of its axes. However, the resulting estimate of the landmark position is completely wrong as the landmark position is defined as the positive tip w.r.t. the semi-axis rz. Fortunately, it is easy to detect and reverse these changes automatically: if the positive tip w.r.t. the semi-axis rz is not the closest tip to the initial landmark position and, in case of a rotation of 90, the semi-axis rz is smaller than the respective semi-axis, then we apply an
4. Automatic initialization of the model parameters In this section, we describe a new scheme for automatic selection of an optimal ROI size as well as algorithms for automatic initialization of the model parameters of the ellipsoidal model.
From initial experiments it turned out that the used ROI size for model fitting has a major influence on the success and the accuracy of landmark localization. If the ROI is too small then we do not incorporate enough image information into the model fitting process to guarantee a successful fitting. On the other hand, if the ROI is too large we might include neighboring structures that negatively influence the estimated parameters. In addition, with an increasing ROI it becomes more likely that our intensity model does not well describe the anatomical structure at hand since a larger part of the structure has to be covered. As a consequence, the size of the ROI should be carefully chosen for each landmark in order to improve the results. In our previous work (Wo¨rz and Rohr, 2003) we used a brute-force approach to determine an optimal ROI size by applying model fitting to a range of different ROI sizes (e.g., radii from 5 to 20 voxels in steps of 1 voxel). For each ROI size we fit the model a larger number of times (e.g. 20 times) by using different initial values for the model parameters (randomly generated within certain ranges). As criteria for choosing the optimal ROI size we used the robustness of the estimated landmark position w.r.t. the different initial parameter sets measured by the product of the variances of the components of the 3D position. In most cases this approach led to a good choice of the ROI size. However, the approach is computationally expensive as model fitting has to be applied several hundred times, for example, 320 times for a range of 16 different ROI sizes. Here, we present a different and much more efficient approach. It also turns out that this approach is more accurate. To select an optimal ROI size we determine here the dominant gradient dr of a spherical 3D ROI of radius r. The dominant gradient is obtained by computing the sum of the image gradients $g over all voxels within the ROI, i.e. X dr ¼ rgðxÞ. ð12Þ x2ROI
Here, we compute the image gradient using 3D Gaussian derivative filters. The value of the standard deviation rf of the filters is chosen based on the size of the anatom-
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
47
Fig. 6. Sketch of an (half-) ellipse with image gradients at several positions and the resulting dominant gradient direction highlighted (left) as well as inclusion of a neighboring structure (hatched ellipse) and indicated two different ROIs (right).
ical structure (typically, we used values of rf = 1.0 and rf = 1.5 voxels, see Section 5.1 and 6.2). The main contribution to dr typically results from the boundary of the anatomical structure itself and potential neighboring structures whereas homogenous regions have only a small influence. The reason is that the magnitude of the image gradient in homogenous regions is relatively small. In addition, image noise also has only a relatively small influence. Since we compute the sum, even a small ROI with a radius of, for example, 5 voxels already contains more than 500 voxels in 3D. Therefore, assuming the image noise is stochastically independent, the contribution of the noise to the dominant gradient is cancelled out in different parts of the ROI. For an isolated rounded tip-like structure similar to an (half-) ellipsoid, summing up the image gradients results in a dominant gradient pointing along the center line of the ellipsoid (see Fig. 6 for a 2D sketch). The reason is that the components of the image gradients pointing perpendicular to the center line compensate each other over the boundary of the ellipsoid, while only the components pointing along the center line contribute to the dominant gradient. When a neighboring structure is additionally captured within the ROI, then the direction of the dominant gradient is generally changed. In our approach we exploit this observation. Note that this approach works not only for ellipsoids but also for other structures, for example, for tetrahedra. In order to select an optimal ROI size, we compare for increasing radii r (e.g., in a range from rmin = 5 to rmax = 20 voxels) the direction of the dominant gradient dr1 of a spherical ROI with the direction of the dominant gradient of an adjacent spherical shell (thickness of 1 voxel). The dominant gradient of the spherical shell is given by dr dr1. As a measure for the change of the direction, we compute the angle dr between both dominant gradients dr1 and dr dr1, which is given by dr1 dr dr1 ; dr ¼ arccos ; ð13Þ kdr1 k kdr dr1 k where ÆÆ, Ææ denotes the inner product. The optimal ROI size is determined by the largest ROI before a significant
increase of dr occurs. A significant increase is detected when dr exceeds a threshold Td (in our case we chose Td = p/8 = 22.5) or when dr has a maximal increase before a local maximum of Td/2 6 dlmax < Td. The threshold Td is adapted to the image noise as well as minor distortions in the image close to the landmark position by adding the value of drmin =2 to T d (where drmin is the value for the minimal radius rmin). Fig. 7 shows three typical plots of dr for synthetic images.
4.2. Parameter initialization of the ellipsoidal model For automatic initialization of the 16 parameters pEll. of the ellipsoidal intensity model (see Section 2.1) we propose the following strategy. Given a coarse position of the landmark, we first determine the optimal ROI size automatically (see Section 4.1). For landmark detection within this spherical ROI, the 3D differential operator Op4 = det Cg is applied (Rohr, 1997), where Cg is the structure tensor (averaged dyadic product of the image gradient). To refine this position we apply the three-step procedure in (Frantz et al., 1998), which results in a subvoxel position. Note that this refined operator Op4 comprises a detection and a localization step and is thus optimized for landmark localization. The position closest to the coarse position with a positive Gaussian curvature K (for an isointensity surface of ellipsoidal shape) is then used as the initial position (x0, y0, z0) of the model. In the rare case where no such position is found, we apply the same approach using a different 3D differential operator Op3 = det Cg/trace Cg. In the evaluation study of Hartkens et al. (2002) these two differential operators Op4 and Op3 yielded the best detection performance. To determine the initial orientation (a, b, c) as well as the size of the three semi-axes (rx, ry, rz), the image gradient $g and the principal curvatures j1 and j2 of the local isointensity surface at this point (see Fig. 2) are exploited. Here, we face the problem that we have three semi-axes but only two relations between the principal curvatures and the semi-axes:
48
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58 60
30
30
50
25
25
40
20
20
30
15
15
20
10
10
10
5
5
r 5
10
15
20
r 5
10
15
20
r 5
10
15
20
Fig. 7. Three typical plots of the angle dr (in degrees) for a range of radii r (in voxels) for synthetic 3D images containing an ellipsoid with a neighboring sphere at a distance of 10 voxels and additional Gaussian noise.
j1 ¼
rz r2x
and
j2 ¼
rz : r2y
ð14Þ
In our case, we initialize rz with the radius of the ROI, which is a reasonable assumption (note that the tip of the ellipsoid w.r.t. the positive semi-axis rz is the landmark position). In cases where both the ROI size and one of the principal curvatures are relatively small, it can happen that rz is not much larger (or even smaller) than one of the other semi-axes, i.e. the ellipsoid has no pronounced tip in z-direction, see Fig. 2. In these cases rz is enlarged such that it is 50% larger than the respective other semi-axis, i.e. setting rz = 1.52/ min(j1, j2). The correctness of this equation can be verified by inserting rz in (14) and solving, for example, for rx in case j1 < j2. The initial values for the intensity levels of the surrounding tissue a0 and the anatomical structure a1 are determined using the minimal and maximal intensities within the ROI (after median filtering). To decide whether the landmark is a bright or dark tip we check the orientation of the image gradient w.r.t. the surface normal of the local isointensity surface. The initial value r for the image smoothing parameter is always set to 1 voxel. The bending parameter d is set to 105 voxels1, which has a negligible bending effect but is necessary to avoid a singular matrix in the minimization (for d = 0 the derivative ogM/om is zero, see (3)). The remaining model parameters qx, qy, and m for the global deformations are always set to zero, thus the model is always initialized as an (undeformed) ellipsoid.
5. Experimental results: automatic ROI size selection We have validated our approach for selecting an optimal ROI size (Section 4.1) given 3D synthetic data as well as 3D MR images of the human head. 5.1. 3D synthetic data We generated in total 5600 different 3D synthetic images. Each image contains a tip-like structure of varying 3D orientation and size (generated by our ellipsoidal model) and a neighboring structure at a varying distance and varying orientation (either a smoothed thin ellipsoid or sphere, see Fig. 8 for a 2D sketch). We also added Gaussian noise to these images. Neighboring structures have been simulated at 10 different distances (5–14 voxels) and, in the case of a thin ellipsoid, at different angles a of 0, 45, and 90. The image contrast for both the ellipsoid and the neighboring structure was set to 100 grey levels and for the noise we chose a standard deviation of rn = 3 grey levels. For example, Fig. 8 (right) shows a 2D section of a 3D synthetic image used in the experiments. Since the chosen distances are between 5 and 14 voxels, the optimal ROI radii are also between 5 and 14 voxels. We used a minimal radius of the ROI of 5 voxels to guarantee that the landmark is sufficiently captured by the ROI. For computing the image gradients we used 3D Gaussian derivative filters with a standard deviation of rf = 1 voxel in these experiments.
Fig. 8. Sketch of differently oriented and sized (half-) ellipses with a hatched neighboring circle (left) and ellipse (center) at a distance d and angle a to the origin. The optimal ROI size is highlighted. The right image shows a 2D section of 3D synthetic data of an ellipsoid with a neighboring ellipsoid (d = 11 voxels and a = 45). The position of the landmark as well as the automatically chosen ROI are highlighted.
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
49
Table 1 Results of estimating an optimal ROI size in synthetic images for the statistical uncertainty approach and the new approach for different neighboring structures Distances: 5–14 voxels
Accuracy (voxel)
Fraction of correct estimates (%) Sphere
Ellipsoid 0
Mean values 45
90
Statistical uncertainty Approach
±1 ±2
25.5 48.6
15.8 16.0
1.6 5.3
0.1 17.9
10.8 22.0
Dominant gradient Approach
±1 ±2
84.7 96.3
83.1 91.7
84.2 95.1
86.3 97.8
84.6 95.2
The fractions of correct estimates within 1 and 2 voxels accuracy are given.
From the experiments we found that the new approach is quite robust against noise and successfully estimates the optimal ROI size. For 84.6% of the experiments, the correct ROI size was detected within 1 voxel accuracy. For a comparison we also applied the statistical uncertainty approach (Frantz et al., 1999a), which yielded only 10.8% (see Table 1). Using a tolerance range of two voxels we obtain 22.0% for the statistical uncertainty approach and 95.2% for the new approach. As an example, in the synthetic image shown in Fig. 8 (right) the distance of the neighboring ellipsoid to the landmark position is 11 voxels. In this case, the new approach selected a radius of the ROI of 12 voxels. For larger distances of the neighboring structure, we found that the dominant gradient approach has the tendency to somewhat underestimate the optimal ROI size. Still, for distances of 15–18 voxels, the correct ROI size could be found within 2 voxels accuracy for 63.1% of the images. In comparison, the statistical uncertainty approach yielded 35.5%. Note that for the statistical uncertainty approach we set the threshold tv = 0. In (Frantz et al., 1999a) the threshold was set to tv = 0.5, which gave even worse results in our experiments. Note also that in earlier work the statistical uncertainty approach had not been evaluated using synthetic images. Based on these experiments we can conclude that the statistical uncertainty approach is, in general, not suited for rounded tip-like structures. In particular, when there exists a neighboring ellipsoidal structure the approach performs relatively poorly. For a neighboring sphere the approach is much better, detecting the neighboring structure in about 50% of the cases with a tolerance range of 2 voxels. However, this is still unsatisfactory and much worse compared to the new approach. 5.2. 3D medical images In addition, we applied our approach to real 3D MR images of the human head. Also here it turned out that we obtain quite good results for the optimal size of the ROI. For example, Fig. 9 shows two slices of the right occipital horn in a 3D MR image (C06). In these slices, the selected spherical ROI is highlighted by circles and,
Fig. 9. Two axial 2D slices showing a region of 31 · 31 voxels of the right occipital horn in a 3D MR image (C06). In the two slices, the size of the spherical ROI is highlighted by the circles. The dashed circles indicate a sphere with a radius increased by one voxel and the arrows mark two different neighboring structures.
as a comparison, a slightly enlarged sphere (radius increased by one voxel) is indicated by dashed circles. It can be seen that the ROI is well chosen as neighboring structures (marked by arrows) are close to the chosen ROI (i.e. partly within the dashed circles) but not inside the chosen ROI (see the circles). Table 2 summarizes the results of estimating an optimal ROI size for the statistical uncertainty approach and the new approach for six different tip-like landmarks in a 3D MR image (Woho) to investigate the stability of our approach. In this experiment, we varied the center position of the ROI in a 3 · 3 · 3 neighborhood of the coarse landmark position and applied our approach 27 times for each landmark. Shown are the estimated radius r of the ROI for the coarse landmark position as well as the average radius r and the standard deviation rr for all 27 different center positions. By varying the center position of the ROI by, for example, one voxel we expect that the estimated ROI size is affected at most by one voxel. In addition, under the assumption of an ellipsoidal neighboring structure, the theoretical standard deviation of the distance of the neighboring structure to the center of the ROI for all 27 different center positions computes to rr 0.83 voxels. From this experiment it turned out that the new approach is quite stable w.r.t. the initial coarse position of the landmark, i.e. a variation of the position by one voxel typically affects
50
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
Table 2 Results of estimating an optimal ROI size for the statistical uncertainty approach and the new approach for six different tip-like landmarks in a 3D MR image (Woho) Woho (voxels)
Left frontal horn Right frontal horn Left occipital horn Right occipital horn Left temporal horn Right temporal horn
Statistical uncertainty
Dominant gradient
r
r
rr
r
r
rr
6 6 6 13 6 6
6.0 6.6 10.4 11.7 6.2 6.2
0.19 0.69 4.32 2.97 0.40 0.36
8 9 8 6 6 6
7.3 8.5 7.8 6.7 6.7 6.0
0.87 1.40 0.79 1.20 1.04 0.00
Shown are the estimated radius r of the ROI as well as the average radius r and the standard deviation rr of 27 different estimates of the ROI size by varying the center position of the ROI in a 3 · 3 · 3 neighborhood.
the estimated ROI size by at most one voxel. In addition, it turned out that the estimated standard deviation rr is relatively close to the theoretical value of 0.83 voxels (see Table 2 right column), except for the right temporal horn where several neighboring structures close to the landmark exist. In contrast, the statistical uncertainty approach (Frantz et al., 1999a) is for certain landmarks relatively unstable, i.e. a variation of the initial position by one voxel often changes the estimated ROI size by 5 or more voxels, which results in relatively large standard deviations (see Table 2). More experiments are presented in Section 6.2 in conjunction with the experiments on the automatic parameter initialization of the ellipsoidal model.
6. Experimental results: model fitting In this section we present experimental results of applying our model fitting approach for the localization of anatomical point landmarks to 3D synthetic data as well as to 3D tomographic images of the human head. 6.1. 3D synthetic data In the first part of the synthetic experiments we applied our approach to 3D image data generated by the three intensity models itself (ellipsoidal, saddle, and spherical model, see Section 2) with added Gaussian noise. In these experiments, we initialized the model parameters by randomly varying the true values in certain ranges, e.g. ±2 voxels for the translation parameters, ±0.25 voxels for r, or ±10 grey levels for the intensities. For the ellipsoidal model, we carried out about 3000 experiments with different parameter settings and added Gaussian noise with standard deviations of rn = 0, 1, and 3 grey levels. We achieved a very high localization accuracy with an average error in the estimated landmark position of 0.02 voxels and a maximum error of 0.14 voxels (except in one case out of 3000 we obtained an error of 0.35 voxels). We also found that the approach is robust w.r.t. the choice of initial parameters. Addition-
ally, for about 2000 experiments with similar settings but very intense Gaussian noise (rn = 5 and 10 grey levels) down to a signal-to-noise ratio of ca. 1, the average localization error turned out to be 0.08 voxels (maximal error of 0.52 voxels). Details are shown in Table 3. For the saddle model, we carried out about 5000 similar experiments. The resulting average localization error is 0.03 voxels, which is comparable to the ellipsoidal model, and the maximal error is 1.80 voxels, which is worse due to a few outliers (see Table 3). For the spherical model, the results of 5000 similar experiments are much better than the results of the other two models, i.e. the average localization error is 0.01 voxels with a maximal error of 0.16 voxels even with very intense Gaussian noise down to a signal-to-noise ratio of ca. 0.1 (last row in Table 3). In the second part of the experiments we applied model fitting of the ellipsoidal model with automatic parameter initialization (including the selection of the ROI size, see Section 4.1 and 4.2) given 3D image data generated by the ellipsoidal model itself with added Gaussian noise. For 1000 experiments with different parameter settings but without global deformations (i.e. a pure ellipsoid was used), automatic initialization with subsequent model fitting succeeded in 99.5% of the cases with an average localization error of 0.12 voxels. In comparison, the 3D differential operator Op4 in conjunction with the three-step refinement (which we used to initialize the landmark position) yielded an average error of 1.65 voxels. Note that the refined operator Op4 is optimized for landmark localization yielding subvoxel positions. The reasons for the five (out of 1000) unsuccessful model fittings are as follows. In three cases, the differential operators did not detect a landmark position within the ROI, in one case model fitting did not terminate, and in one case the intensity levels a0 and a1 were not initialized properly because of a relatively poor localization of the differential operator. For 1000 additional similar experiments including global deformations, automatic initialization with subsequent model fitting succeeded in 99.2% of the cases with an average localization error of 0.33 voxels. In comparison, the
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
51
Table 3 Average and maximal localization errors of the estimated landmark position for the ellipsoidal, saddle, and spherical model using different levels of noise (for each noise level we have used 1000 different synthetic images) Model
Error (voxels)
Gaussian noise rn (grey levels)
Result for all rn
0
1
3
5
10
Ellipsoidal
Average Maximal
0.00 0.00
0.01 0.06
0.03 0.35
0.06 0.24
0.11 0.52
0.04 0.52
Saddle
Average Maximal
0.00 0.26
0.01 0.26
0.02 0.32
0.04 1.20
0.10 1.80
0.03 1.80
Spherical
Average Maximal
0.00 0.00
0.00 0.01
0.01 0.04
0.01 0.05
0.02 0.16
0.01 0.16
Table 4 Same as Table 3 but for fitting the ellipsoidal model to synthetic images containing a paraboloid Model
Error (voxel)
Gaussian noise rn (grey levels) 0
1
3
5
10
Paraboloid
Average Maximal
0.31 0.42
0.31 0.42
0.32 0.43
0.33 0.48
0.36 0.66
refined 3D differential operator Op4 yielded an average error of 1.64 voxels. Third, we applied model fitting of the ellipsoidal model to 3D image data containing structures that were generated using different models. We used both a 3D paraboloid to model a tip-like structure and 3D image data containing an ellipsoid with two different intensity plateaus in the background. For generating the synthetic images containing a smoothed paraboloid we applied a discrete convolution of a 3D Gaussian filter to ideal unblurred paraboloids with different parameter settings (we used four times oversampling to minimize discretization artefacts). Here, a paraboloid is defined by two scaling parameters for the 2D quadratic function as well as additional parameters for a 3D rigid transform and two parameters for the inside and outside intensities. The position of the landmark is given by the tip of the paraboloid, which is the point of largest curvature. We carried out 5000 experiments similar to the first experiments (see Table 4). The resulting average localization error for all noise levels is 0.33 voxels. It turns out that we have a systematic localization error of about 0.3 voxels compared to synthetic data generated by the ellipsoidal model itself (see Table 3). However, the localization errors are still quite good with maximal errors below 0.5 voxels for all but the largest level of noise. For generating the synthetic images containing an ellipsoid with two different intensity levels in the background we use the superposition of the ellipsoidal model (with intensities a0 (outside) and a1 (inside)) and a smoothed ideal 3D step function (with intensities 0 and (a2 a0)), see Fig. 10 (left) for a 2D sketch. Here, the step function is oriented along the centerline of the ellipsoid and the step is located in the tip of the ellipsoid using the same 3D rigid transform as for the ellipsoid. The smoothed step function is the Gaussian error func-
Result for all rn 0.33 0.66
tion U(z/r) using the same standard deviation r as for the ellipsoid. Thus, the resulting superposition has the intensity levels a0 and a2 (outside) as well as a1 (inside), see Fig. 10 (right) for a 2D section of a 3D image. In this experiment, we varied the contrast of the background intensities a0 and a2 by using nine different contrasts abackground = a2 a0 = 0, 5, . . . , 40 grey levels. Here, the contrast between the inside intensity and the mean of the background intensities is fixed to 100 grey levels. We also added Gaussian noise with a standard deviation of rn = 3 grey levels. Otherwise, we used the same parameter settings as in the first experiment. The results of this experiment are summarized in Table 5. The results show that the approach achieves a very high localization accuracy for low contrasts. For larger contrasts the localization accuracy is also very good. For example, for a relatively large contrast of 40 grey levels the average localization error is 0.57 voxels, which is in the subvoxel range even though the synthetic image shows a large deviation from the ellipsoidal model used for fitting (see Fig. 10, right).
Fig. 10. 2D sketch (left) and 2D section of 3D synthetic data (right) of an ellipsoid with two different background intensities a0 and a2. The 2D section (right) shows a bended ellipsoid with intensities of a0 = 30 and a2 = 70 grey levels (outside) as well as a1 = 150 grey levels (inside). The position of the landmark is highlighted by a black cross.
52
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
Table 5 Average and maximal localization errors of the estimated landmark position for fitting the ellipsoidal model to synthetic image data containing an ellipsoid with two different background intensities using different levels of contrast for the two background intensities (for each contrast we have used 100 different synthetic images) Contrast (grey levels)
0
5
10
15
20
25
30
35
40
Average error (voxels) Maximal error (voxels)
0.03 0.11
0.07 0.26
0.14 0.46
0.21 0.69
0.29 0.93
0.36 1.19
0.42 1.52
0.51 2.36
0.57 2.31
6.2. 3D medical images We also applied the new approach to real 3D tomographic MR and CT images of the human head. The sizes and resolutions of the images are listed in Table 6. To achieve isotropic image data in the case of one image pair (C06), we applied an interpolation based on thirdorder polynomials (Meijering et al., 1999) prior to model fitting. We considered seven tip-like landmarks, i.e. the frontal, occipital, and temporal horns (left and right) as well as the external occipital protuberance, and two saddlelike landmarks, i.e. the saddle points at the zygomatic bone (left and right). For these landmarks in all three images we used positions as ground truth that were manually determined by different persons. For each landmark the different persons agreed on a certain voxel, i.e. only one position was available as ground truth. For each landmark, up to four persons agreed on the position with a minimum of two persons and an average of about 3 persons. For the CT image, we did not consider the temporal horns since either the ground truth position was missing due to low signal-to-noise ratio (left horn) or it was not possible to successfully fit the intensity model Table 6 Size and resolution of the 3D images used in the experiments Image
Slice orientation
Size (voxels)
Resolution (mm3)
Woho (MR) C06 (MR) C06 (CT)
Sagittal Axial Axial
256 · 256 · 256 256 · 256 · 120 320 · 320 · 87
1.0 · 1.0 · 1.0 0.859 · 0.859 · 1.2 0.625 · 0.625 · 1.0
(right horn). As an example, Fig. 11 shows the image intensities in a ROI around the right temporal horn. Particularly with this landmark the image quality was relatively bad. In general, the quality of the CT image at the ventricular system was worse in comparison to the MR images. In addition, we considered two sphere-like landmarks in the MR images, i.e. the left and right eye, as well as one saddle-like landmark in the CT image, i.e. the nasal bone. In total, we considered 12 different types of landmarks and applied model fitting to 30 landmarks in the 3D data sets. 6.2.1. Parameter settings The fitting procedure described in Section 3.1 requires the determination of suitable initial parameter values as well as the ROI size. For the ellipsoidal model we used a semi-automatic approach as well as an automatic approach and compared the different schemes with each other. Regarding the semi-automatic approach for fitting the ellipsoidal model (based on our previous work in (Wo¨rz and Rohr, 2003)), we used a brute-force approach to determine the ROI size (see Section 4.1 above for a brief description as well as (Wo¨rz and Rohr, 2003) for more details). In 13 out of 19 tip-like landmarks, this approach led to a good choice of the ROI size. For the remaining landmarks, we initialized the ROI size manually. Values for the most important parameters, namely, the translation parameters (x0, y0, z0) defining the position of the landmark, were obtained by a 3D differential operator. Here we used the operator Op3 = det Cg/
Fig. 11. Five axial 2D slices showing a ROI of 21 · 21 · 5 voxels of the right temporal horn in the C06 image pair (top MR, bottom CT). The ground truth position of the landmark is marked by the square in the center image. The slices on the left and right are directly adjacent to the center slice in the 3D image.
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
trace Cg, where Cg is the averaged dyadic product of the image gradient (Rohr, 1997). For computing the image gradient we used Beaudet derivative filters. For the left and right occipital horns in the Woho image, automatic initialization was not possible since the resulting positions of the 3D differential operator Op3 are relatively far away from the ground truth positions. Also, since the anatomical structure of the occipital horns in this image is rather untypical (and actually consists of two tiny horns), very good initial parameters are required for successful model fitting. Therefore, we initialized the translation parameters in these two cases manually. The remaining parameters were either initialized with fixed values (r, qx, qy, d, and m) or initialized manually (rx, ry, rz, a0, a1, a, b, and c). Concerning the automatic approach for fitting the ellipsoidal model we used the automatic schemes for the selection of an ROI size and for parameter initialization, as described in Section 4. In this approach we compute the image derivatives using 3D Gaussian derivative filters, which has the advantage that scalability is simpler in comparison to Beaudet derivative filters used previously. Depending on the size of the considered anatomical structure, we used Gaussian derivative filters with standard deviations of rf = 1.0 and rf = 1.5 voxels, respectively. In order to improve the reliability of model fitting we applied the scale space approach as described in Section 3.3. In case of rf = 1.0 voxel we used two scales with standard deviations of s = 0 and 0.5 voxels, and in case of rf = 1.5 voxels we used three scales with s = 0, 0.5, and 1.0 voxels (note, s = 0 refers to the original unsmoothed image). We applied model fitting of the ellipsoidal model in four variants: without deformations, with bending only, with tapering only, and with both types of deformation. For each landmark, we chose the variant with the smallest mean fitting error as the fitting result.
6.2.2. Results Tables 7–9 show the fitting results for the considered landmarks using semi-automatic (ellipsoidal model) and manual (saddle and spherical model) parameter initialization. For all landmarks, the radius of the ROI, the estimated intensity levels, as well as the localization errors are listed. Additional fitting results (not shown here) are the other model parameters (e.g., the landmark position as well as the size of the semi-axes in case of the ellipsoidal model or the radius of the spherical model) as well as the mean fitting error. In the case of the ellipsoidal model, model fitting needed 75 iterations on average. We have visualized the fitting results of the left and right frontal horn within an MR image in Fig. 12 and of the external occipital protuberance for the C06 image pair in Fig. 13 using 3D Slicer (Gering et al., 1999). The fitted intensity model is visualized as a 3D contour plot, using
53
Table 7 Fitting results for the ventricular horns and the external occipital protuberance (ellipsoidal model), for the zygomatic bone (saddle model), and for the eyes (spherical model) for the C06 image (MR) C06 (MR)
r
Left frontal horn Right frontal horn Left occipital horn Right occipital horn Left temporal horn Right temporal horn Ext. occipital protub.
14 5 7 8 9 6 17
^a0 91.6 93.9 84.9 86.6 82.4 80.0 61.6
^a1
e (mm)
eOp3 (mm)
eSurf (mm)
22.3 18.8 15.2 20.0 12.8 18.8 8.7
1.27 0.58 0.15 0.70 1.20 0.97 0.06
1.92 1.72 3.32 1.72 1.71 2.10 1.21
0.90 1.28 1.23 1.61
0.70
1.96
1.26
1.42 0.99
1.21 1.48
0.78 1.52
Mean Left zygomatic bone Right zygomatic bone Left eye Right eye
6 6 20 20
121.7 128.2 90.4 97.5
20.8 14.9 24.7 25.4
The radius r of the ROI, the estimated intensity levels, and the distance e to the ground truth position are given. For comparison, the distances eOp3 of the differential operator Op3 and eSurf of the surface model approach to the ground truth position are listed.
Table 8 Same as Table 7 but for the Woho image (MR) Woho
r
^a0
^a1
e (mm)
eOp3 (mm)
Left frontal horn Right frontal horn Left occipital horn Right occipital horn Left temporal horn Right temporal horn Ext. occipital protub.
7 5 5 5 18 9 14
124.0 117.3 107.3 112.7 95.1 109.6 84.2
23.8 20.1 23.3 15.9 44.3 35.8 26.8
2.22 1.44 2.31 0.68 1.80 1.46 1.48
3.16 2.24 4.12 3.61 2.83 4.58 1.41
1.63
3.14
6 6 20 20
167.2 197.8 114.8 122.2
47.1 12.1 29.9 30.4
2.22 3.12
3.00 2.45
Mean Left zygomatic bone Right zygomatic bone Left eye Right eye
Table 9 Same as Table 7 but for the C06 image (CT) C06 (CT) Left frontal horn Right frontal horn Left occipital horn Right occipital horn Ext. occipital protub.
r 5 7 6 7 8
^a0
^a1
e (mm)
eOp3 (mm)
1043.5 1036.7 1038.5 1045.0 1007.9
996.8 1001.8 989.7 994.0 2679.0
1.33 1.26 0.66 0.94 1.10
0.63 2.10 0.00 1.33 1.72
1.06
1.16
1.49 1.81
0.63 1.00
Mean Left zygomatic bone Right zygomatic bone Nasal bone
9 8 12
976.1 977.1 1039.0
2829.0 2918.0 2315.1
the models intensity value at the estimated landmark position as contour value. The results for the ellipsoidal model are quite good. For the C06 image (MR), the average distance between the estimated landmark positions and ground truth
54
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
Fig. 12. 3D contour plots of the fitted intensity models for the left and right frontal horn within an MR image (Woho). The result is shown for four different slices of the original data. The marked axes indicate the estimated landmark positions.
Fig. 13. 3D contour plots of the fitted intensity model for the external occipital protuberance within the original image pair C06 (left MR and right CT). Note, the size of the ROI and the used deformations are different.
positions for all seven tip-like landmarks turns out to be e ¼ 0.70 mm (see Table 7). In comparison, using the 3D differential operator Op3 with Beaudet filters, we obtain an average distance of eOp3 ¼ 1.96 mm. Thus, the localization accuracy with our new approach is much better. For all seven tip-like landmarks of the Woho image (MR), the ellipsoidal model yields an average distance of e ¼ 1.63 mm (see Table 8), whereas for the 3D differential operator we obtain an average distance of eOp3 ¼ 3.14 mm. Therefore, also for this image the localization accuracy with our new approach is much better. For the C06 image (CT), both approaches yield comparable results w.r.t. the estimated landmark position (see Table 9). Here, the ellipsoidal model yields for five tip-like landmarks an average distance of e ¼ 1.06 mm whereas the 3D differential operator yields an average distance of eOp3 ¼ 1.16 mm. Summarizing the results for all 19 tip-like landmarks, it turns out that the average distance between the estimated landmark positions and ground truth positions using the ellipsoidal model is e ¼ 1.14 mm. In comparison, using the 3D differential operator Op3, we obtain an average distance of eOp3 ¼ 2.18 mm. Comparing the results using a paired T-test shows that the ellipsoidal model yields significantly
better results than the 3D differential operator Op3 (p < 0.001). For the surface model approach (Frantz et al., 2000) only comparable data for four landmarks is available, namely, the left and right frontal as well as occipital horns of the C06 MR image. For these four landmarks the average distance of our approach is e ¼ 0.68 mm whereas the surface model approach yields eSurf ¼ 1.26 mm and the differential operator eOp3 ¼ 2.17 mm. Thus, the localization accuracy with our new approach is better than the surface model approach, while the differential operator Op3 yields the worst result. The results for the saddle model are worse in comparison to the ellipsoidal model. The average distance between the estimated landmark positions and the ground truth positions for all six saddle-like landmarks computes to e ¼ 1.84 mm. In comparison, using the 3D differential operator Op3, we obtain an average distance of eOp3 ¼ 1.63 mm. For the surface model approach (Frantz et al., 2000), only comparable data for two landmarks is available, namely, the left and right zygomatic bone of the C06 MR image. For these two landmarks the average distance of our approach is e ¼ 1.21 mm, whereas the surface model approach
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
yields eSurf ¼ 1.15 mm and the differential operator eOp3 ¼ 1.35 mm. Thus, in this case all three approaches have a comparable localization accuracy. In addition, our findings show that the saddle model depends more on the initial parameter values than the other models. In contrast, for the nasal bone in the CT image, fitting the saddle model is much more robust. Unfortunately, we do not have ground truth positions for this landmark. Therefore, we cannot determine the localization accuracy. Fig. 14 (left) shows the fitting result for the left zygomatic bone within an CT image. The results for the spherical model are very good. The fitted model describes the image intensities fairly well and also model fitting is very robust w.r.t. the initial parameters. This is in accordance with the results we obtained using 3D synthetic data (see Section 6.1). Fig. 14 (right) shows the fitting results for both eyes within an MR image. In Fig. 15 we visualized the fitting result for the left occipital horn in the C06 (MR) image. Besides the 3D contour plot of the fitted intensity model within three adjacent slices of the original data we also marked the estimated landmark positions for the new approach (white) and the differential operator Op3 (black). It can be seen that the model describes the depicted anatomical structures fairly well. Here, the distance between the estimated position of our approach to the ground truth position (not shown) is 0.15 mm whereas the distance of the
55
differential operator Op3 is 3.32 mm. The estimated position of the differential operator is clearly inside the structure and relatively far away from the tip of the horn. This is a typical result of the differential operator for the long and thin ventricular horns in our experiments. The reason for this systematic localization error results from smoothing the image data when computing the image gradient, which is necessary to calculate the response of the differential operator. In contrast, our model fitting approach directly exploits the image intensities (without smoothing) and is therefore not vulnerable to this effect. Finally, we present quantitative results for the automatic parameter initialization of the ellipsoidal model as described in Section 4. We considered as landmarks the tips of six ventricular horns (left and right frontal, occipital, and temporal horns) in both MR images. For 8 landmarks (out of 12) it turned out that automatic initialization is reliable and allows very good fitting results, see Table 10. The average distance between the estimated landmark positions and ground truth positions computes to e ¼ 1.63 mm. In comparison, using the refined 3D differential operator Op4, we obtain an average localization error of eOp4; refined ¼ 2.33 mm. Note that the refined operator Op4 is optimized for landmark localization yielding subvoxel positions, and is particularly designed to cope with the localization error due to image blurring. Model fitting was not successful for the right frontal and the left temporal horn in the C06 data set (these anatomical
Fig. 14. 3D contour plots of the fitted intensity model for the left zygomatic bone within an CT image (C06) and for the eyes within an MR image (Woho).
Fig. 15. 3D contour plots of the fitted intensity model for the left occipital horn within the C06 (MR) image. The result is shown with and without the model for three adjacent slices of the original data. The marked axes indicate the estimated landmark positions for the new approach (white) and for the differential operator Op3 (black).
56
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
Table 10 Fitting results for the ventricular horns in two 3D MR images (C06 and Woho) for the automatic approach for fitting the ellipsoidal model
eOp3 (mm)
eOp4, refined (mm)
esemi-auto (mm)
eauto (mm)
C06 Left frontal horn Left occipital horn Right occipital horn Right temporal horn
1.92 3.32 1.72 2.10
2.97 3.21 1.73 1.83
1.27 0.15 0.70 0.97
1.52 0.65 1.23 0.84
1.98 1.93 1.25 3.74
Woho Left frontal horn Right frontal horn Left temporal horn Right temporal horn
3.16 2.24 2.83 4.58
1.98 1.93 1.25 3.74
2.22 1.44 1.80 1.46
2.25 2.61 2.18 1.76
2.33
Mean
2.73
2.33
1.25
1.63
r
^a0
^a1
e (mm)
eOp4,
C06 Left frontal horn Left occipital horn Right occipital horn Right temporal horn
9 6 6 7
92.6 84.6 87.3 79.7
19.6 15.6 19.8 20.2
1.52 0.65 1.23 0.84
2.97 3.21 1.73 1.83
Woho Left frontal horn Right frontal horn Left temporal horn Right temporal horn
8 9 6 6
126.3 126.3 115.1 110.8
21.8 23.8 18.2 26.1
2.25 2.61 2.18 1.76 1.63
Mean
Table 11 Comparison of the localization errors for different approaches
refined
(mm)
The selected radius r of the ROI, the estimated intensity levels, and the distance e to the ground truth position are given. For comparison, the distance of the refined differential operator Op4 to the ground truth position is listed.
Shown are the distances to the ground truth position of the differential operator Op3 with Beaudet filters (eOp3) and of the refined differential operator Op4 with Gaussian filters (eOp4) as well as of the model fitting approach using semi-automatic initialization (esemi-auto) and automatic initialization (eauto).
structures are very tiny which leads to a relatively poor estimate of the initial orientation and position) and for both occipital horns in the Woho data set (the reason is a rather untypical anatomical structure which can be characterized as a double horn with two tiny tips). Also, model fitting was not successful for the landmarks in the CT image. The reason is the relatively poor image quality due to image noise. This leads to a wrong initialization of the model orientation as well as a relatively poor initialization of the other model parameters. Table 11 compares the accuracy of landmark localization for the 3D differential operator Op3 with Beaudet filters, the refined 3D differential operator Op4 with Gaussian filters as well as model fitting with semi-automatic and automatic initialization. It turns out that the average localization error of automatic fitting is 1.63 mm, which is somewhat worse in comparison to the semi-automatic fitting, which yields an error of 1.25 mm. However, the localization accuracy is much better than using both differential operators which yield 2.33 and 2.73 mm, respectively. The execution time of our algorithm mainly depends on the size of the ROI, the number of model parameters, and the quality of the initial parameters. As a typical example, the fitting time for the right temporal horn in the Woho image including tapering and bending deformations and a radius of the ROI of 9 voxels is ca. 1 s (on a AMD Athlon PC with 1.7 GHz, running Linux).
knowledge about the shape and intensity of 3D anatomical structures and yields subvoxel positions of landmarks. The suggested different types of intensity models (i.e. tip-like, saddle-like, and sphere-like) describe the anatomical structures fairly well, as can be seen from the experiments and the shown 3D contour plots. Also, the figures demonstrate that the spectrum of possible shapes of our intensity models is relatively large. Through experiments on the basis of synthetic and real tomographic images, it turned out that the new approach significantly improves the localization accuracy of tiplike landmarks compared to previously proposed 3D differential operators and a contour fitting scheme. In addition, we presented a new approach for the automatic selection of an optimal 3D ROI size for model fitting as well as the automatic initialization of the model parameters of our ellipsoidal model. The experiments testify the applicability of this approach. The results show that we obtain optimal ROI sizes such that neighboring structures are not captured. In combination with the initialization of the parameters of our model, this allows a fully automated localization of tip-like landmarks. Further work is necessary in the case of very tiny landmark structures. In our approach, we apply the Levenberg–Marquardt method for model fitting. Since Levenberg–Marquardt is a local optimization method, generally a local optimum instead of the global optimum is found. Especially when the initial parameters are relatively inaccurate, a local optimum can give a poor estimate of the model parameters. We have addressed this problem by proposing a scale space method for model fitting. Alternatively, global optimization methods such as genetic algorithms could be used. However, the disadvantage of global optimization methods are the relatively high computational costs. Alker et al. (2001) proposed a hybrid
7. Discussion In this paper we introduced a new approach for the localization of anatomical point landmarks in 3D tomographic images. In comparison to previously used 3D differential operators, our scheme exploits more a priori
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
algorithm for 3D landmark localization combining the advantages of a local optimization method (using a conjugate gradient method) with a global optimization method (using a genetic algorithm). An issue for future work is to study the accuracy and robustness of different optimization schemes in the context of 3D landmark localization.
Acknowledgments We thank the reviewers for their constructive and thoughtful comments. The original MR and CT images have kindly been provided by Philips Research Hamburg and W.P.Th.M. Mali, L. Ramos, and C.W.M. van Veelen (Utrecht University Hospital) via ICS-AD of Philips Medical Systems Best.
Appendix A. Partial derivatives of parametric models A.1. Ellipsoidal model
57
where T
ðh1 ðxÞ; h2 ðxÞ; h3 ðxÞÞ ¼ TðBðRðxÞÞÞ; # T # h1 ðxÞ; h# ¼ RðxÞ; 2 ðxÞ; h3 ðxÞ
ðA:2Þ
and p ffiffiffiffiffiffiffiffiffiffiffi 3 r r r x y z x ¼ x1 ð1 x2 Þ; x1 ¼ ; r sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 h1 h2 h3 þ r z x2 ¼ þ þ ; rx ry rz # 2 q d cos ðmÞ; xx2 ¼ 1 þ x h# ; xx1 ¼ h# 1 h3 rz 3 # 2 qy d sin ðmÞ; xy2 ¼ 1 þ h# ; xy1 ¼ h# 2 h3 rz 3 ogM x 1 h1 ¼ ða0 a1 ÞGðxÞ ; oh1 x2 r2x ogM x 1 h2 ¼ ða0 a1 ÞGðxÞ ; oh2 x2 r2y
ðA:3Þ
ogM x 1 h3 þ r z ¼ ða0 a1 ÞGðxÞ ; oh3 x2 r2z 0
The partial derivatives of the model function gM,Ellipsoid (see (4) in Section 2.1) w.r.t. the model parameters pEll. are given by ogM x x1 h21 ¼ ða1 a0 ÞGðxÞ þ ; 3rx x2 r3x orx ! ogM x x1 h22 ¼ ða1 a0 ÞGðxÞ þ ; 3ry x2 r3y ory ogM ¼ orz
ogM oa0 ogM oa1 ogM or ogM oqx ogM oqy ogM od ogM om ogM oa
qy h# xy1 2 3 rz
q h# xx1 x 2 3 rz
ogM ogM oh1 oh2 x x1 h23 þ rz h3 þ ða1 a0 ÞGðxÞ þ ; 3rz x2 r3z
¼ 1 UðxÞ; ¼ UðxÞ; ¼ ða0 a1 ÞGðxÞ ¼ xx1
h# 3 og M ; rz oh1
¼ xy1
h# 3 og M ; rz oh2
x ; r
ogM ogM 2 xy2 ðh# ; 3 Þ sinðmÞ oh1 oh2 ogM ogM 2 2 ¼ xx2 ðh# xy2 ðh# ; 3 Þ d sinðmÞ 3 Þ d cosðmÞ oh1 oh2 oRðxÞ analogous for b; c; x0 ; y 0 ; and z0 ; ¼h oa ðA:1Þ 2
¼ xx2 ðh# 3 Þ cosðmÞ
1 M d cosðmÞxx2 þ xx1 qrzx og þ 2h# 3 oh og og 1 B C h ¼ @xx2 M ; xy2 M ; A oh1 oh2 2h# d sinðmÞx þ x qy ogM þ ogM y2 y1 rz oh2 3 oh3
ðA:4Þ
as well as a 3D rigid transform RðxÞ ¼ Rðx tÞ with rotation matrix R and translation t = (x0, y0, z0)T. A.2. Spherical model The partial derivatives of the model function gM,Sphere (see (8) in Section 2.3) w.r.t. the model parameters pSphere are given by ogM R ¼ ða1 a0 Þ ðGr ðR rt Þ Gr ðR þ rt ÞÞ; rt oR ogM ¼ 1 gSphere ðx t;RÞ; oa0 ogM ¼ gSphere ðx t;RÞ; oa1 2 ogM R þ xp R2 þ xm ¼ ða1 a0 Þ Gr ðR þ rt Þ Gr ðR rt Þ ; or rrt rrt ogM x x0 ¼ ða1 a0 Þ 3 ðxp Gr ðR þ rt Þ xm Gr ðR rt ÞÞ; ox0 rt ogM y y0 ¼ ða1 a0 Þ 3 ðxp Gr ðR þ rt Þ xm Gr ðR rt ÞÞ; oy 0 rt ogM z z0 ¼ ða1 a0 Þ 3 ðxp Gr ðR þ rt Þ xm Gr ðR rt ÞÞ; oz0 rt ðA:5Þ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 where rt ¼ ðx x0 Þ þ ðy y 0 Þ þ ðz z0 Þ ; xp ¼ r2 þ 2 Rrt ; and xm ¼ r Rrt .
58
S. Wo¨rz, K. Rohr / Medical Image Analysis 10 (2006) 41–58
References Alker, M., Frantz, S., Rohr, K., Stiehl, H.S., 2001. Improving the robustness in extracting 3D point landmarks from 3D medical images using parametric deformable models. In: Proc. MICCAI01, Utrecht, The Netherlands, Oct. 14–17, 2001, Lect. Notes Comp. Sci. 2208. Springer, Berlin, pp. 582–590. Bertolini, R., Leutert, G., 1982. Atlas Der Anatomie Des Menschen. Band 3: Kopf, Hals, Gehirn, Ru¨ckenmark Und Sinnesorgane. Springer, Berlin. Le Briquer, L., Lachmann, F., Barillot, C., 1993. Using local extremum curvatures to extract anatomical landmarks from medical images. In: Medical Imaging 1993: Image Processing, Newport Beach, CA/USA, Feb. 16–19, 1993, Proc. SPIE 1898, pp. 549–558. Bronstein, I.N., Semendjajew, K.A., 1981. Taschenbuch Der Mathematik, 19. Auflage, Verlag Harri Deutsch, Thun und Frankfurt/ Main. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active shape models – their training and application. Comput. Vis. Image Und. 61 (1), 38–59. Cootes, T.F., Edwards, G.J., Taylor, C.J., 2001. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23 (6), 681– 685. Drewniok, C., Rohr, K., 1997. Model-based detection and localization of circular landmarks in aerial images. Int. J. Comput. Vis. 24 (3), 187–217. Evans, A.C., Dai, W., Collins, L., Neelin, P., Marret, S., 1991. Warping of a computerized 3-D atlas to match brain image volumes for quantitative neuroanatomical and functional analysis. In: Medical Imaging V: Image Processing, San Jose, CA/USA, 1991, Proc. SPIE 1445, pp. 236–246. Frantz, S., Rohr, K., Stiehl, H.S., 1998. Multi-step differential approaches for the localization of 3D point landmarks in medical images. J. Comput. Inform. Technol. 6 (4), 435–447. Frantz, S., Rohr, K., Stiehl, H.S., 1999a. Improving the detection performance in semi-automatic landmark extraction. In: Proc. MICCAI99, Cambridge, UK, Sep. 19–22, 1999, Lect. Notes Comp. Sci. 1679. Springer, Berlin, pp. 253–262. Frantz, S., Rohr, K., Stiehl, H.S., Kim, S.-I., Weese, J., 1999b. Validating point-based MR/CT registration based on semi-automatic landmark extraction. In: Proc. CARS99, Paris, France, June 23–26, 1999. Elsevier, Amsterdam, pp. 233–237. Frantz, S., Rohr, K., Stiehl, H.S., 2000. Localization of 3D anatomical point landmarks in 3D tomographic images using deformable models. In: Proc. MICCAI00, Pittsburgh, Pennsylvania/USA, Oct. 11–14, 2000, Lect. Notes Comp. Sci. 1935. Springer, Berlin, pp. 492–501. Gering, D.T., Nabavi, A., Kikinis, R., Grimson, W.E.L., Hata, N., Everett, P., Jolesz, F., Wells, W.M., 1999. An integrated visualization system for surgical planning and guidance using image fusion and interventional imaging. In: Proc. MICCAI99, Cambridge, UK, Sep. 19–22, 1999, Lect. Notes Comp. Sci. 1679. Springer, Berlin, pp. 808–819. Hartkens, T., Rohr, K., Stiehl, H.S., 2002. Evaluation of 3D operators for the detection of anatomical point landmarks in MR and CT images. Comput. Vis. Image Und. 85, 1–19. Hill, D.L.G., Hawkes, D.J., Crossman, J.E., Gleeson, M.J., Cox, T.C.S., Bracey, E.E.C.M., Strong, A.J., Graves, P., 1991. Regis-
tration of MR and CT images for skull base surgery using pointlike anatomical features. Br. J. Radiol. 64 (767), 1030–1035. Jain, A.K., Zhong, Y., Lakshmanan, S., 1996. Object matching using deformable templates. IEEE Trans. Pattern Anal. Mach. Intell. 18 (3), 267–278. Kanade, T., Okutomi, M., 1994. A stereo matching algorithm with an adaptive window: theory and experiment. IEEE Trans. Pattern Anal. Mach. Intell. 16 (9), 920–932. Kessler, R.M., Ellis Jr., J.R., Eden, M., 1984. Analysis of emission tomographic scan data: limitations imposed by resolution and background. J. Comput. Assist. Tomogr. 8 (3), 514–522. Likar, B., Pernusˇ, F., 1999. Automatic extraction of corresponding points for the registration of medical images. Med. Phys. 26, 1678– 1686. Lindeberg, T., 1994. Junction detection with automatic selection of detection scales and localization scales. In: Proc. ICIP94, Austin, TX/USA, 1994. IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 924–928. McInerney, T., Terzopoulos, D., 1996. Deformable models in medical image analysis: a survey. Med. Image Anal. 1 (2), 91–108. Meijering, E.H.W., Zuiderveld, K.J., Viergever, M.A., 1999. Image reconstruction by convolution with symmetrical piecewise nthorder polynomial kernels. IEEE Trans. Image Process. 8 (2), 192– 201. Nalwa, V.S., Binford, T.O., 1986. On detecting edges. IEEE Trans. Pattern Anal. Mach. Intell. 8 (6), 699–714. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., 1993. Numerical Recipes in C. Cambridge University Press, Cambridge, UK. Rohr, K., 1992. Recognizing corners by fitting parametric models. Int. J. Comput. Vis. 9 (3), 213–230. Rohr, K., 1997. On 3D differential operators for detecting point landmarks. Image Vis. Comput. 15 (3), 219–233. Scott, I., Cootes, T.F., Taylor, C.J., 2003. Improving appearance model matching using local image structure. In: Proc. IPMI03, Ambleside, UK, July 21–25, 2003, Lect. Notes Comp. Sci. 2732. Springer, Berlin, pp. 258–269. Sobotta, J., 1988. Atlas Der Anatomie Des Menschen. Band 1: Kopf, Hals, Obere Extremita¨t, Haut, nineteenth ed. Urban & Schwarzenberg, Mu¨nchen. Solina, F., Bajcsy, R., 1990. Recovery of parametric models from range images: the case for superquadrics with global deformations. IEEE Trans. Pattern Anal. Mach. Intell. 12 (2), 131–147. Thirion, J.-P., 1996. New feature points based on geometric invariants for 3D image registration. Int. J. Comput. Vis. 18 (2), 121–137. Verbeek, P.W., van Vliet, L.J., 1994. On the location error of curved edges in low-pass filtered 2-D and 3-D images. IEEE Trans. Pattern Anal. Mach. Intell. 16 (7), 726–733. Walker, K.N., Cootes, T.F., Taylor, C.J., 1998. Locating salient object features. In: Proc. 9th British Machine Vision Conference (BMVC98), Southampton, UK, Sep., 1998. BMVA Press, pp. 557–566. Witkin, A.P., 1983. Scale-space filtering. In: Proc. 4th International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019–1022. Wo¨rz, S., Rohr, K., 2003. Localization of anatomical point landmarks in 3D medical images by fitting 3D parametric intensity models. In: Proc. IPMI03, Ambleside, UK, July 21–25, 2003, Lect. Notes Comp. Sci. 2732. Springer, Berlin, pp. 76–88.