A shape prior constraint for implicit active contours


Pattern Recognition Letters 32 (2011) 1937–1947


Weiping Liu a,*, Yanfeng Shang b,c, Xin Yang a, Rudi Deklerck c,d, Jan Cornelis c

a Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
b Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
c Department of ETRO, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium
d Institute of Broad Band Technology, Gaston Crommenlaan 8, 9050 Gent, Belgium

Article history: Received 27 March 2010; available online 16 September 2011. Communicated by S. Sarkar.

Keywords: Image segmentation; Implicit active contour; Shape prior constraint; Registration; Level set methods

Abstract

We present a shape prior constraint to guide the evolution of implicit active contours. Our method includes three core techniques. First, a rigid registration is introduced, using a line search method within a level set framework; the method automatically finds the time step for the iterative optimization processes, and the order for finding the optimal translation, rotation and scale is derived experimentally. Second, a single reconstructed shape is created from the shape distribution of a previously acquired learning set; this reconstructed shape is applied to guide the active contour evolution. Third, our method balances the impact of the shape prior against the image guidance of the active contour: a mixed stopping condition is defined based on the stationarity of the evolving curve and the shape prior constraint. Our method is completely non-parametric and avoids taking linear combinations of non-linear signed distance functions, which would cause problems because distance functions are not closed under linear operations. Experimental results show that our method is able to extract the desired objects in several circumstances, namely when noise is present in the image, when the objects are in slightly different poses, and when parts of the object are invisible in the image.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

The active contour model is a powerful image segmentation technique that has been widely used in a variety of applications in computer vision, pattern recognition and medical image analysis. Many variations of the active contour model have been developed since the "snake" was first proposed by Kass et al. (1988). Based on the type of extracted image features and the knowledge involved, active contour based image segmentation can be categorized as edge-driven, region-driven (or intensity-based) and prior-knowledge-driven. Edge-driven active contour models find an optimal boundary by minimizing an energy function over all closed curves (Caselles et al., 1997; Paragios et al., 2004; Melonakos et al., 2008; Xie and Mirmehdi, 2008). Region-driven active contour models integrate statistical features associated with object regions into the energy function (Zhu and Yuille, 1996; Chan and Vese, 2001; Vese and Chan, 2002). Both are often noise sensitive and fail in the presence of object overlap or incompletely imaged objects. A solution is to introduce higher-level prior knowledge, such as texture or prior

* Corresponding author. Tel.: +86 21 34205973; fax: +86 21 34204022. E-mail addresses: [email protected] (W. Liu), [email protected] (Y. Shang), [email protected] (X. Yang), [email protected] (R. Deklerck), [email protected] (J. Cornelis).
0167-8655/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2011.09.012

shape information (Cremers et al., 2007), into the active contour model for specific object extraction. The prior-knowledge-driven active contour model has been successful in cases with a complex background, e.g. medical image segmentation (Tsai et al., 2003; Yang and Duncan, 2004) of incompletely imaged objects that cannot be reconstructed without prior knowledge. Prior shapes of specific objects, which can be represented as shape descriptors or implicit surfaces, are widely used forms of higher-level knowledge. Using prior shapes, the models are able to segment objects corrupted by noise, clutter and occlusions (Cremers et al., 2003). Nevertheless, many researchers still investigate better strategies for segmentation based on prior shapes. The main challenge of this approach is to derive the shape prior model and to embed it into the active contour model. The statistical shape prior model has been widely investigated recently. In this approach, the density distribution of a specific object is estimated from an aligned prior shape set. In the pioneering work of Leventon et al. (2000), the shapes are represented by signed distance functions, and Principal Component Analysis (PCA) is applied to extract shape features from the aligned curves; the shape distribution is assumed to be Gaussian. The work of Tsai et al. (2003, 2004) and Yang and Duncan (2004) follows similar reasoning. The parametric shape model suggested by Tsai et al. (2003, 2004) derives a shape model as a weighted sum of the eigen-shapes created by PCA. Yang and Duncan (2004) combine an


intensity prior with a shape prior by PCA. Cremers et al. (2002, 2003, 2006) also propose the use of a shape distribution to construct a statistical shape prior model, which they combine with a region-based active contour model. In Cremers et al. (2006), kernel density estimation, instead of the Gaussian model assumption, is applied to estimate the shape distribution more accurately. In Cremers et al. (2003), the use of kernel principal component analysis (KPCA), instead of PCA, is proposed to enhance the extraction of shape features. Effective methods can be found in the literature for matching the evolving active contour with a prior shape template represented by a signed distance function; in this way, a shape force is created during the registration that drives the evolving curve towards a boundary in the image. Mean shapes (Chen et al., 2002; Rousson and Paragios, 2002), atlases (Chan and Zhu, 2005) and parametric shape models (Munim and Farag, 2007) have all been proposed as effective prior shape templates. However, for the mean shape and atlas approaches it is difficult to provide sufficient knowledge to the model for incomplete objects to be reconstructed during the segmentation procedure by rigid registration of the template to the evolving curve. The parametric shape model is constructed as a weighted sum of the signed distance functions deviating from the mean shape (Munim and Farag, 2007). The results described in Munim and Farag (2007) show that their shape model can handle shape deformation through specific parameter tuning. However, the resulting shape model is not a signed distance function: all training shapes are represented by signed distance functions, and a linear combination of signed distance functions is in general no longer a signed distance function (Cremers et al., 2006). A registration algorithm is proposed by Paragios et al.
(2002) for matching the evolving curve to the prior shape template, and a matching energy is minimized through optimizing the rotation, translation and scale transformation parameters in a variational framework. The matching energy can be treated as shape energy added to the energy function of the conventional active contour (Rousson and Paragios, 2002). As far as we know, all of the methods introduced above primarily focus on the shape model, but hardly discuss how to combine the shape model with the conventional active contour model. Usually they balance the influence of the shape prior with respect to the conventional active contour model by a weight factor that must be carefully tuned by hand. Chen and Radke (2009) present a method that avoids this balancing problem by estimating the distribution of image information and shape prior by kernel density estimation. However, how to choose the kernel width in kernel density estimation is still an open question. In this paper, we present a shape prior constraint for an implicit active contour model, which exploits a set of prior shapes to reconstruct one single shape that will be used to guide the active contour in the segmentation procedure. Our framework consists of two stages: the first is shape alignment, and the second is segmentation based on a shape prior. In the first stage, a matching energy is defined for alignment. Minimizing this energy by a gradient descent method, the training shapes will be matched with an evolving mean shape (Munim and Farag, 2007) by optimizing the parameters of rotation, translation and scale. The result of the shape alignment is a set of prior shapes that will be applied to conduct the segmentation procedure in the second stage. In the second stage, a shape prior constraint is derived. The reconstructed shape is created by the minimization process described in Section 2.2. 
Since the coordinate systems of the evolving curve and the aligned shapes are different, we must first transform the evolving curve back to the coordinate system of the aligned shapes, where it is subject to a shape distribution estimated by kernel density estimation. Secondly, the shape distribution is optimized in the aligned shape coordinates using a level set method to get the reconstructed shape. Finally, the reconstructed shape is matched with the evolving curve, and the result of the registration is applied to guide the evolution of the active contour. Our shape prior constraint can deal with deformation of the shape by optimizing the shape distribution obtained by kernel density estimation, and it avoids the problem that arises from linear combinations of signed distance functions. A stopping condition is defined: the shape force created by the shape prior constraint is balanced against the image force created by the active contour by checking whether the stopping condition is satisfied. Our work differs from Cremers et al. (2006) in three major ways. Firstly, our framework can handle translation, scale and rotation, and an effective registration technique is developed. Secondly, we compute the reconstructed shape in the coordinate system of the aligned training shapes and transform the result forward to the coordinates of the evolving curve; we only need to optimize the transformation parameters between the reconstructed shape and the evolving curve, so the computation is very fast. Thirdly, we develop a method to combine the influence of the shape prior and of the image intensity on the evolving curve; the method automatically determines the weight factor between the shape prior and the conventional active contour model.

The paper is organized as follows: in Section 2, shape alignment and the parametric shape prior constraint are explained in detail; experiments and results are included in Section 3; finally, a conclusion is formulated in Section 4.

2. Active contour model with a shape prior

The conventional active contour model is an evolving curve that is driven by an image force to minimize its energy function (Kass et al., 1988).
For the active contour combined with a shape prior model, an additional shape energy term E_S is introduced into the energy function, which causes the active contour to move towards, or coincide with, the shape prior of a specific object. The total energy E is given by the following equation:

E = E_A + λ_s E_S    (1)

where E_A is the energy of the conventional active contour model and E_S is the additional shape energy; λ_s > 0 is a weight factor expressing the amount of added shape energy, i.e. the contribution of the shape regularization to the energy E.

2.1. Shape representation and alignment

In our work, we represent the shapes implicitly: each shape is embedded in a signed distance function ψ_i(u, v) whose zero level set equals the image contour. We use a fast marching method to create the signed distance function because of its computational efficiency (Osher and Fedkiw, 2003). All training shapes are then represented by the set {ψ_i(u, v)}_{i=1,…,n}. The training shapes are extracted from segmentation procedures or manual outlines, which leads to differences in position, orientation and scale; therefore, all training shapes must be aligned. To create a prior shape set, an enhanced version of Paragios' model (2002) is used to align the training shapes. We assume that all shapes of the object are similar, i.e. they can be matched with each other after a rigid transformation. The rigid transformation between the current shape ψ_C(u, v) and the target shape ψ_T(x, y) is the forward mapping

F(u, v) = s R [u, v]^T + T

which involves the translation T = [T_x, T_y]^T, the rotation matrix R = [cos θ, sin θ; −sin θ, cos θ] and the scale parameter s.
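As a minimal numerical sketch of these mappings (in Python with NumPy rather than the paper's C++ implementation; the helper names are ours), applying the backward transform after the forward transform should recover the original points:

```python
import numpy as np

def forward(pts, s, theta, T):
    # Forward mapping F(u, v) = s * R * [u, v]^T + T, with
    # R = [[cos t, sin t], [-sin t, cos t]] as in the paper (row-vector form)
    R = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    return s * pts @ R.T + T

def backward(pts, s, theta, T):
    # Backward mapping B(x, y) = s^-1 * R^-1 * [x - T_x, y - T_y]^T
    R = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    return (pts - T) @ np.linalg.inv(R).T / s

pts = np.array([[1.0, 2.0], [-0.5, 0.25]])
T = np.array([2.0, -1.0])
roundtrip = backward(forward(pts, 1.3, 0.4, T), 1.3, 0.4, T)
```

In practice, as stated above, only the zero level set is transformed this way and the signed distance function is then regenerated from it.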


The signed distance function then satisfies

s ψ_C(B(x, y)) = ψ_T(x, y)

where B(x, y) = s⁻¹ R⁻¹ [x − T_x, y − T_y]^T is the backward transformation. Note that when the scale of the shape is changed, the signed distance function ψ_C must be multiplied by the normalization factor s. In practice, we only transform the zero level set and recreate the signed distance function from the transformed zero level set, which explains why the scale factor does not appear in (2). The matching energy E_m is defined from the shape distance, more precisely the integral of the squared error (Chan and Zhu, 2005):

E_m(ψ_T(x, y), ψ_C(B(x, y))) = ∫∫_Ω [H(ψ_T(x, y)) − H(ψ_C(B(x, y)))]² dx dy    (2)

where Ω is the image domain and H(x) is the Heaviside function. Assuming the signed distance function is positive inside the zero level set, applying H(ψ_T(x, y)) extracts the inner region enclosed by the shape. By minimizing Eq. (2), the optimal transformation parameters of scale, translation and angle are obtained.

The mean shape ψ̄ is computed from the training shape set {ψ_i}_{i=1,…,n} for the calculation of the transformation parameters. Tsai et al. (2003) define the mean shape as the average of the signed distance functions of the aligned shapes:

ψ̄ = (1/n) Σ_{i=1}^{n} ψ_i(B_i)

where {ψ_i(B_i)}_{i=1,…,n} is the aligned shape set. As explained in Section 1, this gives rise to an inconsistent framework for shape modeling. To avoid this problem, a variational strategy is adopted to create the mean shape while the training shape set is being aligned: each shape of the set is matched with an evolving mean shape (Munim and Farag, 2007) using the matching energy

E(ψ̄; ψ_1, …, ψ_n) = Σ_{i=1}^{n} ∫∫_Ω [H(ψ̄) − H(ψ_i(B_i))]² dx dy    (3)

The matching energy E(ψ̄; ψ_1, …, ψ_n) is minimized by a gradient descent method. The gradient descent directions of the transformation parameters are given as follows:

ds_i/dt = −2 s_i⁻¹ ∫∫_Ω (H(ψ̄) − H(ψ_i(B_i))) δ(ψ_i(B_i)) [ u_i ∂ψ_i(B_i)/∂u_i + v_i ∂ψ_i(B_i)/∂v_i ] dx dy    (4)

dθ_i/dt = 2 ∫∫_Ω (H(ψ̄) − H(ψ_i(B_i))) δ(ψ_i(B_i)) [ −v_i ∂ψ_i(B_i)/∂u_i + u_i ∂ψ_i(B_i)/∂v_i ] dx dy    (5)

dT_{xi}/dt = −2 s_i⁻¹ ∫∫_Ω (H(ψ̄) − H(ψ_i(B_i))) δ(ψ_i(B_i)) [ cos θ_i ∂ψ_i(B_i)/∂u_i + sin θ_i ∂ψ_i(B_i)/∂v_i ] dx dy    (6)

dT_{yi}/dt = 2 s_i⁻¹ ∫∫_Ω (H(ψ̄) − H(ψ_i(B_i))) δ(ψ_i(B_i)) [ sin θ_i ∂ψ_i(B_i)/∂u_i − cos θ_i ∂ψ_i(B_i)/∂v_i ] dx dy    (7)

∂ψ̄/∂t = −2 Σ_{i=1}^{n} (H(ψ̄) − H(ψ_i(B_i))) δ(ψ̄)    (8)

where u_i = s_i⁻¹[(x − T_{xi}) cos θ_i − (y − T_{yi}) sin θ_i] and v_i = s_i⁻¹[(x − T_{xi}) sin θ_i + (y − T_{yi}) cos θ_i]. The variable t makes the equations dynamic, and δ(x) = H′(x) is the Dirac function.
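The Heaviside-based shape distance that drives Eqs. (3)–(8) is straightforward to evaluate on a discrete grid. The following sketch (Python/NumPy, with our own helper names; the smoothed Heaviside width ε = 1.5 is an assumed value, not one prescribed by the paper) evaluates the matching energy of Eq. (3) for signed distance functions that are positive inside the contour, as assumed above:

```python
import numpy as np

EPS = 1.5  # smoothing width of the Heaviside, in grid units (assumed)

def heaviside(psi):
    # Smoothed Heaviside H(psi); psi > 0 inside the shape
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(psi / EPS))

def dirac(psi):
    # delta(x) = H'(x) for the smoothed Heaviside above
    return (EPS / np.pi) / (EPS**2 + psi**2)

def matching_energy(psi_bar, psi_list):
    # Eq. (3): sum_i iint_Omega (H(psi_bar) - H(psi_i(B_i)))^2 dx dy
    return sum(np.sum((heaviside(psi_bar) - heaviside(p))**2) for p in psi_list)

# Signed distance functions of two discs (positive inside)
y, x = np.mgrid[0:64, 0:64]
sdf = lambda cx, cy, r: r - np.hypot(x - cx, y - cy)
E_aligned = matching_energy(sdf(32, 32, 10), [sdf(32, 32, 10)])
E_shifted = matching_energy(sdf(32, 32, 10), [sdf(40, 32, 10)])
```

A perfectly aligned shape yields zero matching energy, while any misalignment makes the energy positive, which is what the gradient descent of Eqs. (4)–(7) exploits.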

The algorithm is implemented in an iterative way, as follows:

(1) Initialize the mean shape ψ̄ using any training shape from the training shape set.
(2) Compute all transformation parameters T_{xi}, T_{yi}, θ_i and s_i by minimizing the matching energy using Eqs. (4)–(7).
(3) Calculate the mean shape from Eq. (8).

Steps (2) and (3) are repeated until the transformation parameters and the mean shape reach a steady state. We obtain both the final mean shape ψ̄ and the aligned shape set {ψ_i(B_i)}_{i=1,…,n}, represented by signed distance functions, which will be used to find a single reconstructed shape serving as shape prior during the guidance of the active contour model (see Section 2.3).

To analyze the alignment procedure, a training set consisting of 9 starfishes is used, as shown in Fig. 1. The final alignment results are shown in Fig. 2: all the aligned starfishes roughly share the same center, have the same orientation, and are of approximately equal size. The initial shape and the final mean shape are shown in Fig. 3. Note that the shape templates in Fig. 2 are slightly different from the contours in Fig. 1. Due to some breaks in the contour when the scale parameters

Fig. 1. Training set: contours of nine starfishes.

Fig. 2. Shape templates: aligned starfishes of the training set.

1940

W. Liu et al. / Pattern Recognition Letters 32 (2011) 1937–1947

change, dilation and erosion operations (Gonzalez and Woods, 2003) are applied to maintain a complete closed contour of the evolving curve.

Fig. 3. Initialization and final result of the mean shape.

2.2. Shape reconstruction from the training set

The shape model of Munim and Farag (2007) can be seen as an adaptively changing template: by tuning its parameters, different templates are obtained. However, this model suffers from the problem caused by the linear combination of signed distance functions (Cremers et al., 2006). We present a deformed shape template which circumvents this problem. We reconstruct the shape from the training set according to a given curve; in our case this curve is the evolving active contour during the segmentation process (see Section 2.3). The reconstructed shape is derived from the aligned training set at each iteration step and is treated as a deformable template to guide the evolving curve during the segmentation. The aligned prior shape set {ψ_i(B_i)}_{i=1,…,n} obtained in Section 2.1 will be denoted by {φ_i}_{i=1,…,n}. Kernel density estimation is known as a powerful non-parametric technique to model the shape variation (Cremers et al., 2006); it allows the shape prior to approximate arbitrary distributions. We use kernel density estimation on the training set {φ_i}_{i=1,…,n} to model the shape:

P(φ) ∝ (1/n) Σ_{i=1}^{n} exp( −d²(H(φ), H(φ_i)) / (2σ²) )    (9)

where P(φ) is the distribution of the shape φ in the aligned shape coordinates, d² is the distance measure and σ is the kernel width. They are given as follows:

d²(H(φ), H(φ_i)) = ∫∫_{Ω′} (H(φ) − H(φ_i))² du dv    (10)

σ² = (w/n) Σ_{i=1}^{n} min_{j≠i} d²(H(φ_i), H(φ_j))    (11)

where u and v in (10) are the coordinates associated with the aligned shape and Ω′ is the domain of the aligned shape. w is a parameter given a constant value (w < 1) that defines the kernel width; we will analyse this parameter in the section on experiments. The energy of reconstruction in the aligned coordinates u and v is given by:

E_R = −log P(φ)    (12)

The gradient descent equation can be written as:

∂φ/∂t = − [ Σ_{i=1}^{n} w_i ∂/∂φ d²(H(φ), H(φ_i)) ] / ( 2σ² Σ_{i=1}^{n} w_i )    (13)

where the weights w_i are given by Eq. (14).
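Under the stated assumptions (crisp Heaviside maps and an assumed w = 0.5), the kernel width of Eq. (11) and the weights of Eq. (14) can be sketched as follows; a training shape that coincides with φ receives the largest weight:

```python
import numpy as np

def H(phi):
    # Crisp Heaviside is enough for the shape distance of Eq. (10)
    return (phi > 0).astype(float)

def d2(a, b):
    # Eq. (10): squared shape distance between Heaviside maps (unit spacing)
    return np.sum((H(a) - H(b))**2)

def kde_weights(phi, shapes, w=0.5):
    # sigma^2 from Eq. (11); weights w_i from Eq. (14); w < 1 assumed
    n = len(shapes)
    sigma2 = (w / n) * sum(min(d2(shapes[i], shapes[j])
                               for j in range(n) if j != i) for i in range(n))
    return np.array([np.exp(-d2(phi, s) / (2 * sigma2)) for s in shapes])

y, x = np.mgrid[0:48, 0:48]
circle = lambda cx, cy, r: r - np.hypot(x - cx, y - cy)
shapes = [circle(24, 24, 8), circle(24, 24, 9), circle(24, 24, 10)]
wts = kde_weights(circle(24, 24, 9.0), shapes)
```

These weights are exactly the kernel responses that make the reconstruction of Eq. (13) lean towards the training shapes closest to the current curve.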

2.3. Segmentation with a shape prior

In this section, we combine the region-based C–V model (Chan and Vese, 2001) with the shape prior constraint to drive the evolving curve φ close to the target object. A region-based active contour model is advantageous in the presence of weak edges, hence the C–V model is an appropriate choice. The contour of the evolving curve is denoted C(x, y); L(C) is the length of the curve and S(C) is the area inside the curve. The energy function E_CV is defined as (Chan and Vese, 2001):

E_CV(C, c_in, c_out) = μ L(C) + ν S(C) + k_in ∫∫_{inside(C)} |I − c_in|² dx dy + k_out ∫∫_{outside(C)} |I − c_out|² dx dy    (15)

where μ, ν ≥ 0 and k_in, k_out > 0 are fixed parameters during the evolution, and I is the image intensity. c_in and c_out are the mean intensities of the object (inside C) and of the background (outside C), respectively, given by the following equations:

c_in = ∫∫_Ω I(x, y) H(φ) dx dy / ∫∫_Ω H(φ) dx dy    (16)

c_out = ∫∫_Ω I(x, y) (1 − H(φ)) dx dy / ∫∫_Ω (1 − H(φ)) dx dy    (17)
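On a discrete grid, the region means of Eqs. (16) and (17) reduce to Heaviside-weighted averages; a minimal sketch (our helper names, crisp Heaviside):

```python
import numpy as np

def region_means(I, phi):
    # c_in, c_out of the C-V model, Eqs. (16)-(17); phi > 0 inside the contour
    Hphi = (phi > 0).astype(float)
    c_in = (I * Hphi).sum() / Hphi.sum()
    c_out = (I * (1 - Hphi)).sum() / (1 - Hphi).sum()
    return c_in, c_out

y, x = np.mgrid[0:32, 0:32]
phi = 6 - np.hypot(x - 16, y - 16)      # SDF of a disc, positive inside
I = np.where(phi > 0, 200.0, 50.0)      # bright object on a dark background
c_in, c_out = region_means(I, phi)
```

For a piecewise-constant image whose regions coincide with the contour, the two means recover the object and background intensities exactly.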

We use a signed distance function φ(x, y) to represent the enclosing curve implicitly by its zero level set C = {(x, y) | φ(x, y) = 0}. Assuming positive values in the region enclosed by C, the length L(C) of the evolving curve and the enclosed area S(C) can be computed as integrals over the whole image as follows:

L(C) = ∫∫_Ω |∇H(φ)| dx dy = ∫∫_Ω δ(φ) |∇φ| dx dy  and  S(C) = ∫∫_Ω H(φ) dx dy    (18)
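Eq. (18) can be checked numerically on a disc, whose length and area are known in closed form. The sketch below uses a compactly supported smoothed Heaviside/Dirac pair (an implementation choice on our part; the paper does not prescribe one) and unit grid spacing:

```python
import numpy as np

def length_and_area(phi, eps=1.5):
    # Eq. (18): L(C) = iint delta(phi) |grad phi|, S(C) = iint H(phi)
    t = np.clip(phi / eps, -1.0, 1.0)
    H = 0.5 * (1 + t + np.sin(np.pi * t) / np.pi)       # smoothed Heaviside
    delta = np.where(np.abs(phi) <= eps,                 # its derivative
                     0.5 / eps * (1 + np.cos(np.pi * phi / eps)), 0.0)
    gy, gx = np.gradient(phi)
    return np.sum(delta * np.hypot(gx, gy)), np.sum(H)

y, x = np.mgrid[0:128, 0:128]
phi = 20.0 - np.hypot(x - 64, y - 64)   # SDF of a disc with radius 20
L, S = length_and_area(phi)
```

For this disc, L and S should approximate 2π·20 and π·20², up to discretization error in the smoothed delta band.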

The shape energy is defined by the shape distance between the active curve and the reconstructed shape template φ_T, and it will be minimized by optimizing the transformation parameters with respect to the evolving curve. The shape energy E_S is expressed as follows:

E_S = ∫∫_Ω { H(φ) − H(φ_T(B)) }² dx dy    (19)

where φ_T is the deformed template created in Section 2.2 and B denotes the rigid transformation. The final energy function, which combines the C–V model and the shape prior, is given by:

E(φ, φ_T, c_in, c_out) = k_in ∫∫_Ω |I − c_in|² H(φ) dx dy + k_out ∫∫_Ω |I − c_out|² (1 − H(φ)) dx dy + μ ∫∫_Ω δ(φ) |∇φ| dx dy + ν ∫∫_Ω H(φ) dx dy + λ_s ∫∫_Ω { H(φ) − H(φ_T(B)) }² dx dy    (20)

In Eq. (13), w_i is the weight of each shape force created by minimizing the shape distance:

w_i = exp( − d²(H(φ), H(φ_i)) / (2σ²) )    (14)

The coordinates (u, v) associated with the aligned shape and the coordinates (x, y) of the evolving curve are different. We therefore use the forward-transformed evolving curve φ(F(u, v)) to initialize the reconstructed shape in the aligned shape coordinates (u, v). We evolve (13) till convergence to obtain the reconstructed shape, which is the shape template φ_T used in Eqs. (19) and (20).

The energy can be minimized using a gradient descent approach:

ds/dt = −2 s⁻¹ ∫∫_Ω (H(φ) − H(φ_T(B))) δ(φ_T(B)) [ u ∂φ_T(B)/∂u + v ∂φ_T(B)/∂v ] dx dy    (21)


dθ/dt = 2 ∫∫_Ω (H(φ) − H(φ_T(B))) δ(φ_T(B)) [ −v ∂φ_T(B)/∂u + u ∂φ_T(B)/∂v ] dx dy    (22)

dT_x/dt = −2 s⁻¹ ∫∫_Ω (H(φ) − H(φ_T(B))) δ(φ_T(B)) [ cos θ ∂φ_T(B)/∂u + sin θ ∂φ_T(B)/∂v ] dx dy    (23)

dT_y/dt = 2 s⁻¹ ∫∫_Ω (H(φ) − H(φ_T(B))) δ(φ_T(B)) [ sin θ ∂φ_T(B)/∂u − cos θ ∂φ_T(B)/∂v ] dx dy    (24)

∂φ/∂t = δ(φ) [ μ ∇·(∇φ/|∇φ|) − ν − k_in (I − c_in)² + k_out (I − c_out)² ] − 2 λ_s δ(φ) ( H(φ) − H(φ_T(B)) )    (25)
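A single explicit time step of Eq. (25) can be sketched as below (Python/NumPy; the parameter values, the crisp-template comparison and the use of an already registered template φ_T are assumptions for illustration). Starting from a contour shifted a few pixels off a bright disc, iterating the step drives the zero level set towards the object:

```python
import numpy as np

def cv_shape_step(phi, I, phi_T, dt=0.5, mu=0.1, nu=0.0,
                  k_in=1.0, k_out=1.0, lam_s=1.0, eps=1.5):
    # One explicit update of Eq. (25); parameter values are assumed.
    delta = (eps / np.pi) / (eps**2 + phi**2)          # smoothed Dirac
    H = 0.5 * (1 + (2 / np.pi) * np.arctan(phi / eps))  # smoothed Heaviside
    c_in = (I * H).sum() / H.sum()                      # Eq. (16)
    c_out = (I * (1 - H)).sum() / (1 - H).sum()         # Eq. (17)
    gy, gx = np.gradient(phi)
    norm = np.hypot(gx, gy) + 1e-8
    curv = np.gradient(gy / norm, axis=0) + np.gradient(gx / norm, axis=1)
    image_force = mu * curv - nu - k_in * (I - c_in)**2 + k_out * (I - c_out)**2
    shape_force = -2 * lam_s * (H - (phi_T > 0))        # shape prior pull
    return phi + dt * delta * (image_force + shape_force)

y, x = np.mgrid[0:64, 0:64]
disc = lambda cx, cy, r: r - np.hypot(x - cx, y - cy)
I = (disc(32, 32, 10) > 0).astype(float)   # bright disc on dark background
phi_T = disc(32, 32, 10)                   # template, already registered
phi = disc(36, 32, 10)                     # initialization, shifted by 4 px
iou0 = ((phi > 0) & (I > 0)).sum() / ((phi > 0) | (I > 0)).sum()
for _ in range(80):
    phi = cv_shape_step(phi, I, phi_T)
iou = ((phi > 0) & (I > 0)).sum() / ((phi > 0) | (I > 0)).sum()
```

Both the region term and the shape term push the misplaced contour onto the object, so the overlap with the true disc increases over the iterations.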

The algorithm terminates when both of the following stopping conditions are met: (a) the evolving curve is stationary; (b) the evolving curve and the template satisfy:

∫∫_Ω ( H(φ) − H(φ_T(B)) )² dx dy ≤ ∫∫_Ω α δ(φ_T(B)) dx dy    (26)

where α is a constant value (α = 0.2 in our approach). The first condition is met when Eq. (25) converges. The shape constraint is satisfied through Eq. (26) when the steady state of the evolving curve is very close to the shape template. If condition (b) is not satisfied during the curve evolution procedure, the shape energy is not strong enough to influence the evolving curve; the process must then be repeated with an increased λ_s or decreased k_in and k_out.

In summary, the complete segmentation algorithm with shape prior can be expressed in the following steps:

(1) Initialize the zero level set and the angle, translation and scale parameters of the template φ_T.
(2) Use the line search method (Boyd and Vandenberghe, 2004) to compute the gradient descent function of the scale (21). The gradient descent functions of the translation, (23) and (24), must be iterated until a steady state is reached whenever the scale changes.
(3) Check whether the change of scale is less than a predefined threshold. If not, go back to step (2).
(4) Use the line search method to compute the gradient descent function of the angle (22). The gradient descent functions of the translation, (23) and (24), must be iterated until a steady state is reached whenever the angle changes.
(5) Check whether the change of the angle is less than a predefined threshold. If not, go back to step (4).
(6) Transform the evolving curve to the coordinates of the aligned shapes. Reconstruct the shape by evolving (13) till convergence; the steady-state shape is the template φ_T.
(7) Iterate the evolution equation (25) several times, e.g. five times.
(8) Check whether stopping conditions (a) and (b) are simultaneously satisfied or a specified maximum number of iterations is reached. If not, go back to step (2).

Remark: The optimization of the angle, scale and translation parameters plays a very important role in the algorithm. It is theoretically unclear in which order and how frequently one has to alternate between the gradient descent functions (Cremers et al., 2006). Steps (2)–(5) above are our proposed iteration strategy; we have tried many other orders and found that this ordering yields the best results. The line search method is combined with a gradient descent method for registering the reconstructed shape template with the evolving curve. Most works use a fixed step; we use the line search method to get an appropriate step for the registration at each iteration. Such line search methods have also been applied successfully to level set approaches in Chan and Tai (2003) and Lathen et al. (2009).

2.4. Computational complexity

For the analysis of the computational complexity of our algorithm, three parts need to be considered: registration, described by Eqs. (21)–(24) (steps (2)–(5) of the algorithm outline in Section 2.3); shape reconstruction, described by Eq. (13) (step (6)); and curve evolution, described by Eq. (25) (steps (7) and (8)). We apply the narrow band method (Adalsteinsson and Sethian, 1995) for the curve evolution, so the computations are limited to a narrow band around the zero level set. The evolution is thus of order O(kN) per iteration, where N is the number of grid points along the curve and k is the number of grid points corresponding to the width of the narrow band; the number of iterations is set to 5. In the shape reconstruction, computing the distances between the evolving curve and the aligned training shapes is of order O(n), where n is the number of aligned training shapes. The curve evolution linked to the shape reconstruction in Eq. (13) is itself of order O(kN) per iteration; the maximum number of iterations is set to 50, and in general the steady state of Eq. (13) is reached much faster. For registration, backtracking line search is used to minimize the energy function of Eq. (19), which contains all the parameters of Eq. (20) that are relevant for the registration process. We let the minimization proceed along the gradient descent direction (Eqs. (21)–(24)) up to a predefined stopping condition (see Algorithms 9.1 and 9.2 in Boyd and Vandenberghe (2004)). A large step size is chosen at the start, which is then reduced, as illustrated below for the rotation. The range of the rotation angle is set to [−π/6, π/6] and its accuracy is set to π/360 rad.
Two constants have to be chosen in the backtracking line search method (Boyd and Vandenberghe, 2004): α is typically chosen between 0.01 and 0.3, meaning that a decrease in the energy E_S of approximately 1–30% is accepted; β is often chosen between 0.1 (which corresponds to a very crude search) and 0.8 (which corresponds to a finer search). We have chosen α = 0.01 and β = 0.8. The stopping condition is (see Algorithm 9.2 in Boyd and Vandenberghe (2004)):

E_S( θ + τ·(dθ/dt) ) ≤ E_S(θ) − α τ (dθ/dt)²,  with dθ/dt = −∂E_S/∂θ    (27)

where τ·(dθ/dt) is the step size (we first reduce the full range by half by computing the sign of Eq. (22), and then set the initial step size to π/6 or −π/6 accordingly). In the worst case, the number of iterations p for the rotation angle must satisfy β^p · π/6 = π/360, i.e. β^p = 1/60. Thus p = log(1/60)/log 0.8 ≈ 18.3, which means that we reach the required accuracy after at most 19 iterations.
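The backtracking rule of Eq. (27) can be sketched on a toy one-dimensional energy (the quadratic below is purely illustrative, not the paper's E_S); the same α = 0.01, β = 0.8 and initial step π/6 are used:

```python
import numpy as np

def backtracking_rotation(E, theta, dtheta, alpha=0.01, beta=0.8, t0=np.pi / 6):
    # Backtracking line search (Boyd & Vandenberghe, Alg. 9.2) along the
    # descent direction dtheta = -dE/dtheta; Eq. (27) is the acceptance test.
    t = t0
    while E(theta + t * dtheta) > E(theta) - alpha * t * dtheta**2:
        t *= beta
    return theta + t * dtheta

E = lambda th: (th - 0.3)**2   # toy energy with minimum at theta = 0.3
theta = 0.0
for _ in range(20):
    g = 2 * (theta - 0.3)      # dE/dtheta
    theta = backtracking_rotation(E, theta, -g)
```

Each outer iteration shrinks the step geometrically until Eq. (27) holds, so the angle converges to the minimizer in a handful of steps.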


Hence, the computational complexity of the backtracking line search method is of logarithmic order with respect to the range of the rotation angle, at fixed accuracy. For every rotation step, the computational cost of the backtracking line search is primarily caused by the computation of E_S in Eq. (19) and of the derivatives in Eq. (22). Both depend on the transformed level set function, which involves transforming the zero level set and deriving its new signed distance function (SDF). The computational cost of the transformation is of order O(N) and the computational complexity of calculating a new SDF in a narrow band is of order O(kN) (Adalsteinsson and Sethian, 1995). Once the transformed level set function is obtained, there is still an extra computational cost associated with evaluating Eqs. (19) and (22): for Eq. (22) this is of order O(N), since a calculation on the zero level set curve is required, and for Eq. (19) this is approximately of order O(N²), since a calculation on an area containing the zero level set curve is required. For each change of the rotation angle step size, we also apply the backtracking line search method to get the optimal translation parameters T_x, T_y. In the worst case, the two independent calculations of the translation parameters are of order O(log_{1/β}(translation_width)) and O(log_{1/β}(translation_height)) respectively, when the accuracy is one pixel; the translation_width

(translation_height) is defined as the maximum difference between the widths (heights) of the bounding boxes of the reconstructed shape and the actual curve. In practice, the computation times for finding the optimal translation parameters are small. For each translation step the calculation is of order O(N) for Eqs. (23) and (24) and O(N²) for Eq. (19). The big gain compared to the rotation is that the zero level set does not have to be transformed when only integer translational shifts are considered. Note that, in general, the number of steps in the backtracking line search linked to the translation parameters will be low, since only small adjustments have to be made to these parameters for each rotation step. For the scale s, the initial interval is set to [0.68, 1.5] and the accuracy is set to one pixel. The optimization of the scale in step (2) of the outline of the complete segmentation algorithm can be analyzed in the same way as the rotation.

3. Experiments

The goal of this section is to assess the performance of our method on complex backgrounds with noise,

Fig. 4. Segmentation of simulated images occluded by a bar or an ellipse: (a, e, i) are original synthetic images occluded by a bar or an ellipse; (b, f, j) are the initialization images, the red triangles in the image correspond to the reconstructed shape template, the green contour corresponds to the zero level set; (c, g, k) are the segmentation results of our model and (d, h, l) are the segmentation results of the C–V model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Segmentation results of a starfish on a noisy background: (a) original image; (b) initial zero level set; (c) result of the C–V model and (d) result of our model.

W. Liu et al. / Pattern Recognition Letters 32 (2011) 1937–1947


Fig. 6. The process of segmentation (from right to left and top to bottom). Note that the continuously decreasing cⁿ makes the shape guidance more dominant than the image guidance towards the end of the process.

Fig. 7. Training set: 10 left ventricle (LV) contours.

Fig. 8. Shape templates: alignment results of the left ventricle.

occlusion and clutter. We study how to balance the shape force and the image force using the stopping condition presented in Section 2.3. We illustrate that including our shape prior constraint and adding the corresponding shape force to the C–V model improves the results obtained with the original C–V model. Experiment 1 illustrates the capability to recover shapes even when they are partially occluded. Experiment 2 adds complexity to the segmentation exercise through a noisy background and partial occlusion by the background. Experiment 3 shows that contour leakage due to incomplete image information can be eliminated (a frequently occurring problem in medical imaging due to partial volume effects). Experiment 4 shows how the performance changes with the kernel width of the shape distribution P(u), given by (11). Experiment 5 compares the computational demands and the segmentation results with those of a method built upon a PCA-based statistical shape model. Our algorithm is coded in C++. All experiments in this section are run as a release build of our code on a computer with an Intel Core 2 Quad Q8200 2.33 GHz CPU and 3.5 GB RAM. In the first experiment, we apply our method to the synthetic images shown in Fig. 4. We use only one template, namely a

template of a triangle, to guide the extraction of triangles from occluded images. The obtained results are shown in Fig. 4(c), (g), (k). It can be observed that our method is able to extract the triangles in synthetic images that are occluded by a bar or an ellipse. In the second experiment, we examine the accuracy of our model in segmenting an object, namely a starfish, in a slightly more complex background. As shown in Fig. 5(a), the starfish has two arms submerged in sand and its shadow is very similar in intensity to its body. The background is also noisy. All these factors make this segmentation task a challenging one. In the experiments, we used the aligned templates of Fig. 2 as the standard shape prior templates. The green¹ contour in Fig. 5(b) is the initialization of the zero level set; the red contours are the initialized prior shape templates. The segmentation results of the C–V model and our model are displayed in Fig. 5(c) and (d), respectively. We assigned the parameter values λin = λout = 0.1, μ = 10, ν = 0 and λs = 50. The maximum number of iterations is 130. It can

1 For interpretation of colour in Fig. 5, the reader is referred to the web version of this article.


Fig. 9. Segmentation of the left ventricle in ultrasound images of the heart: (a) and (e) are the original images; (b) and (f) are the initialization images, where the red contours correspond to the reconstructed shape templates and the green contour corresponds to the zero level set (the green and red contours often overlap and are difficult to distinguish in the figure); the green curves in (c) and (g) are the segmentation results of our model; (d) and (h) are the segmentation results obtained by the original C–V model. We used the following parameter values in the experiments: λin = λout = 0.05, μ = 10, λs = 80. The iteration numbers of (c) and (g) were 45 and 30, respectively.

be observed that our model is able to extract the starfish's arms in the sand and, at the same time, exclude its shadow from the final segmentation result. Furthermore, the segmentation result differs from every shape template in the prior shape set and from the mean shape, indicating that our model can reconstruct a new contour by combining prior knowledge from the shape templates with the original image. The segmentation procedure is displayed sequentially in Fig. 6. It is difficult to assign values to the constants λin, λout and λs in Eq. (22) at the onset of the segmentation. Hence, a factor cⁿ is applied to λin and λout in Eq. (22), where n is the current iteration number and c is less than one (0.95). Promising results are obtained by assigning suitable values to λin, λout, λs and a suitable maximum iteration number. It was found that Eq. (26) was already satisfied after fewer iterations than the maximum iteration number, set to 130. In the segmentation process depicted in Fig. 6, it can be observed that the curve first evolves very similarly to the C–V segmentation model. Then, the shape prior constraint gradually affects the evolving curve. As the segmentation process progresses, the set of prior shape templates keeps approaching the evolving curve. When stopping condition (b) in (26) is satisfied, we stop changing cⁿ, which is treated as a fixed parameter in the subsequent iterations. Finally, a meaningful result is achieved when both stopping conditions (a) and (b) are satisfied.
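The balancing scheme described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the force callbacks, the tolerances, the time step and the explicit update rule are placeholder assumptions; only the cⁿ damping and the two-part stopping condition mirror the text.

```python
import numpy as np

def evolve_with_shape_prior(phi, image_force, shape_force,
                            lam_img=0.1, lam_s=50.0, c=0.95,
                            max_iter=130, eps_curve=1e-3, eps_shape=1e-3,
                            dt=0.1):
    """Illustrative evolution loop: the image force is damped by c**n;
    once the shape-related stopping condition (b) fires, the damping
    factor is frozen, and iteration stops when, in addition, the curve
    is stationary (condition (a))."""
    damping = 1.0
    freeze = False
    for n in range(1, max_iter + 1):
        if not freeze:
            damping = c ** n            # diminishing weight on the image term
        F_img = image_force(phi)        # e.g. a C-V-style region force
        F_shape = shape_force(phi)      # pull towards the reconstructed template
        update = dt * (damping * lam_img * F_img + lam_s * F_shape)
        phi = phi + update
        curve_stationary = np.max(np.abs(update)) < eps_curve    # condition (a)
        shape_satisfied = np.max(np.abs(F_shape)) < eps_shape    # condition (b)
        if shape_satisfied:
            freeze = True               # keep c**n fixed from now on
        if curve_stationary and shape_satisfied:
            break
    return phi, n
```

With both forces zero the loop terminates immediately; with a decaying image force and no shape force it runs to the iteration cap, as expected from the damping.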

In the third experiment, we analyze the performance of our method on medical images. Echocardiography is frequently used in the diagnosis of heart disease. The volume of the left ventricle (LV) of the heart is important for evaluating heart function, so an automated segmentation of the LV is useful to physicians. Due to the inherent noise and the weak or incomplete image information characteristic of ultrasound images, it is still a challenge to segment the LV accurately. Our proposed model might be a solution to this problem. We obtain a training set of manually outlined contours from different people, as shown in Fig. 7, and submit them to the registration algorithm described in Section 2.1. Ten standard prior templates are obtained as a result and are displayed in Fig. 8.
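For readers who want to experiment with aligning such training contours, a minimal stand-in is a classical similarity Procrustes fit on point-sampled contours. Note that the paper's actual registration (Section 2.1) operates on level set functions via line search; the function below, its name and its assumption of known point correspondences are ours.

```python
import numpy as np

def align_similarity(src, dst):
    """Align point-sampled contour src to dst (both (N, 2) arrays, assumed
    to be in point correspondence) by a similarity transform, using the
    classical Procrustes solution based on the SVD."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    a, b = src - mu_s, dst - mu_d          # center both contours
    U, _, Vt = np.linalg.svd(a.T @ b)      # SVD of the cross-covariance
    R = (U @ Vt).T                         # optimal rotation
    if np.linalg.det(R) < 0:               # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    s = np.trace((a @ R.T).T @ b) / np.trace(a.T @ a)  # optimal scale
    return s * (a @ R.T) + mu_d            # aligned copy of src
```

Applying a known rotation, scale and translation to a contour and aligning it back recovers the target exactly, which makes the routine easy to sanity-check.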

Fig. 10. Training data: (a)–(c) three of the shapes in the training set; (d) the result of the alignment.


The final segmentation results are shown in Fig. 9(c) and (g). They show that our method segments the LV well, whereas the conventional active contour model leaks into the background and produces rough edges, as shown in Fig. 9(d) and (h). The parameters λin, λout and λs in this experiment differ from those in the previous one.


In the fourth experiment, we analyze how the performance of our method is influenced by the kernel width parameter w of the shape distribution P(u), given by (11). We segment a 3D object under a changing viewpoint. Our training set is composed of 46 2D views of a 3D terracotta warrior. The shape of the terracotta warrior has very different modes. Three of them are shown

Fig. 11. Segmentation using different kernel widths. (a) and (e) Initialization; (b) and (f) w = 1; (c) and (g) w = 0.5; (d) and (h) w = 0.2. The red contours in (b)–(d) and (f)–(h) are the results of the reconstructed shape, and the green contours are the segmentation results. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 12. Segmentation with different occlusions and scales. The sizes of these images are all 150 × 200 pixels.


Fig. 13. Six hands from the shape set.

Fig. 14. Comparison of our method with the PCA-based statistical shape model. The first column is the image and initialization. The second column contains the results of our method. The third column contains results of the PCA-based statistical shape model.

in Fig. 10(a)–(c). We align the training set with respect to translation, rotation and scaling. The result of the alignment is shown in Fig. 10(d). As mentioned before, how to choose the kernel width appropriately is still an open issue. Fig. 11 shows segmentations obtained with different kernel widths. Fig. 11(a) is an occluded and noisy image of a terracotta warrior; the green curve is the initialization. Fig. 11(b)–(d) are the results for kernel widths w = 1, w = 0.5 and w = 0.2, respectively. The space between the two legs in Fig. 11(b) is not segmented correctly, but as the kernel width decreases, more accurate results are achieved. This is because a large kernel width often induces oversmoothed results. Fig. 11(e) is the initialization. The result in Fig. 11(f) is far from correct, since a large kernel allows many unwanted shapes in the aligned space to act on the deformed shape template. The results in Fig. 11(g) and (h) are more accurate. We therefore recommend w = 0.2 in our system. Fig. 12 shows the segmentation results of the terracotta warrior with different scales and different occlusions. Fig. 12(a), (c), (e) and (g) are the initializations and the results are shown in Fig. 12(b), (d), (f) and (h), respectively. The computation times to obtain these results are 7 s, 9 s, 9 s and 10 s, respectively.
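The over-smoothing effect of a large kernel width can be made concrete with a small numerical sketch. The template distances below are invented; only the Gaussian-kernel weighting mirrors the form of Eq. (11). A large w spreads weight over many (possibly unwanted) templates, while a small w concentrates almost all weight on the nearest one.

```python
import numpy as np

def template_weights(dists, w):
    """Normalized Gaussian-kernel weights of the training templates, given
    their distances `dists` to the evolving shape and a kernel width w."""
    k = np.exp(-np.asarray(dists) ** 2 / (2.0 * w ** 2))
    return k / k.sum()

# hypothetical distances of five aligned templates to the current curve
d = [0.2, 0.5, 0.9, 1.4, 2.0]
for w in (1.0, 0.5, 0.2):
    print(w, np.round(template_weights(d, w), 3))
```

For w = 1 every template contributes noticeably to the reconstructed shape, whereas for w = 0.2 the nearest template carries over 90% of the weight, which matches the behaviour observed in Fig. 11.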

Table 1
Computational demand for training.

                        Our method    PCA-based shape model
Shape model learning    2 s           3 s

Table 2
Computational demand for segmentation.

                            Our method    PCA-based shape model
Experiment in Fig. 14(a)    9 s           4 s
Experiment in Fig. 14(d)    6 s           3 s
Experiment in Fig. 14(g)    6 s           3 s
Experiment in Fig. 14(j)    8 s           3 s
Experiment in Fig. 14(m)    7 s           5 s
Experiment in Fig. 14(p)    7 s           4 s

In the last experiment, we compare our method with the PCA-based statistical shape model presented by Tsai et al. (2004). Segmentation results and computational demands are compared in this experiment. We align 56 shapes of hands to train the shape model; six of these aligned shapes are shown in Fig. 13. The sizes of the images in Figs. 13 and 14 are all 180 × 135 pixels. The computational demands in the training stage are listed in Table 1. All of the images in the training set were aligned beforehand. Our segmentation results and the PCA-based results are shown in the second and third columns of Fig. 14, respectively. Clearly, the results of our method are more accurate than those of the PCA-based method. The latter suffers from the incoherent linear combination of signed distance functions used in PCA (in the calculation of the mean shape, in the calculation of the principal components, and in the linear combination of the mean shape with weighted principal components during the segmentation). For complex shapes with many concave parts, the mean shape, as calculated in Tsai et al. (2003), wipes out the protrusions and produces a strongly smoothed mean shape. The computational demands are listed in Table 2. Our method requires almost twice the computation time needed for the PCA-based statistical shape model, but it yields more accurate segmentation results.

4. Conclusions

A method composed of an implicit active contour model and a shape prior constraint has been presented in this paper. The originality of the method resides in the construction of the shape prior constraint, obtained by reconstructing a deformable template from a set of aligned shape templates. In the proposed approach, every aligned template from the set of prior shapes has a weighted contribution to the deformable shape template, which is exerted on the active contour. It is not easy to balance the impact of the image guidance and the shape prior on the active contour. Therefore, in this paper, a diminishing parameter cⁿ (c < 1) operates on the traditional active contour model, to allow the shape prior constraint to progressively affect the evolving curve more and more. The diminishing parameter is set to a fixed value once the stopping condition is satisfied. The active contour model and the shape prior constraint converge to a steady state, producing final segmentation results that are significantly better than those obtained by the traditional active contour model. The results obtained show that our algorithm can extract objects in the presence of noise, clutter and occlusions.

Acknowledgements

We thank Maxine Tan of Vrije Universiteit Brussel (ETRO) for the helpful revision of the manuscript. This paper was supported by the National Basic Research Program of China (No. 2010CB732506), by the Shanghai Basic Research Program (No. 09jc1410502), and by the Belgian Science Policy – BELSPO – IUAP P6/38: NIMI.

References

Adalsteinsson, D., Sethian, J.A., 1995. A fast level set method for propagating interfaces. J. Comput. Phys. 118 (2), 269–277.
Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press, New York, pp. 463–484.
Caselles, V., Kimmel, R., Sapiro, G., 1997. Geodesic active contours. Int. J. Comput. Vision 22 (1), 61–79.
Chan, T.F., Vese, L.A., 2001. Active contours without edges. IEEE Trans. Image Process. 10 (2), 266–277.
Chan, T.F., Tai, X.C., 2003. Level set and total variation regularization for elliptic inverse problems with discontinuous coefficients. J. Comput. Phys. 193 (1), 40–66.
Chan, T.F., Zhu, W., 2005. Level set based shape prior segmentation. In: Proc. IEEE Internat. Conf. on Computer Vision and Pattern Recognition, San Diego, USA, pp. 1164–1170.
Chen, S.Q., Radke, R.J., 2009. Level set segmentation with both shape and intensity priors. In: Proc. IEEE Internat. Conf. on Computer Vision, Kyoto, Japan, pp. 763–770.
Chen, Y.M., Tagare, H.D., Thiruvenkadam, S., Huang, F., Wilson, D., Gopinath, K.S., Briggs, R.W., Geiser, E.A., 2002. Using prior shapes in geometric active contours in a variational framework. Int. J. Comput. Vision 50 (3), 315–328.
Cremers, D., Rousson, M., Deriche, R., 2007. A review of statistical approaches to level set segmentation: Integrating color, texture, motion and shape. Int. J. Comput. Vision 72 (2), 195–215.
Cremers, D., Tischhauser, F., Weickert, J., Schnorr, C., 2002. Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional. Int. J. Comput. Vision 50 (3), 295–313.
Cremers, D., Kohlberger, T., Schnorr, C., 2003. Shape statistics in kernel space for variational image segmentation. Pattern Recognition 36 (9), 1929–1943.
Cremers, D., Osher, S.J., Schnorr, C., 2006. Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. Int. J. Comput. Vision 69 (3), 335–351.
Gonzalez, R.C., Woods, R.E., 2003. Digital Image Processing, second ed. Pearson Education Asia Limited, Beijing, pp. 523–525, 650–653.
Kass, M., Witkin, A., Terzopoulos, D., 1988. Snakes: Active contour models. Int. J. Comput. Vision 1 (4), 321–331.
Lathen, G., Anderson, T., Lenz, R., Borga, M., 2009. Momentum based optimization methods for level set segmentation. In: SSVM, Lecture Notes in Computer Science, pp. 124–136.
Leventon, M.E., Grimson, W.E.L., Faugeras, O., 2000. Statistical shape influence in geodesic active contours. In: Proc. IEEE Internat. Conf. on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, pp. 316–323.
Melonakos, J., Pichon, E., Angenent, S., Tannenbaum, A., 2008. Finsler active contours. IEEE Trans. Pattern Anal. Machine Intell. 30 (3), 412–423.
Munim, H.E., Farag, A.A., 2007. Curve/surface representation and evolution using vector level set with application to the shape-based segmentation problem. IEEE Trans. Pattern Anal. Machine Intell. 29 (6), 945–958.
Osher, S., Fedkiw, R., 2003. Level Set Methods and Dynamic Implicit Surfaces. Springer, New York, pp. 69–74.
Paragios, N., Mellina-Gottardo, O., Ramesh, V., 2004. Gradient vector flow fast geometric active contours. IEEE Trans. Pattern Anal. Machine Intell. 26 (3), 402–407.
Paragios, N., Rousson, M., Ramesh, V., 2002. Matching distance functions: A shape-to-area variational approach for global-to-local registration. In: Proc. 7th Eur. Conf. Computer Vision, Copenhagen, Denmark, pp. 775–789.
Rousson, M., Paragios, N., 2002. Shape priors for level set representations. In: Proc. 7th Eur. Conf. Computer Vision, Copenhagen, Denmark, pp. 78–92.
Tsai, A., Wells, W., Tempany, C., Grimson, E., Willsky, A., 2004. Mutual information in coupled multi-shape model for medical image segmentation. Med. Image Anal. 8 (4), 429–445.
Tsai, A., Yezzi, A., Wells, W., Tempany, C., Tucker, D., Fan, A., Grimson, W.E., Willsky, A., 2003. A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. Med. Imaging 22 (2), 137–154.
Vese, L.A., Chan, T.F., 2002. A multiphase level set framework for image segmentation using the Mumford and Shah model. Int. J. Comput. Vision 50 (3), 271–293.
Xie, X.X., Mirmehdi, M., 2008. MAC: Magnetostatic active contour model. IEEE Trans. Pattern Anal. Machine Intell. 30 (4), 632–646.
Yang, J., Duncan, J.S., 2004. 3D image segmentation of deformable objects with joint shape-intensity prior models using level sets. Med. Image Anal. 8 (3), 285–294.
Zhu, S.C., Yuille, A., 1996. Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 18 (9), 884–900.