A version of the Swendsen–Wang algorithm for restoration of images degraded by Poisson noise


Pattern Recognition 36 (2003) 931 – 941

www.elsevier.com/locate/patcog

Sławomir Lasota (a), Wojciech Niemiro (b,∗)

(a) Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland
(b) Institute of Applied Mathematics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland

Received 26 August 1999; accepted 19 November 2001

Abstract

An algorithm for restoration of images degraded by Poisson noise is proposed. The algorithm belongs to the family of Markov chain Monte Carlo methods with auxiliary variables. We explicitly use the fact that medical images consist of finitely many, often relatively few, grey-levels. The continuous scale of grey-levels is discretized in an adaptive way, so that a straightforward application of the Swendsen–Wang algorithm (Phys. Rev. Lett. 58 (1987) 86) becomes possible. The partial decoupling method due to Higdon (J. Amer. Statist. Assoc. 93 (1998) 585) is also incorporated into the algorithm. Simulation results suggest that the algorithm is reliable and efficient. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Bayesian image restoration; Gibbs distributions; Gibbs sampler; Swendsen–Wang algorithm; Markov chain Monte Carlo; Intensity estimation

1. Introduction

Let us first describe the general structure of a family of Bayesian models used for restoration of grey-level images. We focus on models in which the grey-levels correspond to intensities of Poisson distributed counts. Consider a finite set S of sites or pixels. We interpret S as a suitably discretized region on a plane. A collection λ = (λ_s)_{s∈S} of non-negative real numbers represents an image "painted" with a continuous, ordered palette. One can consider λ_s to be the intensity or grey-level assigned to pixel s ∈ S. Suppose configuration λ is not directly observed; instead we observe a collection x = (x_s)_{s∈S} of independent random variables, where x_s has the Poisson distribution with mean λ_s. The likelihood is thus given by

$$p(x \mid \lambda) = \prod_{s} e^{-\lambda_s}\,\frac{\lambda_s^{x_s}}{x_s!}, \qquad (x_s = 0, 1, \ldots). \quad (1.1)$$

[This work was supported by the State Committee for Scientific Research (KBN) under grant 8 T11C 019 11. ∗ Corresponding author. Tel.: +48-22-55-44-44-1. E-mail address: [email protected] (W. Niemiro).]
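To make the observation model (1.1) concrete, here is a minimal sketch of generating a degraded image x from an intensity configuration λ; the square grid and the particular intensity values are our illustrative assumptions (the experiments in Section 6 use a disc-shaped grid).

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical intensity configuration lambda on a small square grid.
lam = np.full((64, 64), 10.0)   # background intensity
lam[16:32, 16:32] = 6.0         # a patch of lower intensity

# Eq. (1.1): independent counts x_s ~ Poisson(lambda_s), one per pixel.
x = rng.poisson(lam)
```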

The aim is to restore the original image λ, given the "noisy image" x. This problem arises naturally in emission tomography and in other applications, too. The standard approach to image restoration is to adopt Bayesian methodology. A variety of Bayesian models have been successfully used in imaging, at least since Geman and Geman published their influential paper [1]. Let us give a brief overview of the basic ideas and the state of the art in the area. For a more comprehensive review, see the monograph of Winkler [2].

According to the Bayesian paradigm, a prior probability distribution on the space of original images is introduced. In the case of grey-level images, which are our primary concern, the space is R_+^S. As usual in Bayesian statistics, the choice of the prior is subjective to some extent. This distribution should reflect our beliefs about the expected "degree of smoothness" of image λ. At the same time, it should lead to a computationally tractable model. To construct such a prior, we equip S with a structure of neighborhoods which reflects spatial relations between sites. Hence we consider a graph (S, K), where K is viewed as the set of edges that connect adjacent sites. For simplicity, write ⟨s,t⟩ instead of {s,t} ∈ K.


We can use a prior which is a Gibbs distribution of the following form:

$$\pi(\lambda) \propto \exp\Big(-\beta\sum_{\langle s,t\rangle}\Phi(\lambda_s,\lambda_t)\Big), \quad (1.2)$$

where the symbol "∝" indicates that both sides are proportional as functions of λ. A suitably chosen function Φ (the prior potential) quantifies the dissimilarity between levels λ_s and λ_t. For example, Geman and McClure [3] put Φ(λ_s,λ_t) = −[1 + c(λ_s − λ_t)²]^{−1}; see also Geman [4]. The coefficient β > 0 plays the role of a "smoothing parameter". The posterior density π(λ|x) = π_x(λ) is given by

$$\pi_x(\lambda) \propto p(x\mid\lambda)\,\pi(\lambda). \quad (1.3)$$

Note that the likelihood forces π_x(λ) to concentrate on configurations λ that "resemble the degraded image" x, while the prior acts "in favor of smoother configurations". For the Poisson noise model, we get

$$\pi_x(\lambda) \propto \exp\Big(-\sum_s\lambda_s + \sum_s x_s\log\lambda_s - \beta\sum_{\langle s,t\rangle}\Phi(\lambda_s,\lambda_t)\Big). \quad (1.4)$$

Formula (1.4) follows immediately from (1.1), (1.2) and (1.3). The basic advantage of a prior of the form (1.2), apart from its intuitive appeal, is that the corresponding posterior (1.4) is also a Gibbs distribution. Put differently, Eq. (1.4) defines the probability distribution of a Markov random field (MRF) with respect to graph K; see Geman [4]. Consequently, Markov chain Monte Carlo (MCMC) techniques can be used to sample approximately from π_x(λ), leading to Bayesian or maximum a posteriori estimates of λ. Standard tools are the Gibbs sampler (GS) and the Metropolis algorithm (MA). These algorithms update configuration λ iteratively, site by site. Unfortunately, they may converge too slowly to the target distribution.

There is an MCMC algorithm which is believed to perform better than either GS or MA, namely the Swendsen–Wang algorithm (S–W), designed originally for the Ising and Potts models [5]. To make further developments clearer, let us first introduce the Potts model and the S–W algorithm in their original form. Suppose A is a finite set. Its elements can be interpreted as distinct colors. Consider a collection z = (z_s)_{s∈S} of colors assigned to sites. Now the space of configurations is A^S. By equipping this space with the Gibbs distribution

$$\pi(z) \propto \exp\Big(-\beta\sum_{\langle s,t\rangle}1(z_s\neq z_t)\Big), \quad (1.5)$$

we obtain the Potts model.

A precise and self-contained derivation of the S–W algorithm can be found in Edwards and Sokal [6]. However, we will repeat the derivation in a more general setting in Section 3; as a by-product, proofs of the facts mentioned now will be provided. The idea of S–W is to introduce a collection δ = (δ_st)_{{s,t}∈K} of auxiliary 0–1 random variables, assigned to unordered pairs of adjacent sites. A joint probability distribution π(z,δ) is defined on the augmented space of configurations A^S × {0,1}^K in such a way that it yields the marginal distribution π(z) on A^S given by Eq. (1.5). Actually,

$$\pi(z,\delta) \propto \prod_{\langle s,t\rangle}\big[e^{-\beta}1(\delta_{st}=0) + (1-e^{-\beta})1(\delta_{st}=1,\ z_s=z_t)\big]. \quad (1.6)$$

We interpret the δ_st as bond variables: say, an edge ⟨s,t⟩ is bound if δ_st = 1 and free if δ_st = 0. Given δ, configuration z must satisfy z_s = z_t for all edges with δ_st = 1. Therefore, the conditional distribution is concentrated on z-configurations which are constant on connected components of the graph of bound edges. Moreover, the colors assigned to these components are (conditionally on δ) independent and uniformly distributed on A. Put concisely, the colors of components are chosen at random. The conditional distribution of δ given z is also simple. All δ_st are conditionally independent. If z_s ≠ z_t, then δ_st = 0 with probability 1. If z_s = z_t, then δ_st = 0 with probability e^{−β} or δ_st = 1 with probability 1 − e^{−β}. Summing up, both conditional distributions π(z|δ) and π(δ|z) are easy to sample from. The S–W algorithm is the following. Start with an arbitrary z. Then repeat the following two steps:

sample δ from π(δ|z), keeping z fixed;
sample z from π(z|δ), keeping δ fixed.

Updating δ and z alternately in this way, we obtain a sequence of (z,δ)-configurations which is a Markov chain converging to distribution (1.6). In particular, the sequence of z-components converges to the Potts distribution given by Eq. (1.5) (a code sketch of one sweep is given at the end of this section).

It is clear from the above description that the original S–W procedure cannot cope directly with continuous grey-level images. The Potts distribution (1.5) is suitable for discrete configurations z but not for grey-level configurations λ. Higdon [7] describes a version of S–W which works in the continuous space R_+^S, provided that the function Φ appearing in Eq. (1.2) is shift-invariant; this means that Φ(λ_s,λ_t) depends only on |λ_s − λ_t|. We are going to propose another way of overcoming the difficulties posed by grey-level models. Our idea is to discretize the grey-level scale adaptively. Then we can use the Potts prior, making a straightforward application of S–W possible. In the next section we describe a suitable modification of the initial Bayesian model given by Eqs. (1.1) and (1.2).
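Here is the promised sketch of one sweep of the original S–W algorithm for the Potts model; the square grid with 4-neighbor adjacency and the union-find bookkeeping are our illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def sw_sweep(z, beta, n_colors):
    """One Swendsen-Wang sweep for the Potts distribution (1.5).

    delta-step: an edge between equal-colored neighbors is bound with
    probability 1 - exp(-beta); unequal neighbors are never bound.
    z-step: every connected component of bound edges gets a color
    drawn uniformly from A = {0, ..., n_colors - 1}.
    """
    h, w = z.shape
    parent = np.arange(h * w)

    def union(i, j):
        parent[find(parent, i)] = find(parent, j)

    p = 1.0 - np.exp(-beta)
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w and z[r, c] == z[r, c + 1] and rng.random() < p:
                union(i, i + 1)      # bind horizontal edge
            if r + 1 < h and z[r, c] == z[r + 1, c] and rng.random() < p:
                union(i, i + w)      # bind vertical edge

    roots = np.array([find(parent, i) for i in range(h * w)])
    color_of_root = rng.integers(0, n_colors, size=h * w)
    return color_of_root[roots].reshape(h, w)
```

Iterating sw_sweep from any starting configuration yields a Markov chain whose z-marginal converges to (1.5).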


2. A model with adaptively discretized grey-level scale

Assume that image λ contains only a relatively small number of distinct grey-levels. Suppose we know a priori how many levels are present, but we do not know what these levels are (recall that grey-level λ_s is an arbitrary non-negative real). We can model such a situation as follows. Let A be a finite set; its elements are now interpreted as labels of distinct grey-levels. A collection φ = (φ(a))_{a∈A} of non-negative reals represents the "actual palette", that is, the grey-levels themselves. Assume

$$\lambda_s = \varphi(z_s), \quad (2.1)$$

where z_s ∈ A, so that z = (z_s)_{s∈S} is a configuration of labels. The Poisson likelihood (1.1) now becomes

$$p(x\mid z,\varphi) = \prod_s e^{-\varphi(z_s)}\,\frac{\varphi(z_s)^{x_s}}{x_s!}, \qquad (x_s = 0,1,\ldots). \quad (2.2)$$

In order to restore λ we have to identify both z and the "actual palette" φ. Both z and φ will be treated as random configurations with independent prior distributions, defined on the spaces A^S and R_+^A, respectively. Let the prior for z be the Potts model distribution. For φ, we choose an improper prior, uniform on R_+^A. Therefore, we assume that a priori

$$\pi(z,\varphi) \propto \exp\Big(-\beta\sum_{\langle s,t\rangle} 1(z_s\neq z_t)\Big). \quad (2.3)$$

The choice of prior is always a subjective decision. Let us comment on Eq. (2.3) and provide some justification. Numerous applications show that the Potts distribution is an appropriate prior for discrete images. On the other hand, the improper uniform distribution of φ is a typical non-informative prior, suitable for describing our lack of knowledge about the magnitude of the Poisson intensities. Choosing this prior, we simply allow the levels φ to be determined a posteriori by the magnitudes of the observed counts x. The posterior distribution is Gibbs:

$$\pi_x(z,\varphi) \propto \exp\Big(-\sum_s \varphi(z_s) + \sum_s x_s\log\varphi(z_s) - \beta\sum_{\langle s,t\rangle}1(z_s\neq z_t)\Big). \quad (2.4)$$

Of course, we are now speaking of configurations (z,φ) ∈ A^S × R_+^A. Note that Eq. (2.4) defines the probability distribution of an MRF with respect to the following graph: the set of vertices is S ∪ A; the set of edges is K ∪ (A × S). Put differently, we augment the basic graph (S,K) by adding the set A of labels to the set S of sites and joining every new vertex with all sites in S. Although the structure of neighborhoods is non-local, it turns out to be quite easy to handle.

3. The Swendsen–Wang bond variables

The chief advantage of the model described in the previous section is that we can directly carry out the Swendsen–Wang construction. We follow the derivation of S–W given by Edwards and Sokal [6]; see also Ref. [8]. For convenience, let us trace the key steps of the construction and adjust them to our setting. We have more variables and a more complicated potential than in the standard Potts model. In spite of this, the reader will notice that everything works equally well.

Consider the posterior distribution given by Eq. (2.4). From now on, x can be considered fixed. We can rewrite Eq. (2.4) in the following form:

$$\pi_x(z,\varphi) = q_x(z,\varphi)\prod_{\langle s,t\rangle} b(z_s,z_t), \quad (3.1)$$

where q_x(z,φ) ∝ p(x|z,φ) and

$$b(z_s,z_t) = \exp[-\beta\,1(z_s\neq z_t)] = \begin{cases} e^{-\beta} & \text{if } z_s\neq z_t,\\ 1 & \text{if } z_s = z_t.\end{cases} \quad (3.2)$$

In order to introduce the bond variables δ_st, let us first define some auxiliary continuous variables u_st and then discretize them. Let u = (u_st)_{{s,t}∈K} be a collection of random variables which are conditionally independent given (z,φ), with u_st uniformly distributed on [0, b(z_s,z_t)]. They do not depend on φ and x. Thus, π_x(u|z,φ) = π(u|z), where

$$\pi(u\mid z) = \prod_{\langle s,t\rangle} b(z_s,z_t)^{-1}\,1\big(u_{st}\le b(z_s,z_t)\big). \quad (3.3)$$

Therefore, the joint distribution of (z, u, φ) is

$$\pi_x(z,u,\varphi) = q_x(z,\varphi)\prod_{\langle s,t\rangle} 1\big(u_{st}\le b(z_s,z_t)\big) = q_x(z,\varphi)\prod_{\langle s,t\rangle}\big[1(u_{st}\le e^{-\beta},\ z_s\neq z_t) + 1(z_s=z_t)\big] \quad (3.4)$$

(we implicitly assume 0 ≤ u_st ≤ 1 throughout). Consequently,

$$\pi_x(z\mid u,\varphi) \propto q_x(z,\varphi)\,1\big(z \text{ satisfies } u_{st}\le b(z_s,z_t) \text{ for every } \langle s,t\rangle\big). \quad (3.5)$$

Since our "interactions" (3.2) assume only the values 1 and e^{−β}, the indicator function in the above formula depends on u only through the variables δ_st = 1(u_st > e^{−β}). Now we are in a position to discard the auxiliary variables u_st and keep only the configuration of indicator variables δ = (δ_st). Rewrite Eq. (3.5) as

$$\pi_x(z\mid\delta,\varphi) \propto q_x(z,\varphi)\,1(z\in R(\delta)), \quad (3.6)$$

where

$$R(\delta) = \{z : z_s = z_t \text{ for every } \langle s,t\rangle \text{ such that } \delta_{st}=1\}. \quad (3.7)$$

The "interactions" disappear from the conditional distribution. Given δ, dependencies between the z variables are reduced to the condition z ∈ R(δ), since q_x(z,φ) is proportional to a product distribution on the space of z-configurations.


It remains to notice that Eq. (3.3) implies that π_x(δ|z,φ) depends neither on φ nor on x and is equal to

$$\pi(\delta\mid z) = \prod_{\langle s,t\rangle} \pi(\delta_{st}\mid z_s,z_t), \quad (3.8)$$

where

$$\pi(\delta_{st}=0\mid z_s,z_t) = 1,\qquad \pi(\delta_{st}=1\mid z_s,z_t) = 0 \qquad \text{for } z_s\neq z_t,$$
$$\pi(\delta_{st}=0\mid z_s,z_t) = e^{-\beta},\qquad \pi(\delta_{st}=1\mid z_s,z_t) = 1-e^{-\beta} \qquad \text{for } z_s = z_t.$$

In summary, on the augmented space of configurations (z,δ,φ) ∈ A^S × {0,1}^K × R_+^A, we have constructed a joint probability distribution such that:

the marginal distribution of (z,φ) is Eq. (2.4);
the conditional distribution of z, given (δ,φ), is Eq. (3.6);
the conditional distribution of δ, given (z,φ), is Eq. (3.8).

An explicit formula for the joint probability distribution of (z,δ,φ) can be obtained from Eq. (3.4) by taking the integrals ∫_0^{e^{−β}} ⋯ du_st for δ_st = 0 and ∫_{e^{−β}}^1 ⋯ du_st for δ_st = 1, respectively. We get

$$\pi_x(z,\delta,\varphi) \propto q_x(z,\varphi)\prod_{\langle s,t\rangle}\big[e^{-\beta}1(\delta_{st}=0) + (1-e^{-\beta})1(\delta_{st}=1,\ z_s=z_t)\big]. \quad (3.9)$$
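As a quick consistency check (our addition): summing the edge factor in Eq. (3.9) over δ_st ∈ {0,1} gives back b(z_s,z_t), so marginalizing (3.9) over δ recovers Eq. (3.1):

$$\sum_{\delta_{st}\in\{0,1\}}\big[e^{-\beta}1(\delta_{st}=0) + (1-e^{-\beta})1(\delta_{st}=1,\ z_s=z_t)\big] = e^{-\beta} + (1-e^{-\beta})\,1(z_s=z_t) = b(z_s,z_t).$$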

However, we will not use Eq. (3.9) in the sequel; only formulae (3.6) and (3.8) for the conditional distributions are really needed. They will be used in the next section to construct an algorithm for sampling approximately from Eq. (2.4). Let us note in passing that the classical S–W construction for the simple Potts model can be extracted from the above considerations. To see this, just replace the posterior π_x(z,φ) in Eq. (3.1) by the Potts prior π(z) given by Eq. (1.5). Then the factor q_x(z,φ) disappears from Eqs. (3.6) and (3.9), so Eq. (3.9) reduces to Eq. (1.6) and Eq. (3.8) remains the same.

4. Gibbs sampler on the augmented space

We are interested in the configuration of Poisson intensities λ. The aim is to sample from the posterior distribution of λ, given the observed configuration x. In the model described in Section 2, we have λ_s = φ(z_s). To sample (z,φ) approximately from π_x(z,φ) given by Eq. (2.4), we will use a Gibbs sampler on the augmented space of configurations (z,δ,φ), finally dismissing the auxiliary δ. Note that x is fixed. The algorithm is the following. Start with an arbitrarily chosen (z,φ). Then iterate the following loop:

sample δ from π_x(δ|z,φ), keeping (z,φ) fixed;
sample z from π_x(z|δ,φ), keeping (δ,φ) fixed;
sample φ from π_x(φ|z,δ), keeping (z,δ) fixed.

A minimal driver for this loop is sketched below.
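Every helper name here (initial_config, delta_step, z_step, phi_step) is a hypothetical stand-in for the steps detailed in the remainder of this section, not an API from the paper.

```python
def restore(x, n_labels, beta, n_steps=30):
    """Sketch of the outer Gibbs loop on (z, delta, phi); helpers assumed."""
    phi, z = initial_config(x, n_labels)
    for _ in range(n_steps):
        delta = delta_step(z, beta)           # Eq. (3.8)
        z = z_step(delta, phi, x)             # Eq. (3.6)
        phi = phi_step(z, x, n_labels, phi)   # Gamma draws from Eq. (4.1)
    return phi[z]                             # estimate: lambda_s = phi(z_s)
```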

The well-known general properties of GS imply that the algorithm generates a Markov chain convergent to π_x(z,δ,φ). After a sufficiently large number of steps, the algorithm produces configurations (z,φ) sampled approximately from π_x(z,φ). The corresponding configurations λ are our estimates of the grey-level image. We proceed to a more detailed description of the three steps.

• δ-Step: π_x(δ|z,φ) depends only on z and is the product of the univariate distributions π(δ_st|z_s,z_t) given by Eq. (3.8). For each edge ⟨s,t⟩, we independently sample δ_st: if z_s ≠ z_t, then we put δ_st = 0; otherwise we choose δ_st = 0 with probability e^{−β} or δ_st = 1 with probability 1 − e^{−β}. This step is exactly the same as in the classical S–W algorithm for the Potts model.

• z-Step: π_x(z|δ,φ) is given by Eq. (3.6). It is proportional to the likelihood on the set R(δ) and vanishes outside R(δ). Now we can exploit the simple form of q_x(z,φ) ∝ p(x|z,φ) given by Eq. (2.2):

$$q_x(z,\varphi) \propto \exp\Big(-\sum_s\varphi(z_s) + \sum_s x_s\log\varphi(z_s)\Big). \quad (4.1)$$

From the above formula and Eq. (3.6) we get

$$\pi_x(z\mid\delta,\varphi) \propto \exp\Big(-\sum_s\varphi(z_s) + \sum_s x_s\log\varphi(z_s)\Big)1(z\in R(\delta)).$$

Let K_δ = {{s,t} ∈ K : δ_st = 1}. Consider all connected components of the graph (S, K_δ). We get a disjoint partition of the set of sites, say S = ⋃_c S^c, where the superscript c indexes the collection of components (possibly singletons). If z ∈ R(δ), then z is constant on each component. Accordingly, let z_s = z^c for s ∈ S^c. On R(δ),

$$\pi_x(z\mid\delta,\varphi) \propto \exp\Big(-\sum_c N^c\varphi(z^c) + \sum_c X^c\log\varphi(z^c)\Big),$$

where N^c stands for the cardinality of S^c and X^c = Σ_{s∈S^c} x_s. Therefore, π_x(z|δ,φ) = Π_c π_x(z^c|S^c,φ), where π_x(·|S^c,φ) is a probability distribution on A defined by

$$\pi_x(a\mid S^c,\varphi) \propto \exp[-N^c\varphi(a) + X^c\log\varphi(a)] \qquad (a\in A).$$

Thus, we first identify the connected components S^c and then, for each component independently, sample z^c from π_x(·|S^c,φ) given above.

• φ-Step: π_x(φ|z,δ) depends only on z and is proportional to q_x(z,φ) ∝ p(x|z,φ). For a ∈ A, let S_a = {s ∈ S : z_s = a}. Let N_a be the cardinality of S_a and X_a = Σ_{s∈S_a} x_s. Notice the difference between the subscript a ∈ A and the superscript c, which indicates the connected components. Now, from Eq. (4.1) we obtain

$$q_x(z,\varphi) \propto \exp\Big(-\sum_a N_a\varphi(a) + \sum_a X_a\log\varphi(a)\Big).$$

Therefore, π_x(φ|z,δ) = Π_a π_x(φ(a)|S_a), where

$$\pi_x(l\mid S_a) \propto \exp[-N_a l + X_a\log l] \qquad (l > 0).$$

Notice that π_x(·|S_a) is the density of the Gamma distribution with shape parameter X_a + 1 and rate parameter N_a (i.e., scale 1/N_a). Hence, for each a ∈ A independently, we sample φ(a) from Gamma(X_a + 1, N_a).
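A sketch of these two conditional draws, under assumed flat-array data structures (and with our own handling of labels that currently occupy no site, a case the paper does not discuss):

```python
import numpy as np

rng = np.random.default_rng(2)

def phi_step(z, x, n_labels, phi_old):
    """Draw each palette level phi(a) from Gamma(X_a + 1, rate N_a).

    numpy parametrizes the Gamma by shape and *scale*, so we pass
    scale = 1 / N_a. If no site carries label a (N_a = 0) the posterior
    is improper; keeping the old value is our assumption.
    """
    phi = phi_old.copy()
    for a in range(n_labels):
        mask = (z == a)
        N_a, X_a = mask.sum(), x[mask].sum()
        if N_a > 0:
            phi[a] = rng.gamma(X_a + 1, 1.0 / N_a)
    return phi

def sample_component_label(N_c, X_c, phi):
    """z-step for one component S^c: draw a label a with probability
    proportional to exp(-N_c * phi(a) + X_c * log phi(a))."""
    logp = -N_c * phi + X_c * np.log(phi)
    p = np.exp(logp - logp.max())          # numerically stabilized
    return rng.choice(len(phi), p=p / p.sum())
```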

5. Partial decoupling

Although the algorithm described in the previous section is satisfactory, a considerable improvement can be achieved by applying the partial decoupling method due to Higdon [7]. Let us fix an arbitrary number κ such that 0 ≤ κ ≤ 1. Suppose that instead of Eq. (3.3) we put

$$\pi(u\mid z,\varphi) = \prod_{\langle s,t\rangle} b(z_s,z_t)^{-\kappa}\,1\big(u_{st}\le b(z_s,z_t)^{\kappa}\big). \quad (5.1)$$

Consequently,

$$\pi_x(z,u,\varphi) = q_x(z,\varphi)\prod_{\langle s,t\rangle} b(z_s,z_t)^{1-\kappa}\,1\big(u_{st}\le b(z_s,z_t)^{\kappa}\big).$$

Suppose δ_st = 1(u_st > e^{−κβ}). Let R(δ) be defined just as before, by Eq. (3.7). Now, instead of Eq. (3.6), we obtain

$$\pi_x(z\mid\delta,\varphi) \propto q_x(z,\varphi)\prod_{\langle s,t\rangle} b(z_s,z_t)^{1-\kappa}\,1(z\in R(\delta)). \quad (5.2)$$

If we partition S into a sum of connected components S^c of the graph (S, K_δ), it is still true that π_x(z|δ,φ) is concentrated on R(δ), the set of z-configurations constant on each component S^c. However, the components are no longer independent. Use Eqs. (4.1) and (3.2) to rewrite Eq. (5.2) as follows:

$$\pi_x(z\mid\delta,\varphi) \propto \exp\Big(-\sum_s\varphi(z_s) + \sum_s x_s\log\varphi(z_s) - (1-\kappa)\beta\sum_{\langle s,t\rangle}1(z_s\neq z_t)\Big)1(z\in R(\delta)).$$

If z ∈ R(δ), let z^c = z_s for s ∈ S^c. Now we can group terms in the last formula to get, on R(δ),

$$\pi_x(z\mid\delta,\varphi) \propto \exp\Big(-\sum_c N^c\varphi(z^c) + \sum_c X^c\log\varphi(z^c) - (1-\kappa)\beta\sum_{\langle c,d\rangle} B^{cd}\,1(z^c\neq z^d)\Big), \quad (5.3)$$

where the summands indexed by the superscripts c and d correspond to distinct components (not single sites). Here, N^c is the number of sites in S^c and X^c = Σ_{s∈S^c} x_s, as in Section 4. The symbol ⟨c,d⟩ means that the two components S^c and S^d have adjacent sites. The number of adjacencies is denoted by

$$B^{cd} = \#\{\langle s,t\rangle : s\in S^c,\ t\in S^d\}.$$

Note that every configuration z ∈ R(δ) can be regarded as a family (z^c) of random variables indexed by the components S^c. Formula (5.3) defines, conditionally on δ, the probability distribution of a Markov random field on configurations (z^c). The trick borrowed from Higdon allows us to replace the original grid of sites by a coarser grid, by lumping together sites inside connected components.

The z-step of the algorithm is more difficult. We can no longer update each z^c independently. Instead, we resort to the Gibbs sampler, which works on the coarser grid. Specifically, given the values z^d for components S^d adjacent to S^c, we update z^c to a with probability proportional to

$$\pi_x(a\mid S^c,\{z^d : \langle c,d\rangle\},\varphi) \propto \exp\Big(-N^c\varphi(a) + X^c\log\varphi(a) - (1-\kappa)\beta\sum_{d:\langle c,d\rangle} B^{cd}\,1(a\neq z^d)\Big).$$

We propose several, say k, cyclic sweeps of the Gibbs sampler in a single z-step. On the other hand, the δ-step requires only a minor adjustment. Notice that π(δ_st|z_s,z_t) is now given by

$$\pi(\delta_{st}=0\mid z_s,z_t) = e^{-\kappa\beta},\qquad \pi(\delta_{st}=1\mid z_s,z_t) = 1-e^{-\kappa\beta}$$

for z_s = z_t (and δ_st = 0 for z_s ≠ z_t). The φ-step remains exactly the same as in Section 4.

The partial decoupling algorithm is a compromise between S–W and the single-site GS. For κ = 1, we obtain pure S–W. For κ = 0, the connected components reduce to singletons and we simply get the Gibbs sampler. Note that the choice of the control parameter κ does not affect the posterior distribution given by Eq. (2.4); the speed of convergence to this distribution, however, seems to depend on κ. The other control parameter in our algorithm, β, plays a different role: the choice of β alters the influence of the prior on the posterior distribution.
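A sketch of the partially decoupled z-step on the coarse component grid; the data structures (per-component counts N, X and the adjacency counts B) are assumed bookkeeping, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(3)

def coarse_z_sweep(z_comp, phi, N, X, B, beta, kappa, k=5):
    """k cyclic Gibbs sweeps over component labels, following Eq. (5.3).

    Assumed structures: z_comp[c] is the label of component c, N[c] and
    X[c] are its site count and count sum, and B[c] is a dict {d: B_cd}
    of adjacent components with adjacency counts.
    """
    n_comp, n_labels = len(z_comp), len(phi)
    for _ in range(k):
        for c in range(n_comp):
            penalty = np.zeros(n_labels)
            for d, b_cd in B[c].items():
                # (1 - kappa) * beta * B_cd is paid for every label a != z^d
                penalty += (1 - kappa) * beta * b_cd * (np.arange(n_labels) != z_comp[d])
            logp = -N[c] * phi + X[c] * np.log(phi) - penalty
            p = np.exp(logp - logp.max())
            z_comp[c] = rng.choice(n_labels, p=p / p.sum())
    return z_comp
```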


Fig. 1. Phantom and Poisson degradation, the difficult example.

6. Simulation results

In our experiments, we first generated a "phantom", that is, a synthetic ideal image configuration λ = (λ_s). Then independent Poisson variables x_s were simulated, with intensities λ_s taken from the phantom. Configuration x = (x_s), treated as a "degraded image", was used as input to our restoration algorithm. Finally the output, denoted by λ̂, was treated as the restored image. In our synthetic examples, we were able to compare the result of restoration λ̂ with the ground truth, both visually and numerically.

We experimented with versions of the widely used brain phantom of Vardi et al. [9]. The phantom emulates objects of inference in single photon emission computed tomography (SPECT) or positron emission tomography (PET). A radioisotope is introduced into a patient and its location in, say, the patient's brain is to be reconstructed given some indirect measurements of photon emissions. One of our ultimate goals has been to use the algorithm described in this paper as a part of an algorithm for reconstruction of SPECT images. This explains the choice of the "ideal images" used in the experiments described below. However, to cope with indirectly observable emissions, we have to include an additional "reconstruction step" in the algorithm. The details go beyond the scope of this paper and will be reported in Lasota et al. [10].

The image area is disc shaped. The disc is discretized in almost the same way as in Silverman et al. [11], the difference between the two discretizations having no bearing on our results. We first divide the disc into 64 rings of equal width by drawing circles of radius ir/64, i = 1, 2, …, 64, where r is the radius of the detector ring. The innermost circle, which corresponds to i = 1 and determines the innermost ring, is divided into four arcs of the same length. In this way, four innermost pixels are obtained. The second circle (i.e., the circle corresponding to i = 2) is divided into eight arcs of the same length, forming eight pixels in the second ring. One boundary of each pixel in the second ring lies on the same radius as the corresponding boundary of a pixel in the first (innermost) ring (the other boundary of the former pixel stems from the middle of the arc determining the latter pixel). Analogously, 16 pixels are formed in the third ring. In the fourth ring, again 16 pixels are formed, with boundaries on the same radii as the boundaries of pixels in the third ring. In the same fashion, 32 pixels are formed in each of the next four rings, i.e., rings nos. 5–8. Further, again in the same way as until now, 64 pixels are formed in rings nos. 9–16, 128 pixels in rings nos. 17–32 and 256 pixels in rings nos. 33–64 (ring no. 64 being the outermost one). The only difference from the discretization in Silverman et al. [11] is that here, once a radius becomes a boundary radius for a pixel in ring k, it retains this property for pixels in rings k + 1, k + 2, …, 64; the other authors impose some additional "seams" between the 2^j th and (2^j + 1)th rings, j = 1, …, 5, as suitable for the method of smoothing they propose. The whole construction amounts to discretizing the disc into 10,924 pixels.

The phantom used in our experiments is composed of four ellipses with intensities, respectively, 0.6w for the large ellipse, 0.7w for the small one, and 1.5w and 2.0w for the medium ellipses. In the background, Poisson activity with intensity w is assumed. The factor w is a "measure of exposure". In tomography, w can be interpreted as the length of time for which we observe emissions. All the Poisson intensities grow proportionally to w. For the degraded image x, the signal-to-noise ratio increases with w, because the expectation of x_s (signal) is λ_s, while its standard deviation (noise) is √λ_s. In the experiments to be reported, we put w = 10 (a "difficult example") and w = 30 (an "easy example"). Our phantom λ and the degraded image x are shown in Fig. 1. This is the difficult example, with w = 10. The easy example, with w = 30, is shown in Fig. 2. Both phantoms in Figs. 1 and 2 look exactly the same because our visualization device takes into account only "relative" intensities, so the visible images are independent of the exposure factor w. Note that in medical diagnosis only these relative intensities really count and should be shown in the image. On the other hand, our algorithm is designed to restore the actual, absolute values of the intensities.

In the sequel we report the results obtained by our restoration procedure. The control parameters of the algorithm were the following. The number of distinct grey-levels, #A, was always set to 10 (note that our phantom has exactly five levels).
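For the record, the pixel count of the discretization described above adds up as stated (rings 1–4 contribute 4 + 8 + 16 + 16 pixels, rings 5–8 contribute 32 each, rings 9–16 contribute 64 each, rings 17–32 contribute 128 each, and rings 33–64 contribute 256 each):

$$4 + 8 + 16 + 16 + 4\cdot32 + 8\cdot64 + 16\cdot128 + 32\cdot256 = 44 + 128 + 512 + 2048 + 8192 = 10\,924.$$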


Fig. 2. Phantom and Poisson degradation, the easy example.

Fig. 3. Restoration after 5, 10, 15, 20, 25 and 30 steps, the difficult example.


Fig. 4. Restoration after 5, 10, 15, 20, 25 and 30 steps, the easy example.

The "smoothing parameter" β was altered in the course of the procedure in the following way:

Step   1–5   6–15   16–25   26–30
β      1     2      3       4

The "decoupling parameter" κ was equal to 0.25. The number of coarse-grid GS sweeps inside every z-step, denoted by k, was 5. At the beginning of the restoration procedure, we had to choose initial configurations φ and z. The initial values of the ten φ's were φ(a) = x_max(a + 1/2)/10 for a = 0, 1, …, 9, where x_max stands for the maximum of the x_s. For each pixel s, we selected the z_s which minimizes |φ(z_s) − x_s| as our initial guess.
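In code, this initialization might read as follows (a sketch; the function name is ours):

```python
import numpy as np

def initial_config(x, n_labels=10):
    """Evenly spaced initial palette and nearest-level labels.

    phi(a) = x_max * (a + 1/2) / n_labels; each pixel takes the label
    whose level is closest to its observed count.
    """
    phi = x.max() * (np.arange(n_labels) + 0.5) / n_labels
    z = np.abs(x[..., None] - phi).argmin(axis=-1)
    return phi, z
```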

The restored images λ̂ obtained after 5, 10, 15, 20, 25 and 30 steps of our algorithm are shown in Fig. 3 (difficult example) and in Fig. 4 (easy example). Apart from visual comparisons, the progress of the restoration procedure was recorded in numerical terms. In order to assess the discrepancy between restored images and the phantom, we computed the following distance (average difference of intensities):

$$d(\hat\lambda,\lambda) = \frac{1}{\#S}\sum_{s\in S}|\hat\lambda_s - \lambda_s|.$$

Moreover, we computed the "posterior potential" defined as

$$H_x(\lambda) = \sum_s\lambda_s - \sum_s x_s\log\lambda_s + \beta\sum_{\langle s,t\rangle}1(\lambda_s\neq\lambda_t)$$

for the phantom configuration λ and similarly for the restored configurations λ̂.
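Both criteria are straightforward to compute; a sketch with an assumed flat site indexing and an assumed (m, 2) edge array:

```python
import numpy as np

def d_distance(lam_hat, lam):
    """Average absolute difference of intensities, d(lambda-hat, lambda)."""
    return np.abs(lam_hat - lam).mean()

def posterior_potential(lam, x, beta, edges):
    """H_x(lambda); `edges` lists the adjacent site pairs <s, t>."""
    s, t = edges[:, 0], edges[:, 1]
    return lam.sum() - (x * np.log(lam)).sum() + beta * (lam[s] != lam[t]).sum()
```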

Table 1. Discrepancy between phantom and restored image for the "difficult example"

Step              0      5       10     15     20     25     30
d(λ̂, λ)          2.51   1.72    0.53   0.32   0.21   0.25   0.22
H_x(λ̂) − H_x(λ)         15 041  8335   3681   2318   2076   340

Table 2. Discrepancy between phantom and restored image for the "easy example"

Step              0      5      10     15     20     25     30
d(λ̂, λ)          4.00   1.94   0.20   0.17   0.19   0.16   0.20
H_x(λ̂) − H_x(λ)         7228   479    271    16     9      −29

In view of the fact that the configurations under consideration have finitely many different grey-level values, −H_x(λ) is exactly equal to the exponent in Eq. (2.4). Consequently, the posterior distribution π_x(λ) in our Bayesian model is proportional to e^{−H_x(λ)}. Therefore, samples from the posterior should be concentrated near the minimum of H_x, which is presumably close (but not equal) to the phantom configuration. Table 1 describes the experiment illustrated by Fig. 3; Table 2 corresponds to Fig. 4. We also provide a summary description of the configurations λ and λ̂: we give the intensities and the number of pixels to which particular intensities are assigned. Table 3 helps to compare the phantom and the reconstruction obtained after 25 steps for the difficult example. Table 4 summarizes the restoration after 30 steps for the easy one.

Let us comment on the choice of the control parameters in our algorithm: β, κ, k. Appropriate values of these parameters were selected after several trial-and-error experiments.


Theoretically, neither κ (the "degree of decoupling") nor k (the number of sweeps of GS inside each z-step) has any influence on the final results: recall that the algorithm converges to a posterior distribution which is independent of these two parameters. In our experiments, the influence of the choice of k was indeed negligible. However, a judicious choice of κ seemed to accelerate convergence. The choice of β is crucial. This is the parameter responsible for the "degree of smoothing". The larger the β, the more noise is eliminated from the image. On the other hand, a large β can cause the algorithm to misinterpret some fine features of the image as "noise" and eliminate them, too! Fig. 3 shows what happened when β was increased to four in the last five steps: the small ellipse was completely obliterated. We argue that an appropriate value of β must inevitably depend on some prior knowledge: we have to guess the "degree of smoothness" of the true image. In fact, in our experiments we decided to alter β in the course of the procedure. In several initial steps, β is small; then it gradually increases to some saturation value. Although we cannot justify this strategy theoretically, the results seem to be better than for β kept constant. Let us offer at least some intuitive explanation of this phenomenon. We guess that for large β, the corresponding posterior π_x has many local maxima. The algorithm can be trapped in the vicinity of one of them, far from the global maximum, for a long time. To avoid this, we begin with small values of β, for which the posterior is relatively "flat". Note that for the Potts model, β is just the inverse temperature. The trick of increasing β, that is, of "cooling", is borrowed from simulated annealing algorithms. Finally, let us mention that the number of steps sufficient for satisfactory results is very difficult to determine theoretically. In our experiments, the decision was made heuristically: the procedure was stopped when it seemed to have stabilized. This is rather common practice in similar applications.

Table 3. Frequency of grey-levels in phantom and restored image for the "difficult example"

Phantom:

Intensity   No. of pixels   % of pixels   Description
6.00        1354            12.4          Large ellipse
7.00        159             1.5           Small ellipse
10.00       8351            76.4          Background
15.00       601             5.5           Medium ellipse
20.00       459             4.2           Medium ellipse

Restored image:

Intensity   No. of pixels   % of pixels
6.26        1542            14.1
9.83        1576            14.4
9.87        4669            42.7
10.13       353             3.2
10.16       192             1.8
10.22       1525            14.0
14.85       600             5.5
20.25       465             4.3
Other       2               0.0


Table 4. Frequency of grey-levels in phantom and restored image for the "easy example"

Phantom:

Intensity   No. of pixels   % of pixels   Description
15.00       1354            12.4          Large ellipse
17.50       159             1.5           Small ellipse
25.00       8351            76.4          Background
37.50       601             5.5           Medium ellipse
50.00       459             4.2           Medium ellipse

Restored image:

Intensity   No. of pixels   % of pixels
14.87       1353            12.4
16.93       133             1.2
25.08       8372            76.6
37.96       605             5.5
49.59       461             4.2

Table 5. Discrepancy between phantom and restored image for the "difficult example", two versions of the CGS algorithm

Sweep             0      25     50     75     100
G–McC: d(λ̂, λ)   2.51   0.99   0.89   0.86   0.86
K–L: d(λ̂, λ)     2.51   1.56   1.35   1.22   1.13

Finally, let us compare our algorithm with that proposed by Geman and McClure [3], mentioned in the Introduction. The latter converges to the posterior given by Eq. (1.4), which is different from ours (2.4), the difference lying in the model assumptions. We implemented Geman and McClure's algorithm as pure GS with systematic sweeps; we will use the abbreviation CGS (continuous GS) when referring to it. In our experiments with CGS we used the first (difficult) of the two examples described earlier in this section (Fig. 1). Of course, for CGS, the crucial point is the choice of the function Φ in Eqs. (1.2) and (1.4). We experimented with the following two functions.

The original Geman–McClure proposal (G–McC):

$$\Phi(\lambda_s,\lambda_t) = -\frac{1}{1 + c(\lambda_s-\lambda_t)^2}.$$

We set c = 2 and β = 5 in Eqs. (1.2) and (1.4). This choice was again made on an experimental basis.

The Kullback–Leibler distance (K–L):

$$\Phi(\lambda_s,\lambda_t) = (\lambda_s-\lambda_t)\log\frac{\lambda_s}{\lambda_t}.$$

Note that this function is nothing but the symmetrized Kullback–Leibler distance between two Poisson distributions with means λ_s and λ_t, respectively. Since this particular Φ seems consistent with the Poisson likelihood, we hoped it would bring good results.

The results of restoration obtained by the two versions of CGS (after 100 full sweeps) are shown in Fig. 5; they can be compared with Fig. 3. The rate of descent of the mean distance in the course of the procedure is reported in Table 5 (the distance d(λ̂, λ) there is computed exactly as in Table 1). Our results support the recommendation, given in Geman and McClure's paper, that a bounded Φ yields better results: the algorithm based on their original proposal usually outperforms the one based on the K–L distance, contrary to our expectations. Comparison between Figs. 3 and 5 provides some evidence that the algorithm presented in our paper better "recognizes" sharp boundaries. Of course, this is because our basic idea was to design an algorithm which produces large
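For reference, the two potentials are easy to write down in code (a sketch; both assume strictly positive intensities):

```python
import numpy as np

def phi_gmcc(ls, lt, c=2.0):
    """Geman-McClure potential, bounded in (-1, 0]."""
    return -1.0 / (1.0 + c * (ls - lt) ** 2)

def phi_kl(ls, lt):
    """Symmetrized Kullback-Leibler distance between Poisson(ls) and Poisson(lt)."""
    return (ls - lt) * np.log(ls / lt)
```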

Fig. 5. Restoration by two versions of the CGS algorithm after 100 sweeps, the difficult example. Left: original G–McC, right: K–L.


"uniformly painted patches" with clearly visible boundaries. Inspection of Tables 1 and 5 suggests that the G–McC-type posterior is less concentrated in the neighborhood of the phantom configuration (provided that "neighborhoods" are understood in terms of the d-distance). We have to concede, however, that this conclusion is valid for the kind of phantoms used in our experiments and need not be true for other types of images. Summing up, our algorithm can be considered a promising alternative, at least for some applications.

7. Summary

We propose a new algorithm for restoration of noisy images. We assume that the observed image is a configuration of Poisson counts, and the goal is to estimate the underlying configuration of intensities, interpreted as grey-levels. Problems of this kind arise naturally in many applications, including single photon or positron emission tomography. We use a Bayesian model of image restoration, based on a Gibbs prior probability distribution. Our algorithm belongs to the family of Markov chain Monte Carlo methods, designed for sampling from the posterior distribution. We exploit the fact that medical images consist of finitely many, often relatively few, grey-levels. In our model the continuum of grey-levels is discretized in an adaptive way. This allows us to introduce auxiliary bond variables, in much the same way as in the well-known Swendsen–Wang algorithm. Consequently, we update whole clusters of pixels simultaneously, instead of single pixels. This makes our algorithm more efficient than the simple Gibbs sampler. The partial decoupling method due to Higdon is also incorporated into the algorithm.

We present some simulation results. The algorithm was tested on synthetic examples known in the literature and compared with other methods. The results indicate that the algorithm is reliable and efficient. Our algorithm is designed to be the noise-filtering part of a more involved algorithm for restoration of tomographic images. The latter will be presented in a forthcoming paper.


Acknowledgements

We wish to record our warmest thanks to Professor Jacek Koronacki for inspiration, encouragement and numerous discussions. This work would be impossible without his assistance. Valuable suggestions of an anonymous referee helped to improve the paper.

References

[1] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. PAMI 6 (1984) 721–741.
[2] G. Winkler, Image Analysis, Random Fields and Dynamic Monte Carlo Methods, Springer, Berlin, 1995.
[3] S. Geman, D.E. McClure, Statistical methods for tomographic image reconstruction, in: Proceedings of the 46th Session of the International Statistical Institute, Bulletin of ISI, Vol. 52, 1987.
[4] D. Geman, Random fields and inverse problems in imaging, École d'Été de Probabilités de Saint-Flour XVIII–1988, Lecture Notes in Mathematics, Vol. 1427, Springer, Berlin, 1990.
[5] R.H. Swendsen, J.S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations, Phys. Rev. Lett. 58 (1987) 86–88.
[6] R.G. Edwards, A.D. Sokal, Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm, Phys. Rev. D 38 (1988) 2009–2012.
[7] D.M. Higdon, Auxiliary variable methods for Markov chain Monte Carlo with applications, J. Amer. Statist. Assoc. 93 (442) (1998) 585–595.
[8] J. Besag, P.J. Green, Spatial statistics and Bayesian computation, J. R. Statist. Soc. B 55 (1) (1993) 25–37.
[9] Y. Vardi, L.A. Shepp, L. Kaufman, A statistical model for positron emission tomography, J. Amer. Statist. Assoc. 80 (1985) 8–37.
[10] S. Lasota, W. Niemiro, J. Koronacki, Positron emission tomography by Markov chain Monte Carlo with auxiliary variables, 2002, in preparation.
[11] B.W. Silverman, M.C. Jones, J.D. Wilson, D. Nychka, A smoothed EM approach to indirect estimation problems, J. Roy. Statist. Soc. Ser. B 52 (1990) 271–324.

About the Author—WOJCIECH NIEMIRO was born in 1953 in Poland. He graduated from Warsaw University, Department of Mathematics, in 1976. For several years his work was devoted to pattern recognition and medical applications. His theoretical research concerned limit theorems of mathematical statistics. He received Ph.D. (1987) and Dr. Sc. (1995) degrees in mathematics. Currently, Wojciech Niemiro is mainly interested in Bayesian statistics and Markov chain Monte Carlo methods. His recent work concerns Markov chains and simulated annealing algorithms.

About the Author—SŁAWOMIR LASOTA completed his studies in Computer Science in 1995 and his Ph.D. in Computer Science in 2000. His main fields of interest are formal methods for specification and verification of software systems. Recently, his work has focused on algorithmic methods for verification of behavioural equivalences of concurrent processes.