Pattern Recognition, Vol. 22, No. 6, pp. 747-761, 1989. Printed in Great Britain.
0031-3203/89 $3.00 + .00. Pergamon Press plc. Pattern Recognition Society.
AN ITERATIVE GIBBSIAN TECHNIQUE FOR RECONSTRUCTION OF m-ARY IMAGES

BERNARD CHALMOND
CNRS, Statistique Appliquée, UA743, Université Paris-Sud, Mathématique, Bât. 425, 91405 Orsay Cédex, France

(Received 15 June 1988; in revised form 14 November 1988; received for publication 28 November 1988)

Abstract -- The reconstruction of m-ary images corrupted by independent noise is treated. The original image is modeled by a Markov Random Field (MRF) whose parameters are unknown. Likewise, the probabilistic structure of the noise is unknown. This paper presents an iterative procedure which performs the parameter estimation and image reconstruction tasks at the same time. The procedure, which we call the Gibbsian EM algorithm, is a generalization to the MRF context of a general algorithm, known as the EM algorithm, used to approximate maximum-likelihood estimates for incomplete data problems. A number of experiments are presented in the case of Gaussian noise and binary noise, showing that the Gibbsian EM algorithm is useful and effective for image reconstruction and segmentation.

Computer vision    Image segmentation    Bayesian inference    m-ary images    Incomplete data    Gibbs sampler    Markov Random Fields    Image reconstruction    EM algorithm    Pseudo-likelihood estimation
1. INTRODUCTION

To restore a picture in the presence of noise, one needs to carry out two steps. The first one is qualitative: it consists of designing an image model which describes the spatial structure of the original image and then of selecting a noise model. Generally, these models depend on several unknown parameters. The second step is essentially technical, and it has to solve two statistical problems which are highly dependent: parameter estimation and image reconstruction. The performance of a reconstruction scheme depends mainly on the estimation accuracy and on the qualitative properties of the image model. Yet, the parameter estimation is crucial: a very well designed model can lead to an ill reconstructed image if the estimates are not sufficiently accurate. Often the authors attempt to overcome this difficulty by performing a supervised parameter estimation. Some authors estimate the image model parameters from a noise-free realisation of this model. Furthermore, the parameters of the noise model are assumed to be known, or are estimated from the degraded image but without considering the underlying image model. For the general problems in image restoration, the reader is referred to Rosenfeld and Kak(21) and Bates.(22)
* This paper was presented at the AMS-IMS-SIAM Joint Summer Conference on Spatial Statistics and Imagery, 18-24 June 1988.
In this paper,* the original image is coded by a small number of labels: 2 or 3 ... or m. Such an image is called respectively binary or ternary ... or m-ary. The observed image is a noisy version of the original image, whose noise is signal-dependent. Depending on the nature of the noise, the degraded image is either an m-ary image or a grey-level image. The image model parameters and the noise model parameters are unknown: the estimation and the reconstruction tasks will be performed at the same time. The original image is modeled by a Markov Random Field (MRF) or, equivalently, by a Gibbs Distribution (GD) which is considered as a prior distribution. Then the GD and the noise distribution are combined to provide a posterior distribution from which a Bayesian image reconstruction is performed. This approach has been used by several authors (see Geman and Geman(1)), because it yields a very flexible tool to describe the local properties of the image. Unfortunately, to attain this flexibility, it is necessary to parameterize the posterior distribution. But without a "data-driven" method based on statistical principles, the parameters are chosen on an ad hoc basis, Wolberg.(2) If this choice remains arduous but feasible for relatively simple models, it becomes intractable when the parameters are numerous. So, Derin(3) assumes that a noise-free realization from the prior GD is available, on which an estimation technique of the image parameters is performed prior to reconstruction and not as an integral part of the reconstruction algorithm (see also Possolo(4)). Furthermore, in the case of degradation by additive, independent, Gaussian noise, the noise parameters are estimated separately by assuming that the observations are independent, which is not in agreement with the image model
assumptions (Derin(3-5)). By using a very simple model with only one image parameter and one noise parameter, Marroquin et al.(6) introduce a non-parametric method based on a merit function. For continuous-valued images corrupted by additive white noise, Chalmond(7) uses a maximum pseudo-likelihood principle to solve the estimation problem for a so-called "Gaussian-logistic-gradient" prior MRF, under the assumption that a signal-to-noise ratio is specified. During the writing of this paper, we became aware of a study on iterative segmentation of noisy Gibbsian fields, but with assumed known noise variance (Laksmanan(27)).

The main contribution of this paper is the development of an iterative approach to perform simultaneously the prior MRF estimation, the noise estimation, and the image reconstruction. Besides, the efficient solution of this problem is relevant for image segmentation, which can be reduced to the problem of reconstructing an m-ary image degraded by Gaussian noise. Our approach is related to the so-called EM algorithm, described by Dempster et al.(8) for the case of independent observations: this is a broadly applicable algorithm for computing maximum-likelihood estimates from incomplete data at various levels of generality. A special derivation of the EM method for finite mixture identification was developed by Baum et al.(9) in the one-dimensional case. Their prior model is unusual in that the original data are specified to follow a Markov chain. This situation arises in speech recognition, where the so-called "hidden Markov chain" has been used by Levinson et al.(10) More recently, Devijver and Dekessel(11) have applied this special EM algorithm to the learning problem for the so-called "hidden Pickard MRF" (see also Ref. (19)). The Pickard MRF is a very restrictive causal sub-class of MRF models, and consequently these authors emphasize that "the feasibility of this approach to the learning problem relied heavily on the crucial assumption of the causality for the underlying Markovian model". For continuous-valued images, the use of the EM algorithm was suggested in Ref. (18), in the case of a prior MRF model with only one parameter.

In our paper, one of the purposes is to extend the EM algorithm to MRF models in order to take advantage of the flexibility of these models for characterizing the class of images under consideration. As we shall see, this extension is not straightforward. The main obstacle to the practical application of this algorithm lies in the formidable computational cost associated with the exact computation of some posterior mathematical expectations. So, we shall compute these expectations by a Monte Carlo technique, known as the "Gibbs sampler".(1) For this reason, our procedure will be called the Gibbsian EM algorithm.

This article is organized as follows. In Section 2, we present some background material on MRFs and the problem formulation for the parameter estimation objectives. The Gibbsian EM algorithm is presented in Section 3. Section 4 then presents some applications of this technique. Finally, some concluding remarks are given in Section 5.

2. MRF AND ESTIMATION PROBLEM
A. Background on MRFs
Let $\omega = \{\omega_t,\ t \in S\}$ be the original image, where $S \subset \mathbb{Z}^2$ denotes the rectangular lattice of sites. $\omega$ is an m-ary image, which is a partition of $S$ into $m$ regions; $\omega$ can be considered as a label configuration on $S$. For notational convenience, we define the following image coding: $\omega_t = i \Leftrightarrow t \in$ region of type $i$, with $0 \le i \le m-1$.

We regard $\omega$ as a realization of an m-ary MRF $\Omega = \{\Omega_t,\ t \in S\}$. In this subsection, we present the basic definitions of MRFs and a particular class of MRF models that is used in the experiments of this paper. A detailed account of the MRF definition can be found in Griffeath.(12) Firstly, by an arbitrary choice, we define a neighborhood system $V = \{V_t,\ t \in S,\ V_t \subset S\}$, such that $t \notin V_t$ and $s \in V_t \Leftrightarrow t \in V_s$. Then, $\Omega$ is an MRF with regard to the chosen system $V$ if its probability law, say $P$, verifies, for all $\omega$: $P(\omega) > 0$ and
$$P(\omega_t \mid \omega_u,\ u \in S - t) = P(\omega_t \mid \omega_u,\ u \in V_t).$$
$V_t(\omega)$ will denote the label configuration restricted to $V_t$: $V_t(\omega) = \{\omega_u,\ u \in V_t\}$. Now, the question is how one can make these Markov fields explicit. The set of cliques $C$ is deduced from $V$: $c \in C$ if $c \subset S$ is a singleton or if, for all $r, s \in c$, $r$ and $s$ are neighbors. Let $V_c(\omega, \beta)$ be "interaction functionals" depending only on the restriction of $\omega$ to the clique $c$ and on unknown parameters $\beta$ which adjust the strength of the interactions. Such a family $\{V_c(\omega, \beta),\ c \in C\}$ is called a potential, and the quantity
$$U(\omega, \beta) = \sum_{c \in C} V_c(\omega, \beta) \qquad (1)$$
is called the energy function. The answer to our question is given by a theorem of Hammersley:(13) $\Omega$ is an MRF with respect to $V$ if and only if $P(\omega)$ has the form
$$P(\omega) = \frac{1}{Z(\beta)} \exp -U(\omega, \beta) \qquad (2)$$
where $Z(\beta) = \sum_\omega \exp -U(\omega, \beta)$ is a normalizing constant. Such a distribution is called a Gibbs distribution. Finally, the joint probability (2) is uniquely determined by the collection of the conditional probabilities $\{P(\omega_t \mid V_t(\omega))\}$.(13) The conditional probabilities can be calculated as follows:
$$P(\omega_t \mid V_t(\omega)) = \frac{\exp -[U(\omega, \beta) - U(\omega_{/t}, \beta)]}{\sum_{\omega_t = 0}^{m-1} \exp -[U(\omega, \beta) - U(\omega_{/t}, \beta)]} \;\propto\; \exp -[U(\omega, \beta) - U(\omega_{/t}, \beta)] \qquad (3)$$
with $\omega_{/t} = (\ldots, \omega_{t-1}, 0, \omega_{t+1}, \ldots)$, i.e. the configuration obtained from $\omega$ by setting $\omega_t = 0$. Let us remark that these conditional probabilities are independent of the normalizing constant $Z(\beta)$.

As an example, we describe a particular m-ary MRF. The definition of the energy $U(\omega, \beta)$ must be independent of the image coding. Let us consider the nearest-neighborhood system with adjustments at the boundaries: a given site $t$ has a neighborhood of cardinality 4 if $t$ is in the interior of the lattice $S$, 2 if $t$ lies at a corner, and 3 if $t$ lies at a boundary but not at a corner. This "free boundary" model is more natural for picture processing than toroidal lattices. A clique is either a single site or a pair of nearest-neighbor sites. The potential of a homogeneous, isotropic, nearest-neighborhood, m-ary MRF is defined as follows: if $c$ is a single-site clique $\{t\}$ then $V_c(\omega, \beta) = \beta_i^{(1)}$ when $\omega_t = i$; if $c$ is a 2-pixel clique $\{t, u\}$ then $V_c(\omega, \beta) = \beta_{i,j}^{(2)}$ when $\omega_t = i$, $\omega_u = j$, with $\beta_{i,j}^{(2)} = \beta_{j,i}^{(2)}$. From this definition, setting $\beta_0^{(1)} = 0$ and $\beta_{0,j}^{(2)} = 0$, the conditional probability (3) is of the form:
$$P(\omega_t = i \mid V_t(\omega)) \propto \exp -\Bigl[V_{\{t\}}(\omega, \beta) + \sum_{u \in V_t} V_{\{t,u\}}(\omega, \beta)\Bigr] \qquad (4)$$
For a binary field, this last expression is:
$$P(\omega_t \mid V_t(\omega)) \propto \exp -\omega_t\Bigl[\beta^{(1)} + \beta^{(2)} \sum_{u \in V_t} \omega_u\Bigr] \qquad (5)$$
which is a Bernoulli distribution with success parameter $P(\omega_t = 1 \mid V_t(\omega))$. Let $P_{1,j}$ be the conditional probability of observing a white label at site $t$ when its neighborhood contains exactly $j$ white labels. Table 1 illustrates the local statistical characteristics of this prior model by computing $P_{1,j}$ with $\beta^{(1)} = -2\beta^{(2)}$ for some values of $\beta^{(1)}$; Fig. 1(a) and Fig. 1(b) show two 32 x 32 configurations generated as described in the next paragraph, with respectively $\beta^{(1)} = 2$ and $\beta^{(1)} = 5$.

Table 1. Conditional probabilities of a binary MRF

$\beta^{(1)}$   $P_{1,0}$   $P_{1,1}$   $P_{1,2}$   $P_{1,3}$   $P_{1,4}$
0               0.5         0.5         0.5         0.5         0.5
2               0.12        0.27        0.50        0.73        0.88
5               0.01        0.08        0.50        0.92        0.99
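As a quick check of these values (a small computation we add for illustration), take $\beta^{(1)} = 2$, $\beta^{(2)} = -1$ and a site whose neighborhood contains $j = 1$ white label: then $\beta^{(1)} + \beta^{(2)} \sum_{u \in V_t} \omega_u = 2 - 1 = 1$, and normalizing (5) over $\omega_t \in \{0, 1\}$ gives $P_{1,1} = e^{-1}/(1 + e^{-1}) \approx 0.27$, the value reported in Table 1.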
We note a "clustering effect" which increases as $\beta^{(1)}$ increases: Fig. 1(a) belongs to the class of "weakly-clustered" images and Fig. 1(b) to the class of "strongly-clustered" images.

If the labels $\{0, \ldots, m-1\}$ are considered as grey-values, then the distribution (5) can be generalized by defining $P(\omega_t \mid V_t(\omega))$ as a Binomial distribution with parameters $(m-1)$ and $\Lambda(T_t)$, as follows:
$$T_t = \beta^{(1)} + \beta^{(2)} \sum_{u \in V_t} \omega_u, \qquad \Lambda(T_t) = \frac{\exp -T_t}{1 + \exp -T_t}$$
$$P(\omega_t = i \mid V_t(\omega)) = \frac{(m-1)!}{i!\,(m-1-i)!}\, \Lambda(T_t)^i\, (1 - \Lambda(T_t))^{m-1-i}. \qquad (6)$$
The Binomial model (6) is not as general as (4), but the experiments reported by Cross and Jain(14) show its practical interest (cf. also Section 4). MRF models, even with the smallest neighborhood system, are very flexible and powerful. Another binary nearest-neighborhood model is given in Ref. (1): if the image coding is $\{-1, 1\}$ then the potentials are $V_{\{t\}} = 0$ and $V_{\{t,u\}} = \beta\,\omega_t\,\omega_u$. This model has been generalized in Ref. (3).

Finally, as we shall see below, our technique needs to simulate the GD (2) and to compute the expected value $E[f(\Omega)]$ for a given known functional $f(\omega)$ of the configuration $\omega$; this expectation has no simple closed analytic form. All the GD realizations in this paper are generated using the so-called "Gibbs sampler" algorithm, presented in Ref. (1). This algorithm can briefly be described as follows. It creates iteratively a stochastic sequence $\omega(n)$ of configurations, starting from an arbitrary configuration $\omega(0)$. The current configuration $\omega(n)$ is computed from the previous one $\omega(n-1)$ by choosing at every site $t$ the new value $\omega_t(n)$ from the conditional probability (3). In Ref. (1), it is shown that for $n$ large, $\omega(n)$ looks like a configuration drawn at random with probability $P(\omega)$. Furthermore, we have the following property:
$$\lim_{n \to \infty} \frac{1}{n}\,[f(\omega(1)) + \cdots + f(\omega(n))] = E[f(\Omega)]. \qquad (7)$$
This property will be used extensively.
Fig. 1. Binary images. (a) Binary MRF realization: small spatial interactions, (b) binary MRF realization: large spatial interactions, (c) character $\beta$ corrupted by a binary noise.
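For readers who wish to reproduce such realizations, the following Python sketch implements the raster-scan Gibbs sampler for the binomial prior model (6) on a free-boundary lattice. It is a minimal illustration under our own naming conventions, not the author's Fortran implementation; the number of sweeps is an arbitrary choice.

```python
import numpy as np
from math import comb

def binomial_conditional(T, m):
    """Binomial conditional law (6): success probability Lambda(T) = e^-T / (1 + e^-T)."""
    lam = np.exp(-T) / (1.0 + np.exp(-T))
    i = np.arange(m)
    p = np.array([comb(m - 1, k) for k in i]) * lam**i * (1.0 - lam)**(m - 1 - i)
    return p / p.sum()

def gibbs_sample_prior(height, width, m, beta1, beta2, sweeps=50, rng=None):
    """Draw an approximate realization of the binomial MRF (6) by raster-scan Gibbs sampling."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.integers(0, m, size=(height, width))
    for _ in range(sweeps):
        for r in range(height):
            for c in range(width):
                # free-boundary nearest neighbours (4 interior, 3 edge, 2 corner)
                nbrs = []
                if r > 0: nbrs.append(omega[r - 1, c])
                if r < height - 1: nbrs.append(omega[r + 1, c])
                if c > 0: nbrs.append(omega[r, c - 1])
                if c < width - 1: nbrs.append(omega[r, c + 1])
                T = beta1 + beta2 * sum(nbrs)
                omega[r, c] = rng.choice(m, p=binomial_conditional(T, m))
    return omega

# Example: a 32 x 32 "strongly-clustered" binary field as in Fig. 1(b)
field = gibbs_sample_prior(32, 32, m=2, beta1=5.0, beta2=-2.5)
```

With $m = 2$ the binomial conditional reduces to the Bernoulli law (5), so the same routine generates the binary fields of Fig. 1(a) and 1(b).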
B. Estimation problem

Let $y = \{y_t,\ t \in S\}$ be the degraded image; $y$ is a realization of a random field $Y = \{Y_t,\ t \in S\}$ such that:
$$P(Y_t = k \mid \omega_t = i) = P_i(k),$$
and we assume that, given any particular configuration $\omega$, the random variables $\{Y_t\}$ are conditionally independent. Then, the conditional probability of $Y$, given $\omega$, is:
$$P(y \mid \omega) = \prod_{t \in S} P(y_t \mid \omega_t).$$
It should be pointed out that the noise probability $P_i(k)$ is region-dependent. Now, the general problem is to estimate $\theta = (\beta, \{P_i(k)\})$ and to reconstruct $\omega$ from a single noisy image $y$. Here, the noise is either discrete or continuous. In the case of continuous noise, the quantization of the image $y$ must be sufficiently fine to consider $Y_t \mid \Omega_t = i$ as a continuous random variable. In some practical situations, it will be possible to model the noise distribution by a parameterized probability function with unknown parameters, say $\varphi$. In most of our examples, the noise is assumed Gaussian: $Y_t \mid \Omega_t = i$ follows a Gaussian law with mean $\mu_i$ and standard deviation $\sigma$, and in that case the unknown parameters are $\theta = (\beta, \varphi)$ with $\varphi = (\{\mu_i\}, \sigma)$. But, as we shall see below, the estimation of the parameter $\beta$ is not directly necessary, and we can limit our investigation to the estimation of the conditional probabilities (3) of the prior MRF. So, the estimation problem is over-parameterized by $\Theta = (\{P_{ij}\}, \{P_i(k)\})$ or $\Theta = (\{P_{ij}\}, \varphi)$ with $P_{ij} = P(\omega_t = i \mid V_t(\omega)$ of type $j)$. "$V_t(\omega)$ of type $j$" means that a clustering of the neighborhood types has been devised. For example, in the special case of model (5), we saw before that the neighborhood configurations are naturally classified into 5 types according to the value $j = \sum_{u \in V_t} \omega_u$, which indicates the number of white labels surrounding the site $t$. Henceforth, all the probabilities will be indexed by their unknown parameters $\varphi$, $\beta$, $\theta$ (or $\Theta$), and very often we shall use $\theta$ (or $\Theta$) instead of $\varphi$, $\beta$ since $\varphi \subset \theta$ and $\beta \subset \theta$.

We now turn to the posterior distribution $P_\theta(\omega \mid y)$ of the original image given the degraded image $y$. This distribution is a GD with the same neighborhood system as the original GD. More precisely, we have to find its energy. Let us assume that the noise distribution can be written in the form:
$$P_\varphi(y_t \mid \omega_t) \propto \exp -g_\varphi(y_t, \omega_t). \qquad (8)$$
Then, from Bayes' rule, the posterior distribution is:
$$P_\theta(\omega \mid y) \propto P_\varphi(y \mid \omega)\, P_\beta(\omega) \propto \exp -\Bigl[\sum_t g_\varphi(y_t, \omega_t) + U(\omega, \beta)\Bigr] = \exp -U_\theta^*(\omega \mid y),$$
where $U_\theta^*(\omega \mid y)$ denotes the posterior energy and $P_\beta(\omega)$ is the prior distribution $P(\omega)$ defined in (2). $P_\beta(\omega)$ is also the likelihood function with respect to $\beta$. The posterior GD will be used extensively in the next section, both for image estimation and reconstruction.
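As a concrete instance (a special case we spell out here, anticipating the Gaussian experiments of Section 4 and equation (A.4) of the Appendix): if the noise is Gaussian with $Y_t \mid \Omega_t = i \sim \mathcal{N}(\mu_i, \sigma)$, then, up to an additive constant, $g_\varphi(y_t, i) = (y_t - \mu_i)^2 / 2\sigma^2$ and the posterior energy reads
$$U_\theta^*(\omega \mid y) = \sum_{t \in S} \frac{(y_t - \mu_{\omega_t})^2}{2\sigma^2} + U(\omega, \beta).$$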
3. GIBBSIAN EM ALGORITHM

The purpose of this section is to extend to m-ary MRFs the EM algorithm described in Refs (8, 9).

A. Standard EM algorithm

This algorithm considers the noisy observations $y$ as the "incomplete data" and defines $x = (\omega, y)$ as the "complete data". Thus, the estimation problem involves incomplete data by regarding each unlabeled observation in the image at hand as "missing" a label indicating its region of origin. If $P_\theta(x)$ denotes the likelihood function associated with the stochastic process $X = (\Omega, Y)$, then the maximum likelihood estimation would consist in searching $\operatorname{Max}_\theta \operatorname{Log} P_\theta(x)$, given data $x$. The EM algorithm is based on the following heuristic idea. Since we do not know $\operatorname{Log} P_\theta(x)$, because $x$ is not observed, we maximize instead its current expectation given the data $y$ and the current estimate $\theta^{(n)}$:
$$\operatorname{Max}_\theta\; E[\operatorname{Log} P_\theta(X) \mid y, \theta^{(n)}]. \qquad (9)$$
Let $Q(\theta \mid \theta^{(n)})$ be this expectation. The EM procedure is iterative and starts with a specific estimate $\theta^{(0)}$. At each iteration, the E-step computes $Q(\theta \mid \theta^{(n)})$ and the M-step chooses $\theta^{(n+1)}$ which maximizes $Q(\theta \mid \theta^{(n)})$. The fundamental property of the algorithm is that the log-likelihood $L$ of $y$, $L(\theta) = \operatorname{Log} P_\theta(y)$, increases monotonically on $\{\theta^{(n)},\ n = 0, 1, 2, \ldots\}$, to a (possibly infinite) limit. This leaves unanswered the question of whether such a sequence converges at all and, if it does, whether it converges to a maximum-likelihood estimate, say $\hat\theta$. Redner and Walker give a local convergence theorem (see Ref. (20), theorem 4.3) which states, under suitable conditions, that $\lim_{n \to \infty} \theta^{(n)} = \hat\theta$ if $\theta^{(0)}$ is sufficiently near $\hat\theta$. Thus, the choice of the starting values is crucial for the convergence of the standard EM algorithm.

B. Pseudo-likelihood function

In our situation, the EM algorithm is not applicable. The difficulty is that the computation of the expectation $Q$ is impractical because of the extraordinary computational complexity of the constant $Z(\beta)$ which appears in the expression of $P_\theta(x)$:
$$P_\theta(\omega, y) = P_\varphi(y \mid \omega)\, P_\beta(\omega) = P_\varphi(y \mid \omega)\, \frac{1}{Z(\beta)} \exp -U(\omega, \beta). \qquad (10.1)$$
The standard role of the EM algorithm is to estimate the model parameters $\theta$ by using a specified criterion, that is, the likelihood of the model. So, to avoid the above-mentioned difficulty, we propose to use the Pseudo-Likelihood (PL) function as a new criterion. For the MRF $\Omega$, the PL function, denoted $\tilde P_\beta(\omega)$, is defined from the conditional probabilities (3) as:
$$\tilde P_\beta(\omega) = \prod_{t \in S} P_\beta(\omega_t \mid V_t(\omega)). \qquad (10.2)$$
The PL function was introduced by Besag(13) to estimate the parameters of a GD using a realisation from this distribution. For some particular GDs, the statistical properties of the maximum pseudo-likelihood estimation have been studied by Guyon(15) and Geman and Graffigne.(16) From the PL of $\Omega$, we define the PL of $X$ as:
$$\tilde P_\theta(\omega, y) = P_\varphi(y \mid \omega)\, \tilde P_\beta(\omega) \qquad (10.3)$$
or in a similar manner as:
$$\tilde P_\Theta(\omega, y) = P_\Theta(y \mid \omega)\, \tilde P_\Theta(\omega). \qquad (10.4)$$
This definition is now independent of the normalizing constant $Z(\beta)$. Let us recall that we want to estimate $\Theta$ and to reconstruct $\omega$ simultaneously. Though different suitably defined likelihood functions can be used for accurate estimation,(6,7,13) image reconstruction requires working directly with the original GD (2), in order to get a restored image which fully respects the spatial properties of the designed prior MRF model.

We now define the EM algorithm by considering the PL function (10.4). If $\Theta^{(n)}$ denotes the current value of $\Theta$ after $n$ iterations of the algorithm, then at iteration $n+1$ we have to search:
$$\operatorname{Max}_\Theta\; E[\operatorname{Log} \tilde P_\Theta(X) \mid y, \Theta^{(n)}] \qquad (11)$$
where, considering the definitions (10.2) and (10.4):
$$\tilde P_\Theta(x) = \prod_t P(y_t \mid \omega_t)\, P_\Theta(\omega_t \mid V_t(\omega)) = \prod_{i,k} P_i(k)^{m_{ik}(\omega)} \prod_{i,j} P_{ij}^{\,n_{ij}(\omega)} \qquad (11.1)$$
with
$n_{ij}(\omega)$ = number of occurrences in $\omega$ such that $\{\omega_t = i$ and $V_t(\omega)$ of type $j\}$,
$m_{ik}(\omega)$ = number of occurrences in $(\omega, y)$ such that $\{\omega_t = i$ and $y_t = k\}$.
It should be pointed out that the PL function $\tilde P_\Theta(X)$ is formally equivalent to the likelihood of the stochastic process $(\omega, y)$ where $\omega$ would be generated by a Markov chain, and for which the transition probabilities stand for the conditional probabilities of the MRF. So, the solution $\Theta^{(n+1)} = (\{P_{ij}^{(n+1)}\}, \{P_i^{(n+1)}(k)\})$ of the mathematical problem (11) can be related to that of the "reestimation" formulae established in Ref. (9). The treatment of this problem is developed in the Appendix; the result is:
$$P_{ij}^{(n+1)} = \frac{E[n_{ij} \mid y, \Theta^{(n)}]}{\sum_i E[n_{ij} \mid y, \Theta^{(n)}]} \qquad (12.1)$$
$$P_i^{(n+1)}(k) = \frac{E[m_{ik} \mid y, \Theta^{(n)}]}{\sum_k E[m_{ik} \mid y, \Theta^{(n)}]} \qquad (12.2)$$
with
$$E[n_{ij} \mid y, \Theta] = \sum_\omega n_{ij}(\omega)\, P_\Theta(\omega \mid y) \qquad (13.1)$$
$$E[m_{ik} \mid y, \Theta] = \sum_\omega m_{ik}(\omega)\, P_\Theta(\omega \mid y). \qquad (13.2)$$
In the case of Gaussian noise with parameters $\varphi = (\{\mu_i\}, \sigma)$, formula (12.2) is replaced by:
$$\mu_i^{(n+1)} = \frac{\sum_t \gamma_t^{(n)}(i)\, y_t}{\sum_t \gamma_t^{(n)}(i)} \qquad (14.1)$$
$$\sigma^{(n+1)} = \left[\frac{\sum_{i,t} \gamma_t^{(n)}(i)\,(y_t - \mu_i^{(n+1)})^2}{\sum_{i,t} \gamma_t^{(n)}(i)}\right]^{1/2} \qquad (14.2)$$
where $\gamma_t^{(n)}(i) = P_{\Theta^{(n)}}(\omega_t = i \mid y)$ is the marginal posterior distribution.

C. EM algorithm and Gibbs sampler

Another difficulty then arises: contrary to the classical EM algorithm,(8,9) the expectations (13) cannot be simply and directly evaluated, because they require an immense amount of computation. So, we propose to approximate them by using a Monte Carlo method based on the Gibbs sampler associated with the posterior GD $P_\Theta(\omega \mid y) \propto P_\Theta(y \mid \omega)\, P_\Theta(\omega)$, where $P_\Theta(\omega)$ is the GD in (2), whose conditional probabilities $\{P_{ij}\}$ are contained in the current estimate of $\Theta$. This means that we can approximate the posterior expectations (13) as follows:
$$E[n_{ij} \mid y, \Theta] \approx \frac{1}{s_n - s_0} \sum_{s = s_0}^{s_n} n_{ij}(\omega(s)) \qquad (15.1)$$
and
$$E[m_{ik} \mid y, \Theta] \approx \frac{1}{s_n - s_0} \sum_{s = s_0}^{s_n} m_{ik}(\omega(s)) \qquad (15.2)$$
where $\omega(s)$ is the configuration generated by the Gibbs sampler at time $s$ and $s_0$ is the time required for the system to be in "thermal equilibrium". These approximations are simply the empirical mean values of the random variables $n_{ij}$ and $m_{ik}$, computed from the sample $\{\omega(s_0), \ldots, \omega(s_n)\}$. In the light of the theoretical result (7), these values converge respectively to the expectations (13.1) and (13.2). Similarly, the marginal posterior distributions which appear in the estimate expressions (14.1) and (14.2) are approximated as follows:
$$P_\Theta(\omega_t = i \mid y) \approx \frac{1}{s_n - s_0} \sum_{s = s_0}^{s_n} \chi(\omega_t(s) = i) \qquad (16)$$
where $\chi = 1$ if $\omega_t(s) = i$ and $\chi = 0$ otherwise. At the same time that we are running the Gibbs sampler algorithm for structure estimation at iteration $n+1$, we also perform the image reconstruction using the following decision rule:
$$\omega_t^{(n+1)} = i \quad \text{if} \quad P_{\Theta^{(n)}}(\omega_t = i \mid y) \ge P_{\Theta^{(n)}}(\omega_t = j \mid y), \;\; \forall j \ne i \qquad (17)$$
where the marginal posterior distribution is approximated as in (16). The statistical decision (17) is well known in signal processing (see for instance Chalmond(17)). Marroquin et al.(6) used it with the approximation (16) in the case of segmentation of m-ary images, and they called it the "Maximizer of the Posterior Marginal" (MPM).

We come back to the estimation of $\beta$ at the nth iteration of the algorithm. Considering the model (4), for every type $j$ of neighborhood, we get clearly:
$$-\operatorname{Log}(P_{1j}/P_{0j}) = \beta_1^{(1)} + \sum_{u \in V_t} \beta_{1,\omega_u}^{(2)}$$
where the function of $\beta$ on the right-hand side of this equality depends on the neighborhood type $j$. Replacing $\{P_{ij}\}$ by their current estimates $\{P_{ij}^{(n)}\}$, we obtain a set of linear equations with respect to $\beta$. Since there are more equations than the number of unknown parameters, a standard least squares solution for $\beta$ is appropriate. This approach was introduced in Ref. (3) and studied in Ref. (4), to estimate the parameters of an MRF using a non-noisy realization from its distribution. In their context, the method has a major drawback: the estimation of $P_{ij}$ using a single image can often give zero, and then the linear system is not defined. In our situation, this difficulty is removed since the estimate $P_{ij}^{(n)}$ of $P_{ij}$ is calculated using the Gibbs sampler.

D. Posterior Gibbs distribution

To implement the Gibbsian EM algorithm, we need to express the conditional probabilities of the posterior GD $P_\Theta(\omega \mid y)$ with respect to $\Theta = (\{P_{ij}\}, \varphi)$, in order to use the Gibbs sampler. Let $P_\Theta^*(\omega)$ denote the posterior GD. In the same way as the calculation of the prior conditional probabilities in (3), the posterior conditional probabilities take the form:
$$P_\Theta^*(\omega_t \mid V_t(\omega)) \propto \exp -\Delta_t U_\Theta^*(\omega \mid y) \qquad (18)$$
where
$$\Delta_t U_\Theta^*(\omega \mid y) = g_\varphi(y_t, \omega_t) - g_\varphi(y_t, 0) + \Delta_t U(\omega, \beta), \qquad \Delta_t U(\omega, \beta) = U(\omega, \beta) - U(\omega_{/t}, \beta).$$
$\Delta_t U(\omega, \beta)$ depends on the value of $\omega_t$ and the type of $V_t(\omega)$. Let us write $\Delta_t U(\omega, \beta) = \delta_{ij}$ if $\omega_t = i$ and $V_t(\omega)$ is of type $j$. To find the expression of (18), it suffices to know $\exp -\delta_{ij}$; but at every iteration of the Gibbsian EM algorithm we have the current estimate of the prior conditional probabilities $\{P_{ij}\} \subset \Theta$, such that:
$$P_{ij} = P_\Theta(\omega_t = i \mid V_t(\omega) \text{ of type } j) = \frac{\exp -\delta_{ij}}{\sum_{l=0}^{m-1} \exp -\delta_{lj}} \qquad (19)$$
from which it follows that
$$\exp -\delta_{ij} = P_{ij}/P_{0j}. \qquad (20)$$
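To fix ideas, here is a compact Python sketch of one iteration of the procedure in the Gaussian case, combining the posterior Gibbs sampler (18)-(20), the empirical approximations (15)-(16), the reestimation formulae (12.1), (14.1), (14.2) and the MPM rule (17). All function and variable names are ours; the neighbourhood-type indexing (the sum of the neighbouring labels, as in the binomial model) and the burn-in length are simplifying assumptions, not the paper's exact implementation.

```python
import numpy as np

def neighbours(r, c, H, W):
    """Free-boundary nearest neighbours of site (r, c)."""
    out = []
    if r > 0: out.append((r - 1, c))
    if r < H - 1: out.append((r + 1, c))
    if c > 0: out.append((r, c - 1))
    if c < W - 1: out.append((r, c + 1))
    return out

def gibbsian_em_iteration(y, omega, P_ij, mu, sigma, sweeps=50, burn_in=10, rng=None):
    """One iteration (E-step by Gibbs sampling + M-step + MPM labelling), Gaussian noise.

    y      : (H, W) observed grey-level image
    omega  : (H, W) current label image, values in 0..m-1
    P_ij   : (m, n_types) current prior conditional probabilities, n_types = 4*(m-1)+1
    mu, sigma : current Gaussian noise parameters
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = y.shape
    m, n_types = P_ij.shape
    gamma = np.zeros((m, H, W))          # empirical marginal posteriors (16)
    n_ij = np.zeros((m, n_types))        # empirical E[n_ij | y] as in (15.1)
    omega = omega.copy()

    for s in range(sweeps):
        for r in range(H):
            for c in range(W):
                j = sum(omega[q] for q in neighbours(r, c, H, W))  # neighbourhood type
                # posterior conditional (18): prior part exp(-delta_ij) = P_ij/P_0j by (20),
                # noise part exp(-(y_t - mu_i)^2 / 2 sigma^2)
                p = (P_ij[:, j] / P_ij[0, j]) * np.exp(-(y[r, c] - mu) ** 2 / (2 * sigma ** 2))
                p /= p.sum()
                omega[r, c] = rng.choice(m, p=p)
        if s >= burn_in:                 # accumulate statistics after "thermal equilibrium"
            for r in range(H):
                for c in range(W):
                    i = omega[r, c]
                    gamma[i, r, c] += 1
                    n_ij[i, sum(omega[q] for q in neighbours(r, c, H, W))] += 1

    gamma /= (sweeps - burn_in)
    n_ij /= (sweeps - burn_in)
    n_ij = np.maximum(n_ij, 1e-2)        # avoid zero estimates, cf. the epsilon trick of Section 4A

    # M-step: reestimation formulae (12.1), (14.1), (14.2)
    P_ij_new = n_ij / n_ij.sum(axis=0, keepdims=True)
    w = gamma.reshape(m, -1)             # weights gamma_t(i)
    mu_new = (w * y.reshape(1, -1)).sum(axis=1) / w.sum(axis=1)
    resid = (y.reshape(1, -1) - mu_new[:, None]) ** 2
    sigma_new = np.sqrt((w * resid).sum() / w.sum())

    # MPM decision rule (17)
    omega_mpm = gamma.argmax(axis=0)
    return omega_mpm, P_ij_new, mu_new, sigma_new
```

Iterating this routine from the starting values of Section 4A below, and stopping when the labelling no longer changes, reproduces in outline the procedure used in the experiments.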
4. EXPERIMENTAL RESULTS
In Section 4A we are concerned with the implementation of the Gibbsian EM algorithm and with its robustness, in the case of Gaussian noise, both for the reconstruction of m-ary images and for the segmentation of grey-level images. In Section 4B we examine briefly the case of degradation by noise distributed according to a Bernoulli law. In fact, from a rigorous standpoint, we should carry out a mathematical study of the local convergence of the algorithm. But, since the convergence problem is already very difficult for the standard EM algorithm (cf. Ref. (20) and its 162 references), it becomes much more difficult for the Gibbsian EM algorithm. Consequently, we have tried to understand its behaviour by running a number of experiments on artificial images and real images, using the Binomial prior model (6).

A. Gaussian noise

The implementation* concerns equations (12.1) and (14) for structure estimation, and decision rule (17) for image reconstruction. A Gibbs sampler routine is used for carrying out the approximations (15) and (16). At each iteration of the Gibbsian EM algorithm, the reestimation of $\Theta$ is obtained using a raster-scan pixel visiting mechanism for 50 sweeps of the Gibbs sampler, one sweep representing one full raster scan of the image lattice. As already mentioned for the standard EM algorithm, the choice of initial values for the parameters to be estimated is quite critical, especially for the noise parameters. In our current implementation, the standard choice is specified as follows: $P_{ij}^{(0)} = 1/m$, $\forall i, j$; $\mu_i^{(0)}$ is set equal to the $((i+1)/(m+1))$th quantile of the cumulated grey-level histogram of the observed image; $\sigma^{(0)} = (\mu_{m-1}^{(0)} - \mu_0^{(0)})/k(m-1)$ where $k = 2$ or 3. If $P_{0j}$ is very small, then its estimate obtained from equations (12.1) and (15.1), running the Gibbs sampler with 50 sweeps, may give $P_{0j}^{(n)} \approx 0$. In that case, we set $P_{0j}^{(n)} = \varepsilon$ where $\varepsilon$ is a small value ($\varepsilon = 0.01$ for instance), in order to be able to compute $\delta_{ij} = -\operatorname{Log}(P_{ij}^{(n)}/P_{0j}^{(n)})$ in (20).
* Fortran code and data of experiments are available.
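A minimal sketch of this initialization, under the assumption that the quantiles are taken on the empirical distribution of the observed grey levels (the array and function names are ours):

```python
import numpy as np

def initial_values(y, m, k=2):
    """Standard starting point of Section 4A: uniform P_ij, quantile means, crude sigma."""
    P_ij = np.full((m, 4 * (m - 1) + 1), 1.0 / m)          # P_ij^(0) = 1/m for all i, j
    q = [(i + 1) / (m + 1) for i in range(m)]
    mu = np.quantile(y, q)                                  # ((i+1)/(m+1))-th quantiles
    sigma = (mu[-1] - mu[0]) / (k * (m - 1))                # sigma^(0)
    return P_ij, mu, sigma

# Example with m = 2 labels on an observed image `y_obs` (hypothetical array):
# P0, mu0, s0 = initial_values(y_obs, m=2)
```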
This algorithm has been run extensively on artificial data and real data, and it has shown convergence in all cases. Of course, the quality of the reconstructed image depends on the signal-to-noise ratio and on the suitability of the prior MRF model to the original image. We now present some experiments.

Figure 2: the original image (Fig. 2(a)) is a realization of a binary MRF of size 32 x 32, generated by the Gibbs sampler from the conditional probabilities (5) with $\beta^{(1)} = 3.25$ and $\beta^{(2)} = -\beta^{(1)}/2$. This image was degraded by white Gaussian noise with $\sigma = 2.5$, $\mu_0 = 5$, $\mu_1 = 9$ and is shown in Fig. 2(b). This means that at every site $t$, the label $\omega_t$ is replaced by a value
$y_t$ drawn at random from a Gaussian law with mean $\mu_{\omega_t}$ and variance $\sigma^2$. Figure 2(c) is the grey-level histogram of the degraded image; we note here that it is unimodal. Figure 2(d) through Fig. 2(f) show the labeling obtained at the end of the 1st, 5th and 10th iterations respectively. Further iterations did not improve the reconstruction any more. Table 2 illustrates the accuracy of the parameter estimates. The initial conditional probabilities are $P_{ij}^{(0)} = 0.5$, $\forall i, j$. The initial values $\varphi^{(0)}$, calculated from the quantiles as described above, are $\mu_0^{(0)} = 5.5$, $\mu_1^{(0)} = 8.5$, $\sigma^{(0)} = 1.4$.
Fig. 2. Reconstruction of a binary MRF corrupted by Gaussian noise. (a) Binary MRF realization, (b) noisy binary MRF, (c) grey-level histogram, (d) 1st iteration, (e) 5th iteration, (f) 10th iteration.
Table 2. Parameter estimates of Fig. 2(b)

                 $\mu_0$   $\mu_1$   $\sigma$   $P_{1,0}$   $P_{1,1}$   $P_{1,2}$   $P_{1,3}$   $P_{1,4}$
True $\Theta$      5.0       9.0       2.5        0.04        0.16        0.50        0.83        0.96
$\Theta^{(0)}$     2.0      10.0       1.4        0.5         0.5         0.5         0.5         0.5
$\Theta^{(10)}$    4.8       9.1       2.4        0.03        0.19        0.52        0.81        0.94
In fact, we took $\mu_0^{(0)} = 2$, $\mu_1^{(0)} = 10$, $\sigma^{(0)} = 1.4$ to illustrate the algorithm's robustness.

Figure 3: the original image of size 32 x 32 (Fig. 3(a)) contains the character S displayed by our image processing system. The image was corrupted artificially by Gaussian noise with parameters $\sigma = 2$, $\mu_0 = 5$, $\mu_1 = 9$ (Fig. 3(b)). The algorithm converged at the 30th iteration.

Figure 4: the observed image (Fig. 4(a)) of size 32 x 32 is a digitized manuscript letter C. This image has not been artificially degraded. Figure 4 is the first illustration of the use of the Gibbsian EM algorithm for segmenting continuous grey-level images. The segmentation with two regions did not give sufficiently good results, because of the intrinsic nature of this image: a gradient of illumination and blurred edges between the letter C and the background, due to the diffusion of the black ink through the paper.
Fig. 3. Reconstruction of a character corrupted by Gaussian noise. (a) Original character, (b) noisy character, (c) grey-level histogram, (d) 1st iteration, (e) 5th iteration, (f) 30th iteration.
Fig. 4. Segmentation of a manuscript letter. (a) Input image, (b) grey-level histogram, (c) 1st iteration, (d) 6th iteration.
So, we have performed a ternary segmentation with labels 2, 1 and 0 for the background region, the intermediate region and the letter region respectively. Figure 4(d) is the segmentation resulting from the convergence of the algorithm at the 6th iteration.

Figure 5: the input image, which is simply Fig. 4(a) corrupted by white Gaussian noise, is shown in Fig. 5(a). The noise level was chosen so that $\sigma = 1.25\,\sigma_d$, where $\sigma_d$ is the empirical standard deviation of the differences $|y_i - y_j|$, $(i, j)$ neighbours. Figure 5(c) shows the initial segmentation with $\Theta^{(0)}$, while Fig. 5(d) shows the segmented image after 10 iterations.

Figure 6: Fig. 6(a) has been extracted from a cerebral digitized angiographic image and shows blood-vessels of different intensities on a slowly varying background. Here, our goal is to distinguish the main blood-vessel from the secondary ones. Thus, one could expect a ternary segmentation. But our experiments have shown that the resulting three-region segmentation is not sufficiently accurate, especially along the edges. This is due to the capillary blood-vessels, which blur the edges and add to the intensity variation of the background. Here, some remarks have to be made. Because the standard estimates $\{\mu_i^{(0)}\}$ increase monotonically with respect to the grey-level scale ($\mu_i^{(0)}$ is darker than $\mu_{i+1}^{(0)}$), and the binomial GD treats the labels as grey-levels, the segmented image can be seen as a grey-level image whose intensities correspond to those of the observed image. Furthermore, the binomial MRF model stresses region connectivity and encourages large cluster formation. More precisely, it can be deduced from (6) that the clusters can be large for the labels which correspond to the smallest and greatest means $\mu_i^{(0)}$ respectively (here, $i = 0$ and $i = 3$), while the other ones can be reduced to narrow strips along the frontiers between the large clusters. This property can easily be verified by generating realizations of the binomial GD (6) with $\beta^{(1)} = -2(m-1)\beta^{(2)}$. We take advantage of this property when the edges are blurred: strips will be located on the ramp of the edges. Let us come back to our experiment. We did not use the standard initial mean values as defined above, but designed new ones where the "black" label $i = 1$ is such that $\mu_2^{(0)} < \mu_1^{(0)} < \mu_3^{(0)}$. Figure 6(d) shows the final segmentation obtained at the 5th iteration. In this figure, the "black" labels are organized along the edges of the main blood-vessel and above the secondary blood-vessels. What is surprising is that the model appears to perform edge detection amazingly well.
Fig. 5. Segmentation of a noisy manuscript letter. (a) Noisy letter, (b) grey-level histogram, (c) 1st iteration, (d) 10th iteration.
Figure 7: the initial picture (Fig. 7(a)) of size 64 x 64 was extracted from an aerial image and shows fields separated by roads. To illustrate the behaviour of the Gibbsian EM algorithm for image segmentation, we started with very wrong initial values $\Theta^{(0)}$: $\mu_0^{(0)} = 0$, $\mu_1^{(0)} = \mu_2^{(0)} = 0.2$, $\sigma^{(0)} = 0.4$, instead of the initial values calculated from the quantiles, $\mu_0^{(0)} = -0.8$, $\mu_1^{(0)} = -0.35$, $\mu_2^{(0)} = 1$, $\sigma^{(0)} = 0.4$. It follows that the first reconstructed image $\omega^{(0)}$ is very poor, and we had to wait until the 30th iteration to recognize precisely all the roads.

Fig. 7. Segmentation of an aerial image. (a) Input image, (b) grey-level histogram, (c) 1st iteration, (e) 30th iteration.

B. Bernoulli noise
Let us examine the case of binary images degraded by an independent Bernoulli noise whose parameter is region-dependent, that is:
$$P_\varphi(y_t \mid \omega_t = i) = 1 - \varphi_i \;\;\text{if } y_t = i, \qquad = \varphi_i \;\;\text{if } y_t \ne i; \quad i = 0, 1.$$
In Appendix (c) it is shown that formula (12.2) is replaced by:
$$\varphi_i^{(n+1)} = \frac{\sum_{t \in S} \gamma_t^{(n)}(i)\, |y_t - i|}{\sum_{t \in S} \gamma_t^{(n)}(i)}. \qquad (21)$$
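A one-line illustration of (21), assuming `gamma` holds the approximated marginals $\gamma_t^{(n)}(i)$ of (16) and `y` the observed binary image (the array names are ours):

```python
import numpy as np

def bernoulli_update(gamma, y):
    """Reestimate the region-dependent flip probabilities phi_0, phi_1 via (21)."""
    m = gamma.shape[0]                                                 # here m = 2
    flips = np.abs(y[None, :, :] - np.arange(m)[:, None, None])        # |y_t - i|
    return (gamma * flips).sum(axis=(1, 2)) / gamma.sum(axis=(1, 2))
```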
We now focus our attention on the restoration of strongly-clustered images (see Section 2A), and especially on character pictures. The prior model is the Bernoulli model (5). The standard choice of the initial values is specified as follows: $\varphi_0^{(0)} = \varphi_1^{(0)} = 0.35$ is the greatest value for which we think that the restoration problem is feasible; the $\{P_{1,j}^{(0)}\}$ are values which characterize strongly-clustered images, such that $P_{1,2}^{(0)} = 0.5$, $P_{1,3}^{(0)} = 1 - P_{1,1}^{(0)}$, $P_{1,4}^{(0)} = 1 - P_{1,0}^{(0)}$ ($P_{1,0}^{(0)} = 0.02$, $P_{1,1}^{(0)} = 0.12$ for instance). The following example illustrates the behaviour of the Gibbsian EM algorithm, starting with these initial values.

Figure 8: the original image (Fig. 8(a)) contains the Greek character $\beta$. This image was corrupted artificially by Bernoulli noise with parameters $\varphi_0 = 0.2$, $\varphi_1 = 0.1$ (Fig. 8(b)). The algorithm converged at the 6th iteration and Table 3 gives the estimated parameters. In this table, $\{P_{1,j}^{(0)}\}$ corresponds to the parameter value $\beta^{(1)} = -2\beta^{(2)} = 4$ in the prior model (5), whereas $\{P_{1,j}^{(6)}\}$ corresponds to the approximate value 5.5.

Let us comment on the choice of the initial values. It is much more sensitive than in the Gaussian case, because now the prior model and the noise model are both defined by the same model, that is, the Bernoulli model (5), with $\beta^{(2)} = 0$ for the noise. It follows that it is impossible, without prior knowledge, to distinguish a non-noisy weakly-clustered image from a very noisy strongly-clustered image, as illustrated respectively by Fig. 1(a) and Fig. 1(c).
Fig. 6. Segmentation of an X-ray image. (a) Input image, (b) grey-level histogram, (c) 1st iteration, (d) 5th iteration.
Figure 1(c) is Fig. 8(a) corrupted by Bernoulli noise with $\varphi_0 = \varphi_1 = 0.3$. If we try to restore this degraded image, starting with $\{P_{1,j}^{(0)} = 0.5\}$, then the algorithm converges to small values for the noise parameters and to conditional probability values which characterize a weakly-clustered image such as Fig. 1(a).

5. CONCLUDING REMARKS

This paper addresses the problem of obtaining an unsupervised parameter estimation based on a realization of a noisy m-ary image: both the probabilistic structure of the MRF which models the original image and the noise parameters are simultaneously estimated. Furthermore, the reconstruction of the original image is performed at the same time. As of now, very few results on this general problem exist in the literature; see for example the recent report, Ref. (23). We refer also to Besag.(24) In this paper, Besag has suggested a faster algorithm (the so-called Iterated Conditional Modes) than the algorithms based on the MPM criterion, which provides a configuration $\hat\omega$ realizing a local maximizer in $\omega$ of $P_\theta(\omega \mid y)$. When $\theta$ is unknown, he sketches an iterative procedure which is an approximation to a general algorithm related to the EM algorithm, as remarked by Kiiveri.(25)

Our scheme is related to the classical EM algorithm, which has been extensively used during the last two decades. The success of the EM algorithm lies in its computational simplicity and its effectiveness: for example, this algorithm achieves the best solution for the well-known problem of estimating the components of a distribution mixture. This algorithm is mathematically grounded because it is designed to approximate the maximum likelihood estimates. Unfortunately, it does not converge if the initial estimates are chosen too far from the true parameters, as was shown recently. Some of these remarks are still valid for the Gibbsian EM algorithm. Our approach has consisted in defining the algorithm formally and then in experimenting with artificial data and real data. In the case of Gaussian noise with constant variance, our algorithm appears to be robust and effective for image reconstruction and image segmentation even when the models do not represent the data well. On the contrary, under the hypothesis of unequal variances, there are important considerations to be aware of in using this technique.(26) Further study of these issues is worthwhile. Contrary to the Gaussian noise, in the case of binary noise with region-dependent parameter, the choice of the initial values is crucial. In this paper we have not focussed our attention on the modeling of images by MRFs:
a particular class of MRF has been considered to illustrate our approach. Yet, better results could be obtained in some of our examples if another class of MRF models had been designed. More generally, systematic methods have to be determined for matching MRF models to classes of image data. Finally, the Gibbsian EM algorithm is quite computationally costly, but the Gibbs sampler can be realized using a parallel computer architecture.

SUMMARY
This paper deals with the problem of the restoration of m-ary images corrupted by independent noise.
Fig. 8. Reconstruction of a character corrupted by binary noise. (a) Input image, (b) noisy character, (c) 1st iteration, (d) 2nd iteration, (e) 3rd iteration, (f) 6th iteration.
Table 3. Parameter estimates of Fig. 8(b)

                 $\varphi_0$   $\varphi_1$   $P_{1,0}$   $P_{1,1}$   $P_{1,2}$   $P_{1,3}$   $P_{1,4}$
True $\Theta$       0.2           0.1           *           *           *           *           *
$\Theta^{(0)}$      0.35          0.35          0.018       0.119       0.50        0.881       0.982
$\Theta^{(6)}$      0.21          0.12          0.005       0.05        0.51        0.944       0.994

The original image is coded by a small number of labels: 2 or 3 ... or m; such an image is called respectively binary or ternary ... or m-ary. The observed image is a noisy version of this image, whose noise is signal-dependent. The original image is modeled by a Markov Random Field (MRF) or, equivalently, by a Gibbs Distribution. Both the MRF parameters and the probabilistic structure of the noise are unknown. This paper presents a stochastic iterative algorithm which performs, in a Bayesian context, the parameter estimation and the image reconstruction at the same time. As of now, very few results on this general problem exist in the literature. Mostly, the authors overcome this problem by performing a supervised parameter estimation. Here, the procedure is unsupervised and it is a generalization to the MRF context of a general algorithm, known as the EM algorithm, used to approximate maximum-likelihood estimates from incomplete data. Doing so, we take advantage
of the flexibility of the MRF models for characterizing the class of images under consideration. This generalization is not straightforward. The main obstacle to the practical application of this algorithm lies in the formidable computational cost associated with the exact computation of some posterior mathematical expectations. These expectations are calculated by a Monte Carlo technique based on the so-called Gibbs sampler routine. Besides, the efficient solution of this restoration problem is relevant for image segmentation, which can be reduced to the problem of reconstructing an m-ary image degraded by noise. The algorithm that we develop in this paper is called the Gibbsian EM algorithm, and it is tested with artificial data and real data. In the case of Gaussian noise with constant variance, this algorithm appears to be robust and effective for image reconstruction and segmentation even when the model does not fit the data well. This algorithm also appears useful for the restoration of "strongly-clustered" binary images corrupted by binary noise.

REFERENCES
1. D. Geman and S. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-6, 721-741 (1984).
2. G. Wolberg and T. Pavlidis, Restoration of binary images using stochastic relaxation with annealing, Pattern Recognition Lett. 3, 375-388 (1985).
3. H. Derin and H. Elliott, Modelling and segmentation of noisy and textured images using Gibbs random fields, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-9, 39-55 (1987).
4. A. Possolo, Estimation of binary random Markov field, Technical Report No. 77, University of Washington, Seattle (1986).
5. H. Derin, Estimating components of univariate Gaussian mixtures using Prony's method, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-9, 142-148 (1987).
6. J. L. Marroquin, S. Mitter and T. Poggio, Probabilistic solution of ill-posed problems in computer vision, J. Am. Statist. Ass. 82, 76-89 (1987).
7. B. Chalmond, Image restoration using an estimated Markov model, Signal Process. 15 (1988).
8. A. Dempster, N. Laird and D. Rubin, Maximum-likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. B-39, 1-38 (1977).
9. L. Baum, T. Petrie, G. Soules and N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Statist. 41, 164-171 (1972).
10. S. Levinson, L. Rabiner and M. Sondhi, An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition, B.S.T.J. 62, 1035-1074 (1983).
11. P. Devijver and M. Dekessel, Learning the parameters of a hidden Markov random field image model: a simple example, Pattern Recognition Theory and Applications, P. Devijver and J. Kittler, Eds. Springer, Heidelberg (1987).
12. D. Griffeath, Introduction to random fields, in Denumerable Markov Chains, J. Kemeny, L. Snell and A. Knapp, Eds, pp. 425-458. Springer, New York (1976).
13. J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Statist. Soc. B-36, 192-236 (1974).
14. G. R. Cross and A. K. Jain, Markov random field texture models, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-5, 25-39 (1983).
15. X. Guyon, Estimation d'un champ par pseudo-vraisemblance conditionnelle: étude asymptotique et application au cas Markovien, Spatial Processes and Spatial Time Series Analysis, Proc. 6th Franco-Belgian Meeting of Statisticians, Bruxelles, D. Droesbeke, Ed. (1987).
16. S. Geman and C. Graffigne, Markov random field image models and their applications to computational vision, Proc. Int. Congr. Math., A. M. Gleason, Ed., Am. Math. Soc. (1987).
17. B. Chalmond, A Bayesian detection-estimation for a shift of the mean in an autoregressive process, Proc. Comput. Statist., Part II, Wien, Physica (1982).
18. S. Geman and D. McClure, Bayesian image analysis: an approach to single photon emission tomography, Proc. Am. Statist. Ass., Statist. Comput. Sect., 12-18 (1985).
19. J. Haslett, Maximum likelihood discriminant analysis on the plane using a Markovian model of spatial context, Pattern Recognition 18, 287-296 (1985).
20. R. Redner and H. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev. 26, 195-239 (1984).
21. A. Rosenfeld and A. Kak, Digital Picture Processing, Academic Press, New York (1982).
22. R. Bates and M. McDonnell, Image Restoration and Reconstruction, Clarendon Press (1986).
23. L. Younes, Parametric inference for imperfectly observed Gibbsian fields, Université Paris-Sud, Preprint (1988).
24. J. Besag, On the statistical analysis of dirty pictures, J. R. Statist. Soc. B-48 (1986).
25. H. T. Kiiveri, Discussion of Besag's paper: On the statistical analysis of dirty pictures, J. R. Statist. Soc. (1986).
26. B. Chalmond, An iterative Gibbsian technique for simultaneous structure estimation and reconstruction of m-ary images, Université Paris-Sud, Preprint 88-28 (1988).
27. S. Laksmanan and H. Derin, Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing, IEEE Trans. Pattern Analysis Mach. Intell. (to appear).
APPENDIX
(a) E-step: calculation of the expectation $Q(\Theta \mid \Theta')$

$$Q(\Theta \mid \Theta') = E[\operatorname{Log} \tilde P_\Theta(X) \mid y, \Theta'] = \sum_\omega [\operatorname{Log} \tilde P_\Theta(\omega, y)]\, P_{\Theta'}(\omega \mid y)$$
and from (11.1):
$$= \sum_{i,k} E[m_{ik} \mid y, \Theta'] \operatorname{Log} P_i(k) + \sum_{i,j} E[n_{ij} \mid y, \Theta'] \operatorname{Log} P_{ij}. \qquad (A.1)$$

(b) M-step: maximization of $Q(\Theta \mid \Theta')$ with respect to $\Theta$ under the constraints $\sum_k P_i(k) = 1$, $\forall i$, and $\sum_i P_{ij} = 1$, $\forall j$

We solve it by the classical method of Lagrange multipliers. Let $\mathbb{L}(\Theta)$ be the Lagrangian of $Q$ with respect to the constraints:
$$\mathbb{L}(\Theta) = Q(\Theta \mid \Theta') + \sum_i \lambda_i \Bigl(1 - \sum_k P_i(k)\Bigr) + \sum_j \nu_j \Bigl(1 - \sum_i P_{ij}\Bigr)$$
where the $\{\lambda_i\}$ and $\{\nu_j\}$ are the undetermined Lagrange multipliers. At a critical point of $\mathbb{L}$ on the interior of the manifold defined by the constraints, we get $\forall i, \forall j, \forall k$:
$$\partial \mathbb{L}(\Theta)/\partial P_i(k) = 0 \qquad (A.2)$$
$$\partial \mathbb{L}(\Theta)/\partial P_{ij} = 0 \qquad (A.3)$$
from where it can be seen simply that $\mathbb{L}$ is maximized when (12.1) and (12.2) hold.

(c) Noise parameter reestimation

The parameters are $\Theta = (\{P_{ij}\}, \{P_i(k)\})$; $\{P_i(k)\}$ is now replaced by $\varphi$. In (A.1), we put $Q(\Theta \mid \Theta') = S_1(\varphi) + S_2(\{P_{ij}\})$ with evident notations. The Lagrangian equations (A.3) are unchanged, but not equations (A.2). So, we have to express $S_1(\varphi)$:
$$S_1(\varphi) = \sum_\omega \Bigl[\sum_t \operatorname{Log} P_\varphi(y_t \mid \omega_t)\Bigr] P_{\Theta'}(\omega \mid y) = \sum_i \sum_t \operatorname{Log} P_\varphi(y_t \mid \omega_t = i)\, \gamma_t(i)$$
where
$$\gamma_t(i) = \sum_{\omega:\ \omega_t = i} P_{\Theta'}(\omega \mid y) = P_{\Theta'}(\omega_t = i \mid y).$$
It follows that equations (A.2) reduce to: for every $\varphi_q \in \varphi$,
$$\frac{\partial Q(\Theta, \Theta')}{\partial \varphi_q} = \sum_i \sum_t \Bigl[\frac{\partial \operatorname{Log} P_\varphi(y_t \mid \omega_t = i)}{\partial \varphi_q}\Bigr] \gamma_t(i) = 0$$
where
$$\operatorname{Log} P_\varphi(y_t \mid \omega_t = i) = -\operatorname{Log} \sqrt{2\pi}\,\sigma - (y_t - \mu_i)^2/2\sigma^2 \qquad (A.4)$$
if the noise is Gaussian, and
$$\operatorname{Log} P_\varphi(y_t \mid \omega_t = i) = |y_t - i| \operatorname{Log} \varphi_i + (1 - |y_t - i|) \operatorname{Log}(1 - \varphi_i) \qquad (A.5)$$
if the noise has a Bernoulli distribution. The solution is straightforward and gives (14.1), (14.2) and (21) when we consider the special forms (A.4) and (A.5), respectively.
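As an illustration of step (b) (a verification we spell out here from (A.1) and (A.3)), consider the $\{P_{ij}\}$ for a fixed type $j$:
$$\frac{\partial \mathbb{L}}{\partial P_{ij}} = \frac{E[n_{ij} \mid y, \Theta']}{P_{ij}} - \nu_j = 0 \;\Rightarrow\; P_{ij} = \frac{E[n_{ij} \mid y, \Theta']}{\nu_j}, \qquad \sum_i P_{ij} = 1 \;\Rightarrow\; \nu_j = \sum_i E[n_{ij} \mid y, \Theta'],$$
which is exactly the reestimation formula (12.1); the same computation on the $P_i(k)$ gives (12.2).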
About the Author -- BERNARD CHALMOND graduated in mathematics from the University of Paris, and he received the "Thèse de 3ème cycle" and the "Thèse de doctorat en sciences" in Applied Mathematics from the University of Paris-Sud, Orsay, in 1979 and 1988, respectively. After spending five years as Assistant Professor of Medical Informatics at the University Medical Center of Dijon, he joined the University of Paris-Sud in 1984. Presently, he is "Maître de Conférences" in Stochastic Models and Statistics. His research interests include pattern recognition, image processing and applied stochastic processes.