Cluster variation method and image restoration problem

Cluster variation method and image restoration problem

- 17 July 1995 PHYSICS LETTERS A Physics Letters A 203 (1995) 122-128 Cluster variation method and image restoration problem Kazuyuki Tanaka a, T...

547KB Sizes 2 Downloads 123 Views

-

17 July 1995 PHYSICS

LETTERS

A

Physics Letters A 203 (1995) 122-128

Cluster variation method and image restoration problem Kazuyuki Tanaka a, Tohru Morita b a Department of Computer Science and Systems Engineering, Muroran Institute of Technology, 27-1 Mizumoto-cho, Muroran 050, Japan h Department ($ Computer and Mathematical Sciences, Graduate School of Information Sciences. Tohoku University, Sendai 980-77, Japan

Received 4 April 1995; accepted for publication 10 May 1995 Communicated by A.R. Bishop

Abstract The pair approximation model with a non-uniform

in the cluster variation method is applied to the image restoration external field.

Recently, some concepts and methods of statistical physics have been applied to information and computer science. Particularly, one of the fields, in which equilibrium statistical physics is directly applied to information science, is the Bayesian approach based on the Gibbs (Boltzmann) distribution at finite temperature. Geman and Geman suggested that the Ising model is applicable to image restoration based on the Bayesian formula [ 11. This problem corresponds to searching for the ground state of a finite-size Ising model under a non-uniform external field. Geman and Geman recovered a damaged image by using simulated annealing of a spin-S Ising model. After that, Gidas proposed a new method based on a combination of the renormalization group technique and the simulated annealing method [ 21, and Zhang introduced the mean-field annealing method for studying this problem [ 31. In statistical physics, one of the effective field approximations, into which the mean field approximation is extended, is the cluster variation method (CVM) [4,5]. In the Ising model on a square lattice, the CVM can provide us with such a sequence of approximations so that the series of approximate free energies converge to the exact one [6-81. The

problem based on the king

lowest-level approximation in the CVM is called pair approximation (PA), which is equivalent to the Bethe approximation [ 91. The purpose of this paper is to give a basic algorithm for the application of the PA in the CVM to the image restoration problem and to study its usefulness. We compare the results obtained from the PA annealing method with the ones obtained from the simulated annealing and mean-field annealing methods. We consider a square lattice consisting of L = Nx N lattice sites. A set of lattice sites is called a cluster. The whole cluster of all lattice sites is denoted by the label I. The image data is given by a black and white image consisting of faces on the lattice I, where every lattice site is black or white. The black and white configurations of a lattice site are represented by the values 1 and - 1, respectively. The original image data is damaged by a random noise and then we have a damaged image data different from the original one. We try to recover the original data from the given damaged data by using the Bayesian formula. Bayesian formula. The conditional probability distribution for event B under the condition that event A

0375-9601/95/$09.50 @ 1995 Elsevier Science B.V. All rights reserved SSDlO375-9601(95)00387-8

K Tanakn, T. Morita /Physics

occurred and the probability distribution for the occurrence of event A are given by P(BIA) and P(A), respectively. Then the conditional probability distribution for event A under the condition that event B occurred, P( A (B), can be calculated by the following formula,

P(BlA)P(A) P(A’B)

(1)

= C,P(B(A)P(A)’

Here, the summation events A.

In the original image data, the configuration on the ith lattice site is denoted by si and takes the values 1 and - 1. In the damaged image data, the configuration on the ith lattice site is denoted by hi and takes the values 1 and -1. When an original image data SL} is given, the conditional probSI = {S,,SZ,..., ability distribution for a damaged image data hl = hi} is denoted by P( his,). The proba{h,hZ,...> bility distribution for the original image data SI is denoted by P(s,). In this case, the Bayesian formula is written as P(hI~I)P(~I) P(sr’hr)

=

-&

P(h,(s,)P(s,)

*

(2)

The notations C,, and Ch, mean summations over all possible configurations on the whole lattice sites I,

czc 5,&I

s,

csc /II

c...c, s2=*1

S,*=iI

c

(hi - Sj)2P(

hIIS,)

I

4

cc

= CTj

(4)

h

The sum xi gi is the average of the total number of lattice sites i where hi is different from Si. When the total deviation L@ = xi Oi is given, we consider a canonical ensemble of systems with a given ~1, and the

is deter-

P(h,ls,),

(5)

condition

(hj - sJ2P(h,)s,)

= 32,)

(6)

hl

and the normalization cP(h,ls) 81

condition (7)

= 1.

Introducing the Lagrange F”),G’~, we obtain

P(hlsd = exp

F”’

_

multipliers

Ci(hi-si)’ I

l/T,

and

)1 /Tt t

(8)

where l/T,

and F”)/T,

are determined

by (6) and

(7). Next we consider the distribution function of the original image data SI. Here, we assume that the original image consists of faces, where S; for a lattice site i is usually the same as Sj for j around i as a priori bias. We denote the probability that si and Sj are different for the nearest neighbors i and j by ai,i in the set of ~1, so that (Si -

S,j)*P(SI)

= Oij,

(9)

s,

(3)

I%=*1

P(hrls,)ln 111

under the subsidiary

4 c

For an original image data ~1, we have a damaged image data h,, and denote the conditional probability distribution of ht for the original image data st by P( h,js,). We denote the probability that the damaged image data hr is deviated from the original SI at lattice site i by (+i,

4’

S”’ E -C

L

c...c. /I,=+1 h2=fI

conditional probability distribution P( h/s,) mined so as to maximizing the entropy,

i

CA is taken over all possible

123

Letters A 203 (1995) 122-128

for the nearest-neighbor pair of the lattice sites i and j. We use the notation Co, meaning summation over all nearest-neighbor pairs of the lattice sites i and j. The sum cCij, Uij is the total number of nearest-neighbor lattice sites i and j where si and Sj are different from each other. &,, Uij means the sum of the lengths of the boundaries between black faces and white faces in the original image data. As SI, we shall consider those in which cCij, Vij takes the value L252, where L2 is the total number of pairs of nearest-neighbors and is equal to 2( L - N) in the present lattice. With this restriction, we require P( SI) to fulfill

cc

$

(ii)

.m

(Si -

Sj)*P(SI)

= L2*7,7

(10)

124

K. Tanaka, T. Morita / Physics Letters A 203 (1995) 122-128

and

c

P(q)

properties. The average for the quantity Q in the total system I is defined by

= 1.

(11) (16)

Introducing

a canonical

ensemble,

and making the en-

tropy

Sc2’= - C

P(sr)ln

(12)

P(si)

s,

maximum with the subsidiary (ll), weobtain P(q)

F’2’ -

= exp

conditions

i(si-ss.i)2 c (ij)

(10)

1

/T, >

and

(13)

1

where l/T2 and F(2)/T2 are the Lagrange multipliers to be determined by ( 10) and ( 11) . TZis the parameter for the sum of the lengths of the boundaries in the original image data. By substituting (8) and ( 13) into (2), we obtain

P(s~lh) = exp

K )I FI -

+JC(Si -

Sj)2

Cij)

--

:

c i

(hi

--

sij2 IT

(14)

(

where J and T are defined by J = Tj /Tz and T G TI. T/J is a parameter for the sum of the lengths of the boundaries in the original image data and T is a parameter for the fraction of error due to noise. The ratio TI /T2 is approximately equal to arctanh( 1 2e-1) /arctanh( 1 - 282), and hence we can determine J from the given image datas {hi} and {si}, by estimating (~1 and i?z, where Crl is the fraction of error i/hi - s;j and 52 is the average of ilsi - sij for the nearest neighbors i and j in the image. These values @I and ii2 are recovered from the damaged data with a priori bias for the original image. It may be appropriate to try various values of J to see whether they give a reasonable result. When hl is given, we seek st which makes P( SI\ hI) maximum. This amounts to finding the ground state of the Hamiltonian given by Hi

3

fJC(Si-Sj)2+~C(hi-S~)2. (ij)

If we have a unique minimum

(15) i

configuration, we expect that the ground state determines the low-temperature

If(si)tE+l or-l,wecanuse{(si)i, (s2)1,. . . , (s~)I} instead of the ground state configuration st. We shall use the CVM to calculate { (si)r, (s&, . . . , (sL),} at low temperatures. In order to recover an original image data from a damaged image data hl, we search for the configuration st which gives the maximum value of the probability distribution P( srlhI) under a given hl. This probability distribution is equivalent to the distribution for an Ising model on a finite size square lattice under a non-uniform external field ht and we adopt it as the posterior probability distribution giving a posterior image when the degraded image is given. The image restoration problem is reduced to searching for the ground state or the low-temperature averages ((si)t } of the Ising model. In the Ising model under a uniform external field (h, = 1 for any i) on an infinite-size (L -+ -too) regular lattice, the ground state configuration can be exactly obtained when the system has translational and reflectional symmetries [lo]. Since the Ising model treated in the image restoration problem has no translational and reflectional symmetries, we must treat all possible 2L configurations. It is known that low-temperature properties can be achieved by the process of annealing. In this process, we now propose to use an approximation of the CVM. Hereafter, we write P (s,j hl) as pt (st) . We now note that the probability distribution ~t( si) given by ( 14) satisfies the following variational principle, FI =min,,

(Et -TSl),

(17)

Et = trt HI(~I)~I(~I),

(18)

where

and .SI = -trr pi(si)ln

PI(Q).

(19)

Here the notation tri stands for c,,. Since s’ = 1 and hf = 1, H,( ST) given by ( 15) can be replaced by HI

= -JCS~S,~ (ij)

- Chisi. I

(20)

K. Tanaka, T. Morita / Physics Letters A 203 (1995) 122-128

125

j. Clusters are labeled as 1y, which represents i or ij in the PA. If the whole cluster of all lattice sites is as shown in Fig. la, the sets of nearest-neighbor pair clusters ij and of single lattice site clusters i are given in Figs. lb and lc. The set of nearest-neighbor pair clusters ij is called the basic cluster set and is denoted by B. The single lattice site clusters i are all the nonempty clusters consisting of common sites of two or more clusters in B. For each cluster (Y, the set of random variables are denoted by s,. Explicitly, we write Sij = {~i,.sj}. The probability distributions for the clusters ij and i, are defined by the following rePij(si,sj) and dsi), ducibility equations,

(a>

ml2

l--bf-t8 2

1

3

4

(b)

]6?

15:

1112

1011

910

17:

1812

ITJ233yJ48 m-m cc> 0

0

9

5

l

0

10

6

0

0

11 l

7

l

12

Pij(Si9

Sj)

-

Pi(Si)

E trr\i

Q\ij

pI(sI)9

PI(Q),

(22)

respectively. Here, trr\ij and trt\i are summations over all possible configurations on the whole lattice sites except that the configurations for the nearest-neighbor pair of lattice sites ij and the single lattice site i, respectiveIy, are fixed. The averages of a quantity Q in the cluster (Y= i, j, ij, I are defined by

(Q)a = tr, Qpa(s,)

(a = i, j,ij,I>.

Using the reducibilities, as

the energy .!Zrcan be rewritten

EI = EPaii,{Pi3 Pij}

8

!E -

C

J(SiSj)ij

-

1

a

2

3 l

0

4

calculation is taken with respect to function pt(st) under the subsidiary

= 1.

(21)

Let cluster i be at a single lattice site i. Let cluster ij be at a nearest-neighbor pair of the lattice sites i and

entropy

Spair{Pi,Pij}

~P~r{Pi,Pij}-_CSi+C(Sij-S,-S,),

is

(25)

L

(ii)

where S, f -tr,

p,(s,)ln

The variational

p,(s,).

principle

fi = minpi,pi, (EP&i.

trr Pl(Sl)

(24)

hi(S

In the PA, the approximate given by

Fig. I. Example of clusters I, ij and i. (a) The whole cluster of all lattice sites I. (b) The nearest-neighbor pair clusters ij, These clusters are subclusters of I. The set of all clusters is called the basic cluster set and is denoted by B. (c) The single lattice site clusters i. These ate all clusters consisting of common sites of two or more clusters in the basic cluster set B.

The variational the distribition condition

C

i

(G) l

(23)

(26)

in the PA is Pij)

--TsPtir{Pi9

Pij)).

(27)

The variational calculation is taken with respect to the distribution functions pi and pij under the subsidiary conditions

126

K. Tanaka, T. Morita /Physics

(b)

Letters A 203 (1995) 122-128

(e>

r

8r Fig. 2. Image restoration by PA annealing for / = 0.70. (a) Original image data (82 = 0.07539). (b) Damaged image data (Cq = 0.21052). (c) T = 4.00 (R = 0.18213, RI = 0.02978, Rz = 0.28805). (d) T = 3.00 (R = 0.12327. RI = 0.08864, R2 = 0.19275). (e) T = 2.00 (R=0.08310, R, = 0.14404, R2 =0.12020). (f) T=0.40 (R= 0.05956, RI =0.18490, RZ =0.07752).

K. Tanaka, T. Morita / Physics Letters A 203 (I 995) 122-128

tr, p,(s,)

(Sj)j =

(a = i,j,ij),

= 1

(28)

127

(a)

(Sj)j = (Sj)ij,

(Si)ij,

for every pair of nearest-neighbor lattice sites i and j. We remark that the consistency conditions (29) are equivalent to the reducibility conditions pi(si) = trj pij(si,Sj) and Rj(Sj) = tri Rij(Si,Sj) [II]. By variational calculations, the one-body distribution functions pi( si) and Rj (Sj) and two-body distribution functions Rij (Sir sj) are expressed as

hi,isi)/Tl,

Pi(Si) =exp](E+

Pj(Sj>= exP[(Fj + hj,jsj)/Tlv

(30)

(b)

and pij (Si, Sj) = exp [ (Fij + Js~s,~+ hi,ijsi + hj,ijSj)

/T] . (31)

The unknown parameters Fi, Fj, Fij, hi,ij, h,ij, hi,i and hj,j are determined by (28) and (29) and the following relations,

( l - Zi) hi,i + (1 -

Zi)hj,j

C’hi,ij

+

hi,

=

C'hj,ij

=

hj.

(32)

i

Here, zi is the number of nearest-neighbor lattice sites of the site i and Cj’ is the summation over all the nearest-neighbor lattice sites j of the fixed i site. When the original image data given in Fig. 2a is changed to the damaged image data with a random noise as given in Fig. 2b, the process with which the damaged image data is recovered is shown in Figs. 2c-2f, by decreasing the parameter T using the PA. The set of original image data is denoted by {so”‘}. The measures of the restoration rates R, RI and R2 are defined by RE

c

&[S/‘)

- Sign((si)i)]2,

i

R1 E C

&[hi

- sign((si)i)]*,

i

R2 E x-I-[sign((si)i) (~) 1 4L2

- sign((sj)j)12,

(33)

Fig. 3. Restoration by mean-field and simulated annealings for J = 0.70 and T = 0.40: (a) Mean-field annealing (R = 0.06094, RI = 0.18006, RZ = 0.08037); (b) simulated annealing (R = 0.06163, RI = 0.18906, Rz = 0.07468).

respectively. When the original image data is perfectly recovered, we have R = 0, RI = 51 and R2 = 32. We start from the initial value, T = 5.0, and decrease to the final one, T = 0.40. This operation is done by an annealing procedure based on the PA, which may be called PA annealing procedure. We can see that the obtained distribution for sign( (Si)i) at T < 0.50 corresponds to the most probable configuration si. In the original image given in Fig. 2a, 82 = 212/2812-0.07539. In the damaged image given in Fig. 2b, 5-1~0.21053. If we try to estimate the values of J/T and T, we have J/Tzarctanh( 1-2a2) E 1.2533 and T = l/arctanh(l - 2Z?r)~l.5132 and hence J-1.8965. We need to know the properties below T less than 1.5132. If we reduce T one third of this

128

K. Tanaka, T. Morita /Physics

value, we require J/T-l .2533 in the result, and hence we choose about one third of the value 1.8965 for the value of J. In Fig. 3, we give the final results which were recovered by using a simulated annealing and a meanfield annealing based on the following simultaneous non-linear equations, (S;)i = tanh

hi + C’J(sj)j j

>

1

/T .

In the simulated annealing method, we have adopted the heat bath method and took 10000 steps as the Monte Carlo step. In this paper, we apply the PA in the CVM to image restoration problems and compare the results with those obtained by using the simulated and the meanfield annealings. The results obtained from the PA are closer to the ones from simulated annealing than to those from mean-field annealing. This fact means that the PA annealing is better than the mean-field annealing for searching for the most probable configuration in the Ising model given in (15) or (20). There are two important points: (i) The most probable configuration must be calculated for a posterior probability distribution as correctly as possible. (ii) The most suitable model must be adopted as a posterior probability distriution. In the CVM, the result is expected to improve successively as we adopt a larger basic cluster set. The generalized algorithm of the CVM for the Ising model with finite-range interactions under a non-uniform ex-

Letters A 203 (1995) 122-128

ternal field is given by Morita [ 121. In the restoration of more complicated images, a posterior probability distribution must include second-neighbor pair interaction terms and three- or four-body interaction terms. When treating these models, we need to adopt the CVM with a larger basic cluster set, for example, the square approximation. Moreover, if we adopt the Ising model which includes a line process [ I] as a posterior probability distribution, we will obtain more clearly recovered image datas. One of the authors (K.T.) is grateful to Professors Junji Maeda and Yukinori Suzuki for valuable discussions.

References [ I ] S. Geman and D. Geman, IEEE Trans. Pattern Anal. Mach. Intell. 6 ( 1984) 72 1. [ 21 B. Gidas, IEEE Trans. Pattern Anal. Mach. Intell. 11( 1989) 164. [ 31 J. Zhang, IEEE Trans. Signal Process. 40 (1992) 2570. [4] R. Kikuchi, Phys. Rev. 81 (1951) 988. [ 51 T. Morita, J. Phys. Sot. Japan 12 (1957) 1060; J. Stat. Phys. 59 ( 1990) 819; Prog. Theor. Phys. Suppl. 115 ( 1994) 27. [6] A.G. Schlijper, Phys. Rev. B 27 (1983) 6841; J. Stat. Phys. 40 (1985) I. [ 7 J R. Kikuchi, Prog. Theor. Phys. Suppl. 115 ( 1994) 1. [S] T. Morita, Prog. Theor. Phys. Suppl. 80 ( 1984) 103. [9] H.A. Bethe, Proc. R. Sot. A 150 (1935) 552. [lo] T. Morita, Physica A 133 (1985) 173. [ 111 T. Morita, J. Stat. Phys. 34 (1984) 319. ( 121 T. Morita, Prog. Theor. Phys. 85 (1991) 243; Phys. Lett. A 161 (1991) 140.