Multiplicative iterative algorithms for convex programming


P. P. B. Eggermont
Centre for Mathematics and Computer Science, PO Box 4079, 1009 AB Amsterdam, The Netherlands, and Department of Mathematical Sciences, University of Delaware, Newark, Delaware 19716

Submitted by Tommy Elfving

ABSTRACT

We study multiplicative iterative algorithms for the minimization of a differentiable, convex function defined on the positive orthant of $\mathbb{R}^N$. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, then these algorithms converge to a solution of the minimization problem. Moreover, the convergence is nearly monotone in the sense of Kullback-Leibler divergence. Special cases of these algorithms have been applied in positron emission tomography and are formally related to the EM algorithm for positron emission tomography.

1. INTRODUCTION

In this paper we study multiplicative iterative algorithms for the minimization problem

\[
\text{minimize } l(x) \quad \text{subject to } x \ge 0, \tag{1.1}
\]

where $l$ is a convex, continuously differentiable function on $\mathbb{R}^N$ with compact level sets and locally Lipschitz continuous gradient. The interest in such algorithms is sparked by the emergence of the EM algorithm for maximum likelihood estimation in positron emission tomography (Shepp and Vardi [11]) and by the more or less ad hoc variation proposed by Daube-Witherspoon and Muehllehner [3].


The EM algorithm belongs to a class of algorithms which arise as the method of successive substitution for the complementarity equations of the Kuhn-Tucker conditions for a solution of (1.1), viz., $x$ is a solution of (1.1) if and only if

\[
\nabla l(x) \ge 0, \qquad x \ge 0, \qquad x_j[\nabla l(x)]_j = 0, \quad j = 1, 2, \ldots, N; \tag{1.2}
\]

see e.g. Mangasarian [8]. The resulting algorithms have the form

\[
x_j^{n+1} = x_j^n\bigl\{1 - \omega_n[\nabla l(x^n)]_j\bigr\}, \quad j = 1, 2, \ldots, N, \tag{1.3}
\]

where $\omega_n$ is a relaxation/steplength parameter. Unfortunately these algorithms are rather complicated to analyze, even in the special case of the EM algorithm (Vardi et al. [14]). One notable feature of the convergence proof of the EM algorithm is the predominant role played by the Kullback-Leibler informational divergence between two vectors in $\mathbb{R}^N_+$, which we feel is only partly explained by the fact that the negative log-likelihood function can, up to a constant, be written as a Kullback-Leibler divergence [14]. The alternative interpretation is to consider these multiplicative algorithms as certain approximate "proximal point methods" for (1.1) (Rockafellar [10]), viz., $x^{n+1}$ is determined/interpreted as an approximate positive solution of the equation

\[
x_j + \omega_n x_j[\nabla l(x)]_j = x_j^n, \quad j = 1, 2, \ldots, N, \tag{1.4}
\]

or equivalently, via the Kuhn-Tucker conditions, as an approximate solution of the problem

\[
\text{minimize } l(x) + \omega_n^{-1}\, d(x^n \| x) \quad \text{subject to } x \ge 0, \tag{1.5}
\]

where $d(x\|y)$ is the Kullback-Leibler divergence (see [7] or [14])

\[
d(x\|y) = \sum_{j=1}^{N} \Bigl( x_j \log\frac{x_j}{y_j} + y_j - x_j \Bigr). \tag{1.6}
\]
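For reference, here is a minimal numerical sketch of the divergence (1.6); the function name and the use of NumPy are ours, not the paper's, and the convention $0\log 0 = 0$ is used for zero components of $x$.

    import numpy as np

    def kl_divergence(x, y):
        """Kullback-Leibler divergence d(x || y) of Eq. (1.6), for x >= 0 and y > 0."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # x_j * log(x_j / y_j) is taken to be 0 where x_j = 0.
        ratio = np.ones_like(x)
        np.divide(x, y, out=ratio, where=x > 0)
        return float(np.sum(x * np.log(ratio) + y - x))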


Since $d(x\|y) \ge 0$ always, and $d(x\|y)$ is convex in both $x$ and $y$, there is at least a superficial similarity between the problem (1.5) and the standard proximal point algorithms [10]. We will not exploit this connection, but it is obviously a very intriguing one. Besides the "exact" algorithm, in which $x^{n+1}$ is the unique positive solution of the equation

\[
x_j^{n+1}\bigl(1 + \omega_n[\nabla l(x^{n+1})]_j\bigr) = x_j^n, \quad j = 1, 2, \ldots, N, \tag{1.7}
\]

we can now state the more practical approximate method

\[
x_j^{n+1}\bigl\{1 + \omega_n[\nabla l(x^n)]_j\bigr\} = x_j^n, \quad j = 1, 2, \ldots, N. \tag{1.8}
\]

We refer to these two algorithms as the implicit and the explicit method, respectively. After appropriate scaling, the ISRA algorithm of Daube-Witherspoon and Muehllehner [3] is of the form (1.8). A small numerical sketch of the updates (1.3) and (1.8) follows below.
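The following is a minimal sketch, in NumPy, of one iteration of the original update (1.3) and of the explicit update (1.8); the function names, and the gradient callback grad_l and the step size omega supplied by the caller, are our own illustrative choices.

    import numpy as np

    def step_original(x, grad_l, omega):
        """One step of (1.3): x_j <- x_j * (1 - omega * [grad l(x)]_j)."""
        return x * (1.0 - omega * grad_l(x))

    def step_explicit(x, grad_l, omega):
        """One step of (1.8): x_j <- x_j / (1 + omega * [grad l(x)]_j)."""
        return x / (1.0 + omega * grad_l(x))

Both updates preserve positivity of the iterates provided omega is small enough that the bracketed factors stay positive; the choice of omega for (1.8) is made precise in Section 2.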

The implicit algorithm (1.7) has a decided theoretical advantage, but is not an applicable algorithm as is, whereas (1.8) has the same computational complexity as the algorithm (1.3) and is still easy to analyze. The key ingredients in the convergence proof for the implicit algorithm (1.7) are two kinds of monotonicity. On the one hand we have that

\[
l(x^n) > l(x^{n+1}) \quad \text{unless} \quad x^n = x^{n+1}, \tag{1.9}
\]

as well as the unexpected

\[
d(x^*\|x^n) > d(x^*\|x^{n+1}), \tag{1.10}
\]

where $x^*$ is any accumulation point of $\{x^n\}_n$, thus showing that there is at most one accumulation point. The existence of such an accumulation point follows from the boundedness of $\{x^n\}_n$ [from (1.9) and the compact level sets assumption on $l$], and it is easily shown to be a solution of our minimization problem (1.1). The analysis of the explicit algorithm (1.8) is only slightly more complicated, but we need to make a suitable choice for $\omega_n$. Apart from establishing the crucial inequalities (1.9)-(1.10), these convergence proofs are virtually identical with the one given by Vardi et al. [14, Appendix] for the convergence of the EM algorithm.


Upon reflection, the difference between (1.3) and (1.7) is precisely the difference between the explicit and implicit Euler methods for the numerical solution of the system of ordinary differential equations

\[
\frac{dx_j}{dt} = -x_j[\nabla l(x)]_j, \quad j = 1, 2, \ldots, N. \tag{1.11}
\]

Indeed, the continuous analogues of the inequalities (1.9) and (1.10) are easily established. In the parlance of stability theory for ordinary differential equations these inequalities say that $l(x)$ and $d(x^*\|x)$ are Lyapunov functionals for the differential equation (1.11). For more details, and applications, see Hofbauer and Sigmund [4].

On the theoretical side this study sheds new light on the emergence of the Kullback-Leibler divergence in the theory of multiplicative iterative algorithms. There is a modest contribution on the practical side in that a special case of the algorithm (1.8) is the image space reconstruction algorithm of Daube-Witherspoon and Muehllehner [3] for emission tomography: the algorithm converges regardless of whether the solution is unique or not. In the case of uniqueness the convergence had been shown by de Pierro [5]. (In a later paper [6], de Pierro improves on this; see Section 6.2.) It should also be noted that an inequality similar to (1.10) for the original multiplicative iterative algorithm (1.3) does not seem to be available, except for the EM algorithm; see Vardi et al. [14, Appendix] for a complicated proof.

2. ASSUMPTIONS

In this section we state the precise assumptions on the objective function and make a choice for the relaxation parameters in the explicit algorithm. We let $\|\cdot\|_2$ ($\|\cdot\|_\infty$) denote the Euclidean (the maximum) norm on $\mathbb{R}^N$. We make three assumptions about $l$:

ASSUMPTION 2.1. The objective function $l(x)$ is continuously differentiable and convex on $\mathbb{R}^N_+$, so for all $x, y \in \mathbb{R}^N_+$,

\[
\langle \nabla l(x) - \nabla l(y),\, x - y \rangle \ge 0; \tag{2.1}
\]

cf. Mangasarian [8]. Here $\langle\,\cdot\,,\,\cdot\,\rangle$ is the usual inner product on $\mathbb{R}^N$.

ASSUMPTION 2.2. The level sets of $l$ are compact, i.e., for every $y \in \mathbb{R}^N_+$ the set

\[
\{\,x \in \mathbb{R}^N_+ : l(x) \le l(y)\,\} \tag{2.2}
\]

is compact.

ASSUMPTION 2.3. The objective function $l(x)$ has a locally Lipschitz continuous gradient, i.e., for every compact subset $C$ of $\mathbb{R}^N_+$ there exists a constant $L$ such that

\[
\|\nabla l(x) - \nabla l(y)\|_2 \le L\,\|x - y\|_2 \quad \text{for all } x, y \in C. \tag{2.3}
\]

We now choose the relaxation parameter $\omega_n$ in the algorithm (1.8) as $\omega_n = \omega(x^n)$, where

\[
\omega(x) = \bigl[\max\bigl\{M,\; 2\|\nabla l(x)\|_\infty,\; \|2Lx - \nabla l(x)\|_\infty\bigr\}\bigr]^{-1}, \tag{2.4}
\]

in which $M$ is an arbitrary, fixed constant, and $L$ is the Lipschitz constant of $\nabla l(x)$ for the line segment

\[
\bigl\{\,x - t\,D(x)\nabla l(x) : 0 \le t \le (2\|\nabla l(x)\|_\infty)^{-1}\,\bigr\}. \tag{2.5}
\]

Here $D(x)$ is the diagonal matrix with diagonal elements $x_1, x_2, \ldots, x_N$. For the algorithm (1.7) any $\omega_n$ will do, as long as $\inf_n \omega_n > 0$. The existence of $x^{n+1} > 0$ as "defined" by (1.7) follows from its interpretation as the unique, positive solution of (1.5): note that $d(x^n\|x)$ is strictly convex in $x$ and tends to infinity as $x$ does so. It is obvious that we need $1 + \omega_n[\nabla l(x)]_j > 0$ for all $j$, so $\omega(x) \le (2\|\nabla l(x)\|_\infty)^{-1}$ is a reasonable choice. The modification (2.4) is motivated by the desire that $l(x^n) > l(x^{n+1})$; see Lemma 4.1.

We comment briefly on the above assumptions. Assumptions 2.1 and 2.2 are pretty much essential. Assumption 2.3 justifies the particular choice of $\omega(x)$ for which we can prove that $\{l(x^n)\}_n$ is decreasing. Any alternative version of Assumption 2.3 and choice of $\omega(x)$ for which this holds true should be fine, but some of the details will be different. A case in point is the treatment (in Section 6) of the image space reconstruction algorithm of Daube-Witherspoon and Muehllehner [3], with an implicit choice of $\omega_n$ which is unlikely to satisfy the above assumptions. It seems a reasonable assumption that if $\omega_n$ is determined by line minimization, then a similar treatment will be possible. A numerical sketch of the step-size rule (2.4) follows.
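The following is a minimal sketch of the step-size rule (2.4) as reconstructed above, assuming the caller supplies the gradient of $l$ at $x$ and an estimate of the local Lipschitz constant $L$ of (2.5); the function and argument names are ours, not the paper's.

    import numpy as np

    def omega(x, grad_x, lipschitz_L, M=1.0):
        """Relaxation parameter omega(x) of (2.4).

        x           : current iterate (positive vector)
        grad_x      : gradient of l at x
        lipschitz_L : (an estimate of) the Lipschitz constant L of (2.5)
        M           : arbitrary fixed positive constant
        """
        g_inf = np.linalg.norm(grad_x, np.inf)
        cap = max(M, 2.0 * g_inf,
                  np.linalg.norm(2.0 * lipschitz_L * x - grad_x, np.inf))
        return 1.0 / cap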

3. THE DOUBLE MONOTONICITY OF THE IMPLICIT ALGORITHM

In this section we give the deceptively simple proofs of the monotonicity relations (1.9) and (1.10). We define the mapping $H: \mathbb{R}^N_+ \to \mathbb{R}^N_+$ as follows. For $x > 0$ we let $y$ be the positive solution of the equation

\[
y + \omega(x)\,D(y)\nabla l(y) = x, \tag{3.1}
\]

and define $H(x)$ as $H(x) = y$.

LEMMA 3.1. For any $x \in \mathbb{R}^N_+$ we have $l(x) > l(H(x))$ unless $x = H(x)$.

Proof. Let $y = H(x)$. By convexity we have

\[
l(x) - l(y) \ge \langle \nabla l(y),\, x - y \rangle = \omega(x)\,\langle \nabla l(y),\, D(y)\nabla l(y) \rangle.
\]

Since $y > 0$, this shows that $l(x) - l(y) > 0$ unless $D(y)\nabla l(y) = 0$, in which case $x = y = H(x)$.∎

We state the second monotonicity result in slightly different form. Let $m$ be any number with

\[
\inf\{\,l(x) : x \ge 0\,\} \le m \le \lim_{n\to\infty} l(x^n), \tag{3.2}
\]

and let

\[
\Omega = \{\,x^* \in \mathbb{R}^N_+ : l(x^*) \le m\,\}. \tag{3.3}
\]

Every solution of (1.1) is in $\Omega$, so that $\Omega$ is not empty.

LEMMA 3.2. Let $x^* \in \Omega$. For any $x^0 > 0$ we have $d(x^*\|x^n) \ge d(x^*\|x^{n+1})$ for all $n$.


Proof. Since $x^0 > 0$, so are all $x^n$, and thus $d(x^*\|x^n)$ is well defined for all $n$. Now

\[
d(x^*\|x^n) - d(x^*\|x^{n+1}) = \sum_j x_j^*\,\log\frac{x_j^{n+1}}{x_j^n} + x_j^n - x_j^{n+1}.
\]

Since $\log(1+t)^{-1} = -\log(1+t) \ge -t$, we thus get that the above expression is greater than or equal to

\[
\omega_n\,\langle \nabla l(x^{n+1}),\, x^{n+1} - x^* \rangle \ge \omega_n\bigl[\,l(x^{n+1}) - l(x^*)\,\bigr],
\]

with the last inequality due to convexity. By the choice of $x^* \in \Omega$ the last expression is nonnegative.∎

The general convergence of the implicit algorithm is now easily shown along the outline of the convergence proof for the EM algorithm [14, Appendix]. Since the proof is essentially the same as for the explicit algorithm, we omit it here.

4. MONOTONICITY OF THE EXPLICIT ALGORITHM

In this section we prove the monotonicity results for the explicit algorithm. They are slightly messier than for the implicit case. Let $G: \mathbb{R}^N_+ \to \mathbb{R}^N_+$ be defined as

\[
[G(x)]_j = \frac{x_j}{1 + \omega(x)[\nabla l(x)]_j}, \quad j = 1, 2, \ldots, N. \tag{4.1}
\]

By the choice of $\omega(x)$, obviously $G(x) > 0$ if $x > 0$, and $[G(x)]_j = 0$ only if $x_j = 0$.
LEMMA 4.1. For all $x \ge 0$ we have $l(x) > l(G(x))$ unless $x = G(x)$.

Proof. From the convexity of $l$ we have

\[
l(x) - l(G(x)) \ge \langle \nabla l(G(x)),\, x - G(x) \rangle,
\]

which equals

\[
\langle \nabla l(x),\, x - G(x) \rangle - \langle \nabla l(x) - \nabla l(G(x)),\, x - G(x) \rangle.
\]

Since $\nabla l$ is locally Lipschitz continuous, this expression is greater than or equal to

\[
\omega(x)\sum_j \frac{x_j[\nabla l(x)]_j^2}{1 + \omega(x)[\nabla l(x)]_j}
\left\{1 - \frac{\omega(x)\,L\,x_j}{1 + \omega(x)[\nabla l(x)]_j}\right\} \tag{4.2}
\]

for the appropriate constant $L$. By the choice of $\omega(x)$ this dominates

\[
\tfrac{1}{2}\,\omega(x)\sum_j \frac{x_j[\nabla l(x)]_j^2}{1 + \omega(x)[\nabla l(x)]_j}
\ge \tfrac{1}{4}\,\omega(x)\,\langle \nabla l(x),\, D(x)\nabla l(x) \rangle, \tag{4.3}
\]

and the conclusion follows as in the proof of Lemma 3.1.∎

We state some consequences of this lemma before proceeding with the second monotonicity result.

COROLLARY 4.2. $\{l(x^n)\}_n$ is decreasing.

COROLLARY 4.3. $\{x^n\}_n$ is bounded, and every subsequence itself has a convergent subsequence.

Proof. Since $l(x^n) \ge l(x^{n+1})$, for every $n$ we have $x^n \in \{x \ge 0 : l(x) \le l(x^0)\}$, which is a compact set.∎

COROLLARY 4.4. Let $V(x) = \omega(x)\,\langle \nabla l(x),\, D(x)\nabla l(x) \rangle$. Then

\[
V(x) \le 4\bigl[\,l(x) - l(G(x))\,\bigr].
\]

This last result just summarizes the proof of Lemma 4.1. We now introduce the following notation. For $x \in \mathbb{R}^N_+$ and $y > 0$ let

\[
e(x\|y) = d(x\|y) + \frac{8\bigl[\,l(y) - l(x)\,\bigr]}{M}. \tag{4.4}
\]

Note that $e(x\|y)$ is a measure for $\|x - y\|_2$, provided $x$ solves the minimization problem (1.1), or is otherwise known to satisfy $l(x) \le l(y)$. Let $m$ and $\Omega$ be defined by (3.2)-(3.3), with $\{x^n\}_n$ generated by the explicit algorithm.

LEMMA 4.5. Let $x^* \in \Omega$. Then $e(x^*\|x^n) \ge e(x^*\|x^{n+1})$ for all $n$.

Proof. Since $x^0 > 0$, we have $x^n > 0$ for all $n$, and so $d(x^*\|x^n)$ is well defined. Again

\[
d(x^*\|x^n) - d(x^*\|x^{n+1}) = \sum_j x_j^*\,\log\frac{x_j^{n+1}}{x_j^n} + x_j^n - x_j^{n+1}.
\]

Once more, $\log(1+t)^{-1} \ge -t$, and so the above expression dominates

\[
\omega_n\,\langle \nabla l(x^n),\, x^n - x^* \rangle - \sum_j \frac{x_j^n\,\omega_n^2[\nabla l(x^n)]_j^2}{1 + \omega_n[\nabla l(x^n)]_j}.
\]

By the choice of $\omega_n$ the last sum is dominated by $2\omega_n V(x^n)$ (see Corollary 4.4) and thus by $8\bigl[\,l(x^n) - l(x^{n+1})\,\bigr]/M$.
We have thus shown that

\[
e(x^*\|x^n) - e(x^*\|x^{n+1}) \ge \omega_n\bigl[\,l(x^n) - l(x^*)\,\bigr],
\]

where we have once more used convexity. Since $l(x^n) \ge l(x^*)$, the result follows.∎

5. CONVERGENCE OF THE EXPLICIT ALGORITHMS
With the Lemmas 3.1-3.2 and 4.1-4.5 in hand, the convergence proofs of the two algorithms are virtually identical. We will just consider the explicit algorithm, since this is the slightly more complicated case. The outline of the proof is from Vardi et al. [14, Appendix].

LEMMA 5.1. The sequence $\{x^n\}_n$ generated by the explicit algorithm converges.

Proof. By Corollary 4.3 the sequence $\{x^n\}_n$ has a convergent subsequence $\{x^{n_k}\}_k$ with limit $x^*$, say. Then $l(x^*) = \lim_{k\to\infty} l(x^{n_k})$. We may now apply Lemma 4.5 to our $x^*$; thus $e(x^*\|x^n) \ge e(x^*\|x^{n+1})$ for all $n$. Now for our subsequence $\{x^{n_k}\}$ we have $e(x^*\|x^{n_k}) \to 0$ as $k \to \infty$, and by the monotonicity then

\[
\lim_{n\to\infty} e(x^*\|x^n) = 0.
\]

Then also $d(x^*\|x^n) \to 0$, and the convergence of $\{x^n\}_n$ to $x^*$ follows.∎

We finally need to show that the $x^*$ obtained above solves the minimization problem (1.1). First we have that $x^*$ is a fixed point of the iteration $x^{n+1} = G(x^n)$, so that

\[
x_j^*[\nabla l(x^*)]_j = 0, \quad j = 1, 2, \ldots, N. \tag{5.1}
\]

We have of course also that

\[
x_j^* \ge 0, \quad j = 1, 2, \ldots, N. \tag{5.2}
\]


In order to show that $\nabla l(x^*) \ge 0$, consider for a fixed $j$ the ratios $r^n = x_j^{n+1}/x_j^n$. Since $\{x^n\}_n$ is bounded, it follows that $\liminf_{n\to\infty} r^n \le 1$, or

\[
\liminf_{n\to\infty}\; \omega_n[\nabla l(x^n)]_j \ge 0.
\]

It follows that $[\nabla l(x^*)]_j \ge 0$. Since $j$ was arbitrary,

\[
[\nabla l(x^*)]_j \ge 0, \quad j = 1, 2, \ldots, N. \tag{5.3}
\]

Thus, $x^*$ satisfies the Kuhn-Tucker conditions, and is a solution of the minimization problem (1.1). We have thus proven

THEOREM 5.2. The sequence $\{x^n\}_n$ generated by the explicit algorithm, starting from an $x^0 > 0$, converges to a solution of the minimization problem (1.1).

It should be remarked that in the above proof we can show that $x^*$ satisfies the Kuhn-Tucker conditions only after we have shown that $\{x^n\}_n$ converges to $x^*$. In the absence of Lemma 4.5 we do not have this convergence, and indeed it does not appear to be easy to prove that $x^*$ is optimal only from knowing that a subsequence of $\{x^n\}$ converges to $x^*$ [so $l(x^*) = \lim_{n\to\infty} l(x^n)$], without some further assumptions on $l(x)$.
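As a usage illustration of Theorem 5.2, here is a minimal driver for the explicit algorithm (1.8) with the step-size rule (2.4); it reuses the step-size sketch omega given after Section 2, and the stopping rule and the single user-supplied Lipschitz estimate are our own simplifications, not the paper's.

    import numpy as np

    def explicit_algorithm(x0, grad_l, lipschitz_L, M=1.0, max_iter=1000, tol=1e-8):
        """Run the explicit multiplicative algorithm (1.8) from a positive start x0.

        lipschitz_L: a user-supplied bound for the local Lipschitz constants of grad l.
        """
        x = np.asarray(x0, dtype=float)
        assert np.all(x > 0), "Theorem 5.2 assumes a strictly positive starting point."
        for _ in range(max_iter):
            g = grad_l(x)
            w = omega(x, g, lipschitz_L, M)        # step size per (2.4)
            x_new = x / (1.0 + w * g)              # update per (1.8)
            if np.max(np.abs(x_new - x)) < tol:    # simple stopping rule (ours)
                return x_new
            x = x_new
        return x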

6. APPLICATIONS TO LINEAR MODELS

In this section we discuss some applications of multiplicative iterative algorithms. We consider the following class of convex programming problems. Suppose some physical phenomenon is modeled by the system of linear equations

\[
Ax = b, \tag{6.1}
\]

where $A \in \mathbb{R}^{M\times N}$ is a nonnegative matrix with nonzero column sums, the data vector $b \in \mathbb{R}^M$ is nonnegative, and $x \in \mathbb{R}^N_+$ describes the physical quantity to be identified. Due to all kinds of approximations in the modeling process (linearization, statistical effects), the notion of a good solution of the system (6.1) needs to be clarified. A common choice is to interpret (6.1) in the least squares sense. The "solution" of (6.1) is defined to be the solution of the minimization problem

\[
\text{minimize } \|Ax - b\|_2^2 \quad \text{subject to } x \ge 0. \tag{6.2}
\]

Slightly more generally, let $L(y)$ be a strictly convex function on $\mathbb{R}^M_+$, with compact level sets and locally Lipschitz continuous gradient. Typically, $L$ will also depend on the data vector $b$, but we suppress this dependence in the notation. Now choose as interpretation of (6.1)

\[
\text{minimize } L(Ax) \quad \text{subject to } x \ge 0, \tag{6.3}
\]
and we are interested in applying the explicit multiplicative iterative algorithm to the function $l(x) = L(Ax)$. We first show that $l$ satisfies the assumptions of Section 2. First of all, it is easy to show that $l(x)$ is convex and has a locally Lipschitz continuous gradient. It should be remarked that $l(x)$ is not strictly convex if $A$ has a nontrivial nullspace. There remains the question of compact level sets of $l(x)$. To settle this, note that for fixed $y \in \mathbb{R}^M_+$ the level set of $L$ is compact, and so is its intersection with $\{Ax : x \in \mathbb{R}^N_+\}$. Thus it follows that for every $z \in \mathbb{R}^N_+$ the set

\[
\{\,Ax : l(x) \le l(z),\; x \in \mathbb{R}^N_+\,\} \tag{6.4}
\]

is compact. Since $A$ is nonnegative and has nonzero column sums, we have that $N(A) \cap \mathbb{R}^N_+ = \{0\}$, where $N(A) = \{x \in \mathbb{R}^N : Ax = 0\}$ is the nullspace of $A$, and so the compactness of (6.4) implies the compactness of

\[
\{\,x \in \mathbb{R}^N_+ : l(x) \le l(z)\,\},
\]

i.e., $l$ has compact level sets. Thus $l$ satisfies all our assumptions. The conclusion is that the algorithm

\[
x^0 > 0, \qquad x_j^{n+1} = \frac{x_j^n}{1 + \omega_n\bigl[A^T\nabla L(Ax^n)\bigr]_j}, \quad j = 1, 2, \ldots, N,
\]

converges to a solution of the minimization problem (6.3). We now discuss a few specific examples.
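Before turning to the examples, here is a minimal NumPy sketch of the iteration just displayed for $l(x) = L(Ax)$, reusing the step-size sketch omega of Section 2; the least-squares choice $L(y) = \|y - b\|_2^2$ appears only as an illustration, and all names are ours.

    import numpy as np

    def explicit_linear_model(A, grad_L, x0, lipschitz_L, M=1.0, n_iter=500):
        """Explicit multiplicative iteration for minimizing L(Ax) over x >= 0."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            g = A.T @ grad_L(A @ x)                  # gradient of l(x) = L(Ax)
            w = omega(x, g, lipschitz_L, M)          # step size per (2.4)
            x = x / (1.0 + w * g)
        return x

    # Illustration with the least squares objective L(y) = ||y - b||^2 of (6.2):
    # grad_L = lambda y: 2.0 * (y - b)
    # x_hat = explicit_linear_model(A, grad_L, x0=np.ones(A.shape[1]),
    #                               lipschitz_L=2.0 * np.linalg.norm(A, 2) ** 2)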

6.1. Constrained Least Squares

We already mentioned the minimization problem (6.2). It is not clear whether the original multiplicative algorithm (1.3) converges in case $N(A) \ne \{0\}$. In case $N(A) = \{0\}$ the solution of (6.2) is unique, and it is not hard to show that then convergence follows. The explicit multiplicative algorithm

\[
x_j^{n+1} = \frac{x_j^n}{1 + \omega_n\bigl[A^T(Ax^n - b)\bigr]_j}, \quad j = 1, 2, \ldots, N, \tag{6.5}
\]

converges for appropriate $\omega_n$, whether $A$ has full column rank or not.

6.2. Image Space Reconstruction Algorithm (ISRA)

If we apply a diagonal scaling in the constrained least squares problem, we get

\[
\text{minimize } \|AD(\lambda)y - b\|_2^2 \quad \text{subject to } y \ge 0, \tag{6.6}
\]

where $D(\lambda)$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_N$ which are all strictly positive, and we take $x = D(\lambda)y$ as our solution of (6.1). The algorithm (6.5) applied to (6.6) gives the iteration formula

\[
y_j^{n+1} = \frac{y_j^n}{1 + \omega_n\lambda_j\bigl[A^T(AD(\lambda)y^n - b)\bigr]_j}, \quad j = 1, 2, \ldots, N,
\]

and scaling back to $x^n$,

\[
x_j^{n+1} = \frac{x_j^n}{1 + \omega_n\lambda_j\bigl[A^T(Ax^n - b)\bigr]_j}, \quad j = 1, 2, \ldots, N. \tag{6.7}
\]

In emission tomography the column sums of $A$ are normalized to one,

\[
\sum_{i=1}^{M} a_{ij} = 1, \quad j = 1, 2, \ldots, N,
\]

and the above algorithm is applied with $\omega_n = 1$ and $\lambda_j^{-1} = [A^Tb]_j$. The resulting algorithm has the elegant form

\[
x_j^{n+1} = x_j^n\,\frac{[A^Tb]_j}{[A^TAx^n]_j}, \quad j = 1, 2, \ldots, N. \tag{6.8}
\]

This algorithm is the image space reconstruction algorithm (ISRA) of Daube-Witherspoon and Muehllehner [3].

IlAx” - blJ; > IlAP+'

- b/l;,

(6.9)

and proves convergence when the solution of (6.2) is unique. When there is no uniqueness, it is tempting to appeal to Lemma 4.5 and Section 5. However, Lemma 4.5 has been proved for a restricted choice of o, only. We prove here an analogue of Lemma 4.5 for the present circumstances. First we quote the result (6.9) as proved by de Pierro [5].

LEMMA

6.1.

ments xr/[ATAx”]j,

JJAx” -

Let D, E [WNxN be the diagonal j = 1,2,. . ., N. Then

@I; - (l&n+1 - bl\; >, (r”

matrix with diagonal ele-

- xn+‘, D,+”

-

xn+‘)).

In order to formulate the analogue of Lemma 4.5, we let $x^*$ denote any accumulation point of $\{x^n\}_n$, and we let $\Lambda \in \mathbb{R}^{N\times N}$ denote the diagonal matrix with diagonal elements $\lambda_j = [A^Tb]_j$, $j = 1, 2, \ldots, N$. Finally let

\[
e(x^*\|x) = d(\Lambda x^*\|\Lambda x) + \|Ax - b\|_2^2 - \|Ax^* - b\|_2^2.
\]

LEMMA 6.2. $e(x^*\|x^n) \ge e(x^*\|x^{n+1})$.

Proof. As before, we have

\[
d(\Lambda x^*\|\Lambda x^n) - d(\Lambda x^*\|\Lambda x^{n+1}) = \sum_j \lambda_j x_j^*\,\log\frac{x_j^{n+1}}{x_j^n} + \lambda_j\bigl(x_j^n - x_j^{n+1}\bigr).
\]

Since $\log t = -\log t^{-1} \ge 1 - t^{-1}$, the last expression dominates

\[
\sum_j \frac{\lambda_j\bigl(x_j^n - x_j^{n+1}\bigr)\bigl(x_j^{n+1} - x_j^*\bigr)}{x_j^{n+1}}.
\]

Since $\lambda_j/x_j^{n+1} = [A^TAx^n]_j/x_j^n$, the above expression is equal to

\[
\langle x^n - x^{n+1},\, D_n^{-1}(x^{n+1} - x^*) \rangle
= -\langle x^n - x^{n+1},\, D_n^{-1}(x^n - x^{n+1}) \rangle
+ \langle x^n - x^{n+1},\, D_n^{-1}(x^n - x^*) \rangle.
\]

Since $D_n^{-1}(x^n - x^{n+1}) = A^T(Ax^n - b)$, the last inner product can be written as

\[
\langle x^n - x^*,\, A^T(Ax^n - b) \rangle,
\]

which, by convexity, dominates

\[
\tfrac{1}{2}\bigl(\,\|Ax^n - b\|_2^2 - \|Ax^* - b\|_2^2\,\bigr).
\]

By Lemma 6.1, this last expression is nonnegative. Summarizing now, we have thus shown that

\[
d(\Lambda x^*\|\Lambda x^n) - d(\Lambda x^*\|\Lambda x^{n+1}) \ge -\langle x^n - x^{n+1},\, D_n^{-1}(x^n - x^{n+1}) \rangle,
\]

which, again by Lemma 6.1, dominates

\[
-\|Ax^n - b\|_2^2 + \|Ax^{n+1} - b\|_2^2,
\]

and the lemma follows.∎

The above two lemmas now imply the convergence of the algorithm (6.8), by the material in Section 5.

It should be remarked that (6.8) has been applied in remote sensing for the reconstruction of temperature profiles (see Chahine [1], Chu [2], Twomey [13], and references therein), but then in the form [cf. Equation (6.1)]

\[
x_j^{n+1} = x_j^n\,\frac{b_j}{[Ax^n]_j}, \quad j = 1, 2, \ldots, N. \tag{6.10}
\]

This must be regarded with even more skepticism than (6.8), since the above algorithm necessarily generates an oscillating sequence $\{x^n\}_n$ if the system (6.1) is not consistent. The same criticism would apply to the variation proposed by Twomey [12].

Note that de Pierro [5] considers the minimization problem

\[
\text{minimize } \langle x, Mx \rangle - 2\langle b, x \rangle \quad \text{subject to } x \ge 0,
\]

with $M$ positive definite and $M > 0$ elementwise. Our theory applies to this case as well, even when $M$ is only nonnegative definite (but still $M > 0$ elementwise), since then the level sets remain compact. In [6], de Pierro studies the above minimization problem without the assumption of definiteness (convexity), but with $M \ge 0$ elementwise, and with strictly positive diagonal.

6.3. Maximum Likelihood Estimation in Emission Tomography

In emission tomography the problem can be stated as the minimization of $l(x) = d(b\|Ax)$ over $\mathbb{R}^N_+$. Here $d(\cdot\|\cdot)$ is the Kullback-Leibler divergence (1.6). The EM algorithm of Shepp and Vardi [11] has the simple form

\[
x_j^{n+1} = x_j^n\,[A^Tr^n]_j, \quad j = 1, 2, \ldots, N, \tag{6.11}
\]

with $r_i^n = b_i/[Ax^n]_i$, $i = 1, 2, \ldots, M$. It is a special case of the algorithm (1.3); in this case

\[
x_j^{n+1} = x_j^n\bigl\{1 - \omega_n\bigl(1 - [A^Tr^n]_j\bigr)\bigr\}, \quad j = 1, 2, \ldots, N. \tag{6.12}
\]

Vardi et al. [14] were able to prove the analogues of Lemma 4.1 (easy) and Lemma 4.5 (hard) in this case, but it only works for $0 < \omega_n \le 1$. However, Lewitt and Muehllehner [9] in their experiments advocated the choice $\omega_n = 4$ after a few iterations, so convergence in this case is still an open question. In our version of the algorithm we would have

\[
x_j^{n+1} = \frac{x_j^n}{1 + \omega_n\bigl(1 - [A^Tr^n]_j\bigr)}, \quad j = 1, 2, \ldots, N, \tag{6.13}
\]

and there are no restraints on the size of $\omega_n$ other than the (easily modifiable) choice (2.4); see the comments on this choice in Section 2. It remains of course to be seen what the practical difference (if any) is between the algorithms (6.12) and (6.13).
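To make the comparison concrete, here is a minimal sketch of one step of the relaxed EM form (6.12) and of the explicit variant (6.13); the names are ours, and $A$ is assumed to have unit column sums as above.

    import numpy as np

    def em_ratio(x, A, b):
        """The vector r^n with r_i = b_i / [Ax]_i."""
        return b / (A @ x)

    def step_em_relaxed(x, A, b, omega_n):
        """One step of (6.12); omega_n = 1 gives the EM update (6.11)."""
        return x * (1.0 - omega_n * (1.0 - A.T @ em_ratio(x, A, b)))

    def step_explicit_em(x, A, b, omega_n):
        """One step of the explicit variant (6.13)."""
        return x / (1.0 + omega_n * (1.0 - A.T @ em_ratio(x, A, b)))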

REFERENCES

1. M. T. Chahine, Inverse problems in radiative transfer: Determination of atmospheric parameters, J. Atmospheric Sci. 27:960-967 (1970).
2. W. P. Chu, Convergence of Chahine's nonlinear relaxation inversion method used for limb viewing remote sensing, Appl. Opt. 24:445-447 (1985).
3. M. E. Daube-Witherspoon and G. Muehllehner, An iterative image space reconstruction algorithm suitable for volume ECT, IEEE Trans. Med. Imaging MI-5:61-66 (1986).
4. J. Hofbauer and K. Sigmund, The Theory of Evolution and Dynamical Systems, Cambridge U.P., Cambridge, 1988.
5. A. R. de Pierro, On the convergence of the iterative image space reconstruction algorithm for volume ECT, IEEE Trans. Med. Imaging MI-6:174-175 (1987).
6. A. R. de Pierro, Nonlinear relaxation methods for solving symmetric linear complementarity problems, J. Optim. Theory Appl., to appear.
7. F. Liese and I. Vajda, Convex Statistical Distances, Teubner, Leipzig, 1987.
8. O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, 1969.
9. R. M. Lewitt and G. Muehllehner, Accelerated iterative reconstruction for positron emission tomography based on the EM algorithm for maximum likelihood estimation, IEEE Trans. Med. Imaging MI-5:16-22 (1986).
10. R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14:877-898 (1976).
11. L. A. Shepp and Y. Vardi, Maximum likelihood reconstruction in emission tomography, IEEE Trans. Med. Imaging MI-1:113-122 (1982).
12. S. Twomey, Comparison of constrained linear inversion and an iterative nonlinear algorithm applied to the indirect estimation of particle size distributions, J. Comput. Phys. 18:188-200 (1975).
13. S. Twomey, Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements, Elsevier, Amsterdam, 1977.
14. Y. Vardi, L. A. Shepp, and L. Kaufman, A statistical model for positron emission tomography, J. Amer. Statist. Assoc. 80:8-38 (1985).

Received 23 August 1988; final manuscript accepted 28 April 1989.