Journal of Statistical Planning and Inference 15 (1987) 391-399 North-Holland
AN ALGORITHM FOR LEAST SQUARES PROJECTIONS ONTO THE INTERSECTION OF TRANSLATED, CONVEX CONES*

Richard L. DYKSTRA and James P. BOYLE

Department of Statistics and Actuarial Science, University of Iowa, IA 52242, USA

Received 19 June 1985; revised manuscript received 2 May 1986
Recommended by E.J. Wegman

* This work was partially supported by ONR Contract N00014-83-K-0249.
Abstract: A commonly occurring problem is that of minimizing least squares expressions subject to restrictions on the solution. Dykstra (1983) has given a simple algorithm for solving these types of problems when the constraint region can be expressed as a finite intersection of closed, convex cones. Here it is shown that this algorithm still works correctly when each cone is allowed to be arbitrarily translated, as long as the intersection is nonempty. This allows the algorithm to be applied to a much larger collection of problems than previously indicated.
AMS Subject Classification: 49D99, 65D15.

Key words and phrases: Least squares projections; Regression; Cone constraints; Translations; Minimizations; Dual cones; Iterations.
1. Introduction
The problem of obtaining least squares projections subject to various constraints occurs frequently in many areas. For example, the area of isotonic regression usually is concerned with obtaining least squares vectors which must satisfy certain partial order restrictions (see Barlow, Bartholomew, Bremner and Brunk (1972) and Robertson and Wright (1981)). Another example concerns finding the closest (least squares) convex (concave) function to a set of points in the plane (see Hildreth (1954) and Wu (1982)). Many times, the constraint region can be written as a finite intersection of simpler constraint regions. This raises the possibility of using iterative schemes based upon the projections onto the simpler regions for solving the overall problem. John von Neumann (1950) has shown that if the constraint region is an intersection of two subspaces, cyclic iterative projections onto the individual subspaces must converge to the desired projection. Norbert Wiener (1955) independently proved a version of this theorem in a slightly different setting. Dykstra and Robertson (1982) have developed an iterative procedure for finding
projections of rectangular arrays onto the class of arrays with nondecreasing rows and nondecreasing columns based only upon one-dimensional smoothings. Later, Dykstra (1983) extended this approach to the general framework of projections onto the intersection of closed convex cones. This procedure is based upon finding projections onto the individual cones, and reduces to von Neumann's and Wiener's method when the cones are also subspaces. It is the purpose of this paper to show that Dykstra's (1983) algorithm can be extended to work for projections onto a finite nonempty intersection of translated, closed, convex cones. Of course, we can approximate any convex set by a finite intersection of translated half-spaces. Thus, in theory, we could approximate the projection onto any convex set by the method proposed in this paper.
2. Notation and setting

We denote $m$-dimensional Euclidean space by $R^m$, and let $g$ and $w$ ($w_i > 0$) be fixed points in $R^m$. The inner product of $x$ and $y$ (with respect to $w$) is given by

$$(x, y) = \sum_{i=1}^{m} x_i y_i w_i.$$

The corresponding inner product norm of $x$ is defined as

$$\|x\| = (x, x)^{1/2} = \Bigl(\sum_{i=1}^{m} x_i^2 w_i\Bigr)^{1/2}. \qquad (2.1)$$
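For concreteness, these weighted quantities are straightforward to compute; the following is a minimal numpy sketch (the helper names `inner` and `norm` are ours, not the paper's):

```python
import numpy as np

def inner(x, y, w):
    """Weighted inner product (x, y) = sum_i x_i * y_i * w_i."""
    return float(np.sum(x * y * w))

def norm(x, w):
    """Norm induced by the weighted inner product, as in (2.1)."""
    return inner(x, x, w) ** 0.5
```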
A closed subset $K$ of $R^m$ is a closed convex cone if $x, y \in K$ and $a, b \ge 0$ imply $ax + by \in K$. The dual cone of a closed, convex cone $K$ is defined as

$$K^* = \Bigl\{ y \in R^m : (y, x) = \sum_{i=1}^{m} y_i x_i w_i \le 0 \ \text{for all } x \in K \Bigr\}. \qquad (2.2)$$
Of course $K^*$ is also a closed, convex cone with the property that $K^{**} = K$. A standard least squares problem is to find the $x$ which will

$$\text{Minimize}_{x \in C} \; \|g - x\| \qquad (2.3)$$

where $C$ is a closed, convex set ($x, y \in C$ implies $\alpha x + (1 - \alpha)y \in C$ for all $\alpha \in [0, 1]$). A vector $g^* \in C$ achieves the minimal value in (2.3) iff

$$(g - g^*, \ g^* - f) \ge 0 \quad \text{for all } f \in C. \qquad (2.4)$$

If $C$ is actually a closed, convex cone, we may replace (2.4) by

(i) $(g - g^*, \ g^*) = 0$,
(ii) $(g - g^*, \ f) \le 0$ for all $f \in C$. $\qquad (2.5)$

(See Theorem 7.8 of Barlow et al. (1972).)
Note that (2.5) implies that if $C$ is a closed, convex cone and $g^*$ solves (2.3), then $g - g^* \in C^*$.

If $C$ is actually a subspace, the projection $g^*$ is characterized by condition (2.5)(ii) if the inequality sign is replaced by equality.
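As a quick numeric illustration of (2.5) (our own check, not from the paper): take $K$ to be the nonnegative orthant, whose projection in the weighted norm is the componentwise positive part.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.5, 2.0, size=5)        # positive weights
g = rng.normal(size=5)

# Projection onto the cone K = {x : x_i >= 0 for all i}; with a
# diagonal (weighted) inner product this is the positive part of g.
g_star = np.maximum(g, 0.0)

def ip(x, y):                            # weighted inner product
    return float(np.sum(x * y * w))

print(abs(ip(g - g_star, g_star)) < 1e-12)   # (2.5)(i): equals 0
f = np.abs(rng.normal(size=5))               # an arbitrary point of K
print(ip(g - g_star, f) <= 1e-12)            # (2.5)(ii): nonpositive
```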
3. The algorithm
We wish to consider problems of the form

$$\text{Minimize}_{x \in \bigcap_{i=1}^{r} (K_i - b_i)} \; \|g - x\| \qquad (3.1)$$

where each $K_i$ is a closed, convex cone in $R^m$, $b_i \in R^m$, $K_i - b_i = \{x - b_i : x \in K_i\}$, and $\bigcap_{i=1}^{r} (K_i - b_i) \ne \emptyset$. We assume that we can find the vector in $K_i - b_i$ which will

$$\text{Minimize}_{x \in K_i - b_i} \; \|f - x\| \qquad (3.2)$$

for any $f$ and any $i$, and wish to use this ability to solve (3.1). We note that if $P(f \mid C)$ denotes the projection of $f \in R^m$ onto the closed, convex set $C$, then for a closed, convex cone $K_i$,

$$P(f \mid K_i - b_i) = P(f + b_i \mid K_i) - b_i. \qquad (3.3)$$

Thus

$$f - P(f \mid K_i - b_i) = (f + b_i) - P(f + b_i \mid K_i) \in K_i^* \quad \text{for all } f \qquad (3.4)$$

by (2.5)(ii). We shall make extensive use of (3.4).
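Identity (3.3) is what lets any cone projection routine handle a translated cone; a one-line helper (a sketch with our own naming; `project_K` stands for any routine computing the projection onto $K$):

```python
import numpy as np

def project_translated(f, project_K, b):
    """P(f | K - b) = P(f + b | K) - b, identity (3.3).

    project_K : callable computing the projection onto the cone K
                in the chosen inner product.
    """
    return project_K(f + b) - b
```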
Our proposed algorithm is identical to that given in Dykstra (1983) except that we allow projections onto translated, closed, convex cones. Our scheme can be succinctly stated with the aid of the following notation:

(i) For any positive integer $n$, we define $n \,(\text{mod } r) = i$ if $n = kr + i$ for integers $k$ and $i$ where $1 \le i \le r$.
(ii) Initially, set $n = 0$, $g_0 = g$, and $I_i = 0 \in R^m$, $i = 1, \dots, r$.

The iterative procedure is:

(1) Set

$$g_{n+1} = P\bigl(g_n - I_{(n+1)(\text{mod } r)} \mid K_{(n+1)(\text{mod } r)} - b_{(n+1)(\text{mod } r)}\bigr), \qquad (3.5)$$

and then update $I_{(n+1)(\text{mod } r)}$ by resetting it equal to $g_{n+1} - (g_n - I_{(n+1)(\text{mod } r)})$.
(2) Replace $n$ by $n + 1$ and go to (1).

To provide notation for the proof and to make sure the algorithm is well understood, we shall redescribe the algorithm in a step-by-step manner (a code sketch follows the description).

(1) Let $g_{1,1}$ denote the projection of $g$ onto $K_1 - b_1$. We let $I_{1,1} = g_{1,1} - g$ denote the difference between the projection and the original value.
(2) Let $g_{1,2}$ denote the projection of $g_{1,1} = g + I_{1,1}$ onto $K_2 - b_2$. The difference
between the projection and the original value is denoted by $I_{1,2} = g_{1,2} - g_{1,1}$, so that $g_{1,2} = g + I_{1,1} + I_{1,2}$.
(3) We continue iteratively projecting onto the $K_i - b_i$ until we have projected onto each set once and $g_{1,r} = g + I_{1,1} + \cdots + I_{1,r}$. We then begin another cycle where we subtract the increment associated with that set from the previous cycle before projecting. Thus we project $g_{1,r} - I_{1,1} = g + I_{1,2} + I_{1,3} + \cdots + I_{1,r}$ onto $K_1 - b_1$ to obtain $g_{2,1}$. The new increment is $I_{2,1} = g_{2,1} - (g_{1,r} - I_{1,1})$, so that $g_{2,1} = g + I_{2,1} + I_{1,2} + \cdots + I_{1,r}$.
(4) Note that as we cyclically project onto the translated cones, $g_{k,i}$ is the projection onto the $i$-th set during the $k$-th cycle. The last increment associated with that set is removed prior to the projection and a new increment is always formed. Thus, in general, $g_{k,i}$ is the projection of

$$g + I_{k,1} + \cdots + I_{k,i-1} + I_{k-1,i+1} + \cdots + I_{k-1,r}$$

onto $K_i - b_i$, and

$$I_{k,i} = \begin{cases} g_{k,i} - (g_{k,i-1} - I_{k-1,i}) & \text{if } 2 \le i \le r, \\ g_{k,1} - (g_{k-1,r} - I_{k-1,1}) & \text{if } i = 1. \end{cases} \qquad (3.6)$$
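The scheme (3.5) translates almost line for line into code. Below is a minimal numpy sketch; the function name `dykstra`, the stopping rule, and the tolerance are our additions (the paper's iteration runs indefinitely):

```python
import numpy as np

def dykstra(g, projections, max_cycles=1000, tol=1e-10):
    """Cyclic projections with increment corrections, as in (3.5)/(3.6).

    projections : list of r callables; the i-th computes the projection
                  onto the translated cone K_i - b_i (e.g. built from a
                  cone projection via project_translated above).
    """
    r = len(projections)
    g_n = np.asarray(g, dtype=float).copy()
    I = [np.zeros_like(g_n) for _ in range(r)]     # increments, initially 0
    for _ in range(max_cycles):
        g_prev = g_n.copy()
        for i in range(r):
            y = g_n - I[i]            # remove last cycle's increment ...
            g_n = projections[i](y)   # ... project onto K_i - b_i ...
            I[i] = g_n - y            # ... and record the new increment
        if np.linalg.norm(g_n - g_prev) < tol:     # heuristic stopping rule
            break
    return g_n
```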
In terms of the notation of (3.5), $g_n = g_{k,i}$ if $n = (k-1)r + i$ where $1 \le i \le r$. In this situation, the updated $I_{n(\text{mod } r)}$ in (3.5) equals $I_{k,i}$. We refer the reader to Dykstra (1983) for further elaboration on the algorithm and its uses. This procedure requires only the ability to find projections onto the $K_i$ (see (3.3)). These individual projections are often easy to program and quick to execute, and hence can be combined to solve rather difficult optimization problems.

If $K_i$ is also a subspace, one need not subtract the vector $I_{k-1,i}$ from $g_{k,i-1}$ before projecting it onto $K_i - b_i$. This can be seen since

$$P(g_{k,i-1} - I_{k-1,i} \mid K_i - b_i) = P(g_{k,i-1} + b_i - I_{k-1,i} \mid K_i) - b_i = P(g_{k,i-1} + b_i \mid K_i) - P(I_{k-1,i} \mid K_i) - b_i = P(g_{k,i-1} \mid K_i - b_i).$$

This follows because a projection onto a subspace is a linear operator and $(I_{k-1,i}, f) = 0$ for all $f \in K_i$ if $K_i$ is a subspace, so that $P(I_{k-1,i} \mid K_i) = 0$.

For a simple example which shows that simple cyclic, iterative projections need not converge correctly if the $K_i$ are not subspaces, consider the following example in $R^2$:
$$K_1 = \{(x, y) : x \ge 0, \ y \ge 0\}, \qquad K_2 = \{(x, y) : x \ge y \ge 0\},$$
$$b_1 = b_2 = 0, \qquad w_1 = w_2 = 1, \qquad g = (-2, 2).$$

If the $I_{k,i}$ are not subtracted off, then $g_{1,1} = (0, 2)$ and $g_{1,2} = g_{2,1} = g_{2,2} = \cdots$ are all equal to $(1, 1)$. However, the actual projection onto $K_1 \cap K_2$ is $(0, 0)$.
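This two-cone example is easy to reproduce numerically (a sketch reusing the `dykstra` routine above; our projection onto $K_2$ simply compares the candidate projections onto its two edge rays):

```python
import numpy as np

def P_K1(f):                                  # orthant {x >= 0, y >= 0}
    return np.maximum(f, 0.0)

def P_K2(f):                                  # cone {x >= y >= 0}
    if f[0] >= f[1] >= 0.0:
        return f.copy()                       # already in the cone
    cands = [np.array([max(f[0], 0.0), 0.0]),             # edge ray y = 0
             max((f[0] + f[1]) / 2.0, 0.0) * np.ones(2)]  # edge ray x = y
    return min(cands, key=lambda p: np.linalg.norm(f - p))

g = np.array([-2.0, 2.0])

x = g.copy()                       # naive cyclic projections, no increments
for _ in range(10):
    x = P_K2(P_K1(x))
print(x)                           # stalls at [1. 1.]

print(dykstra(g, [P_K1, P_K2]))    # converges to [0. 0.], the true projection
```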
Some applications of the algorithm follow.
(a) The most fundamental application of the algorithm is to find least squares projections under multiple linear constraints of the form $\sum_{i=1}^{m} a_i x_i \le b$. While the algorithm previously applied to these constraints only when $b = 0$, we note that

$$\Bigl\{x : \sum_{i=1}^{m} a_i x_i \le b\Bigr\} = \Bigl\{x : \sum_{i=1}^{m} a_i x_i \le 0\Bigr\} + (b/a_1, 0, \dots, 0)$$

is of the desired form.
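In the weighted inner product of Section 2, the projection onto such a translated half-space has a closed form, which makes these constraints cheap inside the algorithm. A sketch (the helper is ours; note that $\sum_i a_i x_i \le b$ reads $(x, c) \le b$ with $c_i = a_i / w_i$):

```python
import numpy as np

def project_halfspace(f, a, b, w):
    """Projection of f onto {x : sum_i a_i x_i <= b} in the w-weighted norm."""
    excess = float(np.dot(a, f)) - b
    if excess <= 0.0:
        return f.copy()            # constraint already satisfied
    c = a / w                      # normal direction in the weighted geometry
    return f - (excess / float(np.dot(a, c))) * c
```

Each such half-space is a translated cone, so a finite system of linear inequalities can be handled by passing one such projection per constraint to the `dykstra` sketch above.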
Constraints of the form $y_i \le x_i \le z_i$, $i = 1, \dots, n$, where $y$ and $z$ are fixed vectors, fall into the translated cone framework by writing
$$\{x : y_i \le x_i \le z_i \ \forall i\} = \bigl(\{x : x_i \ge 0 \ \forall i\} + (y_1, \dots, y_n)\bigr) \cap \bigl(\{x : x_i \le 0 \ \forall i\} + (z_1, \dots, z_n)\bigr).$$
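A small check of this decomposition (our own illustration): with diagonal weights the two translated orthant projections are componentwise, and the algorithm recovers the box projection, i.e. clipping.

```python
import numpy as np

y = np.array([0.0, -1.0, 0.5])            # lower bounds
z = np.array([1.0,  1.0, 2.0])            # upper bounds

P_lower = lambda f: np.maximum(f, y)      # {x : x_i >= 0} + y
P_upper = lambda f: np.minimum(f, z)      # {x : x_i <= 0} + z

g = np.array([2.0, -3.0, 1.0])
print(dykstra(g, [P_lower, P_upper]))     # -> [ 1. -1.  1.]
print(np.clip(g, y, z))                   # the box projection: same answer
```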
(b) We say the probability vectors $x$ and $y$ are stochastically ordered ($x \ge^{\text{st}} y$) if $\sum_{i=1}^{j} x_i \le \sum_{i=1}^{j} y_i$, $1 \le j \le m$. Suppose that we wish to require that the probability vector $x$ be stochastically larger than all of the fixed probability vectors $y_1, y_2, \dots, y_r$. If we let $\bar{y}_i$ denote the vector of partial sums of the vector $y_i = (y_{i1}, \dots, y_{im})$, so that

$$\bar{y}_i = \Bigl( y_{i1}, \ \sum_{j=1}^{2} y_{ij}, \ \dots, \ \sum_{j=1}^{m} y_{ij} \Bigr),$$

then the constraint region can be expressed as $\bigcap_{i=1}^{r+2} A_i$, where $A_i = \{x : \sum_{k=1}^{j} x_k \le (\bar{y}_i)_j, \ 1 \le j \le m\}$ for $i = 1, \dots, r$, while $A_{r+1} = \{x : x_j \ge 0 \ \forall j\}$ and $A_{r+2} = \{x : \sum_{j=1}^{m} x_j = 1\}$ force $x$ to be a probability vector. Since all of the $A_i$ are of the form of translated cones (for $i \le r$, $A_i$ is the cone of nonpositive partial sums translated by $y_i$, and $A_{r+2}$ is a translated subspace), the algorithm is applicable to regions of this form.
(c) Consider the set of two-dimensional arrays

$$\bigl\{ x = (x_{ij}) : x_{ij} \in R, \ i = 1, \dots, r; \ j = 1, \dots, c \bigr\}.$$

A commonly occurring constraint region is those arrays with nondecreasing rows and columns, i.e.,

$$\{x : x_{ij} \le x_{i,j+1} \ \forall i, j\} \cap \{x : x_{ij} \le x_{i+1,j} \ \forall i, j\} = K_1 \cap K_2.$$

A natural generalization would be the region

$$\{x : x_{ij} - x_{i,j+1} \le \varepsilon_{ij}, \ x_{ij} - x_{i+1,j} \le \delta_{ij} \ \forall i, j\}$$

for specified $\varepsilon_{ij}$ and $\delta_{ij}$, which can be expressed in the form $(K_1 - b_1) \cap (K_2 - b_2)$. We can clearly use the algorithm for projections onto these types of regions.
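For $K_1$ and $K_2$ the individual projections are row-wise and column-wise isotonic regressions, computable by the standard pool-adjacent-violators algorithm; this is the Dykstra and Robertson (1982) application mentioned in the introduction. A sketch (the `pava` routine is the textbook algorithm, not code from this paper):

```python
import numpy as np

def pava(v, w):
    """Pool-adjacent-violators: projection of v onto {x : x_1 <= ... <= x_n}
    in the w-weighted norm."""
    bv, bw, bn = [], [], []                  # block value / weight / length
    for vi, wi in zip(v, w):
        bv.append(float(vi)); bw.append(float(wi)); bn.append(1)
        while len(bv) > 1 and bv[-2] > bv[-1]:    # merge violating blocks
            tw = bw[-2] + bw[-1]
            bv[-2] = (bw[-2] * bv[-2] + bw[-1] * bv[-1]) / tw
            bw[-2] = tw; bn[-2] += bn[-1]
            bv.pop(); bw.pop(); bn.pop()
    return np.repeat(bv, bn)

def P_rows(x, w):     # nondecreasing rows: row-wise isotonic regression
    return np.vstack([pava(r_, w_) for r_, w_ in zip(x, w)])

def P_cols(x, w):     # nondecreasing columns: column-wise isotonic regression
    return P_rows(x.T, w.T).T
```

For the generalized region, (3.3) applies after choosing $b_1$ with $(b_1)_{i,j+1} - (b_1)_{i,j} = \varepsilon_{ij}$ (row-wise cumulative sums of the $\varepsilon_{ij}$), and similarly for $b_2$.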
4. Proof of proper convergence of the algorithm

We will have need of the following lemma, a proof of which is found in Dykstra (1983).

Lemma 4.1. Suppose a sequence of nonnegative real numbers $\{a_n\}_{n=1}^{\infty}$ is such that $\sum_{n=1}^{\infty} a_n^2 < \infty$. Then there exists a subsequence $\{a_{n_j}\}_{j=1}^{\infty}$ such that

$$a_{n_j} \sum_{m=1}^{n_j} a_m \to 0 \quad \text{as } j \to \infty. \qquad (4.1)$$
We now establish the fundamental result of the paper.
Theorem 4.1. The vectors $g_n$ defined in (3.5) converge to the true solution, say $g^*$, of the problem stated in (3.1) as $n \to \infty$.

Proof. Since $\bigcap_{i=1}^{r} (K_i - b_i) \ne \emptyset$, we may assume w.l.o.g. that $0 \in \bigcap_{i=1}^{r} (K_i - b_i)$. Then $b_i \in K_i$ and the true solution $g^*$ exists uniquely, since $\bigcap_{i=1}^{r} (K_i - b_i)$ is a closed convex set. Note from (3.6) that

$$g_{n,i-1} - g_{n,i} = I_{n-1,i} - I_{n,i}, \quad i = 2, \dots, r, \qquad g_{n-1,r} - g_{n,1} = I_{n-1,1} - I_{n,1}. \qquad (4.2)$$
Thus, in general ($I_{0,i} = 0$), for $i \ge 2$,

$$\|g_{n,i-1} - g^*\|^2 = \|(g_{n,i} - g^*) + (I_{n-1,i} - I_{n,i})\|^2 = \|g_{n,i} - g^*\|^2 + \|I_{n-1,i} - I_{n,i}\|^2 + 2(g_{n,i} + b_i, \ I_{n-1,i} - I_{n,i}) - 2(g^* + b_i, \ I_{n-1,i} - I_{n,i}). \qquad (4.3)$$

Note that $(g_{n,i} + b_i, I_{n,i}) = 0$ (by (2.5) and (3.3)). Induction on $n$ using (3.4) shows that $-I_{n-1,i} \in K_i^*$. Moreover, $g_{n,i} + b_i \in K_i$, so that the next to last term is nonnegative. Similarly

$$\|g_{n-1,r} - g^*\|^2 \ge \|g_{n,1} - g^*\|^2 + \|I_{n-1,1} - I_{n,1}\|^2 - 2(g^* + b_1, \ I_{n-1,1} - I_{n,1}). \qquad (4.4)$$
Repeated application of (4.3) and (4.4), together with addition and the telescoping property of the last term, yields

$$\|g - g^*\|^2 \ge \|g_{n,r} - g^*\|^2 + \sum_{k=1}^{n} \sum_{l=1}^{r} \|I_{k-1,l} - I_{k,l}\|^2 + 2 \sum_{i=1}^{r} (g^* + b_i, \ I_{n,i}) \quad \text{for all } n. \qquad (4.5)$$

Since the last sum is nonnegative ($g^* + b_i \in K_i$, $-I_{n,i} \in K_i^*$), we know

$$\sum_{k=1}^{\infty} \sum_{l=1}^{r} \|I_{k-1,l} - I_{k,l}\|^2 < \infty, \qquad (4.6)$$
and hence

$$\|I_{n-1,l} - I_{n,l}\| = \|g_{n,l-1} - g_{n,l}\| \to 0 \quad \text{for } l \ge 2, \quad \text{and} \quad \|I_{n-1,1} - I_{n,1}\| = \|g_{n-1,r} - g_{n,1}\| \to 0 \qquad (4.7)$$

as $n \to \infty$.
We note that (4.5) implies that $g_{n,r}$ and $g_{n,r} - g = I_{n,1} + \cdots + I_{n,r}$ are uniformly bounded. Next we show that there exists a subsequence $\{n_j\}$ such that

$$\limsup_{j} \; (I_{n_j,1} + I_{n_j,2} + \cdots + I_{n_j,r}, \ g_{n_j,1} - f) \le 0 \quad \text{for all } f \in \bigcap_{i=1}^{r} (K_i - b_i). \qquad (4.8)$$
To see this, note that

$$(I_{n,1} + \cdots + I_{n,r}, \ g_{n,1} - f) = \sum_{i=1}^{r} \bigl(I_{n,i}, \ g_{n,1} - f - (g_{n,i} + b_i)\bigr) \quad (\text{since } (I_{n,i}, \ g_{n,i} + b_i) = 0)$$
$$= \sum_{i=2}^{r} (I_{n,i}, \ g_{n,1} - g_{n,i}) + \sum_{i=1}^{r} (-I_{n,i}, \ f + b_i). \qquad (4.9)$$
The last sum is nonpositive since $f + b_i \in K_i$ and $-I_{n,i} \in K_i^*$. For the first part, we use the Cauchy-Schwarz inequality to say

$$\sum_{i=2}^{r} (I_{n,i}, \ g_{n,1} - g_{n,i}) \le \sum_{i=2}^{r} \|I_{n,i}\| \, \|g_{n,1} - g_{n,i}\| \le \sum_{i=2}^{r} \Bigl( \sum_{m=1}^{n} \|I_{m,i} - I_{m-1,i}\| \Bigr) \|g_{n,1} - g_{n,i}\| \le \sum_{m=1}^{n} a_m a_n$$

where

$$a_n = \sum_{l=2}^{r} \|g_{n,l-1} - g_{n,l}\| = \sum_{l=2}^{r} \|I_{n-1,l} - I_{n,l}\|$$

(see (4.7)). Since

$$a_n^2 \le (r - 1) \sum_{l=2}^{r} \|I_{n,l} - I_{n-1,l}\|^2,$$

(4.6) implies that $\sum_{n=1}^{\infty} a_n^2 < \infty$. Thus Lemma 4.1 can be employed to guarantee the existence of a subsequence which satisfies (4.1). It is clear that this subsequence satisfies (4.8). Moreover, since the $g_{n,i}$ are uniformly bounded, we may assume that we have chosen a subsequence such that (4.8) holds and $g_{n_j,1}$ converges, say to $h$. Note that (4.7) insures that $g_{n_j,l}$ also converges to $h$ for every $l$, and hence that $h \in \bigcap_{i=1}^{r} (K_i - b_i)$ (the $K_i$ are closed). In addition, since the $g_{n_j,r} - g = I_{n_j,1} + \cdots + I_{n_j,r}$ are uniformly bounded and converge to $h - g$, we may use (2.4) and (4.8) to argue $h = g^*$. Finally, in a manner similar to (4.5), we can show
$$\|g_{n_j,r} - g^*\|^2 = \|g_{n,r} - g^*\|^2 + \sum_{m=n_j+1}^{n} \sum_{l=1}^{r} \|I_{m,l} - I_{m-1,l}\|^2 + 2 \sum_{m=n_j+1}^{n} \sum_{l=1}^{r} \bigl((g_{m,l} + b_l) - (g^* + b_l), \ I_{m-1,l} - I_{m,l}\bigr) \qquad (4.10)$$

if $n > n_j$. We may write the last sum as

$$2 \sum_{m=n_j+1}^{n} \sum_{l=1}^{r} (g_{m,l} + b_l, \ I_{m-1,l}) \qquad (4.11)$$

$$- \ 2 \sum_{m=n_j+1}^{n} \sum_{l=1}^{r} (g_{m,l} + b_l, \ I_{m,l}) \qquad (4.12)$$

$$+ \ 2 \sum_{l=1}^{r} (g^* + b_l, \ I_{n,l}) \qquad (4.13)$$

$$- \ 2 \sum_{l=1}^{r} (g^* + b_l, \ I_{n_j,l}). \qquad (4.14)$$
Each term of (4.11) is nonnegative ($g_{m,l} + b_l \in K_l$, $-I_{m-1,l} \in K_l^*$). Each term of (4.12) is zero by (2.5). Each term of (4.13) is nonnegative for the same reason as (4.11). Finally, we can write (4.14) as
$$-2 \lim_{j \to \infty} \sum_{l=1}^{r} (g^* + b_l, \ I_{n_j,l}) = -2 \lim_{j \to \infty} \Bigl[ \Bigl( \sum_{l=1}^{r} I_{n_j,l}, \ g^* - g_{n_j,1} \Bigr) + \sum_{l=2}^{r} (I_{n_j,l}, \ g_{n_j,1} - g_{n_j,l}) \Bigr]$$

by taking $f$ to be $0$ in (4.9). This clearly equals zero by the way $\{n_j\}_{j=1}^{\infty}$ was chosen: the first term vanishes since the $\sum_{l} I_{n_j,l}$ are bounded and $g_{n_j,1} \to h = g^*$, while the second is bounded in absolute value by $a_{n_j} \sum_{m=1}^{n_j} a_m \to 0$ by Lemma 4.1. Then noting that the left side of (4.10) goes to zero as $j \to \infty$, it easily follows that $g_{n,r} \to g^*$ as $n \to \infty$. This clearly is good enough by (4.7).
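The inequality (4.5), the engine of the proof, can be spot-checked numerically on the $R^2$ example of Section 3, where $g^* = (0, 0)$ is known (an illustration reusing `P_K1` and `P_K2` from the earlier sketch, not part of the proof):

```python
import numpy as np

g = np.array([-2.0, 2.0])
g_star = np.zeros(2)                      # known projection onto K1 ∩ K2
projs = [P_K1, P_K2]                      # from the Section 3 example

x = g.copy()
I = [np.zeros(2) for _ in projs]
sum_sq = 0.0                              # accumulates ||I_{k-1,l} - I_{k,l}||^2
for _ in range(25):
    for i, P in enumerate(projs):
        y = x - I[i]
        x_new = P(y)
        sum_sq += float(np.sum((x_new - y - I[i]) ** 2))
        I[i] = x_new - y
        x = x_new

# (4.5) with its nonnegative final sum dropped:
lhs = float(np.sum((g - g_star) ** 2))
rhs = float(np.sum((x - g_star) ** 2)) + sum_sq
print(lhs >= rhs - 1e-12)                 # True
print(x)                                  # close to g* = (0, 0)
```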
Acknowledgement

The authors would like to thank the Associate Editor and a reviewer for a careful reading of this paper and several helpful comments.
References

Barlow, R.E., D.J. Bartholomew, J.M. Bremner and H.D. Brunk (1972). Statistical Inference Under Order Restrictions. John Wiley, New York.
Dykstra, R.L. (1983). An algorithm for restricted least squares regression. J. Amer. Statist. Assoc. 78, 837-842.
Dykstra, R.L. and T. Robertson (1982). An algorithm for isotonic regression for two or more independent variables. Ann. Statist. 10, 708-716.
Hildreth, C. (1954). Point estimates of ordinates of concave functions. J. Amer. Statist. Assoc. 49, 598-619.
Robertson, T. and F.T. Wright (1981). Likelihood ratio tests for and against a stochastic ordering between multinomial populations. Ann. Statist. 9, 1248-1257.
Von Neumann, J. (1950). Functional Operators (Vol. II). Princeton University Press, Princeton, NJ.
Wiener, N. (1955). On factorization of matrices. Comment. Math. Helv. 29, 97-111.
Wu, C.-F. (1982). Some algorithms for concave regression and isotonic regression. In: J.S. Rustagi and S.H. Zanakis, Eds., Studies in the Management Sciences: Optimization in Statistics, Vol. 19. North-Holland, New York.