A stochastic Newton-Raphson method


Journal of Statistical Planning and Inference 2 (1978) 153-163. © North-Holland Publishing Company

Dan ANBAR*
Case Western Reserve University and Tel-Aviv University

Received 26 July 1977
Recommended by Shelley Zacks

*Supported by the Office of Naval Research under Contract N00014-75-C-0529 at Case Western Reserve University.

Abstract: A stochastic approximation procedure of the Robbins-Monro type is considered. The original idea behind the Newton-Raphson method is used as follows. Given $n$ approximations $X_1, \ldots, X_n$ with observations $Y_1, \ldots, Y_n$, a least squares line is fitted to the points $(X_1, Y_1), \ldots, (X_n, Y_n)$, where $M$ ...
Key words: Robbins-Monro procedure, efficient ...

1. Introduction

The Newton-Raphson (NR) method for approximating the zero of a function is probably the best known approximation method. The idea is a very simple one. Let $M(x)$ be a differentiable function and let $\theta$ be the unique solution of the equation $M(x) = 0$. Pick a point $x_1$ arbitrarily, draw the tangent line to $M$ at $x = x_1$ and let $x_2$ be the point of intersection of this line with the x-axis. Continuing in this manner one obtains a sequence of points $x_1, x_2, \ldots$ where
$$x_{n+1} = x_n - M(x_n)/M'(x_n), \qquad n = 1, 2, \ldots.$$
When the function $M$ satisfies some regularity conditions, the sequence $\{x_n\}$ converges to the root $\theta$.

The first attempt to consider a stochastic version of the NR method was made by Robbins and Monro (1951). They considered the case where $M$ can only be observed statistically. Namely, at each point $x$ one can observe a random variable $Y(x)$ such that $E\,Y(x) = M(x)$. They suggested the following procedure. Let $x_1$ be a real number. Let $X_1 = x_1$ and for $n = 1, 2, \ldots$,
$$X_{n+1} = X_n - a_n Y_n,$$
where $Y_n$ is a random variable whose conditional distribution given $(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$ equals the distribution of $Y(x_n)$, and $\{a_n\}$, $n = 1, 2, \ldots$, is a sequence of real numbers. This process is known as the Robbins-Monro (RM) process. The analogy between the RM process and the NR method is clear: the function $M$ at $X_n$ is replaced by its estimate $Y_n$, and the sequence of derivatives by a fixed sequence of real numbers.
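The following minimal Python sketch (not part of the original paper) illustrates the Robbins-Monro iteration just described; the linear test function, the noise level and the gain $a_n = 1/n$ are choices made only for this example.

```python
import numpy as np

def robbins_monro(y_observe, x1, n_steps, gain=lambda n: 1.0 / n):
    """Robbins-Monro iteration X_{n+1} = X_n - a_n * Y_n.

    y_observe(x) returns a noisy observation Y(x) with E[Y(x)] = M(x);
    gain(n) supplies the deterministic sequence {a_n}.
    """
    x = x1
    for n in range(1, n_steps + 1):
        y = y_observe(x)        # Y_n, observed at the current approximation X_n
        x = x - gain(n) * y     # X_{n+1} = X_n - a_n Y_n
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = 2.0                                  # root of M, unknown to the procedure
    M = lambda x: 1.5 * (x - theta)              # hypothetical regression function
    print(robbins_monro(lambda x: M(x) + rng.normal(scale=1.0), x1=0.0, n_steps=5000))
```

The fixed gain plays the role of the unknown derivative; the adaptive procedures discussed below replace it by a data-driven estimate.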

It was shown that under certain regularity conditions and a proper choice of the sequence $\{a_n\}$, the RM process converges to $\theta$ a.s. and in mean square (e.g. Robbins and Monro (1951), Blum (1954), Dvoretzky (1956), Robbins and Siegmund (1971)). Chung (1954) and later Sacks (1958) found conditions under which the RM process is asymptotically normal. Sacks (1958) investigated the case where $a_n = A n^{-1}$, $n = 1, 2, \ldots$, where $A$ is some positive constant. He showed that under some general conditions the sequence $n^{1/2}(X_n - \theta)$ converges in law to a normal random variable with mean zero and variance $A^2\sigma^2/(2A\alpha - 1)$, where $\sigma^2 = \lim_{x \to \theta} \operatorname{Var} Y(x)$ and $\alpha$ is the derivative of $M$ at $x = \theta$. It is immediate to check that this asymptotic variance is minimized when $A = \alpha^{-1}$, the minimal value being $\sigma^2/\alpha^2$. However, without knowledge of $\alpha$, $A$ cannot be computed. Thus the problem of constructing an efficient process (i.e. one with minimal asymptotic variance) for estimating $\theta$ becomes natural.

This problem was considered first by Albert and Gardner (1967) and by Sakrison (1965). Both Albert and Gardner and Sakrison replaced the constant $A$ by a stochastic sequence estimating $\alpha^{-1}$. Since in both cases the estimating sequence depends on $M$, their methods are useful only when $M$ is a known function. The case where $M$ is unknown was considered by Venter (1967). Venter's method requires that at the $n$th stage ($n = 1, 2, \ldots$) the statistician make two observations $Y_n'$ and $Y_n''$ at $X_n - c_n$ and $X_n + c_n$, where $X_n$ is the $n$th approximation and $\{c_n\}$ is a given null sequence of positive numbers. This way of collecting the data made it possible for Venter to construct an efficient adaptive process. Although Venter's method is asymptotically efficient, the present author feels that taking the observations two at a time may make it unattractive in situations where the total number of experiments allowed is small.

This raises a natural question. After the $n$th stage of the experiment the statistician has observed the first $n$ approximations $X_1, X_2, \ldots, X_n$ and the observations on $M$ at these points, $Y_1, Y_2, \ldots, Y_n$. Is it not possible to use these data for determining the next approximation $X_{n+1}$ in an efficient way? The purpose of this paper is to provide a positive answer. Motivated by the original NR method, one might fit a least squares line to the points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ and take the point of intersection of this line with the x-axis as the next approximating point. This yields the following procedure. Let $X_1$ be a random variable to be chosen by the statistician. For $n = 1, 2, \ldots$ let

$$X_{n+1} = \bar X_n - b_{1,n}^{-1}\,\bar Y_n, \tag{1}$$

where $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$, $\bar Y_n = n^{-1}\sum_{i=1}^{n} Y_i$,

$$b_{1,n} = \sum_{i=1}^{n}(X_i - \bar X_n)\,Y_i \Big/ \sum_{i=1}^{n}(X_i - \bar X_n)^2 \qquad \text{for } n \ge 2,$$

and $b_{1,1}$ is an arbitrary positive number. Since $M$ is not assumed to be linear, it seems preferable to consider the process given by

$$X_{n+1} = X_n - (n\,b_{m,n})^{-1}\,Y_n, \qquad n = 1, 2, \ldots, \tag{2}$$

with $X_1$ arbitrary and

$$b_{m,n} = \sum_{i=m+1}^{n}(X_i - \bar X_{m,n})\,Y_i \Big/ \sum_{i=m+1}^{n}(X_i - \bar X_{m,n})^2, \tag{3}$$

where

$$\bar X_{m,n} = (n-m)^{-1}\sum_{i=m+1}^{n} X_i.$$

As indicated, $m$ may be taken to depend upon $n$. This is not needed in any of the proofs. However, one might want to base the estimator (3) of $\alpha$ only upon the later part of the sequence $\{X_n\}$ and ignore the beginning. This can obviously be done if $m$ is chosen to go to infinity with $n$. As will be seen later, all the results hold in this case provided $m(n) = o((\log n)^{1/2+\varepsilon})$ for every $\varepsilon > 0$. In this paper a truncated version of (2) is considered. It is shown that this process yields a strongly consistent sequence of estimators for $\theta$ which is asymptotically efficient.
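To make the construction concrete, the following Python sketch (not from the paper; the helper names, the fallback slope and the guard are illustrative assumptions) fits the least squares slope $b_{m,n}$ of (3) over the most recent observations and uses $1/(n\,b_{m,n})$ as the gain, as in (2).

```python
import numpy as np

def ls_slope(xs, ys, m):
    """Least squares slope b_{m,n} of (3), fitted to (X_i, Y_i) for i = m+1, ..., n."""
    x = np.asarray(xs[m:], dtype=float)
    y = np.asarray(ys[m:], dtype=float)
    xc = x - x.mean()
    denom = float(np.sum(xc ** 2))
    if denom == 0.0:                  # degenerate design: fall back to an arbitrary positive slope
        return 1.0
    return float(np.sum(xc * y) / denom)

def stochastic_newton_raphson(y_observe, x1, n_steps, window=None, b1=1.0):
    """Sketch of the untruncated process (2): X_{n+1} = X_n - Y_n / (n * b_{m,n})."""
    xs, ys = [x1], []
    for n in range(1, n_steps + 1):
        ys.append(y_observe(xs[-1]))                       # Y_n observed at X_n
        m = 0 if window is None else max(0, n - window)    # m = m(n): keep only the last `window` points
        b = ls_slope(xs, ys, m) if n >= 2 else b1          # b_{1,1} is an arbitrary positive number
        if b <= 0.0:
            b = b1        # crude guard; the paper's systematic remedy is the truncation (5) of Section 2
        xs.append(xs[-1] - ys[-1] / (n * b))
    return xs[-1]
```

Because the slope estimate can be small or even negative in early iterations, the crude guard above only keeps the recursion finite; the truncation introduced in Section 2 is the device actually analysed in the paper.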

2. The results

For the purpose of easy reference let us list all the assumptions which will be needed in the sequel.

(M1) $M(x)$ is a measurable function satisfying $(x - \theta)M(x) > 0$ for all $x \ne \theta$.

(M2) For every $0 < \varepsilon < 1$, $\inf_{\varepsilon \le |x-\theta| \le \varepsilon^{-1}} |M(x)| > 0$.

(M3) $M(x) = \alpha(x - \theta) + \delta(x - \theta)$ for all $x$, where $0 < \alpha$ and $\delta(y) = o(|y|)$ as $y \to 0$.

(M4) There exist positive numbers $0 < K \le \ldots$ such that (i) ... and (ii) $|M(x)| \le \ldots\,|x - \theta|$ for ... .

Let $Y(x)$ be a random variable such that $E\,Y(x) = M(x)$, and write $Z(x) = Y(x) - M(x)$. It is assumed that for every $-\infty < x < \infty$:

(Z1) (i) $\sup_x E\,|Z(x)|^{2+\delta} < \infty$ for ...; (ii) $\sigma^2 \ldots$ .


Remark. Condition (Z1)(ii) is obviously implied by (Z4). Both conditions are included, however, since some of the results which are of independent interest (such as Lemma 4) make use only of the weaker version (Z1)(ii).

Assumption (M4) implies that the derivative of $M$ at $x = \theta$ satisfies $K \le \alpha \le \ldots$ .

In this paper the following truncated version of (2) is considered. Let $X_1$ be arbitrary, let $m = m(n) = o((\log n)^{1/2+\varepsilon})$ as $n \to \infty$ for every $\varepsilon > 0$, and for $n = 1, 2, \ldots$ put

$$X_{n+1} = X_n - (nA_n)^{-1}\,Y_n, \tag{4}$$

where

$$A_n = \begin{cases} b_{m,n-1} & \text{if } a_n \le b_{m,n-1},\\ a_n & \text{if } a_n > b_{m,n-1},\end{cases} \tag{5}$$

$\{a_n\}$ is a given sequence of positive numbers, $b_{m,n}$ is given by (3), and $Y_n$ is a random variable whose conditional distribution given $(X_1, Y_1, \ldots, Y_{n-1})$ is the same as the distribution of $Y(X_n)$.

Theorem 1. If conditions (M1), (M2), (M4)(i) and (Z1)(i) are satisfied, then $X_n$ given by (4) and (5) converges almost surely to $\theta$.

The proof is omitted since it follows from Robbins and Siegmund's Application 2 (1971, Sec. 4).

Remark. It can easily be verified that (5) can be replaced by truncating $b_{m,n}$ in $(\alpha_n, \beta_n)$, where $\{\alpha_n\}$ and $\{\beta_n\}$ are sequences of positive numbers satisfying $\sum \alpha_n^{-2}\,\ldots$ .
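For completeness, here is how the truncation (5) can be added to the earlier sketch; the particular guard sequence $a_n$ below is an illustrative assumption, not a prescription of the paper.

```python
import math

def truncated_slope(b_prev, n, eps=0.1):
    """Truncated multiplier A_n as in (5): use the least squares slope b_{m,n-1}
    unless it falls below the guard sequence a_n, in which case use a_n.

    The guard a_n = (log n)^(-(1/2 + eps)) is an illustrative choice,
    not one taken from the paper.
    """
    a_n = math.log(max(n, 2)) ** (-(0.5 + eps))
    return b_prev if b_prev >= a_n else a_n

# Inside the recursion of the earlier sketch one would replace
#     xs.append(xs[-1] - ys[-1] / (n * b))
# by
#     A = truncated_slope(b, n)
#     xs.append(xs[-1] - ys[-1] / (n * A))
```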


Lemma 2. If the conditions of Theorem 1 and Conditions (M3) and (Z4) hold, and $m = m(n)$ is such that $m(n) = o((\log n)^{1/2+\varepsilon})$ as $n \to \infty$ for every $\varepsilon > 0$, then $\sum_{k=1}^{n}(X_k - \theta)^2 \to \infty$ a.s. at least as fast as $\log n$.

Proof. By (M3) it follows that $M(x)$ can be written as $M(x) = (\alpha + \psi(x))\,x$, where $\psi(x) \to 0$ as $x \to 0$ (throughout the proofs it is assumed, without loss of generality, that $\theta = 0$). Thus

$$X_{n+1} = (1 - d_n n^{-1})X_n - (nA_n)^{-1} Z_n, \tag{6}$$

where $d_n = A_n^{-1}(\alpha + \psi(X_n))$. By Theorem 1, $X_n \to 0$ a.s. and hence $\psi(X_n) \to 0$ a.s. Thus for $n \ge n_0 = n_0(\omega)$ and almost all $\omega$, $a_1 \le d_n \le a_2$ and

$$
\begin{aligned}
X_{n+1}^2 &= (1 - d_n n^{-1})^2 X_n^2 + (nA_n)^{-2} Z_n^2 - 2(nA_n)^{-1}(1 - d_n n^{-1}) X_n Z_n\\
&\ge (1 - a_2 n^{-1})^2 X_n^2 + (nA_n)^{-2} Z_n^2 - 2(nA_n)^{-1}(1 - d_n n^{-1}) X_n Z_n.
\end{aligned}\tag{7}
$$

Define

$$\gamma_n = \prod_{j=j_0}^{n}\big(1 - a_2 j^{-1}\big), \qquad n \ge j_0,$$

where $j_0$ is such that $1 - a_2 j^{-1} > 0$ for all $j \ge j_0$. Clearly

$$\gamma_n = \tau_n\, n^{-a_2},$$

where $0 < c_5 \le \tau_n \le c_6 < \infty$ uniformly in $n$. Let $\beta_{m,n} = \gamma_m^{-1}\gamma_n$ for $n \ge m \ge \max(j_0, n_0)$. Iterating (7) yields

$$X_N^2 \ge \beta_{m,N}^2 X_m^2 + \sum_{k=m}^{N-1}\beta_{k+1,N}^2\,(kA_k)^{-2} Z_k^2 - 2\sum_{k=m}^{N-1}\beta_{k+1,N}^2\,(kA_k)^{-1}(1 - d_k k^{-1}) X_k Z_k \tag{8}$$

for all $N \ge m \ge \max(j_0, n_0)$.

Consider first the last term of (8). By changing the order of summation one gets ..., where the $\Omega_{kN}$ are uniformly bounded positive numbers. Furthermore,

$$E\big[(\log k)^{-(1/2+\varepsilon)}\,(kA_k)^{-1}(1 - d_k k^{-1})\,X_k Z_k\big]^2 = \ldots$$

by Lemma 1 and (Z1)(i), for every $\varepsilon > 0$. Thus ... converges a.s. to a finite limit as $N \to \infty$, by Theorem D of Loève (1963), p. 387. Next observe that Kronecker's Lemma (Loève (1963), p. 238) can be slightly modified to read: if $\sum W_n = W < \infty$, $\{b_n\}$ is a monotone increasing sequence of positive numbers such that $\lim_{n\to\infty} b_n = \infty$, and the $h_n$ are positive and uniformly bounded numbers, then $\lim_{m\to\infty} b_m^{-1}\sum_{n=1}^{m} h_n b_n W_n = 0$. Thus,

$$\sum_{n=m}^{N}\sum_{k=m}^{n} \beta_{k+1,n}^2\,(kA_k)^{-1}(1 - d_k k^{-1})\,X_k Z_k = o\big((\log N)^{1/2+\varepsilon}\big)$$

for every $\varepsilon > 0$. Now

$$\sum_{n=m}^{N} \beta_{m,n}^2\,X_m^2 \;\le\; c\,m\big(1 - (m/N)^{2a_2-1}\big) \;=\; o\big((\log N)^{1/2+\varepsilon}\big)$$

as $N \to \infty$, for every $\varepsilon > 0$.

It remains to consider the middle term of (8). Again by changing the order of summation one gets

$$\sum_{n=m}^{N}\sum_{k=m}^{n} \beta_{k+1,n}^2\,(kA_k)^{-2} Z_k^2 \;\ge\; c \sum_{k=m}^{N} k^{2a_2-2}\big(k^{-2a_2+1} - N^{-2a_2+1}\big)A_k^{-2} Z_k^2 \;\ge\; c\,\rho_0 \sum_{k=m}^{[\rho N]} k^{-1} A_k^{-2} Z_k^2,$$


where $0 < \rho < 1$ and $\rho_0 = 1 - \rho^{2a_2-1}$. Let $\lambda > 0$ and $\mu > 0$ be as in Assumption (Z...). Define ... . Clearly ...; but by ..., hence ..., which implies that

$$\sum_{k=m}^{[\rho N]} k^{-1} A_k^{-2} Z_k^2 \;\ge\; c_1 \log N - c_2,$$

where $c_1$ and $c_2$ are positive constants. Since $\sum_{n=m}^{N} X_n^2$ is bounded below by the sum of the three terms just considered, and the first and last of these are $o((\log N)^{1/2+\varepsilon})$ while the middle one is at least $c_1 \log N - c_2$, the proof is complete.

The following result is due to Stout (1970).

Lemma 3. Let $\{\xi_n, \mathcal F_n,\ n \ge 1\}$ be a martingale difference sequence. Let

$$S_n = \sum_{k=1}^{n}\xi_k, \qquad V_n^2 = \sum_{k=1}^{n} E(\xi_k^2 \mid \mathcal F_{k-1}), \qquad U_n = \big[2\log\log V_n^2\big]^{1/2}, \qquad n = 1, 2, \ldots.$$

Let $K_n$ be $\mathcal F_{n-1}$-measurable functions such that $K_n \to 0$ a.s. as $n \to \infty$. If $V_n^2 \to \infty$ a.s. and $|\xi_n| \le K_n V_n / U_n$ a.s., then $\limsup_{n\to\infty} S_n/(U_n V_n) = 1$ a.s.

Let $X_n$ be given by (4) and (5).

Lemma 4. If conditions ... are satisfied, then

$$X_n - \theta = o\big(\eta_n\, n^{-1/2}(\log\log n)^{1/2}\big) \quad \text{a.s. as } n \to \infty,$$

where $\{\eta_n\}$ is any sequence of positive numbers such that $\eta_n \to \infty$ as $n \to \infty$.


Proof. Let $j_1$ be an integer such that $1 - a_2 j^{-1} > 0$ for all $j \ge j_1$. Iterating (6) and recalling that $a_1 < d_n$ a.s. for all $n$, one obtains

$$X_{n+1} = B'_{m,n} X_m - \sum_{k=m}^{n} B'_{k+1,n}\,(kA_k)^{-1} Z_k \tag{9}$$

for every $m \ge j_1$, where

$$B_{m,n} = \prod_{j=m+1}^{n}\big(1 - a_1 j^{-1}\big) \ \text{ if } n > m, \qquad B_{m,n} = 1 \ \text{ if } n = m,$$

$$B'_{m,n} = \prod_{j=m+1}^{n}\big(1 - d_j j^{-1}\big) \ \text{ if } n > m, \qquad B'_{m,n} = 1 \ \text{ if } n = m.$$

Similarly to the definition of $\gamma_n$, let $\gamma'_n = \prod_{j=j_1}^{n}(1 - d_j j^{-1})$; thus $B_{m,n}'^2 = \gamma_n'^2/\gamma_m'^2$ for every $n \ge m \ge j_1$. Notice that $\gamma'_n$ is a random variable measurable with respect to $\mathcal F(X_1, Z_1, \ldots, Z_{n-1})$. Since $m = m(n) = o((\log n)^{1/2+\varepsilon})$ for every $\varepsilon > 0$ and since $a_1 > \tfrac12$, $B_{m,n} = o\big((\log n)^{1/2+\varepsilon}\,n^{-a_1}\big)$, and it is sufficient to consider only the second term on the right-hand side of (9).

Now apply Lemma 3 with $\xi_k = k^{-1/2+\varepsilon} Z_k$, where $\varepsilon > 0$ is to be determined later. By (Z1) there exist positive constants $c_1$, $c_2$, $c_3$ and $c_4$ such that

$$c_1 n^{2\varepsilon} \le V_n^2 \le c_2 n^{2\varepsilon} \qquad \text{and} \qquad c_3(\log\log n)^{1/2} \le U_n \le c_4(\log\log n)^{1/2}$$

for all $n$ sufficiently large. Let $\{K_n\}$ be a sequence of positive numbers such that $K_n \to 0$ as $n \to \infty$ and $\sum_n K_n^{-(2+\delta)}\,(n/\log\log n)^{-(2+\delta)/2} < \infty$ for every $\delta > 0$. By (Z1)(i),

$$P\big(|Z_n| > K_n (n/\log\log n)^{1/2}\big) \le \ldots,$$

and therefore, by the Borel-Cantelli Lemma, with probability one

$$|Z_n| \le K_n\,(n/\log\log n)^{1/2}$$


for all $n$ sufficiently large, where $K_n \to 0$ as $n \to \infty$. Thus the conditions of Lemma 3 are satisfied, and therefore

$$\Big|\sum_{k=1}^{n} k^{-1/2+\varepsilon} Z_k\Big| = o\big(\eta_n\, n^{\varepsilon}(\log\log n)^{1/2}\big) \quad \text{a.s.},$$

where $\{\eta_n\}$ is any sequence such that $\eta_n \to \infty$ as $n \to \infty$. To complete the proof let $w_k = \gamma_k^{-1} k^{-1/2-\varepsilon}$ for $k \ge j_1$. Now $\gamma_k^{-1} \ge c\,k^{a_1}$; hence we can choose $\varepsilon$ so small that $a_1 - \tfrac12 - \varepsilon > 0$, so that $w_k \ge c_8\,k^{a_1 - 1/2 - \varepsilon}$ and $w_k \to \infty$ as $k \to \infty$. Therefore, by Kronecker's Lemma,

$$\gamma_n \sum_{k=m}^{n} k^{-1}\gamma_k^{-1} Z_k = o\big(\eta_n\, n^{-1/2}(\log\log n)^{1/2}\big) \quad \text{a.s.}$$

for $n \ge m > j_1$, which completes the proof.

Corollary. Let $X_n$ and $\eta_n$ be as in Lemma 4. Then

$$\bar X_n - \theta = o\big(\eta_n\, n^{-1/2}(\log\log n)^{1/2}\big) \qquad \text{and} \qquad n^{-1}\sum_{i=1}^{n} (X_i - \theta)^2 = o(\eta_n \log\log n) \quad \text{a.s. as } n \to \infty,$$

where $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$.

Proof. Obvious.

The consistency of $b_{m,n}$ can now be proved.

Theorem 2. Let $X_n$ be given by (4) and (5) and $b_{m,n}$ by (3), with $m = m(n) = o((\log n)^{1/2+\varepsilon})$ as $n \to \infty$ for every $\varepsilon > 0$. If Conditions (M1) through (M4), (Z1) and (Z4) are satisfied and $a_2 < 2K$, then $b_{m,n} \to \alpha$ a.s., where $\alpha$ is defined in (M3).

Proof. By (M3),

$$b_{m,n} = \alpha + \frac{\sum_{i=m+1}^{n}(X_i - \bar X_{m,n})\,\delta(X_i - \theta)}{\sum_{i=m+1}^{n}(X_i - \bar X_{m,n})^2} + \frac{\sum_{i=m+1}^{n}(X_i - \bar X_{m,n})\,Z_i}{\sum_{i=m+1}^{n}(X_i - \bar X_{m,n})^2}. \tag{10}$$

As in the proof of Lemma 2, $\delta(x)$ can be written as $x\,\psi(x)$ with $\psi(x) \to 0$ as $x \to 0$.


By Lemma 2 and the corollary to Lemma 4, $\sum_{i=m+1}^{n}(X_i - \bar X_{m,n})^2$ goes to $\infty$ at least as fast as $\log n$. Thus by the Toeplitz Lemma (Loève (1963), page 238) it follows that the second term on the right-hand side of (10) goes to zero a.s. as $n \to \infty$. By the same argument given in the proof of Lemma 2 it follows that ... for every $\varepsilon > 0$; and it follows from Lemma 3 that ... . Hence, by the Corollary, $\bar X_{m,n}\sum_{i=m+1}^{n} Z_i = o(\eta_n \log\log n)$ as $n \to \infty$, where $\eta_n$ is as in Lemma 4. Hence the last term on the right-hand side of (10) converges to zero a.s. as $n \to \infty$. This completes the proof.

The following theorem states that the process $X_n$ is asymptotically efficient. The proof is omitted since it follows exactly the proof of Sacks (1958).

Theorem 3. Let $X_n$ be given by (4) and (5) with $m(n) = o((\log n)^{1/2+\varepsilon})$ as $n \to \infty$ for every $\varepsilon > 0$. If conditions (M1) through (M4) and (Z1) through (Z4) are satisfied and $a_2 < 2K$, then $n^{1/2}(X_n - \theta)$ converges in law to a normal random variable with mean zero and variance $\sigma^2/\alpha^2$.
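Theorem 3 is easy to probe by simulation. The Python snippet below is illustrative only: the linear test function, noise level, number of steps and guard sequence are assumptions of the example, not of the theorem. It runs independent replications of the truncated procedure and compares the empirical second moment of $n^{1/2}(X_n - \theta)$ with the limiting value $\sigma^2/\alpha^2$.

```python
import numpy as np

def run_once(rng, alpha=1.5, theta=2.0, sigma=1.0, n_steps=4000, eps=0.1):
    """One replication of the truncated least-squares procedure, with the
    regression slope maintained by running sums."""
    x = 0.0
    sx = sy = sxx = sxy = 0.0
    for n in range(1, n_steps + 1):
        y = alpha * (x - theta) + rng.normal(scale=sigma)   # Y(x); linear M chosen for the test
        sx, sy, sxx, sxy = sx + x, sy + y, sxx + x * x, sxy + x * y
        sxx_c = sxx - sx * sx / n                            # centered sum of squares
        b = (sxy - sx * sy / n) / sxx_c if n >= 2 and sxx_c > 0 else 1.0
        a_n = np.log(max(n, 2)) ** (-(0.5 + eps))            # illustrative guard sequence a_n
        A = b if b >= a_n else a_n                           # truncation of the slope estimate
        x = x - y / (n * A)
    return x

rng = np.random.default_rng(1)
alpha, theta, sigma, n_steps = 1.5, 2.0, 1.0, 4000
ests = np.array([run_once(rng, alpha, theta, sigma, n_steps) for _ in range(200)])
print("empirical E[ n (X_n - theta)^2 ]:", n_steps * np.mean((ests - theta) ** 2))
print("theoretical sigma^2 / alpha^2:   ", sigma ** 2 / alpha ** 2)
```

For these settings one expects the two printed numbers to be of comparable size; any residual discrepancy reflects the finite sample size, the number of replications and the truncation.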


Finally it should be noted that once the convergence of $b_{m,n}$ to $\alpha$ is proved, it follows from Gaposhkin and Krasulina (1974) that $X_n$ obeys the law of the iterated logarithm, i.e.

$$\limsup_{n\to\infty}\,\big[2^{-1} n\,(\log\log n)^{-1}\big]^{1/2}\,(X_n - \theta) = \sigma/\alpha \quad \text{a.s.}$$

References

[1] Albert, A.E. and L.A. Gardner (1967). Stochastic Approximation and Nonlinear Regression. Research Monograph No. 42. The M.I.T. Press, Cambridge, MA.
[2] Blum, J.R. (1954). Approximation methods which converge with probability one. Ann. Math. Statist. 25, 382-386.
[3] Chung, K.L. (1954). On a stochastic approximation method. Ann. Math. Statist. 25, 463-483.
[4] Dvoretzky, A. (1956). On stochastic approximation. Proc. Third Berkeley Symp. 1, 39-56.
[5] Gaposhkin, V.F. and T.P. Krasulina (1974). On the law of the iterated logarithm in stochastic approximation processes. Theor. Probability Appl. 19, 844-850.
[6] Loève, M. (1963). Probability Theory, 3rd ed. Van Nostrand, Princeton.
[7] Robbins, H. and S. Monro (1951). A stochastic approximation method. Ann. Math. Statist. 22, 400-407.
[8] Robbins, H. and D. Siegmund (1971). A convergence theorem for non-negative almost supermartingales and some applications. In: Optimizing Methods in Statistics. Academic Press, New York, 233-257.
[9] Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29, 373-405.
[10] Sakrison, D. (1965). Efficient recursive estimation; application to estimating the parameters of a covariance function. Int. J. Engng. Science 3, 461-483.
[11] Stout, W.F. (1970). A martingale analogue of Kolmogorov's law of the iterated logarithm. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 15, 279-290.
[12] Venter, J.H. (1967). An extension of the Robbins-Monro procedure. Ann. Math. Statist. 38, 181-190.