Convergence and Error Bounds for Passive Stochastic Algorithms Using Vanishing Step Size


Journal of Mathematical Analysis and Applications 200, 474–497 (1996), Article No. 0217

Convergence and Error Bounds for Passive Stochastic Algorithms Using Vanishing Step Size*

G. Yin

Department of Mathematics, Wayne State University, Detroit, Michigan 48202

Submitted by E. Stanley Lee. Received September 20, 1994.

This work is concerned with passive stochastic approximation (PSA) algorithms having vanishing or decreasing step size and window width. Unlike traditional stochastic approximation methods, passive stochastic approximation algorithms employ passive strategies. Under the framework of PSA, not only is the measurement noise unobservable, but the "state" {x_n} is also a randomly generated sequence. In our formulation, both the observation noise and the randomly generated {x_n} are correlated random processes. Under rather general conditions, convergence with probability one (w.p.1) of the algorithms is established. Upper bounds on the estimation errors are then obtained. It is shown that the bounds depend in an essential way on the smoothness of the function under consideration, which reveals another distinct feature of the passive algorithms. © 1996 Academic Press, Inc.

1. INTRODUCTION

The main goal of this work is to investigate the asymptotic properties of passive stochastic approximation algorithms with decreasing step size and window width. We obtain sufficient conditions for convergence with probability one (w.p.1), and we derive upper bounds on the estimation errors under rather general conditions.

Various applications in control theory and optimization often require one to find the solution of f(x) = 0 for some nonlinear function f(·). Very often either the form of the function is quite complex or it is not available at all; only noisy measurements are available. Thus one is forced to use

* This research was supported in part by the National Science Foundation under grants DMS-9224372 and DMS-9529378, in part by the Deutscher Akademischer Austauschdienst under a study visit grant, and in part by Wayne State University.


methods such as stochastic approximation to resolve the problem. In many situations (for instance, the example considered in [16]), however, both the input and the output are subject to errors, and the standard or traditional stochastic approximation methods are not applicable. Nevertheless, these problems fit into the framework of passive stochastic approximation, where, unlike the traditional approach, passive strategies rather than active strategies are employed.

Consider the following problem. Find the solution of f(x) = 0 provided only noisy measurements y_n = f(x_n) + ξ_n are available, where {ξ_n} is a sequence of random variables with mean 0 and {x_n} is a randomly generated sequence. Compared with the traditional stochastic approximation method, there is an added difficulty: the sequence {x_n} is generated passively and cannot be chosen in accordance with the experimenter's desire. This difficulty prevents us from using traditional stochastic approximation methods of the form

    x_{n+1} = x_n + a_n y_n.    (1)

Härdle and Nixdorf [2] suggested using another sequence {z_n} to approximate the zero of f(·). The sequence {z_n} is given by

    z_{n+1} = z_n + (a_n/h_n) K((x_n − z_n)/h_n) y_n,    (2)

where a_n is the step size of the algorithm, h_n represents the window width, and K(·) is a kernel function. The purpose of the kernel K(·) is to keep the estimate z_n from drifting far from the random "state" {x_n}: if the difference is large, only a very small amount of y_n is added to the current estimate z_n. A scalar problem was treated in [2], whereas multidimensional cases were studied in [10]. The origin of passive stochastic approximation can be traced back to an early work of Révész [12], in which the author applied stochastic approximation methods to a non-parametric estimation problem for a regression function. Note that if {x_n} can be generated actively, then z_n = x_n and the problem reduces to the setting of classical or traditional stochastic approximation.

Although some interesting ideas were presented in [2, 10], in these references both the noise {ξ_n} and the randomized sequence {x_n} are assumed to be sequences of independent and identically distributed (i.i.d.) random variables. In addition, the noise in the measurements appears in an additive form. However, for many applications in control and adaptive signal processing, the random processes are correlated and the noise is non-additive. As a consequence, the results obtained in the aforementioned papers have certain limitations.


In a related work of Yin and Yin [16], constant step size and window width algorithms were proposed and analyzed by means of weak convergence methods. Applications to a chemical engineering problem, steady-state estimation for a continuously stirred tank reactor, were also considered. Our simulations show very promising results. In the current paper, we examine the algorithm

    z_{n+1} = z_n + (a_n/h_n) K((x_n − z_n)/h_n) f(x_n, ξ_n),  with initial condition z_1,    (3)

where ξ_n, z_n, x_n ∈ R^r and f(·, ·) is an R^r-valued function. Notice that this model is much more general than those studied in [2, 10]. The measurement noise is not necessarily additive; in fact, in our case y_n = f(x_n, ξ_n). We also remove the i.i.d. restriction on both {ξ_n} and {x_n}. Sufficient conditions for w.p.1 convergence will be derived. We then evaluate the quality of the estimation by deriving upper bounds on the estimation errors. It is shown that the upper bounds depend explicitly on the smoothness of the function f(·) and hence reveal another distinct feature of passive stochastic approximation procedures. This distinct character is a consequence of the corresponding methods in non-parametric statistics.

The remainder of the paper is arranged as follows. The main assumptions and conditions, as well as statements of the results, are gathered next. In Section 3 we present the proof of w.p.1 convergence. This is divided into two steps: in the first step we obtain the w.p.1 boundedness of the iterates, and in the second step we establish the desired convergence property. In Section 4 we derive the upper bounds on the estimation errors. The main analytical techniques employed are perturbed Liapunov function methods. Finally, some further remarks are made in Section 5.

For future reference, we use a′ to denote the transpose of a; f_z and f_{zz} denote the first and second derivatives of f, respectively. In addition, κ stands for a generic positive constant, possibly taking different values at each appearance.
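To make the recursion concrete, the following is a minimal numerical sketch of algorithm (3) in one dimension. All the specific choices here are toy assumptions of ours, not from the paper: an Epanechnikov kernel, f(x, ξ) = −x + ξ (so the root is 0), uniformly generated states x_n, and the schedules a_n = 1/n, h_n = n^(−1/3).

```python
import random

def epanechnikov(u):
    # Bounded-support kernel satisfying (A2): nonnegative, symmetric, integral 1.
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def psa(n_iter=20000, z1=1.0, seed=0):
    # Passive stochastic approximation, algorithm (3), scalar toy case.
    rng = random.Random(seed)
    z = z1
    for n in range(1, n_iter + 1):
        x = rng.uniform(-2.0, 2.0)        # passively generated "state" x_n
        xi = rng.gauss(0.0, 0.3)          # unobservable measurement noise xi_n
        y = -x + xi                       # observation y_n = f(x_n, xi_n); root of f at 0
        a_n = 1.0 / n                     # step size a_n = 1/n
        h_n = n ** (-1.0 / 3.0)           # window width h_n = n^(-1/3)
        # The kernel gates the update: y_n contributes only when x_n is
        # within roughly h_n of the current estimate z_n.
        z += (a_n / h_n) * epanechnikov((x - z) / h_n) * y
    return z

if __name__ == "__main__":
    print(abs(psa()))  # small: the iterates approach the root z = 0
```

The gating by K(·) is exactly the passive feature: the experimenter never chooses where x_n lands, so the estimate is corrected only on the (increasingly rare) occasions when the state visits a shrinking window around z_n.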

2. CONDITIONS AND RESULTS

Throughout this paper, we assume that the measurement noise {ξ_n} is exogenous, i.e.,

    P(ξ_{n+1} ∈ A_1, …, ξ_{n+k} ∈ A_k | z_1, ξ_j, x_j; j < n)
      = P(ξ_{n+1} ∈ A_1, …, ξ_{n+k} ∈ A_k | z_1, x_j, ξ_j, z_{j+1}; j < n),

for all Borel sets A_i, i ≤ k, and all k and n.


To proceed, we make the following assumptions.

(A1) The step size and window width satisfy

    Σ_n a_n²/h_n < ∞,   Σ_n a_n h_n < ∞,   Σ_n a_n = ∞.

(A2) The kernel K(·) has bounded support, i.e., K(x) = 0 for |x| > R for some R > 0. In addition, K(·) satisfies

    K(x) ≥ 0,   K(x) = K(−x),   ∫ K(x) dx = 1.

(A3) The sequences {x_n} and {ξ_k} are independent of each other. For each n and each k ≥ n, there exists a conditional density of x_k given F_n, denoted by p_k(·|F_n); p_k(·|F_n) is continuously differentiable with bounded derivative. For each n, the sequence {p_k(·|F_n)}_{k≥n} is uniformly bounded. There exists a probability density π(·) that is bounded and continuously differentiable with bounded derivative, with π(z) > 0 for each z, such that

    Σ_{k=n}^∞ |E_n p_k(z|F_n) − π(z)| < ∞,
    Σ_{k=n}^∞ |∂/∂z (E_n p_k(z|F_n) − π(z))| < ∞

w.p.1 for each z, where E_n denotes conditioning up to n, i.e., conditioning on the σ-algebra F_n = σ{z_1, x_j, ξ_j; j < n}.

(A4) There is a twice continuously differentiable Liapunov function V(·) such that V(z) ≥ 0 for all z, V(z) → ∞ as |z| → ∞, |V_z(z)| ≤ κ(1 + |V_z′(z) f(z)|^{1/2}), and V_{zz}(·) is uniformly bounded. For some λ_0 > 0, define Q_0 = {x: V(x) ≤ λ_0}. Then V_z′(z) f(z) < −δ for each z ∉ Q_0 and some δ > 0.

(A5) f(z, ξ_n) = (f_0(z) + α_n) + f_1(z, β_n) + f_2(z)γ_n, where the f_i(·), i = 0, 1, 2, are twice continuously differentiable with respect to z and f_{0,zz}(·), f_{1,zz}(·, β), and f_{2,zz}(·) are bounded. {α_n}, {β_n}, and {γ_n} are stationary and independent of each other; Eα_n = 0 and E_n|α_n|² < ∞; {γ_n} is a martingale difference sequence with E_n|γ_n|² < ∞; {β_n} is a sequence of bounded random variables with Ef_1(z, β_n) = f(z) − f_0(z) for each z and each n. f_2(·) is a bounded function, and f_0(·) and f_1(·) satisfy

    |f_1(z, ξ)|² ≤ κ(1 + |V_z′(z) f(z)|),   |f_0(z)|² ≤ κ(1 + |V_z′(z) f(z)|),
    |f_{0,z}(z)| + |f_{1,z}(z, ξ)| ≤ κ(1 + |V_z′(z) f(z)|^{1/2}).    (4)

In addition, |V_z′(z) f(z)| ≤ κ(1 + V(z)).

(A6) The following inequalities hold:

    Σ_{i=n}^∞ |E_n α_i| ≤ κ,
    |Σ_{i=n}^∞ E_n V_z′(z)[f_1(z, β_i) − (f(z) − f_0(z))]| ≤ κ(1 + |V_z′(z) f(z)|),    (5)
    |∂/∂z Σ_{i=n}^∞ E_n {V_z′(z)[f_1(z, β_i) − (f(z) − f_0(z))]}| ≤ κ(1 + |V_z′(z) f(z)|^{1/2}).

Remark 1. We comment on the assumptions briefly. Condition (A2) is an assumption on the kernel function K(·). Many functions satisfy this condition, for example, the "square window" functions, or the function

    K(x) = (3/4)(1 − |x|²) for |x| ≤ 1,   K(x) = 0 for |x| > 1,

etc. The condition ∫K(x)dx = 1 is not a restriction: suppose ∫K(x)dx = c_0 ≠ 1; we can define a new function K̃(x) = K(x)/c_0, and clearly ∫K̃(x)dx = 1. Since K(·) has bounded support,

    ∫ K(x)|x|^i dx < ∞  for each positive integer i.
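The normalization remark can be checked numerically. A minimal sketch (the specific kernel and the quadrature routine are our choices, not the paper's): rescaling 1 − x² by its integral c_0 = 4/3 yields exactly the kernel (3/4)(1 − x²) displayed above.

```python
def k_raw(x):
    # Unnormalized bounded-support kernel: 1 - x^2 on [-1, 1], zero outside.
    return max(0.0, 1.0 - x * x)

def integrate(f, lo=-1.0, hi=1.0, steps=100000):
    # Midpoint rule; adequate for smooth, compactly supported integrands.
    dx = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

c0 = integrate(k_raw)                 # c0 = 4/3 here, so k_raw is not a density
k_tilde = lambda x: k_raw(x) / c0     # rescaled kernel with integral 1
print(round(c0, 4), round(integrate(k_tilde), 4))  # 1.3333 1.0
```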

In assumption (A3), we have removed the restrictive condition, imposed in [2, 10], that {x_n} and {ξ_k} are i.i.d. sequences. As mentioned before, this is particularly important for many applications in optimization, control, and related problems. Suppose that {x_n} is a stationary uniform mixing sequence. Then the inequality |E_n p_k(x|F_n) − π(x)| ≤ φ(k − n) holds, where φ(·) denotes the mixing rate (see [5] or [1]). Suppose that Σ_k φ(k) < ∞. Then the summability condition in (A3) is verifiable.

Condition (A4) assumes that a Liapunov function exists; the exact form of V(·) need not be known. Assumption (A5) covers additive noise, bounded non-additive noise, and unbounded non-additive noise. If {α_n} is a sequence of stationary φ-mixing random variables satisfying E|α_n|² < ∞, then {|α_n|²} is also a φ-mixing sequence. By the well-known mixing inequality (see [5, p. 82] or [1]), for any j ≥ n,

    E_n|α_j|² ≤ |E_n|α_j|² − E|α_j|²| + E|α_j|² ≤ φ̃(j − n) + E|α_n|² < ∞,

where φ̃(·) denotes the mixing rate. This shows that the condition E_n|α_n|² < ∞ covers many interesting situations. Assumptions similar to (A6) have been used in the past (see [4]); again, if the random sequences are suitable φ-mixing processes, the condition is easily verified.

We are now in a position to present the convergence theorem, which is inspired by the work of Kushner [4]. The main idea is to compare the discrete iterates with those of the dynamic system

    ż = π(z) f(z),   z(0) = z_0.    (6)

THEOREM 1. Assume that (A1)–(A6) are satisfied. Then the following statements hold:

• {z_n} is bounded w.p.1.
• Let Z be defined as Z = {z: V_z′(z) f(z) = 0}. If V_z′(z) f(z) ≤ 0 for all z, then z_n → Z w.p.1.

The above theorem presents the convergence properties. The results are similar to those for the classical procedures in [4]. However, owing to the presence of the kernel function, as well as the randomized sequence, the approach of [4] is not directly applicable, and some modifications are necessary. By z_n → Z we mean that lim_n ρ(z_n, Z) = 0, where ρ(·,·) is the usual distance function, ρ(z, Z) = inf_{y∈Z} |z − y|.
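The comparison with the limit ODE ż = π(z) f(z) can be visualized with a small forward-Euler integration. The choices below are toy assumptions of ours (π a uniform density, f(z) = −z), not from the paper:

```python
def f(z):
    return -z                                  # toy drift, unique root at 0

def pi_density(z):
    return 0.25 if abs(z) <= 2.0 else 0.0      # uniform density on [-2, 2]

def euler_ode(z0=1.5, dt=0.01, t_end=40.0):
    # Forward-Euler integration of the mean ODE (6): dz/dt = pi(z) f(z).
    z = z0
    for _ in range(int(t_end / dt)):
        z += dt * pi_density(z) * f(z)
    return z

print(euler_ode())  # decays toward the root z = 0
```

Note how the density π(·) only rescales the speed along the trajectory; since π(z) > 0 where the iterates live, the rest points of (6) are exactly the zeros of V_z′(z) f(z), which is why the theorem identifies the limit set Z.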


To investigate further, we wish to find the rate of convergence of the passive recursive algorithms. In particular, we aim at deriving upper bounds on the estimation errors. Although Theorem 1 is similar to its classical counterpart, the upper bounds on the estimation errors reflect a very different phenomenon from that of the traditional approach, owing to their explicit dependence on the smoothness of the functions under consideration. To proceed, we replace (A4) by the following condition.

(A4′) Let (6) have a unique asymptotically stable point θ (in the sense of Liapunov). There is a twice continuously differentiable Liapunov function V(·) such that V(z) ≥ 0 for all z, V(z) → ∞ as |z| → ∞, |V_z(z)| ≤ κ(1 + |V_z′(z) f(z)|^{1/2}), and V_{zz}(·) is uniformly bounded. For any z ≠ θ and some λ̃ > 0, V_z′(z) f(z) ≤ −λ̃ V(z). Furthermore, V(z) is locally quadratic, i.e., V(z) = (z − θ)′Q(z − θ) + o(|z − θ|²), where Q is a symmetric positive definite matrix.

In what follows, without loss of generality, we assume that θ = 0. We need another condition for deriving the upper bounds on the estimation error.

(A7) In addition to the assumptions on f(·) in (A6), assume the following conditions hold:

• For each ξ, the partial derivatives of f(·, ξ) up to order l are continuous, and for all z = (z_1, …, z_r)′ ∈ R^r and some M > 0, |∂^l f(z, ξ)/∂z_1^{l_1} ⋯ ∂z_r^{l_r}| ≤ M, where l_1 + ⋯ + l_r = l.
• For the kernel K(·),

    ∫ z_1^{n_1} ⋯ z_r^{n_r} K(z) dz = 0  for all 1 ≤ n_1 + ⋯ + n_r ≤ 2l − 3.

• For each n, the partial derivatives of p_n(·|F_n) up to order l are continuous, and |∂^i p_n(z|F_n)/∂z_1^{i_1} ⋯ ∂z_r^{i_r}| ≤ M for some M > 0, where i_1 + ⋯ + i_r = i, for i = l − 1 and i = l.

Remark 2. The condition above requires certain moments of the kernel to vanish. Kernel functions satisfying such conditions can be constructed easily; hence this condition is hardly a restriction. The main condition is on the smoothness of the function f(·) and of the density π(·).

THEOREM 2. Take a_n = 1/n^γ with 0 < γ ≤ 1 and h_n = 1/n^μ with 0 < μ < 1. Under conditions (A1)–(A3), (A4′), and (A5)–(A7), for sufficiently large n,

    E|z_n|² = O(n^{−(2l+1)γ/(2l+2)}).

Remark 3. First, we point out that if we add the condition "p_k(x|F_n) > 0 for each x" to (A3), the smoothness condition on p_k(·|F_n) in (A7) can be reduced to order l − 1. In lieu of a_n = 1/n^γ we may use a_n = a/n^γ for some a > 0; the constant a can be absorbed into the function f(·). In previous work dealing with error bounds (see [10]), the authors concentrated on the case a_n = 1/n only. In actual computing it is desirable to use larger step sizes, especially in the beginning stage of the computation, so as to force the iterates into a vicinity of the true parameter faster. Thus it is necessary to consider step sizes larger than O(1/n). Moreover, as in the recent trend of improving the performance of stochastic approximation algorithms via averaging (see [11, 7, 13, 16], etc.), it is crucial to allow the step size to be larger than O(1/n). In the proof of the theorem we in fact provide a way of choosing the window width that leads to the desired order estimates.
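The schedule of Theorem 2 and the resulting error exponent can be tabulated directly. This sketch merely evaluates the window-width choice μ = γ/(l + 1) made in the proof and the bound exponent (2l + 1)γ/(2l + 2); the helper name is ours:

```python
from fractions import Fraction

def psa_rate(gamma, l):
    # Window-width exponent mu and error exponent for the schedules
    # a_n = n**(-gamma), h_n = n**(-mu) of Theorem 2.
    gamma = Fraction(gamma)
    mu = gamma / (l + 1)                               # choice made in the proof
    exponent = Fraction(2 * l + 1, 2 * l + 2) * gamma  # E|z_n|^2 = O(n**(-exponent))
    return mu, exponent

mu, e2 = psa_rate(1, 2)
print(mu, e2)  # 1/3 5/6 -- l = 2, gamma = 1 gives O(n**(-5/6)), as in Remark 4
```

As l grows the exponent tends to γ, the classical rate; this is the content of Corollary 3 below.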

3. W.P.1 CONVERGENCE

The object of this section is to derive the w.p.1 convergence of the iterates as stated in Theorem 1. The approach we use is the method of perturbed Liapunov functions, which has been utilized successfully in the past for analyzing recursive stochastic algorithms. We exploit the natural connection between the discrete iteration and the continuous dynamic system. Using ODE (ordinary differential equation) methods to analyze stochastic recursive algorithms was first proposed and investigated by Ljung [9] and by Kushner and Clark [3] in the 1970s, and was subsequently developed and employed in a wide range of applications.

Proof of Theorem 1. In what follows, z_n^+, z_n^{++}, z̃_n, ẑ_n, and ž_n all represent points on the line segment joining z_n and z_{n+1}. We divide the proof into two steps: the first step establishes the w.p.1 boundedness and the second step proves the convergence.

Step 1. Prove the boundedness of {z_n}. By virtue of the basic recursion,

    E_n V(z_{n+1}) − V(z_n)
      = (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(x_n, ξ_n)
        + (a_n²/(2h_n²)) E_n [K((x_n − z_n)/h_n) f(x_n, ξ_n)]′ V_{zz}(z_n^+) [K((x_n − z_n)/h_n) f(x_n, ξ_n)].    (7)


Since f_2(z_n) is F_n-measurable and {γ_n} is a martingale difference sequence, E_n f_2(z_n)γ_n = f_2(z_n) E_n γ_n = 0. By virtue of the change of variable v = (x − z_n)/h_n, we have

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(x_n, ξ_n)
      = (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [f(z_n, ξ_n) + f_z(z_n, ξ_n)(x_n − z_n)
            + (x_n − z_n)′ f_{zz}(ẑ_n, ξ_n)(x_n − z_n)]
      = (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [f_0(z_n) + E_n α_n + E_n f_1(z_n, β_n)]
            + a_n h_n V_z′(z_n) ∫ K(v) E_n f_z(z_n, ξ_n) v p_n(z_n + h_n v|F_n) dv
            + O(a_n h_n²) |V_z(z_n)| ∫ K(v) |f_{zz}(ẑ_n, ξ_n)| |v|² p_n(z_n + h_n v|F_n) dv
      = (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(z_n)
            + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) E_n α_n
            + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [E_n f_1(z_n, β_n) + f_0(z_n) − f(z_n)]
            + O(a_n h_n)(1 + |V_z′(z_n) f(z_n)|).    (8)

As for the second term in (7),

    (a_n²/h_n²) E_n [K((x_n − z_n)/h_n) f(x_n, ξ_n)]′ V_{zz}(z_n^+) [K((x_n − z_n)/h_n) f(x_n, ξ_n)]
      ≤ κ (a_n²/h_n²) E_n |K((x_n − z_n)/h_n) f(x_n, ξ_n)|²
      ≤ κ (a_n²/h_n) ∫ K(v) p_n(z_n + h_n v|F_n) dv
            × E_n [|f_0(z_n)|² + |α_n|² + |f_1(z_n, β_n)|² + |f_2(z_n)|² |γ_n|²]
        + κ a_n² h_n ∫ K(v) p_n(z_n + h_n v|F_n) |v|² dv
            × [E_n |f_{0,z}(ž_n)|² + E_n |f_{1,z}(ž_n, β_n)|² + |f_{2,z}(ž_n)|² E_n |γ_n|²]
      ≤ κ (a_n²/h_n)(1 + |V_z′(z_n) f(z_n)|).    (9)

Noticing that

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(z_n)
      = a_n p_n(z_n|F_n) V_z′(z_n) f(z_n) + O(a_n h_n |V_z′(z_n) f(z_n)|),

and putting the estimates into (7) yields

    E_n V(z_{n+1}) − V(z_n)
      = a_n V_z′(z_n) f(z_n) p_n(z_n|F_n)
        + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) E_n α_n
        + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [E_n f_1(z_n, β_n) + f_0(z_n) − f(z_n)]
        + O(a_n²/h_n)(1 + |V_z′(z_n) f(z_n)|) + O(a_n h_n)(1 + |V_z′(z_n) f(z_n)|).    (10)

Define the perturbations

    V_1(z, n) = Σ_{i=n}^∞ (a_i/h_i) V_z′(z) E_n K((x_i − z_i)/h_i) E_n α_i,
    V_2(z, n) = Σ_{i=n}^∞ (a_i/h_i) V_z′(z) E_n K((x_i − z_i)/h_i) [E_n f_1(z, β_i) + f_0(z) − f(z)].


By virtue of (A4) and (A6), we have

    |V_1(z, n)| ≤ κ a_n |V_z(z)| Σ_{i=n}^∞ |E_n α_i| ≤ κ a_n (1 + |V_z′(z) f(z)|).

Similarly, |V_2(z, n)| ≤ κ a_n (1 + |V_z′(z) f(z)|). Using the definitions of V_1(·) and V_2(·), a detailed computation leads to

    E_n V_1(z_{n+1}, n + 1) − V_1(z_n, n)
      = −(a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) E_n α_n
        + Σ_{i=n+1}^∞ (a_i/h_i)(a_n/h_n) E_n [K((x_n − z_n)/h_n) f(x_n, ξ_n)]′ V_{zz}(z_n^+) E_{n+1} K((x_i − z_i)/h_i) E_{n+1} α_i.

Noticing the boundedness of V_{zz}(·) and the boundedness of p_i(·|F_n), by virtue of (A6),

    (a_n/h_n) E_n [K((x_n − z_n)/h_n) |f(x_n, ξ_n)| |V_{zz}(z_n^+)|
        × Σ_{i=n+1}^∞ (a_i/h_i) ∫ K(v) p_i(z_i + h_i v|F_n) dv |E_{n+1} α_i|]
      ≤ κ a_n² E_n K((x_n − z_n)/h_n) [|f(z_n, ξ_n)| + |f_z(z_n^{++}, ξ_n)(x_n − z_n)|]
      ≤ κ a_n² (1 + |V_z′(z_n) f(z_n)|).

Therefore,

    E_n V_1(z_{n+1}, n + 1) − V_1(z_n, n)
      = −(a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) E_n α_n + O(a_n²)(1 + |V_z′(z_n) f(z_n)|).    (11)

Likewise,

    E_n V_2(z_{n+1}, n + 1) − V_2(z_n, n)
      = −(a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [E_n f_1(z_n, β_n) + f_0(z_n) − f(z_n)]
        + O(a_n²)(1 + |V_z′(z_n) f(z_n)|).    (12)
Define the perturbed Liapunov function V̂(n) as

    V̂(n) = V(z_n) + V_1(z_n, n) + V_2(z_n, n).

Owing to (10)–(12),

    E_n V̂(n + 1) − V̂(n)
      = a_n p_n(z_n|F_n) V_z′(z_n) f(z_n) + (a_n²/h_n) δ_n p_n(z_n|F_n) V_z′(z_n) f(z_n)
        + (a_n²/h_n) ζ_n + a_n h_n δ̃_n p_n(z_n|F_n) V_z′(z_n) f(z_n) + a_n h_n ζ̃_n,

where δ_n, ζ_n, δ̃_n, and ζ̃_n are uniformly bounded random variables. Define

    m_n = V̂(n + 1) − V̂(n) − a_n p_n(z_n|F_n) V_z′(z_n) f(z_n)
            − (a_n²/h_n) δ_n p_n(z_n|F_n) V_z′(z_n) f(z_n) − (a_n²/h_n) ζ_n
            − a_n h_n δ̃_n p_n(z_n|F_n) V_z′(z_n) f(z_n) − a_n h_n ζ̃_n

and M_n = Σ_{i=1}^{n−1} m_i. It is easily seen that

    M_n = V̂(n) − V̂(1) − Σ_{i=1}^{n−1} (a_i + (a_i²/h_i) δ_i + a_i h_i δ̃_i) p_i(z_i|F_i) V_z′(z_i) f(z_i)
            − Σ_{i=1}^{n−1} (a_i²/h_i) ζ_i − Σ_{i=1}^{n−1} a_i h_i ζ̃_i,

and M_n is a martingale.

By the bounds on the V_i(·), i = 1, 2,

    V̂(n) ≥ (1 − κ a_n) V(z_n) − κ a_n ≥ −κ a_n.

Define the stopping rules τ_i, for i = 0, 1, as

    τ_0 = min{n: z_n ∉ Q_0},   τ_1 = min{n ≥ τ_0: z_n ∈ Q_0}.

Now V̂(n ∧ τ_1), for n ≥ τ_0, is a supermartingale bounded below by −κ a_n. There is an N_0 > 0 such that for all n ≥ N_0, if z_n ∉ Q_0,

    (a_n/h_n) ζ_n ≤ δ/5,   h_n ζ̃_n ≤ δ/5,   (a_n/h_n) δ_n ≤ 1/5,   h_n δ̃_n ≤ 1/5.

Consequently,

    E_n V̂(n + 1) − V̂(n) ≤ −a_n δ + (4/5) a_n δ = −(1/5) δ a_n.

As a result, z_n ∈ Q_0 infinitely often. For arbitrary λ_1 > λ_0, let Q_1 = {z: V(z) ≤ λ_1}. By virtue of the definition of m_n, similarly to the previous calculations (10)–(12), with modifications, we obtain that if z_n ∈ Q_1 then it is bounded w.p.1, and, as a result, for such z_n,

    |m_n|² ≤ κ (a_n² + a_n⁴/h_n² + a_n² h_n²).

Recall that κ is a generic positive constant. Define

    τ_2 = min{n ≥ τ_0: z_n ∉ Q_1}.

We then have

    P( sup_{τ_0 ≤ n < τ_2} Σ_{i=τ_0}^{n} m_i ≥ ε )
      ≤ (κ/ε) E Σ_{i=τ_0}^{τ_2−1} (a_i² + a_i⁴/h_i² + a_i² h_i²)
      ≤ (κ/ε) Σ_{i=0}^{∞} (a_i² + a_i⁴/h_i² + a_i² h_i²) < ∞.    (13)
We claim that eventually z_n stays in Q_1 w.p.1; that is, there is an N# ≥ τ_0 such that for all n ≥ N#, z_n ∈ Q_1 w.p.1. We establish this assertion by contradiction. Suppose the assertion
were not true. Then for any N# ≥ τ_0 there exists an n ≥ N# such that V(z_{n−1}) ≤ λ_1 but V(z_n) > λ_1. Consequently,

    V(z_n) ≤ V(z_{n−1}) − a_n δ + κ a_n (1 + V(z_{n−1})) ≤ λ_1 − a_n δ + κ a_n (1 + λ_1).

Taking lim sup leads to lim sup_n V(z_n) ≤ λ_1, which is a contradiction. Thus the boundedness of {z_n} is established.

Step 2. Prove the convergence of the algorithm. In view of the fact that ∫K(v)dv = 1,

    (1/h_n) E_n K((x_n − z_n)/h_n) − π(z_n)
      = ∫ K(v) p_n(z_n + h_n v|F_n) dv − π(z_n)
      = ∫ K(v) [p_n(z_n + h_n v|F_n) − p_n(z_n|F_n)] dv + p_n(z_n|F_n) − π(z_n).

By virtue of the Lipschitz continuity of p_n(z|F_n) and the estimate above,

    a_n V_z′(z_n) f(z_n) [(1/h_n) E_n K((x_n − z_n)/h_n) − π(z_n)]
      = a_n V_z′(z_n) f(z_n) [p_n(z_n|F_n) − π(z_n)] + O(a_n h_n |V_z′(z_n) f(z_n)|).

Define

    V_0(z, n) = Σ_{i=n}^∞ a_i V_z′(z) f(z) E_n (p_i(z|F_n) − π(z)).

Owing to (A3) and (A4),

    |V_0(z, n)| ≤ κ a_n (1 + |V_z′(z) f(z)|).

Moreover, by the boundedness of {z_n} and the continuity of p_i(·|F_n), π(·), V_z(·), and f(·), together with the estimates in (A3),

    E_n V_0(z_{n+1}, n + 1) − V_0(z_n, n)
      = −a_n V_z′(z_n) f(z_n) E_n (p_n(z_n|F_n) − π(z_n)) + O(a_n²).    (14)

Next define

    Ṽ(n) = V(z_n) + Σ_{i=0}^{2} V_i(z_n, n).

Using the detailed estimates as in Step 1 and noticing that {z_n} is bounded w.p.1,

    E_n Ṽ(n + 1) − Ṽ(n) = a_n π(z_n) V_z′(z_n) f(z_n) + (a_n²/h_n) μ̃_n + a_n h_n η_n + a_n² η̃_n,

where {μ̃_n}, {η_n}, and {η̃_n} are sequences of uniformly bounded random variables.

Define t_n = Σ_{i=0}^{n−1} a_i and m(t) = max{n: t_n ≤ t}. Let z⁰(·) be the piecewise linear function that equals z_0 on (−∞, 0), equals z_n at t_n for n ≥ 0, and is the linear interpolation of z_n and z_{n+1} on each (t_n, t_{n+1}). Denote z^n(t) = z⁰(t + t_n). Similarly to the last part of the proof of Theorem 1 in [4], by virtue of (A1),

    sup_{m ≥ n} |V(z_m) − V(z_n) − Σ_{i=n}^{m−1} a_i π(z_i) V_z′(z_i) f(z_i)| → 0  w.p.1 as n → ∞,

    sup_{t ≥ 0} |V(z^n(t)) − V(z^n(0)) − Σ_{i=n}^{m(t_n+t)−1} a_i π(z_i) V_z′(z_i) f(z_i)| → 0  w.p.1 as n → ∞.    (15)

By the w.p.1 boundedness of {z_n}, it can be shown (similarly to [3, 4]) that {z^n(·)} is uniformly bounded and equicontinuous. By virtue of the Ascoli–Arzelà theorem, we may pick out a subsequence that converges uniformly on bounded intervals, still denoted by {z^n(·)}, with limit z(·). We have

    V(z(t)) = V(z(0)) + ∫_0^t π(z(s)) V_z′(z(s)) f(z(s)) ds.

Consequently, (15) implies that if V_z′(z) f(z) ≤ 0 for all z, then

    z_n → {z: π(z) V_z′(z) f(z) = 0}  w.p.1.

However, π(z) > 0 for all z; the desired result thus follows.

4. BOUNDS ON ESTIMATION ERRORS

For passive stochastic approximation algorithms, less information is available than for the standard stochastic approximation methods. As a result, it is to be expected that the rate of convergence is slower than that of the standard algorithms. In this section we scrutinize this issue more closely. For standard stochastic approximation algorithms, under broad conditions one can establish upper bounds on the estimation errors: if a_n = 1/n^γ with 0 < γ ≤ 1, then E|z_n|² = O(1/n^γ) for sufficiently large n (recall that in this case z_n and x_n coincide). In deriving such a result one usually uses a locally quadratic Liapunov function V(·) for the corresponding dynamical system, and the error bounds depend mainly on the step size. The picture changes for passive stochastic approximation algorithms. In what follows we prove that the error bounds depend not only on the step size but also on the smoothness of the density and the smoothness of the function f(·).

Proof of Theorem 2. In view of the locally quadratic structure of V(·), for all z with |z| sufficiently small,

    V(z) ≥ z′Qz − (1/4) z′Qz = (3/4) z′Qz,

and, by virtue of the Rayleigh quotient,

    |z|² ≤ ρ_0^{−1} z′Qz ≤ κ̂ V(z)  for some κ̂ > 0,

where ρ_0 > 0 is the minimal eigenvalue of Q (recall that Q is positive definite). This, together with Theorem 1, implies that for sufficiently large n, E|z_n|² is bounded above by a constant multiple of EV(z_n). Thus, to obtain the desired estimate on E|z_n|², it suffices to show that EV(z_n) = O(n^{−(2l+1)γ/(2l+2)}). The proof is again by the perturbed Liapunov function method. We obtain (7) and (9) as before. There are some changes in (8), however. By virtue of the Taylor expansion,

    f(x, ξ) = f(z, ξ) + Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) (∂^{|ν|} f(z, ξ)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}) (x − z)^ν + O(|x − z|^l),

where ν = (ν_1, …, ν_r) is a multi-index, with

    |ν| = ν_1 + ⋯ + ν_r,   ν! = ν_1! ⋯ ν_r!,   z^ν = z_1^{ν_1} ⋯ z_r^{ν_r}.
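As a small illustration of this multi-index bookkeeping (a sketch; the enumeration helpers are ours, not the paper's), the following lists the indices ν with 1 ≤ |ν| ≤ l − 1 appearing in the expansion for r = 2 and l = 3, together with the normalizers ν!:

```python
from itertools import product
from math import factorial

def multi_indices(r, max_order):
    # All multi-indices nu = (nu_1, ..., nu_r) with 1 <= |nu| <= max_order.
    return [nu for nu in product(range(max_order + 1), repeat=r)
            if 1 <= sum(nu) <= max_order]

def nu_factorial(nu):
    # nu! = nu_1! * ... * nu_r!, the normalizer in the Taylor expansion.
    out = 1
    for k in nu:
        out *= factorial(k)
    return out

idx = sorted(multi_indices(r=2, max_order=2))   # l = 3: orders 1 <= |nu| <= l - 1 = 2
print(idx)  # [(0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```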
Note that ∂^l f(z_n, ξ_n)/∂z_1^{l_1} ⋯ ∂z_r^{l_r} should be interpreted as ∂^l f(z, ξ_n)/∂z_1^{l_1} ⋯ ∂z_r^{l_r} evaluated at z = z_n. Although there is a slight abuse of notation (subscripts are used both for components and for iteration numbers), there should be no confusion from the context. Consequently,

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(x_n, ξ_n)
      = (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n)
          × [ f(z_n, ξ_n) + Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) (∂^{|ν|} f(z_n, ξ_n)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}) (x_n − z_n)^ν
              + O(|x_n − z_n|^l) ].    (16)

While the estimate for

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) f(z_n, ξ_n)

is the same as before, the last term in (16) leads to

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) O(|x_n − z_n|^l)
      ≤ κ a_n h_n^l |V_z(z_n)| ∫ K(v) |v|^l p_n(z_n + h_n v|F_n) dv
      ≤ O(a_n h_n^l)(1 + V(z_n)).

Similarly to the expansion of f(·), we also have

    p_n(z_n + h_n v|F_n) = p_n(z_n|F_n)
        + Σ_{1 ≤ |ρ| ≤ l−2} (1/ρ!) (∂^{|ρ|} p_n(z_n|F_n)/∂z_1^{ρ_1} ⋯ ∂z_r^{ρ_r}) (h_n v)^ρ + O(|h_n v|^{l−1}).    (17)
For the middle term in the square bracket of (16), by virtue of (A7),

    (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) (∂^{|ν|} f(z_n, ξ_n)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}) (x_n − z_n)^ν
      = a_n V_z′(z_n) Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) h_n^{|ν|} E_n [∂^{|ν|} f(z_n, ξ_n)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}]
            ∫ K(v) v_1^{ν_1} ⋯ v_r^{ν_r} p_n(z_n + h_n v|F_n) dv
      = a_n V_z′(z_n) Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) h_n^{|ν|} E_n [∂^{|ν|} f(z_n, ξ_n)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}]
            ∫ K(v) v_1^{ν_1} ⋯ v_r^{ν_r} p_n(z_n|F_n) dv
        + a_n V_z′(z_n) Σ_{1 ≤ |ν| ≤ l−1} (1/ν!) h_n^{|ν|} E_n [∂^{|ν|} f(z_n, ξ_n)/∂z_1^{ν_1} ⋯ ∂z_r^{ν_r}]
            ∫ K(v) v^ν p̂_n dv
      = O(a_n h_n^l)(1 + V(z_n)),

where

    p̂_n = Σ_{1 ≤ |ρ| ≤ l−2} (1/ρ!) (∂^{|ρ|} p_n(z_n|F_n)/∂z_1^{ρ_1} ⋯ ∂z_r^{ρ_r}) (h_n v)^ρ + O(|h_n v|^{l−1}).

Using the expansion of p_n(·|F_n) and the fact that ∫K(v)dv = 1, we have

    a_n V_z′(z_n) f(z_n) [(1/h_n) E_n K((x_n − z_n)/h_n) − p_n(z_n|F_n)]
      = a_n V_z′(z_n) f(z_n) (∫ K(v) p_n(z_n + h_n v|F_n) dv − p_n(z_n|F_n))
      = a_n V_z′(z_n) f(z_n) ∫ K(v) Σ_{1 ≤ |ρ| ≤ l−1} (h_n^{|ρ|}/ρ!) (∂^{|ρ|} p_n(z_n|F_n)/∂z_1^{ρ_1} ⋯ ∂z_r^{ρ_r}) v^ρ dv
        + O(a_n h_n^l) |V_z′(z_n) f(z_n)| ∫ K(v) |v|^l dv
      = O(a_n h_n^l) |V_z′(z_n) f(z_n)|.
Gathering the estimates in the preceding derivations, in lieu of (10) we have

    E_n V(z_{n+1}) − V(z_n)
      = a_n V_z′(z_n) f(z_n) π(z_n) + a_n V_z′(z_n) f(z_n) (p_n(z_n|F_n) − π(z_n))
        + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) E_n α_n
        + (a_n/h_n) V_z′(z_n) E_n K((x_n − z_n)/h_n) [f_0(z_n) + E_n f_1(z_n, β_n) − f(z_n)]
        + O(a_n²/h_n)(1 + |V_z′(z_n) f(z_n)|) + O(a_n h_n^l)(1 + |V_z′(z_n) f(z_n)|).    (18)

Owing to (A4′) and the boundedness of π(·) (in particular, π(·) is bounded away from 0 from below), there is a λ with 0 < λ < λ̃ such that

    V_z′(z_n) f(z_n) π(z_n) ≤ −λ V(z_n).

Define the V_i(·), for i = 0, 1, 2, and Ṽ(n) as before. Since |V_z′(z) f(z)| ≤ κ(1 + V(z)), we obtain

    E_n Ṽ(n + 1) − Ṽ(n) ≤ −λ a_n V(z_n) + O(a_n²/h_n)(1 + V(z_n)) + O(a_n h_n^l)(1 + V(z_n)).

Based on the bounds on the V_i(·), i = 0, 1, 2, we also have

    E_n Ṽ(n + 1) − Ṽ(n) ≤ −λ a_n Ṽ(n) + O(a_n²/h_n)(1 + Ṽ(n)) + O(a_n h_n^l)(1 + Ṽ(n)).

For n large enough, that is, for some N_1, some Δ with 0 < Δ ≤ 1, and all n ≥ N_1,

    λ h_n^Δ − κ a_n/h_n^{1−Δ} − κ h_n^{l+Δ} ≥ λ_0 > 0

for some λ_0 with λ > λ_0 > 0. Hence,

    E Ṽ(n + 1) ≤ (1 − λ_0 a_n/h_n^Δ) E Ṽ(n) + κ a_n²/h_n + κ a_n h_n^l
               ≤ A_{n, N_1−1} E Ṽ(N_1) + Σ_{i=N_1}^{n} A_{n,i} (a_i/h_i^Δ)(a_i/h_i^{1−Δ} + h_i^{l+Δ}),

where

    A_{n,j} = Π_{k=j+1}^{n} (1 − λ_0 a_k/h_k^Δ)  if j ≠ n,   A_{n,n} = 1.

Using the bounds on the V_i(·), i = 0, 1, 2, we also obtain

    E V(z_{n+1}) ≤ A_{n, N_1−1} E V(z_{N_1})
        + Σ_{i=N_1}^{n} A_{n,i} (a_i/h_i^Δ)(a_i/h_i^{1−Δ} + h_i^{l+Δ}) + O(a_n).    (19)
Substituting a_n = 1/n^γ and h_n = 1/n^μ into this relation, and assuming without loss of generality that N_1 is chosen large enough that A_{n, N_1} ≤ κ n^{−(γ−μΔ)}, summation by parts gives

    Σ_{i=N_1}^{n} A_{n,i} (1/i^{2γ−μ})
      = Σ_{i=N_1}^{n} A_{n,i} (1/i^{γ−μΔ}) (1/i^{γ−μ+μΔ})
      = (1/n^{γ−μ+μΔ}) Σ_{i=N_1}^{n} A_{n,i} (1/i^{γ−μΔ})
        + Σ_{i=N_1}^{n−1} (1/i^{γ−μ+μΔ} − 1/(i+1)^{γ−μ+μΔ}) Σ_{j=N_1}^{i} A_{n,j} (1/j^{γ−μΔ}).    (20)

Notice that

    Σ_{i=N_1}^{n} A_{n,i} (1/i^{γ−μΔ}) = (1/λ_0) Σ_{i=N_1}^{n} (A_{n,i} − A_{n,i−1}) = (1/λ_0)(1 − A_{n, N_1−1}) ≤ 1/λ_0.

The first term on the last line of (20) is therefore bounded above by κ n^{−(γ−μ+μΔ)}. As for the second term, without loss of generality we may assume that N_1 is large enough that

    1/n^{1+γ−μ+μΔ} ≤ (1/2)(1/n^{2γ−μ})  for all n ≥ N_1.

Owing to the fact that
$$
\frac{1}{i^{\gamma-m+mD}} - \frac{1}{(i+1)^{\gamma-m+mD}}
= \frac{1}{i^{\gamma-m+mD}}\left(\frac{\gamma-m+mD}{i} + O\!\left(\frac{1}{i^{2}}\right)\right)
$$
and
$$
\sum_{j=N_1}^{i} A_{n,j}\,\frac{1}{j^{\gamma-mD}}
= \frac{1}{\lambda_0}\bigl(A_{n,i} - A_{n, N_1-1}\bigr) \le \frac{1}{\lambda_0}\,A_{n,i},
$$

coupled with $\gamma - m + mD \le \gamma < 1$, we have
$$
\sum_{i=N_1}^{n-1}\left(\frac{1}{i^{\gamma-m+mD}} - \frac{1}{(i+1)^{\gamma-m+mD}}\right)
\sum_{j=N_1}^{i} A_{n,j}\,\frac{1}{j^{\gamma-mD}}
\le k\sum_{i=N_1}^{n-1} A_{n,i}\,\frac{1}{i^{1+\gamma-m+mD}}
\le \frac{1}{2}\sum_{i=N_1}^{n} A_{n,i}\,\frac{1}{i^{2\gamma-m}}.
$$

It then follows from (20) and the above estimates that
$$
\sum_{i=N_1}^{n} A_{n,i}\,\frac{1}{i^{2\gamma-m}}
\le \frac{k}{n^{\gamma-m+mD}} + \frac{1}{2}\sum_{i=N_1}^{n} A_{n,i}\,\frac{1}{i^{2\gamma-m}},
$$
which implies that
$$
\sum_{i=N_1}^{n} A_{n,i}\,\frac{1}{i^{2\gamma-m}} = O\!\left(\frac{1}{n^{\gamma-m+mD}}\right).
\tag{21}
$$
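Estimate (21) can be sanity-checked numerically. The sketch below builds the weights $A_{n,i}$ directly from their product definition and checks that the weighted sum stays of order $n^{-(\gamma-m+mD)}$; the values of $\gamma$, $m$, $D$, and $\lambda_0$ are illustrative choices satisfying $0 < m < \gamma < 1$, not taken from the theorem.

```python
# Illustrative numerical check of estimate (21): with a_n = n^{-gamma},
# h_n = n^{-m}, and A_{n,i} = prod_{k=i+1}^{n} (1 - lambda_0 a_k / h_k^D),
# the sum S(n) = sum_i A_{n,i} i^{-(2 gamma - m)} should be O(n^{-(gamma - m + mD)}).
GAMMA, M, D, LAMBDA0 = 0.9, 0.3, 0.5, 0.5

def weighted_sum(n):
    total, A = 0.0, 1.0  # run i = n down to 1; A holds prod_{k=i+1}^{n} (...)
    for i in range(n, 0, -1):
        total += A * i ** -(2 * GAMMA - M)
        # multiply in the factor for k = i before moving to i - 1;
        # note a_i / h_i^D = i^{-(gamma - mD)}
        A *= 1.0 - LAMBDA0 * i ** -(GAMMA - M * D)
    return total

# S(n) * n^{gamma - m + mD} should stay bounded (roughly constant) as n grows
ratios = [weighted_sum(n) * n ** (GAMMA - M + M * D) for n in (1000, 4000, 16000)]
```

The rescaled values flatten out as $n$ grows, consistent with the order estimate (the slow drift reflects the $O(n^{-mD})$ correction terms in the derivation).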

Similarly, we obtain that
$$
\sum_{i=N_1}^{n} A_{n,i}\,\frac{1}{i^{\gamma-mD}}\,\frac{1}{i^{m(l+D)}}
= O\!\left(\frac{1}{n^{(l+D)m}}\right).
\tag{22}
$$

Combining (19)--(22),
$$
EV(z_n) = O\!\left(\frac{1}{n^{\gamma-mD}}\right)
+ O\!\left(\frac{1}{n^{\gamma-m+mD}}\right)
+ O\!\left(\frac{1}{n^{(l+D)m}}\right).
$$
To balance the first and the second terms on the right-hand side above, we select $D = \frac{1}{2}$. To balance the second and the last terms, we need to choose $m = \gamma/(l+1)$. Consequently, the desired order estimate follows.

COROLLARY 3. As $l \to \infty$,
$$E|z_n|^2 = O(n^{-\gamma})$$
for sufficiently large $n$.

Remark 4. It is interesting to observe how the upper bounds depend on the smoothness of the function under consideration. For example, if $l = 2$, the theorem indicates that $E|z_n|^2 = O(n^{-(5/6)\gamma})$. Thus the convergence rate is slightly slower than the classical rate $O(n^{-\gamma})$. The rate of convergence of the passive stochastic approximation algorithm increases as the smoothness of the function increases. Such behavior is very different from that of the traditional stochastic approximation methods.
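The balancing choices $D = \frac12$ and $m = \gamma/(l+1)$ in the proof make all three exponents in the bound for $EV(z_n)$ coincide at $\gamma(2l+1)/(2(l+1))$, which recovers the $O(n^{-(5/6)\gamma})$ rate of Remark 4 for $l = 2$ and tends to the classical exponent $\gamma$ as $l \to \infty$. A quick arithmetic check (the helper function below is ours, written only to tabulate the exponents):

```python
from fractions import Fraction

def exponents(l):
    """Coefficients of gamma in the three exponents of the bound on EV(z_n),
    using D = 1/2 and m = gamma/(l + 1).  Each exponent is gamma times a
    rational number depending only on l, so gamma can be factored out."""
    D = Fraction(1, 2)
    m = Fraction(1, l + 1)        # this is m / gamma
    e1 = 1 - m * D                # (gamma - mD) / gamma
    e2 = 1 - m + m * D            # (gamma - m + mD) / gamma
    e3 = (l + D) * m              # (l + D) m / gamma
    return e1, e2, e3

e1, e2, e3 = exponents(2)
# e1 == e2 == e3 == Fraction(5, 6): the O(n^{-(5/6)gamma}) rate of Remark 4
```

For large $l$ the common value $(2l+1)/(2(l+1))$ approaches 1, matching Corollary 3.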

5. CONCLUDING REMARKS

This paper focuses on the w.p.1 convergence and error bounds of a class of passive stochastic approximation algorithms. The corresponding weak convergence theorem can be obtained (see the related work for constant step size and window width algorithms [16]). If a large dimensional problem is encountered, we may wish to use parallel processing methods with multiprocessors (see Kushner and Yin [8]). Further research can be directed to obtaining results of large deviation type as well as to passive algorithms with constraints and projections.

As in the traditional stochastic approximation methods, there are continuous time counterparts of the passive recursive algorithms. In this case, the difference equations are replaced by differential equations. We may wish to study the decreasing step size algorithms of the form
$$
\dot z = \frac{a_t}{h_t}\, K\!\left(\frac{x_t - z_t}{h_t}\right) f(x_t, \xi_t),
\qquad z(0) = z_0,
$$
or constant step size and constant window width algorithms
$$
\dot z = \frac{\varepsilon}{\delta}\, K\!\left(\frac{x_t - z_t}{\delta}\right) f(x_t, \xi_t),
\qquad z(0) = z_0.
$$

The corresponding asymptotic properties can be obtained.

As mentioned previously, renewed interest has been shown in designing asymptotically optimal versions of stochastic approximation algorithms [7, 11, 13, 14, 15]. The main ingredient of such procedures is the use of averaging of the iterates and/or averaging of the iterates and measurements. Similar multi-step passive stochastic approximation algorithms can be designed. They take the forms
$$
z_{n+1} = z_n + \frac{a_n}{h_n}\, K\!\left(\frac{x_n - z_n}{h_n}\right) f(x_n, \xi_n),
\qquad \bar z_n = \frac{1}{n}\sum_{i=1}^{n} z_i
$$
for the postaveraging algorithm [7, 11, 13], and
$$
z_{n+1} = z_n + \frac{a_n}{n h_n}\sum_{i=1}^{n} K\!\left(\frac{x_i - z_i}{h_i}\right) f(x_i, \xi_i),
\qquad \bar z_n = \frac{1}{n}\sum_{i=1}^{n} z_i
$$
for the smoothed algorithm [15]. The step size sequences may be selected in the following way: $a_n = 1/n^{\gamma}$, $h_n = 1/n^{m}$, and $\frac{1}{2} < \gamma - m < 1$. It is conceivable that the arithmetic averaging will also play an important role in improving the performance of the passive stochastic approximation algorithms.
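As a concrete illustration of the recursions above, the following toy simulation runs the decreasing step size PSA recursion together with the post-averaged iterate. The regression function, noise, design density, and parameter values are all illustrative choices satisfying $\frac12 < \gamma - m < 1$; this is a sketch of the scheme, not an implementation from the paper.

```python
import math
import random

def passive_sa(f_noisy, n_iter=20000, z0=0.0, gamma=0.9, m=0.3, seed=1):
    """Toy sketch of the decreasing step size PSA recursion
        z_{n+1} = z_n + (a_n / h_n) K((x_n - z_n) / h_n) y_n
    with a_n = n^{-gamma}, h_n = n^{-m}, a Gaussian kernel K, and the
    post-averaged iterate zbar_n = (1/n) sum_{i <= n} z_i.  Here gamma - m
    = 0.6 lies in (1/2, 1) as required; all other choices are illustrative."""
    rng = random.Random(seed)
    K = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    z, zbar = z0, 0.0
    for n in range(1, n_iter + 1):
        a_n, h_n = n ** -gamma, n ** -m
        x_n = rng.uniform(-3.0, 3.0)   # passively observed "state", not chosen by us
        y_n = f_noisy(x_n, rng)        # noisy measurement of f at x_n
        z += (a_n / h_n) * K((x_n - z) / h_n) * y_n
        zbar += (z - zbar) / n         # running arithmetic average of the iterates
    return z, zbar

# f(x) = -(x - 1) plus additive noise; the unique root is x* = 1
z, zbar = passive_sa(lambda x, rng: -(x - 1.0) + 0.5 * rng.gauss(0.0, 1.0))
```

The kernel weight discards measurements taken far from the current iterate, which is what allows the algorithm to locate the root even though the points $x_n$ are generated externally.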

REFERENCES

1. S. N. Ethier and T. G. Kurtz, "Markov Processes: Characterization and Convergence," Wiley, New York, 1986.
2. W. K. Härdle and R. Nixdorf, Nonparametric sequential estimation of zeros and extrema of regression functions, IEEE Trans. Inform. Theory IT-33 (1987), 367–372.
3. H. J. Kushner and D. S. Clark, "Stochastic Approximation Methods for Constrained and Unconstrained Systems," Springer-Verlag, New York/Berlin, 1978.
4. H. J. Kushner, Stochastic approximation with discontinuous dynamics and state dependent noise: w.p.1 and weak convergence, J. Math. Anal. Appl. 82 (1981), 527–542.
5. H. J. Kushner, "Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory," MIT Press, Cambridge, MA, 1984.
6. H. J. Kushner and H. Huang, Rates of convergence for stochastic approximation type algorithms, SIAM J. Control Optim. 17 (1979), 607–617.
7. H. J. Kushner and J. Yang, Stochastic approximation with averaging of the iterates: Optimal asymptotic rate of convergence for general processes, SIAM J. Control Optim. 31 (1993), 1045–1062.
8. H. J. Kushner and G. Yin, Asymptotic properties of distributed and communicating stochastic approximation algorithms, SIAM J. Control Optim. 25 (1987), 1266–1290.
9. L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control 22 (1977), 551–575.
10. A. V. Nazin, B. T. Polyak, and A. B. Tsybakov, Passive stochastic approximation, Automat. Remote Control 50 (1989), 1563–1569.


11. B. T. Polyak, New method of stochastic approximation type, Automat. Remote Control 51 (1990), 937–946.
12. P. Révész, How to apply the method of stochastic approximation in the non-parametric estimation of regression function, Mat. Oper. Statist. Ser. Statist. 8 (1977), 119–126.
13. G. Yin, On extensions of Polyak's averaging approach to stochastic approximation, Stochastics Stochastics Rep. 36 (1991), 245–264.
14. G. Yin and I. Gupta, On a continuous time stochastic approximation problem, Acta Appl. Math. 33 (1993), 3–20.
15. G. Yin and K. Yin, Asymptotically optimal rate of convergence of smoothed stochastic recursive algorithms, Stochastics Stochastics Rep. 47 (1994), 21–46.
16. G. Yin and K. Yin, Passive stochastic approximation with constant step size and window width, IEEE Trans. Automat. Control 41 (1996), 90–106.