On the sequential construction of optimum bounded designs


Journal of Statistical Planning and Inference 136 (2006) 2783 – 2804 www.elsevier.com/locate/jspi

On the sequential construction of optimum bounded designs Luc Pronzato∗ Laboratoire I3S, CNRS-UNSA, UMR 6070, Bât. Euclide, Les Algorithmes, 2000 route des Lucioles, BP 121, 06903 Sophia Antipolis cedex, France Received 10 November 2003; accepted 15 October 2004 Available online 8 December 2004

Abstract: We consider a parameter estimation problem with independent observations where one samples from a finite population of independent and identically distributed experimental conditions X. The size of the population is N but only n samples, a proportion α of N, can be used. The quality of a sample is measured by a regular optimality criterion Φ(·) based on the information matrix, such as the D-criterion. The construction of an optimal approximate design bounded by μ/α, with μ the probability measure of X, can be used to construct a sampling strategy which is asymptotically optimum (when the size N of the population tends to infinity). We show that a sequential strategy which does not require any information on μ is also asymptotically optimum. Some possible applications are indicated. © 2004 Elsevier B.V. All rights reserved. MSC: 62K05; 62L05. Keywords: Sequential design; Sampling; Constrained design measure; Bounded design

1. Introduction

We consider a parameter estimation problem, with θ the vector of parameters to be estimated and X the experimental variables, X ∈ X ⊂ R^q. We assume that the observations are independent, so that the Fisher information matrix is the sum of rank-one matrices of

∗ Tel.: +33 4 92 94 27 08; fax: +33 4 92 94 28 96.

E-mail address: [email protected] (L. Pronzato) URL: http://www.i3s.unice.fr/pronzato/. 0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.10.020


the type f(X)f⊤(X). For instance, this may correspond to estimation in a linear regression model, with independent observations Y_k = f⊤(X_k)θ̄ + ε_k, where the ε_k's are independently identically distributed (i.i.d.) with E{ε_k} = 0 and θ̄ ∈ R^d is the unknown true value of the model parameters to be estimated. We shall assume that f(·) is continuous on X. We consider design criteria Φ(·) that are functions of the information matrix M, with Φ[M] to be maximized, and generalized designs ξ that are probability distributions on the set X. We denote Ξ the set of such designs,

M(ξ) = ∫_X f(x) f⊤(x) ξ(dx)

and Φ(ξ) = Φ[M(ξ)], ξ ∈ Ξ. We give in Appendix A a list of assumptions on Φ that will be used throughout the paper. We shall always assume that Φ is strictly concave (HΦ1), differentiable (HΦ2) and increasing (HΦ3). The assumptions are discussed in the same appendix; they are satisfied in particular when Φ(M) = log det M or Φ(M) = −trace(AM^{−p}), with p a positive integer and A a positive definite matrix.

We consider the situation where the experimental conditions X_k ∈ X form a sequence of i.i.d. variables sampled in a population of size N, and only n < N samples can be used. We focus on the sequential problem, where, as soon as a new sample X_k becomes available, we must decide whether to use it or not, that is, to observe Y_k or not. Notice the difference with a standard experimental design problem where the X_k's can be chosen: here we can only decide to accept or reject X_k.

The paper is rather theoretically oriented, but many practical decision problems could be formulated in this way. For instance, for some experiments in nuclear physics events are selected according to the energy dissipated in a detector (C.E.R.N., 1994), but the selection could be based on the information content of the event, measured by its contribution to the information matrix; in phase-I clinical trials, the volunteers could be selected according to their size, weight, age, etc., all variables to be used to build a model for the tolerance dose. Other possible developments are given in Section 6. Notice that the case dim(θ) = 1 corresponds to a variant of the secretary problem, see Pronzato (2001a).

Let μ denote the probability measure of X_1, with ∫_X μ(dx) = 1. We assume that μ is such that

M(μ) = E{f(X_1) f⊤(X_1)} = ∫_X f(x) f⊤(x) μ(dx)

exists, with −∞ < Φ(μ) < ∞; a list of additional assumptions on μ is given in Appendix A. The sequence of decisions will be denoted (u_k): u_k = 1 if we decide to observe Y_k, with experimental conditions X_k, and u_k = 0 otherwise, with, for any admissible policy,

u_j ∈ U_j ⊆ {0, 1}, j = 1, . . . , N, with ∑_{j=1}^N u_j = n.   (1)


(Note that X_k is known when u_k is chosen.) The associated information matrix M_{N,n} (normalized, per sample collected) is given by

M_{N,n} = (1/n) ∑_{k=1}^N u_k f(X_k) f⊤(X_k).   (2)

(It may be singular for small n, but Φ(M_{N,n}) > −∞ for n large enough if the X_k's are selected in a suitable population, see HΦ1, Hμ1.) For N finite, we shall consider the following sequential problem:

maximize E{Φ(M_{N,n})}   (3)

with respect to (u_j) satisfying (1), the expectation E{·} being with respect to the product measure μ^{⊗N} of X_1, . . . , X_N (we shall see, cf. (12), that the concavity and increasing properties of Φ and Φ(μ) < ∞ imply that E{Φ(M_{N,n})} < ∞ for any N, n, 0 < n ≤ N, and any sequence (u_j)). For any sequence (u_j) and any step k, 1 ≤ k ≤ N, a_k will denote the number of observations already made; that is,

a_k = ∑_{j=1}^{k−1} u_j   (4)

with a_1 = 0. For each k ∈ {1, . . . , N}, the optimal decision at step k is obtained by solving

max_{u_k∈U_k} E_{X_{k+1}} { max_{u_{k+1}∈U_{k+1}} E_{X_{k+2}} { max_{u_{k+2}∈U_{k+2}} . . . E_{X_{N−1}} { max_{u_{N−1}∈U_{N−1}} E_{X_N} { max_{u_N∈U_N} Φ[ (1/n) ∑_{i=1}^N u_i f(X_i) f⊤(X_i) ] } } . . . } },   (5)

where E_{X_j}{·} denotes the expectation with respect to X_j, distributed with the measure μ, and the sets U_j satisfy

U_j = U_j(a_j) = {0} if a_j = n; {1} if a_j + N − j + 1 = n; {0, 1} otherwise.

In the first case (a_j = n) the maximum number of samples allowed has already been collected; the second one (a_j + N − j + 1 = n) means that all remaining samples of the population must be accepted in order to reach a total of n samples selected. It is only in the last case that some freedom exists for decisions.

The case d = dim(θ) = 1 is considered in Pronzato (2001a). The optimal (closed-loop) solution is given by a backward recurrence equation. Using results on extreme value distributions, a simple open-loop solution is proved to be asymptotically optimal for N → ∞ with n fixed (for measures μ absolutely continuous with respect to the Lebesgue measure and such that the associated distribution function is a von Mises function). This extends the results of Albright and Derman (1972) which concern the case n = αN, α ∈ (0, 1).
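The three cases defining U_j(a_j) can be sketched as a small helper function (an illustration in Python; the function name and signature are ours, not from the paper):

```python
def admissible_decisions(a_j: int, j: int, N: int, n: int) -> set:
    """Admissible decision set U_j(a_j) at step j.

    a_j : number of samples already accepted before step j
    N   : population size
    n   : total number of samples to select
    """
    if a_j == n:                  # quota already filled: must reject
        return {0}
    if a_j + N - j + 1 == n:      # must accept all remaining points
        return {1}
    return {0, 1}                 # free to accept or reject
```

The middle case counts the N − j + 1 points not yet seen (including X_j itself): when they are exactly the n − a_j still needed, every remaining point must be accepted.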


In the multidimensional case d > 1, in general, the optimal solution cannot be obtained in closed form. Open-loop feedback-optimal control is used in Pronzato (1999) and a heuristic one-step-ahead decision rule in Pronzato (2001b), without any result on the asymptotic performance of these strategies. The asymptotics considered in this paper will only concern the case n = αN, α ∈ (0, 1), N → ∞. The fact that n tends to infinity at the same speed as N means that we shall obtain asymptotic performances that are achieved μ-almost surely (μ-a.s.), contrary to Pronzato (2001a) which concerns expected performances.

The different types of strategies to be considered (with finite or infinite horizon, stationary, non-adaptive, randomized) are defined in Section 2. The value Φ(ξ∗_α), with ξ∗_α ≤ μ/α a Φ-optimum bounded design measure, is shown to form an upper bound on the asymptotic performance of any strategy. A trivial asymptotically optimal strategy is proposed in Section 3: it is the truncated version (finite horizon) of a stationary strategy S(ξ∗_α) which samples from ξ∗_α. We show in Section 4 that a slight modification of S(ξ∗_α) leads to an asymptotically optimal strategy that does not require the prior knowledge of the measure μ. Illustrative examples are presented in Section 5. Section 6 gives some concluding remarks and suggests some extensions.

2. Strategies, performances and asymptotic optimality

2.1. Strategies

We shall consider sequential strategies for which at step k only X_1, . . . , X_k have been observed. When μ is known, (k, a_k, M_{k−1,a_k}, X_k) then summarizes all the information necessary to make a decision about the acceptance of X_k, and the problem (3) corresponds to a discrete-time stochastic control problem, where k represents time, (k, a_k, M_{k−1,a_k}, X_k) and u_k ∈ U_k ⊆ {0, 1}, respectively, represent the state¹ and control at time k. A sequential strategy S_{N,n} is then defined by a sequence of mappings (k, a, M, X) → u ∈ {0, 1}, k = 1, . . . , N.
For instance, the optimal strategy defined by (5) is sequential. Strategies are non-sequential when the selection of the n X_k's is made after all the population X_1, . . . , X_N has been observed. A non-sequential strategy will be used to obtain an upper bound on the performance of sequential strategies. A sequential strategy S will be called stationary when the mapping is from (M, X) to u ∈ {0, 1}. Stationary strategies correspond to the case of infinite horizon (N = ∞) where we only require that a_k/k → α when k → ∞. We shall denote ξ_{k−1} the empirical measure of the a_k samples already accepted at step k, with M(ξ_{k−1}) the current value of the (normalized) information matrix,

M(ξ_{k−1}) = (1/a_k) ∑_{j=1}^{k−1} u_j f(X_j) f⊤(X_j)

¹ When μ is unknown, a case considered in Section 4, at step k an estimate μ̂_k of μ (empirical version, parametric representation, etc.) based on X_1, . . . , X_k must also be used to decide about X_k; the "state" should then be extended to include this μ̂_k.


and write Φ_k = Φ(ξ_k) = Φ[M(ξ_k)]. Non-adaptive strategies are special cases of stationary strategies for which the decision at step k only depends on the value of X_k, and S samples from some measure ξ ≤ μ/α; we shall then write S = S(ξ). The non-adaptive strategies S(ξ) that we shall consider have the property that ξ = 0 on X_1, ξ = μ/α on X_2 and ξ = βμ/α, 0 < β < 1, on X_3 = X\(X_1 ∪ X_2) for some subsets X_1, X_2 of X. This means that S(ξ) should reject every X_k falling in X_1, accept all the samples in X_2 but only a fraction of those in X_3. Therefore, the strategy is completely defined when X_1 ∪ X_2 = X but, when X_3 ≠ ∅, we must specify how the fraction of the samples in X_3 is selected. We shall then consider randomized strategies, which lead to a simple analysis: a sample in X_3 will be accepted with probability β, by tossing a biased coin. The mapping X → u ∈ {0, 1} is then a random function when X ∈ X_3. A stationary strategy, defined for infinite horizon, will be implemented when the horizon is finite through a truncation procedure, which will be shown to preserve asymptotic optimality (when n = αN, α ∈ (0, 1), N → ∞).

2.2. Performances

For any strategy S_{N,n} used to solve (3), with 0 < n ≤ N, we shall denote Φ(S_{N,n}) = Φ(M_{N,n}), which measures the performance of S_{N,n} (for a particular realization of the sequence X_1, . . . , X_N). Following the same line as in Albright and Derman (1972), which concerns the case d = 1, we can use as a benchmark the infeasible, but better-than-optimal, non-sequential strategy S∗_{N,n}, obtained by selecting the n design points X_{k_1}, . . . , X_{k_n} that maximize Φ(M_{N,n}) after the N points X_1, . . . , X_N have been observed. S∗_{N,n} is thus a Φ-optimum design algorithm that generates an exact n-point Φ-optimum design in the finite design space {X_1, . . . , X_N}. Obviously, for any N, n and any strategy S_{N,n}, sequential or not,

Φ(S_{N,n}) ≤ Φ(S∗_{N,n}).   (6)

Also, S∗_{N,n} satisfies, for all α ∈ (0, 1),

lim sup_{N→∞} Φ(S∗_{N,αN}) ≤ Φ∗_α,  μ-a.s.,   (7)

with Φ∗_α = Φ(ξ∗_α) and ξ∗_α ≤ μ/α a Φ-optimum constrained design measure: ξ∗_α maximizes Φ(ξ), ξ ∈ D(μ, α), with D(μ, α) ⊂ Ξ the set of admissible measures satisfying ξ(dx) ≤ μ(dx)/α. When Φ(·) is strictly concave (HΦ1), M(ξ∗_α) is unique (although ξ∗_α is not necessarily unique). Hμ1 implies Φ∗_α > −∞ and HΦ3 implies Φ∗_α ≤ Φ(μ/α) < ∞. Other properties of ξ∗_α will be presented in Section 3.1, see also Wynn (1982), Fedorov (1989), Fedorov and Hackl (1997), Sahm and Schwabe (2001) and Pronzato (2004).


Following (6), (7), a strategy S_{N,n} will be called asymptotically optimal (for n = αN, α ∈ (0, 1), N → ∞) when lim_{N→∞} Φ(S_{N,αN}) = Φ∗_α, μ-a.s. Although sampling from a Φ-optimum constrained measure ξ∗_α yields Φ_k → Φ∗_α, μ-a.s., k → ∞ (we shall see in Section 3.2 how this can be implemented), it corresponds to a non-adaptive strategy that does not satisfy constraint (1). We show below how enforcing (1) can preserve asymptotic optimality. For any α ∈ (0, 1) let S_α be a stationary strategy such that a_k/k → α and Φ_k → Φ∗_α, μ-a.s., k → ∞. In particular, it may correspond to the non-adaptive strategy S(ξ∗_α) which samples from a Φ-optimum constrained measure ξ∗_α ≤ μ/α. To any such S_α we associate the following truncation, which defines a sequential strategy S_{N,n}(S_α) satisfying the constraint a_{N+1} = n for any finite N ≥ n:

S_{N,n}(S_α): if a_k = n, reject X_k;
  if N − k + 1 = n − a_k, accept X_k;   (8)
  otherwise, apply S_α.

We show that S_{N,n}(S_α) is asymptotically optimal for n = αN, α ∈ (0, 1), N → ∞. Replace α by α + δ, with δ > 0, and consider the following strategy, simpler than S_{N,n}(S_{α+δ}):

S̄_{N,n}(S_{α+δ}): if a_k = n, reject X_k; otherwise, apply S_{α+δ}.

Let n̄ denote the number of samples accepted by S̄_{N,n}(S_{α+δ}). We have n̄ ≤ n, and, if M̄_{N,n}(S_{α+δ}) and M_{N,n}(S_{α+δ}) denote the information matrices associated with both strategies, M_{N,n}(S_{α+δ}) = M̄_{N,n}(S_{α+δ}) when n̄ = n and M_{N,n}(S_{α+δ}) = M̄_{N,n}(S_{α+δ}) + (1/n) ∑_{j=1}^{n−n̄} f(X_j) f⊤(X_j) when n̄ < n, where the X_j's denote the points selected by S_{N,n}(S_{α+δ}) and not by S̄_{N,n}(S_{α+δ}). Since Φ(·) is increasing, we have

Φ[S̄_{N,n}(S_{α+δ})] ≤ Φ[S_{N,n}(S_{α+δ})].   (9)

Let n_{α+δ} denote the number of points accepted by S_{α+δ} among the N samples of the sequence (these are points that would have been accepted by S̄_{N,n}(S_{α+δ}) if ignoring the constraint a_k ≤ n). It satisfies n_{α+δ}/N → α + δ, μ-a.s. as N tends to infinity. Take n = αN and let N tend to infinity. The probability that n_{α+δ} ≤ n infinitely often is zero when δ > 0. Therefore, asymptotically, S̄_{N,αN}(S_{α+δ}) stops after n = αN samples from S_{α+δ} have been collected and

Φ[S̄_{N,αN}(S_{α+δ})] → Φ∗_{α+δ},  μ-a.s., N → ∞.

Together with (9), it implies

lim inf_{N→∞} Φ[S_{N,αN}(S_{α+δ})] ≥ Φ∗_{α+δ},  μ-a.s.

On the other hand, from (6), (7), lim sup_{N→∞} Φ[S_{N,αN}(S_{α+δ})] ≤ Φ∗_α, μ-a.s. We may then let δ tend to zero and use the continuity of Φ∗_α with respect to α, see Theorem 6, to obtain the following property.

Theorem 1. For α ∈ (0, 1) let S_α be a stationary strategy such that a_k/k → α and Φ_k → Φ∗_α, μ-a.s., k → ∞. The strategy S_{N,n}(S_α) defined by (8) is asymptotically optimal for n = αN, N → ∞.
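As a sketch, the truncation (8) of a stationary rule can be written as follows (our own illustrative Python, assuming the underlying stationary rule is given as a point-wise, non-adaptive acceptance function):

```python
def truncated_strategy(base_accept, N: int, n: int, xs):
    """Truncation (8) of a stationary strategy: enforce exactly n acceptances.

    base_accept : callable x -> bool, the underlying stationary rule S_alpha
    N, n        : population size and number of samples to select
    xs          : the stream X_1, ..., X_N
    Returns the list of accepted points (length exactly n).
    """
    accepted = []
    for k, x in enumerate(xs, start=1):
        a_k = len(accepted)
        if a_k == n:                  # quota reached: reject X_k
            continue
        if N - k + 1 == n - a_k:      # remaining points exactly fill the quota
            accepted.append(x)
        elif base_accept(x):          # otherwise apply S_alpha
            accepted.append(x)
    return accepted
```

The two guard clauses correspond to the cases U_j = {0} and U_j = {1} of problem (5); the last line is where the stationary rule actually operates.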


Section 3.1 will present some properties of Φ-optimum constrained design measures, to be used in Section 3.2 to implement the stationary strategy S(ξ∗_α). When S(ξ∗_α) is plugged in (8), the resulting strategy S_{N,n}[S(ξ∗_α)] is asymptotically optimal from the previous theorem. Notice that from (6), (7) the existence of such an asymptotically optimal strategy implies

lim_{N→∞} Φ(S∗_{N,αN}) = Φ∗_α,  μ-a.s.   (10)

This property can be extended into the following (see Appendix B), which will be used in the proof of Theorem 9.

Lemma 2. Let (α_k) be a sequence in (0, 1), with lim_{k→∞} α_k = α ∈ (0, 1). Then, under HΦ1, Hμ1–Hμ3 and Hμ5,

lim_{N→∞} Φ(S∗_{N,α_N N}) = Φ∗_α,  μ-a.s.

Whereas (10) and Lemma 2 only concern the limiting behavior of Φ(S∗_{N,n}), the following result on E{Φ(S∗_{N,n})} holds for any N under very general conditions. The proof is given in Appendix B.

Lemma 3. For a concave criterion Φ(·), the non-sequential strategy S∗_{N,n} (that is, a Φ-optimum algorithm for an exact design with n points in {X_1, . . . , X_N}) satisfies, for all (n, N), 0 < n ≤ N,

E{Φ(S∗_{N,n})} ≤ Φ(ξ∗_{n/N})   (11)

with ξ∗_{n/N} a Φ-optimum constrained design measure in D(μ, n/N).

Hμ1 and HΦ3 imply Φ(ξ) < ∞ for any design measure ξ ≤ μ/α, α given in (0, 1), so that Lemma 3 implies that for any (n, N), 0 < n ≤ N, and any strategy S_{N,n},

E{Φ(S_{N,n})} ≤ E{Φ(S∗_{N,n})} ≤ Φ(ξ∗_{n/N}) < ∞.   (12)

(When S_{N,n} involves randomized decisions, see e.g. Section 3.2, the first expectation in (12) is also with respect to them.) It may be noticed that the upper bound Φ(ξ∗_{n/N}) for the expected performance E{Φ(S_{N,n})} is not necessarily achievable when n is finite, N → ∞ and the strategy S_{N,n} is sequential, even in the case d = dim(θ) = 1, see Example 12 of Pronzato (2001a).

3. Constrained design measures and asymptotically optimum sequential strategies

3.1. Optimum constrained design measures

The main result (Wynn, 1982; Sahm and Schwabe, 2001), presented in the following theorem, states that when ξ∗_α is a Φ-optimum constrained measure, X can be partitioned into three subsets X∗_{1,α}, X∗_{2,α} and X∗_{3,α} = X\(X∗_{1,α} ∪ X∗_{2,α}), with ξ∗_α = 0 on X∗_{1,α}, ξ∗_α = μ/α on X∗_{2,α} and the directional derivative F_Φ(ξ∗_α, x) (see Appendix A) constant on X∗_{3,α}.


Theorem 4. The following statements are equivalent:

(i) ξ∗_α is a Φ-optimum constrained design measure;
(ii) there exists a number c such that F_Φ(ξ∗_α, x) ≥ c for ξ∗_α-almost all x and F_Φ(ξ∗_α, x) ≤ c for (μ − αξ∗_α)-almost all x;
(iii) there exist two subsets X∗_{1,α} and X∗_{2,α} of X such that
• ξ∗_α = 0 on X∗_{1,α} and ξ∗_α = μ/α on X∗_{2,α},
• inf_{x∈X∗_{2,α}} F_Φ(ξ∗_α, x) ≥ c ≥ sup_{x∈X∗_{1,α}} F_Φ(ξ∗_α, x),
• F_Φ(ξ∗_α, x) = c on X∗_{3,α} = X\(X∗_{1,α} ∪ X∗_{2,α}).

The construction of the sets X∗_{1,α} and X∗_{2,α} is important in order to be able to sample from ξ∗_α; also, we must specify the value of ξ∗_α on X∗_{3,α} = X\(X∗_{1,α} ∪ X∗_{2,α}). This is considered in the rest of this section. For a given ξ, consider the random variable F_Φ(ξ, X_1) and let F_ξ(·) denote the corresponding distribution function,

F_ξ(s) = μ{x : F_Φ(ξ, x) ≤ s}.   (13)

Define c_α(ξ) as

c_α(ξ) = min{s : F_ξ(s) ≥ 1 − α}   (14)

and

X_{1,α}(ξ) = {x : F_Φ(ξ, x) < c_α(ξ)},
X_{2,α}(ξ) = {x : F_Φ(ξ, x) > c_α(ξ)},
X_{3,α}(ξ) = {x : F_Φ(ξ, x) = c_α(ξ)}.   (15)

We then obtain X∗_{j,α} = X_{j,α}(ξ∗_α), j = 1, 2, 3, and c_α(ξ∗_α) is the constant c of Theorem 4. Consider now the following transformation T_{α,μ} : ξ ∈ Ξ → T_{α,μ}(ξ) ∈ D(μ, α):

T_{α,μ}(ξ) = μ/α on X_{2,α}(ξ),
T_{α,μ}(ξ) = {[α − μ(X_{2,α}(ξ))]/μ(X_{3,α}(ξ))} μ/α on X_{3,α}(ξ),
T_{α,μ}(ξ) = 0 on X_{1,α}(ξ).

Notice that F_Φ[ξ; T_{α,μ}(ξ)] = max_{ν∈D(μ,α)} F_Φ(ξ; ν), with D(μ, α) the set of admissible measures ν on X satisfying ν ≤ μ/α (indeed, T_{α,μ}(ξ) distributes its mass on X where F_Φ(ξ, x) takes its highest values). The next theorem complements Theorem 4 by a minimax formulation similar to the Kiefer–Wolfowitz (1960) Equivalence Theorem.

Theorem 5. The following statements are equivalent:

(i) ξ∗_α is a Φ-optimum constrained design measure;
(ii) F_Φ[ξ∗_α; T_{α,μ}(ξ∗_α)] = 0;

(iii) ξ∗_α minimizes F_Φ[ξ; T_{α,μ}(ξ)], ξ ∈ D(μ, α);
(iv) ξ∗_α minimizes max_{ν∈D(μ,α)} F_Φ(ξ; ν), ξ ∈ D(μ, α).

Proof. From Theorem 4, (i) gives Φ[T_{α,μ}(ξ∗_α)] = Φ(ξ∗_α), which implies (ii). Take any ξ ∈ Ξ and denote ξ∗_α an optimum constrained design measure. From the definition of T_{α,μ},

F_Φ[ξ; T_{α,μ}(ξ)] ≥ F_Φ(ξ; ν) for all ν ∈ D(μ, α),   (16)

and thus F_Φ[ξ; T_{α,μ}(ξ)] ≥ F_Φ(ξ; ξ∗_α). Concavity of Φ then gives

F_Φ[ξ; T_{α,μ}(ξ)] ≥ Φ∗_α − Φ(ξ) for all ξ ∈ Ξ.   (17)

Since Φ∗_α ≥ Φ(ξ), ξ ∈ D(μ, α), (ii) implies (i). Equivalence between (i) and (iii) is obvious from (17); (16) implies the equivalence between (iii) and (iv). □

We conclude this section by mentioning the following Lipschitz property, see Pronzato (2004), which shows that Φ∗_α is continuous in α.

Theorem 6. For any α, α′ in (0, 1), the associated Φ-optimum constrained design measures ξ∗_α and ξ∗_{α′} satisfy

|Φ(ξ∗_α) − Φ(ξ∗_{α′})| ≤ |α − α′| max{ |c_α(ξ∗_α)|/α , |c_{α′}(ξ∗_{α′})|/α′ },

with c_α(ξ) defined by (14).

3.2. An asymptotically optimum sequential strategy

The implementation of a non-adaptive strategy that samples from ξ∗_α is straightforward when μ has no atoms, that is, when for any A ⊂ X with μ(A) > 0 there exists A′ ⊂ A such that 0 < μ(A′) < μ(A), with measures absolutely continuous w.r.t. the Lebesgue measure as a special case. Indeed, in that case there exists a ξ∗_α such that the set X∗_{3,α} is empty, see Wynn (1982), Fedorov (1989) and Fedorov and Hackl (1997), and a trivial implementation of S(ξ∗_α) then consists in accepting the samples that fall into X∗_{2,α} and rejecting the others. We shall use the transformation T_{α,μ} of the previous section to deal with the general situation, and sample from T_{α,μ}(ξ∗_α) (which is equivalent to sampling from ξ∗_α since Φ[T_{α,μ}(ξ∗_α)] = Φ(ξ∗_α), see the proof of Theorem 5). Since only the fraction [α − μ(X∗_{2,α})]/μ(X∗_{3,α}) of the samples in X∗_{3,α} must be accepted, we randomize the selection procedure and define

S(ξ∗_α): if X_k ∈ X∗_{2,α}, accept X_k;
  if X_k ∈ X∗_{3,α}, accept X_k with probability P_{3,α} = [α − μ(X∗_{2,α})]/μ(X∗_{3,α});   (18)
  otherwise, reject X_k.

From Theorem 1, when the non-adaptive strategy S(ξ∗_α) is plugged in (8) the resulting S_{N,n}[S(ξ∗_α)] is asymptotically optimal for n = αN, N → ∞. However, it requires the construction of the sets X∗_{j,α}, j = 1, 2, 3, and thus of a Φ-optimum constrained design


measure ξ∗_α ≤ μ/α. We show in the next section that it is possible to avoid this construction while preserving asymptotic optimality.

Remark 7. A deterministic version of (18) consists in accepting a X_k that falls into X∗_{3,α} only when a_k/k < α. One can then easily show that a_k/k → α and Φ_k → Φ∗_α as k → ∞, μ-a.s. (however, the strategy is then non-stationary, see Section 2.1).
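A minimal sketch of the randomized rule (18), assuming membership tests for X∗_{2,α} and X∗_{3,α} are available as predicates (the function and argument names are ours, not from the paper):

```python
import random


def sample_from_constrained_measure(xs, in_X2, in_X3, p3,
                                    rng=random.Random(0)):
    """Randomized non-adaptive strategy (18): accept points in X*_{2,alpha},
    accept points in X*_{3,alpha} with probability p3, reject the rest.

    xs           : stream of candidate points X_k
    in_X2, in_X3 : callables x -> bool testing membership in the two sets
    p3           : acceptance probability on X*_{3,alpha}, cf. P_{3,alpha}
    """
    accepted = []
    for x in xs:
        if in_X2(x):
            accepted.append(x)                 # always accept on X*_{2,alpha}
        elif in_X3(x) and rng.random() < p3:   # biased coin on X*_{3,alpha}
            accepted.append(x)
    return accepted
```

The biased coin makes the acceptance of each X_k ∈ X∗_{3,α} independent of the past, which is what makes the strategy stationary and its analysis simple.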

4. Sampling asymptotically from a constrained measure

Consider the following modification of the strategy S(ξ∗_α) defined by (18): at step k, we simply substitute the sets X_{j,α}(ξ_{k−1}) for X∗_{j,α}, j = 1, 2, 3, with X_{j,α}(ξ) defined by (15) and ξ_{k−1} the empirical measure defined by the a_k design points already selected. We thus define the following adaptive strategy:

S_α(μ): if X_k ∈ X_{2,α}(ξ_{k−1}), accept X_k;
  if X_k ∈ X_{3,α}(ξ_{k−1}), accept X_k with probability P_k(α) = {α − μ[X_{2,α}(ξ_{k−1})]}/μ[X_{3,α}(ξ_{k−1})];   (19)
  otherwise, reject X_k.

Remark 8. 1. Notice that the sets X_{j,α}(ξ_{k−1}) are obtained from the function F_Φ(ξ_{k−1}, x), that is, from M(ξ_{k−1}). Hence, the (randomized) decision at step k only depends on X_k and M(ξ_{k−1}) and S_α(μ) is stationary.

2. In practice, the first samples X_k are always accepted until M(ξ_{k−1}) becomes nonsingular, but this initialization has no effect on the asymptotic behavior of S_α(μ). Also, for technical reasons, we can assume that X_k is always accepted when a_k/k < α/C, with C an arbitrarily large constant. This has no practical importance for the asymptotic behavior of the strategy (since a_k/k → α μ-a.s., see Theorem 9 below) and for that reason is not mentioned in the definition of S_α(μ). On the other hand, it implies a_{k+1}/k > α/C for any k, which permits to guarantee that E{|Φ_k|} < ∞, a property used in the proof of asymptotic optimality, see Theorem 9.

3. S_α(μ) takes a simpler form when F_ξ(s) given by (13) is continuous in s for any ξ (so that X_{3,α}(ξ_{k−1}) is always empty): we accept X_k when F_{ξ_{k−1}}[F_Φ(ξ_{k−1}, X_k)] > 1 − α and reject X_k otherwise, and the strategy is fully deterministic.

4. We observe the same behavior when S_α(μ) is modified as follows: any X_k ∈ X_{3,α}(ξ_{k−1}) is accepted if a_k/k < α and is rejected otherwise; see Remark 7. However, the strategy is then non-stationary and its asymptotic behavior is more difficult to analyse.

5. Obviously, the sequence (Φ_k) generated by (19) is not monotonically increasing, since (i) the step-length 1/(1 + a_k) when X_k is accepted and ξ_{k−1} updated is predetermined and (ii) X_k is random. While (i) is standard in the construction of optimum designs, (ii) is less common and forms a specific feature of the context considered here. In order to eliminate the unboundedness case encountered in the dichotomous theorem of Wu and Wynn (1978), which is usually the main issue raised by (i), we introduce the assumption HΦ3 on Φ, see Appendix A. The asymptotic behavior of (19) satisfies the following.
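For Φ = log det, the deterministic form of the adaptive rule in item 3 above can be sketched as follows (our own illustrative code, with names of our choosing: the sensitivity of the current design, d(x) = f⊤(x)M⁻¹(ξ_{k−1})f(x), which differs from F_Φ(ξ_{k−1}, x) only by a constant, is compared with the empirical (1 − α)-quantile of the sensitivities observed so far, standing in for the unknown distribution function; the initialization of Remark 8.2 accepts the first d0 points so that M becomes nonsingular):

```python
import numpy as np


def adaptive_select(xs, f, alpha, d0):
    """Sketch of the adaptive strategy for Phi = log det (continuous case):
    accept X_k when its sensitivity exceeds the empirical (1 - alpha)-quantile
    of the sensitivities seen so far.

    xs    : sequence of candidate points (distinct values)
    f     : regressor function x -> vector f(x)
    alpha : target proportion of accepted points
    d0    : number of points accepted unconditionally at start-up
    """
    dim = len(f(xs[0]))
    M = np.zeros((dim, dim))            # unnormalized information matrix
    accepted, seen_d = [], []
    for x in xs:
        fx = np.asarray(f(x), dtype=float)
        if len(accepted) < d0:          # initialization phase (Remark 8.2)
            accepted.append(x)
            M += np.outer(fx, fx)
            continue
        # sensitivity w.r.t. the normalized information matrix M / a_k
        d = fx @ np.linalg.solve(M / len(accepted), fx)
        seen_d.append(d)
        if d > np.quantile(seen_d, 1 - alpha):   # empirical quantile rule
            accepted.append(x)
            M += np.outer(fx, fx)
    return accepted
```

This is only a caricature of S_α(μ̂_k): in the paper the quantile is taken under the estimate μ̂_k of μ, while here it is taken over the raw stream of observed sensitivities.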


Theorem 9. Under Hμ1–Hμ6 and HΦ1–HΦ3, the empirical measure ξ_k defined by the points accepted by the stationary strategy S_α(μ) satisfies a_k/k → α, μ-a.s., and

lim_{k→∞} Φ(ξ_k) = Φ(ξ∗_α),  μ-a.s.,   (20)

with ξ∗_α ≤ μ/α a Φ-optimum constrained design measure.

The proof is given in Appendix B. When the strategy is truncated as indicated in (8), the resulting S_{N,n}[S_α(μ)] is asymptotically optimal for n = αN, N → ∞, from Theorem 1. Another possible truncation consists in adapting α and using, at step k, the strategy S_{α_k}(μ), with

α_k = (n − a_k)/(N − k + 1)   (21)
and ak defined by (4). Sk () coincides with the one-step-ahead rule suggested in Pronzato (2001b): when k = 0, X2,k (k−1 ) is empty, [X2,k (k−1 )] = 0 and Xk is always rejected; when k 1, X2,k (k−1 ) = X and Xk is always accepted. When  is unknown, we can use S (ˆ k ), or Sk (ˆ k ), at step k, with ˆ k the empirical version of  (or a kernel estimate, or a parametric representation  ˆ , with ˆ k estimated k from X1 , . . . , Xk ). The estimation of  does not depend on the strategy that is used (which corresponds to a separation property in control theory), and Theorem 9 still holds: we can thus asymptotically sample from ∗ , without constructing ∗ beforehand and even without knowing  in advance. Illustrative examples are presented in the next section. Consider finally a nonlinear situation where the information matrix M depends on the 0 0 model parameters , so that local optimum design is based on M(ˆ ) with ˆ a nominal

0 value for . When  can be estimated on line, it is natural to replace ˆ by an estimate, k ˆ at step k. When using least-squares estimation in a nonlinear regression problem with ¯ + εk , where (εk ) is a sequence of i.i.d. errors with independent observations Yk = (Xk , ) k zero mean and finite variance, consistency and asymptotic normality of ˆ will hold under H 3 (and additional conditions on higher-order derivatives of (, x) with respect to  and their tail cross-product, see Jennrich, 1969). The sampling strategy S () will then ensure ¯ -a.s., k → ∞, with ∗ ()/ ¯ k → [∗ ()], a -optimum constrained design measure  ¯ for the true value  of the model parameters. This is illustrated by Example 12 below.

5. Examples

We take Φ(·) = log det(·) in all the examples below. Note that all the conditions HΦ are then satisfied, see Appendix A.

Example 10. We consider the quadratic regression model η(θ, X) = θ₀ + θ₁X + θ₂X². Let μ_c correspond to the normal distribution N(0, 1) and μ_d be the discrete measure supported at the points {−1, −1/2, 0, 1/2, 1} with respective weights 1/8, 1/4, 1/4, 1/4, 1/8. We take μ = 0.5μ_c + 0.5μ_d and α = 0.1.



Fig. 1. Sensitivity function d(ξ∗_α, x) for the optimal constrained measure ξ∗_α in Example 10.

Easy calculations show that the Φ-optimum constrained measure ξ∗_α is equal to μ/α on X∗ = (−∞, −a] ∪ [a, ∞), with a ≈ 1.5625, and puts the rest of its weight, approximately 0.4091, at zero. Fig. 1 presents a plot of the sensitivity function d(ξ∗_α, x) = F_Φ(ξ∗_α, x) + 3 = f⊤(x) M⁻¹(ξ∗_α) f(x) and illustrates the optimality of ξ∗_α. Fig. 2, left, gives a histogram of the first 1000 samples accepted by S_α(μ̂_k), with μ̂_k the empirical measure of the X_k's. The right part of the figure presents Φ_k as a function of k (log scale), with the optimal value Φ∗_{0.1} indicated by the dashed line. (Note the fast convergence of Φ_k, the large value of N being required only to have enough points for the histogram plot.)

Example 11. We consider again the case of a measure μ having both discrete and continuous components. The response of the regression model is η(θ, X) = θ₀ + θ₁x₁ + θ₂x₂ with θ = (θ₀, θ₁, θ₂)⊤ and X two-dimensional, X = (x₁, x₂)⊤. The continuous component μ_c corresponds to the normal distribution N(0, I₂), with I₂ the 2-dimensional identity matrix; the discrete component μ_d puts weight 1/4 at each one of the points (±1, ±1). Define B(a) = {x : ‖x‖ > a} and e = exp(1). When μ = (μ_c + μ_d)/2 the results are as follows.

For 0 < α ≤ 1/(2e), ξ∗_α coincides with μ_c/(2α) on B(a_α) with a_α = √(2 log[1/(2α)]) ≥ √2; Φ∗_α = 2 log[1 − log(2α)].

For 1/(2e) < α ≤ 1/(2e) + 1/2, ξ∗_α = μ/α on B(√2) and ξ∗_α = {2[α − 1/(2e)]/α} μ on the four points (±1, ±1); Φ∗_α = 2 log[1 + 1/(2eα)].

For 1/(2e) + 1/2 < α ≤ 1, ξ∗_α = μ/α on B(b_α) with b_α = √(2 log[1/(2α − 1)]) < √2; Φ∗_α = 2 log{1 − [(2α − 1) log(2α − 1)]/(2α)}.
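As a quick numerical check of the intermediate-case formula (our own sketch, not from the paper): for α = 0.2 the expression Φ∗_α = 2 log[1 + 1/(2eα)] reproduces the value Φ∗_{0.2} ≈ 1.3043 quoted later in the text, and it agrees with the first-case formula at the boundary α = 1/(2e).

```python
import math


def phi_star_intermediate(alpha: float) -> float:
    """Phi*_alpha = 2 log[1 + 1/(2 e alpha)] for 1/(2e) < alpha <= 1/(2e) + 1/2
    (Example 11, intermediate case)."""
    return 2.0 * math.log(1.0 + 1.0 / (2.0 * math.e * alpha))
```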


Fig. 2. Left: histogram of the 1000 first samples X_k accepted by S_α(μ̂_k) in Example 10. Right: Φ_k as a function of k (log scale); the value of Φ∗_{0.1} corresponds to the dashed line.

Fig. 3 presents Φ∗_α as a function of α (although it does not appear clearly from the figure, Φ∗_α tends to infinity when α tends to zero). Here c_α(ξ∗_α) is a continuous function of α, so that Φ∗_α is differentiable with respect to α, see Pronzato (2004). We take N = 10 000, n = 2000 and use S_{α_k}(μ̂_k), see (19), (21), with μ̂_k the empirical measure of the X_k's. (Note that α = 0.2 is related to the intermediate case 1/(2e) < α ≤ 1/(2e) + 1/2.) In the simulation presented M(ξ_k) has full rank after the first three samples X₁, X₂, X₃ have been accepted, and Fig. 4 (top) presents a histogram of the remaining 1997 points accepted by S_{α_k}(μ̂_k). Fig. 4 (bottom) gives Φ_k as a function of k = 4, . . . , 2000 (log scale), with the optimal value Φ∗_{0.2} ≈ 1.3043 indicated by the dashed line. (Again, notice the fast convergence of Φ_k, the large value of N being required only to have enough points for the histogram plot.)

Example 12. We consider now the nonlinear regression model used in Box and Lucas (1959), where

E{Y_k | X_k = x; θ} = η(θ, x) = [θ₁/(θ₁ − θ₂)] [exp(−θ₂x) − exp(−θ₁x)].

We estimate θ = (θ₁, θ₂)⊤ by LS, and use at step k the information matrix corresponding to the current estimated value θ̂ᵏ. The numerical values used to generate the observations Y_k correspond to θ̄ = (0.7, 0.2)⊤ and the standard deviation of the measurement errors is 0.2. (With this value for θ̄, the D-optimal experiment corresponds to performing the same number of observations at x ≈ 1.23 and 6.86.)



Fig. 3. Φ∗_α as a function of α in Example 11.


Fig. 4. Top: histogram of the 2000 points accepted by S_{α_k}(μ̂_k) in a sequence of 10 000 points; bottom: Φ_k as a function of k (log scale), the value of Φ∗_{0.2} corresponds to the dashed line (Example 11).

We assume that the experimental variables X_k have a lognormal distribution: log(X_k) is normally distributed N(1, 0.25). Fig. 5 gives a histogram of the first 2500 points accepted by S_α(μ̂_k), with μ̂_k the empirical measure of the X_k's, when α = 0.5. Easy calculations show that ξ∗_α(θ̄) = μ/α = 2μ on [0, a] ∪ [b, ∞), with a ≈ 1.996 and b ≈ 3.922, and Φ[ξ∗_α(θ̄)] ≈ −2.143.



Fig. 5. Histogram of the 2500 first samples X_k accepted by S_α(μ̂_k) in Example 12.

6. Concluding remarks and further developments

Possible extensions of these results, which will be the subject of future work, include the following situations. First, there are cases where the design variables are not directly observed: an example is when one observes covariates Z_k and the conditional probability measure μ(·|Z_k) for the experimental conditions X_k is known for any k. A sequential selection strategy for this problem might prove useful in phase-I clinical trials, where covariates Z_k such as the size, weight and age of volunteers could be used for their selection, in order to build a model of the tolerance dose as a function of (unobserved) pharmacokinetic/pharmacodynamic variables X_k.

Second, applications to parameter estimation in dynamical systems, nonlinear in particular, call for an extension to correlated design variables X_k. A simple example is when X_k = (U_k, U_{k−1}, . . . , U_{k−m}), with (U_i) a random input sequence for the system. Note, however, that when the model contains an autoregressive part, that is, when X_k = (U_k, U_{k−1}, . . . , U_{k−m}, Y_{k−1}, . . . , Y_{k−l}), the decision not to observe Y_k implies that l future experimental conditions are unknown, which makes the problem much different from the one we considered here and will require specific developments.

In the one-dimensional case d = dim(θ) = 1, asymptotic optimality of a strategy similar to S_{α_k}(μ), see (19), (21), is proved in Pronzato (2001a) for N → ∞ with n fixed, provided the distribution function of X is a von Mises function (see, e.g., Embrechts et al., 1997, p. 138), with a tail decreasing faster than any power law. Extending Theorem 1 to the situation where n is fixed but d > 1 remains an open issue. Note in particular that in this case, although (12)


gives an upper bound on the expected performance, the optimal performance achievable by a sequential strategy is unknown (not to speak of the optimal strategy itself). All intermediate situations between $n$ fixed and $n = \alpha N$, such as $n = \log N$ or $n = N^\gamma$ with $\gamma < 1$, are also of interest.

A possible application concerns the construction of optimum-design algorithms. Indeed, classical algorithms for the determination of the (unconstrained) $\xi^*$ that maximizes $\Phi(\cdot)$ rely on the determination, at iteration $k$, of a design point $X_k$ that maximizes $F_\Phi(\xi_{k-1}, x)$ with respect to $x \in \mathcal{X}$, with $\xi_{k-1}$ the current design measure. This (global) maximization problem may prove cumbersome, especially when $\mathcal{X}$ is high-dimensional, so that it is sometimes recommended to accept any $X_k$ such that $F_\Phi(\xi_{k-1}, X_k) > \epsilon$, with $\epsilon$ some small positive number; see Fedorov and Hackl (1997, p. 49). A consequence of the results presented above is that generating candidates $X_k$ randomly from some suitably chosen measure $\mu$, with an acceptance rule such as (19), will ensure the convergence of $\Phi_k$ to $\Phi^*$, provided $\epsilon$ tends to zero at a proper speed. Whether or not efficient algorithms can be obtained in this way remains an open issue.
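This random-candidate idea can be sketched numerically. The snippet below is an illustrative sketch, not the algorithm of the paper: candidates are drawn i.i.d. from a uniform $\mu$ on $\mathcal{X} = [-1,1]^2$, the regression model $f(x) = (1, x_1, x_2)^\top$ and the threshold schedule $\epsilon_t = 1/\sqrt{t}$ are arbitrary choices, and a candidate is accepted whenever the D-criterion directional derivative $F(\xi_{k-1}, X_k) = f^\top(X_k) M^{-1}(\xi_{k-1}) f(X_k) - d$ exceeds $\epsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                        # dim(theta) for f(x) = (1, x1, x2)'

def f(x):
    return np.array([1.0, x[0], x[1]])

def F_D(M_inv, x):
    # directional derivative of log det M at xi in the direction delta_x:
    # F(xi, x) = f(x)' M^{-1} f(x) - d
    fx = f(x)
    return fx @ M_inv @ fx - d

# initial non-singular design measure (three arbitrary support points)
pts = [np.array([-1.0, -1.0]), np.array([1.0, -1.0]), np.array([0.0, 1.0])]
M = sum(np.outer(f(p), f(p)) for p in pts) / len(pts)
k = len(pts)

for t in range(1, 20001):
    x = rng.uniform(-1.0, 1.0, size=2)       # candidate X_t drawn from mu
    if F_D(np.linalg.inv(M), x) > 1.0 / np.sqrt(t):   # epsilon_t -> 0
        k += 1                               # accept: update empirical measure
        M = (1.0 - 1.0 / k) * M + (1.0 / k) * np.outer(f(x), f(x))

# Kiefer-Wolfowitz: at the (unconstrained) D-optimum, max_x F(xi*, x) = 0;
# a near-optimal design should make this maximum small on a grid
M_inv = np.linalg.inv(M)
grid = [np.array([a, b]) for a in np.linspace(-1, 1, 21)
        for b in np.linspace(-1, 1, 21)]
print(max(F_D(M_inv, x) for x in grid))
```

The printed maximum of the directional derivative shrinks toward zero as the threshold decreases, in line with the convergence claim above.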

Acknowledgements

This work originated from a comment by V.V. Fedorov (GlaxoSmithKline, USA) at a meeting on experimental design held in Cardiff in April 2000, see Pronzato (2001b); he is gratefully acknowledged for pointing out the connection between the results presented at this meeting and optimum design with constraints. Stimulating discussions with H.P. Wynn (Warwick University and London School of Economics, UK) and E. Thierry (Laboratoire I3S) on various occasions also proved very helpful, and the comments of two anonymous referees have been extremely useful in improving the paper.

Appendix A. Assumptions and notations

HΦ1: $\Phi$ is strictly concave and $\Phi(M) > -\infty$ for non-singular $M$.

HΦ2: $\Phi$ is linearly differentiable; that is, the directional derivative
$$F_\Phi(M_1, M_2) = \lim_{\gamma \to 0^+} \{\Phi[(1-\gamma)M_1 + \gamma M_2] - \Phi(M_1)\}/\gamma$$
satisfies $F_\Phi(\xi_1; \xi_2) = F_\Phi[M(\xi_1), M(\xi_2)] = \int_{\mathcal{X}} F_\Phi(\xi_1, x)\,\xi_2(dx)$ for any $\xi_1, \xi_2$ with $\Phi(\xi_1) = \Phi[M(\xi_1)] > -\infty$, where $F_\Phi(\xi, x) = F_\Phi(\xi; \delta_x)$ and $\delta_x$ is the Dirac measure supported at $x$.

HΦ3: $\Phi$ is increasing: $M_2 - M_1$ non-negative definite implies $\Phi(M_2) \ge \Phi(M_1)$.

HΦ4: $\lambda_{\min}(M) \ge l > 0$ implies $F_\Phi(M, ff^\top) \ge -B_1(l)$ for any $f$, for some $B_1(l) < \infty$.

HΦ5: there exist a function $g_1(\cdot)$ from $\mathbb{R}^+$ to $\mathbb{R}$ and a function $g_2(\cdot)$ from $\mathbb{R}^+$ to $\mathbb{R}^+$ such that $\lim_{a\to 1^+} g_1(a) = 0$, $\lim_{a\to 1^+} g_2(a) = 1$ and $\Phi(aM) \le g_1(a) + g_2(a)\,\Phi(M)$ for any non-negative definite $M$ and any $a \ge 1$. Note that this assumption is satisfied for homogeneous criteria, with $g_1 = 0$ and $g_2$ the identity. Also, if one wishes maximizing $\Phi(M)$ to be equivalent to maximizing $\Phi(aM)$ (which is often implicitly assumed when approximate designs are used), the assumption $\Phi(aM) = g_1(a) + g_2(a)\,\Phi(M)$ comes naturally.


HΦ6: $\Phi$ is twice differentiable. We denote by $\nabla^2_\Phi(M_1, M_2)$ the second-order directional derivative
$$\nabla^2_\Phi(M_1, M_2) = \left.\frac{\partial^2 \Phi[(1-\gamma)M_1 + \gamma M_2]}{\partial \gamma^2}\right|_{\gamma = 0^+}.$$
It satisfies, for any $f$ with $\|f\| \le R$ and $M$ such that $\lambda_{\min}(M) \ge l > 0$: $\nabla^2_\Phi(M, ff^\top) \ge -B_0(l) - B_2(l)R^4$, for some $B_0(l) < \infty$ and $B_2(l) < \infty$.

Hμ1: $-\infty < \Phi(\xi^*_\alpha) = \Phi^*_\alpha < \infty$.

Hμ2: $\mu_4 = \int_{\mathcal{X}} \|f(x)\|^4\,\mu(dx) < \infty$. Note that Chebyshev's inequality implies $\mathrm{Prob}\{\|f(X_1)\| \ge R\} \le \mu_4/R^4$.

Hμ3: for $\alpha \in (0,1)$ the proportion of interest, there exist $l > 0$ and $\beta \in (0, \alpha)$ such that for any design measure $\xi \le \mu/(\alpha - \beta)$, $\lambda_{\min}[M(\xi)] > l$.
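Assumptions HΦ2 and HΦ5 are easy to check numerically for the D-criterion $\Phi(M) = \log\det M$ (treated in A.1 below). The following sketch, on an arbitrary randomly generated positive definite test matrix, compares the closed-form directional derivative $f^\top M^{-1} f - d$ with a finite difference, and verifies the HΦ5 decomposition $g_1(a) = d\log a$, $g_2(a) = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
M = A @ A.T + d * np.eye(d)            # arbitrary positive definite "information matrix"
fvec = rng.standard_normal(d)
ff = np.outer(fvec, fvec)

logdet = lambda X: np.linalg.slogdet(X)[1]

# HPhi2 for Phi = log det: closed form f'M^{-1}f - d versus the
# finite-difference quotient of Phi[(1-gamma)M + gamma ff']
gamma = 1e-6
closed = fvec @ np.linalg.solve(M, fvec) - d
finite = (logdet((1 - gamma) * M + gamma * ff) - logdet(M)) / gamma
print(abs(closed - finite))            # agreement up to O(gamma)

# HPhi5 for the D-criterion: Phi(aM) = d log a + Phi(M),
# i.e. g1(a) = d log a and g2(a) = 1
a = 2.5
print(abs(logdet(a * M) - (d * np.log(a) + logdet(M))))
```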

A.1. Discussion of the assumptions

In the case of D-optimality, where $\Phi(M) = \log\det M$, we have $F_\Phi(\xi, x) = f^\top(x) M^{-1}(\xi) f(x) - d$ with $d = \dim(\theta)$. HΦ1–HΦ4 are satisfied, with $B_1(l) = d$ in HΦ4. HΦ5 is satisfied with $g_1(a) = d\log a$ and $g_2(a) = 1$. Finally, $\nabla^2_\Phi(M_1, M_2) = -\mathrm{trace}\{[M_1^{-1}(M_2 - M_1)]^2\}$, so that
$$\nabla^2_\Phi(M, ff^\top) = -(f^\top M^{-1} f)^2 + 2(f^\top M^{-1} f) - d$$
and HΦ6 is satisfied with $B_0 = d$, $B_2(l) = 1/l^2$.

Take now $\Phi(M) = -\mathrm{trace}(AM^{-p})$, with $p$ a positive integer and $A$ a positive definite matrix (Kiefer's $\Phi_p$-class of optimality criteria). We can assume that $A = I_d$, the $d$-dimensional identity matrix, by a linear transformation on the set $\{M(\xi)\}$. We have $F_\Phi(M_1, M_2) = p\,\mathrm{trace}[M_1^{-(p+1)}(M_2 - M_1)]$. HΦ1–HΦ4 are satisfied, with $B_1(l) = p\,d/l^p$ in HΦ4. HΦ5 is satisfied with $g_1(a) = 0$ and $g_2(a) = a^{-p}$. Finally, $\nabla^2_\Phi(M_1, M_2) = -p \sum_{a+b=p+2,\ a,b \ge 1} \mathrm{trace}[M_1^{-a}(M_2 - M_1)M_1^{-b}(M_2 - M_1)]$, see Wu and Wynn (1978), and easy calculations give $\nabla^2_\Phi(M, ff^\top) = -p \sum_{a+b=p+2,\ a,b \ge 1} (f^\top M^{-a} f)(f^\top M^{-b} f) + 2p(p+1)\,f^\top M^{-(p+1)} f - p(p+1)\,\mathrm{trace}(M^{-p})$. Therefore, HΦ6 is satisfied with $B_0(l) = p(p+1)/l^p$, $B_2(l) = p/l^{p+2}$.

Hμ1–Hμ3 are satisfied, for instance, when: (i) the functions $f(x)$ are linearly independent on any open subset of $\mathcal{X}$ with finite Lebesgue measure; (ii) $\mu$ has a component $\mu_c$ absolutely continuous with respect to the Lebesgue measure, with density $\varphi$, and a finite number of discrete components; (iii) the mass of $\mu_c$ is larger than $1 - \alpha + \beta$ (that is, the mass of the discrete components is less than $\alpha - \beta$); and (iv) $\mathcal{X}$ is compact or $\varphi(x)$ is exponentially decreasing when $\|x\| \to \infty$.

Appendix B. Proofs

Proof of Lemma 2. From HΦ3, for $n_1 \le n_2 \le N$, $\Phi(S^*_{N,n_1}) = \Phi(M^*_{N,n_1}) \le \Phi[(n_2/n_1)\,M^*_{N,n_2}]$, with $M^*_{N,n}$ the normalized information matrix associated with the $n$ points selected by $S^*_{N,n}$, see (2). Since $\alpha_N \to \alpha$, for any $\delta$ such that $0 < \delta < \min(\alpha, 1-\alpha)$


and for $N$ large enough, $\alpha - \delta < \alpha_N < \alpha + \delta$. This implies that there exists $N_0$ such that, for any $N > N_0$,
$$\Phi(M^*_{N,\alpha_N N}) \le \Phi\left[\frac{\alpha+\delta}{\alpha-\delta}\, M^*_{N,(\alpha+\delta)N}\right]$$
and
$$\Phi(M^*_{N,(\alpha-\delta)N}) \le \Phi\left[\frac{\alpha+\delta}{\alpha-\delta}\, M^*_{N,\alpha_N N}\right].$$
Denote $a_\delta = (\alpha+\delta)/(\alpha-\delta)$, and notice that $a_\delta > 1$ for $\delta > 0$ and $a_\delta \to 1$ when $\delta \to 0$. HΦ5 then gives
$$\frac{\Phi(M^*_{N,(\alpha-\delta)N}) - g_1(a_\delta)}{g_2(a_\delta)} \le \Phi(M^*_{N,\alpha_N N}) \le g_2(a_\delta)\,\Phi(M^*_{N,(\alpha+\delta)N}) + g_1(a_\delta)$$
and thus, from (10),
$$\limsup_{N\to\infty} \Phi(M^*_{N,\alpha_N N}) \le g_2(a_\delta)\,\Phi^*_{\alpha+\delta} + g_1(a_\delta), \qquad \frac{\Phi^*_{\alpha-\delta} - g_1(a_\delta)}{g_2(a_\delta)} \le \liminf_{N\to\infty} \Phi(M^*_{N,\alpha_N N}),$$
$\mu$-a.s. The continuity of $\Phi^*_\alpha$ with respect to $\alpha$ (Theorem 6) and $\lim_{a\to 1^+} g_1(a) = 0$, $\lim_{a\to 1^+} g_2(a) = 1$ give $\lim_{N\to\infty} \Phi(M^*_{N,\alpha_N N}) = \Phi^*_\alpha$, $\mu$-a.s. □

Proof of Lemma 3. The strategy $S^*_{N,n}$ satisfies $\Phi(S^*_{N,n}) = \Phi(M^*_{N,n})$ with

$$M^*_{N,n} = \frac{1}{n}\sum_{i=1}^{N} f(X_i) f^\top(X_i)\, J_n[X_j\ (j = 1, \ldots, N,\ j \ne i),\ X_i],$$
where $J_n \in \{0, 1\}$ defines the decision function (the decision to accept $X_i$ depends on the rest of the sequence). Therefore,
$$E\{M^*_{N,n}\} = \frac{1}{n}\int_{\mathcal{X}^N} \sum_{i=1}^{N} f(x_i) f^\top(x_i)\, J_n[x_j\ (j = 1, \ldots, N,\ j \ne i),\ x_i] \prod_{j=1}^{N} \mu(dx_j).$$
Using the fact that the decisions are invariant under any permutation of the sequence, we get
$$E\{M^*_{N,n}\} = \int_{\mathcal{X}} f(x) f^\top(x)\, \xi^*_{N,n}(dx) = M(\xi^*_{N,n}),$$
where
$$\xi^*_{N,n}(dx) = \frac{N}{n}\left(\int_{\mathcal{X}^{N-1}} J_n[x_j\ (j = 1, \ldots, N-1),\ x] \prod_{j=1}^{N-1}\mu(dx_j)\right)\mu(dx).$$
Since $J_n \in \{0, 1\}$, $\xi^*_{N,n}(dx)$ satisfies $\xi^*_{N,n}(dx) \le (N/n)\,\mu(dx)$. This implies $\Phi[M(\xi^*_{N,n})] \le \Phi(\xi^*_{n/N})$. Concavity of $\Phi(\cdot)$ finally gives
$$E\{\Phi(M^*_{N,n})\} \le \Phi[E\{M^*_{N,n}\}] \le \Phi(\xi^*_{n/N}),$$

which concludes the proof.
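The key step of this proof — any permutation-invariant selection of $n$ points out of $N$ induces a design measure bounded by $(N/n)\mu$ — can be checked by simulation in a toy case: $d = 1$, $f(x) = x$, $\mu$ uniform on $[0,1]$, where the D-optimal selection keeps the $n$ samples with largest $x^2$. The bin count and sample sizes below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, reps = 200, 40, 2000   # select the n of N points maximizing the information

counts = np.zeros(10)        # histogram of accepted points over 10 bins of [0,1]
edges = np.linspace(0.0, 1.0, 11)

for _ in range(reps):
    x = rng.uniform(0.0, 1.0, size=N)    # X_i i.i.d. with mu = U(0,1)
    keep = np.argsort(x**2)[-n:]         # f(x) = x: keep the n largest |x|
    counts += np.histogram(x[keep], bins=edges)[0]

# xi*_{N,n} averages the per-point acceptance over permutations; its density
# with respect to Lebesgue measure must stay below (N/n) times that of mu
density = counts / (reps * n) / np.diff(edges)
print(density.max(), N / n)   # the estimated density peaks near the bound N/n = 5
```

The selection saturates the bound on the top-quantile region and puts essentially no mass elsewhere, which is exactly the behaviour of the optimal bounded design measure.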


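Returning to Lemma 2, its conclusion — $\Phi(M^*_{N,\alpha_N N}) \to \Phi^*_\alpha$ as the population grows — can also be illustrated in the scalar case $d = 1$, $f(x) = x$, $\Phi(M) = \log M$, $\mu = U(0,1)$, where $\Phi^*_\alpha$ has a closed form: the optimal bounded design keeps the top-$\alpha$ region $x > 1-\alpha$, so $\Phi^*_\alpha = \log[(1 - (1-\alpha)^3)/(3\alpha)]$. The sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.2

# Phi*_alpha = log E[x^2 | x > 1 - alpha] for mu = U(0,1)
phi_star = np.log((1.0 - (1.0 - alpha)**3) / (3.0 * alpha))

gaps = []
for N in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=N)
    n = int(alpha * N)
    top = np.sort(x**2)[-n:]        # S*_{N,n}: keep the n samples maximizing x^2
    gaps.append(abs(np.log(top.mean()) - phi_star))
    print(N, gaps[-1])              # gap to Phi*_alpha shrinks as N grows
```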
Proof of Theorem 9. Since $X_k$ is accepted with probability $\alpha$, $a_k/k \to \alpha$ $\mu$-a.s. as $k \to \infty$. The rest of the proof is decomposed into three steps. In (i), we construct a lower bound on $E\{\Phi_k | X_1^{k-1}\}$, where $X_1^{k-1}$ denotes the collection $X_1, \ldots, X_{k-1}$. In (ii) we show that $\limsup_{k\to\infty} \Phi_k = \Phi^*_\alpha$, $\mu$-a.s. Finally, in (iii) we show that $\liminf_{k\to\infty} \Phi_k = \Phi^*_\alpha$, $\mu$-a.s., by an approach similar to Doob's upcrossing lemma, see Williams (1991, p. 108).

(i) From HΦ2 and HΦ6, the criterion value $\Phi_k$ associated with the empirical design measure $\xi_k$ generated by $S(\alpha)$ satisfies the recurrence
$$\Phi_k = \Phi_{k-1} + \left[\frac{1}{1+a_k}\,F_\Phi(\xi_{k-1}, X_k) + \frac{1}{2(1+a_k)^2}\,H(\xi_{k-1}, X_k, \gamma_k)\right] \times [I_{\mathcal{X}_{2,k}}(X_k) + I_{[0,P_k(\alpha)]}(Z)\,I_{\mathcal{X}_{3,k}}(X_k)], \qquad (B.1)$$
where $a_k$ is defined by (4), $Z$ is a random variable uniformly distributed in $[0,1]$, $I_A(\cdot)$ denotes the indicator function of the set $A$, and
$$H(\xi_{k-1}, X_k, \gamma_k) = \nabla^2_\Phi[(1-\gamma_k)M(\xi_{k-1}) + \gamma_k f(X_k)f^\top(X_k),\ f(X_k)f^\top(X_k)]$$
for some $\gamma_k \in [0, 1/(1+a_k)]$. Since $a_k/k \to \alpha$ $\mu$-a.s., from Hμ3 there exists $K_0$ ($\mu$-a.s.) such that for any $k > K_0$, $\lambda_{\min}[M(\xi_{k-1})] > l/2$. Since $\lambda_{\min}[(1-\gamma_k)M(\xi_{k-1}) + \gamma_k f(X_k)f^\top(X_k)] \ge [1 - 1/(1+a_k)]\,\lambda_{\min}[M(\xi_{k-1})]$, there exists $K_1$ ($\mu$-a.s.) such that for $k > K_1$, $\lambda_{\min}[(1-\gamma_k)M(\xi_{k-1}) + \gamma_k f(X_k)f^\top(X_k)] > l/4$. From HΦ6, this implies $H(\xi_{k-1}, X_k, \gamma_k) > -B_0(l/4) - B_2(l/4)\,\|f(X_k)\|^4$. We can now compute a lower bound on $E\{\Phi_k|X_1^{k-1}\}$. Notice that
$$E\{F_\Phi(\xi_{k-1}, X_k)[I_{\mathcal{X}_{2,k}}(X_k) + I_{[0,P_k(\alpha)]}(Z)\,I_{\mathcal{X}_{3,k}}(X_k)]\ |\ X_1^{k-1}\} = \alpha\,F_\Phi[\xi_{k-1}; T_\alpha(\xi_{k-1})],$$
which gives for $k > K_1$
$$E\{\Phi_k|X_1^{k-1}\} \ge \Phi_{k-1} + \frac{\alpha\,F_\Phi[\xi_{k-1}; T_\alpha(\xi_{k-1})]}{1+a_k} - \alpha\,\frac{B_0(l/4) + B_2(l/4)\,\mu_4}{2(1+a_k)^2}$$


with $\mu_4 = \int_{\mathcal{X}} \|f(x)\|^4\,\mu(dx)$, and $\mu_4 < \infty$ from Hμ2. Using (17), we obtain
$$E\{\Phi_k|X_1^{k-1}\} \ge \Phi_{k-1} + \frac{\alpha(\Phi^*_\alpha - \Phi_{k-1})}{1+a_k} - \frac{A}{(1+a_k)^2} \qquad (B.2)$$
for $k > K_1$, where $A = \alpha\,[B_0(l/4) + B_2(l/4)\,\mu_4]/2$.

(ii) Assume that $\limsup_k \Phi_k < \Phi^*_\alpha - \epsilon$ for some $\epsilon > 0$; (B.2) then gives
$$E\{\Phi_k|X_1^{k-1}\} > \Phi_{k-1} + \frac{\alpha\epsilon}{2(1+a_k)} \qquad (B.3)$$
for $k$ larger than some $K_2$. Since $\Phi_k < \Phi(S^*_{k,a_k+1})$ and $(a_k+1)/k > \alpha/C$ for any $k$, see Remark 8.2, we have $-\infty \le E\{\Phi_k\} < E\{\Phi(S^*_{k,a_k+1})\} < \Phi(\xi^*_{(a_k+1)/k}) < \Phi(\xi^*_{\alpha/C}) < \infty$ (see Lemma 3, HΦ1, HΦ3) and $E\{\Phi_k\}$ is well defined. Also, for $k > K_2$, (B.3) gives $E\{\Phi_k\} > E\{\Phi_{K_2}\} > -\infty$ and thus $\sup_k E\{|\Phi_k|\} < \infty$. The martingale convergence theorem then says that $\Phi_k$ converges $\mu$-a.s. to a finite limit, which contradicts (B.3). Repeating the same arguments for a countable number of rational $\epsilon$ gives $\limsup_{k\to\infty} \Phi_k \ge \Phi^*_\alpha$. At the same time, $\limsup_{k\to\infty} \Phi_k \le \lim_{k\to\infty} \Phi(S^*_{k,a_k+1}) = \Phi^*_\alpha$ $\mu$-a.s. from Lemma 2, so that
$$\limsup_{k\to\infty} \Phi_k = \Phi^*_\alpha, \quad \mu\text{-a.s.} \qquad (B.4)$$

(iii) Assume that $\liminf_{k\to\infty} \Phi_k < \Phi^*_\alpha - \tilde\epsilon$ for some $\tilde\epsilon > 0$. We show that this event has probability zero. First, we show that there exists $K^*$ ($\mu$-a.s.) such that for any $k > K^*$,
$$\Phi_{k-1} < \Phi^*_\alpha + \tilde\epsilon/6, \qquad (B.5)$$
$$(\tilde\epsilon/6)\,[\alpha/(1+a_k)] - A/(1+a_k)^2 > 0, \qquad (B.6)$$
$$\Phi_k - \Phi_{k-1} > -\tilde\epsilon/6. \qquad (B.7)$$
(B.5) follows from (B.4), and (B.6) from $a_k/k \to \alpha$ $\mu$-a.s. HΦ4 implies that $F_\Phi(\xi_{k-1}, X_k) \ge -B_1(l/2)$ for $k > K_0$ (since $\lambda_{\min}[M(\xi_{k-1})] > l/2$ for $k > K_0$) and (B.1) gives
$$\Phi_k - \Phi_{k-1} \ge -\frac{B_1(l/2)}{1+a_k} - \frac{B_0(l/4) + B_2(l/4)\,\|f(X_k)\|^4}{2(1+a_k)^2}$$
for $k > K_1$. Hμ2 implies $\mathrm{Prob}\{\|f(X_k)\| > k^\eta\} \le \mu_4/k^{4\eta}$ and therefore, from Borel–Cantelli, $\mathrm{Prob}\{\|f(X_k)\| > k^\eta \text{ infinitely often}\} = 0$ for any $\eta > 1/4$. It implies that there exists $K_3$ ($\mu$-a.s.) such that for any $k > K_3$,
$$\Phi_k - \Phi_{k-1} \ge -\frac{B_1(l/2)}{1+a_k} - \frac{B_0(l/4) + B_2(l/4)\,k^{4\eta}}{2(1+a_k)^2}.$$
Take $\eta < 1/2$; (B.7) then follows from $a_k/k \to \alpha$ $\mu$-a.s. We work conditionally on $k > K^*$ in the rest of the proof.

Take $\tilde\epsilon > 0$ arbitrarily small and define $U_k$ as the number of down-crossings by $\Phi_j$ of the interval $(\Phi^*_\alpha - \tilde\epsilon, \Phi^*_\alpha - \tilde\epsilon/6)$ for $j > K^*$: $\liminf_{k\to\infty} \Phi_k < \Phi^*_\alpha - \tilde\epsilon$ thus implies $U_\infty = \infty$. Also


define a previsible process $(C_k)$ as follows: set $C_k$ to one when $\Phi_{k-1}$ gets (strictly) above $\Phi^*_\alpha - \tilde\epsilon/6$ and turn it to zero when $\Phi_{k-1}$ gets (strictly) below $\Phi^*_\alpha - \tilde\epsilon$. This defines a new process
$$V_k = \sum_{i=K^*}^{k} C_i\,(\Phi_i - \Phi_{i-1})$$
which satisfies
$$V_k < -(5\tilde\epsilon/6)\,U_k + \max\{0,\ \Phi_k - (\Phi^*_\alpha - \tilde\epsilon/6)\}. \qquad (B.8)$$
Define $k_j, \ldots, \bar k_j$ as the $j$th set of consecutive indices for which $C_i = 1$, that is, such that $C_{k_j-1} = 0$, $C_{k_j} = C_{k_j+1} = \cdots = C_{\bar k_j} = 1$, $C_{\bar k_j+1} = 0$. Within this set, we define $k^*_j$ as the smallest index such that $\Phi_{k-1} \le \Phi^*_\alpha - \tilde\epsilon/6$ for all $k^*_j \le k \le \bar k_j$. Note that necessarily $k^*_j > k_j$ (since $\Phi_{k_j-1} > \Phi^*_\alpha - \tilde\epsilon/6$) and $k^*_j < \bar k_j$ (since (B.7) is satisfied for $k > K^*$). We bound $E\{V_{\bar k_j} - V_{k_j} | X_1^{k_j-1}\}$ as follows. We have $\Phi_{k^*_j-2} > \Phi^*_\alpha - \tilde\epsilon/6$ (from the definition of $k^*_j$) and $\Phi_{k^*_j-1} - \Phi_{k^*_j-2} > -\tilde\epsilon/6$ from (B.7). Also, (B.5) implies $\Phi_{k_j-1} < \Phi^*_\alpha + \tilde\epsilon/6$. Therefore, $\Phi_{k^*_j-1} - \Phi_{k_j-1} > -\tilde\epsilon/2$. Now, for all $k$ such that $k^*_j \le k \le \bar k_j$, $\Phi_{k-1} \le \Phi^*_\alpha - \tilde\epsilon/6$ and thus $E\{\Phi_k - \Phi_{k-1}|X_1^{k-1}\} > 0$ from (B.6) and (B.2). We obtain
$$E\{V_{\bar k_j} - V_{k_j}|X_1^{k_j-1}\} > -\tilde\epsilon/2$$
and
$$E\{V_k|X_1^{K^*-1}\} > -(\tilde\epsilon/2)\,E\{U_k|X_1^{K^*-1}\},$$
which, together with (B.8), gives
$$E\{U_k|X_1^{K^*-1}\} < \frac{3}{\tilde\epsilon}\,E\{\max\{0,\ \Phi_k - (\Phi^*_\alpha - \tilde\epsilon/6)\}|X_1^{K^*-1}\} < \infty,$$
where the last inequality follows from $E\{|\Phi_k|\} < \infty$. It implies $\mathrm{Prob}\{U_\infty = \infty\} = 0$, that is, $\liminf_{k\to\infty} \Phi_k \ge \Phi^*_\alpha - \tilde\epsilon$, $\mu$-a.s. Repeating the same arguments for a countable number of rational $\tilde\epsilon$ gives $\liminf_{k\to\infty} \Phi_k = \Phi^*_\alpha$, $\mu$-a.s., which completes the proof. □

References

Albright, S., Derman, C., 1972. Asymptotically optimal policies for the stochastic sequential assignment problem. Management Sci. 19 (1), 46–51.
Box, G., Lucas, H., 1959. Design of experiments in nonlinear situations. Biometrika 46, 77–90.
C.E.R.N., 1994. The compact muon solenoid, technical proposal, trigger and data acquisition. Technical Report CERN/LHCC 94-38, European Laboratory for Particle Physics.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events. Springer, Berlin.
Fedorov, V., 1989. Optimal design with bounded density: optimization algorithms of the exchange type. J. Statist. Planning Inference 22, 1–13.
Fedorov, V., Hackl, P., 1997. Model-Oriented Design of Experiments. Springer, Berlin.
Jennrich, R., 1969. Asymptotic properties of nonlinear least squares estimation. Ann. Math. Statist. 40, 633–643.
Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canad. J. Math. 12, 363–366.
Pronzato, L., 1999. Sequential selection of observations in randomly generated experiments. In: Proceedings of ProbaStat'98, Smolenice (Slovakia), February 1998. Tatra Mountains Mathematical Publications, vol. 17, pp. 167–175.
Pronzato, L., 2001a. Optimal and asymptotically optimal decision rules for sequential screening and resource allocation. IEEE Trans. Automat. Control 46 (5), 687–697.


Pronzato, L., 2001b. Sequential construction of an experiment design from an i.i.d. sequence of experiments without replacement. In: Atkinson, A., Bogacka, B., Zhigljavsky, A. (Eds.), Optimum Design 2000. Kluwer, Dordrecht, pp. 113–122 (Chapter 11).
Pronzato, L., 2004. A minimax equivalence theorem for optimum bounded design measures. Statist. Probab. Lett. 68, 325–331.
Sahm, M., Schwabe, R., 2001. A note on optimal bounded designs. In: Atkinson, A., Bogacka, B., Zhigljavsky, A. (Eds.), Optimum Design 2000. Kluwer, Dordrecht, pp. 131–140 (Chapter 13).
Williams, D., 1991. Probability with Martingales. Cambridge University Press, Cambridge.
Wu, C., Wynn, H., 1978. The convergence of general step-length algorithms for regular optimum design criteria. Ann. Statist. 6 (6), 1273–1285.
Wynn, H., 1982. Optimum submeasures with applications to finite population sampling. In: Gupta, S., Berger, J. (Eds.), Statistical Decision Theory and Related Topics III. Proceedings of the Third Purdue Symposium, vol. 2. Academic Press, New York, pp. 485–495.