Constructing prior distributions with trees of exchangeable processes


Journal of Statistical Planning and Inference 73 (1998) 113–133

Michael Monticino ∗
University of North Texas, Department of Mathematics, P.O. Box 5116, Denton TX 76203, USA

Abstract

This paper introduces a scheme for constructing prior distributions on a space of probability measures using a tree of exchangeable processes. The exchangeable tree scheme provides a natural generalization of the Polya tree priors presented by Mauldin et al. (1992). Exchangeable tree priors provide useful conjugate families of priors for statistical models in which an experiment proceeds in several stages with each stage dependent on the previous outcomes. The exchangeable tree construction has some advantages over other constructions. For instance, exchangeable tree priors can give probability one to the set of continuous measures, unlike, say, Dirichlet processes. Moreover, the scheme's perspective is both a conceptual aid in sampling applications and a useful tool in deriving properties of the priors. The exchangeable tree scheme also gives an alternate way of constructing the random rescaling priors defined by Graf et al. (1986) and, more generally, by Mauldin and Monticino (1995). Here, some basic properties of exchangeable tree priors are developed, and connections with other schemes for constructing priors – in particular, with random rescaling – are established.

© 1998 Elsevier Science B.V. All rights reserved.

AMS classification: Primary 60A10; 62A15; 60G57; secondary 60G57; 60G30

Keywords: Prior distribution; Exchangeable tree; Random probability measure; Exchangeable process

∗ E-mail: [email protected].

0378-3758/98/$ – see front matter. © 1998 Elsevier Science B.V. All rights reserved. PII: S0378-3758(98)00055-X

1. Introduction

This paper presents a scheme for constructing prior distributions on a space of probability measures. The scheme involves a tree of exchangeable processes and provides a natural generalization of the Polya tree priors presented by Mauldin et al. (1992). Polya tree priors, in some sense, are generalizations of some of the Dirichlet priors given by Ferguson (1973). While these generalizations are, perhaps, straightforward, they provide useful alternate perspectives. In particular, Polya tree priors and the exchangeable tree priors constructed here provide convenient conjugate families of priors for statistical models in which an experiment proceeds in several stages and each stage depends on the previous outcomes. Moreover, these priors, unlike Dirichlet priors, can be constructed so as to give probability one to the set of continuous measures. Lavine (1992, 1994) discusses a variety of applications of the priors, including empirical Bayes models. Here, we develop some basic properties of exchangeable tree priors and establish connections with other schemes for constructing prior distributions.

The exchangeable tree scheme replaces the Polya urn processes of Mauldin et al. (1992) by arbitrary exchangeable processes. Using the exchangeable processes themselves to derive the priors, instead of just the associated directing measures as in Ferguson (1974), is both a conceptual aid in sampling applications and a useful tool in deriving properties of the priors. Additionally, as shown in Theorem 5.1, the exchangeable tree scheme provides an alternate way to construct the random rescaling priors defined by Graf et al. (1986) and, more generally, by Mauldin and Monticino (1995). As an example of the utility of the exchangeable tree scheme, its properties can be exploited to give a straightforward proof that the associated priors give probability one to the set of continuous measures.

The next section gives a selective overview of various schemes for constructing priors and mentions several of their applications. For instance, Lavine (1992) uses exchangeable trees – in particular, Polya trees – to investigate models for pressurized vessel lifetimes. Other applications include determining average case errors for numerical methods of equation solving; see, for instance, Graf et al. (1989).
Section 3 develops the exchangeable tree scheme for obtaining a prior on $P([0,1])$, the space of probability measures on $[0,1]$. While it is convenient to develop the priors on $P([0,1])$ in order to establish connections with other schemes for constructing priors, the theory could just as well be developed for any standard Borel space. The scheme involves taking a reinforced random walk on a tree of exchangeable processes to produce a sequence of exchangeable random variables. The de Finetti, or directing, measure of the sequence determines the prior. Collections of these exchangeable tree priors can form conjugate families.

Several properties of exchangeable tree priors are given in Section 4. It is shown that the priors can give probability one to the set of continuous measures. Conditions under which the priors have full support on the space of probability measures are also presented. As discussed in Mauldin et al. (1992) and Ferguson (1973), this latter property is desirable in statistical sampling applications.

Section 5 shows that the random rescaling priors constructed by Graf et al. (1986) and by Mauldin and Monticino (1995) can be obtained with exchangeable trees. Ways in which the random rescaling perspective can be used to examine geometric properties of exchangeable trees are mentioned. In particular, the derivative structure of the distribution functions and the Hausdorff dimension of the supports of the probability measures generated with exchangeable trees can be determined.


2. Background

This section provides some background on several schemes for constructing priors. A nice overview emphasizing Dirichlet processes and their relation to other construction schemes is given by Ferguson (1974).

Various schemes have been introduced in the literature for constructing prior distributions on spaces of probability measures. At least two broadly defined approaches are taken. One approach might be called “analytical” in the sense that schemes in this class typically involve generating a non-decreasing right-continuous function at random. The generated function is then regarded as the distribution function of a probability measure. This approach is used in a seminal paper by Dubins and Freedman (1967). The random rescaling scheme introduced by Graf et al. (1986) and later generalized by Mauldin and Monticino (1995) is another example. Schemes in a second group have a more probabilistic flavor. This group includes the schemes introduced by Blackwell and MacQueen (1973) and Mauldin et al. (1992), in which Polya urns are used to construct priors.

Each approach has its advantages. Analytical schemes generally facilitate the examination of the geometric properties of the generated probability measures. They are also used more directly to obtain probability measures on homeomorphisms for applications in numerical equation solving; see, e.g., Graf et al. (1989), Novak (1989) and Ritter (1992). Probabilistic schemes, on the other hand, provide useful models for experimental design and sampling theory applications, as discussed in Mauldin et al. (1992), Lavine (1992) and Lavine (1994). Conveniently, seemingly different analytical and probabilistic schemes can generate the same priors. Various properties of such priors can then be studied from the perspective of whichever scheme provides the most convenient viewpoint. In particular, we show that the exchangeable tree scheme given in the next section produces the random rescaling priors defined by Graf et al. (1986) and by Mauldin and Monticino (1995). And, as mentioned, the exchangeable tree perspective can be exploited to show, among other things, that the constructed priors give probability one to the set of continuous measures.

To help motivate the exchangeable tree construction, we review some of the schemes mentioned above in more detail. First, recall that a sequence $Z = (Z_1, Z_2, \dots)$ of $H$-valued random variables is exchangeable if the distribution of $Z$ is invariant under finite permutations of the indices. That is, for each $n$, each measurable set $A$ of $H^n$ and each permutation $\sigma$ of $\{1, \dots, n\}$,

$$P[(Z_1, \dots, Z_n) \in A] = P[(Z_{\sigma^{-1}(1)}, \dots, Z_{\sigma^{-1}(n)}) \in A].$$

Equivalently, for each $n$, each bounded measurable function $f : H^n \to \mathbb{R}$ and each permutation $\sigma$ of $\{1, \dots, n\}$,

$$\int f(z_1, \dots, z_n)\, d\mu^n = \int f(z_{\sigma^{-1}(1)}, \dots, z_{\sigma^{-1}(n)})\, d\mu^n,$$

where $\mu^n$ is the marginal distribution of $Z$ on $H^n$.
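The permutation-invariance condition above can be checked mechanically for a finite-dimensional joint distribution. The following is a minimal Python sketch, assuming the joint law of $(Z_1, \dots, Z_n)$ on a finite set is given as a dictionary from $n$-tuples to exact probabilities; the helpers `is_exchangeable` and `markov_pmf` are hypothetical names used only for illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def is_exchangeable(pmf):
    """True if the joint pmf of (Z_1, ..., Z_n) is unchanged by every
    permutation of the indices.

    `pmf` maps n-tuples of outcomes to probabilities; missing tuples are
    treated as having probability zero.
    """
    if not pmf:
        return True
    n = len(next(iter(pmf)))
    for z, prob in pmf.items():
        for sigma in permutations(range(n)):
            # Reorder the coordinates of z; ranging over every sigma also
            # ranges over every sigma^{-1}.
            permuted = tuple(z[sigma[i]] for i in range(n))
            if pmf.get(permuted, Fraction(0)) != prob:
                return False
    return True


def markov_pmf(n, init, trans):
    """Joint pmf of the first n states of a Markov chain on {0, 1}."""
    pmf = {}
    for z in product((0, 1), repeat=n):
        prob = init[z[0]]
        for a, b in zip(z, z[1:]):
            prob *= trans[a][b]
        pmf[z] = prob
    return pmf


half = Fraction(1, 2)
# i.i.d. fair coin flips: exchangeable.
iid = markov_pmf(3, [half, half], [[half, half], [half, half]])
# A "sticky" Markov chain: not exchangeable, since P(0,0,1) != P(0,1,0).
sticky = markov_pmf(3, [half, half],
                    [[Fraction(3, 4), Fraction(1, 4)],
                     [Fraction(1, 4), Fraction(3, 4)]])
print(is_exchangeable(iid), is_exchangeable(sticky))  # True False
```

Exact fractions are used so that the equality test is not disturbed by rounding.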


One way to generate a sequence of exchangeable $\{0,1\}$-valued random variables is the Polya urn scheme. Suppose that initially an urn contains $n_0$ balls labeled 0 and $n_1$ balls labeled 1. At each play, a ball is selected at random from the urn and replaced by two balls of the same type. Let $Z_n$ be the type (0 or 1) of the ball selected on the $n$th play. Then the $Z_n$ are exchangeable. Analogous schemes involving $N$ ($\geq 2$) types of balls also yield exchangeable sequences. A discussion of elementary properties and applications of Polya urn schemes is given by Feller (1968). Polya schemes based on a continuum of colors are presented by Blackwell and MacQueen (1973). Hill et al. (1980) describe some generalized urn schemes. One interesting result shown by Hill et al. (1987) is that if a generalized urn scheme involving two types of balls yields an exchangeable sequence, then the scheme is essentially a Polya urn scheme.

For a Borel space $H$, if $Z = (Z_1, Z_2, \dots)$ is an exchangeable sequence of $H$-valued random variables, then there exists a unique probability measure $Q$ on $P(H)$, the probability measures on $H$, such that for each Borel subset $A$ of $H^\infty = H \times H \times \cdots$,

$$P[Z \in A] = \int \mu^\infty(A)\, dQ(\mu),$$

where $\mu^\infty = \mu \times \mu \times \cdots$ is the infinite product measure on $H^\infty$. The measure $Q$ is called the de Finetti or directing measure of $Z$. If $\tilde\mu$ is a random probability measure with distribution $Q$ then, given $\tilde\mu = \mu$, $Z_1, Z_2, \dots$ are independent with common distribution $\mu$. Furthermore, the empirical distribution of the $Z_n$ converges almost surely, and the distribution of the limit is $Q$. (References for these facts include Hewitt and Savage (1955) and Aldous (1983).) Note that if $Z = (Z_1, Z_2, \dots)$ is an exchangeable $\{0,1\}$-valued sequence, then the limit of the empirical distribution is a Bernoulli distribution with parameter equal to the limiting frequency of the 1's in $Z$. If we identify a Bernoulli measure with its defining parameter, then the de Finetti measure $Q$ can be viewed as a distribution on $[0,1]$.
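As a small numerical check of the Polya urn discussion, the sketch below computes exact sequence probabilities for an urn with $n_0$ balls labeled 0 and $n_1$ balls labeled 1, confirms that they depend only on the number of 1's (exchangeability), and compares them with a beta mixture of i.i.d. Bernoulli laws, i.e. the de Finetti representation. It parametrizes the mixing law by $s$, the probability of drawing a 1, so in the usual $(\alpha, \beta)$ convention the mixing law here is Beta$(n_1, n_0)$. The function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def polya_sequence_prob(seq, n0, n1):
    """Exact probability that a Polya urn starting with n0 balls labeled 0
    and n1 balls labeled 1 produces the draws in `seq`."""
    counts = [n0, n1]
    prob = Fraction(1)
    for z in seq:
        prob *= Fraction(counts[z], counts[0] + counts[1])
        counts[z] += 1  # net effect of "replace by two of the same type"
    return prob


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_mixture_prob(seq, n0, n1):
    """Integral of s^(#1s) (1 - s)^(#0s) against the Beta(n1, n0) law of s."""
    ones = sum(seq)
    zeros = len(seq) - ones
    return beta_fn(n1 + ones, n0 + zeros) / beta_fn(n1, n0)


n0, n1 = 2, 3
for seq in product((0, 1), repeat=4):
    # Exchangeability: the probability depends only on the number of 1s.
    assert polya_sequence_prob(seq, n0, n1) == polya_sequence_prob(sorted(seq), n0, n1)
    # de Finetti: the urn law is a beta mixture of i.i.d. Bernoulli laws.
    assert polya_sequence_prob(seq, n0, n1) == beta_mixture_prob(seq, n0, n1)
print(polya_sequence_prob((1, 0), n0, n1))  # 1/5
```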
For the case of a Polya urn consisting of $n_0$ “0” balls and $n_1$ “1” balls, $Q$ has a beta distribution with parameters $n_0$ and $n_1$ (see Freedman, 1965; Blackwell and Kendall, 1964). So the Polya urn scheme provides an elementary way to randomly generate a probability measure on $\{0,1\}$ and hence obtain a prior on the set of probability measures on $\{0,1\}$.

Are there analogous ways to construct priors on $P([0,1])$? One such scheme, presented by Mauldin and Williams (1990), utilizes a tree of Polya urns. Let $F^*$ be the set of all finite sequences of elements of $F = \{0,1\}$, including the empty sequence $\emptyset$. For each $p \in F^*$, let $U(p)$ be an urn containing balls labeled either 0 or 1. This can be pictured as a binary tree with root $\emptyset$ and an urn at each node. Motivated by this picture, we call the pair $(F, U)$ a Polya tree.

A Polya tree can be used to generate a sequence $Z_1, Z_2, \dots$ of $F^\infty$-valued random variables as follows. Draw a ball at random from urn $U(\emptyset)$ and replace it with two of the same type. Let $Z_{1,1}$, the first coordinate of $Z_1$, be the label of the selected ball. Now draw from urn $U(Z_{1,1})$ and replace the ball by two of the same type. Let $Z_{1,2}$ be the label of the ball selected from $U(Z_{1,1})$. Next draw from urn $U(Z_{1,1}, Z_{1,2})$ to determine $Z_{1,3}$, and so on. The random variable $Z_2$ is obtained similarly to $Z_1$, except that $Z_2$ is generated by the modified Polya tree. Note that this modified, or reinforced, tree is the same as the original tree except for the urns $U(\emptyset), U(Z_{1,1}), U(Z_{1,1}, Z_{1,2}), \dots$. Continue the process to obtain $\{Z_n\}_{n \geq 1}$. The $Z_n$ can also be viewed as taking values in the interval $[0,1]$ by taking $Z_n$ to be the number with dyadic expansion $Z_n = 0.Z_{n,1} Z_{n,2} Z_{n,3} \dots$. From either viewpoint (taking values in $F^\infty$ or in $[0,1]$), the $Z_n$ are exchangeable. This was shown by Mauldin and Williams (1990) for the case where each $U(p)$ initially contains exactly one ball labeled 1 and one ball labeled 0. Mauldin et al. (1992) generalized this to more than two types of balls and an arbitrary mix of balls in an urn. Also, as discussed by Mauldin et al. (1992) and Lavine (1992), the sequences $Z_{n,1}, Z_{n,2}, \dots$ could be mapped into spaces other than $[0,1]$, e.g., into a probability simplex of appropriate dimension. In fact, any standard Borel space will work.

Consider the case where each urn initially contains exactly one ball labeled 1 and one ball labeled 0, and regard the $Z_n$ as $[0,1]$-valued random variables. Then the directing measure of the sequence $Z = (Z_1, Z_2, \dots)$ is a probability measure on $P([0,1])$. Moreover, this measure is one of the measures presented by Dubins and Freedman (1967). As will be discussed below, viewed as a measure on distribution functions, it is also one of the basic measures constructed by Graf et al. (1986). If more than two types of balls and an arbitrary mix of balls in an urn are allowed, Mauldin et al. (1992) showed that the associated directing measures form a useful conjugate family of prior distributions. Called Polya tree priors, this family is a generalization of the Dirichlet processes studied by Ferguson (1973). Unlike Dirichlet processes, Polya tree priors can give probability one to the set of continuous probability distributions with full support.
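The Polya tree recipe above can be sketched in Python as follows. This is an illustration only: it assumes a binary tree truncated at a fixed depth, urns created lazily with a common initial content `init` (one ball of each label by default, the Mauldin and Williams case), and the class name `PolyaTree` is hypothetical.

```python
import random
from fractions import Fraction


class PolyaTree:
    """Binary Polya tree truncated at a fixed depth.

    Each node p (a tuple of 0s and 1s) carries an urn, created lazily with
    `init` = (balls labeled 0, balls labeled 1). Drawing a ball and
    replacing it by two of the same type adds one ball of that label.
    """

    def __init__(self, depth, init=(1, 1), seed=None):
        self.depth = depth
        self.init = init
        self.urns = {}
        self.rng = random.Random(seed)

    def _draw(self, node):
        urn = self.urns.setdefault(node, list(self.init))
        label = 0 if self.rng.random() * (urn[0] + urn[1]) < urn[0] else 1
        urn[label] += 1  # reinforce the urn on the path just walked
        return label

    def sample(self):
        """Return the first `depth` binary digits of the next Z_n and the
        dyadic rational 0.Z_{n,1} Z_{n,2} ... Z_{n,depth}."""
        digits = []
        for _ in range(self.depth):
            digits.append(self._draw(tuple(digits)))
        value = sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))
        return digits, value


tree = PolyaTree(depth=8, seed=0)
draws = [tree.sample() for _ in range(5)]
for digits, value in draws:
    print("".join(map(str, digits)), float(value))
```

Because every walk reinforces only the urns on its own path, later walks are pulled toward the regions already visited, which is the mechanism behind the directing measure.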
Moreover, using urns to generate the priors makes the properties of Polya tree priors easily discernible. This is analogous to Blackwell and MacQueen's (1973) formulation of Dirichlet priors via certain Polya urn schemes. Applications of Polya tree priors are presented in Lavine (1992). There, uncertainty in a parametric model is investigated and a specific example involving pressurized vessel lifetimes is given. In Lavine (1994), models for errors in regression problems and empirical Bayes models are studied.

An alternate (analytical) way to obtain a prior on $P([0,1])$ is to randomly generate a distribution function $h$ on $[0,1]$. One method for doing this is the random rescaling scheme investigated by Graf et al. (1986), a special case of Dubins and Freedman's (1967) scheme. Let $\mu \in P([0,1])$ have support on the open interval $(0,1)$. Set $h(0) = 0$ and $h(1) = 1$. Select $h(\tfrac12)$ according to $\mu$. Next select $h(\tfrac14)$ according to $\mu$ scaled to the interval $[0, h(\tfrac12)]$ and, independently, select $h(\tfrac34)$ according to $\mu$ scaled to $[h(\tfrac12), 1]$. Continue in this manner to define a function on the dyadic rationals. With probability one, $h$ extends to a homeomorphism of $[0,1]$. Mauldin and Williams (1990) showed that if $\mu$ is the uniform distribution on $[0,1]$, then the associated prior on $P([0,1])$ is equal to the prior obtained by a Polya tree where each urn initially contains exactly one ball labeled 1 and one ball labeled 0. Mauldin and Monticino (1995) generalize the random rescaling scheme so that a possibly different measure is rescaled at each stage.
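A minimal sketch of the random rescaling construction on the dyadic rationals of a fixed level, assuming $\mu$ is supplied as a sampling function `sample_mu(rng)` (a hypothetical interface; the uniform choice below is the case covered by Mauldin and Williams (1990)). Floating-point values stand in for exact reals.

```python
import random


def random_rescaling(depth, sample_mu, seed=None):
    """Values of a random distribution function h at j / 2**depth.

    h(0) = 0 and h(1) = 1; at each level, the midpoint of every dyadic
    interval [l, r] receives h(l) + t * (h(r) - h(l)) with t drawn from
    mu, independently for each interval.
    """
    rng = random.Random(seed)
    h = [0.0, 1.0]  # h(0) and h(1)
    for _ in range(depth):
        refined = []
        for left, right in zip(h, h[1:]):
            t = sample_mu(rng)
            refined.extend([left, left + t * (right - left)])
        refined.append(h[-1])
        h = refined
    return h  # h[j] is h(j / 2**depth)


def uniform(rng):
    # rng.random() lies in [0, 1); the value 0 occurs with negligible probability.
    return rng.random()


h = random_rescaling(depth=4, sample_mu=uniform, seed=1)
print(len(h), h[0], h[-1], h[8])  # h[8] is h(1/2)
```

Passing, for example, `lambda rng: rng.betavariate(2.0, 2.0)` as `sample_mu` rescales a different measure $\mu$; allowing the sampler to depend on the level would mimic the generalization of Mauldin and Monticino (1995).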


As mentioned, applications of random rescaling priors include determining average case errors for numerical methods of equation solving. Here we extend the notion of a Polya tree to exchangeable trees and show, among other things, that the latter give an alternate way to generate and study these generalized random rescaling priors. Finally, note that the exchangeable tree and random rescaling priors are special cases of the tailfree priors of Freedman (1963) and Fabius (1964).

3. Exchangeable trees

An exchangeable tree is a pair $(F, U)$ such that

(1) $F = \{0, 1, \dots, k\}$ and $F^* = \bigcup_{n=0}^{\infty} F^n$ is the set of all finite sequences of elements of $F$, including the empty sequence $\emptyset = F^0$;

(2) $U$ is a function which assigns to each element $p \in F^*$ a sequence $U(p) = (U(p)_1, U(p)_2, \dots)$ of exchangeable $F$-valued random variables, in such a way that the processes $\{U(p) : p \in F^*\}$ are independent.

Denote the de Finetti measure of the sequence $U(p)$ by $\mu_{U(p)}$. As noted above, $\mu_{U(p)}$ is a prior on the set of probability measures on $F$ and, equivalently, can be regarded as a measure on the simplex $S_k = \{(s_0, \dots, s_{k-1}) \in \mathbb{R}^k_+ : s_0 + \cdots + s_{k-1} \leq 1\}$. (A point $(s_0, \dots, s_{k-1}) \in S_k$ denotes the probability measure $s_0\delta_0 + \cdots + s_{k-1}\delta_{k-1} + (1 - s_0 - \cdots - s_{k-1})\delta_k$, where $\delta_i$ is the point mass at $i$.) If $F = \{0,1\}$, then the exchangeable tree is called a binary exchangeable tree.

Note that a Polya tree is a special case of an exchangeable tree in which $U(p)$ is the sequence generated by successively sampling (with double replacement) an urn at $p$. Moreover, like a Polya tree, an exchangeable tree, together with the procedure described below for generating a sequence of exchangeable random variables, can serve as a model for experiments which proceed in a sequence of stages, with each stage of the experiment dependent on the previous outcomes. The outcomes of each stage are denoted by the elements of $F$. The tree structure provides a visual representation of the dependency. The distribution of $U(\emptyset)_1$ is the marginal distribution of the first stage of the experiment, and the distribution of $U(p)_1$ is the conditional distribution of the $(n+1)$th stage, given that the first $n \geq 1$ outcomes were $p = (p_1, \dots, p_n)$. Distributions associated with subsequent repetitions of the experiment are encoded in the $U(p)_i$'s. The measures $\mu_{U(p)}$ are priors on the individual stages of the experiment. In Polya trees, the $\mu_{U(p)}$'s are all Dirichlet distributions.
An advantage of exchangeable trees is that they admit any distribution for $\mu_{U(p)}$, in particular, mixtures of Dirichlet distributions. Lavine (1992) and Lavine (1994) give several examples, in the context of Polya trees, of how to construct exchangeable trees for various applications.

Exchangeable tree priors and conjugate families. For $p \in F^*$, denote the distribution of $U(p)_1$ on $F$ by $q_0^p$ and, for each $n \geq 1$ and $(x_1, \dots, x_n) \in F^n$, let $q_n^p[x_1, \dots, x_n]$ denote the conditional distribution of $U(p)_{n+1}$ on $F$ given $(U(p)_1, \dots, U(p)_n) = (x_1, \dots, x_n)$. Construct an exchangeable sequence $X = (X_1, X_2, \dots)$ of $[0,1]$-valued random variables from an exchangeable tree $(F, U)$ as follows.


Heuristically, $X_1$ is generated via a random walk on the tree, $X_2$ is generated from a walk on the tree modified along the observed history $(x_{1,1}, x_{1,2}, \dots)$ of $X_1$, $X_3$ is generated from a walk on the tree modified along the observed histories of the previous two walks, and so on. The modifications to the tree are determined by the exchangeable process associated with the observed partial histories.

More specifically, $X_1$, with $(k+1)$-adic expansion $X_1 = .X_{1,1}X_{1,2}\dots$, is generated by selecting $x_{1,1}$, the value of $X_{1,1}$, according to the distribution $q_0^{\emptyset}$. Select the value $x_{1,2}$ of $X_{1,2}$ according to $q_0^{(x_{1,1})}$. Next select the value of $X_{1,3}$ according to $q_0^{(x_{1,1}, x_{1,2})}$. Continue in this way to obtain the value of each $X_{1,n}$.

Now $X_2 = .X_{2,1}X_{2,2}\dots$ is generated by selecting the value $x_{2,1}$ of $X_{2,1}$ according to $q_1^{\emptyset}[x_{1,1}]$. The value of $X_{2,2}$ is determined by $q_1^{(x_{2,1})}[x_{1,2}]$ if $x_{2,1} = x_{1,1}$, and by $q_0^{(x_{2,1})}$ otherwise. Continue in this way: for $j \geq 2$, the value of $X_{2,j}$ is determined by $q_1^{(x_{2,1}, \dots, x_{2,j-1})}[x_{1,j}]$ if $(x_{2,1}, \dots, x_{2,j-1}) = (x_{1,1}, \dots, x_{1,j-1})$, and by $q_0^{(x_{2,1}, \dots, x_{2,j-1})}$ otherwise.

In general, $X_n = .X_{n,1}X_{n,2}\dots$ is generated by selecting the value of $X_{n,1}$ according to $q_{n-1}^{\emptyset}[x_{1,1}, \dots, x_{n-1,1}]$. For $m \geq 2$, the value of $X_{n,m}$ is determined by $q_{j^*}^{(x_{n,1}, \dots, x_{n,m-1})}[x_{i_1,m}, \dots, x_{i_{j^*},m}]$, where

$$j^* = \#\{1 \leq i \leq n-1 : x_{i,j} = x_{n,j} \text{ for all } 1 \leq j \leq m-1\},$$

$i_l \in \{1 \leq i \leq n-1 : x_{i,j} = x_{n,j} \text{ for all } 1 \leq j \leq m-1\}$ for $1 \leq l \leq j^*$, and $i_1 < i_2 < \cdots < i_{j^*}$.

Theorem 3.1 verifies that the sequence $X = (X_1, X_2, \dots)$ is an exchangeable sequence of $[0,1]$-valued random variables. The de Finetti measure $Q_U$ of $X$ is the exchangeable tree prior associated with $(F, U)$. $X$ can also be viewed as an exchangeable sequence of $F^\infty$-valued random variables by taking $X_n = (X_{n,1}, X_{n,2}, \dots)$. In this context, $Q_U$ is a prior on probability measures on $F^\infty$.
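The general step can be sketched in Python by keeping, for each node $p$, the ordered list of values drawn at $p$ by earlier walks; that list is exactly $[x_{i_1,m}, \dots, x_{i_{j^*},m}]$, so the next draw at $p$ uses $q_{j^*}^p$ applied to it. The sketch below rests on stated assumptions: walks are truncated at a fixed depth, each node process is described by a hypothetical predictive function `predictive(node, history)` that returns $q_n^p[\text{history}]$ as exact fractions, and two example node processes are given, a Polya urn and a two-component mixture of beta priors (a non-Dirichlet case whose weights and parameters are arbitrary illustrative choices). The helper `walks_prob` computes exact probabilities of truncated walks, which allows a check of the exchangeability asserted in Theorem 3.1 on small cases; the sampler itself draws with floating-point weights.

```python
import random
from fractions import Fraction
from itertools import product
from math import factorial


def polya_predictive(node, history, labels=(0, 1), init=1):
    """q_n^p[history] for a Polya urn at node p (the same urn law at every
    node here): `init` balls per label plus one ball per earlier draw."""
    total = init * len(labels) + len(history)
    return {a: Fraction(init + history.count(a), total) for a in labels}


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def mixture_predictive(node, history, comps=((1, (1, 5)), (1, (5, 1)))):
    """q_n^p[history] when the node's directing measure is a mixture of beta
    laws for s = P(label 1); each component is (weight, (alpha, beta))."""
    ones = sum(history)
    zeros = len(history) - ones
    # Posterior weight of each component given the history at this node.
    post = [w * beta_fn(al + ones, be + zeros) / beta_fn(al, be)
            for w, (al, be) in comps]
    p1 = sum(q * Fraction(al + ones, al + be + ones + zeros)
             for q, (_, (al, be)) in zip(post, comps)) / sum(post)
    return {0: 1 - p1, 1: p1}


class ExchangeableTree:
    """Reinforced walks on a truncated exchangeable tree."""

    def __init__(self, predictive, depth, seed=None):
        self.predictive = predictive
        self.depth = depth
        self.histories = {}  # node p -> values drawn at p so far, in order
        self.rng = random.Random(seed)

    def sample(self):
        """Return the first `depth` digits (X_{n,1}, ..., X_{n,depth})."""
        digits = []
        for _ in range(self.depth):
            node = tuple(digits)
            history = self.histories.setdefault(node, [])
            dist = self.predictive(node, history)
            labels = list(dist)
            x = self.rng.choices(labels, weights=[float(dist[a]) for a in labels])[0]
            history.append(x)  # modify the tree along the observed path
            digits.append(x)
        return tuple(digits)


def walks_prob(walks, predictive):
    """Exact probability that successive walks produce the given truncated
    digit tuples, using the same reinforcement rule as `sample`."""
    histories = {}
    prob = Fraction(1)
    for walk in walks:
        for m, x in enumerate(walk):
            history = histories.setdefault(walk[:m], [])
            prob *= predictive(walk[:m], history)[x]
            history.append(x)
    return prob


# Check exchangeability for three walks truncated at depth 2.
paths = list(product((0, 1), repeat=2))
for rule in (polya_predictive, mixture_predictive):
    for a, b, c in product(paths, repeat=3):
        assert walks_prob([a, b, c], rule) == walks_prob([c, a, b], rule)
        assert walks_prob([a, b, c], rule) == walks_prob([b, a, c], rule)

tree = ExchangeableTree(mixture_predictive, depth=3, seed=5)
print([tree.sample() for _ in range(4)])
```

A transposition and a 3-cycle generate all permutations of three indices, so the two equalities checked cover every reordering of three walks.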
Exchangeable tree priors can form a useful class of conjugate priors for statistical modeling and inference applications. To see this, view $X$ as a sequence of exchangeable $F^\infty$-valued random variables. Suppose $\mu \in P(F^\infty)$. Let $\mu(\emptyset)$ denote the marginal distribution of $\mu$ on $F$ and let $\mu(p)$ be the conditional distribution on $F$ given $p \in F^*$. For a random probability measure $\tilde\mu$ on $F^\infty$, let $\tilde\mu(p)$ denote the random conditional distribution on $F$ given $p \in F^*$ induced by $\tilde\mu$. Suppose that, for each $p \in F^*$, $\mathcal{F}_p$ is a conjugate family of priors on probability measures on $F$. A random probability measure $\tilde\mu$ on $F^\infty$ is an $\{\mathcal{F}_p\}$-conjugate random measure if the random measures $\{\tilde\mu(p)\}_{p \in F^*}$ are independent and the distribution of $\tilde\mu(p)$ belongs to $\mathcal{F}_p$ for every $p \in F^*$. If $(F, U)$ is an exchangeable tree such that $\mu_{U(p)} \in \mathcal{F}_p$ for every $p \in F^*$, then the random probability measure $\tilde\mu_U$ with distribution $Q_U$ is an $\{\mathcal{F}_p\}$-conjugate random measure. Moreover, $\tilde\mu_U$ given $X_1$ remains an $\{\mathcal{F}_p\}$-conjugate random measure. In particular, $\tilde\mu_U$ given $X_1$ has distribution equal to the exchangeable tree prior obtained from the updated exchangeable tree that arises from $(F, U)$ after $X_1$ has been generated and $(F, U)$ has been modified as indicated above.

Theorem 3.1. $X = (X_1, X_2, \dots)$ is an exchangeable sequence of $[0,1]$-valued random variables.

Proof. The proof of the theorem is, perhaps, clearer if the $X_n$'s are viewed as taking values in $F^\infty$, and we do this; the exchangeability of the $X_n$'s as $[0,1]$-valued random variables follows easily. Fix $n$ and let $A$ be a Borel subset of $(F^\infty)^n$. To show $X$ is exchangeable, we must prove that $P[(X_1, \dots, X_n) \in A]$ is invariant under permutations of the indices. It is sufficient to do this for $A$ of the form $A = A_1 \times \cdots \times A_n$, where, for each $i$, $A_i = A_{i,1} \times \cdots \times A_{i,m} \times F^\infty$ for some positive integer $m$ and, for each $i$ and $j$, $A_{i,j} \subseteq F$. We prove that $P[(X_1, \dots, X_n) \in A]$ is invariant under permutations of the indices for sets $A$ of this form by induction on $m$.

Suppose $m = 1$, $A = (A_{1,1} \times F^\infty) \times \cdots \times (A_{n,1} \times F^\infty)$, and $\sigma$ is a permutation of $\{1, \dots, n\}$. Then

$$\begin{aligned}
P[(X_1, \dots, X_n) \in A] &= P[X_{1,1} \in A_{1,1}, \dots, X_{n,1} \in A_{n,1}] \\
&= P[U(\emptyset)_1 \in A_{1,1}, \dots, U(\emptyset)_n \in A_{n,1}] \\
&= P[U(\emptyset)_{\sigma^{-1}(1)} \in A_{1,1}, \dots, U(\emptyset)_{\sigma^{-1}(n)} \in A_{n,1}] \\
&= P[(X_{\sigma^{-1}(1)}, \dots, X_{\sigma^{-1}(n)}) \in A].
\end{aligned}$$

The third equality holds because $U(\emptyset)$ is exchangeable.

Assume that the desired invariance holds for all $m \leq k$. Let $A = (A_{1,1} \times \cdots \times A_{1,k+1} \times F^\infty) \times \cdots \times (A_{n,1} \times \cdots \times A_{n,k+1} \times F^\infty)$ and, again, let $\sigma$ be a permutation of $\{1, \dots, n\}$. Let $B_i = A_{i,1} \times \cdots \times A_{i,k}$ and let $P[X_{1,k+1} \in A_{1,k+1}, \dots, X_{n,k+1} \in A_{n,k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)]$ be the conditional probability of $[X_{1,k+1} \in A_{1,k+1}, \dots, X_{n,k+1} \in A_{n,k+1}]$ given that $((X_{1,1}, \dots, X_{1,k}), \dots, (X_{n,1}, \dots, X_{n,k})) = (\mathbf{y}_1, \dots, \mathbf{y}_n)$, for $\mathbf{y}_i = (y_{i,1}, \dots, y_{i,k})$. Then

$$P[(X_1, \dots, X_n) \in A] = \int_{B_1 \times \cdots \times B_n} P[X_{1,k+1} \in A_{1,k+1}, \dots, X_{n,k+1} \in A_{n,k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)]\, d\mu^{n,k},$$

where $\mu^{n,k}$ is the marginal of $X$ on $(F^k)^n$. Set

$$f(\mathbf{y}_1, \dots, \mathbf{y}_n) = P[X_{1,k+1} \in A_{1,k+1}, \dots, X_{n,k+1} \in A_{n,k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)].$$

By the induction assumption,

$$\int_{B_1 \times \cdots \times B_n} f(\mathbf{y}_1, \dots, \mathbf{y}_n)\, d\mu^{n,k} = \int_{B_{\sigma(1)} \times \cdots \times B_{\sigma(n)}} f(\mathbf{y}_{\sigma^{-1}(1)}, \dots, \mathbf{y}_{\sigma^{-1}(n)})\, d\mu^{n,k}.$$

Fix $(\mathbf{y}_1, \dots, \mathbf{y}_n)$ and consider $f(\mathbf{y}_{\sigma^{-1}(1)}, \dots, \mathbf{y}_{\sigma^{-1}(n)})$. Let $\{z_1, \dots, z_r\}$ be the smallest set of elements of $F^k$ such that, for each $1 \leq i \leq n$, $\mathbf{y}_i = z_s$ for some $1 \leq s \leq r$. Let $i_{1,s} < \cdots < i_{j_s,s}$ be those $i$'s such that $\mathbf{y}_i = z_s$. And finally, for $1 \leq s \leq r$, let $t_{1,s} < \cdots < t_{j_s,s}$ be the $\sigma(i_{l,s})$'s arranged in increasing order. Then

$$\begin{aligned}
f(\mathbf{y}_{\sigma^{-1}(1)}, \dots, \mathbf{y}_{\sigma^{-1}(n)}) &= P[X_{1,k+1} \in A_{1,k+1}, \dots, X_{n,k+1} \in A_{n,k+1} \mid (\mathbf{y}_{\sigma^{-1}(1)}, \dots, \mathbf{y}_{\sigma^{-1}(n)})] \\
&= \prod_{s=1}^{r} P[U(z_s)_1 \in A_{t_{1,s},k+1}, \dots, U(z_s)_{j_s} \in A_{t_{j_s,s},k+1}] \\
&= \prod_{s=1}^{r} P[U(z_s)_1 \in A_{\sigma(i_{1,s}),k+1}, \dots, U(z_s)_{j_s} \in A_{\sigma(i_{j_s,s}),k+1}] \\
&= P[X_{1,k+1} \in A_{\sigma(1),k+1}, \dots, X_{n,k+1} \in A_{\sigma(n),k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)] \\
&= P[X_{\sigma^{-1}(1),k+1} \in A_{1,k+1}, \dots, X_{\sigma^{-1}(n),k+1} \in A_{n,k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)],
\end{aligned}$$

where the second and third equalities hold because the processes $\{U(z_s) : 1 \leq s \leq r\}$ are independent and each $U(z_s)$ is exchangeable, respectively. It now follows that

$$\begin{aligned}
P[(X_1, \dots, X_n) \in A] &= \int_{B_{\sigma(1)} \times \cdots \times B_{\sigma(n)}} f(\mathbf{y}_{\sigma^{-1}(1)}, \dots, \mathbf{y}_{\sigma^{-1}(n)})\, d\mu^{n,k} \\
&= \int_{B_{\sigma(1)} \times \cdots \times B_{\sigma(n)}} P[X_{\sigma^{-1}(1),k+1} \in A_{1,k+1}, \dots, X_{\sigma^{-1}(n),k+1} \in A_{n,k+1} \mid (\mathbf{y}_1, \dots, \mathbf{y}_n)]\, d\mu^{n,k} \\
&= P[(X_{\sigma^{-1}(1)}, \dots, X_{\sigma^{-1}(n)}) \in A].
\end{aligned}$$

This completes the induction argument.

4. Properties of exchangeable tree and random rescaling priors

This section develops some properties of exchangeable tree priors. First, the exchangeable tree perspective is used in Theorem 4.1 to determine conditions under which the priors give probability one to the set of continuous probability measures. Such priors are useful in Bayesian statistics. Theorem 4.2 states when exchangeable tree priors give probability one to the set of probability measures which have full support on $[0,1]$. And Theorem 4.3 gives conditions under which $Q_U$ has full support on $P([0,1])$; this is a convenient property for sampling applications. Analogous statements of the theorems hold when $Q_U$ is viewed as a prior on $P(F^\infty)$.


Recall that a probability measure $\mu \in P([0,1])$ is continuous if $\mu(\{x\}) = 0$ for all $x \in [0,1]$. Let

$$C = \{\mu \in P([0,1]) : \mu(\{x\}) = 0 \text{ for all } x \in [0,1]\}.$$

A probability measure $\mu$ defined on a compact Hausdorff space $H$ has full support if every nonempty open subset of $H$ has positive $\mu$-measure. This is equivalent to $H$ being the smallest compact set which has $\mu$-measure one. For an exchangeable tree $(F, U)$ with $F = \{0, \dots, k\}$, the mapping $U$ is centered if for each $\varepsilon > 0$ there exists a $\delta \in (0, \frac12)$ such that

$$\mu_{U(p)}(\{(s_0, \dots, s_{k-1}) \in S_k : \delta < s_i < 1 - \delta \text{ for } 0 \leq i \leq k-1, \text{ and } \delta < 1 - s_0 - \cdots - s_{k-1} < 1 - \delta\}) \geq 1 - \varepsilon$$

for all $p \in F^*$. The intuitive idea in centering $U$ is to prevent any stage of the experiment from ruling out some of the possible outcomes. For example, $U$ is centered if each $\mu_{U(p)}$ is the same measure and is concentrated on the interior of the simplex $S_k$.

Theorem 4.1. Suppose $(F, U)$ is an exchangeable tree and $U$ is centered. Then $Q_U(C) = 1$.

Theorem 4.2. Suppose $(F, U)$ is an exchangeable tree. $Q_U$ gives probability one to the set $\{\mu \in P([0,1]) : \mu \text{ has full support on } [0,1]\}$ if and only if, for each $p \in F^*$, $\mu_{U(p)}$ has support on $\{(s_0, \dots, s_{k-1}) \in S_k : 0 < s_0, \dots, s_{k-1} \text{ and } s_0 + \cdots + s_{k-1} < 1\}$.

Theorem 4.3. Suppose $(F, U)$ is an exchangeable tree. $Q_U$ has full support on $P([0,1])$ if and only if, for each $p \in F^*$, $\mu_{U(p)}$ has full support on $S_k$.

Theorem 4.1 is established with Lemmas 4.4 and 4.6. Lemma 4.4 is essentially Lemma 5.2 of Mauldin et al. (1992), and so is stated without proof. The proofs of Theorems 4.2 and 4.3 are similar, so only the proof of Theorem 4.3 is given. Note that if $U$ is centered, then the conditions of Theorem 4.2 are satisfied.

Lemma 4.4. Let $(F, U)$ be an exchangeable tree and let $X_1, X_2, \dots$ be the sequence of exchangeable $[0,1]$-valued random variables generated from $(F, U)$. Then $Q_U(C) = 1$ if and only if $P[X_1 = X_2] = 0$.

Remark 4.5. As noted by Mauldin, Sudderth and Williams (1992), Lemma 4.4 can be formulated in terms of a general prior on $P([0,1])$ or $P(F^\infty)$. In particular, let $Y = (Y_1, Y_2, \dots) = ((X_{1,1}, X_{1,2}, \dots), (X_{2,1}, X_{2,2}, \dots), \dots)$ be the sequence of exchangeable $F^\infty$-valued variables generated from the exchangeable tree $(F, U)$, and let $\tilde Q_U$ be the de Finetti measure of $Y$. Then $\tilde Q_U$ is a prior on $P(F^\infty)$, and $\tilde Q_U(\{\mu \in P(F^\infty) : \mu(\{y\}) = 0 \text{ for all } y \in F^\infty\}) = 1$ if and only if $P[Y_1 = Y_2] = 0$. With this remark and Lemma 4.4, we get the following.


Lemma 4.6. Let $(F, U)$ be an exchangeable tree. If

$$\prod_{i=1}^{\infty} P[U((x_1, x_2, \dots, x_{i-1}))_2 = x_i \mid U((x_1, x_2, \dots, x_{i-1}))_1 = x_i] = 0$$

for all $x = (x_1, x_2, \dots) \in F^\infty$, then $Q_U(C) = 1$.

Proof. By Lemma 4.4, $Q_U(C) = 1$ if $P[X_1 = X_2] = 0$. Let $Y = (Y_1, Y_2, \dots)$ and $\tilde Q_U$ be as in Remark 4.5, and let $\mu_{Y_1}$ denote the distribution of $Y_1$. Then

$$\begin{aligned}
P[Y_1 = Y_2] &= \int_{F^\infty} P[Y_2 = y \mid Y_1 = y]\, d\mu_{Y_1}(y) \\
&= \int_{F^\infty} P[X_{2,i} = y_i \text{ for all } i = 1, 2, \dots \mid X_{1,i} = y_i \text{ for all } i = 1, 2, \dots]\, d\mu_{Y_1}(y).
\end{aligned}$$

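So $P[Y_1 = Y_2] = 0$ if $P[X_{2,i} = y_i \text{ for all } i = 1, 2, \dots \mid X_{1,i} = y_i \text{ for all } i = 1, 2, \dots] = 0$ for all $y = (y_1, y_2, \dots) \in F^\infty$. But, for $y \in F^\infty$, the second walk can follow $y$ only by repeating, at each node $(y_1, \dots, y_{i-1})$, the value $y_i$ already drawn there by the first walk, and the processes $U(p)$ are independent; hence

$$P[X_{2,i} = y_i \text{ for all } i = 1, 2, \dots \mid X_{1,i} = y_i \text{ for all } i = 1, 2, \dots] = \prod_{i=1}^{\infty} P[U((y_1, y_2, \dots, y_{i-1}))_2 = y_i \mid U((y_1, y_2, \dots, y_{i-1}))_1 = y_i].$$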

Hence, by hypothesis, P[Y1 = Y2 ] = 0. And so, by Remark 4.5, Q U ({ ∈ P(F ∞ : ({y}) = 0; for all y ∈ F ∞ }) = 1. From Mauldin et al. (1992, Lemma 5.1), we have P[Y1 = y] = 0, for all y ∈ F ∞ . It now follows that P[X1 = X2 ] = 0. Proof of Theorem 4.1. By Lemma 4.6, it is sucient to show that ∞ Q i=1

P[U ((x1 ; x2 ; : : : ; xi−1 ))2 = xi | U ((x1 ; x2 ; : : : ; xi−1 ))1 = xi ] = 0;

for all x = (x1 ; x2 ; : : :) ∈ F ∞ . But, ∞ Q i=1

P[U ((x1 ; x2 ; : : : ; xi−1 ))2 = xi | U ((x1 ; x2 ; : : : ; xi−1 ))1 = xi ]

∞ P[U ((x ; x ; : : : ; x Q 1 2 i−1 ))2 = xi ; U ((x1 ; x2 ; : : : ; xi−1 ))1 = xi ] P[U ((x1 ; x2 ; : : : ; xi−1 ))1 = xi ] i=1 ∞ Q = ki ;

=

i=1

where R 2 sj dU ((x1 ; :::; xi−1 )) (s0 ; : : : ; sk−1 ) ; ki = R sj dU ((x1 ;:::; xi−1 )) (s0 ; : : : ; sk−1 )

124

M. Monticino / Journal of Statistical Planning and Inference 73 (1998) 113–133

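if $x_i = j$ for $j = 0, \dots, k-1$, and
$$\kappa_i = \frac{\int (1 - s_0 - \cdots - s_{k-1})^2 \, d\mu_{U((x_1, \dots, x_{i-1}))}(s_0, \dots, s_{k-1})}{\int (1 - s_0 - \cdots - s_{k-1}) \, d\mu_{U((x_1, \dots, x_{i-1}))}(s_0, \dots, s_{k-1})}$$
if $x_i = k$. Now let $\varepsilon \in (0, \frac12)$ be such that
$$\mu_{U(p)}\big(\{(s_0, \dots, s_{k-1}) \in S_k : \varepsilon < s_0, \dots, s_{k-1}, (1 - s_0 - \cdots - s_{k-1}) < 1 - \varepsilon\}\big) > \tfrac12$$
for all $p \in F^*$; that is, with $\mu_{U(p)}$-probability greater than $\frac12$, each of $s_0, \dots, s_{k-1}$ and $1 - s_0 - \cdots - s_{k-1}$ lies in $(\varepsilon, 1-\varepsilon)$. Thus,
$$\int s_j \, d\mu_{U(p)} = \int_{\{0 \le s_j \le 1-\varepsilon\}} s_j \, d\mu_{U(p)} + \int_{\{1-\varepsilon < s_j \le 1\}} s_j \, d\mu_{U(p)} < 1 - \frac{\varepsilon}{2}.$$
It also follows that
$$\int s_j \, d\mu_{U(p)} > \frac{\varepsilon}{2}.$$
Hence,
$$\begin{aligned}
\frac{\int s_j^2 \, d\mu_{U(p)}}{\int s_j \, d\mu_{U(p)}}
&= \frac{\int_{\{0 \le s_j \le 1-\varepsilon\}} s_j^2 \, d\mu_{U(p)} + \int_{\{1-\varepsilon < s_j \le 1\}} s_j^2 \, d\mu_{U(p)}}{\int s_j \, d\mu_{U(p)}} \\
&\le \frac{(1-\varepsilon)\int_{\{0 \le s_j \le 1-\varepsilon\}} s_j \, d\mu_{U(p)} + \int_{\{1-\varepsilon < s_j \le 1\}} s_j \, d\mu_{U(p)}}{\int s_j \, d\mu_{U(p)}} \\
&= \frac{\int s_j \, d\mu_{U(p)} - \varepsilon \int_{\{0 \le s_j \le 1-\varepsilon\}} s_j \, d\mu_{U(p)}}{\int s_j \, d\mu_{U(p)}} \\
&\le 1 - \frac{\varepsilon^2}{2-\varepsilon},
\end{aligned}$$
where the last step uses $\int_{\{0 \le s_j \le 1-\varepsilon\}} s_j \, d\mu_{U(p)} > \varepsilon/2$ (the integrand exceeds $\varepsilon$ on a subset of $\mu_{U(p)}$-measure greater than $\frac12$) together with $\int s_j \, d\mu_{U(p)} < 1 - \varepsilon/2$.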


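Similarly,
$$\frac{\int (1 - s_0 - \cdots - s_{k-1})^2 \, d\mu_{U(p)}}{\int (1 - s_0 - \cdots - s_{k-1}) \, d\mu_{U(p)}} \le 1 - \frac{\varepsilon^2}{2-\varepsilon}.$$
Since these bounds hold for every $p \in F^*$, we have $\kappa_i \le 1 - \varepsilon^2/(2-\varepsilon) < 1$ for every $i$. Therefore,
$$\prod_{i=1}^{\infty} \kappa_i = 0.$$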
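A small numerical illustration of why the product of the $\kappa_i$ vanishes. The Python sketch below assumes, purely for illustration, a binary tree ($k = 1$, $F = \{0,1\}$) in which every de Finetti measure is the same Beta($\alpha$, $\beta$) law for the limiting frequency $s$ of 0's (the de Finetti measure of a two-colour Pólya urn). For that choice $\kappa_i = (\alpha+1)/(\alpha+\beta+1)$ when $x_i = 0$ and $(\beta+1)/(\alpha+\beta+1)$ when $x_i = 1$, both strictly below 1. For Beta(2, 2) and $\varepsilon = 0.1$, the mass condition on $(\varepsilon, 1-\varepsilon)$ can be checked from the closed-form CDF $3x^2 - 2x^3$. The function names are hypothetical.

```python
import random


def beta_kappa(alpha: float, beta: float, digit: int) -> float:
    """kappa_i for a binary tree whose de Finetti measure is Beta(alpha, beta)
    on s = limiting frequency of 0's (illustrative assumption).

    digit == 0: E[s^2] / E[s];  digit == 1: E[(1-s)^2] / E[1-s].
    For s ~ Beta(a, b), E[s] = a/(a+b) and E[s^2] = a(a+1)/((a+b)(a+b+1)),
    so the ratio is (a+1)/(a+b+1); since 1-s ~ Beta(b, a), digit 1 swaps a, b.
    """
    a, b = (alpha, beta) if digit == 0 else (beta, alpha)
    return (a + 1) / (a + b + 1)


def mc_kappa(alpha: float, beta: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[s^2] / E[s] for s ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    draws = [rng.betavariate(alpha, beta) for _ in range(n)]
    return sum(s * s for s in draws) / sum(draws)


def path_product(alpha: float, beta: float, path) -> float:
    """Partial product of kappa_i along a path x = (x_1, x_2, ...) in {0, 1}."""
    prod = 1.0
    for digit in path:
        prod *= beta_kappa(alpha, beta, digit)
    return prod


EPS = 0.1
# Beta(2, 2) puts mass 1 - 2F(eps) on (eps, 1 - eps), with F(x) = 3x^2 - 2x^3.
MASS_INSIDE = 1 - 2 * (3 * EPS**2 - 2 * EPS**3)   # about 0.944 > 1/2
BOUND = 1 - EPS**2 / (2 - EPS)                     # the bound from the proof

print(beta_kappa(2, 2, 0), BOUND)        # 0.6 versus roughly 0.9947
print(path_product(2, 2, [0, 1] * 25))   # 0.6**50, already below 1e-10
```

The exact bound from the proof is far from tight in this example; its role is only to keep every $\kappa_i$ uniformly below 1.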

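Proof of Theorem 4.3. The proof is given for the case of a binary exchangeable tree. The proof in general is similar but notationally messier. For the forward implication, suppose that there exists a $p = (p_1, \dots, p_n) \in F^*$ such that $\mu_{U(p)}$ does not have full support on $[0,1]$. That is, for some nontrivial subinterval $[a,b] \subset [0,1]$, $\mu_{U(p)}([a,b]) = 0$. Now let $j$ and $k$ be such that $\frac{j}{2^n} = \frac{k}{2^{n+1}} = \sum_{i=1}^{n} \frac{p_i}{2^i}$. (Assume $p \ne \emptyset$; the proof for $p = \emptyset$ is easier.) Also, let $\nu_0 \in C$ be such that
$$\nu_0\Big(\Big(\frac{j}{2^n}, \frac{j+1}{2^n}\Big]\Big) = 1 - \frac{b-a}{6} \qquad\text{and}\qquad \nu_0\Big(\Big(\frac{k}{2^{n+1}}, \frac{k+1}{2^{n+1}}\Big]\Big) = \frac{b+a}{2}.$$
Each of the sets
$$\begin{aligned}
O_1 &= \Big\{\nu \in P([0,1]) : \nu\Big(\Big(\frac{j}{2^n}, \frac{j+1}{2^n}\Big]\Big) < \nu_0\Big(\Big(\frac{j}{2^n}, \frac{j+1}{2^n}\Big]\Big) + \frac{b-a}{6}\Big\}, \\
O_2 &= \Big\{\nu \in P([0,1]) : \nu\Big(\Big(\frac{j}{2^n}, \frac{j+1}{2^n}\Big]\Big) > \nu_0\Big(\Big(\frac{j}{2^n}, \frac{j+1}{2^n}\Big]\Big) - \frac{b-a}{6}\Big\}, \\
O_3 &= \Big\{\nu \in P([0,1]) : \nu\Big(\Big(\frac{k}{2^{n+1}}, \frac{k+1}{2^{n+1}}\Big]\Big) < \nu_0\Big(\Big(\frac{k}{2^{n+1}}, \frac{k+1}{2^{n+1}}\Big]\Big) + \frac{b-a}{6}\Big\}, \\
O_4 &= \Big\{\nu \in P([0,1]) : \nu\Big(\Big(\frac{k}{2^{n+1}}, \frac{k+1}{2^{n+1}}\Big]\Big) > \nu_0\Big(\Big(\frac{k}{2^{n+1}}, \frac{k+1}{2^{n+1}}\Big]\Big) - \frac{b-a}{6}\Big\}
\end{aligned}$$
is open in the topology of weak convergence on $P([0,1])$. Moreover, $O_1 \cap O_2 \cap O_3 \cap O_4$ is nonempty and
$$\begin{aligned}
Q_U(O_1 \cap O_2 \cap O_3 \cap O_4)
&\le P\Big[\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n)}(X_{i,1},\dots,X_{i,n}) < 1,\ \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n)}(X_{i,1},\dots,X_{i,n}) > 1 - \frac{b-a}{3}, \\
&\qquad \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n,0)}(X_{i,1},\dots,X_{i,n+1}) < \frac{2b+a}{3},\ \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n,0)}(X_{i,1},\dots,X_{i,n+1}) > \frac{b+2a}{3}\Big] \\
&= \int_{(1-\frac{b-a}{3},\,1)} P\Big[\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n,0)}(X_{i,1},\dots,X_{i,n+1}) \in \Big(\frac{b+2a}{3}, \frac{2b+a}{3}\Big) \,\Big|\, \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n)}(X_{i,1},\dots,X_{i,n}) = z\Big]\, dP^*_{(p_1,\dots,p_n)}(z),
\end{aligned}$$
where $P^*_{(p_1,\dots,p_n)}$ is the measure defined on $([0,1], \mathcal{B}([0,1]))$ by
$$P^*_{(p_1,\dots,p_n)}(B) = P\Big[\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n)}(X_{i,1},\dots,X_{i,n}) \in B\Big].$$
But, for each $z \in (1 - \frac{b-a}{3}, 1)$,
$$P\Big[\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n,0)}(X_{i,1},\dots,X_{i,n+1}) \in \Big(\frac{b+2a}{3}, \frac{2b+a}{3}\Big) \,\Big|\, \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} I_{(p_1,\dots,p_n)}(X_{i,1},\dots,X_{i,n}) = z\Big] = \mu_{U(p)}\Big(\Big(\frac{b+2a}{3}, \frac{2b+a}{3}\Big)\Big/z\Big) \le \mu_{U(p)}([a,b]) = 0,$$
where
$$\Big(\frac{b+2a}{3}, \frac{2b+a}{3}\Big)\Big/z = \Big\{c : c = \frac{x}{z} \text{ for some } x \in \Big(\frac{b+2a}{3}, \frac{2b+a}{3}\Big)\Big\}.$$
That is, $Q_U(O_1 \cap O_2 \cap O_3 \cap O_4) = 0$, and hence $Q_U$ does not have full support. The reverse implication follows from Mauldin et al. (1992, Theorem 6.1) with minor modifications.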
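The proof of Theorem 4.3 leaves two pieces of arithmetic implicit: the thresholds in the four open sets come from the two masses prescribed for the chosen measure in $C$, and the bound by the mass of $[a,b]$ needs the rescaled interval to stay inside $[a,b]$. A short check, using only $0 \le a < b \le 1$:
$$\begin{aligned}
\frac{b+a}{2} + \frac{b-a}{6} &= \frac{2b+a}{3}, &\qquad \frac{b+a}{2} - \frac{b-a}{6} &= \frac{b+2a}{3}, \\
\Big(1 - \frac{b-a}{6}\Big) + \frac{b-a}{6} &= 1, &\qquad \Big(1 - \frac{b-a}{6}\Big) - \frac{b-a}{6} &= 1 - \frac{b-a}{3}.
\end{aligned}$$
For $z \in \big(1 - \frac{b-a}{3}, 1\big)$, so that $3z > 3 - (b-a) > 0$,
$$\frac{b+2a}{3z} > \frac{b+2a}{3} \ge a, \qquad \frac{2b+a}{3z} < \frac{2b+a}{3-(b-a)} \le b,$$
where the last inequality is equivalent to $2b + a \le 3b - b^2 + ab$, that is, to $0 \le (b-a)(1-b)$. Hence $\big(\frac{b+2a}{3}, \frac{2b+a}{3}\big)/z \subset (a, b)$ for every such $z$.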

5. Random rescaling and exchangeable trees

A short description of the random rescaling scheme, introduced by Graf et al. (1986) and generalized by Mauldin and Monticino (1995), is given below. Theorem 5.1 shows that random rescaling and exchangeable trees produce the same priors on $P([0,1])$. After the statement of the theorem, some ways in which the random rescaling perspective can be used to study the geometric properties of the probability measures in the support of exchangeable tree priors are mentioned. Let $\psi$ be a mapping (transition kernel) from $D$, the dyadic rationals of $[0,1]$, to $P([0,1])$. A distribution function, $h$, of a probability measure on $[0,1]$ is generated


by random rescaling, as follows. First, set $h(0) = 0$ and $h(1) = 1$. Now randomly select the value of $h(\frac12)$ according to the distribution $\psi(\frac12)$. Select $h(\frac14)$ according to the distribution $\psi(\frac14)$ scaled to the interval $[0, h(\frac12)]$ and independently select $h(\frac34)$ according to $\psi(\frac34)$ scaled to $[h(\frac12), 1]$. Continue in this manner to define a function on the dyadic rationals. In the natural way, extend $h$ to a distribution function of a probability measure on $[0,1]$. This scheme thus induces a probability measure, or prior, denoted by $R_\psi$, on the space of distribution functions or, equivalently, on $P([0,1])$.

A more precise view of $R_\psi$ can be obtained by introducing a scaling map $\varphi$ from $[0,1]^D$ to $P([0,1])$. Let $D_n$ be the set of strictly $n$th level dyadic rationals (e.g., $\frac12 \notin D_2$). For $t = (t(\frac12), t(\frac14), t(\frac34), \dots) \in [0,1]^D$, define the distribution function $\varphi(t)$ inductively on $D$ as follows:
$$\varphi(t)(0) = 0, \qquad \varphi(t)(1) = 1.$$
Assume $\varphi(t)|_{\bigcup_{i=1}^{n} D_i}$ has been defined. Then, for $0 \le j \le 2^n - 1$, set
$$\varphi(t)\Big(\frac{2j+1}{2^{n+1}}\Big) = \varphi(t)\Big(\frac{j}{2^n}\Big) + \Big[\varphi(t)\Big(\frac{j+1}{2^n}\Big) - \varphi(t)\Big(\frac{j}{2^n}\Big)\Big] \cdot t\Big(\frac{2j+1}{2^{n+1}}\Big).$$
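The recursion for $\varphi(t)$ is easy to run level by level. The Python sketch below is a finite-level truncation of $R_\psi$: each coordinate $t(d)$ is drawn independently from $\psi(d)$, with the kernel represented (as an assumption of the sketch) by a callable that returns one draw, and the recursion above is then applied. Dyadic rationals are exact `Fraction` keys so that $\frac24$ and $\frac12$ coincide. The names `Kernel` and `random_rescaling` are hypothetical.

```python
import random
from fractions import Fraction
from typing import Callable, Dict

# Hypothetical representation of a transition kernel psi: given a dyadic
# rational d and a random generator, return one draw t(d) from psi(d) in [0, 1].
Kernel = Callable[[Fraction, random.Random], float]


def random_rescaling(psi: Kernel, levels: int, rng: random.Random) -> Dict[Fraction, float]:
    """Sample phi(t) on all dyadic rationals of level <= `levels`.

    Each t(d) is drawn independently from psi(d), and
    phi(t)((2j+1)/2^(n+1)) = phi(t)(j/2^n)
        + [phi(t)((j+1)/2^n) - phi(t)(j/2^n)] * t((2j+1)/2^(n+1)).
    """
    h: Dict[Fraction, float] = {Fraction(0): 0.0, Fraction(1): 1.0}
    for n in range(levels):
        for j in range(2 ** n):
            left = Fraction(j, 2 ** n)
            right = Fraction(j + 1, 2 ** n)
            mid = Fraction(2 * j + 1, 2 ** (n + 1))
            t = psi(mid, rng)
            h[mid] = h[left] + (h[right] - h[left]) * t
    return h


# psi(d) = point mass at 1/2 for every d: every split is at the midpoint.
h_mid = random_rescaling(lambda d, rng: 0.5, 4, random.Random(0))

# psi(d) = point mass at 1/3 for d < 1/2 and at 1/2 for d >= 1/2.
h_split = random_rescaling(lambda d, rng: 1 / 3 if d < Fraction(1, 2) else 0.5, 5, random.Random(0))

# psi(d) = Uniform(0, 1) for every d: a genuinely random distribution function.
h_unif = random_rescaling(lambda d, rng: rng.random(), 6, random.Random(1))
```

With every $\psi(d)$ a point mass at $\frac12$, $h(d) = d$ on all dyadics; with $\psi(d) = \delta_{\{1/3\}}$ for $d < \frac12$ and $\delta_{\{1/2\}}$ otherwise, $h(d) = d$ on the dyadics of $[\frac12, 1]$ while $h(\frac14) = \frac16$.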

It is straightforward to show that $\varphi$ is a well-defined, open and continuous map from $[0,1]^D$ into $P([0,1])$. The prior $R_\psi$ is induced by $P = \prod_{d \in D} \psi(d)$ through $\varphi$. As mentioned above, Dubins and Freedman (1967) introduced a related method of generating probability measures. To see that random rescaling can produce priors which cannot be constructed within their framework, set $\psi(d) = \delta_{\{1/3\}}$ for all dyadic rationals $d < \frac12$, and set $\psi(d) = \delta_{\{1/2\}}$ for all $d \ge \frac12$. Then $R_\psi$, regarded as a probability measure on distribution functions, is $\delta_{\{\tilde h\}}$, where $\tilde h(x) = x$ for $x \ge \frac12$, and $\tilde h$ is singular over $(0, \frac12)$. Dubins and Freedman (1967, Theorem 5.1) show that this prior cannot be obtained from their construction. More generally, for any homeomorphism $h$, there exists a transition kernel $\psi$ for which $R_\psi = \delta_{\{h\}}$. However, if $h$ is not singular almost everywhere and if $h$ is not the identity function, then $\delta_{\{h\}}$ cannot be obtained from a Dubins–Freedman construction.

Recall that $\mu_{U(p)}$ denotes the de Finetti measure of the sequence $U(p)$ and, when $(F, U)$ is a binary tree, regard $\mu_{U(p)}$ as the distribution of the limiting frequency of 0's in the sequence $U(p)$ (this is more convenient than using the limiting frequency of 1's). So in this case $\mu_{U(p)}$ is a measure on $[0,1]$. Theorem 5.1 assumes that the exchangeable tree prior $Q_U$ has support on the set of continuous measures of $[0,1]$. As shown in Theorem 4.1, this is true whenever $U$ is centered. For $F = \{0,1\}$, define a map $\tau : F^* \to D$ by $\tau(\emptyset) = \frac12$ and, for all other $(b_1, \dots, b_n) \in F^*$,
$$\tau((b_1, \dots, b_n)) = \frac12 - \sum_{i=1}^{n} \frac{(-1)^{b_i}}{2^{i+1}}.$$
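The map $\tau$ sends the node $(b_1, \dots, b_n)$ of the binary tree to the midpoint of the dyadic interval with binary address $b_1 \dots b_n$; for example $\tau((0)) = \frac14$ and $\tau((0,1)) = \frac38$. The Python sketch below (all names hypothetical) implements $\tau$ and checks a deterministic version of the coupling behind Theorem 5.1, not its proof: if node $p$ passes the fraction $W(p)$ of its mass to its 0-branch, then multiplying fractions down the tree gives the same cell masses as the rescaling recursion for $\varphi(t)$ with $t(\tau(p)) = W(p)$. The split fractions are made up for the check, and exact `Fraction` arithmetic keeps the comparison free of rounding.

```python
import random
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Bits = Tuple[int, ...]


def tau(bits: Bits) -> Fraction:
    """tau(()) = 1/2 and tau((b_1, ..., b_n)) = 1/2 - sum_i (-1)^{b_i} / 2^{i+1}."""
    total = Fraction(1, 2)
    for i, b in enumerate(bits, start=1):
        total -= Fraction((-1) ** b, 2 ** (i + 1))
    return total


def tree_masses(W: Dict[Bits, Fraction], depth: int) -> Dict[Bits, Fraction]:
    """Mass of each level-`depth` cell, indexed by its binary address.

    W[p] plays the role of the limiting frequency of 0's in U(p): the left
    child of node p receives the fraction W[p] of p's mass.
    """
    masses = {}
    for address in product((0, 1), repeat=depth):
        mass = Fraction(1)
        for i, digit in enumerate(address):
            w = W[address[:i]]
            mass *= w if digit == 0 else 1 - w
        masses[address] = mass
    return masses


def rescaled_increments(W: Dict[Bits, Fraction], depth: int) -> Dict[Bits, Fraction]:
    """Increments of phi(t) over level-`depth` cells when t(tau(p)) = W[p]."""
    t = {tau(p): w for p, w in W.items()}
    h = {Fraction(0): Fraction(0), Fraction(1): Fraction(1)}
    for n in range(depth):
        for j in range(2 ** n):
            left, right = Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n)
            mid = Fraction(2 * j + 1, 2 ** (n + 1))
            h[mid] = h[left] + (h[right] - h[left]) * t[mid]
    increments = {}
    for i in range(2 ** depth):
        # Binary address of the cell (i/2^depth, (i+1)/2^depth], i.e. the
        # depth-digit binary expansion of i/2^depth, most significant bit first.
        address = tuple((i >> (depth - 1 - r)) & 1 for r in range(depth))
        increments[address] = h[Fraction(i + 1, 2 ** depth)] - h[Fraction(i, 2 ** depth)]
    return increments


# Hypothetical split fractions for every node of depth < DEPTH.
DEPTH = 4
_rng = random.Random(0)
W_EXAMPLE = {p: Fraction(_rng.randint(1, 9), 10)
             for n in range(DEPTH) for p in product((0, 1), repeat=n)}
```

Because $\tau$ is a bijection from strings of length $n$ onto $D_{n+1}$, every rescaling coordinate used up to level `DEPTH` is supplied by exactly one tree node.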


Theorem 5.1. Let $\psi : D \to P([0,1])$ be a transition kernel and define a binary exchangeable tree $(F, U)$ by setting $\mu_{U(p)} = \psi(\tau(p))$ for each $p \in F^*$. Suppose $Q_U$ gives probability one to the set of continuous measures. Then $Q_U = R_\psi$.

Geometric properties. With Theorem 5.1, Theorem 4.1 of Mauldin and Monticino (1995) can be applied to show that if there exist an $\eta > 0$ and a compact set $K \subset [0,1] - \{\frac12\}$ such that $\mu_{U(p)}(K) > \eta$ for all $p \in F^*$, then almost all the probability measures in the support of the (binary) exchangeable tree prior $Q_U$ are strictly singular (i.e., their distribution functions do not have a finite positive derivative anywhere). Conversely, Kraft (1964) can be used to state conditions which guarantee that almost all the measures are absolutely continuous with respect to Lebesgue measure. Recall that the Hausdorff dimension of a probability measure $\nu$ on $[0,1]$ is defined as
$$\dim_H(\nu) = \min\{\dim_H(A) : A \subset [0,1] \text{ and } \nu(A) = 1\},$$
where $\dim_H(A)$ is the Hausdorff dimension of $A$ (see, e.g., Falconer, 1985). If all the $\mu_{U(p)}$'s are the same, then Kinney and Pitcher (1964) show that $Q_U$-almost all the supports of the probability measures have dimension equal to
$$\frac{1}{\ln 2} \int_{[0,1]} -\big(s \ln(s) + (1-s)\ln(1-s)\big) \, d\mu_{U(p)}(s).$$
When the $\mu_{U(p)}$ are not all the same, bounds on the dimension of the supports can be obtained through Theorem 5.2 of Mauldin and Monticino (1995). Remark 5.2 recalls some standard facts about showing that two priors on $P([0,1])$ are equal.

Remark 5.2. Let
$$\mathcal{E} = \Big\{E \in \mathcal{B}(P([0,1])) : E = \pi^{-1}_{(\frac{i}{2^m}, \frac{i+1}{2^m}]}(A) \text{ for some } m \in \mathbb{N},\ 0 \le i \le 2^m - 1, \text{ and } A \in \mathcal{B}([0,1])\Big\},$$
where $\mathcal{B}(P([0,1]))$ is the $\sigma$-algebra of Borel sets of $P([0,1])$ given the weak topology and $\pi_C : P([0,1]) \to [0,1]$ is defined by $\pi_C(\nu) = \nu(C)$. Define $\mathcal{G}$ to be the set of all finite intersections of elements of $\mathcal{E}$. It is easy to see that any $G \in \mathcal{G}$ is of the form
$$G = \Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac{1}{2^m}\Big]\Big), \nu\Big(\Big(\frac{1}{2^m}, \frac{2}{2^m}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^m-1}{2^m}, 1\Big]\Big)\Big) \in C\Big\} \qquad (3.1)$$


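for some $m$ and $C \in \mathcal{B}([0,1]^{2^m})$. Hence, by Bertsekas and Shreve (1978, Proposition 7.25), to show that priors $P_1, P_2 \in P(P([0,1]))$ are equal it is enough to show that they agree on all sets of the form given in (3.1).

Proof of Theorem 5.1. For $m = 1, 2, \dots$, let
$$\mathcal{G}_m = \Big\{G \in \mathcal{G} : G = \Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac{1}{2^m}\Big]\Big), \nu\Big(\Big(\frac{1}{2^m}, \frac{2}{2^m}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^m-1}{2^m}, 1\Big]\Big)\Big) \in C\Big\} \text{ for some } C \in \mathcal{B}([0,1]^{2^m})\Big\}.$$
Then, as noted in Remark 5.2, to prove $Q_U = R_\psi$ it is sufficient to show that $Q_U$ and $R_\psi$ agree on each $\mathcal{G}_m$. This is done by induction on $m$. Let $X = (X_1, X_2, \dots)$ be the sequence of exchangeable $[0,1]$-valued random variables generated from $(F, U)$. Suppose $m = 1$ and $G \in \mathcal{G}_1$. Then
$$G = \Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac12\Big]\Big), \nu\Big(\Big(\frac12, 1\Big]\Big)\Big) \in C\Big\}$$
for some $C \in \mathcal{B}([0,1]^2)$, and
$$Q_U(G) = Q_U\Big(\Big\{\nu \in P([0,1]) : \nu\Big(\Big[0, \frac12\Big]\Big) \in A\Big\}\Big)$$
for $A = \{y \in [0,1] : (y, 1-y) \in C\}$. Moreover,
$$\begin{aligned}
Q_U\Big(\Big\{\nu \in P([0,1]) : \nu\Big(\Big[0, \frac12\Big]\Big) \in A\Big\}\Big)
&= P\Big[\Big(\text{weak}\lim_{n\to\infty} \frac1n \sum_{i=1}^{n} \delta_{X_i}\Big)\Big(\Big[0, \frac12\Big]\Big) \in A\Big] \\
&= P\Big[\lim_{n\to\infty} \frac1n \sum_{i=1}^{n} I_{[0,\frac12]}(X_i) \in A\Big] \\
&= P\Big[\lim_{n\to\infty} \frac1n \sum_{i=1}^{n} I_{\{(0)\}}(X_{i,1}) \in A\Big] \\
&= \mu_{U(\emptyset)}(A) = \psi\Big(\frac12\Big)(A).
\end{aligned}$$
Denote the distribution function of $\nu \in P([0,1])$ by $h_\nu$. Then
$$R_\psi(G) = R_\psi\Big(\Big\{\nu \in P([0,1]) : \nu\Big(\Big[0, \frac12\Big]\Big) \in A\Big\}\Big) = R_\psi\Big(\Big\{\nu \in P([0,1]) : h_\nu\Big(\frac12\Big) \in A\Big\}\Big) = \psi\Big(\frac12\Big)(A).$$
So $R_\psi$ and $Q_U$ agree on $\mathcal{G}_1$.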
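In the base case the identity reduces to a one-dimensional statement: the limiting frequency of 0's in $U(\emptyset)$, that is, of outcomes landing in $[0, \frac12]$, has law $\mu_{U(\emptyset)} = \psi(\frac12)$. The Monte Carlo sketch below illustrates this under an assumption made only for the example: $U(\emptyset)$ is a two-colour Pólya urn started with weight $\alpha$ on 0 and $\beta$ on 1, whose de Finetti measure is the Beta($\alpha$, $\beta$) law. The helper names `polya_frequency` and `simulate` are hypothetical, and finite runs only approximate the limiting frequencies.

```python
import random
import statistics
from typing import List


def polya_frequency(alpha: float, beta: float, steps: int, rng: random.Random) -> float:
    """Fraction of 0's after `steps` draws from a Polya urn that starts with
    weight alpha on colour 0 and beta on colour 1 and adds one unit of the
    drawn colour after each draw."""
    zeros = 0
    for n in range(steps):
        # After n draws the total weight is alpha + beta + n.
        if rng.random() < (alpha + zeros) / (alpha + beta + n):
            zeros += 1
    return zeros / steps


def simulate(alpha: float, beta: float, runs: int = 2000, steps: int = 400, seed: int = 0) -> List[float]:
    """Approximate limiting frequencies of 0's over independent urn runs."""
    rng = random.Random(seed)
    return [polya_frequency(alpha, beta, steps, rng) for _ in range(runs)]


# For Beta(2, 3): mean 2/5, variance 1/25, and P(s <= 1/2) = 11/16.
FREQS = simulate(2, 3)
print(statistics.mean(FREQS), statistics.pvariance(FREQS))
print(sum(f <= 0.5 for f in FREQS) / len(FREQS))
```

With 400 draws per run the frequencies carry a little extra spread (for these parameters the exact finite-run variance is about 0.0405 rather than 0.04), so the agreement with the Beta(2, 3) moments is approximate.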


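Assume that $R_\psi$ and $Q_U$ agree on $\mathcal{G}_m$ for all $m \le k$. To show that $R_\psi$ and $Q_U$ agree on $\mathcal{G}_{k+1}$ it is enough to show that they agree on all $G \in \mathcal{G}_{k+1}$ of the form
$$G = \Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac{1}{2^{k+1}}\Big]\Big), \nu\Big(\Big(\frac{1}{2^{k+1}}, \frac{2}{2^{k+1}}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^{k+1}-1}{2^{k+1}}, 1\Big]\Big)\Big) \in C\Big\}$$
for $C = C_1 \times \cdots \times C_{2^{k+1}}$ with $C_i \in \mathcal{B}([0,1])$. For such a $G \in \mathcal{G}_{k+1}$, we have
$$Q_U(G) = \int Q_U(G \mid (y_1, \dots, y_{2^k})) \, dQ_U^k(y_1, \dots, y_{2^k}),$$
where $Q_U(G \mid (y_1, \dots, y_{2^k}))$ is the conditional $Q_U$-probability of $G$ given
$$\Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac{1}{2^k}\Big]\Big), \nu\Big(\Big(\frac{1}{2^k}, \frac{2}{2^k}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^k-1}{2^k}, 1\Big]\Big)\Big) = (y_1, \dots, y_{2^k})\Big\}$$
and $Q_U^k$ is the measure defined on $([0,1]^{2^k}, \mathcal{B}([0,1]^{2^k}))$ by
$$Q_U^k(B) = Q_U\Big(\Big\{\nu \in P([0,1]) : \Big(\nu\Big(\Big[0, \frac{1}{2^k}\Big]\Big), \nu\Big(\Big(\frac{1}{2^k}, \frac{2}{2^k}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^k-1}{2^k}, 1\Big]\Big)\Big) \in B\Big\}\Big).$$
Similarly,
$$R_\psi(G) = \int R_\psi(G \mid (y_1, \dots, y_{2^k})) \, dR_\psi^k(y_1, \dots, y_{2^k}),$$
where $R_\psi(G \mid (y_1, \dots, y_{2^k}))$ and $R_\psi^k$ are defined analogously to $Q_U(G \mid (y_1, \dots, y_{2^k}))$ and $Q_U^k$. By the induction assumption $R_\psi^k = Q_U^k$. So to complete the argument it is sufficient to show $R_\psi(G \mid (y_1, \dots, y_{2^k})) = Q_U(G \mid (y_1, \dots, y_{2^k}))$ for all $(y_1, \dots, y_{2^k})$. First,
$$\begin{aligned}
Q_U(G \mid (y_1, \dots, y_{2^k}))
&= Q_U\Big(\Big\{\nu : \Big(\nu\Big(\Big[0, \frac{1}{2^{k+1}}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^{k+1}-1}{2^{k+1}}, 1\Big]\Big)\Big) \in C\Big\} \,\Big|\, \Big\{\nu : \Big(\nu\Big(\Big[0, \frac{1}{2^k}\Big]\Big), \dots, \nu\Big(\Big(\frac{2^k-1}{2^k}, 1\Big]\Big)\Big) = (y_1, \dots, y_{2^k})\Big\}\Big) \\
&= Q_U\Big(\Big\{\nu : \nu\Big(\Big(\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}\Big]\Big) \in A_i,\ 1 \le i \le 2^k\Big\} \,\Big|\, \Big\{\nu : \nu\Big(\Big(\frac{i-1}{2^k}, \frac{i}{2^k}\Big]\Big) = y_i,\ 1 \le i \le 2^k\Big\}\Big),
\end{aligned}$$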


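where $A_i = \{z \in [0,1] : (z, y_i - z) \in C_{2i-1} \times C_{2i}\}$. Since the processes $\{U(p)\}_{p \in F^*}$ are independent, the events $\{\{\nu : \nu((\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}]) \in A_i\}\}_{1 \le i \le 2^k}$ are conditionally independent given $\{\nu : \nu((\frac{i-1}{2^k}, \frac{i}{2^k}]) = y_i,\ 1 \le i \le 2^k\}$. Therefore,
$$\begin{aligned}
Q_U(G \mid (y_1, \dots, y_{2^k}))
&= \prod_{i=1}^{2^k} Q_U\Big(\Big\{\nu : \nu\Big(\Big(\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}\Big]\Big) \in A_i\Big\} \,\Big|\, \Big\{\nu : \nu\Big(\Big(\frac{i-1}{2^k}, \frac{i}{2^k}\Big]\Big) = y_i\Big\}\Big) \\
&= \prod_{i=1}^{2^k} P\Big[\Big(\text{weak}\lim_{n\to\infty} \frac1n \sum_{j=1}^{n} \delta_{X_j}\Big)\Big(\Big(\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}\Big]\Big) \in A_i \,\Big|\, \Big(\text{weak}\lim_{n\to\infty} \frac1n \sum_{j=1}^{n} \delta_{X_j}\Big)\Big(\Big(\frac{i-1}{2^k}, \frac{i}{2^k}\Big]\Big) = y_i\Big] \\
&= \prod_{i=1}^{2^k} P\Big[\lim_{n\to\infty} \frac1n \sum_{j=1}^{n} I_{(\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}]}(X_j) \in A_i \,\Big|\, \lim_{n\to\infty} \frac1n \sum_{j=1}^{n} I_{(\frac{i-1}{2^k}, \frac{i}{2^k}]}(X_j) = y_i\Big] \\
&= \prod_{i=1}^{2^k} P\Big[\lim_{n\to\infty} \frac1n \sum_{j=1}^{n} I_{(b_{i,1},\dots,b_{i,k})}(X_{j,1},\dots,X_{j,k})\, I_{(0)}(X_{j,k+1}) \in A_i \,\Big|\, \lim_{n\to\infty} \frac1n \sum_{j=1}^{n} I_{(b_{i,1},\dots,b_{i,k})}(X_{j,1},\dots,X_{j,k}) = y_i\Big] \\
&= \prod_{i=1}^{2^k} \mu_{U(b_{i,1},\dots,b_{i,k})}\Big(\frac{A_i}{y_i}\Big),
\end{aligned}$$
where $.b_{i,1} \dots b_{i,k}$ is the $k$th order binary expansion of $(i-1)/2^k$ and
$$\frac{A_i}{y_i} = \Big\{r : r = \frac{a}{y_i} \text{ for some } a \in A_i\Big\}.$$
On the other hand,
$$\begin{aligned}
R_\psi(G \mid (y_1, \dots, y_{2^k}))
&= R_\psi\Big(\Big\{\nu : \nu\Big(\Big(\frac{2i-2}{2^{k+1}}, \frac{2i-1}{2^{k+1}}\Big]\Big) \in A_i,\ 1 \le i \le 2^k\Big\} \,\Big|\, \Big\{\nu : \nu\Big(\Big(\frac{i-1}{2^k}, \frac{i}{2^k}\Big]\Big) = y_i,\ 1 \le i \le 2^k\Big\}\Big) \\
&= R_\psi\Big(\Big\{\nu : h_\nu\Big(\frac{2i-1}{2^{k+1}}\Big) - h_\nu\Big(\frac{2i-2}{2^{k+1}}\Big) \in A_i,\ 1 \le i \le 2^k\Big\} \,\Big|\, \Big\{\nu : h_\nu\Big(\frac{i}{2^k}\Big) - h_\nu\Big(\frac{i-1}{2^k}\Big) = y_i,\ 1 \le i \le 2^k\Big\}\Big).
\end{aligned}$$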


By the random rescaling construction, the events $\{\{\nu : h_\nu(\frac{2i-1}{2^{k+1}}) - h_\nu(\frac{2i-2}{2^{k+1}}) \in A_i\}\}_{i=1}^{2^k}$ are conditionally independent given $\{\nu : \nu((\frac{i-1}{2^k}, \frac{i}{2^k}]) = y_i,\ 1 \le i \le 2^k\}$. Hence,
$$\begin{aligned}
R_\psi(G \mid (y_1, \dots, y_{2^k}))
&= \prod_{i=1}^{2^k} R_\psi\Big(\Big\{\nu : h_\nu\Big(\frac{2i-1}{2^{k+1}}\Big) - h_\nu\Big(\frac{i-1}{2^k}\Big) \in A_i\Big\} \,\Big|\, \Big\{\nu : h_\nu\Big(\frac{i}{2^k}\Big) - h_\nu\Big(\frac{i-1}{2^k}\Big) = y_i\Big\}\Big) \\
&= \prod_{i=1}^{2^k} \psi\Big(\frac{2i-1}{2^{k+1}}\Big)_{(h_\nu(\frac{i-1}{2^k}),\, h_\nu(\frac{i}{2^k})]}\Big(A_i + h_\nu\Big(\frac{i-1}{2^k}\Big)\Big) \\
&= \prod_{i=1}^{2^k} \psi(\tau((b_{i,1}, \dots, b_{i,k})))\Big(\frac{A_i}{y_i}\Big) \\
&= \prod_{i=1}^{2^k} \mu_{U(b_{i,1},\dots,b_{i,k})}\Big(\frac{A_i}{y_i}\Big),
\end{aligned}$$
where $\psi(\frac{2i-1}{2^{k+1}})_{(h_\nu(\frac{i-1}{2^k}),\, h_\nu(\frac{i}{2^k})]}$ is the measure $\psi(\frac{2i-1}{2^{k+1}})$ scaled to the interval $(h_\nu(\frac{i-1}{2^k}), h_\nu(\frac{i}{2^k})]$, $A_i + h_\nu(\frac{i-1}{2^k}) = \{r : r = a + h_\nu(\frac{i-1}{2^k}) \text{ for some } a \in A_i\}$ and, again, $b_{i,1} \dots b_{i,k}$ is the $k$th order binary expansion of $(i-1)/2^k$, so that $\tau((b_{i,1}, \dots, b_{i,k})) = \frac{2i-1}{2^{k+1}}$. Therefore, $R_\psi(G) = Q_U(G)$ for all $G \in \mathcal{G}_{k+1}$, and the induction is complete.

Acknowledgements

I would like to express my appreciation to Professor R. Daniel Mauldin for his encouragement and for our many interesting conversations during the course of this work.

References

Aldous, D.J., 1983. Exchangeability and related topics. École d'Été de Probabilités de Saint-Flour XIII. Lecture Notes in Math., vol. 1117, pp. 1–197.
Bertsekas, D.P., Shreve, S.E., 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
Blackwell, D., Kendall, D., 1964. The Martin boundary for Polya's urn scheme and an application to stochastic population growth. J. Appl. Probab. 1, 284–296.
Blackwell, D., MacQueen, J.B., 1973. Ferguson distributions via Polya urn schemes. Ann. Statist. 1, 353–355.
Dubins, L.E., Freedman, D.A., 1967. Random distribution functions. Proc. 5th Berkeley Symp. Math. Statist. Probab., vol. 2, pp. 183–214.
Fabius, J., 1964. Asymptotic behavior of Bayes estimates. Ann. Math. Statist. 35, 846–856.
Falconer, K.J., 1985. The Geometry of Fractal Sets. Cambridge University Press, Cambridge, Great Britain.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, 3rd ed. Wiley, New York.
Ferguson, T.S., 1973. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209–230.
Ferguson, T.S., 1974. Prior distributions on spaces of probability measures. Ann. Statist. 2 (4), 615–629.


Freedman, D.A., 1963. On the asymptotic behavior of Bayes' estimates in the discrete case. Ann. Math. Statist. 34, 1386–1403.
Freedman, D.A., 1965. Bernard Friedman's urn. Ann. Math. Statist. 36, 956–970.
Graf, S., Mauldin, R.D., Williams, S.C., 1986. Random homeomorphisms. Adv. Math. 60, 239–359.
Graf, S., Novak, E., Papageorgiou, A., 1989. Bisection is not optimal on the average. Numer. Math. 55, 481–491.
Hewitt, E., Savage, L.J., 1955. Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80, 470–501.
Hill, B.M., Lane, D., Sudderth, W., 1980. A strong law for some generalized urn processes. Ann. Probab. 8, 214–226.
Hill, B.M., Lane, D., Sudderth, W., 1987. Exchangeable urn processes. Ann. Probab. 15, 1586–1592.
Kinney, J.R., Pitcher, T.S., 1964. The dimension of the support of a random distribution function. Bull. Amer. Math. Soc. 70, 161–164.
Kraft, C.H., 1964. A class of distribution function processes which have derivatives. J. Appl. Probab. 1, 385–388.
Lavine, M., 1992. Some aspects of Polya tree distributions for statistical modeling. Ann. Statist. 20 (3), 1222–1235.
Lavine, M., 1994. More aspects of Polya tree distributions for statistical modeling. Ann. Statist. 22 (3), 1161–1176.
Mauldin, R.D., Williams, S.C., 1990. Reinforced random walks and random distributions. Proc. Amer. Math. Soc. 110 (1), 251–258.
Mauldin, R.D., Sudderth, W.D., Williams, S.C., 1992. Polya trees and random distributions. Ann. Statist. 20 (3), 1203–1221.
Mauldin, R.D., Monticino, M.G., 1995. Randomly generated distributions. Israel J. Math., to appear.
Novak, E., 1989. Average-case results in zero finding. J. Complex. 5, 489–501.
Ritter, K., 1992. Average errors for zero finding: lower bounds for smooth or monotone functions. University of Kentucky Technical Report No. 209–292.
