Journal of Statistical Planning and Inference 141 (2011) 787–799
Quantitative comparisons between finitary posterior distributions and Bayesian posterior distributions

Federico Bassetti
Università degli Studi di Pavia, Dipartimento di Matematica, via Ferrata 1, 27100 Pavia, Italy
Abstract
Article history: Received 13 January 2010; received in revised form 15 July 2010; accepted 4 August 2010; available online 10 August 2010.
The main object of Bayesian statistical inference is the determination of posterior distributions. Sometimes these laws are given for quantities devoid of empirical value. This serious drawback vanishes when one confines oneself to considering a finite horizon framework. However, assuming infinite exchangeability gives rise to fairly tractable a posteriori quantities, which is very attractive in applications. Hence, with a view to reconciling these two aspects of the Bayesian way of reasoning, in this paper we provide quantitative comparisons between posterior distributions of finitary parameters and posterior distributions of allied parameters appearing in usual statistical models. © 2010 Elsevier B.V. All rights reserved.
Keywords: de Finetti's theorem; Dudley metric; Empirical distribution; Finitary Bayesian inference; Finite exchangeability; Gini–Kantorovich–Wasserstein distance; Predictive inference; Quantitative comparison of posterior distributions
1. Introduction

In Bayesian reasoning the assumption of infinite exchangeability gives rise to fairly tractable a posteriori quantities, which is very attractive in real applications. If observations form an infinite exchangeable sequence of random variables, de Finetti's representation theorem states that they are conditionally independent and identically distributed given some random parameter, and the distribution of this random parameter is the focus of current Bayesian statistical inference. The theoretical deficiency of this approach lies in the interpretation of the parameter. In fact, as pointed out first by de Finetti (see de Finetti, 1930, 1937, and also Bassetti and Regazzini, 2008), parameters ought to be of such a nature that one can acknowledge at least the theoretical possibility of experimentally verifying whether hypotheses on them are true or false. A closer look at usual Bayesian procedures shows that Bayesian statisticians often draw inferences (from observations) both to empirical (i.e. verifiable) and to non-empirical hypotheses. To better understand this point, it is worth stating a more complete formulation of de Finetti's representation theorem: a sequence $(\xi_n)_{n \ge 1}$ of random elements taking values in a suitable measurable space $(X, \mathcal{X})$ (e.g. a Polish space) is exchangeable if and only if the empirical distribution
\[
\tilde e_n(\cdot) = \frac{1}{n} \sum_{i=1}^n \delta_{\xi_i}(\cdot)
\]
E-mail address: [email protected]. doi:10.1016/j.jspi.2010.08.004
converges in distribution, with probability one, to a random probability $\tilde p$, and the $\xi_n$'s turn out to be conditionally independent and identically distributed given $\tilde p$, with common distribution $\tilde p$. Hence, it is $\tilde p$ that takes the traditional role of parameter in Bayesian modeling. However, since $\tilde p$ is a limiting entity of mathematical nature, hypotheses related to it might be devoid of empirical value. This drawback vanishes when one confines oneself to a finite horizon framework, say $(\xi_1, \dots, \xi_N)$, where $N$ can be seen as a finite population size. Here $\tilde e_N$, which is always (at least ideally) observable, takes the place of $\tilde p$. In this way one preserves the hypothesis of exchangeability, which is quite natural in many statistical problems, but avoids the problem of assigning a probability law to unobservable entities. In particular, in this context, the conditional distribution of the empirical measure $\tilde e_N$ given $\xi(n) := (\xi_1, \dots, \xi_n)$ ($n < N$) takes the place of the conditional distribution of $\tilde p$ given $\xi(n)$, i.e. the usual posterior distribution of Bayesian (nonparametric) inference.

Even if, in view of de Finetti's representation, the parameter corresponding to the so-called "unknown distribution" (i.e. $\tilde p$) is the limit, as $N \to +\infty$, of the empirical distribution, it should be emphasized that in Bayesian practice two conflicting aspects sometimes occur. On the one hand, statistical inference ought to concern finitary and, therefore, observable entities; on the other hand, simplifications of a technical nature can generally be obtained by dealing with (parameters defined as functions of) the "unknown distribution" $\tilde p$. Hence, it is interesting to compare the conditional distribution of $\tilde e_N$ given $\xi(n)$ with the conditional distribution of $\tilde p$ given $\xi(n)$, when $(\xi_k)_{k \ge 1}$ is an infinite exchangeable sequence directed by $\tilde p$, that is when the "population size" $N$ diverges. This is the aim of the present paper, which can be thought of as a continuation of Bassetti and Bissiri (2007, 2008), where specific forms of (finitary) exchangeable laws have been defined and studied in terms of finitary statistical procedures.

The rest of the paper is organized as follows. Section 2 contains a brief overview of the finitary approach to statistical inference together with some examples. Section 3 provides some useful definitions and properties of the so-called Gini–Kantorovich distance between probability measures, to be used in the rest of the paper. Sections 4 and 5 deal with the problem of quantifying the discrepancy between the conditional law of $\tilde e_N$ given $\xi(n)$ and the conditional law of $\tilde p$ given $\xi(n)$. To conclude these introductory remarks it is worth mentioning Diaconis and Freedman (1980), which, to some extent, is connected with the present work. In point of fact, Diaconis and Freedman (1980) provide an optimal bound for the total variation distance between the law of $(\xi_1, \dots, \xi_n)$ and the law of $(\zeta_1, \dots, \zeta_n)$, $(\xi_1, \dots, \xi_N)$ being a given finite exchangeable sequence and $(\zeta_k)_{k \ge 1}$ a suitable infinite exchangeable sequence.

2. Finitary statistical procedures

As said before, we assume that the process of observation can be represented as an infinite exchangeable sequence $(\xi_k)_{k \ge 1}$ of random elements defined on a probability space $(\Omega, \mathcal{F}, P)$ and taking values in a complete separable metric space $(X, d)$, endowed with its Borel $\sigma$-field $\mathcal{X}$. Let $\mathcal{P}_0$ be a subset of the set $\mathcal{P}(X)$ of all probability measures on $(X, \mathcal{X})$ and let $\tau : \mathcal{P}_0 \to \Theta$ be a parameter of interest, $\Theta$ being a suitable parameter space endowed with a $\sigma$-field. From a finitary point of view, a statistician must focus attention on the empirical versions $\tau(\tilde e_N)$ of the more common parameter $\tau(\tilde p)$.
It might be useful, at this stage, to recast the decision-theoretic formulation of a statistical problem in finitary terms. Usually one assumes that the statistician has a set $D$ of decision rules at his disposal and that these rules are defined, for any $n \le N$, as functions from $X^n$ to some set $A$ of actions. Then one considers a loss function $L$, i.e. a positive real-valued function on $\Theta \times A$, such that $L(\theta, a)$ represents the loss when the value of $\tau(\tilde e_N)$ is $\theta$ and the statistician chooses action $a$. It is supposed that
\[
r_N(\delta(\xi(n))) := E[L(\tau(\tilde e_N), \delta(\xi(n))) \mid \xi(n)]
\]
is finite for any $\delta$ in $D$; $r_N(\cdot)$ is said to be the a posteriori Bayes risk of $\delta(\xi(n))$. Moreover, a Bayes rule is defined to be any element $\delta_{FB}$ of $D$ such that
\[
r_N(\delta_{FB}(\xi(n))) = \min_{\delta \in D} r_N(\delta(\xi(n)))
\]
for any realization of $\xi(n)$. We shall call such a Bayes rule a finitary Bayes estimator, in order to distinguish it from the more common Bayes estimator obtained by minimizing
\[
r(\delta(\xi(n))) := E[L(\tau(\tilde p), \delta(\xi(n))) \mid \xi(n)].
\]
While the law of the latter estimator is determined by the posterior distribution, that is the conditional distribution of $\tilde p$ given $\xi(n)$, the law of a finitary Bayes estimator is determined by the "finitary" posterior distribution, that is the conditional distribution of $\tau(\tilde e_N)$ given $\xi(n)$.

A few simple examples will hopefully clarify the connection between finitary Bayesian procedures and the usual ones. In all the examples we shall present, observations are assumed to be real-valued, that is $(X, \mathcal{X}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, the space of actions is some subset of $\mathbb{R}$, and the loss function is quadratic, i.e. $L(x,y) = |x-y|^2$. It is clear that, under these hypotheses,
\[
\delta_{FB}(\xi(n)) = E[\tau(\tilde e_N) \mid \xi(n)]
\]
and the usual Bayes estimator is given by $E[\tau(\tilde p) \mid \xi(n)]$.
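The fact that, under quadratic loss, the a posteriori Bayes risk is minimized by a conditional expectation can be illustrated with a small numerical sketch (ours, not the paper's): the distribution of $Y$ given the data stands in for the finitary posterior of $\tau(\tilde e_N)$, and the risk is approximated by Monte Carlo.

```python
import random

# Under quadratic loss L(y, a) = |y - a|^2, the a posteriori Bayes risk
# a -> E[L(Y, a) | data] is minimized at the conditional mean E[Y | data].
# Here Y plays the role of tau(e_N) given xi(n); its law is a stand-in
# posterior (a normal distribution chosen only for illustration).

random.seed(0)
posterior_draws = [random.gauss(2.0, 1.0) for _ in range(5000)]

def risk(a):
    """Monte Carlo estimate of E[|Y - a|^2 | data]."""
    return sum((y - a) ** 2 for y in posterior_draws) / len(posterior_draws)

grid = [i * 0.02 for i in range(0, 201)]          # candidate actions in [0, 4]
best_action = min(grid, key=risk)                 # numerical risk minimizer
posterior_mean = sum(posterior_draws) / len(posterior_draws)

# The grid minimizer is the grid point closest to the posterior mean.
assert abs(best_action - posterior_mean) < 0.02
```

Since risk(a) equals the posterior variance plus $(a - \bar Y)^2$ exactly on the sampled draws, the grid search recovers the conditional mean up to the grid resolution.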
Example 1 (Estimation of the mean). Suppose the statistician has to estimate the mean under squared error loss, i.e. the functional of interest is $\tau(p) := \int_{\mathbb{R}} x\, p(dx)$. The usual Bayes estimator is
\[
\hat m_n := E[\xi_{n+1} \mid \xi(n)],
\]
while the "finitary Bayes" estimator is
\[
\hat m_{FB} = \frac{n}{N}\, \bar m_n + \frac{N-n}{N}\, \hat m_n,
\]
where
\[
\bar m_n = \frac{1}{n} \sum_{i=1}^n \xi_i.
\]
Note that in this case the finitary Bayes estimator is a convex combination of the usual Bayes estimator with the empirical (plug-in) estimator $\bar m_n$.

Example 2 (Estimation of the variance). Now consider the estimation of the variance $\tau(p) = \int_{\mathbb{R}} x^2\, p(dx) - (\int_{\mathbb{R}} x\, p(dx))^2$ under squared error loss. In this case the space of actions is $\mathbb{R}^+$ and the usual Bayes estimator is
\[
\hat\sigma^2_n := \hat s^2_n - \hat c_{1,2,n},
\]
where
\[
\hat s^2_n := E[\xi_{n+1}^2 \mid \xi(n)]
\quad\text{and}\quad
\hat c_{1,2,n} := E[\xi_{n+1}\, \xi_{n+2} \mid \xi(n)].
\]
Some computations show that the "finitary Bayes" estimator is
\[
\hat\sigma^2_{FB} = \frac{n}{N}\, \bar s^2_n + \frac{N-n+n/N-1}{N}\, \hat s^2_n - \frac{(N-n)(N-n-1)}{N^2}\, \hat c_{1,2,n} - \frac{n^2}{N^2}\, \bar c_{1,2,n} - \frac{2(N-n)n}{N^2}\, \bar m_n\, \hat m_n,
\]
where
\[
\bar s^2_n := \frac{1}{n} \sum_{i=1}^n \xi_i^2
\quad\text{and}\quad
\bar c_{1,2,n} := \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \xi_i\, \xi_j.
\]
Example 3 (Estimation of the distribution function). Assume one has to estimate $\tau(p) = F_p(y) = p\{(-\infty, y]\}$, where $y$ is a fixed real number. Under squared error loss, the classical Bayes estimator is
\[
E[I_{(-\infty,y]}(\xi_{n+1}) \mid \xi(n)],
\]
while the "finitary Bayes" estimator is
\[
\hat F_{FB}(y) = \frac{n}{N}\, E_n(y) + \frac{N-n}{N}\, E[I_{(-\infty,y]}(\xi_{n+1}) \mid \xi(n)],
\]
where $E_n(y) = (1/n) \sum_{i=1}^n I_{(-\infty,y]}(\xi_i)$.

Example 4 (Estimation of the mean difference). Estimate the Gini mean difference
\[
\tau(p) = \Delta(p) = \int_{\mathbb{R}^2} |x - y|\, p(dx)\, p(dy)
\]
under squared error loss. The usual Bayes estimator is
\[
E[|\xi_{n+1} - \xi_{n+2}| \mid \xi(n)],
\]
while the "finitary Bayes" estimator is
\[
E[\Delta(\tilde e_N) \mid \xi(n)] = \frac{n^2}{N^2}\, \bar\Delta_n + \frac{(N-n)(N-n-1)}{N^2}\, E[|\xi_{n+1} - \xi_{n+2}| \mid \xi(n)] + \frac{2(N-n)}{N^2} \sum_{j \le n} E[|\xi_j - \xi_{n+1}| \mid \xi(n)],
\]
where
\[
\bar\Delta_n := \frac{1}{n^2} \sum_{i,j \le n} |\xi_i - \xi_j|.
\]
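Example 1 can be sketched numerically. The conjugate model below (normal observations with a standard normal prior on the mean) is our choice for illustration and is not taken from the paper; under it the posterior predictive mean has a closed form, and the finitary Bayes estimator is the stated convex combination.

```python
import random

# Sketch of Example 1 under an assumed normal-normal conjugate model
# (our modeling choice): xi_i | theta ~ N(theta, 1), theta ~ N(0, 1).
# Then m_hat_n = E[xi_{n+1} | xi(n)] = (sum of observations) / (n + 1),
# and the finitary Bayes estimator is (n/N) m_bar_n + ((N-n)/N) m_hat_n.

random.seed(1)
N, n = 1000, 100
theta = random.gauss(0.0, 1.0)
xs = [random.gauss(theta, 1.0) for _ in range(n)]   # observed xi(n)

m_bar_n = sum(xs) / n                               # empirical (plug-in) estimator
m_hat_n = sum(xs) / (n + 1)                         # posterior predictive mean
m_fb = (n / N) * m_bar_n + ((N - n) / N) * m_hat_n  # finitary Bayes estimator

# The finitary estimator lies between the plug-in and the usual Bayes
# estimator; as N -> infinity it tends to m_hat_n, and at n = N it is m_bar_n.
assert abs(m_fb - m_hat_n) <= abs(m_bar_n - m_hat_n)
```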
It is worth noticing that in all the previous examples, when $N \to +\infty$, the "finitary Bayes" estimator converges to the usual Bayes estimator, while for $n = N$ the "finitary Bayes" estimator becomes the usual plug-in frequentist estimator.

3. Notation and preliminaries on the Kantorovich distance between probability distributions

In order to state our results we need some more notation. Let us start by pointing out that the probability distribution of $\tilde p$ is a probability measure on $\mathcal{P}(X)$. In what follows, without loss of generality, consider $\mathcal{P}(X)$ endowed with a bounded
metric $\lambda$ which induces the weak convergence on $\mathcal{P}(X)$ (e.g. the Prohorov metric), and let $\mathscr{P}$ denote its Borel $\sigma$-field. If necessary, expand $(\Omega, \mathcal{F}, P)$ so that it contains all the random variables needed; for any random variable $V$, let $\mathcal{L}_V$ designate the probability distribution of $V$ and, for any other random element $U$, denote by $\mathcal{L}_{V|U}$ some conditional probability distribution of $V$ given $U$. In particular, $\mathcal{L}_{\tilde e_N|\xi(n)}$ will denote (a version of) the conditional distribution of $\tilde e_N$ given $\xi(n) := (\xi_1, \dots, \xi_n)$, and $\mathcal{L}_{\tilde p|\xi(n)}$ will stand for (a version of) the conditional distribution of $\tilde p$ given $\xi(n)$, i.e. the so-called posterior distribution of $\tilde p$. Such distributions exist since $(\mathcal{P}(X), \lambda)$ is Polish.

As already said, the main goal of this paper is to compare $\mathcal{L}_{\tilde e_N|\xi(n)}$ with $\mathcal{L}_{\tilde p|\xi(n)}$. In order to quantify closeness of probability distributions we shall use the so-called Gini–Kantorovich–Wasserstein distance. Hence, at this stage, it is worth recalling a few basic facts about this distance. Let $(M, d_M)$ be a Polish metric space and denote by $\mathcal{B}(M)$ the Borel $\sigma$-field of $M$. Recall that $\mathcal{P}(M)$ is the set of all probability measures on $M$ and set
\[
\mathcal{P}_1(M) := \Big\{ Q \in \mathcal{P}(M) : \int_M d_M(m, m_0)\, Q(dm) < +\infty \ \ \forall\, m_0 \in M \Big\}.
\]
If $Q_1$ and $Q_2$ are two probability measures in $\mathcal{P}_1(M)$, the Gini–Kantorovich–Wasserstein (GKW) distance of order 1 (relative to the metric $d_M$) is
\[
\mathcal{K}_1^{(d_M)}(Q_1, Q_2) := \inf\Big\{ \int_{M^2} d_M(m_1, m_2)\, \Gamma(dm_1\, dm_2) : \Gamma \in \mathcal{F}(Q_1, Q_2) \Big\},
\]
where $\mathcal{F}(Q_1, Q_2)$ is the set of all probability measures on $(M \times M, \mathcal{B}(M \times M))$ with marginals $Q_1$ and $Q_2$. Here, it is worth recalling that, whenever $Q_1$ and $Q_2$ belong to $\mathcal{P}_1(M)$, $\mathcal{K}_1^{(d_M)}$ admits the dual representation
\[
\mathcal{K}_1^{(d_M)}(Q_1, Q_2) = \sup\Big\{ \int_M f(m)\, (Q_1(dm) - Q_2(dm)) \; ; \; f : M \to \mathbb{R},\ |f(m_1) - f(m_2)| \le d_M(m_1, m_2)\ \forall\, m_1, m_2 \in M \Big\}. \quad (1)
\]
Using this representation it is easy to see that $\mathcal{K}_1^{(d_M)}$ is a distance on $\mathcal{P}_1(M)$. See, e.g., Theorem 11.8.2 in Dudley (2002). A well-known result, which clarifies the descriptive power of the GKW distance, is the following: given a sequence $(Q_n)_{n \ge 1}$ of probability measures in $\mathcal{P}_1(M)$, $\mathcal{K}_1^{(d_M)}(Q, Q_n) \to 0$ as $n \to +\infty$ if and only if $Q_n \Rightarrow Q$ and
\[
\int_M d_M(x, y)\, Q_n(dx) \to \int_M d_M(x, y)\, Q(dx) \qquad (y \in M)
\]
as $n \to +\infty$. As usual, $\Rightarrow$ denotes weak convergence. See Corollary 7.5.3 in Rachev (1991). In the next sections we shall use the GKW distance for two different choices of $(M, d_M)$:

(i) $(M, d_M) = (\mathbb{R}^m, d_e)$, where $d_e$ is the Euclidean metric induced by the norm $\|x\| = (\sum_{i=1}^m x_i^2)^{1/2}$. In this case we shall denote $\mathcal{K}_1^{(d_e)}$ by $w_1$;
(ii) $(M, d_M) = (\mathcal{P}(X), \lambda)$. In this case we shall denote $\mathcal{K}_1^{(\lambda)}$ by $W_1$.
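On the real line the GKW distance of order 1 between two empirical measures with the same number of equally weighted atoms admits a simple closed form: the optimal coupling pairs the sorted atoms. A minimal sketch (our illustration, not from the paper):

```python
# GKW/Wasserstein-1 distance on (R, Euclidean metric) between two empirical
# measures with the same number of equally weighted atoms: the optimal
# coupling matches order statistics, so the distance is the average
# absolute difference of the sorted samples.

def k1_empirical(xs, ys):
    """W1 distance between two equal-size empirical measures on R."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Two point masses at 0 versus two at 1 are at distance 1;
# identical samples (in any order) are at distance 0.
assert k1_empirical([0.0, 0.0], [1.0, 1.0]) == 1.0
assert k1_empirical([3.0, 1.0, 2.0], [1.0, 2.0, 3.0]) == 0.0
```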
4. Comparison between posterior distributions of means

In order to compare $\mathcal{L}_{\tilde e_N|\xi(n)}$ with $\mathcal{L}_{\tilde p|\xi(n)}$ we start by comparing posterior means. Indeed, as we have seen in Section 2, the posterior mean of a function $f$ appears in many natural statistical estimation problems. For the sake of notational simplicity, for any measurable real-valued function $f$, set
\[
\tilde e_N(f) := \int_X f(x)\, \tilde e_N(dx) = \frac{1}{N} \sum_{i=1}^N f(\xi_i)
\quad\text{and}\quad
\tilde p(f) := \int_X f(x)\, \tilde p(dx).
\]
First of all we prove this very simple proposition.

Proposition 4.1. Given a real-valued measurable function $f$, if $P\{\tilde p(|f|) < +\infty\} = 1$, then $\tilde e_N(f)$ converges in law to $\tilde p(f)$ (as $N \to +\infty$). Analogously, $\mathcal{L}_{\tilde e_N(f)|\xi(n)}$ converges weakly (almost surely) to $\mathcal{L}_{\tilde p(f)|\xi(n)}$.

Proof. Let $\phi$ be a bounded continuous function with $\|\phi\|_\infty =: c < +\infty$. Then $|\phi(\tilde e_N(f))| \le c$. Now, $E(\phi(\tilde e_N(f)) \mid \tilde p)$ converges almost surely to $E(\phi(\tilde p(f)) \mid \tilde p)$. To see this, note that, conditionally on $\tilde p$, $\tilde e_N(f)$ is a normalized sum of independent random variables with mean $\tilde p(f)$ and absolute moment $\tilde p(|f|)$; since $\tilde p(|f|)$ is almost surely finite, by the strong law of large numbers $\tilde e_N(f)$ converges to $\tilde p(f)$ almost surely given $\tilde p$, and hence also in law. Since $|E(\phi(\tilde e_N(f)) \mid \tilde p)| \le c$ almost surely, to conclude the proof it is enough to apply the dominated convergence theorem. The second part of the proposition can be proved in the same way, conditioning with respect to $(\tilde p, \xi(n))$. □
In order to give a quantitative version of the previous statement we resort to the Gini–Kantorovich–Wasserstein distance $w_1$ introduced in Section 3, that is
\[
w_1(p, q) := \inf\Big\{ \int_{\mathbb{R}^{2m}} \|x - y\|\, \gamma(dx\, dy) : \gamma \in \mathcal{F}(p, q) \Big\},
\]
$\mathcal{F}(p, q)$ being the class of all probability measures on $(\mathbb{R}^{2m}, \mathcal{B}(\mathbb{R}^{2m}))$ with marginal distributions $p$ and $q$. If $Z_1$ and $Z_2$ are two random variables with laws $p_1$ and $p_2$, respectively, $w_1(Z_1, Z_2)$ will stand for $w_1(p_1, p_2)$.

Proposition 4.2. Given a real-valued measurable function $f$ such that $E[f(\xi_1)^2] < \infty$, then
\[
w_1(\tilde e_N(f), \tilde p(f)) \le \frac{1}{\sqrt N}\, \big( E|f(\xi_1) - \tilde p(f)|^2 \big)^{1/2} \le \frac{2}{\sqrt N}\, \sqrt{E[f(\xi_1)^2]}.
\]
Moreover,
\[
w_1(\mathcal{L}_{\tilde e_N(f)|\xi(n)}, \mathcal{L}_{\tilde p(f)|\xi(n)}) \le \frac{n}{N} \Big( \frac{1}{n} \sum_{i=1}^n |f(\xi_i)| + E[|\tilde p(f)| \mid \xi(n)] \Big) + \frac{2}{\sqrt{N-n}}\, \big( E[f(\xi_{n+1})^2 \mid \xi(n)] \big)^{1/2} \qquad (a.e.).
\]

Proof. Applying a well-known conditioning argument, note that
\[
w_1(\tilde e_N(f), \tilde p(f)) \le E|\tilde e_N(f) - \tilde p(f)| = E\big[ E\big( |\tilde e_N(f) - \tilde p(f)| \,\big|\, \tilde p \big) \big]
= E\Big[ E\Big( \Big| \frac{1}{N} \sum_{i=1}^N f(\xi_i) - \int f\, d\tilde p \Big| \,\Big|\, \tilde p \Big) \Big]
\]
(by the Cauchy–Schwarz inequality)
\[
\le \frac{1}{\sqrt N}\, E\Big[ E\Big( \Big| f(\xi_1) - \int f\, d\tilde p \Big|^2 \,\Big|\, \tilde p \Big)^{1/2} \Big]
\quad\text{(by the Jensen inequality)}\quad
\le \frac{1}{\sqrt N}\, \Big( E\Big[ \Big( f(\xi_1) - \int f\, d\tilde p \Big)^2 \Big] \Big)^{1/2}.
\]
Clearly, $E[(f(\xi_1) - \int f\, d\tilde p)^2]^{1/2} \le (2 (E[f(\xi_1)^2] + E(\int f\, d\tilde p)^2))^{1/2}$ and, by the Jensen inequality, $E(\int f\, d\tilde p)^2 \le E(\int f^2\, d\tilde p) = E[f(\xi_1)^2]$. As for the second part of the proposition, first note that
\[
w_1(\mathcal{L}_{\tilde p(f)|\xi(n)}, \mathcal{L}_{\tilde e_N(f)|\xi(n)}) \le \frac{n}{N}\, E\Big[ \Big| \frac{1}{n} \sum_{i=1}^n f(\xi_i) - \tilde p(f) \Big| \,\Big|\, \xi(n) \Big] + \frac{N-n}{N}\, E\Big[ \Big| \frac{1}{N-n} \sum_{i=n+1}^N f(\xi_i) - \tilde p(f) \Big| \,\Big|\, \xi(n) \Big].
\]
Now, take the conditional expectation given $(\tilde p, \xi(n))$ and use again the Cauchy–Schwarz inequality to obtain
\[
E\Big[ \Big| \frac{1}{N-n} \sum_{i=n+1}^N f(\xi_i) - \tilde p(f) \Big| \,\Big|\, \xi(n) \Big] \le \frac{1}{\sqrt{N-n}}\, E\big[ E\big( |f(\xi_{n+1}) - \tilde p(f)|^2 \,\big|\, \tilde p, \xi(n) \big) \,\big|\, \xi(n) \big]^{1/2}.
\]
Finally, to complete the proof, apply the Jensen inequality and argue as in the previous part of the proof. □
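The first bound of Proposition 4.2 can be checked by simulation. The exchangeable model below (Bernoulli observations directed by a uniformly distributed success probability) is our choice, not the paper's; the coupling $(\tilde e_N(f), \tilde p(f))$ gives $E|\tilde e_N(f) - \tilde p(f)| \ge w_1(\tilde e_N(f), \tilde p(f))$, so it suffices to compare that expectation with the right-hand side.

```python
import random

# Monte Carlo sanity check of Proposition 4.2 (first bound) under an
# assumed model: p_tilde = Bernoulli(theta), theta ~ Uniform(0,1), f(x) = x.
# E|e_N(f) - p_tilde(f)| upper-bounds w_1(e_N(f), p_tilde(f)) (any coupling
# does), and must stay below 2 * sqrt(E[f(xi_1)^2]) / sqrt(N).

random.seed(2)
N, reps = 400, 2000
total = 0.0
for _ in range(reps):
    theta = random.random()                                      # draw p_tilde
    mean_N = sum(random.random() < theta for _ in range(N)) / N  # e_N(f)
    total += abs(mean_N - theta)
coupling_bound = total / reps          # estimate of E|e_N(f) - p_tilde(f)|

# For this model E[f(xi_1)^2] = E[theta] = 1/2.
rhs = 2 * (0.5) ** 0.5 / N ** 0.5
assert coupling_bound <= rhs
```

With $N = 400$ the estimated coupling distance is roughly $0.016$, well below the bound $\approx 0.071$, reflecting the conservative constant in the proposition.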
Of course, the mean is not the only interesting functional appearing in statistical problems. For instance, statisticians frequently deal with functionals of the form
\[
\tau_1(p) = \int_{X^k} f(x_1, \dots, x_k)\, p(dx_1) \cdots p(dx_k),
\]
or of the form
\[
\tau_2(p) = \operatorname*{argmin}_{\theta \in \Theta} \int_{X^k} f_\theta(x_1, \dots, x_k)\, p(dx_1) \cdots p(dx_k).
\]
Think, for example, of the variance or of the median of a probability measure, respectively. It is immediate to generalize Proposition 4.1 as follows.

Proposition 4.3. Given a measurable function $f : X^k \to \mathbb{R}$ such that
\[
P\Big\{ \int_{X^k} |f(x_1, \dots, x_k)|\, \tilde p(dx_1) \cdots \tilde p(dx_k) < +\infty \Big\} = 1,
\]
then $\tau_1(\tilde e_N)$ converges in law to $\tau_1(\tilde p)$ (as $N \to +\infty$). Analogously, $\mathcal{L}_{\tau_1(\tilde e_N)|\xi(n)}$ converges weakly (almost surely) to $\mathcal{L}_{\tau_1(\tilde p)|\xi(n)}$.

As far as functionals of the type of $\tau_2$ are concerned, the situation is less simple. As a general strategy, one can apply the usual argmax argument; see, e.g., van der Vaart and Wellner (1996). To do this, set
\[
M_N(\theta) := \int_{X^k} f_\theta(x_1, \dots, x_k)\, \tilde e_N(dx_1) \cdots \tilde e_N(dx_k),
\qquad
M(\theta) := \int_{X^k} f_\theta(x_1, \dots, x_k)\, \tilde p(dx_1) \cdots \tilde p(dx_k)
\]
and $\theta_N = \tau_2(\tilde e_N)$. Assume that $\Theta$ is a subset of $\mathbb{R}^d$ and, for every $T \subset \mathbb{R}^d$, define the set $\ell^\infty(T)$ of all measurable functions $f : T \to \mathbb{R}$ satisfying
\[
\|f\|_T := \sup_{t \in T} |f(t)| < +\infty.
\]
A version of the argmax theorem (Theorem 3.2.2 in van der Vaart and Wellner, 1996) implies the following: if $M_N$ converges in law to $M$ in $\ell^\infty(K)$ for every compact set $K \subset \mathbb{R}^d$, if almost all sample paths $\theta \mapsto M(\theta)$ are lower semi-continuous and possess a unique minimum at a random point $\hat\theta = \tau_2(\tilde p)$, and if $(\theta_N)_{N \ge 1}$ is tight, then $\theta_N$ converges in law to $\hat\theta$.

As for the first hypothesis, that is $M_N$ converges in law to $M$ in $\ell^\infty(K)$ for every compact set $K \subset \mathbb{R}^d$, one can resort to Theorems 1.5.4 and 1.5.6 in van der Vaart and Wellner (1996). Such theorems imply that if $(M_N(\theta_1), \dots, M_N(\theta_k))$ converges in law to $(M(\theta_1), \dots, M(\theta_k))$ for every $k$ and every $(\theta_1, \dots, \theta_k)$ in $K^k$, and if, for every $\varepsilon$ and $\eta > 0$, there is a finite partition $\{T_1, \dots, T_N\}$ of $K$ such that
\[
\limsup_N\, P\Big\{ \sup_i \sup_{h_1, h_2 \in T_i} |M_N(h_1) - M_N(h_2)| > \varepsilon \Big\} < \eta, \quad (2)
\]
then $M_N$ converges in law to $M$ in $\ell^\infty(K)$. Hence, one can try to show that
\[
|f_{\theta_1}(x_1, \dots, x_k) - f_{\theta_2}(x_1, \dots, x_k)| \le g(\|\theta_1 - \theta_2\|_2)\, \phi(x_1, \dots, x_k)
\]
for some continuous function $g$ with $g(0) = 0$, and some function $\phi$ such that, for some $\theta_0$,
\[
P\Big\{ \int_{X^k} [\phi(x_1, \dots, x_k) + |f_{\theta_0}(x_1, \dots, x_k)|]\, \tilde p(dx_1) \cdots \tilde p(dx_k) < +\infty \Big\} = 1.
\]
If these conditions hold, then the convergence of $M_N$ to $M$ is easily proved, whereas both the tightness of $(\theta_N)_{N \ge 1}$ and the uniqueness of $\hat\theta$ require additional assumptions. Here is an example, where $\mathrm{med}(p)$ denotes the median of the distribution $p$.

Proposition 4.4. Let $M_N = \mathrm{med}(\tilde e_{2N+1})$, that is $M_N = \xi_{(N+1)}$ if $\xi_{(1)} \le \cdots \le \xi_{(2N+1)}$. If
\[
P\Big\{ \int_{\mathbb{R}} |x|\, \tilde p(dx) < +\infty \Big\} = 1
\quad\text{and}\quad
P\{\mathrm{med}(\tilde p) \text{ is unique}\} = 1,
\]
then $M_N$ converges in law to $\mathrm{med}(\tilde p)$ as $N$ diverges. Analogously, if, for some $n < N$,
\[
\xi(n) \mapsto P\Big\{ \int_{\mathbb{R}} |x|\, \tilde p(dx) < +\infty \,\Big|\, \xi(n) \Big\} = 1 \quad (a.e.)
\]
and
\[
\xi(n) \mapsto P\{\mathrm{med}(\tilde p) \text{ is unique} \mid \xi(n)\} = 1 \quad (a.e.),
\]
then $\mathcal{L}_{M_N|\xi(n)}$ converges weakly (almost surely) to $\mathcal{L}_{\mathrm{med}(\tilde p)|\xi(n)}$ as $N$ diverges.

Proof. In this case
\[
M_N(\theta) = \int_{\mathbb{R}} |x - \theta|\, d\tilde e_{2N+1}
\quad\text{and}\quad
M(\theta) = \int_{\mathbb{R}} |x - \theta|\, d\tilde p.
\]
Since $P\{\int_{\mathbb{R}} |x|\, \tilde p(dx) < +\infty\} = 1$, from Proposition 4.3 we get that $(M_N(\theta_1), \dots, M_N(\theta_k))$ converges in law to $(M(\theta_1), \dots, M(\theta_k))$ for every $k$ and every $(\theta_1, \dots, \theta_k)$. Moreover $|M_N(\theta_1) - M_N(\theta_2)| \le |\theta_1 - \theta_2|$, hence (2) is verified. It remains to prove the tightness of $(M_N)_{N \ge 1}$. First of all observe that if $X_1, \dots, X_{2N+1}$ are i.i.d. random variables with common distribution function $F$, then the distribution function of the median of $X_1, \dots, X_{2N+1}$ is given by
\[
x \mapsto \sum_{k=N+1}^{2N+1} \binom{2N+1}{k} F^k(x)\, (1 - F(x))^{2N+1-k} = \frac{1}{B(N+1, N+1)} \int_0^{F(x)} t^N (1-t)^N\, dt,
\]
where $B$ is the Euler integral of the first kind (the so-called beta function). Hence, denoting by $\tilde F(x)$ the distribution function of $\tilde p$ and setting $H_x(t) = P\{\tilde F(x) \le t\}$,
it follows that
\[
P\{M_N \le x\} = E\Big[ B(N+1, N+1)^{-1} \int_0^{\tilde F(x)} t^N (1-t)^N\, dt \Big]
= B(N+1, N+1)^{-1} \int_0^1 \Big( \int_0^u t^N (1-t)^N\, dt \Big)\, dH_x(u)
= B(N+1, N+1)^{-1} \int_0^1 t^N (1-t)^N\, [1 - H_x(t)]\, dt.
\]
Now, by the Markov inequality,
\[
1 - H_x(t) = P\{\tilde F(x) > t\} \le \frac{1}{t}\, E[\tilde F(x)] = \frac{1}{t}\, P\{\xi_1 \le x\},
\]
hence
\[
P\{M_N \le x\} \le P\{\xi_1 \le x\}\, B(N+1, N+1)^{-1} \int_0^1 t^{N-1} (1-t)^N\, dt = \frac{2N+1}{N}\, P\{\xi_1 \le x\}.
\]
In the same way, it is easy to see that
\[
P\{M_N > x\} = 1 - P\{M_N \le x\} = 1 - B(N+1, N+1)^{-1} \int_0^1 t^N (1-t)^N\, [1 - H_x(t)]\, dt
= B(N+1, N+1)^{-1} \int_0^1 t^N (1-t)^N\, H_x(t)\, dt
= B(N+1, N+1)^{-1} \int_0^1 t^N (1-t)^N\, H_x(1-t)\, dt,
\]
and hence
\[
P\{M_N > x\} \le \frac{2N+1}{N}\, P\{\xi_1 \ge x\}.
\]
With these inequalities it is immediate to prove the tightness of $(M_N)_{N \ge 1}$. The proof of the second part of the proposition is analogous. □

5. Comparing posterior distributions of random probabilities

We now turn our attention to the direct comparison of $\mathcal{L}_{\tilde e_N|\xi(n)}$ with $\mathcal{L}_{\tilde p|\xi(n)}$. We shall use the Gini–Kantorovich–Wasserstein distance on the space of all probability measures $\mathcal{P} := \mathcal{P}(X)$, that is the distance $W_1$ introduced in Section 3. Recall that if $Q_1, Q_2$ are two probabilities on $(\mathcal{P}, \mathscr{P})$,
\[
W_1(Q_1, Q_2) := \inf\Big\{ \int_{\mathcal{P}^2} \lambda(p_1, p_2)\, \Gamma(dp_1\, dp_2) : \Gamma \in \mathcal{F}(Q_1, Q_2) \Big\},
\]
where $\mathcal{F}(Q_1, Q_2)$ is the set of all probability measures on $(\mathcal{P} \times \mathcal{P}, \mathscr{P} \otimes \mathscr{P})$ with marginals $Q_1$ and $Q_2$. Note also that, since we are assuming that $\lambda$ is a bounded metric, $W_1(Q_1, Q_2)$ is well defined for every choice of $(Q_1, Q_2)$. The main goal of this section is to give explicit upper bounds for the random variable $W_1(\mathcal{L}_{\tilde e_N|\xi(n)}, \mathcal{L}_{\tilde p|\xi(n)})$.

5.1. A first bound for the posterior distributions

There is a large body of literature on the rate of convergence to zero (when $N$ diverges) of
\[
E_N(p) := E[\lambda(p, \nu_N^{(p)})],
\]
where $\nu_N^{(p)} := \sum_{i=1}^N \delta_{\zeta_i^{(p)}} / N$ and $(\zeta_i^{(p)})_{i \ge 1}$ is a sequence of independent and identically distributed (i.i.d.) random variables taking values in $X$ with common probability measure $p$. See, for instance, Alexander (1984), Dudley (1968) and Kalashnikov and Rachev (1990). The next lemma shows how these well-known results can be used to get a bound for $W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)})$.

Lemma 5.1. Assume that $\lambda$ is bounded and satisfies
\[
\lambda(p, \epsilon p_1 + (1-\epsilon) p_2) \le \epsilon\, \lambda(p, p_1) + (1-\epsilon)\, \lambda(p, p_2) \quad (3)
\]
for every $\epsilon$ in $(0,1)$ and every $p, p_1, p_2$ in $\mathcal{P}$. Moreover, let $K := \sup\{\lambda(p,q) : (p,q) \in \mathcal{P}^2\}$. Then
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \int_{\mathcal{P}} E_{N-n}(p)\, \mathcal{L}_{\tilde p|\xi(n)}(dp) + \frac{nK}{N} \quad (4)
\]
holds true for $P$-almost every $\xi(n)$.

Proof. First of all, note that for every $A$ in $\mathscr{P}$
\[
\mathcal{L}_{\tilde e_N|\xi(n)}(A) = \int_{\mathcal{P}} \mathcal{L}_{\tilde e_N|\xi(n), \tilde p = p}(A)\, \mathcal{L}_{\tilde p|\xi(n)}(dp)
\]
~ Hence, where, according to our notation, Le~ N jxðnÞ, p~ denotes (a version of) the conditional distribution of e~ N given ðxðnÞ, pÞ. from the dual representation of W1, see (1), it is easy to check that Z W1 ðLpj W1 ðdp ,Le~ N jxðnÞ,p ÞLpj ~ xðnÞ ,Le~ N jxðnÞ Þ r ~ xðnÞ ðdpÞ: P
Now, write n Nn e~ n þ e~ N,n N N PN ~ Moreover, e~ N,n has the with e~ N,n ¼ i ¼ n þ 1 dxi =ðNnÞ, and observe that e~ n and e~ N,n are conditionally independent given p. R same law of e~ Nn and W1 ðdp ,Q Þ ¼ P lðp,qÞQ ðdqÞ. Hence, Z W1 ðdp ,Le~ N jxðnÞ,p Þ ¼ lðp,qÞLe~ N jxðnÞ,p ðdqÞ P " # ! n n X 1N 1X dzðpÞ þ d xðnÞ ¼ E l p, i N N i ¼ 1 xi " i¼1 !# n Nn 1 NX nK Nn nK ¼ ENn ðpÞ þ : r E l p, dzðpÞ þ N Nn i ¼ 1 i N N N e~ N ¼
The statement follows from integration over P with respect to Lpj ~ xðnÞ .
&
In the next three subsections we shall use the previous lemma with different choices of $X$ and $\lambda$.

5.2. The finite case

We start from the simple case in which $X = \{a_1, \dots, a_k\}$. Here $\mathcal{P}$ can be seen as the simplex
\[
S_k = \Big\{ x \in \mathbb{R}^k : 0 \le x_i \le 1,\ i = 1, \dots, k,\ \sum_{i=1}^k x_i = 1 \Big\}.
\]
Define $\lambda$ to be the total variation distance, i.e. $\lambda(p, q) = \frac{1}{2} \sum_{i=1}^k |p(a_i) - q(a_i)|$. In point of fact, it should be noted that, since $X$ is finite, there is no difference between the strong and the weak topology on $\mathcal{P}$. In this case, for every $j = 1, \dots, k$, one has
\[
\tilde e_N(a_j) = \#\{i : \xi_i = a_j,\ 1 \le i \le N\} / N.
\]
Now, denoting by $Z_i$ a binomial random variable with parameters $(N-n, p_i)$ ($p_i := p(a_i)$), we get
\[
E[\lambda(p, \nu_{N-n}^{(p)})] = \frac{1}{2(N-n)} \sum_{i=1}^k E[|Z_i - (N-n) p_i|]
\le \frac{1}{2(N-n)} \sum_{i=1}^k \sqrt{E[|Z_i - (N-n) p_i|^2]}
= \frac{1}{2(N-n)} \sum_{i=1}^k \sqrt{(N-n)\, p_i (1 - p_i)}
= \frac{1}{2\sqrt{N-n}} \sum_{i=1}^k \sqrt{p_i (1 - p_i)}
\le \frac{k}{4 \sqrt{N-n}}.
\]
Observing that $K := \sup\{TV(p,q) : p, q \in S_k\} \le 1$ and that the total variation distance satisfies (3), Lemma 5.1 gives:

Proposition 5.2. If $X = \{a_1, \dots, a_k\}$ and $\lambda$ is the total variation distance, then
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \frac{k}{4 \sqrt{N-n}} + \frac{n}{N}.
\]

5.3. The case $X = \mathbb{R}$

Passing to a general Euclidean space we first need to choose a suitable metric $\lambda$. Recall that if $p$ and $q$ belong to $\mathcal{P}(\mathbb{R}^m)$, the so-called bounded Lipschitz distance $\beta$ between $p$ and $q$ is defined by
\[
\beta(p, q) = \sup\Big\{ \int_{\mathbb{R}^m} f(x)\, [p(dx) - q(dx)] \; ; \; f : \mathbb{R}^m \to \mathbb{R},\ \|f\|_{BL} \le 1 \Big\},
\]
where $\|f\|_{BL} := \sup_{x \in \mathbb{R}^m} |f(x)| + \sup_{x \ne y} |f(x) - f(y)| / \|x - y\|$. See Section 11.3 in Dudley (2002). Note that $\sup_{(p,q) \in \mathcal{P}^2} \beta(p, q) \le 2$ and that $\beta$ satisfies
\[
\beta(p, \epsilon p_1 + (1-\epsilon) p_2) \le \epsilon\, \beta(p, p_1) + (1-\epsilon)\, \beta(p, p_2)
\]
for every $\epsilon$ in $(0,1)$ and every $p, p_1, p_2$ in $\mathcal{P}$. Recall also that $\beta$ metrizes the weak topology (see, e.g., Theorem 11.3.3 in Dudley, 2002). In what follows we take $X = \mathbb{R}$ and $\lambda = \beta$. As a consequence of Lemma 5.1 we get the next proposition, in which, for every $p$ in $\mathcal{P}$, we set $F_p(x) = p\{(-\infty, x]\}$.
Proposition 5.3. Let $X = \mathbb{R}$ and $\lambda = \beta$. Set $\Delta(p) := \int_{\mathbb{R}} \sqrt{F_p(t)(1 - F_p(t))}\, dt$. If $E[\Delta(\tilde p)] < +\infty$, then the inequalities
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \frac{1}{\sqrt{N-n}}\, E[\Delta(\tilde p) \mid \xi(n)] + \frac{2n}{N}
\le \frac{Y}{\sqrt{N-n}} + \frac{2n}{N}
\]
hold true for all $n < N$, with $Y := \sup_n E[\Delta(\tilde p) \mid \xi(n)] < +\infty$, for $P$-almost every $\xi(n)$.

Proof. As already recalled, $\sup_{(p,q) \in \mathcal{P}^2} \beta(p, q) \le 2$ and $\beta$ satisfies (3). Using the dual representation of $w_1$ — see (1) — it is easy to see that
\[
\beta(p, q) \le w_1(p, q). \quad (5)
\]
Moreover, recall that, when $X = \mathbb{R}$,
\[
w_1(p, q) = \int_{\mathbb{R}} |F_p(x) - F_q(x)|\, dx. \quad (6)
\]
See, for instance, Rachev (1991). For any $p$ in $\mathcal{P}_1$, for the sake of simplicity, set $\zeta_i^{(p)} = \zeta_i$ and observe that the combination of (5) and (6) gives
\[
E\Big[ \lambda\Big( p, \frac{1}{N-n} \sum_{i=1}^{N-n} \delta_{\zeta_i} \Big) \Big]
\le E\Big[ \int_{\mathbb{R}} \Big| F_p(t) - \frac{1}{N-n} \sum_{i=1}^{N-n} I(\zeta_i \le t) \Big|\, dt \Big]
= \frac{1}{N-n} \int_{\mathbb{R}} E\Big[ \Big| (N-n) F_p(t) - \sum_{i=1}^{N-n} I(\zeta_i \le t) \Big| \Big]\, dt \quad \text{(by the Fubini theorem)}.
\]
Now, note that $\sum_{i=1}^{N-n} I(\zeta_i \le t)$ is a binomial random variable with parameters $(N-n, F_p(t))$. Hence, when
\[
\int_{\mathbb{R}} \sqrt{F_p(t)(1 - F_p(t))}\, dt < +\infty \quad (7)
\]
holds true, the Cauchy–Schwarz inequality yields
\[
E[\lambda(p, \nu_{N-n}^{(p)})] \le \frac{1}{\sqrt{N-n}} \int_{\mathbb{R}} \sqrt{F_p(t)(1 - F_p(t))}\, dt. \quad (8)
\]
Since (7) holds true $P$-almost surely, the combination of (8) with Lemma 5.1 and the obvious identity $\int_{\mathcal{P}} \Delta(p)\, \mathcal{L}_{\tilde p|\xi(n)}(dp) = E[\Delta(\tilde p) \mid \xi(n)]$ gives the first part of the stated result. To conclude the proof, apply Doob's martingale convergence theorem (see, e.g., Theorem 10.5.1 in Dudley, 2002) to $E[\Delta(\tilde p) \mid \xi(n)]$ in order to prove that $\sup_n E[\Delta(\tilde p) \mid \xi(n)] < +\infty$ almost surely. □

A first simple consequence of the previous proposition is embodied in:

Corollary 1. Let $X = [-M, M]$ for some $0 < M < +\infty$. Then
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \frac{2M}{\sqrt{N-n}} + \frac{2n}{N}
\]
holds true for all $n < N$, for $P$-almost every $\xi(n)$.

It is worth recalling that $\Delta(p) < +\infty$ implies a finite second moment for $p$, but not conversely (this condition defines the Banach space $L_{2,1}$; cf. Ledoux and Talagrand, 1991, p. 10). It is easy to show that if $p$ has a finite moment of order $2 + \delta$, for some positive $\delta$, then
\[
\Delta(p) \le 1 + C_\delta \Big( \int_{\mathbb{R}} |x|^{2+\delta}\, p(dx) \Big)^{1/2} \quad (9)
\]
holds true with $C_\delta := \sqrt{2(1+\delta)/\delta}$. As a consequence of these statements we have the following:

Corollary 2. If $E[\Delta(\tilde p)] < +\infty$, then
\[
E[W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)})] \le \frac{E[\Delta(\tilde p)]}{\sqrt{N-n}} + \frac{2n}{N}
\]
and
\[
P\{ W_1(\mathcal{L}_{\tilde e_N|\xi(n)}, \mathcal{L}_{\tilde p|\xi(n)}) > \epsilon \} \le \frac{1}{\epsilon} \Big( \frac{E[\Delta(\tilde p)]}{\sqrt{N-n}} + \frac{2n}{N} \Big)
\]
hold true for all $n < N$. Moreover, if $E|\xi_1|^{2+\delta} < +\infty$ for some positive $\delta$, then
\[
E[\Delta(\tilde p)] \le 1 + \Big( \frac{2(1+\delta)}{\delta}\, E|\xi_1|^{2+\delta} \Big)^{1/2}.
\]

Proof. By Proposition 5.3, whenever $E[\Delta(\tilde p)] < +\infty$, one can write
\[
E[W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)})] \le \frac{1}{\sqrt{N-n}}\, E\big[ E[\Delta(\tilde p) \mid \xi(n)] \big] + \frac{2n}{N}
= \frac{E[\Delta(\tilde p)]}{\sqrt{N-n}} + \frac{2n}{N}.
\]
Now, let $\bar p(\cdot) = E(\tilde p(\cdot))$. Then (9), together with the Fubini theorem and the Jensen inequality, yields
\[
E[\Delta(\tilde p)] \le 1 + C_\delta \Big( \int_{\mathbb{R}} |x|^{2+\delta}\, \bar p(dx) \Big)^{1/2}.
\]
Combining these facts with the Markov inequality completes the proof. □
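The functional $\Delta(p)$ appearing in these bounds can be evaluated explicitly in simple cases. For $p$ uniform on $[0,1]$ (our example), $F_p(t) = t$ on $[0,1]$ and $\Delta(p) = \int_0^1 \sqrt{t(1-t)}\, dt = \pi/8$, the area of a half-disk of radius $1/2$. A quick numerical check:

```python
import math

# Numerical evaluation (our example) of Delta(p) = int sqrt(F_p(t)(1 - F_p(t))) dt
# for p = Uniform(0,1): the exact value is pi/8 ~ 0.3927.

def delta_uniform(steps=200_000):
    """Midpoint-rule approximation of int_0^1 sqrt(t(1-t)) dt."""
    h = 1.0 / steps
    return h * sum(math.sqrt((i + 0.5) * h * (1.0 - (i + 0.5) * h))
                   for i in range(steps))

approx = delta_uniform()
assert abs(approx - math.pi / 8) < 1e-4
```

Plugging this value into Corollary 1 (with $M = 1/2$ after centering) or Proposition 5.3 gives fully explicit rates in $N$ and $n$ for uniform data.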
5.4. The case $X = \mathbb{R}^d$ and $X$ Polish

Let us briefly consider the general case of a Polish $X$. It is worth recalling that the distance $\beta$ can be defined on a general metric space $(X, d)$; in this case
\[
\beta(p, q) = \sup\Big\{ \int_X f(x)\, [p(dx) - q(dx)] \; ; \; f : X \to \mathbb{R},\ \|f\|_{BL} \le 1 \Big\},
\]
where $\|f\|_{BL} := \sup_{x \in X} |f(x)| + \sup_{x \ne y} |f(x) - f(y)| / d(x, y)$. For any $p$ in $\mathcal{P} = \mathcal{P}(X)$ and $k$ in $\mathbb{N}$ consider
\[
C_k(p) := \Big( \sup_{\epsilon \in (0,1]} \epsilon^k\, N(\epsilon, \epsilon^{k/(k-2)}, p) \Big)^{1/2},
\]
where $N(\epsilon, \eta, p)$ is the minimal number of sets of diameter at most $2\epsilon$ which cover $X$ except for a set $A$ with $p(A) \le \eta$. Proposition 3.1 in Dudley (1968) (see also Theorem 7 in Kalashnikov and Rachev, 1990) gives
\[
E[\beta(p, \nu_{N-n}^{(p)})] \le (N-n)^{-1/k} \Big[ \frac{4}{3} + 4 \cdot 3^{2k}\, C_k(p) \Big].
\]
Using the last inequality and arguing as in the proof of Proposition 5.3 we obtain the following:

Proposition 5.4. Let $\lambda = \beta$. If $E[C_k(\tilde p)] < +\infty$ for some positive $k$, then the inequality
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \frac{1}{(N-n)^{1/k}} \Big[ \frac{4}{3} + 4 \cdot 3^{2k}\, E[C_k(\tilde p) \mid \xi(n)] \Big] + \frac{2n}{N}
\le \frac{1}{(N-n)^{1/k}} \Big[ \frac{4}{3} + 4 \cdot 3^{2k}\, Y \Big] + \frac{2n}{N}
\]
holds true for all $n < N$, with $Y := \sup_n E[C_k(\tilde p) \mid \xi(n)] < +\infty$, for $P$-almost every $\xi(n)$.

The condition $E[C_k(\tilde p)] < +\infty$ is almost impossible to check. In what follows we assume that $X = \mathbb{R}^m$ and discuss a more tractable hypothesis. If $\int_{\mathbb{R}^m} \|x\|^\gamma\, p(dx) < +\infty$, where $\gamma = km/((k-m)(k-2))$, $m \ge 2$ and $k > m$, Proposition 3.4 in Dudley (1968) (see also Theorem 8 in Kalashnikov and Rachev, 1990) yields
\[
C_k(p) \le 2^{m/2} \Big[ 1 + 2 \Big( \int_{\mathbb{R}^m} \|x\|^\gamma\, p(dx) \Big)^{1/\gamma} \Big]^{1/2}. \quad (10)
\]
Using this last inequality we can prove the following:

Proposition 5.5. Let $X = \mathbb{R}^m$ and $\lambda = \beta$ with $m \ge 2$. Let $k > m$, set $\gamma := km/((k-m)(k-2))$, assume that $E\|\xi_1\|^\gamma$ is finite and that $\gamma \ge 1$, and set
\[
Y_n := 2\, E\Big[ \Big( \int_{\mathbb{R}^m} \|x\|^\gamma\, \tilde p(dx) \Big)^{1/\gamma} \,\Big|\, \xi(n) \Big].
\]
Then, for all $n < N$ and for $P$-almost every $\xi(n)$, one gets $Y := \sup_n Y_n < +\infty$ and
\[
W_1(\mathcal{L}_{\tilde p|\xi(n)}, \mathcal{L}_{\tilde e_N|\xi(n)}) \le \frac{1}{(N-n)^{1/k}} \Big[ \frac{4}{3} + 4 \cdot 3^{2k}\, 2^{m/2}\, (1 + Y_n)^{1/2} \Big] + \frac{2n}{N}.
\]
Moreover,
$$
P\bigl\{W_1(\mathcal L(\tilde p\,|\,x^{(n)}),\mathcal L(\tilde e_N\,|\,x^{(n)}))>\epsilon\bigr\}
\le \frac{1}{\epsilon}\Bigl[\frac{K}{(N-n)^{1/k}}+\frac{2n}{N}\Bigr]
$$
holds true for all $n<N$, with $K=\tfrac43+4\cdot3^{2k}\,2^{m/2}\bigl(1+2(E\|x_1\|^\gamma)^{1/\gamma}\bigr)^{1/2}$.

Proof. Using (10) and applying the Jensen inequality twice, we obtain
$$
E[C_k(\tilde p)\,|\,x^{(n)}]\le\Bigl\{2^{m}+2^{m+1}\,E\Bigl[\Bigl(\int_{\mathbb R^m}\|x\|^\gamma\,\tilde p(dx)\Bigr)^{1/\gamma}\Big|\,x^{(n)}\Bigr]\Bigr\}^{1/2}.
$$
Combining Lemma 4 with this last inequality, Doob's martingale convergence theorem, the Markov inequality and the Jensen inequality concludes the proof. □

5.5. Examples

The application of the theorems of this section essentially requires conditions on the moments of the $x_i$. In the most common cases the marginal distribution of each observation is available; indeed, from a Bayesian point of view, the marginal distribution of each observation is usually treated as a prior guess at the mean of the unknown $\tilde p$. In the next three examples we review a few classical Bayesian nonparametric priors from this perspective.

Example 5 (Normalized random measures with independent increments). Probably the most celebrated example of a nonparametric prior is the Dirichlet process; see, for example, Ferguson (1973, 1974). A class of nonparametric priors which includes and generalizes the Dirichlet process is the class of so-called normalized random measures with independent increments, introduced in Regazzini et al. (2003) and studied, e.g., in Nieto-Barajas et al. (2004), Lijoi et al. (2005a,b), James (2005) and Sangalli (2006). To define a normalized random measure with independent increments, it is worth recalling that a random measure $\tilde\mu$ with independent increments on $\mathbb R^m$ is a random measure such that, for any collection $\{A_1,\dots,A_k\}$ ($k\ge1$) of pairwise disjoint measurable subsets of $\mathbb R^m$, the random variables $\tilde\mu(A_1),\dots,\tilde\mu(A_k)$ are stochastically independent. Random measures with independent increments are completely characterized by a measure $\nu$ on $\mathbb R^m\times\mathbb R_+$ via their Laplace functional. More precisely, for every $A$ in $\mathcal B(\mathbb R^m)$ and every positive $\lambda$ one has
$$
E\bigl(e^{-\lambda\tilde\mu(A)}\bigr)=\exp\Bigl(-\int_{A\times\mathbb R_+}(1-e^{-\lambda v})\,\nu(dx\,dv)\Bigr).
$$
A systematic account of these random measures is given, for example, in Kingman (1967). Following Regazzini et al. (2003), if $\int_{\mathbb R^m\times\mathbb R_+}(1-e^{-\lambda v})\,\nu(dx\,dv)<+\infty$ for every positive $\lambda$ and $\nu(\mathbb R^m\times\mathbb R_+)=+\infty$, then one defines a normalized random measure with independent increments by putting $\tilde p(\cdot):=\tilde\mu(\cdot)/\tilde\mu(\mathbb R^m)$. In point of fact, under the previous assumptions, $P\{\tilde\mu(\mathbb R^m)=0\}=0$; see Regazzini et al. (2003). The classical example is the Dirichlet process, obtained with $\nu(dx\,dv)=\alpha(dx)\rho(dv)=\alpha(dx)\,v^{-1}e^{-v}\,dv$, $\alpha$ being a finite measure on $\mathbb R^m$. Consider now a sequence $(x_i)_{i\ge1}$ of exchangeable random variables driven by $\tilde p$. When $\nu(dx\,dv)=\alpha(dx)\rho(dv)$, then $P\{x_i\in A\}=\alpha(A)/\alpha(\mathbb R^m)$ for every $i\ge1$. More generally,
$$
P\{x_i\in A\}=\int_{\mathbb R_+}\phi(\lambda)\Bigl[\int_{A\times\mathbb R_+}e^{-\lambda u}\,u\,\nu(dx\,du)\Bigr]\,d\lambda,
$$
where
$$
\phi(\lambda):=\exp\Bigl(-\int_{\mathbb R^m\times\mathbb R_+}(1-e^{-\lambda v})\,\nu(dy\,dv)\Bigr);
$$
see, e.g., Corollary 5.1 in Sangalli (2006). Hence, $E\|x_1\|^\gamma<+\infty$ if and only if
$$
\int_{\mathbb R_+}\phi(\lambda)\Bigl[\int_{\mathbb R^m\times\mathbb R_+}e^{-\lambda u}\,\|x\|^\gamma\,u\,\nu(dx\,du)\Bigr]\,d\lambda<+\infty.
$$
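In the Dirichlet case, the identity $P\{x_i\in A\}=\alpha(A)/\alpha(\mathbb R^m)$ can be checked by simulation through the Blackwell–MacQueen urn, the standard predictive rule of an exchangeable sequence driven by a Dirichlet process. The sketch below (function and variable names are ours, purely illustrative) uses base measure $\alpha=c\cdot N(0,1)$ with total mass $c=2$ and verifies that a later coordinate of the urn still has marginal $N(0,1)$:

```python
import random

def blackwell_macqueen(n, total_mass, base_sampler, rng):
    """Sample x_1,...,x_n from the Blackwell-MacQueen urn of a Dirichlet
    process with base measure alpha = total_mass * base_sampler's law."""
    xs = []
    for i in range(n):
        if rng.random() < total_mass / (total_mass + i):
            xs.append(base_sampler(rng))   # fresh draw from alpha/alpha(R^m)
        else:
            xs.append(rng.choice(xs))      # tie to a past observation
    return xs

rng = random.Random(0)
# marginal of x_10 across independent urns should be the normalized base N(0,1)
draws = [blackwell_macqueen(10, 2.0, lambda r: r.gauss(0.0, 1.0), rng)[-1]
         for _ in range(3000)]
m = sum(draws) / len(draws)              # should be close to 0
v = sum(d * d for d in draws) / len(draws)  # should be close to 1
```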
Example 6 (Species sampling sequences and stick-breaking priors). An exchangeable sequence of random variables $(x_n)_n$ is called a species sampling sequence (see Pitman, 1996) if, for each $n\ge1$,
$$
P\{x_{n+1}\in A\,|\,x^{(n)}\}=l_{0,n}\,\alpha(A)+\sum_{j=1}^{k(n)}l_{j,n}\,\delta_{x_j^*}(A)\qquad(A\in\mathcal X)
$$
and $P\{x_1\in A\}=\alpha(A)$,
with the proviso that $x_1^*,\dots,x_{k(n)}^*$ are the $k(n)$ distinct values of $x_1,\dots,x_n$, in the order in which they appear, $l_{j,n}$ ($j=0,\dots,k(n)$) are non-negative measurable functions of $(x_1,\dots,x_n)$, and $\alpha$ is some non-atomic probability measure on $(X,\mathcal X)$. See, among others, Hansen and Pitman (2000), Gnedin and Pitman (2005) and Pitman (1995, 2003). Of course, in this case it is a simple task to check conditions on the marginal distribution of each observation, since it coincides with $\alpha$. A particular kind of random probability law connected with species sampling sequences is the class of so-called stick-breaking priors. Such priors are almost surely discrete random probability measures that can be represented as
$$
\tilde p(\cdot)=\sum_{k=1}^{N}p_k\,\delta_{Z_k}(\cdot),
$$
where $(p_k)_{k\ge1}$ and $(Z_k)_{k\ge1}$ are independent, $0\le p_k\le1$ and $\sum_{k=1}^{N}p_k=1$ almost surely, and the $(Z_k)_{k\ge1}$ are independent and identically distributed random variables taking values in $X$ with common distribution $\alpha$. Stick-breaking priors can be constructed using either a finite or an infinite number of terms, $1\le N\le+\infty$. Usually,
$$
p_1=V_1,\qquad p_k=(1-V_1)(1-V_2)\cdots(1-V_{k-1})\,V_k,\quad k\ge2,
$$
where the $V_k$ are independent $\mathrm{Beta}(a_k,b_k)$ random variables with $a_k>0$, $b_k>0$; see Ishwaran and James (2001, 2003). It is clear that in this case $P\{x_i\in A\}=\alpha(A)$.

Example 7 (Pólya tree). Let $X=\mathbb R$ and let $E_j$ be the set of all sequences of 0s and 1s of length $j$. Moreover, set $E^*=\bigcup_j E_j$. For each $n$, let $T_n=\{B_\epsilon:\epsilon\in E_n\}$ be a partition of $\mathbb R$ such that, for all $\epsilon$ in $E^*$, $\{B_{\epsilon0},B_{\epsilon1}\}$ is a partition of $B_\epsilon$. Finally, let $\mathcal A=\{a_\epsilon:\epsilon\in E^*\}$ be a set of nonnegative real numbers. A random probability $\tilde p$ on $\mathbb R$ is said to be a Pólya tree with respect to the partition $T=\{T_n\}_n$ with parameter $\mathcal A$ if $\{\tilde p(B_{\epsilon0}\,|\,B_\epsilon):\epsilon\in E^*\}$ is a set of independent random variables and, for all $\epsilon$ in $E^*$, $\tilde p(B_{\epsilon0}\,|\,B_\epsilon)$ is $\mathrm{Beta}(a_{\epsilon0},a_{\epsilon1})$. See Mauldin and Williams (1990) and Lavine (1992, 1994). Under suitable conditions on $\mathcal A$, such a random probability does exist; see Theorem 3.3.2 in Ghosh and Ramamoorthi (2003). Moreover, if $(x_n)_{n\ge1}$ is an exchangeable sequence with driving measure $\tilde p$, then, for any $B_\epsilon$ with $\epsilon=\epsilon_1\epsilon_2\dots\epsilon_k$,
$$
P\{x_n\in B_\epsilon\}=\prod_{i=1}^{k}\frac{a_{\epsilon_1\epsilon_2\cdots\epsilon_i}}{a_{\epsilon_1\epsilon_2\cdots\epsilon_{i-1}0}+a_{\epsilon_1\epsilon_2\cdots\epsilon_{i-1}1}};
$$
see, e.g., Theorem 3.3.3 in Ghosh and Ramamoorthi (2003). In this case it is a difficult task to give explicit conditions for the existence of the moments of $x_i$. Nevertheless, Lavine suggests that if the partitions have the form $B_{\epsilon_1\dots\epsilon_i}=F^{-1}\bigl(\bigl(\sum_{j\le i}\epsilon_j/2^j,\ \sum_{j\le i}\epsilon_j/2^j+1/2^i\bigr]\bigr)$, $F$ being a continuous distribution function, and
$$
a_{\epsilon_1\epsilon_2\dots\epsilon_i}=\frac{a_{\epsilon_1\epsilon_2\dots\epsilon_{i-1}0}+a_{\epsilon_1\epsilon_2\dots\epsilon_{i-1}1}}{2},
$$
then $P\{x_n\le x\}=F(x)$.
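The product formula for $P\{x_n\in B_\epsilon\}=E[\tilde p(B_\epsilon)]$ can be checked numerically: a draw of $\tilde p(B_\epsilon)$ is a product of independent Beta splitting variables along the path $\epsilon$. The sketch below (names and the parameter choice $a_\epsilon=|\epsilon|^2$, a common Pólya-tree specification, are ours for illustration) estimates $E[\tilde p(B_{01})]$ by Monte Carlo and compares it with the product:

```python
import random

def polya_tree_mass(eps, a, rng):
    """One draw of p~(B_eps) for a Pólya tree: multiply independent
    Beta(a[path+'0'], a[path+'1']) splits along the path eps."""
    mass, path = 1.0, ""
    for bit in eps:
        v = rng.betavariate(a[path + "0"], a[path + "1"])  # p~(B_{path 0} | B_path)
        mass *= v if bit == "0" else (1.0 - v)
        path += bit
    return mass

# illustrative parameters a_eps = len(eps)^2 for the first two levels
a = {e: len(e) ** 2 for e in ["0", "1", "00", "01", "10", "11"]}
rng = random.Random(1)
eps = "01"
est = sum(polya_tree_mass(eps, a, rng) for _ in range(20000)) / 20000
# product formula: prod_i a_{e1..ei} / (a_{e1..e_{i-1}0} + a_{e1..e_{i-1}1})
exact = (a["0"] / (a["0"] + a["1"])) * (a["01"] / (a["00"] + a["01"]))
```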
Acknowledgements

I am grateful to Eugenio Regazzini for providing much of the inspiration behind this paper. I also wish to thank Luca Monno, who is a virtual coauthor of this paper, Laura Sangalli and an anonymous referee for helpful comments.

References

Alexander, K.S., 1984. Probability inequalities for empirical processes and a law of the iterated logarithm. Ann. Probab. 12, 1041–1067.
Bassetti, F., Bissiri, P.G., 2007. Finitary Bayesian statistical inference through partitions tree distributions. Sankhyā 69, 808–841.
Bassetti, F., Bissiri, P.G., 2008. Random partition model and finitary Bayesian statistical inference. Sankhyā 70, 1–21.
Bassetti, F., Regazzini, E., 2008. The unsung de Finetti's first paper about exchangeability. Rend. Mat. 28.
de Finetti, B., 1930. Funzione caratteristica di un fenomeno aleatorio. Memorie della Reale Accademia dei Lincei, Vol. IV, pp. 86–133.
de Finetti, B., 1937. La prévision: ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincaré 7, 1–68.
Diaconis, P., Freedman, D., 1980. Finite exchangeable sequences. Ann. Probab. 8, 745–764.
Dudley, R.M., 1968. The speed of mean Glivenko–Cantelli convergence. Ann. Math. Statist. 40, 40–50.
Dudley, R.M., 2002. Real Analysis and Probability. Cambridge University Press, Cambridge.
Ferguson, T.S., 1973. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209–230.
Ferguson, T.S., 1974. Prior distributions on spaces of probability measures. Ann. Statist. 2, 615–629.
Ghosh, J.K., Ramamoorthi, R.V., 2003. Bayesian Nonparametrics. Springer-Verlag, New York.
Gnedin, A., Pitman, J., 2005. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325, 83–102, 244–245.
Hansen, B., Pitman, J., 2000. Prediction rules for exchangeable sequences related to species sampling. Statist. Probab. Lett. 46, 251–256.
Ishwaran, H., James, L.F., 2001. Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96, 161–173.
Ishwaran, H., James, L.F., 2003. Some further developments for stick-breaking priors: finite and infinite clustering and classification. Sankhyā 65, 577–592.
James, L.F., 2005. Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages. Ann. Statist. 33, 1771–1799.
Kalashnikov, V.V., Rachev, S.T., 1990. Mathematical Methods for Construction of Queueing Models. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.
Kingman, J.F.C., 1967. Completely random measures. Pacific J. Math. 21, 59–78.
Lavine, M., 1992. Some aspects of Pólya tree distributions for statistical modelling. Ann. Statist. 20, 1222–1235.
Lavine, M., 1994. More aspects of Pólya tree distributions for statistical modelling. Ann. Statist. 22, 1161–1176.
Ledoux, M., Talagrand, M., 1991. Probability in Banach Spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag, Berlin.
Lijoi, A., Mena, R.H., Prünster, I., 2005a. Bayesian nonparametric analysis for a generalized Dirichlet process prior. Statist. Inference Stochastic Process. 8, 283–309.
Lijoi, A., Mena, R.H., Prünster, I., 2005b. Hierarchical mixture modeling with normalized inverse-Gaussian priors. J. Amer. Statist. Assoc. 100, 1278–1291.
Mauldin, R.D., Williams, S.C., 1990. Reinforced random walks and random distributions. Proc. Amer. Math. Soc. 110, 251–258.
Nieto-Barajas, L.E., Prünster, I., Walker, S.G., 2004. Normalized random measures driven by increasing additive processes. Ann. Statist. 32, 2343–2360.
Pitman, J., 1995. Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102, 145–158.
Pitman, J., 1996. Some developments of the Blackwell–MacQueen urn scheme. In: Statistics, Probability and Game Theory. IMS Lecture Notes–Monograph Series, vol. 30. Institute of Mathematical Statistics, Hayward, CA, pp. 245–267.
Pitman, J., 2003. Poisson–Kingman partitions. In: Statistics and Science: A Festschrift for Terry Speed. IMS Lecture Notes–Monograph Series, vol. 40. Institute of Mathematical Statistics, Beachwood, OH, pp. 1–34.
Rachev, S.T., 1991. Probability Metrics and the Stability of Stochastic Models. John Wiley & Sons Ltd., Chichester.
Regazzini, E., Lijoi, A., Prünster, I., 2003. Distributional results for means of normalized random measures with independent increments. Ann. Statist. 31, 560–585.
Sangalli, L., 2006. Some developments of the normalized random measures with independent increments. Sankhyā 68, 461–487.
van der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes. Springer-Verlag, New York.