Journal of Statistical Planning and Inference 142 (2012) 2844–2861
Finite population corrections for multivariate Bayes sampling

Simon C. Shaw^a,*, Michael Goldstein^b

^a Department of Mathematical Sciences, University of Bath, Bath BA2 7AY, UK
^b Department of Mathematical Sciences, Durham University, Science Laboratories, South Road, Durham DH1 3LE, UK
Article history: Received 17 October 2011; received in revised form 23 March 2012; accepted 2 April 2012; available online 9 April 2012.

Abstract
We consider the adjustment, based upon a sample of size n, of collections of vectors drawn from either an infinite or a finite population. The vectors may be judged to be either normally distributed or, more generally, second-order exchangeable. We develop the work of Goldstein and Wooff (1998) to show how the familiar univariate finite population corrections (FPCs) naturally generalise to individual quantities in the multivariate population. The types of information we gain by sampling are identified with the orthogonal canonical variable directions derived from a generalised eigenvalue problem. These canonical directions share the same co-ordinate representation for all sample sizes and, for equally defined individuals, all population sizes, enabling simple comparisons between both the effects of different sample sizes and of different population sizes. We conclude by considering how the FPC is modified for multivariate cluster sampling with exchangeable clusters. In univariate two-stage cluster sampling, we may decompose the variance of the population mean into the sum of the variance of cluster means and the variance of the cluster members within clusters. The first term has an FPC relating to the sampling fraction of clusters; the second has an FPC relating to the within-cluster sampling fraction. We illustrate how this generalises in the multivariate case. We decompose the variance into two terms: the first relating to multivariate finite population sampling of clusters and the second to multivariate finite population sampling within clusters. We solve two generalised eigenvalue problems to show how to generalise the univariate to the multivariate: each of the two FPCs attaches to one, and only one, of the two eigenbases.
Keywords: Finite population correction; Two-stage cluster sampling; Canonical directions; Canonical resolutions; Second-order exchangeability; Bayes linear methods
1. Introduction

A fundamental result in sampling theory is the finite population correction (FPC) formula for a random sample drawn without replacement from a finite population. The FPC corrects the variance of a sample mean to take account of the sampling fraction for the data. This allows us to judge how large the population must be in order to ignore such corrections, and gives a simple multiplier to reduce the variance when this correction is not ignorable. The Bayesian counterpart of the FPC is of similar form. The direct counterpart to the sampling-theoretic FPC is the corresponding Bayes linear mean and variance for a population mean based on a random sample without replacement from a finite population whose members are judged exchangeable, as this analysis is also based only on mean and variance judgements. The full Bayes counterpart to this FPC corresponds to the Gaussian likelihood and prior distribution for the
* Corresponding author. Tel.: +44 1225 386106; fax: +44 1225 386492.
E-mail addresses: [email protected] (S.C. Shaw), [email protected] (M. Goldstein).
0378-3758/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jspi.2012.04.001
population mean of the finite collection. For either version, the FPC again corrects the variance for the sample mean, providing a simple determination of sufficient sample size for finite population Bayesian sampling. However, each FPC above relates to a univariate sample. When we take a sample of exchangeable vectors from a finite population, the corresponding form for the FPC depends on a matrix inverse which requires evaluation for each choice of sample size. It is therefore not straightforward to determine whether the sampling fraction is ignorable when planning a multivariate Bayesian sampling design for a finite population and, when the correction is not ignorable, to provide a simple guide to sample size determination. The problem is further complicated by the requirement that we may also wish to learn about linear combinations of elements of the population mean vector. Sampling of exchangeable vectors from a finite population can arise in a number of real-world contexts. For example, we might seek to assess total economic activity in a region from sampling a non-trivial proportion of economic units in that region. In the modelling of a complex system, at any given time there may be a number of models of the system, for example from different research groups. As Tebaldi et al. (2011, p. 639) write ‘it is believed that comparing and synthesising simulations of multiple models is key to quantifying a best estimate of the future changes and its uncertainty.’ In applications such as climate change (Tebaldi and Knutti, 2007) the multi-model ensembles are treated as infinitely exchangeable. Rougier et al. (submitted for publication) describe the framework when the multi-model ensembles are judged infinitely second-order exchangeable and co-exchangeable with the true values of the system. In such cases, treating the ensemble as finitely exchangeable is more natural and meaningful. Vernon et al. 
(2010) carry out a second-order analysis for outputs of an exchangeable collection of 512 models, each relating to galaxy formation in the presence of dark matter, sampling 40 of them. Whilst they do account for the uncertainty involved in this subsampling, a full accounting of the finiteness would be preferable and would have increased in practical importance as the size of the sampling fraction increased. In this paper, we show that there is a simple and natural representation of the FPC for multivariate Bayesian sampling from a finite population, which is suitable for planning multivariate sampling designs, and we illustrate the use of the multivariate FPC for setting appropriate sample sizes. We then show that the results carry over to the case of two-stage multivariate cluster sampling, where each stage can be viewed as sampling exchangeable vectors from a finite population.

The paper proceeds as follows. In Section 2 we recall the FPC in classical univariate simple random sampling and compare this to the Bayesian analogue using normal modelling in both the univariate and multivariate cases. In Section 3 we show how the univariate FPC generalises to individual quantities in the multivariate population and how we can easily compare both different sample sizes and different population sizes. In Section 4 we illustrate the theory using an example concerning examination data. We extend the model to encompass two-stage cluster sampling in Section 5, concluding in Section 6 with an example illustrating the theory.

2. Univariate and multivariate sampling

2.1. Classical univariate simple random sampling

We draw a sample of n individuals from a population and make a single measurement, X, on each individual. We let X_i denote the measurement for the ith individual and X̄ = (1/n)∑_{i=1}^{n} X_i the sample mean. Suppose that the population is infinite with mean μ and variance σ² = 1/ρ and that we take a simple random sample. Then

E(X̄ | μ, σ) = μ,  Var^{-1}(X̄ | μ, σ) = nρ.  (1)
Consider, instead, the scenario when the population is finite, containing only N members. We make the following definition.

Definition 1. The finite population correction (FPC) for a sample of size n drawn from a population of size N is defined to be

g(N,n) = (1 − (n−1)/(N−1))^{-1}.  (2)
Let the population mean be μ_N = (1/N)∑_{i=1}^{N} X_i and the population variance σ_N² = (1/N)∑_{i=1}^{N} (X_i − μ_N)² = 1/ρ_N. For a random sample, without replacement, we find that

E(X̄ | μ_N, σ_N) = μ_N,  Var^{-1}(X̄ | μ_N, σ_N) = g(N,n)nρ_N.  (3)
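The correction in (2) and (3) can be checked by exhaustive enumeration of all samples drawn without replacement from a small finite population. The following is a minimal sketch; the population values are invented for illustration only.

```python
from itertools import combinations

def g(N, n):
    # Definition 1: g(N, n) = (1 - (n - 1)/(N - 1))^{-1}.
    return 1.0 / (1.0 - (n - 1) / (N - 1))

# Small invented finite population, enumerated exhaustively for illustration.
pop = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
N = len(pop)
mu_N = sum(pop) / N
var_N = sum((x - mu_N) ** 2 for x in pop) / N      # sigma_N^2 = 1/rho_N

# Exact variance of the sample mean over all samples of size n drawn
# without replacement (every subset of size n is equally likely).
n = 3
means = [sum(s) / n for s in combinations(pop, n)]
exact_var = sum((m - mu_N) ** 2 for m in means) / len(means)

# Eq. (3) rearranged: Var(Xbar) = sigma_N^2 / (g(N, n) * n).
predicted_var = var_N / (g(N, n) * n)
```

The two variances agree exactly, which is the content of (3): the precision of the sample mean is nρ_N inflated by the factor g(N,n).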
Notice how the FPC, as given by (2), expresses the increase in sampling precision as a fraction of the population observed.

2.2. Bayesian univariate sampling theory

The classical model of simple random sampling may be viewed, in a Bayesian context, as a judgement of exchangeability. If the infinite sequence of random quantities X_1, X_2, … is judged exchangeable then, conditional on an unknown distribution function F, X_1, X_2, … are independent (de Finetti, 1937; Hewitt and Savage, 1955). One conventional
model is that the X_i are judged to be independent and identically distributed as N(μ, 1/ρ), where the prior for μ is N(m_0, 1/r_0). Hence, we judge our prior for F to be non-zero only on the subspace of normal distributions with precision ρ. The posterior for μ given X_1, …, X_n is N(m_n, 1/r_n), where

m_n = (r_0 m_0 + nρX̄)/(r_0 + nρ),  r_n = r_0 + nρ.  (4)
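The conjugate update in (4) can be sketched as a short function; the numbers below are invented, for illustration only.

```python
def normal_update(m0, r0, rho, xbar, n):
    """Posterior mean and precision for mu in Eq. (4): prior N(m0, 1/r0),
    i.i.d. observations N(mu, 1/rho) with sample mean xbar."""
    rn = r0 + n * rho
    mn = (r0 * m0 + n * rho * xbar) / rn
    return mn, rn

# Invented illustrative numbers.
mn, rn = normal_update(m0=0.0, r0=0.25, rho=1.0, xbar=2.0, n=10)
```

As r_0 → 0 the posterior mean tends to the sample mean and the posterior precision to nρ, recovering the classical answer (1).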
Note that if r_0 → 0, to represent weak prior information about μ, then m_n → X̄ and r_n → nρ, corresponding exactly with the classical simple random sampling approach; see (1). The corresponding formulation when we sample from a finite population, of size N, is to view the X_i as independent and identically distributed N(θ, 1/ρ), where the prior distribution of θ is N(m_0, 1/r_0). This framework was developed by Ericson (1969) with extensions by, for example, Royall and Pfeffermann (1982), Smouse (1984) and Bolfarine (1990). O'Hagan and Forster (2004, Section 14.30) and Ghosh and Meeden (1997) provide textbook discussions. The prior for μ_N is thus N(m_0, 1/r̃_0), where r̃_0^{-1} = r_0^{-1} + (1/N)ρ^{-1}, and the posterior for μ_N, given X_1, …, X_n, is N(m̃_n, 1/r̃_n), where

m̃_n = (r̃_0 m_0 + g(N,n)nr̃X̄)/(r̃_0 + g(N,n)nr̃),  r̃_n = r̃_0 + g(N,n)nr̃  (5)
and r̃ = Var^{-1}(X_i | μ_N) = (1 − 1/N)^{-1}ρ. In this Bayesian setting, (5) relates to (4) in exactly the same way as (3) relates to (1) in the classical framework.

2.3. Bayesian multivariate sampling theory

We now explore whether the simplicity of the univariate approach, exhibited by the posterior means and precisions given in (4) and (5), remains when we make multivariate measurements. Suppose that we wish to make the same series of measurements C = {X_1, …, X_{v_0}} on each individual in a sample. The measurements for the ith such individual are denoted by C_i = {X_{1i}, …, X_{v_0 i}}. The full population collection is formed as the union of all of the elements in all of the collections C_i and is denoted by C*. Consider the case when the population is infinite and we judge the collections C_1, C_2, … to be independent and identically distributed multivariate normal random quantities with expectation vector m(C) = [m(X_1) ⋯ m(X_{v_0})]^T and known variance matrix Σ. The prior for m(C) is also judged to be multivariate normal with known expectation vector m_0 and known variance matrix Σ_0. For simplicity of exposition, we assume that Σ and Σ_0 are positive definite; otherwise, the corresponding Moore–Penrose inverses should be used to obtain equivalent results to those that follow. The precision matrices corresponding to Σ and Σ_0 are denoted by R and R_0. We observe the measurements for the first n individuals, C(n) = ∪_{i=1}^{n} C_i, and let C̄(n) = {X̄_1, …, X̄_{v_0}} denote the collection of sample averages. Given C(n), the posterior distribution for m(C) is multivariate normal with expectation vector m_n and precision matrix R_n, where

m_n = {R_0 + nR}^{-1}{R_0 m_0 + nR C̄(n)},  R_n = R_0 + nR.  (6)
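The matrix update (6) can be sketched with numpy; the two-dimensional specification below is invented, for illustration only.

```python
import numpy as np

def mvn_update(m0, Sigma0, Sigma, cbar, n):
    """Posterior mean vector and precision matrix in Eq. (6):
    m_n = (R0 + n R)^{-1} (R0 m0 + n R cbar), R_n = R0 + n R."""
    R0 = np.linalg.inv(Sigma0)
    R = np.linalg.inv(Sigma)
    Rn = R0 + n * R
    mn = np.linalg.solve(Rn, R0 @ m0 + n * R @ cbar)
    return mn, Rn

# Invented two-dimensional specification, for illustration only.
Sigma0 = np.array([[1.0, 0.3], [0.3, 2.0]])
Sigma = np.array([[0.5, 0.1], [0.1, 0.5]])
m0 = np.zeros(2)
cbar = np.array([1.0, -1.0])
mn, Rn = mvn_update(m0, Sigma0, Sigma, cbar, n=4)
```

As n grows the data term dominates and the posterior mean approaches the vector of sample averages, mirroring the univariate behaviour under (4).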
The multivariate case mimics the univariate: the univariate quantities in (4) are replaced by their matrix equivalents in (6). The same generalisation occurs when the population consists of only N individuals, the total collection being C(N) = ∪_{i=1}^{N} C_i. Let m_N(X_v) = (1/N)∑_{i=1}^{N} X_{vi} denote the population mean for the vth measurement and m_N(C) = {m_N(X_1), …, m_N(X_{v_0})} the collection of population means. We judge that the distribution of any subset of C(N) is identical to that for an equivalently sized subset of individuals drawn from the infinite population. The prior distribution of m_N(C) is thus normal with expectation vector m_0 and precision matrix R̃_0, where R̃_0^{-1} = R_0^{-1} + (1/N)R^{-1}. The posterior distribution for m_N(C), given C(n), is multivariate normal with expectation vector m̃_n and precision matrix R̃_n, where

m̃_n = {R̃_0 + g(N,n)nR̃}^{-1}{R̃_0 m_0 + g(N,n)nR̃ C̄(n)},  R̃_n = R̃_0 + g(N,n)nR̃  (7)
and R̃ = (N/(N−1))R is the posterior precision for C_i given m_N(C). We observe the mimicry between the univariate and multivariate cases in the finite population setting: in (7) the matrix equivalents have replaced the corresponding univariate quantities in (5). However, the apparent simplicity of the multivariate generalisation, displayed by (6) and (7), conceals the complex way in which changes in n modify the posterior uncertainties for the individual quantities. We now extend Goldstein and Wooff (1998) to show that there is a natural generalisation from the univariate to the multivariate case which preserves the simplicity of the relationship between sample size and posterior mean and precision for individual quantities within the population collection.

3. Finite population corrections for the Bayes linear analysis

3.1. Sampling theory using Bayes linear methods

The infinite models described in Sections 2.2 and 2.3 required a further assumption, that of normality, in addition to that of exchangeability. For example, the finite model in Section 2.2 involves the use of a hyperparameter, θ, to generate the joint prior distribution of X_1, …, X_N. As Ericson (1969, p. 198) writes 'the generation of a joint prior distribution by this approach
is, barring differences in probabilistic interpretation, equivalent to viewing the finite population as a sample from an infinite superpopulation having unknown parameter θ.' It is implicit in this approach that the finitely exchangeable sequence X_1, …, X_N may be embedded in an infinitely exchangeable sequence of equivalently defined random quantities. However, see for example Bernardo and Smith (1994, p. 171), a finitely exchangeable sequence cannot always be embedded into a larger finitely exchangeable sequence, much less an infinitely exchangeable sequence. An alternative approach is to judge the population members to be second-order exchangeable (Goldstein, 1986) and to update these beliefs using the Bayes linear methodology, where the adjusted expectation of a random quantity X given observation of a collection of quantities D is denoted by E_D(X) and the corresponding adjusted variance by Var_D(X); see Goldstein and Wooff (2007, Chapter 3) for a detailed explanation of Bayes linear belief adjustment and Goldstein (1999) for an overview. As Hartigan (1969, p. 447) points out, if X and D are jointly normally distributed, then the adjusted quantities coincide with the usual definitions of conditional expectation, that is E_D(X) = E(X | D) and Var_D(X) = Var(X | D). By virtue of proceeding from only a second-order specification, this Bayes linear approach to population sampling under second-order exchangeability is closer, in spirit, to the classical simple random sampling approach described in Section 2.1.

3.2. Sampling from an infinite population

We now consider the problem of Section 2.3 using a model of second-order exchangeability and adjustment using the Bayes linear approach. For second-order exchangeability, as Goldstein (1986, p. 973) writes, 'what is actually required is the consideration of two cases, with all other values following from the perceived "symmetries" between cases': our considerations are not dependent upon the size of the population. In this section we shall consider the case when the full collection C* is formed from an infinite number of individuals.

Assumption 1. We judge that the collection C is second-order exchangeable over the full collection C*. For all i ≠ j, our second-order specifications thus take the form

E(C_i) = m_0,  Var(C_i) = D,  Cov(C_i, C_j) = C.
Using the second-order representation theorem of Goldstein (1986), we may write

C_i = m(C) + R_i(C),  (8)

where m(C) is the limit, in mean square, of m_N(C) = (1/N)∑_{i=1}^{N} C_i and R_i(C) = C_i − m(C). For all i, the R_i(C) are mutually uncorrelated and also uncorrelated with m(C). We consider the adjustment of our beliefs following the observation of C(n) = ∪_{i=1}^{n} C_i. In this context, the conditional results given by (6) will match our adjusted beliefs for m(C) when D = R_0^{-1} + R^{-1} and C = R_0^{-1}, though we now consider individual quantities in the population collection.

Let ⟨C⟩ denote the collection of linear combinations Z = ∑_{v=1}^{v_0} a_v X_v of elements of C. For each Z ∈ ⟨C⟩ and each individual i, we may construct Z_i, the value of Z for individual i, as Z_i = ∑_{v=1}^{v_0} a_v X_{vi}. We construct linear combinations m(Z) = ∑_{v=1}^{v_0} a_v m(X_v) of the elements of m(C), whilst linear combinations of the sample averages C̄(n) are denoted by Z̄ = ∑_{v=1}^{v_0} a_v X̄_v. ⟨C_i⟩, ⟨m(C)⟩ and ⟨C̄(n)⟩ respectively denote the collections of linear combinations of the elements of C_i, m(C) and C̄(n). Hence, the labelling convention is such that, for any Z ∈ ⟨C⟩, Z_i, m(Z) and Z̄ share the same co-ordinate representation.

Definition 2. The underlying canonical variable directions are defined as the columns of the matrix W = [W_1 … W_{v_0}] = H, where H solves the generalised eigenvalue problem

Σ_0 H = (Σ + Σ_0)HF,  (9)

where F = diag(f_1, …, f_{v_0}) is the matrix of eigenvalues, Σ = D − C and Σ_0 = C. H is normed so that H^T Σ H = I. The ordered eigenvalues 1 > f_1 ≥ ⋯ ≥ f_{v_0} > 0 are termed the underlying canonical variable resolutions.

Let W_{vs} denote the vth component of the sth canonical variable direction and, for each s = 1, …, v_0, define Y_s ∈ ⟨C⟩ to be Y_s = ∑_{v=1}^{v_0} W_{vs} X_v. The underlying canonical variable directions and resolutions provide the wherewithal to generalise the univariate results of Section 2.2 to individual quantities in the multivariate population. The following theorem may be obtained as a reformulation of Goldstein and Wooff (1998, Theorem 3).

Theorem 1. For a sample of size n drawn from an infinite population, the collection m(Y) = {m(Y_1), …, m(Y_{v_0})} forms a basis for ⟨m(C)⟩. The m(Y_s) are a priori uncorrelated and, for all samples of any size, a posteriori uncorrelated. The posterior adjusted expectation, m_{ns} = E_{C(n)}{m(Y_s)}, and posterior adjusted precision, r_{ns} = Var^{-1}_{C(n)}{m(Y_s)}, for m(Y_s) are given by

m_{ns} = (r_{0s} m_{0s} + nr_s Ȳ_s)/(r_{0s} + nr_s),  r_{ns} = r_{0s} + nr_s,  (10)

where m_{0s}, r_{0s} are the prior expectation and precision for m(Y_s) and r_s is the adjusted precision for any individual Y_{si} given m(C). The collection of univariate quantities in (10) are expressed identically to those in (4). There is a natural generalisation from the univariate to the multivariate for individual quantities in the population collection ⟨m(C)⟩ and these quantities
remain the same for each choice of n. The qualitative and quantitative features of the update remain the same no matter the sample size and may be obtained from the solution of a single generalised eigenvalue problem. Note that the actual values of r_{0s} and r_s are easily obtained. Under the scalings used in Definition 2 we have that, for each s = 1, …, v_0, r_{0s} = f_s^{-1}(1 − f_s) and r_s = 1. However, the f_s have a more fundamental interpretive role. Let Res_{ns} denote the resolution of m(Y_s) given C(n), that is, the proportion of the variance of m(Y_s) resolved by the observation of C(n). We have

Res_{ns} = 1 − r_{0s}/r_{ns} = nf_s/((n−1)f_s + 1),  (11)

so that f_s = Res_{1s} is the resolution of m(Y_s) given C(1): the proportion of variance of m(Y_s) resolved by a single observation. As the collection m(Y) forms a basis for ⟨m(C)⟩ then, for any Z ∈ ⟨C⟩, we have m(Z) = ∑_{s=1}^{v_0} Cov{m(Z), m(Y_s)} r_{0s} m(Y_s), from which

E_{C(n)}{m(Z)} = ∑_{s=1}^{v_0} Cov{m(Z), m(Y_s)} r_{0s} m_{ns},  Var_{C(n)}{m(Z)} = ∑_{s=1}^{v_0} Cov{m(Z), m(Y_s)}² r_{0s}² r_{ns}^{-1}.
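The canonical structure of Definition 2 and the resolution formula (11) can be checked numerically. The sketch below assumes, as above, that Σ is the residual variance and Σ_0 the prior variance of m(C); the matrices are invented positive-definite examples, for illustration only.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

def canonical_resolutions(Sigma0, Sigma):
    """Solve the generalised eigenvalue problem of Definition 2,
    Sigma_0 H = (Sigma + Sigma_0) H F, returning the eigenvalues f
    (the canonical resolutions) in descending order with their vectors."""
    f, H = eigh(Sigma0, Sigma0 + Sigma)      # SciPy returns ascending order
    idx = np.argsort(f)[::-1]
    return f[idx], H[:, idx]

def resolution(n, f):
    # Eq. (11): Res_ns = n f_s / ((n - 1) f_s + 1).
    return n * f / ((n - 1) * f + 1)

# Invented positive-definite Sigma0 (prior variance of m(C)) and Sigma
# (residual variance); the numbers are illustrative only.
v0 = 3
A = rng.normal(size=(v0, v0))
B = rng.normal(size=(v0, v0))
Sigma0 = A @ A.T + v0 * np.eye(v0)
Sigma = B @ B.T + v0 * np.eye(v0)

f, H = canonical_resolutions(Sigma0, Sigma)

# Direct Bayes linear check: adjusting m(C) by the sample mean of n vectors
# gives adjusted variance Sigma0 - Sigma0 (Sigma0 + Sigma/n)^{-1} Sigma0.
n = 5
Vadj = Sigma0 - Sigma0 @ np.linalg.solve(Sigma0 + Sigma / n, Sigma0)
h = H[:, 0]
res_direct = 1 - (h @ Vadj @ h) / (h @ Sigma0 @ h)
```

A single v_0 × v_0 eigensolve thus yields the resolutions for every sample size at once, which is the computational point of the canonical directions.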
Hence, from the ordering of the f_s and (11), we see that for a sample of size n, subject to being uncorrelated with m(Y_1), …, m(Y_j), quantities proportional to m(Y_{j+1}) have the largest resolution, or equivalently the smallest ratio of posterior to prior variance. As Goldstein and Wooff (1998) explain, for each choice of n, m(Y) forms an orthogonal grid over ⟨m(C)⟩ for which we expect to learn most, in terms of variance reduction, about those quantities with large correlations with the early m(Y_s). It is straightforward to utilise (11), see for example Goldstein and Wooff (1998, Corollary 1), to simplify any design problem for which it is necessary to choose the sample size required to achieve a specified variance reduction over elements of ⟨m(C)⟩.

3.3. Sampling from a finite population

We now show that a similar generalisation occurs when we explore the finite case. We operate under the model given by Assumption 1, the only difference being that we now assume that the full collection C* contains N individuals rather than an infinite number. In this finite setting, from Goldstein (1986), we may write

C_i = m_N(C) + R_{N,i}(C),  (12)
where m_N(C) = (1/N)∑_{i=1}^{N} C_i and R_{N,i}(C) = C_i − m_N(C). For all i, the R_{N,i}(C) are uncorrelated with m_N(C), but the finite nature of the population induces a correlation between R_{N,i}(C) and R_{N,j}(C) for each i ≠ j, the correlation being of order 1/N. Note that there is no need for an implicit assumption of a superpopulation from which the finite population is a sample. The justification for the prior specification and use of m_N(C) comes directly from a judgement of second-order exchangeability and the resulting representation theorem, (12). The natural relationship between m(C) and m_N(C) also follows from this: the former is the limit, in mean square, of the latter as N → ∞.

Let ⟨m_N(C)⟩ denote the collection of linear combinations m_N(Z) = ∑_{v=1}^{v_0} a_v m_N(X_v) of the elements of m_N(C) and consider the adjustment of quantities contained in ⟨m_N(C)⟩ given C(n). We have the following theorem; the proof is in the appendix.

Theorem 2. For a sample of size n drawn from a population of size N, the collection m_N(Y) = {m_N(Y_1), …, m_N(Y_{v_0})} forms a basis for ⟨m_N(C)⟩. The m_N(Y_s) are a priori uncorrelated and, for all samples of any size, a posteriori uncorrelated. The posterior adjusted expectation, m̃_{ns} = E_{C(n)}{m_N(Y_s)}, and posterior adjusted precision, r̃_{ns} = Var^{-1}_{C(n)}{m_N(Y_s)}, for m_N(Y_s) are given by

m̃_{ns} = (r̃_{0s} m_{0s} + g(N,n)nr̃_s Ȳ_s)/(r̃_{0s} + g(N,n)nr̃_s),  r̃_{ns} = r̃_{0s} + g(N,n)nr̃_s,  (13)
where m_{0s}, r̃_{0s} are the prior expectation and precision for m_N(Y_s), r̃_s is the adjusted precision for Y_{si} given m_N(C), and g(N,n) is the FPC.

The univariate quantities in (13) are expressed identically to those in (5), showing that there is also a natural generalisation for quantities in the population collection ⟨m_N(C)⟩ in this finite setting as well as in the infinite setting. Once again, these quantities remain the same for each choice of n and can be derived from the solution of the generalised eigenvalue problem given in Definition 2. The quantitative information may also be directly obtained from this solution. Letting Res_{(N,n)s} denote the resolution of m_N(Y_s) given C(n) then, from the proof of Theorem 2, we have that

Res_{(N,n)s} = 1 − r̃_{0s}/r̃_{ns} = n{(N−1)f_s + 1}/(N{(n−1)f_s + 1})  (14)
with f_s = (N−1)^{-1}(N Res_{(N,1)s} − 1). As m_N(Y) forms a basis for ⟨m_N(C)⟩ then, for any Z ∈ ⟨C⟩, we have

m_N(Z) = ∑_{s=1}^{v_0} Cov{m_N(Z), m_N(Y_s)} r̃_{0s} m_N(Y_s) = ∑_{s=1}^{v_0} Cov{m(Z), m(Y_s)} r_{0s} m_N(Y_s),

so that it is straightforward to obtain the adjusted mean and variance of m_N(Z). The collection m_N(Y) forms an orthogonal grid over ⟨m_N(C)⟩ for which we expect to learn most, in terms of variance reduction, about those quantities with large correlations with the early m_N(Y_s). The impact of the sample size upon the quantitative features of the update is thus easy to assess. For example, if we are interested in choosing a sample size to achieve a specified variance reduction over elements in ⟨m_N(C)⟩, then we have the following corollary to aid us in this process.

Corollary 1. For any m_N(Y_s) ∈ m_N(Y), the sample size n required to achieve a proportionate variance reduction of 0 < k < 1 for m_N(Y_s), that is so that Var_{C(n)}{m_N(Y_s)} ≤ (1−k)Var{m_N(Y_s)}, is n ≥ Nkr̃_{0s}/{(N−1)(1−k)r̃_s + kr̃_{0s}}. If f_ŝ is the minimal eigenvalue, then to achieve a proportionate variance reduction of k for every element of ⟨m_N(C)⟩ we require a sample size, rounded up, of Nkr̃_{0ŝ}/{(N−1)(1−k)r̃_ŝ + kr̃_{0ŝ}}.

Note that, in addition to not depending upon n, the orthogonal grid formed by m_N(Y) shares the same co-ordinate representation for each N, and also with the orthogonal grid that m(Y) forms over ⟨m(C)⟩. We need only solve the generalised eigenvalue problem given in Definition 2 to obtain the solution for any choice of n and N. Irrespective of sample size or population size, the qualitative features of the update are the same, whilst it is straightforward to assess the impact of the population size upon the quantitative features of the update. We illustrate this by forming a direct comparison between the results of Theorems 1 and 2. By first obtaining r_{0s}/r_{ns} from (11) and r̃_{0s}/r̃_{ns} from (14), we have

r̃_{0s}/r̃_{ns} = (1 − n/N) r_{0s}/r_{ns}.  (15)

Using (15), we may express the expectations in (13) as

m̃_{ns} = (1 − n/N)m_{ns} + (n/N)Ȳ_s.  (16)
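The finite and infinite resolutions, the FPC relation (15) and the sample-size rule of Corollary 1 reduce to a few scalar formulas; a minimal sketch, with invented illustrative values:

```python
import math

def res_infinite(n, f):
    # Eq. (11): resolution from a sample of size n, infinite population.
    return n * f / ((n - 1) * f + 1)

def res_finite(N, n, f):
    # Eq. (14): resolution from a sample of size n, population of size N.
    return n * ((N - 1) * f + 1) / (N * ((n - 1) * f + 1))

def sample_size(N, k, r0s, rs):
    # Corollary 1: smallest n with Var_{C(n)}{m_N(Y_s)} <= (1 - k) Var{m_N(Y_s)}.
    return math.ceil(N * k * r0s / ((N - 1) * (1 - k) * rs + k * r0s))

# Relation (15): the finite ratio of adjusted to prior variance equals the
# infinite ratio times the correction 1 - n/N (invented illustrative values).
f, N, n = 0.4, 50, 10
lhs = 1 - res_finite(N, n, f)                 # rtilde_0s / rtilde_ns
rhs = (1 - n / N) * (1 - res_infinite(n, f))  # (1 - n/N) r_0s / r_ns
```

The sample-size bound is sharp: one observation fewer than the returned value fails to achieve the target reduction.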
Eq. (15) shows that, for each s = 1, …, v_0, the ratio of adjusted to prior variance of m_N(Y_s) is equal to the ratio of adjusted to prior variance of m(Y_s) multiplied by the finite population correction term 1 − n/N. For the corresponding expectations, (16) reveals that the expectation of each m_N(Y_s) given C(n) is a weighted average of the expectation of m(Y_s) given C(n) and the observed mean Ȳ_s, the weights depending upon the fraction of the total population observed in the sample.

3.4. Extendible second-order exchangeable populations

As we discussed in Section 3.1, a finite sequence could form part of a larger (possibly infinite) sequence of second-order exchangeable collections, or it may not be embeddable in any longer sequence. The results of Theorem 2, derived entirely from beliefs over observable random quantities, hold irrespective of whether the finite sequence may be embedded in a larger one. It is, however, natural to consider the circumstances when a second-order exchangeable sequence may be embedded, or extended, into a longer sequence of similarly defined random quantities.

Definition 3. Suppose that the collection of measurements C is second-order exchangeable over the coherently specified C*_N = ∪_{i=1}^{N} C_i. The population C*_N is q-extendible if C is also second-order exchangeable over C*_{N+q} = ∪_{i=1}^{N+q} C_i, where C*_{N+q} is coherently specified.

For the original sequence and the extended sequence, our judgements for each individual and between each pair of individuals are the same. Infinite exchangeability may be viewed as q-extendibility for all q > 0. If Cov(Y_i, Y_j) ≥ 0 for all Y ∈ ⟨C⟩, then we have q-extendibility for all q > 0. Otherwise, we have q-extendibility for all q ≤ 1 − N − ρ_min^{-1}, where ρ_min = min_{Y ∈ ⟨C⟩} {Corr(Y_i, Y_j) : Cov(Y_i, Y_j) < 0}. Recall that, for C to be second-order exchangeable over C*, we are required only to specify the relationship for C*_2, with all other cases following by symmetry.
Thus, we can regard any second-order exchangeable sequence as having been extended from a second-order exchangeable sequence of length two, and the effect of a sample of size n deduced from a sample of size one. We now illustrate this extension formally; in order to do so we shall need to make the population size explicit. For the quantities m̃_{ns}, r̃_{ns} and r̃_{0s} defined in Theorem 2, we achieve this by respectively denoting them as m̃(N)_{ns}, r̃(N)_{ns} and r̃(N)_{0s}. From (13), we can deduce that r̃(N)_{ns} = g(N,n)nr̃(N)_{1s} − (N(n−1)/(N−n))r̃(N)_{0s} and r̃(N)_{1s}/r̃(N)_{0s} = (N/(2(N−1)))(r̃(2)_{1s}/r̃(2)_{0s}). Using these identities, we can obtain the following corollary to Theorem 2, which shows how to extend the results from a population of size two and a sample of size one to any finite population and sample size.

Corollary 2. The orthogonal collections m_2(Y) and m_N(Y) share the same co-ordinate representation, with

r̃(N)_{ns}/r̃(N)_{0s} = (1 − n/N)^{-1}{(n/2)(r̃(2)_{1s}/r̃(2)_{0s}) − (n−1)},  (17)

m̃(N)_{ns} = (1 − n/N){n − 2(n−1)(r̃(2)_{0s}/r̃(2)_{1s})}^{-1}{2m̃(2)_{1s} − Y_{s1} + (n−1)(1 − 2r̃(2)_{0s}/r̃(2)_{1s})Ȳ′_s} + (1/N){Y_{s1} + (n−1)Ȳ′_s},  (18)

where Ȳ′_s = (1/(n−1))∑_{i=2}^{n} Y_{si} for n > 1 and zero otherwise.
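The extension identity (17) can be checked numerically from the resolution formula (14) alone; the values of f, N and n below are invented, for illustration only.

```python
def prec_ratio(N, n, f):
    # rtilde(N)_ns / rtilde(N)_0s, obtained from Eq. (14) as 1/(1 - Res_(N,n)s).
    res = n * ((N - 1) * f + 1) / (N * ((n - 1) * f + 1))
    return 1.0 / (1.0 - res)

# Eq. (17): the (N, n) precision ratio is determined by the (2, 1) ratio.
f, N, n = 0.35, 40, 7
rho2 = prec_ratio(2, 1, f)                    # rtilde(2)_1s / rtilde(2)_0s
lhs = prec_ratio(N, n, f)
rhs = ((n / 2) * rho2 - (n - 1)) / (1 - n / N)
```

The (2, 1) ratio equals 2/(1 − f_s), so specifying beliefs for a population of size two and a sample of size one fixes the quantitative update for every N and n.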
The qualitative information provided by the adjustment of ⟨m_N(C)⟩ by C(n) remains the same for all possible sequence lengths N and all possible sample sizes n, and the quantitative information is easy to compare across these via Eqs. (17) and (18). Eqs. (15) and (16) show how the FPC can be used to compare the finite and infinite cases. Analogous results may be obtained if we wish to compare two finite population sizes, N_1 and N_2 say, which illustrate the fundamental roles played by the two sampling fractions, n/N_1 and n/N_2. From (17), we have

(1 − n/N_1) r̃(N_1)_{ns}/r̃(N_1)_{0s} = (1 − n/N_2) r̃(N_2)_{ns}/r̃(N_2)_{0s},

whilst (18) gives

m̃(N_1)_{ns} = (1 − n/N_1)(1 − n/N_2)^{-1} m̃(N_2)_{ns} + n(1/N_1 − 1/N_2)(1 − n/N_2)^{-1} Ȳ_s.

This section has illustrated how the familiar univariate FPC naturally generalises to individual quantities in the multivariate population and how these quantities, for any population size and sample size, are derived from the underlying canonical variable directions obtained in Definition 2. The qualitative information may be derived from the corresponding underlying canonical variable resolutions. Not only is this theoretically important, but there is also a huge computational advantage: for any choice of n and N we are only required to solve the single v_0 × v_0 generalised eigenvalue problem given in Definition 2.

4. Example: summarising exchangeable observations

4.1. General framework

To aid the illustration of the theory, we shall make a further assumption which enables the explicit derivation of the underlying canonical variable directions and resolutions.

Assumption 2. For each individual, we judge that the v_0 measurements C_i = {X_{1i}, …, X_{v_0 i}} under the model given by Assumption 1 are second-order exchangeable and co-exchangeable across individuals. In this case, the model for all i ≠ j reduces to

E(C_i) = m_0 1_{v_0},  Var(C_i) = (d_1 − d_2)I_{v_0} + d_2 J_{v_0},  Cov(C_i, C_j) = (c_1 − c_2)I_{v_0} + c_2 J_{v_0},

where m_0, d_1, d_2, c_1 and c_2 are constants, 1_{v_0} is the v_0 × 1 vector of 1s, I_{v_0} is the v_0 × v_0 identity matrix and J_{v_0} = 1_{v_0} 1_{v_0}^T. Under this model we have Σ = {(d_1 − d_2) − (c_1 − c_2)}I_{v_0} + (d_2 − c_2)J_{v_0} and Σ_0 = (c_1 − c_2)I_{v_0} + c_2 J_{v_0}. Hence, the matrices Σ and Σ_0 have a particularly simple form (aI_{v_0} + bJ_{v_0} for constants a and b) so that, see Definition 2, the eigenstructure of the problem Σ_0 H = (Σ + Σ_0)HF is analytically straightforward to obtain. We explore the use of the FPC when the population is of size N. The eigenvalues of (9) are given by
f_1 = (c_1 + (v_0 − 1)c_2)/(d_1 + (v_0 − 1)d_2),  f_2 = ⋯ = f_{v_0} = (c_1 − c_2)/(d_1 − d_2).  (19)
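The closed forms in (19) can be confirmed against a direct numerical eigensolve of the Assumption 2 structure; the constants below are invented, for illustration only.

```python
import numpy as np
from scipy.linalg import eigh

# Assumption 2 structure (invented numbers): D = Var(C_i), C = Cov(C_i, C_j),
# so that Sigma_0 = C and Sigma + Sigma_0 = D in the eigenproblem of Definition 2.
v0, d1, d2, c1, c2 = 5, 2.0, 0.8, 1.0, 0.5
I, J = np.eye(v0), np.ones((v0, v0))
D = (d1 - d2) * I + d2 * J
C = (c1 - c2) * I + c2 * J

# Numerical eigenvalues of C H = D H F, sorted in descending order.
f = np.sort(eigh(C, D, eigvals_only=True))[::-1]

# Closed forms from Eq. (19).
f1 = (c1 + (v0 - 1) * c2) / (d1 + (v0 - 1) * d2)
f2 = (c1 - c2) / (d1 - d2)
```

With these constants c_2 d_1 > d_2 c_1, so the average direction carries the larger resolution f_1, in line with the ordering condition discussed in the text.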
Notice that f_1 ≥ f_2 ⇔ c_2 d_1 ≥ d_2 c_1, so that the ordering of the eigenvalues will depend upon our specific prior choices of d_1, d_2, c_1 and c_2. The eigenvector corresponding to f_1 is proportional to 1_{v_0} and the eigenvectors corresponding to f_2 are any v_0 − 1 orthogonal vectors which are orthogonal to 1_{v_0} and whose coefficients sum to zero. Thus, the first eigenvector is an average and the remainder are v_0 − 1 linear contrasts. For simplicity of exposition, we will take F = diag(f_1, f_2, …, f_2) to be our matrix of eigenvalues, with the corresponding matrix of eigenvectors, normed as in Definition 2, given by H = H_{v_0} F̃^{-1}, where

F̃ = diag(√{(d_1 + (v_0 − 1)d_2)(1 − f_1)}, √{(d_1 − d_2)(1 − f_2)}, …, √{(d_1 − d_2)(1 − f_2)})

and H_{v_0} is the v_0 × v_0 transpose of the Helmert matrix of order v_0. The first column of H_{v_0} is (1/√v_0)1_{v_0} and, for v > 1, the vth column is (1/√(v(v−1)))(−1_{v−1}^T, v−1, 0, …, 0)^T. The underlying canonical variable directions are the columns of the matrix W = H = H_{v_0} F̃^{-1}. We then form the collection m_N(Y) = {m_N(Y_1), …, m_N(Y_{v_0})}, where
$$m_N(Y_1) = a_1 \sum_{v=1}^{v_0} m_N(X_v), \tag{20}$$

$$m_N(Y_s) = a_s\Bigl\{(s-1)m_N(X_s) - \sum_{v=1}^{s-1} m_N(X_v)\Bigr\}, \quad s = 2,\ldots,v_0, \tag{21}$$
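As a numerical sketch of the eigenstructure just described, the generalised eigenvalue problem $RH = (R+R_0)H\Phi$ can be solved directly and compared against the closed forms (19). The prior values below are illustrative only (any specification with positive definite variance and covariance matrices behaves the same way):

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative prior values (not from the paper's examples).
v0, d1, d2, c1, c2 = 6, 4.0, 0.5, 1.0, 0.25
I, J = np.eye(v0), np.ones((v0, v0))

# R^{-1} = Var - Cov and R0^{-1} = Cov, both of the simple form a*I + b*J.
R = np.linalg.inv(((d1 - d2) - (c1 - c2)) * I + (d2 - c2) * J)
R0 = np.linalg.inv((c1 - c2) * I + c2 * J)

# Generalised eigenvalue problem R H = (R + R0) H Phi, eigenvalues descending.
phi = np.sort(eigh(R, R + R0, eigvals_only=True))[::-1]

# Closed forms (19): one "average" eigenvalue and v0 - 1 repeated "contrast" ones.
phi1 = (c1 + (v0 - 1) * c2) / (d1 + (v0 - 1) * d2)
phi2 = (c1 - c2) / (d1 - d2)
assert np.isclose(phi[0], phi1) and np.allclose(phi[1:], phi2)
```

With these values $\phi_1 > \phi_2$; reversing the inequality $c_2 d_1 \ge d_2 c_1$ reverses the ordering, as noted above.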
and $a_1 = (\sqrt{v_0}\,\tilde{\Phi}_{11})^{-1}$, $a_s = (\sqrt{s(s-1)}\,\tilde{\Phi}_{22})^{-1}$. From Theorem 2, the familiar univariate FPC attaches itself to each of the quantities given by (20) and (21). We shall explicitly illustrate the construction for the precisions given by (13); the illustration for the means given by (13) is obtained in a similar fashion. For each $s = 1,\ldots,v_0$, we find the prior precision, $\tilde{r}_{0s}$, of $m_N(Y_s)$ and the posterior adjusted precision $\tilde{r}_s$ of $Y_{si}$ given $m_N(C)$, where $Y_{1i} = a_1\sum_{v=1}^{v_0} X_{vi}$ and $Y_{si} = a_s\{(s-1)X_{si} - \sum_{v=1}^{s-1} X_{vi}\}$ for $s = 2,\ldots,v_0$. We have
$$\tilde{r}_{01} = \frac{N\{(d_1-c_1)+(v_0-1)(d_2-c_2)\}}{\{d_1+(v_0-1)d_2\}+(N-1)\{c_1+(v_0-1)c_2\}}, \tag{22}$$

$$\tilde{r}_{0s} = \frac{N\{(d_1-d_2)-(c_1-c_2)\}}{(d_1-d_2)+(N-1)(c_1-c_2)}, \quad s = 2,\ldots,v_0, \tag{23}$$
and $\tilde{r}_s = N/(N-1)$ for $s = 1,\ldots,v_0$. For each $s$, the posterior precision, $\tilde{r}_{ns}$, of $m_N(Y_s)$ is the sum of the prior precision and the precision of the sample mean, the latter corrected by the FPC $g(N,n)$; that is, $\tilde{r}_{ns} = \tilde{r}_{0s} + g(N,n)n\tilde{r}_s$. Explicitly computing each $\tilde{r}_{ns}$, we have

$$\tilde{r}_{n1} = \frac{N^2}{N-n}\,\frac{\{d_1+(v_0-1)d_2\}+(n-1)\{c_1+(v_0-1)c_2\}}{\{d_1+(v_0-1)d_2\}+(N-1)\{c_1+(v_0-1)c_2\}}, \tag{24}$$

$$\tilde{r}_{ns} = \frac{N^2}{N-n}\,\frac{(d_1-d_2)+(n-1)(c_1-c_2)}{(d_1-d_2)+(N-1)(c_1-c_2)}, \quad s = 2,\ldots,v_0. \tag{25}$$
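The additive update can be checked numerically against the closed forms (24) and (25). In the sketch below we take $g(N,n) = (N-1)/(N-n)$, the FPC form consistent with (22)-(25) (it tends to 1 as $N \to \infty$); the prior values are illustrative:

```python
import numpy as np

# Illustrative prior values; any valid specification behaves the same way.
v0, d1, d2, c1, c2 = 6, 4.0, 0.5, 1.0, 0.25
N, n = 854, 30
g = (N - 1) / (N - n)   # FPC form consistent with (22)-(25)
r_unit = N / (N - 1)    # adjusted precision of each Y_si given m_N(C)

# s = 1 ("average" direction): prior precision (22) versus posterior (24).
a, c = d1 + (v0 - 1) * d2, c1 + (v0 - 1) * c2
r01 = N * (a - c) / (a + (N - 1) * c)
rn1 = N**2 / (N - n) * (a + (n - 1) * c) / (a + (N - 1) * c)
assert np.isclose(r01 + g * n * r_unit, rn1)

# s >= 2 ("contrast" directions): prior precision (23) versus posterior (25).
r0s = N * ((d1 - d2) - (c1 - c2)) / ((d1 - d2) + (N - 1) * (c1 - c2))
rns = N**2 / (N - n) * ((d1 - d2) + (n - 1) * (c1 - c2)) / ((d1 - d2) + (N - 1) * (c1 - c2))
assert np.isclose(r0s + g * n * r_unit, rns)
```

In both directions the sample contributes the same increment $g(N,n)\,n\,N/(N-1) = nN/(N-n)$ to the precision; only the prior precisions differ.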
Observe that dividing (24) by (22) and (25) by (23), and then multiplying each ratio by $(1-n/N)$, shows that, for each $s = 1,\ldots,v_0$, $(1-n/N)\tilde{r}_{ns}/\tilde{r}_{0s}$ does not depend upon $N$, as we expected from (15). Notice that, from Theorem 2, the $m_N(Y_s)$ are a priori and a posteriori uncorrelated, so that we can use them to learn about any quantity in $\langle m_N(C)\rangle$. As an illustration, from (20) and (21) we may obtain
$$m_N(X_1) = \frac{1}{a_1 v_0}\,m_N(Y_1) - \sum_{v=2}^{v_0} \frac{1}{a_v v(v-1)}\,m_N(Y_v),$$

$$m_N(X_s) = \frac{1}{a_1 v_0}\,m_N(Y_1) + \frac{1}{a_s s}\,m_N(Y_s) - \sum_{v=s+1}^{v_0} \frac{1}{a_v v(v-1)}\,m_N(Y_v), \quad s = 2,\ldots,v_0,$$

so that

$$\mathrm{Var}_{C(n)}\{m_N(X_1)\} = \frac{1}{a_1^2 v_0^2}\,\tilde{r}_{n1}^{-1} + \sum_{v=2}^{v_0} \frac{1}{a_v^2 v^2(v-1)^2}\,\tilde{r}_{nv}^{-1},$$

$$\mathrm{Var}_{C(n)}\{m_N(X_s)\} = \frac{1}{a_1^2 v_0^2}\,\tilde{r}_{n1}^{-1} + \frac{1}{a_s^2 s^2}\,\tilde{r}_{ns}^{-1} + \sum_{v=s+1}^{v_0} \frac{1}{a_v^2 v^2(v-1)^2}\,\tilde{r}_{nv}^{-1}$$
for $s = 2,\ldots,v_0$. We now illustrate, via an example concerning examination data, how we can use these results for setting appropriate sample sizes.

4.2. Examination data

In a similar vein to Goldstein (1988, Section 5), we consider an examination sat by 854 candidates. Each candidate was required to answer six compulsory questions, each marked out of 10; any question not attempted received a mark of zero. The chief examiner has a number of questions of interest, such as whether the questions were of roughly similar difficulty and whether the standard was similar to previous years, and will take a sample of the 854 scripts to help answer these. Let $X_{vi}$ denote the mark of candidate $i$ on question $v$, so that $m_{854}(X_v) = \frac{1}{854}\sum_{i=1}^{854} X_{vi}$ is the average mark on question $v$. The chief examiner judges that the candidates are second-order exchangeable and also that the questions are second-order exchangeable, so that he follows the model given in Assumption 2 and thus the framework of Section 4.1. He judges that the expected score on each question is 6.5, that is $m_0 = 6.5$. He wishes to explore a range of values for his prior variance statements and their impact upon his choice of sample size. He chooses $c_1 = 1$ and considers $0 < c_2 < 0.95$ so that, if the population were infinite, $c_1$ would reflect the variance of the underlying average mark for each question and $c_2$ the covariance between the underlying average marks for different questions. He wishes to explore $2 < d_1 < 4$, so that $1 < d_1 - c_1 < 3$, larger values of $d_1$ representing greater candidate variation. Finally, he sets $d_2 = 2c_2$ so that $d_2 - c_2 = c_2$. The chief examiner solves the generalised eigenvalue problem given by Definition 2. The canonical variable resolutions have the form given by (19); note that for $d_1 > 2$ we have $\phi_1 > \phi_2$, with equality when $d_1 = 2$. The collection $m_{854}(Y)$ may be expressed by (20) and (21) (with $v_0 = 6$).
Thus, $m_{854}(Y_1)$ is proportional to the overall mark on the examination, enabling the chief examiner to learn about the general standard of the examination, a high/low value suggesting an easy/hard paper. The collection $m_{854}(Y_2),\ldots,m_{854}(Y_6)$ may be viewed as summarising all of the differences between the difficulties of the questions.
The chief examiner is interested in selecting the sample size $n$ to achieve a proportional variance reduction of $k$ for the overall mark in the examination. This is equivalent to choosing $n$ to achieve this task for $m_{854}(Y_1)$, and the required $n$ may be found using Corollary 1. Goldstein and Wooff (1997) provide an overview of how sample size selection can be performed for experiments analysed using a Bayes linear approach assuming an infinite population. In Fig. 1(a)-(c), the values of $n$ are plotted for three values of $k$. We observe that as $d_1$ increases so does $n$, which is to be expected given the increasing candidate variability. As $c_2$ decreases, $n$ also increases, which is again expected as $c_2$ controls the covariance both between questions and between candidates. The most extreme case of $d_1 = 4$ and $c_2 = 0$ would require a sample size of 27, about 3% of the population, to achieve a variance reduction of 0.9. Observe that we need approximately to double the sample size to increase the variance reduction from 0.90 to 0.95, and a sample size of 220, approximately 26% of the population, is required for a reduction of 0.99 in the most extreme case. If instead the chief examiner is interested in selecting the sample size $n$ to achieve a proportional variance reduction of $k$ for learning about the differences between the questions, then this is equivalent to choosing $n$ to achieve this task for $m_{854}(Y_s)$, $s = 2,\ldots,6$. Once again, Corollary 1 may be used to find $n$, and Fig. 1(d)-(f) shows the values of $n$ for three values of $k$. Notice that, for any value of $k$, if $d_1 = 2$ the sample size required is the same as that for $m_{854}(Y_1)$: this is where $\phi_1 = \phi_2$. For all other plotted values of $d_1$ and $c_2$, the sample size required is larger than that for $m_{854}(Y_1)$, which is to be expected as here $\phi_2 < \phi_1$. In some cases it may be appreciably higher, for example when $d_1$ is large and $c_2$ is large, which corresponds to large candidate variation and high correlation between questions, making it difficult to learn about the differences. Finally, it is important to note that, as there are only two distinct canonical variable resolutions, the plots give upper and lower bounds on the sample size $n$ required to achieve a proportional variance reduction of $k$ for any linear combination of the $m_{854}(X_v)$ of interest to the chief examiner.
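The quoted sample sizes can be reproduced with a short search, assuming the resolution of $m_N(Y_s)$ for a sample of $n$ from $N$ takes the form $n\{(N-1)\phi_s+1\}/[N\{(n-1)\phi_s+1\}]$, the form which reappears as each factor of (42); the helper names below are ours:

```python
def resolution(n, N, phi):
    # Proportional variance reduction for a canonical quantity with
    # resolution phi, sampling n from a population of size N.
    return n * ((N - 1) * phi + 1) / (N * ((n - 1) * phi + 1))

def min_sample_size(N, phi, k):
    # Smallest n achieving a proportional variance reduction of k
    # (a linear search is cheap at this scale; n = N always achieves 1).
    return next(n for n in range(1, N + 1) if resolution(n, N, phi) >= k)

# Most extreme case of Section 4.2: c1 = 1, c2 = 0, d1 = 4, d2 = 0, N = 854,
# giving phi1 = (1 + 5*0)/(4 + 5*0) = 0.25 from (19).
print(min_sample_size(854, 0.25, 0.90))  # 27  (about 3% of the population)
print(min_sample_size(854, 0.25, 0.95))  # 54  (roughly double)
print(min_sample_size(854, 0.25, 0.99))  # 220 (about 26%)
```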
Fig. 1. Sample size $n$ required to achieve a proportional variance reduction of $k$ for $m_{854}(Y_1)$ and each $m_{854}(Y_s)$, $s = 2,\ldots,6$, for the examination problem with $c_1 = 1$, $0 < c_2 < 0.95$, $2 < d_1 < 4$ and $d_2 = 2c_2$: (a) $m_{854}(Y_1)$, $k = 0.90$; (b) $m_{854}(Y_1)$, $k = 0.95$; (c) $m_{854}(Y_1)$, $k = 0.99$; (d) $m_{854}(Y_s)$, $s = 2,\ldots,6$, $k = 0.90$; (e) $m_{854}(Y_s)$, $s = 2,\ldots,6$, $k = 0.95$; and (f) $m_{854}(Y_s)$, $s = 2,\ldots,6$, $k = 0.99$.
5. Cluster sampling

We now show how the results of Section 3 can be utilised when we extend the model of second-order exchangeability, as given in Assumption 1, to encompass two-stage cluster sampling, where each individual additionally belongs to a cluster. We intend to make the same series of measurements $C = \{X_1,\ldots,X_{v_0}\}$ on each individual; let $C_{gi} = \{X_{g1i},\ldots,X_{gv_0i}\}$ denote the measurements for the $i$th individual in the $g$th cluster, and suppose that there are a total of $M$ clusters, each of which contains $N$ individuals.

Assumption 3. We judge that individuals in each cluster are second-order exchangeable and that they are co-exchangeable (Goldstein, 1986) across clusters. For all $g \ne h$, $i \ne j$, $k$, our second-order specifications thus take the form
$$E(C_{gi}) = m_g, \qquad \mathrm{Var}(C_{gi}) = D_g, \qquad \mathrm{Cov}(C_{gi}, C_{gj}) = C_{gg}, \qquad \mathrm{Cov}(C_{gi}, C_{hk}) = C_{gh}.$$
Using the second-order representation theorem of Goldstein (1986), we may write
$$C_{gi} = m_N(C_g) + R_{N,i}(C_g), \tag{26}$$
where $m_N(C_g) = \frac{1}{N}\sum_{i=1}^N C_{gi}$ is the $g$th cluster population mean vector and $R_{N,i}(C_g) = C_{gi} - m_N(C_g)$ the residual vector for the $i$th individual in the $g$th cluster. For all $g$, $h$, $i$, $R_{N,i}(C_g)$ is uncorrelated with $m_N(C_h)$. We collect the cluster population mean vectors together as $m_N(C) = \{m_N(C_1),\ldots,m_N(C_M)\}$.

Assumption 4. We judge that the cluster population means are second-order exchangeable. For all $g \ne h$, our second-order specifications thus take the form
$$E(m_N(C_g)) = m_g = m, \qquad \mathrm{Var}(m_N(C_g)) = C_{gg} + \frac{1}{N}(D_g - C_{gg}) = A, \qquad \mathrm{Cov}(m_N(C_g), m_N(C_h)) = C_{gh} = B,$$
where the vector $m$ and matrices $A$ and $B$ do not depend upon $g$. From the second-order representation theorem of Goldstein (1986), we may write
$$m_N(C_g) = m_{(M;N)}(C) + R_{(M;N),g}(C), \tag{27}$$
where $m_{(M;N)}(C) = \frac{1}{M}\sum_{g=1}^M m_N(C_g)$ is the overall population mean and $R_{(M;N),g}(C) = m_N(C_g) - m_{(M;N)}(C)$. For each $g$, $R_{(M;N),g}(C)$ is uncorrelated with $m_{(M;N)}(C)$. Combining the two representations given by (26) and (27), we may write
$$C_{gi} = m_{(M;N)}(C) + R_{(M;N),g}(C) + R_{N,i}(C_g), \tag{28}$$
where, for each $i$, $g$, the three components in the decomposition of $C_{gi}$ are mutually uncorrelated. However, due to the finite nature of the populations, the $R_{(M;N),g}(C)$ are not uncorrelated across clusters but rather, in the terminology of Goldstein (1986, pp. 974-975), uncorrelated to order $1/M$, whilst the $R_{N,i}(C_g)$ are uncorrelated across clusters and uncorrelated to order $1/N$ within clusters. In a univariate setting, models derived from decompositions analogous to (28), though with different stochastic structures, have been studied by Stanek and Singer (2004) and San Martino et al. (2008). One such example is that of Scott and Smith (1969), see also Little and Zheng (2007), though this explicitly has observations in differing clusters being uncorrelated, whereas our use of co-exchangeability allows a correlation between observations in differing clusters. Suppose that we sample $m$ of the $M$ clusters and, in each of the $m$ sampled clusters, sample $n$ individuals. For notational simplicity, we use the labelling convention that we sample the first $m$ clusters and, in each sampled cluster, the first $n$ individuals. Let $C_g(n) = \{C_{g1},\ldots,C_{gn}\}$ denote the measurements of the individuals sampled in the $g$th cluster and $C(m;n) = \{C_1(n),\ldots,C_m(n)\}$ the total collection of observations. The observed sample mean in the $g$th cluster is $\bar{C}_g = \frac{1}{n}\sum_{i=1}^n C_{gi}$, so that the total collection of sample means is $\bar{C}(m;n) = \{\bar{C}_1,\ldots,\bar{C}_m\}$. Theorems 1 and 2 showed that only the observed sample mean was sufficient to compute the posterior quantities. In this extended model, we can restrict attention to various sample means by exploiting the concept of Bayes linear sufficiency (Goldstein and O'Hagan, 1996). We have the following lemma; the proof is in the appendix.

Lemma 1. 1.
The collection of sample means $\bar{C}(m;n)$ is Bayes linear sufficient for the observations $C(m;n)$ for adjusting the overall mean $m_{(M;N)}(C)$, so that
$$E_{C(m;n)}(m_{(M;N)}(C)) = E_{\bar{C}(m;n)}(m_{(M;N)}(C)), \qquad \mathrm{Var}_{C(m;n)}(m_{(M;N)}(C)) = \mathrm{Var}_{\bar{C}(m;n)}(m_{(M;N)}(C)).$$
2. The population mean across the sampled clusters, $m_{(m;N)}(C) = \frac{1}{m}\sum_{g=1}^m m_N(C_g)$, is Bayes linear sufficient for $\bar{C}(m;n)$ for adjusting $m_{(M;N)}(C)$, so that
$$E_{\bar{C}(m;n)}(m_{(M;N)}(C)) = E_{\bar{C}(m;n)}\{E_{m_{(m;N)}(C)}(m_{(M;N)}(C))\},$$
$$\mathrm{Var}_{\bar{C}(m;n)}(m_{(M;N)}(C)) = \mathrm{Var}_{m_{(m;N)}(C)}(m_{(M;N)}(C)) + \mathrm{Var}_{\bar{C}(m;n)}\{E_{m_{(m;N)}(C)}(m_{(M;N)}(C))\}.$$

Lemma 1 shows that the adjustment can be performed in two stages. In the first, the population mean $m_{(M;N)}(C)$ is adjusted following the observation of $m_{(m;N)}(C)$, the mean across the sampled clusters, so that the adjustment is related to the proportion of clusters sampled. In the second stage, $m_{(m;N)}(C)$ is adjusted by the sample means $\bar{C}(m;n)$, and this relates to the proportion of individuals sampled in the sampled clusters. Notice that this separation mirrors that in the (univariate) classical setting; see, for example, Cochran (1977, Chapter 10). We now consider each of these stages separately, showing that each can be viewed as a finite population sampling problem, before combining them to find the full adjustment.

5.1. Sampling m from M: adjustment of $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$

We now consider the adjustment of quantities in $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$ under the model formed by Assumptions 3 and 4. From the model, $m_N(C(M)) = \{m_N(C_1),\ldots,m_N(C_M)\}$ is a second-order exchangeable population of size $M$ and, from the representation given by (27), $m_{(M;N)}(C)$ is the underlying population mean. If we observe $m$ of the $M$ population members, $m_N(C(m)) = \{m_N(C_1),\ldots,m_N(C_m)\}$, then the sample mean $m_{(m;N)}(C) = \frac{1}{m}\sum_{g=1}^m m_N(C_g)$ is Bayes linear sufficient for $m_N(C(m))$ for adjusting $\langle m_{(M;N)}(C)\rangle$.

Definition 4. The underlying canonical cluster directions are defined as the columns of the matrix $U = [U_1 \ldots U_{v_0}]$, where $U_s = SF_s$ and $F_s$ is the $s$th column of the matrix $F$ solving the generalised eigenvalue problem
$$SF = (S+S_0)F\Psi,$$
where $\Psi = \mathrm{diag}(\psi_1,\ldots,\psi_{v_0})$ is the matrix of eigenvalues and, for $g \ne h$, $S^{-1} = \mathrm{Var}(m_N(C_g)) - \mathrm{Cov}(m_N(C_g), m_N(C_h))$ and $S_0^{-1} = \mathrm{Cov}(m_N(C_g), m_N(C_h))$. $F$ is normed so that $F^T S F = I$ and $F^T(S+S_0)F\Psi = I$.
The ordered eigenvalues $1 > \psi_1 \ge \cdots \ge \psi_{v_0} > 0$ are termed the underlying canonical cluster resolutions. Let $U_{vs}$ denote the $v$th component of the $s$th canonical cluster direction and, for each $s = 1,\ldots,v_0$, define $W_s \in \langle C\rangle$ to be $W_s = \sum_{v=1}^{v_0} U_{vs}X_v$. The corresponding quantities in $\langle m_{(M;N)}(C)\rangle$, $\langle m_{(m;N)}(C)\rangle$ and each $\langle m_N(C_g)\rangle$ which share the same co-ordinate representation as $W_s$ are defined to be $m_{(M;N)}(W_s)$, $m_{(m;N)}(W_s)$ and $m_N(W_{gs})$, respectively. An application of Theorem 2 gives the following corollary.

Corollary 3. For a sample of size $m$ drawn from a population of size $M$, the collection $m_{(M;N)}(W) = \{m_{(M;N)}(W_1),\ldots,m_{(M;N)}(W_{v_0})\}$ forms a basis for $\langle m_{(M;N)}(C)\rangle$. The $m_{(M;N)}(W_s)$ are a priori uncorrelated and, for all samples of any size, a posteriori uncorrelated. The posterior adjusted expectation, $\hat{m}_{ms} = E_{m_{(m;N)}(C)}(m_{(M;N)}(W_s))$, and posterior adjusted precision, $\hat{r}_{ms} = \mathrm{Var}^{-1}_{m_{(m;N)}(C)}(m_{(M;N)}(W_s))$, for $m_{(M;N)}(W_s)$ are given by
$$\hat{m}_{ms} = \frac{\hat{r}_{0s}\hat{m}_{0s} + g(M,m)\,m\,\hat{r}_s\,m_{(m;N)}(W_s)}{\hat{r}_{0s} + g(M,m)\,m\,\hat{r}_s}, \qquad \hat{r}_{ms} = \hat{r}_{0s} + g(M,m)\,m\,\hat{r}_s, \tag{29}$$
where $\hat{m}_{0s}$, $\hat{r}_{0s}$ are the prior expectation and precision for $m_{(M;N)}(W_s)$, $\hat{r}_s$ is the adjusted precision for $m_N(W_{gs})$ given $m_{(M;N)}(C)$, and $g(M,m)$ is the FPC. The canonical cluster directions and resolutions, for any choice of $m$ and $M$ given $N$, completely summarise the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$ and, from Lemma 1, give the first stage of the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$.

5.2. Sampling n from N: adjustment of $\langle m_{(m;N)}(C)\rangle$ given $m_{(m;n)}(C)$

The second stage of the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$ requires the adjustment of $\langle m_{(m;N)}(C)\rangle$ given $\bar{C}(m;n)$. We can exploit the symmetries in our beliefs, as given by Assumptions 3 and 4, to invert the variance matrix of $\bar{C}(m;n)$ and, utilising the resolution transform (see Goldstein and Wooff, 2007, Section 3.9), reduce this latter problem to one of solving the generalised eigenvalue problem
$$\Bigl\{B + \frac{1}{m}(A-B)\Bigr\}V^{(n)} = (\Omega^{-1}+B)V^{(n)}\Xi^{(n)},$$
where $\Omega = \sum_{g=1}^m \{(C_{gg}-B) + \frac{1}{n}(D_g - C_{gg})\}^{-1}$. Typically both $V^{(n)}$, the matrix of eigenvectors, and $\Xi^{(n)}$, the diagonal matrix of eigenvalues, will depend upon the within-cluster population size $N$ and sample size $n$ in a less than tractable way. An additional assumption, that the $\bar{C}(m;n)$ are second-order exchangeable, reduces the problem to one of sampling $n$ individuals from a finitely second-order exchangeable population of size $N$.
Lemma 2. If the collection of sample means $\bar{C}(m;n)$ is second-order exchangeable, then the overall observed mean $m_{(m;n)}(C) = \frac{1}{m}\sum_{g=1}^m \bar{C}_g$ is Bayes linear sufficient for $\bar{C}(m;n)$ for adjusting $m_{(m;N)}(C)$, so that
$$E_{\bar{C}(m;n)}(m_{(m;N)}(C)) = E_{m_{(m;n)}(C)}(m_{(m;N)}(C)), \qquad \mathrm{Var}_{\bar{C}(m;n)}(m_{(m;N)}(C)) = \mathrm{Var}_{m_{(m;n)}(C)}(m_{(m;N)}(C)).$$

Proof. Using the second-order representation theorem, for each $g$ we may write $\bar{C}_g = m_{(m;n)}(C) + R_{(m;n),g}(C)$, where $R_{(m;n),g}(C) = \bar{C}_g - m_{(m;n)}(C)$ is orthogonal to $m_{(m;n)}(C)$, so that $\mathrm{Cov}(m_{(m;n)}(C), \bar{C}_g) = \mathrm{Var}(m_{(m;n)}(C))$. As $\mathrm{Cov}(m_{(m;N)}(C), \bar{C}_h) = \mathrm{Var}(m_{(m;N)}(C))$, it follows that $\mathrm{Cov}(m_{(m;N)}(C), m_{(m;n)}(C)) = \mathrm{Var}(m_{(m;N)}(C))$. Hence,
$$\mathrm{Cov}(m_{(m;N)}(C), m_{(m;n)}(C))\,\mathrm{Var}^{-1}(m_{(m;n)}(C))\,\mathrm{Cov}(m_{(m;n)}(C), \bar{C}(m;n)) = \mathrm{Cov}(m_{(m;N)}(C), \bar{C}(m;n))$$
and the result follows from Goldstein and Wooff (2007, Theorems 5.20 and 5.23). $\square$

The effect of the additional judgement that $\bar{C}(m;n)$ is exchangeable is that the matrices $D_g$ and $C_{gg}$ do not depend upon $g$ and so, for all $g \ne h$, $i \ne j$, $k$, our second-order specifications take the form
$$E(C_{gi}) = m, \qquad \mathrm{Var}(C_{gi}) = D, \qquad \mathrm{Cov}(C_{gi},C_{gj}) = C, \qquad \mathrm{Cov}(C_{gi},C_{hk}) = B, \tag{30}$$
where the vector $m$ and matrices $D$, $C$ and $B$ do not depend upon either the clusters or the individuals. Notice that such a scenario removes the explicit dependence upon the cluster population size $N$ in the judgement of exchangeability of the cluster population means. Suppose that we proceed under the model given by (30). In this case, Lemma 2 shows that the second stage of the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$ requires the adjustment of $\langle m_{(m;N)}(C)\rangle$ given $m_{(m;n)}(C)$. We now show that this adjustment can be viewed as sampling $n$ of $N$ possible individuals in an exchangeable population.

Consider the set of possible observations in the sampled clusters, $\cup_{g=1}^m \cup_{i=1}^N C_{gi}$, and the collection of sampled observations $C(m;n) = \cup_{g=1}^m \cup_{i=1}^n C_{gi}$. Suppose that we collect the sampled observations into $n$ mutually exclusive sets, each one containing a single observation from each of the $m$ sampled clusters, and that we also form $N-n$ similar sets for the unsampled observations, $\cup_{g=1}^m \cup_{i=n+1}^N C_{gi}$. For ease of notation, assume that we form the collection $C_i(m) = \{C_{1i},\ldots,C_{mi}\}$ for each $i = 1,\ldots,N$ and the corresponding means $m_m(C_i) = \frac{1}{m}\sum_{g=1}^m C_{gi}$. Hence, $C(m;n) = \cup_{i=1}^n C_i(m)$. Under the model given by (30) we have, for all $i \ne j$,
$$E(m_m(C_i)) = m, \qquad \mathrm{Var}(m_m(C_i)) = B + \frac{1}{m}(D-B), \tag{31}$$
$$\mathrm{Cov}(m_m(C_i), m_m(C_j)) = B + \frac{1}{m}(C-B). \tag{32}$$
Thus, the collection $m_m(C(N)) = \{m_m(C_1),\ldots,m_m(C_N)\}$ is a second-order exchangeable population of size $N$ and so, using the second-order representation theorem, we may write
$$m_m(C_i) = m_{(m;N)}(C) + R_{(m;N),i}(C),$$
where, for each $i = 1,\ldots,N$, $R_{(m;N),i}(C)$ is uncorrelated with $m_{(m;N)}(C)$. We propose to observe $n$ of the $N$ members of $m_m(C(N))$, that is we observe $m_m(C(n)) = \{m_m(C_1),\ldots,m_m(C_n)\}$. The sample mean $m_{(m;n)}(C) = \frac{1}{n}\sum_{i=1}^n m_m(C_i)$ is Bayes linear sufficient for $m_m(C(n))$ for adjusting $\langle m_{(M;N)}(C)\rangle$ (see Goldstein and Wooff, 1998, Theorem 2).

Definition 5. The underlying canonical sample directions are defined as the columns of the matrix $V = [V_1 \ldots V_{v_0}]$, where $V_s = TG_s$ and $G_s$ is the $s$th column of the matrix $G$ solving the generalised eigenvalue problem
$$TG = (T+T_0)G\Xi,$$
where $\Xi = \mathrm{diag}(\xi_1,\ldots,\xi_{v_0})$ is the matrix of eigenvalues and, for $i \ne j$, $T^{-1} = \mathrm{Var}(m_m(C_i)) - \mathrm{Cov}(m_m(C_i), m_m(C_j))$ and $T_0^{-1} = \mathrm{Cov}(m_m(C_i), m_m(C_j))$. $G$ is normed so that $G^T TG = I$ and $G^T(T+T_0)G\Xi = I$. The ordered eigenvalues $1 > \xi_1 \ge \cdots \ge \xi_{v_0} > 0$ are termed the underlying canonical sample resolutions. Let $V_{vs}$ denote the $v$th component of the $s$th canonical sample direction and, for each $s = 1,\ldots,v_0$, define $Y_s \in \langle C\rangle$ to be $Y_s = \sum_{v=1}^{v_0} V_{vs}X_v$. The corresponding quantities in $\langle m_{(m;N)}(C)\rangle$, $\langle m_{(m;n)}(C)\rangle$ and each $\langle m_m(C_i)\rangle$ which share the same co-ordinate representation as $Y_s$ are defined to be $m_{(m;N)}(Y_s)$, $m_{(m;n)}(Y_s)$ and $m_m(Y_{is})$, respectively. An application of Theorem 2 gives the following corollary.

Corollary 4. For a sample of size $n$ drawn from a population of size $N$, the collection $m_{(m;N)}(Y) = \{m_{(m;N)}(Y_1),\ldots,m_{(m;N)}(Y_{v_0})\}$ forms a basis for $\langle m_{(m;N)}(C)\rangle$. The $m_{(m;N)}(Y_s)$ are a priori uncorrelated and, for all samples of any size, a posteriori uncorrelated. The posterior adjusted expectation, $\tilde{m}_{ns} = E_{m_{(m;n)}(C)}(m_{(m;N)}(Y_s))$, and posterior adjusted precision, $\tilde{r}_{ns} = \mathrm{Var}^{-1}_{m_{(m;n)}(C)}(m_{(m;N)}(Y_s))$, for $m_{(m;N)}(Y_s)$ are given by
$$\tilde{m}_{ns} = \frac{\tilde{r}_{0s}\tilde{m}_{0s} + g(N,n)\,n\,\tilde{r}_s\,m_{(m;n)}(Y_s)}{\tilde{r}_{0s} + g(N,n)\,n\,\tilde{r}_s}, \qquad \tilde{r}_{ns} = \tilde{r}_{0s} + g(N,n)\,n\,\tilde{r}_s, \tag{33}$$
where $\tilde{m}_{0s}$, $\tilde{r}_{0s}$ are the prior expectation and precision for $m_{(m;N)}(Y_s)$, $\tilde{r}_s$ is the adjusted precision for $m_m(Y_{is})$ given $m_{(m;N)}(C)$, and $g(N,n)$ is the FPC.
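The updates (29) and (33) share the same one-step form, so a single helper covers both stages of the adjustment. A minimal sketch, in which the function name is ours and the FPC form $g(K,k) = (K-1)/(K-k)$ is an assumption consistent with the closed forms (22)-(25) of Section 4:

```python
def fpc_adjust(m0, r0, r_unit, obs_mean, K, k):
    # One-step Bayes linear adjustment with a finite population correction,
    # in the form of (29)/(33): sample k members of a population of K (k < K).
    g = (K - 1) / (K - k)       # assumed FPC form, cf. Section 4
    w = g * k * r_unit          # precision contributed by the sample mean
    return (r0 * m0 + w * obs_mean) / (r0 + w), r0 + w

# Hypothetical numbers: prior mean 6.5 with precision 2, unit adjusted
# precision M/(M-1) for M = 200 clusters, of which m = 25 are sampled.
m_hat, r_hat = fpc_adjust(6.5, 2.0, 200 / 199, 7.0, 200, 25)
```

The adjusted expectation is a precision-weighted compromise between the prior mean and the observed mean, and the adjusted precision always exceeds the prior precision.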
Corollary 4 shows that, for each $m$, the canonical sample directions and resolutions, for any choice of $n$ and $N$, completely summarise the adjustment of $\langle m_{(m;N)}(C)\rangle$ given $m_{(m;n)}(C)$ and, from Lemmas 1 and 2, give the second stage of the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$.

5.3. Full adjustment

The results of Sections 5.1 and 5.2 enable us to obtain the adjustment of any quantity in $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$ under the model given by (30). For a general $Z \in \langle C\rangle$, as $m_{(M;N)}(W)$ forms a basis for $\langle m_{(M;N)}(C)\rangle$, we have
$$m_{(M;N)}(Z) = \sum_{s=1}^{v_0} \mathrm{Cov}\{m_{(M;N)}(Z), m_{(M;N)}(W_s)\}\hat{r}_{0s}\,m_{(M;N)}(W_s) = \sum_{s=1}^{v_0} d_s\,m_{(M;N)}(W_s).$$
It follows immediately from Corollary 3 that
$$E_{m_{(m;N)}(C)}(m_{(M;N)}(Z)) = \sum_{s=1}^{v_0} d_s\hat{m}_{ms} = \sum_{s=1}^{v_0} d_s(1-E_s)\hat{m}_{0s} + \sum_{s=1}^{v_0} d_s E_s\,m_{(m;N)}(W_s), \tag{34}$$
$$\mathrm{Var}_{m_{(m;N)}(C)}(m_{(M;N)}(Z)) = \sum_{s=1}^{v_0} d_s^2\,\hat{r}_{ms}^{-1}, \tag{35}$$
where $E_s = 1 - \hat{r}_{0s}\hat{r}_{ms}^{-1}$. Eqs. (34) and (35) give, respectively, the adjusted mean and variance for a general quantity in $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$ and form the first stage of the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$. We now complete the adjustment using the results of Corollary 4. As $m_{(m;N)}(Y)$ forms a basis for $\langle m_{(m;N)}(C)\rangle$, then for each $s = 1,\ldots,v_0$,
$$m_{(m;N)}(W_s) = \sum_{t=1}^{v_0} \mathrm{Cov}\{m_{(m;N)}(W_s), m_{(m;N)}(Y_t)\}\tilde{r}_{0t}\,m_{(m;N)}(Y_t) = \sum_{t=1}^{v_0} g_{st}\,m_{(m;N)}(Y_t). \tag{36}$$
From Lemmas 1 and 2, by substituting (36) into (34) and using (33), we have
$$E_{\bar{C}(m;n)}(m_{(M;N)}(Z)) = E_{m_{(m;n)}(C)}\{E_{m_{(m;N)}(C)}(m_{(M;N)}(Z))\} = \sum_{s=1}^{v_0} d_s(1-E_s)\hat{m}_{0s} + \sum_{t=1}^{v_0}\Bigl(\sum_{s=1}^{v_0} d_s E_s g_{st}\Bigr)\tilde{m}_{nt}$$
$$= \sum_{s=1}^{v_0} d_s(1-E_s)\hat{m}_{0s} + \sum_{t=1}^{v_0}\Bigl(\sum_{s=1}^{v_0} d_s E_s g_{st}\Bigr)(1-Z_t)\tilde{m}_{0t} + \sum_{t=1}^{v_0}\Bigl(\sum_{s=1}^{v_0} d_s E_s g_{st}\Bigr)Z_t\,m_{(m;n)}(Y_t), \tag{37}$$
where $Z_t = 1 - \tilde{r}_{0t}\tilde{r}_{nt}^{-1}$. Similarly, from Lemmas 1 and 2, Eq. (35) and by substituting (36) into (34) and using (33), we have
$$\mathrm{Var}_{\bar{C}(m;n)}(m_{(M;N)}(Z)) = \mathrm{Var}_{m_{(m;N)}(C)}(m_{(M;N)}(Z)) + \mathrm{Var}_{m_{(m;n)}(C)}\{E_{m_{(m;N)}(C)}(m_{(M;N)}(Z))\} = \sum_{s=1}^{v_0} d_s^2\,\hat{r}_{ms}^{-1} + \sum_{t=1}^{v_0}\Bigl(\sum_{s=1}^{v_0} d_s E_s g_{st}\Bigr)^2\tilde{r}_{nt}^{-1}. \tag{38}$$
Eqs. (37) and (38) give, respectively, the adjusted expectation and variance for any quantity in $\langle m_{(M;N)}(C)\rangle$ in terms of the $\hat{m}_{ms}$, $\hat{r}_{ms}$, $\tilde{m}_{nt}$, $\tilde{r}_{nt}$ for $s,t = 1,\ldots,v_0$, which can be obtained by finding the underlying canonical cluster directions and resolutions, as given in Definition 4, and the underlying canonical sample directions and resolutions, as given in Definition 5. Notice how in (38) the FPC for sampling $m$ from $M$ attaches itself only within the $\hat{r}_{ms}$, whilst the FPC for sampling $n$ from $N$ attaches only within the $\tilde{r}_{nt}$.

Corollary 5. Suppose that the underlying canonical cluster directions and underlying canonical sample directions share, up to a constant of proportionality, the same co-ordinate representation. For the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$, the collection $m_{(M;N)}(W)$, as defined by Corollary 3, forms a basis for $\langle m_{(M;N)}(C)\rangle$; the $m_{(M;N)}(W_s)$ are a priori uncorrelated and, for all choices of $m$ and $n$, a posteriori uncorrelated. The posterior adjusted expectation, $\hat{m}_{[m,n]s} = E_{\bar{C}(m;n)}(m_{(M;N)}(W_s))$, and posterior adjusted precision, $\hat{r}_{[m,n]s} = \mathrm{Var}^{-1}_{\bar{C}(m;n)}(m_{(M;N)}(W_s))$, for $m_{(M;N)}(W_s)$ are given by
$$\hat{m}_{[m,n]s} = \frac{\hat{r}_{0s}}{\hat{r}_{ms}}\hat{m}_{0s} + \Bigl(1-\frac{\hat{r}_{0s}}{\hat{r}_{ms}}\Bigr)\bar{m}_{ns}, \qquad \hat{r}_{[m,n]s}^{-1} = \frac{1}{\hat{r}_{ms}} + \Bigl(\frac{1}{\hat{r}_{0s}} - \frac{1}{\hat{r}_{ms}}\Bigr)\frac{\bar{r}_{0s}}{\bar{r}_{ns}}, \tag{39}$$
where $\hat{m}_{0s}$, $\hat{r}_{0s}$, $\hat{r}_{ms}$ are as given in Corollary 3 and
$$\bar{m}_{ns} = \frac{\bar{r}_{0s}\bar{m}_{0s} + g(N,n)\,n\,\bar{r}_s\,m_{(m;n)}(W_s)}{\bar{r}_{0s} + g(N,n)\,n\,\bar{r}_s}, \qquad \bar{r}_{ns} = \bar{r}_{0s} + g(N,n)\,n\,\bar{r}_s, \tag{40}$$
where $\bar{m}_{0s}$, $\bar{r}_{0s}$ are the prior expectation and precision for $m_{(m;N)}(W_s)$ and $\bar{r}_s$ is the adjusted precision for $m_m(W_{is})$ given $m_{(m;N)}(C)$.

Proof. For each $s = 1,\ldots,v_0$ we have $U_s = t_s V_s$ for some constant $t_s$, where $U_s$, $V_s$ are, respectively, the $s$th underlying canonical cluster direction and the $s$th underlying canonical sample direction. Eq. (40) then follows immediately from (33). Noting that, in this case, we have $m_{(m;N)}(W_s) = t_s m_{(m;N)}(Y_s)$, then in (36) we must have $g_{ss} = t_s$ and $g_{st} = 0$ for all $s \ne t$. The expectations in (39) then follow from (37) evaluated for $Z = W_s$. Similarly, evaluating (38), we have $\hat{r}_{[m,n]s}^{-1} = \hat{r}_{ms}^{-1} + E_s^2\bar{r}_{ns}^{-1}$. As the expectations in (29) are of the form $\hat{m}_{ms} = (1-E_s)\hat{m}_{0s} + E_s m_{(m;N)}(W_s)$ then, taking variances, we have $E_s^2 = \mathrm{Var}(\hat{m}_{ms})\bar{r}_{0s}$.
$\mathrm{Var}(\hat{m}_{ms})$ is the variance of $m_{(M;N)}(W_s)$ resolved by $m_{(m;N)}(C)$; see Goldstein and Wooff (2007, p. 57). The precisions in (39) thus follow by noting that, by definition, $\mathrm{Var}(\hat{m}_{ms}) = \hat{r}_{0s}^{-1} - \hat{r}_{ms}^{-1}$. $\square$

Let $\mathrm{Res}_{[M,N;m,n]s}$ denote the resolution of $m_{(M;N)}(W_s)$ given $\bar{C}(m,n)$. Rearranging the precisions in (39), we have
$$\mathrm{Res}_{[M,N;m,n]s} = \Bigl(1-\frac{\hat{r}_{0s}}{\hat{r}_{ms}}\Bigr)\Bigl(1-\frac{\bar{r}_{0s}}{\bar{r}_{ns}}\Bigr) = \mathrm{Res}_{[M;m]s}\,\mathrm{Res}_{[N;n]s}, \tag{41}$$
where $\mathrm{Res}_{[M;m]s}$ denotes the resolution of $m_{(M;N)}(W_s)$ given $m_{(m;N)}(C)$ and $\mathrm{Res}_{[N;n]s}$ denotes the resolution of $m_{(m;N)}(W_s)$ given $m_{(m;n)}(C)$. Eq. (41) shows how the full sampling acts multiplicatively: the resolution for $m_{(M;N)}(W_s)$ given $\bar{C}(m,n)$ is obtained by multiplying together the corresponding resolutions from the $m$ from $M$ and the $n$ from $N$ sampling problems. $\mathrm{Res}_{[M,N;m,n]s}$ may be directly obtained from the underlying canonical cluster resolutions and underlying canonical sample resolutions. Using (14), we have
$$\mathrm{Res}_{[M,N;m,n]s} = \frac{m\{(M-1)\psi_s+1\}}{M\{(m-1)\psi_s+1\}}\,\frac{n\{(N-1)\xi_s+1\}}{N\{(n-1)\xi_s+1\}}. \tag{42}$$
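The product form (42) is simple enough to evaluate directly when exploring candidate designs. A minimal sketch (the function name is ours):

```python
def res_two_stage(m, M, n, N, psi_s, xi_s):
    # Resolution (42): product of the m-from-M cluster-stage factor and
    # the n-from-N within-cluster factor for the s-th canonical direction.
    clusters = m * ((M - 1) * psi_s + 1) / (M * ((m - 1) * psi_s + 1))
    within = n * ((N - 1) * xi_s + 1) / (N * ((n - 1) * xi_s + 1))
    return clusters * within

# Sampling every cluster and every individual resolves all the variance...
assert abs(res_two_stage(200, 200, 150, 150, 0.3, 0.4) - 1.0) < 1e-12
# ...while any partial design gives a resolution strictly between 0 and 1.
assert 0 < res_two_stage(25, 200, 20, 150, 0.3, 0.4) < 1
```

Because the two FPCs enter through separate factors, the effects of the cluster-stage and within-cluster sample sizes can be traded off independently.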
We expect to learn most, in terms of variance reduction, about those quantities with large correlations with the early $m_{(M;N)}(W_s)$. As we shall see in Section 6.2, the simplicity of (42) makes the selection of suitable sample sizes $m$ and $n$ to achieve specified variance reductions over elements of $\langle m_{(M;N)}(C)\rangle$ a straightforward task. We now illustrate the theory with an example which also demonstrates that it is not unreasonable for the underlying canonical cluster directions and underlying canonical sample directions to share, up to a constant of proportionality, the same co-ordinate representation.

6. Example: summarising exchangeable observations in cluster sampling

6.1. General framework

In line with the approach taken in Section 4.1, we shall make a further assumption.

Assumption 5. For each individual, we judge that the $v_0$ measurements $C_{gi} = \{X_{g1i},\ldots,X_{gv_0i}\}$ under the model given by (30) are second-order exchangeable and co-exchangeable across individuals. In this case, the model for all $g \ne h$, $i \ne j$, $k$, reduces to
$$E(C_{gi}) = m_0 1_{v_0}, \qquad \mathrm{Var}(C_{gi}) = (d_1-d_2)I_{v_0} + d_2 J_{v_0}, \qquad \mathrm{Cov}(C_{gi},C_{gj}) = (c_1-c_2)I_{v_0} + c_2 J_{v_0}, \qquad \mathrm{Cov}(C_{gi},C_{hk}) = (b_1-b_2)I_{v_0} + b_2 J_{v_0},$$
where $m_0$, $d_1$, $d_2$, $c_1$, $c_2$, $b_1$ and $b_2$ are constants, $1_{v_0}$ is the $v_0 \times 1$ vector of 1s, $I_{v_0}$ is the $v_0 \times v_0$ identity matrix, and $J_{v_0} = 1_{v_0}1_{v_0}^T$. The effect of this simplification is that the matrices $S$ and $S_0$ in Definition 4 and the matrices $T$ and $T_0$ in Definition 5 have a particularly simple form: they are all of the form $aI_{v_0} + bJ_{v_0}$ for suitable constants $a$ and $b$. This enables the underlying canonical cluster directions and underlying canonical sample directions both to be explicitly derived and, in this case, to share, up to a constant of proportionality, the same co-ordinate representation. Thus, the results of Section 5 reduce to those of Corollary 5. We first solve the $m$ from $M$ problem. For the solution of $SF = (S+S_0)F\Psi$, we have $\Psi = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_{v_0})$, where
$$\psi_1 = \frac{b_1+(v_0-1)b_2}{a_1+(v_0-1)a_2}, \qquad \psi_2 = \cdots = \psi_{v_0} = \frac{b_1-b_2}{a_1-a_2}, \tag{43}$$
with $a_1 = c_1 + \frac{1}{N}(d_1-c_1)$ and $a_2 = c_2 + \frac{1}{N}(d_2-c_2)$. Note that $\psi_1 \ge \psi_2 \Leftrightarrow b_2 a_1 \ge a_2 b_1$, so that the ordering of the eigenvalues depends upon the prior choice. In a similar fashion to Section 4.1, and normed as in Definition 4, we take the matrix $F = H_{v_0}\tilde{\Psi}$, where
$$\tilde{\Psi} = \mathrm{diag}\Bigl(\sqrt{\{a_1+(v_0-1)a_2\}(1-\psi_1)},\ \sqrt{(a_1-a_2)(1-\psi_2)},\ \ldots,\ \sqrt{(a_1-a_2)(1-\psi_2)}\Bigr).$$
The underlying canonical cluster directions are the columns of the matrix $U = SF = H_{v_0}\tilde{\Psi}^{-1}$. We then form the collection $m_{(M;N)}(W) = \{m_{(M;N)}(W_1),\ldots,m_{(M;N)}(W_{v_0})\}$, where
$$m_{(M;N)}(W_1) = \frac{1}{\sqrt{v_0}\,\tilde{\Psi}_{11}}\sum_{v=1}^{v_0} m_{(M;N)}(X_v), \tag{44}$$
$$m_{(M;N)}(W_s) = \frac{1}{\sqrt{s(s-1)}\,\tilde{\Psi}_{22}}\Bigl\{(s-1)m_{(M;N)}(X_s) - \sum_{v=1}^{s-1} m_{(M;N)}(X_v)\Bigr\} \tag{45}$$
for $s = 2,\ldots,v_0$. The $m_{(M;N)}(W_s)$ form an orthogonal grid over $\langle m_{(M;N)}(C)\rangle$ which completely summarises the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$. The adjusted expectation and precision for each $m_{(M;N)}(W_s)$ are, from Corollary 3, given by (29), where $\hat{r}_{0s} = (1-\psi_s)/\{\psi_s + \frac{1}{M}(1-\psi_s)\}$ and $\hat{r}_s = M/(M-1)$.
We now solve the $n$ from $N$ problem. Under the model given by Assumption 5, from (31) we can write $\mathrm{Var}(m_m(C_i)) = (f_1-f_2)I_{v_0} + f_2 J_{v_0}$ and, from (32), $\mathrm{Cov}(m_m(C_i), m_m(C_j)) = (e_1-e_2)I_{v_0} + e_2 J_{v_0}$. For the solution of $TG = (T+T_0)G\Xi$, we have $\Xi = \mathrm{diag}(\xi_1, \xi_2, \ldots, \xi_{v_0})$, where
$$\xi_1 = \frac{e_1+(v_0-1)e_2}{f_1+(v_0-1)f_2}, \qquad \xi_2 = \cdots = \xi_{v_0} = \frac{e_1-e_2}{f_1-f_2}, \tag{46}$$
and $\xi_1 \ge \xi_2 \Leftrightarrow e_2 f_1 \ge f_2 e_1$. Note that the $\xi_s$ depend upon $m$ in a particularly simple fashion. We take the matrix $G = H_{v_0}\tilde{\Xi}$, where
$$\tilde{\Xi} = \mathrm{diag}\Bigl(\sqrt{\{f_1+(v_0-1)f_2\}(1-\xi_1)},\ \sqrt{(f_1-f_2)(1-\xi_2)},\ \ldots,\ \sqrt{(f_1-f_2)(1-\xi_2)}\Bigr),$$
and $G$ is thus normed as in Definition 5. The underlying canonical sample directions are the columns of the matrix $V = TG = H_{v_0}\tilde{\Xi}^{-1}$. We then form the collection $m_{(m;N)}(Y) = \{m_{(m;N)}(Y_1),\ldots,m_{(m;N)}(Y_{v_0})\}$, where
$$m_{(m;N)}(Y_1) = \frac{1}{\sqrt{v_0}\,\tilde{\Xi}_{11}}\sum_{v=1}^{v_0} m_{(m;N)}(X_v), \tag{47}$$
$$m_{(m;N)}(Y_s) = \frac{1}{\sqrt{s(s-1)}\,\tilde{\Xi}_{22}}\Bigl\{(s-1)m_{(m;N)}(X_s) - \sum_{v=1}^{s-1} m_{(m;N)}(X_v)\Bigr\} \tag{48}$$
for $s = 2,\ldots,v_0$. Notice that, up to scaling constants, (47) and (48) have no dependence upon $m$. From Corollary 4, the adjusted expectation and precision for each $m_{(m;N)}(Y_s)$ are given by (33), where $\tilde{r}_{0s} = (1-\xi_s)/\{\xi_s + \frac{1}{N}(1-\xi_s)\}$ and $\tilde{r}_s = N/(N-1)$. The $m_{(m;N)}(Y_s)$ form an orthogonal grid over $\langle m_{(m;N)}(C)\rangle$ which completely summarises the adjustment of $\langle m_{(m;N)}(C)\rangle$ given $m_{(m;n)}(C)$. For all choices of $M$, $N$, $m$ and $n$, comparing (44) with (47) and (45) with (48), for each $s = 1,\ldots,v_0$, $m_{(M;N)}(W_s)$ shares, up to a constant of proportionality, the same co-ordinate representation as $m_{(m;N)}(Y_s)$. Consequently, from Corollary 5, the collection $m_{(M;N)}(W)$ forms an orthogonal grid over $\langle m_{(M;N)}(C)\rangle$ which, as well as the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $m_{(m;N)}(C)$, completely summarises the adjustment of $\langle m_{(M;N)}(C)\rangle$ given $\bar{C}(m;n)$. We now illustrate how we can use these results for setting appropriate sample sizes for both $m$ and $n$.

6.2. School examination test

To illustrate the theory, we consider an examination competition where 200 schools were each invited to enter 150 students. In a similar spirit to Section 4.2, each candidate was required to answer six compulsory questions, each marked out of 10. Let $X_{gvi}$ denote the mark on question $v$ of the $i$th candidate in the $g$th school. As part of the marking process, the chief examiner wishes to sample schools and individuals in order to answer a number of questions of interest, such as whether or not all the questions are of approximately the same level of difficulty. The chief examiner judges that the model given by (30) is appropriate and also that the questions are second-order exchangeable, so that the model reduces to that given in Assumption 5 and we can use the framework of Section 6.1. Inspired by the example of Goldstein (1988, Section 5), he specifies $m_0 = 6.5$, $d_1 = 5$, $d_2 = 3.75$, $c_1 = 1$, $c_2 = 0.25$, $b_1 = 0.3$ and $b_2 = 0.05$.
The chief examiner first finds the underlying canonical cluster directions and resolutions. Using (43), he obtains c_1 = 165.71875 and c_2 = 226 (notice here that c_2 > c_1 but, for simplicity of exposition, we retain the ordering of Section 6.1) and, using the resolutions, forms the m^{(200,150)}(W_s) as given by (44) and (45) with v_0 = 6. He next finds the underlying canonical sample directions and resolutions. From (46), he finds ξ_1 = (34 + 11m)/(464 + 11m) and ξ_2 = (2 + m)/(4 + m) and, using the resolutions, forms the m^{(m,150)}(Y_s) as given by (47) and (48). The chief examiner applies Corollary 5: the collection m^{(200,150)}(W) = {m^{(200,150)}(W_1), ..., m^{(200,150)}(W_6)} forms an orthogonal grid over ⟨m^{(200,150)}(C)⟩ which completely summarises the adjustment of ⟨m^{(200,150)}(C)⟩ given C(m, n). m^{(200,150)}(W_1) is proportional to the average overall mark on the examination, which gives insight into the general standard of the examination, whilst the m^{(200,150)}(W_s), s = 2, ..., 6, can be interpreted as summarising all of the differences in difficulty between the questions. As the grid has no dependence upon the choice of m and n (and would, under the chief examiner's beliefs, remain the same for any values of M and N), it is straightforward to explore the effect of different choices of m and n. In particular, (42) can be utilised to establish the effect of various sample sizes. For example, a choice of m = 25 and n = 20 will resolve 84.6% of the prior variance of m^{(200,150)}(W_1) and 93.2% of that for the m^{(200,150)}(W_s), s = 2, ..., 6. The shaded areas in Fig. 2(a)–(c) show, respectively, choices of m and n which achieve a proportional variance reduction of at least 0.9, 0.95 and 0.99 for m^{(200,150)}(W_1). The corresponding plots for m^{(200,150)}(W_s), s = 2, ..., 6, are given in Fig. 2(d)–(f).
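The two sample resolutions above are simple functions of m alone and can be tabulated directly. In the sketch below only the two formulas from (46) come from the text; the scan for target resolutions is our illustration.

```python
from fractions import Fraction

def xi1(m):
    # xi_1 = (34 + 11m) / (464 + 11m), as found from (46).
    return Fraction(34 + 11 * m, 464 + 11 * m)

def xi2(m):
    # xi_2 = (2 + m) / (4 + m), as found from (46).
    return Fraction(2 + m, 4 + m)

# Both resolutions increase with m; at the m = 25 considered in the text:
print(float(xi1(25)), float(xi2(25)))  # roughly 0.418 and 0.931

# Smallest m at which each resolution alone reaches 0.9 (linear scan):
target = Fraction(9, 10)
m1 = next(m for m in range(1, 10**4) if xi1(m) >= target)
m2 = next(m for m in range(1, 10**4) if xi2(m) >= target)
print(m1, m2)  # 349 and 16: xi1 needs far more clusters than xi2
```

Note that the overall reductions quoted in the text (84.6% and 93.2% at m = 25, n = 20) additionally depend on the within-cluster sample size n through (42), which is not reproduced here.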
As there are only two distinct resolutions corresponding to m^{(200,150)}(W), for any choice of m and n the resolutions provide upper and lower bounds on the proportional variance reduction obtained. For example, the smallest resolution corresponds to m^{(200,150)}(W_1), so any choice of m and n in the shaded region of Fig. 2(c) will achieve a proportional variance reduction of at least 0.99 for every quantity in ⟨m^{(200,150)}(C)⟩. Simple modifications can be made to any analysis using (42) to incorporate situations where, for example, the chief examiner is faced with costs of sampling and wishes to choose m and n to achieve a specified proportional variance reduction at minimal cost.
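The closing remark about sampling costs can be sketched as a simple search. Everything below is an illustrative assumption of ours: the cost model and the stand-in `reduction` function are hypothetical, and a real analysis would substitute the genuine proportional variance reduction computed from (42).

```python
def reduction(m, n):
    # Hypothetical stand-in for the proportional variance reduction as a
    # function of m schools and n scripts per school: increasing in each
    # argument and bounded by 1, like the real quantity from (42).
    return (m / (m + 10.0)) * (n / (n + 5.0))

def cheapest_design(target, cost_school=50.0, cost_script=1.0,
                    m_max=200, n_max=150):
    """Cheapest (cost, m, n) achieving reduction(m, n) >= target,
    by exhaustive scan over the feasible grid."""
    best = None
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            if reduction(m, n) >= target:
                cost = cost_school * m + cost_script * m * n
                if best is None or cost < best[0]:
                    best = (cost, m, n)
                break  # for fixed m, larger n only raises the cost
    return best

cost, m, n = cheapest_design(0.9)
print(m, n, cost)
```

Because the reduction is increasing in n for fixed m, the first feasible n for each m is the cheapest for that m, so the inner loop can stop early.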
Fig. 2. Shaded region shows valid choices of m and n required to achieve a proportional variance reduction of at least k for m^{(200,150)}(W_1) and each m^{(200,150)}(W_s), s = 2, ..., 6: (a) m^{(200,150)}(W_1), k = 0.90; (b) m^{(200,150)}(W_1), k = 0.95; (c) m^{(200,150)}(W_1), k = 0.99; (d) m^{(200,150)}(W_s), s = 2, ..., 6, k = 0.90; (e) m^{(200,150)}(W_s), s = 2, ..., 6, k = 0.95; and (f) m^{(200,150)}(W_s), s = 2, ..., 6, k = 0.99.
7. Conclusion

In the first half of this paper, we have shown that the familiar finite population corrections naturally generalise to individuals in the multivariate population and that the types of information we gain by sampling are identified with the orthogonal canonical variable directions derived from a generalised eigenvalue problem. These ideas and techniques can also be extended to more complex systems. In the second half of the paper, we explored the case of two-stage cluster sampling, where each stage can be viewed as multivariate sampling from a finite population. In illustrating the theory in Sections 3 and 5, we focused attention upon cases where additional exchangeability assumptions on the prior variable specifications meant that the generalised eigenvalue problems had simple solutions, but we stress that the same methods, as derived in Sections 4 and 6, can be utilised when the prior specification does not yield such straightforward solutions to the problems given by Definitions 2, 4 and 5.
Appendix A
Proof of Theorem 2. The adjusted expectation and precision matrix for m_N(C) given C(n) are as given by (7). We have that m_N(Y_s) = H_s^T R m_N(C) so that, as Var{m_N(C)} = R_0^{-1} + (1/N)R^{-1}, Cov{m_N(Y_s), m_N(Y_t)} = H_s^T R R_0^{-1} R H_t + (1/N)H_s^T R H_t. From Definition 2, we have RH(I − F) = R_0 H F with H^T R H = I so that, for s ≠ t, Cov{m_N(Y_s), m_N(Y_t)} = 0 and

r̃_0s^{-1} = f_s/(1 − f_s) + 1/N = (1 + (N − 1)f_s)/(N(1 − f_s)).
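The final equality above can be checked symbolically over a grid. The helper names below are ours, with f standing for the canonical resolution f_s; exact rational arithmetic avoids any rounding ambiguity.

```python
from fractions import Fraction

def lhs(f, N):
    # f/(1 - f) + 1/N
    return f / (1 - f) + Fraction(1, N)

def rhs(f, N):
    # (1 + (N - 1) f) / (N (1 - f))
    return (1 + (N - 1) * f) / (N * (1 - f))

# Exact agreement over a grid of resolutions f in (0, 1) and sizes N.
assert all(lhs(Fraction(p, 10), N) == rhs(Fraction(p, 10), N)
           for p in range(1, 10) for N in range(2, 50))
print("identity verified")
```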
Similarly, r̃_s^{-1} = ((N − 1)/N)H_s^T R H_s = (N − 1)/N. Using (9), we have

RH = (R_0 + aR)HF{I + (a − 1)F}^{-1}.  (A.1)
Note that

R̃_0 H = N R_0 (R_0 + N R)^{-1} R H = N R_0 H F{I + (N − 1)F}^{-1},  (A.2)
where (A.2) follows from (A.1) with a = N. Now, as R_0 H F = RH(I − F) and NR = (N − 1)R̃, then, using (A.2), we have that

R̃H = {R̃_0 + R̃}H L,  (A.3)
where L = N^{-1}{I + (N − 1)F}. In a similar vein to (A.1), we may use (A.3) to show that, for a = (1 − (n − 1)/(N − 1))^{-1} n, we have R̃H = {R̃_0 + aR̃}H L(a), where

L(a) = ((N − n)/(N(N − 1))) {I + (N − 1)F}{I + (n − 1)F}^{-1}.
Thus, H^T R{R̃_0 + aR̃}^{-1} R H = ((N − 1)/N)L(a). Hence, using (7), Cov_{C(n)}{m_N(Y_s), m_N(Y_t)} = 0 for s ≠ t, and

r̃_ns^{-1} = (N − n){1 + (N − 1)f_s}/(N^2{1 + (n − 1)f_s}).  (A.4)
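The adjusted variance (A.4) can be checked numerically against the precision additivity used to confirm (13): the adjusted precision equals the prior precision r̃_0s plus n r̃_s weighted by the finite-population factor (1 − (n − 1)/(N − 1))^{-1}. A sketch with our own helper names, in exact rational arithmetic:

```python
from fractions import Fraction

def adjusted_variance(f, N, n):
    # (A.4): r~_ns^{-1} = (N - n)(1 + (N - 1)f) / (N^2 (1 + (n - 1)f))
    return ((N - n) * (1 + (N - 1) * f)) / (N**2 * (1 + (n - 1) * f))

def prior_precision(f, N):
    # r~_0s: inverse of the prior variance (1 + (N - 1)f) / (N(1 - f))
    return (N * (1 - f)) / (1 + (N - 1) * f)

def sampling_precision(N):
    # r~_s = N / (N - 1)
    return Fraction(N, N - 1)

def precision_identity(f, N, n):
    # a = (1 - (n - 1)/(N - 1))^{-1} n = n(N - 1)/(N - n)
    a = Fraction(n * (N - 1), N - n)
    return (1 / adjusted_variance(f, N, n)
            == prior_precision(f, N) + a * sampling_precision(N))

assert all(precision_identity(Fraction(p, 10), N, n)
           for p in range(0, 10) for N in range(3, 15) for n in range(1, N))
print("precision identity verified")
```

At n = 0 the adjusted variance reduces to the prior variance, and as n approaches N it tends to zero, as one expects when the whole finite population is observed.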
Now

r̃_0s + (1 − (n − 1)/(N − 1))^{-1} n r̃_s = N(1 − f_s)/(1 + (N − 1)f_s) + ((N − 1)n/(N − n))(N/(N − 1)) = N^2{1 + (n − 1)f_s}/((N − n){1 + (N − 1)f_s}),

which, by comparing with (A.4), confirms the precisions given by (13). As RH(I − F) = R_0 H F, then (A.2) may be expressed as R̃_0 H = N RH(I − F){I + (N − 1)F}^{-1}. Consequently, we may note that

H^T R{R̃_0 + aR̃}^{-1}{R̃_0 m_0 + aR̃ C̄(n)} = L(a)[(N − 1)(I − F){I + (N − 1)F}^{-1} H^T R m_0 + a H^T R C̄(n)]

and so, from (7), we have that

m̃_ns = r̃_ns^{-1}{(N(1 − f_s)/(1 + (N − 1)f_s)) m_{0s} + a(N/(N − 1)) Ȳ_s},

which gives the expectations in (13). □

Proof of Lemma 1. Note that, for all g, h and j, Cov(C̄_g, C_{hj}) = Cov(C̄_g, C̄_h). Letting J_{1,s} denote the 1 × s vector of ones and ⊗ the Kronecker product, we may write

Cov(C̄(m, n), C(m, n)) = [J_{1,n} ⊗ Var_1(C̄(m, n)) ⋯ J_{1,n} ⊗ Var_m(C̄(m, n))],

where Var_g(C̄(m, n)) = Cov(C̄(m, n), C̄_g). Denoting by I_s the s × s identity matrix and by I_{r,s} the rth column of I_s, then Var^{-1}(C̄(m, n))Var_g(C̄(m, n)) = I_{g,m} ⊗ I_{v_0} so that

Var^{-1}(C̄(m, n))Cov(C̄(m, n), C(m, n)) = [J_{1,n} ⊗ I_{1,m} ⊗ I_{v_0} ⋯ J_{1,n} ⊗ I_{m,m} ⊗ I_{v_0}].  (A.5)
Now Cov(m^{(M,N)}(C), C̄_g) = Var(m^{(M,N)}(C)) so that

Cov(m^{(M,N)}(C), C̄(m, n)) = J_{1,m} ⊗ Var(m^{(M,N)}(C)).  (A.6)
Thus, using (A.5) and (A.6),

Cov(m^{(M,N)}(C), C̄(m, n))Var^{-1}(C̄(m, n))Cov(C̄(m, n), C(m, n)) = [J_{1,n} ⊗ J_{1,m}I_{1,m} ⊗ Var(m^{(M,N)}(C)) ⋯ J_{1,n} ⊗ J_{1,m}I_{m,m} ⊗ Var(m^{(M,N)}(C))] = J_{1,mn} ⊗ Var(m^{(M,N)}(C)) = Cov(m^{(M,N)}(C), C(m, n))  (A.7)
as Cov(m^{(M,N)}(C), C_{hj}) = Var(m^{(M,N)}(C)). Now Var(m^{(m,N)}(C)) = Cov(m^{(m,N)}(C), C̄_h) so that Var^{-1}(m^{(m,N)}(C))Cov(m^{(m,N)}(C), C̄(m, n)) = J_{1,m} ⊗ I_{v_0}. Further, Cov(m^{(M,N)}(C), m^{(m,N)}(C)) = Var(m^{(M,N)}(C)) so that, using (A.6), we have that

Cov(m^{(M,N)}(C), m^{(m,N)}(C))Var^{-1}(m^{(m,N)}(C))Cov(m^{(m,N)}(C), C̄(m, n)) = Cov(m^{(M,N)}(C), C̄(m, n)).  (A.8)
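The Kronecker bookkeeping in the step to (A.7) can be sanity-checked numerically. The dimensions and the stand-in matrix V below are arbitrary choices of ours.

```python
import numpy as np

m, n, v0 = 3, 4, 2
V = np.arange(1.0, 1.0 + v0 * v0).reshape(v0, v0)  # stand-in for Var(m^(M,N)(C))
J1n = np.ones((1, n))
J1m = np.ones((1, m))

# Each block J_{1,n} (x) (J_{1,m} I_{g,m}) (x) V collapses to J_{1,n} (x) V
# because J_{1,m} I_{g,m} = 1; stacking the m blocks side by side then gives
# J_{1,mn} (x) V, which is the step from (A.5) and (A.6) to (A.7).
blocks = []
for g in range(m):
    Igm = np.eye(m)[:, [g]]                      # g-th column of I_m
    blocks.append(np.kron(J1n, np.kron(J1m @ Igm, V)))
stacked = np.hstack(blocks)
kron_form = np.kron(np.ones((1, m * n)), V)      # J_{1,mn} (x) V
assert np.allclose(stacked, kron_form)
print("Kronecker step verified")
```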
Results 1 and 2 follow, respectively, from (A.7) and (A.8) using Theorems 5.20 and 5.23 of Goldstein and Wooff (2007). □

References

Bernardo, J.M., Smith, A.F.M., 1994. Bayesian Theory. Wiley, Chichester.
Bolfarine, H., 1990. Bayesian linear prediction in finite populations. Annals of the Institute of Statistical Mathematics 42, 435–444.
Cochran, W.G., 1977. Sampling Techniques, 3rd ed. Wiley, New York.
Ericson, W.A., 1969. Subjective Bayesian models in sampling finite populations (with discussion). Journal of the Royal Statistical Society, Series B 31, 195–233.
de Finetti, B., 1937. La prévision, ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré 7, 1–68. (English translation in Kyburg Jr., H.E., Smokler, H.E. (Eds.), 1964. Studies in Subjective Probability. Wiley, New York, pp. 93–158.)
Ghosh, M., Meeden, G., 1997. Bayesian Methods for Finite Population Sampling. Chapman & Hall, Bury St. Edmunds.
Goldstein, M., 1986. Exchangeable belief structures. Journal of the American Statistical Association 81, 971–976.
Goldstein, M., 1988. The data trajectory (with discussion). In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M. (Eds.), Bayesian Statistics, vol. 3. University Press, Oxford, pp. 189–209.
Goldstein, M., 1999. Bayes linear analysis. In: Kotz, S., Read, C.B., Banks, D.L. (Eds.), Encyclopedia of Statistical Sciences, update vol. 3. Wiley, Chichester, pp. 29–34.
Goldstein, M., O'Hagan, A., 1996. Bayes linear sufficiency and systems of expert posterior assessments. Journal of the Royal Statistical Society, Series B 58, 301–316.
Goldstein, M., Wooff, D.A., 1997. Choosing sample sizes in balanced experimental designs: a Bayes linear approach. Statistician 46, 1–17.
Goldstein, M., Wooff, D.A., 1998. Adjusting exchangeable beliefs. Biometrika 85, 39–54.
Goldstein, M., Wooff, D.A., 2007. Bayes Linear Statistics: Theory and Methods. Wiley, Chichester.
Hartigan, J.A., 1969. Linear Bayesian methods. Journal of the Royal Statistical Society, Series B 31, 446–454.
Hewitt, E., Savage, L.J., 1955. Symmetric measures on Cartesian products. Transactions of the American Mathematical Society 80, 470–501.
Little, R.J.A., Zheng, H., 2007. The Bayesian approach to the analysis of finite population surveys (with discussion). In: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (Eds.), Bayesian Statistics, vol. 8. University Press, Oxford, pp. 283–302.
O'Hagan, A., Forster, J., 2004. Kendall's Advanced Theory of Statistics: Bayesian Inference, vol. 2B, 2nd ed. Edward Arnold, London.
Royall, R.M., Pfeffermann, D., 1982. Balanced samples and robust Bayesian inference in finite population sampling. Biometrika 69, 401–409.
Rougier, J., Goldstein, M., House, L. Second-order exchangeability analysis for multi-model ensembles, submitted for publication. Available from http://www.maths.bris.ac.uk/~MAZJCR/mme3.pdf.
San Martino, S., Singer, J., Stanek III, E.J., 2008. Performance of balanced two-stage empirical predictors of realized cluster latent values from finite populations: a simulation study. Computational Statistics & Data Analysis 52, 2199–2217.
Scott, A., Smith, T.M.F., 1969. Estimation in multi-stage surveys. Journal of the American Statistical Association 64, 830–840.
Smouse, E.P., 1984. A note on Bayesian least squares inference for finite population models. Journal of the American Statistical Association 79, 390–392.
Stanek III, E.J., Singer, J.M., 2004. Predicting random effects from finite population clustered samples with response error. Journal of the American Statistical Association 99, 1119–1130.
Tebaldi, C., Knutti, R., 2007. The use of the multi-model ensemble in probabilistic climate projections. Philosophical Transactions of the Royal Society A 365, 2053–2075.
Tebaldi, C., Sansó, B., Smith, R.L., 2011. Characterizing uncertainty of future climate change projections using hierarchical Bayesian models (with discussion). In: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (Eds.), Bayesian Statistics, vol. 9. University Press, Oxford, pp. 639–658.
Vernon, I., Goldstein, M., Bower, R.G., 2010. Galaxy formation: a Bayesian uncertainty analysis. Bayesian Analysis 5, 619–670.