Discriminant analysis of survey data

Journal of Statistical Planning and Inference 60 (1997) 273-290
Ching-Ho Leu^a, Kam-Wah Tsui^b,*

^a National Cheng-Kung University, Tainan, Taiwan, ROC
^b Department of Statistics, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706-1685, USA

Received 5 September 1995; revised 15 April 1996

Abstract

We consider the problem of the effect of sample designs on discriminant analysis. The selection of the learning sample is assumed to depend on the population values of auxiliary variables. Under a superpopulation model with a multivariate normal distribution, unbiasedness and consistency are examined for the conventional estimators (derived under the assumptions of simple random sampling), maximum likelihood estimators, probability-weighted estimators and conditionally unbiased estimators of parameters. Four corresponding sampled linear discriminant functions are examined. The rates of misclassification of these four discriminant functions and the effect of sample design on these four rates of misclassification are discussed. The performances of these four discriminant functions are assessed in a simulation study.

AMS classification: Primary 62D05; 62H30

Keywords: Discriminant analysis; Discriminant function; Misclassification rate; Stratified sampling; Survey data

1. Introduction

A common assumption in multivariate analysis is that the data are generated as independent observations from a common probability distribution. From the perspective of a survey sampler using a model-based (superpopulation) approach, this is equivalent to assuming that the measurements associated with units in a finite population are realizations of i.i.d. random variables (the i.i.d. model). However, sample surveys often involve mechanisms such as stratification, multistage selection and the use of auxiliary information in the sampling scheme, all of which may result in observations that are not statistically independent. The standard statistical methods for multivariate analysis are therefore often inappropriate for analyzing sample survey data, and modifications of the standard methods are necessary in order to make valid inferences from multivariate survey data.

* Corresponding author.


A great deal of research in recent years has focused on the effects of sample design on various analytical statistical methods. However, little of this work has addressed the impact of sample design on standard discriminant analysis procedures, because of the technical complexity involved. The most closely related fields are regression analysis and principal component analysis of sample survey data. In their regression analysis of survey data, Nathan and Holt (1980) used a model-based approach and considered a sampling scheme where an auxiliary variable Z is known at the design stage for each member of a finite population. They showed how sample selection dependent on an auxiliary variable can lead to asymptotically biased and inconsistent estimators, and discussed the use of maximum likelihood and probability-weighted procedures in regression analysis to adjust for the effect of selection dependent on an auxiliary variable. Bebbington and Smith (1977) demonstrated by computer simulations how complex survey design can lead to asymptotically biased estimators in principal component analysis. Using a model-based approach, Skinner et al. (1986) and Skinner (1982) considered the effects of sample design in principal component analysis under the sampling scheme proposed by Nathan and Holt (1980). However, the use of discriminant analysis with complex survey data is a largely unexplored area. Discriminant analysis is a statistical technique for classifying individuals or objects into mutually exclusive and exhaustive groups on the basis of a set of characteristic variables. In this paper, we consider a sampling scheme similar to that of Nathan and Holt (1980) and discuss the effect of sample design on discriminant analysis. The following questions are of interest: (a) What are the properties of standard estimators and their associated discriminant functions in multivariate analysis for two populations under our sampling scheme? (b) What alternative estimators and discriminant functions might we adopt?

Section 2 outlines the basic sampling structure and examines the effect of selection on the consistency and unbiasedness of the pooled estimator of the common covariance matrix of the two populations and of the estimator of the difference between the two means of the discriminator in the two populations. A difference between the properties of an estimator under the SRS design and under other models was interpreted as a misspecification effect by Skinner (1982), a notion similar to the design effect in the design-based approach. We found that the standard pooled variance estimator derived under the usual setting (the i.i.d. model) is asymptotically biased with respect to the superpopulation model under a general stratified random sampling scheme; i.e., the model misspecification can introduce bias. Under simple random sampling, Cochran and Bliss (1948) derived a discriminant function that involves additional covariance variates whose means are the same in the two populations. They argued that although such covariance variates have no discriminating power by themselves, they may still be utilized in a modified discriminant function to increase the power of discrimination. However, Cochran and Bliss did not consider sample selection that depends on auxiliary variables. In our sample design, the auxiliary variables Z play the role of the covariance variates of Cochran and Bliss (1948).


We will show that the asymptotic distribution (and hence the rate of misclassification) of Fisher's sampled discriminant function under an SRS design is different from that under our selection scheme. The difference is interpreted as the misspecification effect on discriminant analysis. Fisher's sampled discriminant function requires estimates of the population means and covariance matrices. Instead of only using estimators based on the simple random sample and performing the usual discriminant analysis, in Section 2 we examine three alternative ways to estimate the population means and covariance matrices. Using these estimators, we obtain three additional versions of Fisher's sampled discriminant function in Section 3. We then derive their asymptotic distributions and use them to derive the corresponding estimates of the rates of misclassification. In Section 4, we report the results of a simulation study designed to compare the performances of the four discriminant functions considered in this paper.

2. Sampling effect on estimators of the mean and covariance matrix

2.1. Sampling scheme

Consider the problem of classifying a unit into one of two finite populations, G_1 or G_2. Let G_1 and G_2 consist of N_1 and N_2 identifiable units, labeled i = 1, ..., N_1 and j = 1, ..., N_2, respectively. Let N = N_1 + N_2. Suppose that associated with each unit i in G_α are a discriminator x_{αi}, a p × 1 vector, to be measured in the survey, and a q × 1 vector z_{αi} of known values that will be used in selecting the sample. In the terminology of the superpopulation model approach, we assume that x_{αi}^T = (x_{αi,1}, ..., x_{αi,p}) and z_{αi}^T = (z_{αi,1}, ..., z_{αi,q}) are realizations of the random vectors X_{αi} and Z_{αi}, respectively, i = 1, ..., N_α and α = 1, 2. The random vectors (X_{αi}^T, Z_{αi}^T)^T, i = 1, ..., N_α and α = 1, 2, are assumed to be independent, each with the multivariate normal distribution

(X_{αi}^T, Z_{αi}^T)^T ~ N_{p+q}( (μ_α^T, μ_z^T)^T, Ω ),   Ω = [ Ω_11  Ω_12 ; Ω_21  Ω_22 ],   (2.1)

where Ω_22 is assumed to be positive definite. Here we assume that the two populations have a common covariance matrix Ω and that the means of the random vectors Z_{1i}, Z_{2i} are both equal to μ_z; hence the only discriminator is the random vector X_{αi}, for i = 1, ..., N_α, α = 1, 2. Let x_α = (x_{α1}, ..., x_{αN_α}), z_α = (z_{α1}, ..., z_{αN_α}), X_α = (X_{α1}, ..., X_{αN_α}) and Z_α = (Z_{α1}, ..., Z_{αN_α}), α = 1, 2. The auxiliary information z = (z_1, z_2) and the population sizes N_1 and N_2 are known at the beginning of the survey. Two samples, s_1 = (i_1, ..., i_{n_1}) of n_1 distinct units from population I and s_2 = (j_1, ..., j_{n_2}) of n_2 distinct units from population II, are selected independently by prechosen randomized sampling designs p(s_1|z_1) and p(s_2|z_2), respectively. Let x_{s_α} and X_{s_α} be the subvectors of x_α and X_α with indices in s_α, respectively. We assume that the sampling designs p(s_α|z_α), α = 1, 2, depend on the auxiliary information z = (z_1, z_2), but not on the x_α. The auxiliary


information z can serve as a size variable for probability-proportional-to-size sampling, or as a grouping variable for stratified or cluster sampling. Denote the sample data by (x_{s_1}, x_{s_2}, z_1, z_2, s_1, s_2).

To illustrate, suppose that the grade point averages (GPAs) of a large population of students are measured. The population is then divided at random into two groups, G_1 and G_2, each of which subsequently receives a different type of training. At the end of the training period, each student is given several exams; the scores, x_1, x_2, will be used as discriminators of the two groups. A sample s_i of students is selected from group i, i = 1, 2, with the sampling scheme dependent on the initial GPAs (z_1, z_2). The exam scores x_{s_1}, x_{s_2} and the initial GPAs (z_1, z_2) of the selected students are used as the learning data, i.e., the data used to determine the discriminant function. It is reasonable to assume that the initial GPAs Z_{1i}, Z_{2i} of the two groups have the same mean μ_z, and that the discriminators and the GPA variable in the two groups have the same covariance matrix Ω. The initial GPAs are of no use in themselves in distinguishing between the two initial groups. Nevertheless, since they are correlated with the exam scores, and the sample selection depends on the GPAs, the sampling scheme may have some effect on the discriminant function.
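The selection step of such a z-dependent design can be sketched in a few lines (an illustrative sketch only: the univariate dimensions, parameter values, the five-stratum split and the allocation below are our own choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Superpopulation model (2.1) for one population: each unit has a
# discriminator X and an auxiliary Z, jointly normal with common
# covariance Omega.  Here p = q = 1 purely for illustration.
mu_x, mu_z = 1.0, 3.0
Omega = np.array([[8.0, 5.0],
                  [5.0, 10.0]])  # [[Omega11, Omega12], [Omega21, Omega22]]

N1 = 5000
xz = rng.multivariate_normal([mu_x, mu_z], Omega, size=N1)
x, z = xz[:, 0], xz[:, 1]

# A design p(s1|z1) that depends on z but not on x: stratify the finite
# population into L = 5 equal strata by increasing z, then draw a
# disproportionate stratified simple random sample (allocation invented).
L, n_h = 5, [5, 15, 20, 25, 35]
order = np.argsort(z)
strata = np.array_split(order, L)
sample = np.concatenate(
    [rng.choice(h, size=m, replace=False) for h, m in zip(strata, n_h)])

# Sampled z-values are shifted relative to the population: the design
# over-represents high-z strata, which is the source of the selection effect.
print(len(sample), z[sample].mean(), z.mean())
```

Because the allocation loads the high-z strata, the sampled z (and, through Ω_12, the sampled x) are not representative of the finite population; this is exactly the effect studied in Section 2.3.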

2.2. Estimators of superpopulation parameters

In this section, we discuss the effect of the sample design on the consistency and unbiasedness of the estimators of Ω_11, μ_1 and μ_2. Let θ be the vector of parameters indexing the distribution in (2.1). Then the likelihood may be written as

L(θ; x_{s_1}, x_{s_2}, s_1, s_2, z) ∝ p(s_1|z_1) f(x_{s_1}|z_1; θ) f(z_1; θ) p(s_2|z_2) f(x_{s_2}|z_2; θ) f(z_2; θ),   (2.2a)

where f(x_{s_α}|z_α; θ) is the conditional density function of X_{s_α} given Z_α = z_α, α = 1, 2, and f(z_α; θ) is the marginal density of Z_α, α = 1, 2. The sample selection mechanisms p(s_α|z_α), α = 1, 2, are independent of θ and hence can be ignored in likelihood inferences about θ. We assume that the sample sizes n_1 and n_2 of the samples s_1 and s_2, respectively, are fixed. Let π_i(z_1) be the probability that the ith unit (i = 1, ..., N_1) in population I is included in the sample, given z_1, and let π_j(z_2) be the probability that the jth unit (j = 1, ..., N_2) in population II is included in the sample, given z_2. Let Σ_s denote the summation over the units in the sample and Σ_N denote the summation over the whole population, and let

β = Ω_12 Ω_22^{-1},   ω_i = N_1^{-1} π_i(z_1)^{-1},   ω_j = N_2^{-1} π_j(z_2)^{-1},   (2.2b)

ω(s_1) = Σ_{s_1} ω_i,   ω(s_2) = Σ_{s_2} ω_j,   (2.2c)

μ̂_{z1,πω} = Σ_{s_1} ω_i z_{1i},   μ̂_{z2,πω} = Σ_{s_2} ω_j z_{2j},   (2.2d)

x̄_1 = n_1^{-1} Σ_{s_1} x_{1i},   X̄_1 = N_1^{-1} Σ_{N_1} x_{1i},   x̄_2 = n_2^{-1} Σ_{s_2} x_{2j},   X̄_2 = N_2^{-1} Σ_{N_2} x_{2j},   (2.2e)

z̄_1 = n_1^{-1} Σ_{s_1} z_{1i},   z̄_2 = n_2^{-1} Σ_{s_2} z_{2j},   z̄ = N^{-1} Σ_N z_i,   (2.2f)

m_{1s_1} = (n_1 - 1)^{-1} Σ_{s_1} (x_{1i} - x̄_1)(x_{1i} - x̄_1)^T,   m_{1s_2} = (n_2 - 1)^{-1} Σ_{s_2} (x_{2j} - x̄_2)(x_{2j} - x̄_2)^T,   (2.2g)

m_{2s_1} = (n_1 - 1)^{-1} Σ_{s_1} (z_{1i} - z̄_1)(z_{1i} - z̄_1)^T,   m_{2s_2} = (n_2 - 1)^{-1} Σ_{s_2} (z_{2j} - z̄_2)(z_{2j} - z̄_2)^T.   (2.2h)

Let

S = [ S_11  S_12 ; S_21  S_22 ]   and   S̃ = (n_1 + n_2)^{-1}(n_1 + n_2 - 2) S = [ S̃_11  S̃_12 ; S̃_21  S̃_22 ],   (2.2i)

where S is the usual pooled estimator of Ω. For example, S_22 = (n_1 + n_2 - 2)^{-1}[(n_1 - 1) m_{2s_1} + (n_2 - 1) m_{2s_2}]. Under independent simple random sampling, S is unbiased. Let

m̃_{1s_1} = n_1^{-1}(n_1 - 1) m_{1s_1},   m̃_{1s_2} = n_2^{-1}(n_2 - 1) m_{1s_2},   m̃_{2s_1} = n_1^{-1}(n_1 - 1) m_{2s_1},   m̃_{2s_2} = n_2^{-1}(n_2 - 1) m_{2s_2},   (2.2j)

m_{22} = N^{-1} Σ_N (z_i - z̄)(z_i - z̄)^T,   b_s = S_12 S_22^{-1}.   (2.2k)

We consider the following point estimators of Ω_11:

1. Ω̂_srs = [(n_1 - 1) m_{1s_1} + (n_2 - 1) m_{1s_2}]/(n_1 + n_2 - 2) = S_11;

2. Ω̂_ML = (n_1 + n_2)^{-1}(n_1 + n_2 - 2) S_11 + b_s [m_22 - (n_1 + n_2)^{-1}(n_1 + n_2 - 2) S_22] b_s^T;

3. Ω̂_CB = S_11 - S_12 S_22^{-1} S_21;

4. Ω̂_πω = N_1/(N_1 + N_2 - 2) [Σ_{s_1} ω_i x_{1i} x_{1i}^T - ω(s_1)^{-1} μ̂_{1,πω} μ̂_{1,πω}^T] + N_2/(N_1 + N_2 - 2) [Σ_{s_2} ω_j x_{2j} x_{2j}^T - ω(s_2)^{-1} μ̂_{2,πω} μ̂_{2,πω}^T].


The first estimator, Ω̂_srs, is the usual pooled sample covariance matrix, which ignores the auxiliary information and assumes that s_1 and s_2 are obtained by simple random sampling. Thus, the sample design information is ignored. The second estimator, Ω̂_ML, makes use of the sample design information and is obtained by maximizing the likelihood in (2.2a). Using the notation in (2.2e), (2.2f) and (2.2k), the corresponding maximum likelihood estimators of μ_1 and μ_2 are μ̂_{1,ML} = x̄_1 + b_s(z̄ - z̄_1) and μ̂_{2,ML} = x̄_2 + b_s(z̄ - z̄_2), respectively. The derivation of these maximum likelihood estimators is an extension of Anderson's (1957) results to the case of two independent populations with a common covariance matrix, and is given in Appendix A. The third estimator, Ω̂_CB, given by Cochran and Bliss (1948), is the estimator of the conditional covariance matrix given the auxiliary values, under the assumption that the data are obtained by simple random sampling. The corresponding estimators of the two conditional means given the auxiliary values are μ̂_{1,CB} = x̄_1 - b_s z̄_1 and μ̂_{2,CB} = x̄_2 - b_s z̄_2, respectively. The estimator Ω̂_CB ignores the information of the sample design. The fourth estimator, Ω̂_πω, is a pooled weighted covariance matrix; it makes use of the sample design information so that it is approximately "design-unbiased" for the finite population covariance matrix. The corresponding estimators of μ_1 and μ_2 are the usual Horvitz-Thompson estimators (πω-estimators), μ̂_{1,πω} = Σ_{s_1} ω_i x_{1i} and μ̂_{2,πω} = Σ_{s_2} ω_j x_{2j}, respectively, where the ω_i and the ω_j are given in (2.2b). An estimator similar to Ω̂_ML is considered by Nathan and Holt (1980).

2.3. Sampling effect on estimators

The conditional expectations of the estimators, given the sample s = (s_1, s_2) and the auxiliary information z available prior to sample selection, are as follows:

E(Ω̂_srs | s, z) = Ω_11 + β(S_22 - Ω_22)β^T,   (2.3)

E(Ω̂_ML | s, z) = γ Ω_11 + β(m_22 - γ Ω_22)β^T,   (2.4)

E(Ω̂_CB | s, z) = γ_CB (Ω_11 - β Ω_22 β^T),   (2.5)

E(Ω̂_πω | s, z) = λ_ω Ω_11 + β(Ω̂_{22,πω} - λ_ω Ω_22)β^T,   (2.6)

where

γ = [n_1 + n_2 - 2 - q + tr(m_22 S_22^{-1})]/(n_1 + n_2),   (2.7)

γ_CB = (n_1 + n_2 - 2 - q)/(n_1 + n_2 - 2),   (2.8)

λ_ω = N_1/(N_1 + N_2) [ω(s_1) - ω(s_1)^{-1} Σ_{s_1} ω_i^2] + N_2/(N_1 + N_2) [ω(s_2) - ω(s_2)^{-1} Σ_{s_2} ω_j^2],   (2.9)

Ω̂_{22,πω} = N_1/(N_1 + N_2) [Σ_{s_1} ω_i z_{1i} z_{1i}^T - ω(s_1)^{-1} (Σ_{s_1} ω_i z_{1i})(Σ_{s_1} ω_i z_{1i})^T]
           + N_2/(N_1 + N_2) [Σ_{s_2} ω_j z_{2j} z_{2j}^T - ω(s_2)^{-1} (Σ_{s_2} ω_j z_{2j})(Σ_{s_2} ω_j z_{2j})^T],   (2.10)

and, for i = 1, 2,

E(x̄_i | s, z) = μ_i + β(z̄_i - μ_z),   (2.11)

E(μ̂_{i,ML} | s, z) = μ_i + β(z̄ - μ_z),   (2.12)

E(μ̂_{i,CB} | s, z) = μ_i - β μ_z,   (2.13)

E(μ̂_{i,πω} | s, z) = ω(s_i) μ_i + β(μ̂_{zi,πω} - ω(s_i) μ_z).   (2.14)

The proofs of (2.3)-(2.6) and (2.11)-(2.14) are given in Appendix B. Since sample selection depends on z, S_22 may not be unbiased, even asymptotically, for Ω_22. By arguments similar to those of Skinner et al. (1986), we see that Ω̂_srs generally has a conditional bias of order O_p(1) and is unconditionally asymptotically biased for Ω_11. Suppose that the auxiliary variable z_{αi} is one-dimensional. Suppose further that L equal-sized strata are formed according to the increasing values of z for populations I and II, and that a stratified simple random sample with τ_h n_α units from the hth stratum of population α is drawn, where α = 1, 2, τ_h > 0 and Σ_h τ_h = 1. Let m = n_1 + n_2. Following Skinner et al. (1986), for each h = 1, ..., L, we may treat the z_{αi}'s from the hth stratum as a simple random sample from a normal distribution truncated at the 100(h-1)/L and 100h/L percentage points. Suppose that the τ_h's are fixed as n_1 → ∞ and n_2 → ∞. Then, similar to Skinner et al. (1986) for the one-population case, we can prove that E(Ω̂_srs) = p lim(Ω̂_srs) + O(m^{-1}), and

p lim(Ω̂_srs) = Ω_11 + β Ω_22 [Σ_h τ_h ((μ_h - μ̃)^2 + σ_h^2) - 1] β^T,   (2.15)

where

μ̃ = Σ_h τ_h μ_h,   (2.16)

μ_h is the mean, and σ_h is the standard deviation, of the standard normal distribution truncated at its 100(h-1)/L and 100h/L percentage points. Since Σ_h μ_h/L = 0 and Σ_h (μ_h^2 + σ_h^2)/L = 1, (2.15) shows that under proportionate stratified random sampling, μ̃ = 0 and the estimator Ω̂_srs is consistent. If the allocation is disproportionate, however, Ω̂_srs has a bias of O(1). By (2.11), the unconditional expectation E(x̄_1) = p lim(x̄_1) + O(n_1^{-1}), and the sample mean of the standardized data (z_{αi} - μ_z)/Ω_22^{1/2} converges in probability to μ̃ if the τ_h are fixed for all h. We have

p lim(x̄_1) = μ_1 + β μ̃ Ω_22^{1/2}.   (2.17)


Hence, x̄_1 is not a consistent estimator of μ_1 under disproportionate allocation, but it is consistent under proportionate allocation. Similar reasoning applies to the estimator x̄_2 of μ_2. However, x̄_1 - x̄_2 is consistent for μ_1 - μ_2 under stratified sampling. In contrast, by (2.4), Ω̂_ML has a conditional bias of order O_p(m^{-1}), assuming that γ = 1 + O_p(m^{-1}). Moreover, since m_22 and z̄ do not depend on the sample selected, Ω̂_ML is in general unconditionally consistent for Ω_11 (for pathological designs, Ω̂_ML may not be consistent for Ω_11; see Nathan and Holt, 1980). Similarly, from (2.12), μ̂_{i,ML} is an unconditionally consistent estimator of μ_i, i = 1, 2. The pooled weighted estimator Ω̂_πω has a conditional bias of O_p(m^{-1/2}) (see Skinner et al., 1986) but is unconditionally consistent (provided again that the design is not pathological), and μ̂_{i,πω} is unconditionally consistent for μ_i, i = 1, 2. The estimator Ω̂_CB has a conditional bias of O_p(m^{-1}) by (2.5), but it is a consistent estimator of Ω_11 - Ω_12 Ω_22^{-1} Ω_21 and is an unconditionally inconsistent estimator of Ω_11 except when Ω_12 = 0. μ̂_{i,CB} is asymptotically biased for μ_i, i = 1, 2, except when μ_z = 0 or β = 0.
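The identities used after (2.16), Σ_h μ_h/L = 0 and Σ_h (μ_h^2 + σ_h^2)/L = 1, and the resulting vanishing of the bias factor in (2.15) under proportionate allocation (τ_h = 1/L), can be checked numerically. The following is a self-contained sketch (the bisection quantile routine is our own shortcut):

```python
import math

def phi(u):  return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
def Phi(u):  return 0.5 * (1 + math.erf(u / math.sqrt(2)))

L = 5
# Stratum boundaries: the 100(h-1)/L and 100h/L percentage points of N(0,1).
def qnorm(p, lo=-10.0, hi=10.0):
    # crude bisection inverse CDF; a library quantile function would do
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

bounds = [qnorm(h / L) for h in range(L + 1)]  # effectively -inf ... +inf
mu_h, var_h = [], []
for a, b in zip(bounds[:-1], bounds[1:]):
    Z = Phi(b) - Phi(a)                        # = 1/L by construction
    m = (phi(a) - phi(b)) / Z                  # mean of the truncated N(0,1)
    v = 1 + (a * phi(a) - b * phi(b)) / Z - m * m
    mu_h.append(m); var_h.append(v)

# Bias factor of (2.15): sum tau_h[(mu_h - mu_tilde)^2 + sigma_h^2] - 1,
# which vanishes under proportionate allocation tau_h = 1/L.
tau = [1 / L] * L
mu_tilde = sum(t * m for t, m in zip(tau, mu_h))            # (2.16)
bias = sum(t * ((m - mu_tilde) ** 2 + v)
           for t, m, v in zip(tau, mu_h, var_h)) - 1
print(round(mu_tilde, 10), round(bias, 10))
```

Replacing `tau` by a disproportionate allocation, e.g. the D5 proportions of Section 4, gives a strictly nonzero bias factor, which is the O(1) bias of Ω̂_srs claimed above.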

3. Sampling effects on rates of misclassification

The usual approach to the two-group discriminant analysis problem was suggested by Fisher (1936). Under the assumptions that the true mean vectors for groups G_1 and G_2 are μ_1 and μ_2, respectively, and that the two groups have the same variance-covariance matrix Ω_11, Fisher suggested finding a linear combination of the discriminators so that the ratio of between-group variation to within-group variation is maximized, i.e., finding v to maximize (v^T(μ_1 - μ_2))^2/(v^T Ω_11 v). Let x_0 be the vector of values of the discriminator of a new unit. Fisher's linear discriminant function is, therefore,

L(x_0) = (x_0 - (μ_1 + μ_2)/2)^T Ω_11^{-1} (μ_1 - μ_2).   (3.1)

The classification rule is to allocate x_0 (the new unit) to G_1 if L(x_0) > 0 and to allocate x_0 to G_2 otherwise.

3.1. Sample-based discriminant functions

In applications, the parameters are usually unknown. Hence, the samples of n_i observations from G_i, the so-called "learning data", are used to define a sample-based linear discriminant function by replacing μ_i, i = 1, 2, in (3.1) with the estimated mean vectors and replacing Ω_11 with the estimated covariance matrix. In Section 2, we suggested four sets of estimators for the parameters μ_1, μ_2 and Ω_11. By substituting these estimators for the parameters in Fisher's discriminant function, we obtain four different sample-based discriminant functions:

1. W_srs = (x_0 - (x̄_1 + x̄_2)/2)^T S_11^{-1} (x̄_1 - x̄_2),
2. W_ML = (x_0 - (μ̂_{1,ML} + μ̂_{2,ML})/2)^T Ω̂_ML^{-1} (μ̂_{1,ML} - μ̂_{2,ML}),
3. W_CB = (x_0 - b_s z_0 - (μ̂_{1,CB} + μ̂_{2,CB})/2)^T Ω̂_CB^{-1} (μ̂_{1,CB} - μ̂_{2,CB}),
4. W_πω = (x_0 - (μ̂_{1,πω} + μ̂_{2,πω})/2)^T Ω̂_πω^{-1} (μ̂_{1,πω} - μ̂_{2,πω}).

In general, the above sample-based linear discriminant functions may be written as

W(x_0) = (x_0 - (μ̂_1 + μ̂_2)/2)^T Ω̂_11^{-1} (μ̂_1 - μ̂_2).   (3.2)

The classification rule is to allocate x_0 to G_1 if W(x_0) > k and to allocate x_0 to G_2 otherwise, where k is a given constant which depends on the prior probabilities and the costs of misclassification. For simplicity, we assume equal prior probabilities and equal costs of misclassification, i.e., k = 0. The first function, W_srs, is the usual sampled Fisher discriminant function. The second function, W_ML, is the ML discriminant rule under our sampling scheme. The third function, W_CB, given by Cochran and Bliss (1948), adjusts the discriminators by means of their "within-sample" regressions on the auxiliary variables. The last function, W_πω, is a design-based discriminant function which adjusts the discriminators for the effect of sample design.
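The generic rule based on (3.2) amounts to a few lines of linear algebra (a sketch; the function `classify` and its argument names are ours):

```python
import numpy as np

def classify(x0, mu1_hat, mu2_hat, omega11_hat, k=0.0):
    """Generic sampled rule from (3.2): allocate x0 to G1 iff W(x0) > k.
    Plugging in the (mu-hat, Omega-hat) pair of any of the estimator sets
    of Section 2 gives W_srs, W_ML or W_piw; for W_CB, x0 must first be
    replaced by x0 - b_s z0."""
    x0, m1, m2 = map(np.asarray, (x0, mu1_hat, mu2_hat))
    w = (x0 - (m1 + m2) / 2) @ np.linalg.solve(np.asarray(omega11_hat), m1 - m2)
    return 1 if w > k else 2

# A unit near mu1 goes to G1, one near mu2 goes to G2 (values illustrative):
print(classify([0, 0], [1, 2], [5, 4], [[8, 2], [2, 10]]),
      classify([6, 5], [1, 2], [5, 4], [[8, 2], [2, 10]]))  # 1 2
```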

3.2. Probabilities of misclassification for discriminant functions

The probabilities of misclassification (the rates of misclassification) are computed for the four sample-based linear discriminant functions under the actual sampling scheme. These probabilities of misclassification are then compared with the probability of misclassification for W_srs under simple random sampling, so as to evaluate the effect of sample design on the misclassification probabilities. Under the sample-based classification rules, there are two types of probabilities of misclassification: the (conditional) probabilities of misclassification given the particular values of μ̂_1, μ̂_2 and Ω̂_11, and the expected probabilities of misclassification, where the expectation is taken under the joint distribution of these estimators. We conclude this section by deriving the naive estimators of the conditional probabilities of misclassification given the values of μ̂_1, μ̂_2 and Ω̂_11. Since x_0 is distributed as N(μ_x, Ω_11), the distribution of L(x_0) is N((μ_x - (μ_1 + μ_2)/2)^T Ω_11^{-1} (μ_1 - μ_2), Δ^2), where Δ^2 = (μ_1 - μ_2)^T Ω_11^{-1} (μ_1 - μ_2) is the Mahalanobis distance. Hence, L(x_0) is distributed as N(Δ^2/2, Δ^2) when x_0 comes from G_1 and as N(-Δ^2/2, Δ^2) when x_0 comes from G_2. Thus, the probability of misclassifying a unit from G_1 into G_2 when the parameters are known is p(2|1) = P(L(x_0) < 0 | G_1) = Φ(-Δ/2), where Φ(·) is the cumulative distribution function of the standard normal distribution. Similarly, the probability of misclassifying a unit from G_2 into G_1 when the parameters are known is p(1|2) = P(L(x_0) > 0 | G_2) = Φ(-Δ/2). If the parameters are estimated from the data, then a natural estimate of Δ^2 is given by

D^2 = (μ̂_1 - μ̂_2)^T Ω̂_11^{-1} (μ̂_1 - μ̂_2).
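Computing the naive estimate Φ(-D/2) is immediate; as an illustration we plug in the true parameter values used later in the Section 4 simulation (true values in place of estimates, purely for checking):

```python
import math
import numpy as np

def naive_rate(mu1_hat, mu2_hat, omega11_hat):
    """Naive misclassification estimate Phi(-D/2), with
    D^2 = (mu1_hat - mu2_hat)^T Omega11_hat^{-1} (mu1_hat - mu2_hat)."""
    d = np.asarray(mu1_hat, float) - np.asarray(mu2_hat, float)
    D2 = d @ np.linalg.solve(np.asarray(omega11_hat, float), d)
    Phi = lambda u: 0.5 * (1 + math.erf(u / math.sqrt(2)))
    return D2, Phi(-math.sqrt(D2) / 2)

# Section 4 values mu1 = (1,2), mu2 = (5,4), Omega11 = [[8,2],[2,10]]:
D2, rate = naive_rate([1, 2], [5, 4], [[8, 2], [2, 10]])
print(round(D2, 3), round(rate, 3))  # 2.105 0.234
```

This reproduces the theoretical rate Φ(-Δ/2) = 0.234 quoted in (4.1).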


The (conditional) probabilities of misclassification for the discriminant function (3.2), given the values of μ̂_1, μ̂_2 and Ω̂_11 obtained from the learning data, are as follows:

p_{W(x_0)}(2|1) = P(W(x_0) < 0 | x_0 ∈ G_1, μ̂_1, μ̂_2, Ω̂_11)
  = Φ( -(μ_1 - (μ̂_1 + μ̂_2)/2)^T Ω̂_11^{-1} (μ̂_1 - μ̂_2) / [(μ̂_1 - μ̂_2)^T Ω̂_11^{-1} Ω_11 Ω̂_11^{-1} (μ̂_1 - μ̂_2)]^{1/2} ),   (3.3)

p_{W(x_0)}(1|2) = P(W(x_0) > 0 | x_0 ∈ G_2, μ̂_1, μ̂_2, Ω̂_11)
  = Φ( (μ_2 - (μ̂_1 + μ̂_2)/2)^T Ω̂_11^{-1} (μ̂_1 - μ̂_2) / [(μ̂_1 - μ̂_2)^T Ω̂_11^{-1} Ω_11 Ω̂_11^{-1} (μ̂_1 - μ̂_2)]^{1/2} ).   (3.4)

If the learning data, s_1 and s_2, are, respectively, independent simple random samples from G_1 and G_2, then (3.3) and (3.4) both converge in probability to Φ(-Δ/2) as the sample sizes of s_1 and s_2 tend to infinity. Consequently, a naive estimate of (3.3) and (3.4) is Φ(-D_srs/2), where D_srs^2 = (x̄_1 - x̄_2)^T S_11^{-1} (x̄_1 - x̄_2) is the usual sample Mahalanobis distance. However, if the learning data come from a complex sample survey, S_11 may be asymptotically biased for Ω_11 and therefore D_srs^2 may not be a consistent estimator of Δ^2. As an illustration, define Ω_22* = Ω_22 [Σ_h τ_h ((μ_h - μ̃)^2 + σ_h^2) - 1] in the stratified simple random sampling case considered in Section 2. If the allocation is disproportionate, then by (2.15) and (2.16) we have

p lim(D_srs^2) = (μ_1 - μ_2)^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2).

Thus, W_srs has an asymptotic normal distribution with mean μ_{wsrs1} = ((μ_1 - μ_2)/2 - β μ̃ Ω_22^{1/2})^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2) when x_0 comes from G_1, and variance σ_{wsrs}^2 = (μ_1 - μ_2)^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2). Let p_srs(i|j) be the probability of misclassifying a unit from G_j into G_i (i ≠ j) using the discriminant function W_srs. Similarly, let p_ML(i|j), p_πω(i|j) and p_CB(i|j) denote the probabilities of misclassification using the discriminant functions W_ML, W_πω and W_CB, respectively. We have p_srs(2|1) = P(W_srs < 0 | G_1) → Φ(-μ_{wsrs1}/σ_{wsrs}) and p_srs(1|2) = P(W_srs > 0 | G_2) → Φ(μ_{wsrs2}/σ_{wsrs}) in probability, where μ_{wsrs2} = ((μ_2 - μ_1)/2 - β μ̃ Ω_22^{1/2})^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2). This indicates that the main effect of our sample design is the possibility of misspecification when using the (asymptotic) distribution of the discriminant function W_srs. Note that the probabilities of misclassification depend on the sample proportion of each stratum. Since the parameters are unknown, we may estimate μ_{wsrs1}, μ_{wsrs2} and σ_{wsrs}^2 by the estimators μ̂_{wsrs1} = ((x̄_1 - x̄_2)/2 - b_s μ̃ m_22^{1/2})^T S_11^{-1} (x̄_1 - x̄_2), μ̂_{wsrs2} = ((x̄_2 - x̄_1)/2 - b_s μ̃ m_22^{1/2})^T S_11^{-1} (x̄_1 - x̄_2), and σ̂_{wsrs}^2 = (x̄_1 - x̄_2)^T S_11^{-1} (x̄_1 - x̄_2), respectively. μ̂_{wsrs1} and μ̂_{wsrs2} are consistent


estimators. Hence, Φ(-μ̂_{wsrs1}/σ̂_{wsrs}) and Φ(μ̂_{wsrs2}/σ̂_{wsrs}) are the estimates of p_srs(2|1) and p_srs(1|2), respectively. Note that under proportionate allocation, μ̃ given in (2.16) equals zero and Ω_22* = 0. Hence p_srs(2|1) = p_srs(1|2) → Φ(-D_srs/2) in probability: the misspecification effect disappears in this case. In contrast, using (2.4) and (2.12), the asymptotic distribution of W_ML is N(Δ^2/2, Δ^2) when x_0 comes from G_1 and is N(-Δ^2/2, Δ^2) when x_0 comes from G_2. An identical result is obtained for W_πω using (2.6) and (2.14). Hence, the probabilities of misclassification are p_ML(2|1) = P(W_ML < 0 | G_1), p_ML(1|2) = P(W_ML > 0 | G_2), p_πω(2|1) = P(W_πω < 0 | G_1) and p_πω(1|2) = P(W_πω > 0 | G_2). All these probabilities of misclassification converge to Φ(-Δ/2) in probability when the sizes of the learning samples are large. Hence, the asymptotic distributions of these two alternative discriminant functions under our sampling design are close to the asymptotic distribution of the usual discriminant function W_srs under the i.i.d. model, and so is the probability of misclassification of W_ML or W_πω. This suggests that the alternative discriminant functions, W_ML and W_πω, may be better than W_srs under our sampling design. A consistent estimate of p_ML(2|1) and p_ML(1|2) is Φ(-D_ML/2), where D_ML^2 = (μ̂_{1,ML} - μ̂_{2,ML})^T Ω̂_ML^{-1} (μ̂_{1,ML} - μ̂_{2,ML}). A consistent estimate of p_πω(1|2) and p_πω(2|1) is Φ(-D_πω/2), where D_πω^2 = (μ̂_{1,πω} - μ̂_{2,πω})^T Ω̂_πω^{-1} (μ̂_{1,πω} - μ̂_{2,πω}).

By (2.5) and (2.13), the asymptotic distribution of W_CB is N(Δ_CB^2/2, Δ_CB^2) if x_0 comes from G_1 and is N(-Δ_CB^2/2, Δ_CB^2) if x_0 comes from G_2, where Δ_CB^2 = (μ_1 - μ_2)^T (Ω_11 - Ω_12 Ω_22^{-1} Ω_21)^{-1} (μ_1 - μ_2). The probabilities of misclassification are p_CB(2|1) = P(W_CB < 0 | G_1) and p_CB(1|2) = P(W_CB > 0 | G_2). Both p_CB(2|1) and p_CB(1|2) converge in probability to Φ(-Δ_CB/2) when the sizes of the learning samples are large. A consistent estimate of p_CB(2|1) and p_CB(1|2) is Φ(-D_CB/2), where D_CB^2 = (μ̂_{1,CB} - μ̂_{2,CB})^T Ω̂_CB^{-1} (μ̂_{1,CB} - μ̂_{2,CB}). Hence, the sample design has no effect on the discriminant function of Cochran and Bliss (1948).

4. Some simulation results

In this section, we use a computer simulation study to compare the performance of the four discriminant functions considered in Section 3. We created two finite populations, each having 5000 units. For each unit, the associated vector (x, z) was generated from (2.1) with μ_1 = (1, 2), μ_2 = (5, 4), μ_z = 3 and a common covariance matrix Ω equal to

Ω = [ 8   2   5
      2  10   5
      5   5  10 ],

using the generator GGNSM (IMSL, 1984). The Mahalanobis distance Δ^2 is 2.105. For various sample designs, 500 independent samples of size 100 were selected from each population. The performance of each of the four discriminant functions was


estimated from these 500 replications. The sampling designs were based on those used by Holt et al. (1980) and Skinner et al. (1986). Each finite population was stratified into five equal strata of size 1000 according to the increasing values of z_{αi}, α = 1, 2 (see Section 2). The notation (m_1, ..., m_5) denotes a stratified random sampling design with m_h units selected from the hth stratum, h = 1, ..., 5. The designs used are as follows: design D1 (100), simple random sampling (SRS) of size 100; design D2 (20, 20, 20, 20, 20), proportional allocation; design D3 (5, 15, 20, 25, 35), increasing allocation; design D4 (5, 5, 10, 30, 50), increasing allocation; design D5 (1, 3, 5, 16, 75), increasing allocation; design D6 (30, 15, 10, 15, 30), U-shaped allocation; and design D7 (44, 5, 2, 5, 44), U-shaped allocation. The same sampling design was applied to the two populations. For the situation where the two sample sizes, n_1 and n_2, are equal and the costs of misclassification for the two populations are the same, the optimal choice of k in the sampled discriminant rule, W(x_0) ≤ (>) k, is k = 0. In Section 3, we saw that under the stratified sampling design, the distribution of the discriminant function W_srs converges to that of

[x_0 - (μ_1 + μ_2 + 2 β μ̃ Ω_22^{1/2})/2]^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2)

as n = n_1 + n_2 → ∞. Since

P{[x_0 - (μ_1 + μ_2 + 2 β μ̃ Ω_22^{1/2})/2]^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2) < 0}
  = P{[x_0 - (μ_1 + μ_2)/2]^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2) < [β μ̃ Ω_22^{1/2}]^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2)},

instead of using the discriminant rule W_srs < (≥) 0, we can adjust k = 0 to k* = [b_s μ̃ m_22^{1/2}]^T S_11^{-1} (x̄_1 - x̄_2), which is an estimate of [β μ̃ Ω_22^{1/2}]^T (Ω_11 + β Ω_22* β^T)^{-1} (μ_1 - μ_2), and use the adjusted discriminant rule, W_srs < (≥) k*, in the stratified sampling situation. We shall call this the adjusted k method in this section. Since Δ^2 = 2.105, the theoretical rates of misclassification, p(1|2) and p(2|1), of Fisher's discriminant function L(x_0) are

p(2|1) = p(1|2) = Φ(-Δ/2) = Φ(-0.7254) = 0.234.   (4.1)
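A single replication of the D1 (SRS) cell of this design can be sketched as follows (setup only: our own random seed and generator replace the paper's IMSL routine, and the 500 replications are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)

# Recreate the Section 4 setup: two finite populations of 5000 units with
# (x, z) drawn from (2.1); mu1 = (1,2), mu2 = (5,4), mu_z = 3 and the
# common 3x3 covariance matrix Omega given above.
Omega = np.array([[8.0, 2.0, 5.0],
                  [2.0, 10.0, 5.0],
                  [5.0, 5.0, 10.0]])
G1 = rng.multivariate_normal([1, 2, 3], Omega, size=5000)
G2 = rng.multivariate_normal([5, 4, 3], Omega, size=5000)

# Draw an SRS learning sample of 100 from each population (design D1)
# and build the usual sampled Fisher rule W_srs with cutoff k = 0.
s1 = G1[rng.choice(5000, 100, replace=False), :2]
s2 = G2[rng.choice(5000, 100, replace=False), :2]
xbar1, xbar2 = s1.mean(axis=0), s2.mean(axis=0)
S11 = (99 * np.cov(s1.T) + 99 * np.cov(s2.T)) / 198
coef = np.linalg.solve(S11, xbar1 - xbar2)

def w_srs(x0):
    return (x0 - (xbar1 + xbar2) / 2) @ coef

# Achieved rates: classify every unit of the two finite populations.
p21 = np.mean(w_srs(G1[:, :2]) < 0)   # G1 units sent to G2
p12 = np.mean(w_srs(G2[:, :2]) > 0)   # G2 units sent to G1
print(p21, p12)  # both should be near Phi(-Delta/2) = 0.234 under SRS
```

Repeating this 500 times and averaging, and swapping in the stratified designs D2-D7 and the other estimator sets, reproduces the structure of Tables 1 and 2 below.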

Recall that the rates of misclassification, p_{W(x_0)}(2|1) and p_{W(x_0)}(1|2), of the sampled Fisher discriminant function W(x_0) are given in (3.3) and (3.4), respectively. There are two methods of assessing the rates of misclassification of W(x_0) in the simulation. One is the naive estimation of the rates of misclassification given in Section 3. The other method is, for each training sample, to apply the classification rule W(x_0) to each observation x_0 in the two finite populations; the rate of misclassification, p_{W(x_0)}(2|1) or p_{W(x_0)}(1|2), is then the proportion of the units in the original population that the classification rule classifies incorrectly. The rates of misclassification of the discriminant functions W_srs, W_ML, W_πω, W_CB and the adjusted k method for W_srs shown in Tables 1 and 2 are averages over 500 replications (based on 500 independent training samples). By the substitution principle (see Arnold, 1981, p. 402), a good sampled Fisher discriminant function should have nearly the same properties as those of L(x_0) given in (3.1). Hence, from (4.1), we use the following facts as the criteria to judge the performance of a sampled discriminant function W(x_0): (1) if both p_{W(x_0)}(2|1) and p_{W(x_0)}(1|2)

C.-H. Leu, K.-W. Tsui / Journal of Statistical Planning and Inference 60 (1997) 273-290

Table 1
Naive estimation of the rate of misclassification

Design                  Wsrs(%)  WML(%)  Wπ(%)  WCB(%)
D1(100)                 23.2     23.1    23.1   16.8
D2(20,20,20,20,20)      23.4     23.2    23.3   17.0
D3(5,15,20,25,35)       21.8     22.6    22.6   16.7
D4(5,5,10,30,50)        21.9     22.7    22.9   16.6
D5(1,3,5,16,75)         19.9     22.2    20.2   16.4
D6(30,15,10,15,30)      24.5     22.8    23.0   16.7
D7(44,5,2,5,44)         25.8     22.5    22.4   16.4

Table 2
Achieved misclassification based on 500 replications

Design                                        Wsrs(%)  WML(%)  Wπ(%)  WCB(%)  Adj. k(%)
D1(100) (SRS of size 100)
  in G1                                       23.4     23.4    23.4    4.1    23.4
  in G2                                       23.8     23.7    23.8   44.9    23.8
  Average                                     23.6     23.5    23.6   24.5    23.6
D2(20,20,20,20,20) (proportional allocation)
  in G1                                       23.4     23.4    23.4    4.0    23.4
  in G2                                       23.8     23.7    23.8   44.8    23.8
  Average                                     23.6     23.5    23.6   24.4    23.6
D3(5,15,20,25,35) (increasing allocation)
  in G1                                       15.3     23.1    23.1    4.1    23.1
  in G2                                       33.2     24.0    24.2   44.2    24.1
  Average                                     24.3     23.6    23.6   24.2    23.6
D4(5,5,10,30,50) (increasing allocation)
  in G1                                       11.6     23.1    23.2    3.9     6.4
  in G2                                       40.3     24.0    24.3   44.7    54.1
  Average                                     25.9     23.5    23.7   24.3    30.3
D5(1,3,5,16,75) (increasing allocation)
  in G1                                        8.2     22.8    22.7    4.0     4.2
  in G2                                       49.3     24.4    26.0   44.8    62.3
  Average                                     28.7     23.6    24.4   24.4    33.2
D6(30,15,10,15,30) (U-shaped allocation)
  in G1                                       23.4     23.4    23.2    4.0    23.4
  in G2                                       23.8     23.8    23.9   44.6    23.8
  Average                                     23.6     23.6    23.6   24.3    23.6
D7(44,5,2,5,44) (U-shaped allocation)
  in G1                                       23.8     23.6    23.7    4.1    23.8
  in G2                                       23.7     23.5    24.3   43.9    23.7
  Average                                     23.8     23.5    24.0   24.0    23.8

are close to each other and are approximately equal to 0.234, then W(x0) is a good sampled Fisher's discriminant function; (2) the closer p_{W(x0)}(2|1) is to p_{W(x0)}(1|2), the smaller is the effect of misspecification. Based on the above facts and Table 2, we can see that (1) under SRS (D1), all of the sampled discriminant functions except WCB perform very similarly; (2) under proportional allocation (D2), the rate of misclassification of each discriminant function is the same as that of the corresponding discriminant function under D1. This means that the design effect does not appear in this situation.


(3) Under increasing allocation (D3, D4 and D5), it appears that the higher the rate of increase, the larger the effect of misspecification. WML has the smallest average rate of misclassification in this case. Compared with the other discriminant functions, WML has the smallest effect of misspecification, and Wπ is a close second. (4) Under U-shaped allocation (D6 and D7), it appears that the greater the difference between max{mh} and min{mh} (mh is the sample size of the hth stratum), the larger the effect of misspecification. WML, Wπ and Wsrs perform similarly. Comparing the rates of misclassification in Tables 1 and 2, we observe that the estimated rates of misclassification using the naive method are in general smaller than the achieved rates of misclassification. This suggests that the naive estimation method underestimates the rates of misclassification. The differences between the rates of misclassification of WCB in Tables 1 and 2 are much larger than those of the other discriminant functions. From Table 2, the best sampled Fisher's discriminant function is WML, and Wπ is a close second. Hence, we recommend the use of WML or Wπ and the corresponding naive estimator of the rate of misclassification for stratified sampling data in discriminant analysis. If the sizes of all strata are the same in a stratified sampling, then we recommend using a proportional allocation design, so that the usual sampled discriminant function Wsrs and the corresponding naive estimator of the rate of misclassification may still be used.

Appendix A. The derivation of the MLE from (2.2a)

Since p(s1|z1) and p(s2|z2) do not depend on the parameters, these two factors can be eliminated from (2.2a). The likelihood can be expressed as

    L(θ; X_s1, X_s2, s1, s2, z) ∝ f(X_s1|z1; θ) f(z1; θ) f(X_s2|z2; θ) f(z2; θ),

where θ is the vector of parameters indexing the distribution in (2.1). Let N = N1 + N2. Note that f(x_α|z_α; θ) f(z_α; θ), α = 1, 2, is the likelihood considered by Anderson (1957) in the case where some observations are missing. Our proof here is an extension of Anderson's (1957) results to two independent populations with the same covariance matrix. For simplicity, consider the case where p = 1 and q = 1. Note that ρ = corr(X1, Z1) = corr(X2, Z2), Var(X1) = Var(X2) = σ1², and σ²_{1|z} = σ²_{2|z}, since we assume that the two superpopulations have the same covariance matrix. We shall denote the normal density with mean μx and variance σx² by n(x|μx, σx²), and let θ = (μ1, μ2, μz, σ1², σz², ρ). The likelihood can be written as

    L(θ) = ∏_{i=1}^{n1} n(x1i, z1i | μ1, μz; σ1², σz², ρ)
           × ∏_{i=n1+1}^{n1+n2} n(x2i, z2i | μ2, μz; σ1², σz², ρ) ∏_{i=n1+n2+1}^{N} n(zi | μz, σz²)

         = ∏_{i=1}^{N} n(zi | μz, σz²) ∏_{i=1}^{n1} n(x1i | ν1 + βz1i, η) ∏_{i=n1+1}^{n1+n2} n(x2i | ν2 + βz2i, η),    (A.1)


where

    ν1 = μ1 - βμz,    ν2 = μ2 - βμz,    β = ρσ1/σz,    σ²_{1|z} = σ²_{2|z} = σ1²(1 - ρ²) ≡ η.    (A.2)

The maximum likelihood estimates of μz, σz², ν1, ν2, β and η are those values that maximize (A.1). The maximum likelihood estimates of μ1, μ2 and ρ are then obtained by solving (A.2). By an argument similar to that of Anderson (1957), the procedure can be generalized to the multivariate case. The resulting MLEs are

    μ̂z = z̄ in (2.2e),    Ω̂22 = m22 in (2.2k),    β̂ = b_s in (2.2k),

    μ̂1 = x̄1 + b_s(z̄ - z̄1),    μ̂2 = x̄2 + b_s(z̄ - z̄2),

    η̂ = (n1 + n2 - 2)[S11 - b_s S22 b_s^T]/(n1 + n2) = S̃11 - b_s S̃22 b_s^T,

    Ω̂ML = (n1 + n2 - 2)S11/(n1 + n2) + b_s [m22 - (n1 + n2 - 2)S22/(n1 + n2)] b_s^T
         = S̃11 + b_s[m22 - S̃22]b_s^T.
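The multivariate MLEs above can be computed directly from the two samples and the auxiliary values of the whole finite population. The sketch below is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and m22 is taken here with divisor N (the paper's (2.2k) is not reproduced in this excerpt).

```python
import numpy as np

def ml_estimates(x1, z1, x2, z2, z_all):
    """MLE sketch for Appendix A.  x1 (n1,p), z1 (n1,q): group-1 sample;
    x2, z2: group-2 sample; z_all (N,q): auxiliary z for the whole population."""
    n = len(x1) + len(x2)
    cx1, cz1 = x1 - x1.mean(0), z1 - z1.mean(0)    # within-group centering
    cx2, cz2 = x2 - x2.mean(0), z2 - z2.mean(0)
    Szz = cz1.T @ cz1 + cz2.T @ cz2                # pooled SSCP of z
    bs = (cx1.T @ cz1 + cx2.T @ cz2) @ np.linalg.inv(Szz)   # b_s = beta-hat
    S11t = (cx1.T @ cx1 + cx2.T @ cx2) / n         # S-tilde_11 (divisor n1+n2)
    S22t = Szz / n                                 # S-tilde_22
    zbar = z_all.mean(0)                           # mu_z-hat = population mean
    m22 = (z_all - zbar).T @ (z_all - zbar) / len(z_all)
    mu1 = x1.mean(0) + bs @ (zbar - z1.mean(0))    # regression-adjusted means
    mu2 = x2.mean(0) + bs @ (zbar - z2.mean(0))
    Oml = S11t + bs @ (m22 - S22t) @ bs.T          # Omega-hat_ML
    return mu1, mu2, bs, Oml
```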

Appendix B. The proof of equations (2.3)-(2.6) and (2.11)-(2.14)

We need a lemma that was given in Skinner (1982).

Lemma B.1. Let Q be a random n × p matrix whose rows are independently distributed with common covariance matrix U, but with possibly different means. Let M be an n × n symmetric matrix of constants and let H = Q^T M Q. Then

    E(H) = E(Q^T) M E(Q) + tr(M) U.    (B.1)
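Lemma B.1 can be checked numerically by Monte Carlo. The sketch below is an illustration only; the particular U, M and row means are arbitrary choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
U = np.array([[2.0, 0.5], [0.5, 1.0]])      # common row covariance U
means = rng.normal(size=(n, p))             # rows may have different means
M = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])      # symmetric n x n constant matrix
L = np.linalg.cholesky(U)

reps = 20000
H_sum = np.zeros((p, p))
for _ in range(reps):
    Q = means + rng.normal(size=(n, p)) @ L.T   # independent rows, covariance U
    H_sum += Q.T @ M @ Q
H_hat = H_sum / reps                            # Monte Carlo estimate of E(H)
H_theory = means.T @ M @ means + np.trace(M) * U  # E(Q)^T M E(Q) + tr(M) U
```

With 20 000 replications the two matrices agree to within Monte Carlo error.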

(i) Proof of Eqs. (2.3) and (2.11). Skinner (1982) proved E(x̄1|s1, z1) = μ1 + β(z̄1 - μz) and E(m11) = Ω11 + β(m22 - Ω22)β^T for a single superpopulation. Note that Ω̂srs is a linear function of m11,s1 and m11,s2. Eqs. (2.3) and (2.11) are simply generalizations of Skinner's (1982) results to the case of two superpopulations.

(ii) Proof of Eqs. (2.4) and (2.12). By (2.11) and E(b_s|s, z) = β, it is easy to derive (2.12). For the proof of (2.4), let

    A1^T = [x11, ..., x1n1],    A2^T = [x21, ..., x2n2],
    B1^T = [z11, ..., z1n1],    B2^T = [z21, ..., z2n2],
    Pw1 = I_n1 - 1_n1 1_n1^T/n1,    Pw2 = I_n2 - 1_n2 1_n2^T/n2,


where I_n is the n × n identity matrix and 1_n is the n × 1 vector of ones. Note that Pw1 and Pw2 are idempotent and of ranks n1 - 1 and n2 - 1, respectively. We have

    tr(Pw1) = n1 - 1,    tr(Pw2) = n2 - 1,    (B.2)

    Σ_{s1}(x1i - x̄1)(x1i - x̄1)^T = A1^T Pw1 A1,    Σ_{s1}(z1i - z̄1)(z1i - z̄1)^T = B1^T Pw1 B1,
    Σ_{s2}(x2j - x̄2)(x2j - x̄2)^T = A2^T Pw2 A2,    Σ_{s2}(z2j - z̄2)(z2j - z̄2)^T = B2^T Pw2 B2,

and

    Σ_{s1}(x1i - x̄1)(z1i - z̄1)^T = A1^T Pw1 B1,    Σ_{s2}(x2j - x̄2)(z2j - z̄2)^T = A2^T Pw2 B2.

Hence

    b_s = (A1^T Pw1 B1 + A2^T Pw2 B2)(B1^T Pw1 B1 + B2^T Pw2 B2)^{-1},    (B.3)

    (n1 + n2)S̃11 = (n1 + n2 - 2)S11 = A1^T Pw1 A1 + A2^T Pw2 A2    (B.4)

and

    (n1 + n2)S̃22 = (n1 + n2 - 2)S22 = B1^T Pw1 B1 + B2^T Pw2 B2.    (B.5)
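The centering-matrix identities behind (B.2)-(B.5) can be verified numerically. A minimal sketch, using arbitrary data rather than the paper's populations:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, p = 6, 2
A1 = rng.normal(size=(n1, p))
Pw1 = np.eye(n1) - np.ones((n1, n1)) / n1     # centering matrix I - 11^T/n

# A1^T Pw1 A1 equals the centered sum-of-squares-and-cross-products matrix
sscp_direct = (A1 - A1.mean(0)).T @ (A1 - A1.mean(0))
sscp_proj = A1.T @ Pw1 @ A1
```

`Pw1` is idempotent with trace n1 - 1 and annihilates the vector of ones, which is exactly what the proof of (B.6) below uses.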

Using the fact that Pw1 1_n1 = 0, we have

    Pw1 E(A1|s, z) = Pw1[B1 β^T + 1_n1(μ1 - βμz)^T] = Pw1 B1 β^T.    (B.6)

Similarly, Pw2 E(A2|s, z) = Pw2 B2 β^T. Now,

    (n1 + n2)[S̃11 - b_s S̃22 b_s^T] = A1^T Pw1 A1 + A2^T Pw2 A2
        - (A1^T Pw1 B1 + A2^T Pw2 B2)(B1^T Pw1 B1 + B2^T Pw2 B2)^{-1}(A1^T Pw1 B1 + A2^T Pw2 B2)^T.

Let m_zz = (B1^T Pw1 B1 + B2^T Pw2 B2)^{-1}; then

    (n1 + n2)[S̃11 - b_s S̃22 b_s^T] = A1^T[Pw1 - Pw1 B1 m_zz B1^T Pw1]A1
        + A2^T[Pw2 - Pw2 B2 m_zz B2^T Pw2]A2
        - A1^T Pw1 B1 m_zz (A2^T Pw2 B2)^T - A2^T Pw2 B2 m_zz (A1^T Pw1 B1)^T.

By Lemma B.1, the independence of the two superpopulations and (B.6), we have

    E[(n1 + n2)(S̃11 - b_s S̃22 b_s^T)|s, z]
      = tr[Pw1 - Pw1 B1 m_zz B1^T Pw1]Ω11.2 + tr[Pw2 - Pw2 B2 m_zz B2^T Pw2]Ω11.2
        + βB1^T Pw1 B1 β^T + βB2^T Pw2 B2 β^T
        - βB1^T Pw1 B1 m_zz B1^T Pw1 B1 β^T - βB2^T Pw2 B2 m_zz B2^T Pw2 B2 β^T
        - βB1^T Pw1 B1 m_zz B2^T Pw2 B2 β^T - βB2^T Pw2 B2 m_zz B1^T Pw1 B1 β^T
      = tr(Pw1 + Pw2 - I_q)Ω11.2 + βB1^T Pw1 B1 β^T + βB2^T Pw2 B2 β^T
        - β(B1^T Pw1 B1 + B2^T Pw2 B2)β^T
      = (n1 + n2 - 2 - q)Ω11.2.    (B.7)

Note that the conditional covariance of each row of A1 and of A2 given s and z is Ω11.2, where Ω11 = Ω11.2 + βΩ22β^T. Moreover,

    b_s m22 b_s^T = (A1^T Pw1 B1 + A2^T Pw2 B2) m_zz m22 m_zz (A1^T Pw1 B1 + A2^T Pw2 B2)^T
      = A1^T(Pw1 B1 m_zz m22 m_zz B1^T Pw1)A1 + A2^T(Pw2 B2 m_zz m22 m_zz B2^T Pw2)A2
        + A1^T Pw1 B1 m_zz m22 m_zz B2^T Pw2 A2 + A2^T Pw2 B2 m_zz m22 m_zz B1^T Pw1 A1.

By similar arguments as above, we have

    E[b_s m22 b_s^T | s, z]
      = tr[Pw1 B1 m_zz m22 m_zz B1^T Pw1]Ω11.2 + tr[Pw2 B2 m_zz m22 m_zz B2^T Pw2]Ω11.2
        + βB1^T Pw1 B1 m_zz m22 m_zz B1^T Pw1 B1 β^T + βB2^T Pw2 B2 m_zz m22 m_zz B2^T Pw2 B2 β^T
        + βB1^T Pw1 B1 m_zz m22 m_zz B2^T Pw2 B2 β^T + βB2^T Pw2 B2 m_zz m22 m_zz B1^T Pw1 B1 β^T
      = tr[m22 m_zz]Ω11.2 + β m22 β^T
      = tr[m22 S̃22^{-1}]Ω11.2/(n1 + n2) + β m22 β^T.    (B.8)

Note that m_zz = [(n1 + n2)S̃22]^{-1}. Combining (B.7) and (B.8), we have (2.4).

(iii) Proof of Eqs. (2.5) and (2.13). Note that Ω̂CB = (n1 + n2)[Ω̂ML - b_s m22 b_s^T]/(n1 + n2 - 2). By (B.7) and adjusting for scale, we get (2.5). (2.13) can easily be obtained from (2.12), since μ̂i,CB = μ̂i,ML - b_s z̄.

(iv) Proof of Eqs. (2.6) and (2.14). The proof of (2.14) is given in Skinner (1982). Note that Ω̂π is a weighted mean of two variance estimators. The conditional mean of each variance estimator given s and z was obtained in Skinner (1982). The result in (2.6) follows immediately.

References

Anderson, T.W. (1957). Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. J. Amer. Statist. Assoc. 52, 200-203.
Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New York.
Arnold, S.F. (1981). The Theory of Linear Models and Multivariate Analysis. Wiley, New York.
Bebbington, A.C. and T.M.F. Smith (1977). The effects of survey design on multivariate analysis. In: C.A. O'Muircheartaigh and C.D. Payne, Eds., The Analysis of Survey Data, Vol. 2. Wiley, New York.


Cochran, W.G. and C.I. Bliss (1948). Discriminant functions with covariance. Ann. Math. Statist. 19, 151-176.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179-188.
Holt, D., T.M.F. Smith and P.D. Winter (1980). Regression analysis of data from complex surveys. J. Roy. Statist. Soc. Ser. A 143, 474-487.
Nathan, G. and D. Holt (1980). The effect of survey design on regression analysis. J. Roy. Statist. Soc. Ser. B 42, 377-386.
Skinner, C.J. (1982). Multivariate analysis of sample survey data. Ph.D. thesis, Dept. of Social Statistics, University of Southampton.
Skinner, C.J., D.J. Holmes and T.M.F. Smith (1986). The effect of sample design on principal component analysis. J. Amer. Statist. Assoc. 81, 789-798.