Sample design for educational survey research


Evaluation in Education, Vol. 2, pp. 105-195. © Pergamon Press Ltd. 1978. Printed in Great Britain. 0345-9228/78/0401-0105

CONTENTS

1. INTRODUCTION
2. THE FIVE SAMPLE DESIGNS
3. THE ANALYTIC MODEL
4. THE COMPARISON OF SAMPLE DESIGNS
5. THE ESTIMATION OF SAMPLING ERRORS FROM SAMPLE DATA
6. A WORKED EXAMPLE
7. CONCLUSION
8. SUMMARY
ACKNOWLEDGEMENTS
REFERENCES
APPENDIX: SOME THEORETICAL CONSIDERATIONS

1. Introduction

"Apples that grow on the same tree resemble one another more than they resemble apples on other trees, particularly if the trees are of different kinds. In the same way the students, parents, and teachers of one school tend to resemble one another more closely than they resemble those of different schools, particularly when the schools are of different kinds. These facts are the key to our problem. They mean that if we are sampling an orchard it is essential to have enough trees in our sample; if we are sampling students and teachers it is essential to have enough schools. We cannot make up for a lack of trees, or schools, by increasing the number of apples from each tree, or increasing the number of students taken from each school". (G.F. Peaker, 1967a)

Social science research is aimed at developing useful generalisations about society and the ways in which individuals behave in society. However, due to practical constraints on research resources, the social scientist is rarely able to study a complete coverage of the individuals or groups for whom these generalisations are appropriate. Provided that scientific sampling procedures are employed, the use of samples rather than a census or complete coverage often provides many advantages for the social scientist without limiting opportunities for the development of wide generalisations. Cochran (1963) lists the major advantages of sampling compared with a complete coverage as: reduced costs, reduced requirements for specialised equipment and personnel, greater accuracy due to closer supervision of data gathering procedures, and greater speed in data collection and analysis.

Kish (1959) divides the social science research situations in which samples are used into three broad categories: experiments - in which all extraneous sources of variation are controlled through randomisation; surveys - in which all members of a defined population have a known positive probability of selection into the sample; and investigations - in which data are collected without either the randomisation of experiments or the probability sampling of surveys. Experiments are strong with respect to internal validity, whereas surveys are strong with respect to external validity, which involves the question of the extent to which the findings may be generalised to some wider population. Investigations are weak on both types of validity and their use is due frequently to convenience or low cost and sometimes to the need for measurements in natural settings.

This report is concerned with the design of samples for educational survey research. In particular, it seeks to examine the problems associated with the evaluation of the degree of confidence which may be attributed to sample estimates of population characteristics obtained from a variety of sample designs which are commonly used in educational survey research.

POPULATIONS AND SAMPLES

The populations which are of interest to educational and social science researchers may be defined jointly with the elements which they contain. The population is the aggregate of the elements and the elements are the basic units that comprise and define the population (Kish, 1965). The elements of the population are usually the units of analysis - the elementary units comprising the population about which inferences are to be drawn.

Kish (1965) states that a population must be defined in terms of:

1. content
2. units
3. extent
4. time.

For example, in a study of the characteristics of Australian students we may wish to specify the desired population as:

1. all 14 year-old students
2. in secondary schools
3. in Australia
4. in 1977.

In order to prepare a description of the population to be considered in a study it is useful to distinguish between the population for which the results are required, the desired target population, and the population actually covered, the survey population. The survey population may differ from the desired target population. This difference may be due to non-coverage (for example, in the study referred to above we may compile a list of schools during early 1977 which accidentally omits some new schools which will begin operating later in the year) or the difference may be due to non-response (for example, several schools which have some handicapped students falling within the desired target population definition may be unwilling to allow these students to participate in the study). Strictly speaking, only the survey population is represented by the sample, but this population may be difficult to describe exactly, and it is often easier to write about the defined target population (Kish, 1965).

DEFINING A TARGET POPULATION - AN AUSTRALIAN EXAMPLE

During a cross-national study of science achievement carried out in 1970 by the International Association for the Evaluation of Educational Achievement (IEA), one of the desired IEA target populations for the study was described as: "All students aged 14.0-14.11 years at the time of testing. This was the last point in most of the school systems in IEA where 100 per cent of an age group were still in compulsory schooling" (Comber and Keeves, 1973:10).

In Australia it was decided that, for certain administrative reasons (Rosier and Williams, 1973), the study would be conducted only within the six states of Australia and not within the smaller Australian territories. It was also decided that only students in those school grade levels which contained the majority of 14 year-old students would be tested. The above desired IEA target population description was therefore reworded in the following fashion to obtain the defined Australian target population description:

"All students aged 14.0-14.11 years on 1 August 1970 in the following Australian states and secondary school grades:

    New South Wales    Forms I, II and III
    Victoria           Forms I, II, III and IV
    Queensland         Grades 8, 9 and 10
    South Australia    1st year, 2nd year and 3rd year
    West Australia     Years 1, 2 and 3
    Tasmania           Years I, II, III and IV" (Rosier and Williams, 1973:3).

The date used in the defined Australian target population description, 1 August 1970, was chosen to coincide with the date of the annual school census so that official census statistics could be used in weighting and other sampling calculations.

The number of Australian students in the desired IEA target population and the defined Australian target population are presented in Table 1.1 (source: Rosier and Williams, 1973).

TABLE 1.1

States               Desired IEA          Defined Australian    Number of
                     target population    target population     excluded students

New South Wales            78163                 76317                1846
Victoria                   62573                 62030                 543
Queensland                 33046                 31839                1207
South Australia            22381                 21632                 749
West Australia             19128                 18708                 420
Tasmania                    7868                  7789                  79
Other Territories           3427                     0                3427

Total                     226586                218315                8271

For Australia overall, the excluded population (the students excluded from the desired target population to yield the defined target population) was less than four per cent of the desired target population.
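The exclusion rate quoted above can be checked directly from the counts in Table 1.1. The following is a minimal Python sketch of that arithmetic; the dictionary and variable names are illustrative assumptions and do not come from the original study.

```python
# Student counts copied from Table 1.1 (Rosier and Williams, 1973).
# All names below are illustrative, not taken from the source.
desired = {
    "New South Wales": 78163,
    "Victoria": 62573,
    "Queensland": 33046,
    "South Australia": 22381,
    "West Australia": 19128,
    "Tasmania": 7868,
    "Other Territories": 3427,
}
defined = {
    "New South Wales": 76317,
    "Victoria": 62030,
    "Queensland": 31839,
    "South Australia": 21632,
    "West Australia": 18708,
    "Tasmania": 7789,
    "Other Territories": 0,
}

# Excluded students = desired target population - defined target population.
excluded = {state: desired[state] - defined[state] for state in desired}

total_desired = sum(desired.values())
total_defined = sum(defined.values())
total_excluded = sum(excluded.values())

# Excluded students as a percentage of the desired target population.
excluded_pct = 100 * total_excluded / total_desired

print(f"Excluded: {total_excluded} of {total_desired} ({excluded_pct:.2f} per cent)")
```

The same dictionaries could be reused to express each state's exclusions as a percentage of that state's desired population.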

Most of the excluded students in the six states were 14 year-olds who were in grade levels which were either lower or higher than those included in the target definitions or who were in special schools. The students in the 'other territories' of Australia (Australian Capital Territory and Northern Territory) were excluded from the target population because of certain administrative difficulties associated with testing the students in these territories. The Australian Capital Territory was excluded because a major study, which will be referred to later in this report, was in progress in the schools of this territory. The Northern Territory was also excluded because the administrative costs would be high for the small number of students who would be tested in this territory.


A sampling frame was constructed from a list of all schools containing students who were members of the defined Australian target population. This sampling frame was stratified by grouping the list of schools according to the state in which they were situated, and also by grouping schools within each state according to the system under which the school operated (Government/Catholic/Independent) and according to the geographical location of the school (Metropolitan/Non-Metropolitan).

The use of stratification procedures in the preparation of a sampling frame is usually undertaken because of the desire to increase the accuracy of sample estimates or because of the need to provide separate estimates for certain designed strata of the target population. These issues are discussed in more detail in the following chapter of this report.

The selection of the sample was constrained by the researchers' desire to obtain information which would allow comparisons to be made using both students and schools as the units of analysis. These comparisons were required to be carried out with the same degree of sampling accuracy for each state. As a result of these constraints the sample was selected as a stratified sample in two stages - with the designed sample requiring the selection within each state of approximately 38 schools followed by the selection of 25 students within each selected school. The calculations and the mechanical selection procedures which are needed to arrive at the required number of first stage selections (schools) and the required number of second stage selections (students within schools) are described in the presentation of a hypothetical national sample design in Chapter 6.

Clearly this sample design differs in its degree of complexity compared with a simple random sample in which all 218315 students within the defined Australian target population would be listed and the required number of students would be selected at random from the list. This report aims to examine the nature of these departures from the model of simple random sampling, and also to assess the consequences of these departures for commonly used procedures of analysis.

THE ACCURACY OF SAMPLES

There are usually two main objectives involved in the conduct of sample surveys:

1. The estimation of certain population values (parameters). In many educational research surveys we are interested in obtaining estimates of the mean level of achievement for the population and various percentile points of the distribution of the achievement for the population.

2. The testing of a statistical hypothesis about a population. As well as estimates of population parameters we may be interested, for example, in testing the hypothesis that there is no difference between the average achievement of certain subgroups in our sample.

Our capacity to examine sample data with respect to these two objectives depends directly upon our knowledge of the accuracy of sample estimates with respect to the population parameters. Knowledge of the accuracy of estimates is derived in turn from statistical theory which requires that each member of the population has a known, and non-zero, probability of being selected into the sample. The accuracy of samples selected without using probability sampling methods cannot be discovered from the internal evidence of the sample data. Therefore, since non-probability samples are not suitable for dealing with the objectives of estimation and hypothesis testing, they will not be considered in this report.

ACCURACY, BIAS AND PRECISION

The sample estimate derived from any one sample is inaccurate to the extent that it differs from the population parameter. If we seek to estimate the mean of the population performance in a test norming program then the difference between the population mean and the sample estimate of the population mean measures the accuracy of the sample estimate of the population mean performance. Generally the value of the population parameter is not known and therefore the actual accuracy of an individual sample estimate cannot be assessed. Instead, through a knowledge of the behaviour of estimates derived from all possible samples which can be drawn from the population by using the same sample design we are sometimes able to assess the probable accuracy of the obtained sample estimate.

For example, consider the population of two schools described in Fig. 1.1, each containing two students. The two students in the first school are aged $x_1$ and $x_2$ years, while the two students in the second school are aged $x_3$ and $x_4$ years.

[Fig. 1.1: a population of two schools. School 1 contains students aged $x_1$ and $x_2$; School 2 contains students aged $x_3$ and $x_4$.]

Consider for example, that our survey objective is to estimate the mean age of the population of students from a sample of size n=2 drawn from the population. In order to select our sample we could write the name of each student on a ball, place the four balls in an urn, and then thoroughly mix the balls before randomly drawing out the names of the two students in our sample. This sampling procedure is called simple random sampling and it forms the basis of the more complicated sampling procedures which will be discussed later in this report.

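If the above sampling procedure was continued indefinitely then each of the six samples listed in Table 1.2 would be drawn over and over again. With this sampling procedure each sample would, by definition, have an equal chance of being selected. The average of the estimates derived from all possible samples would be:

$$\frac{\sum_{i=1}^{6} \bar{x}_i}{6}$$

TABLE 1.2

Sample    Possible samples, n = 2           Estimate of $\mu$
          (ages of students selected)

1         $x_1$ and $x_2$                   $\bar{x}_1 = (x_1 + x_2)/2$
2         $x_1$ and $x_3$                   $\bar{x}_2 = (x_1 + x_3)/2$
3         $x_1$ and $x_4$                   $\bar{x}_3 = (x_1 + x_4)/2$
4         $x_2$ and $x_3$                   $\bar{x}_4 = (x_2 + x_3)/2$
5         $x_2$ and $x_4$                   $\bar{x}_5 = (x_2 + x_4)/2$
6         $x_3$ and $x_4$                   $\bar{x}_6 = (x_3 + x_4)/2$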

The average of the estimates of the population parameter derived from an infinite number of samples, $E(\bar{x})$, is called the expected value of the estimator which is used to describe the population parameter. In our example the expected value is equal to the population value:

$$E(\bar{x}) = \frac{1}{6}\left(\frac{3x_1 + \dots + 3x_4}{2}\right) = \frac{x_1 + \dots + x_4}{4} = \mu$$

The mean value $E(\bar{x})$ may or may not be equal to the population value $\mu$. The difference between the two we call the sampling bias $= E(\bar{x}) - \mu$. A sample design is unbiased if $E(\bar{x}) = \mu$. Note that this is not a property of a single sample, but of the entire sampling distribution, and that it belongs neither to the selection nor the estimation procedure alone, but to both jointly (Kish, 1965).

In our example we have seen that the sample mean is an unbiased estimate of the population mean - despite the variations in the estimates obtained from individual samples. Since the accuracy of the sample estimates depends (on the average) upon the variations of the individual sample estimates, we require a measure of the spread of the distribution of sample estimates. The usual measure of the spread of a distribution is the variance of the distribution. For our example the variance of the distribution of sample means, $V(\bar{x})$, is given by:

$$V(\bar{x}) = \frac{[\bar{x}_1 - E(\bar{x})]^2 + \dots + [\bar{x}_6 - E(\bar{x})]^2}{6}$$

The variance of the sampling distribution of sample means provides a measure of the probable accuracy or precision of any one sample estimate. The calculation of precision levels will be discussed in a later section of this report.
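The enumeration argument above can be reproduced numerically. The sketch below is illustrative only: the source leaves the four ages symbolic, so the values in `ages` are hypothetical, and the variable names are assumptions made for this example. It lists all six samples of size two, averages their means to obtain the expected value, and computes the variance of the sampling distribution.

```python
from itertools import combinations

# Hypothetical ages for the four students; the source keeps these symbolic.
ages = [13.5, 14.0, 14.5, 16.0]
population_mean = sum(ages) / len(ages)

# All possible simple random samples of size n = 2 (six in total).
samples = list(combinations(ages, 2))
sample_means = [sum(s) / len(s) for s in samples]

# Expected value of the estimator: the average over all possible samples.
expected_value = sum(sample_means) / len(sample_means)

# Sampling bias: expected value minus the population value.
bias = expected_value - population_mean

# Variance of the sampling distribution of the sample mean.
variance_of_mean = sum((m - expected_value) ** 2 for m in sample_means) / len(sample_means)

print(len(samples), expected_value, bias, variance_of_mean)
```

With any other set of four ages the bias remains zero, while the variance changes with the spread of the ages.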

In order to incorporate the two aspects of variance and bias into one statement associated with the accuracy of an individual estimate, statisticians have developed the concept of mean square error (MSE). The MSE is defined as the average of the squares of the deviations of the possible sample estimates from the value being estimated (Hansen et al, 1953). It can be shown (Yamane, 1967) that the mean square error can be written as:

$$\mathrm{MSE}(\bar{x}) = E(\bar{x} - \mu)^2 = E[\bar{x} - E(\bar{x})]^2 + [E(\bar{x}) - \mu]^2 = \text{Variance of } \bar{x} + (\text{Bias of } \bar{x})^2$$

For most well-designed samples in survey research the sampling bias is either zero (as in our example) or small, tending towards zero with increasing sample size. The estimates used in this report exhibit this property and therefore our examination of the accuracy of these estimates will concentrate on sampling precision as measured by the variance term.
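The decomposition of the mean square error into variance plus squared bias can be verified by enumeration. The sketch below is an illustration under stated assumptions: the ages are hypothetical, and the estimator (the older of the two sampled ages) is deliberately biased and is not one used in this report. It is chosen only so that the bias term is non-zero.

```python
from itertools import combinations

# Hypothetical ages for a population of four students (not from the source).
ages = [13.5, 14.0, 14.5, 16.0]
mu = sum(ages) / len(ages)

# A deliberately biased estimator of the population mean, used only for
# illustration: the larger of the two ages in each sample of size 2.
estimates = [max(pair) for pair in combinations(ages, 2)]
k = len(estimates)

expected = sum(estimates) / k                            # E(estimate)
bias = expected - mu                                     # E(estimate) - mu
variance = sum((e - expected) ** 2 for e in estimates) / k
mse = sum((e - mu) ** 2 for e in estimates) / k          # average squared error

# The two sides of MSE = variance + bias^2 agree up to rounding error.
print(mse, variance + bias ** 2)
```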

SAMPLING DISTRIBUTIONS AND STANDARD ERRORS

Since the accuracy of the estimates used in this report depends principally on precision we now turn to consider how we may use the measure of variance to obtain measures of the probable accuracy of our estimates.

In many practical survey research situations the sampling distribution of the estimated mean is approximately normally distributed. The approximation improves with increasing sample size even though the distribution of elements in the parent population may be far from normal. This characteristic of the sampling distribution of the sample mean is associated with the Central Limit Theorem and it occurs not only for the mean but for most estimators commonly used to describe survey research results (Kish, 1965).

From a knowledge of the properties of the normal distribution we know that we can be '68 per cent confident' that the range $\bar{x} \pm \sqrt{V(\bar{x})}$ includes the population mean, where $\bar{x}$ is the sample mean obtained from one sample from the population. The quantity $\sqrt{V(\bar{x})}$ is called the standard error, $SE(\bar{x})$, of the sampling distribution of $\bar{x}$. Similarly we know that the range $\bar{x} \pm 1.96\,SE(\bar{x})$ will include the population mean with 95 per cent confidence.

The calculation of confidence limits for estimates as described above allows us to satisfy the estimation objective of survey research. Also, through the construction of difference scores $d = \bar{x}_1 - \bar{x}_2$, and using a knowledge of the standard errors $SE(\bar{x}_1)$ and $SE(\bar{x}_2)$, we may satisfy the statistical hypothesis objective (Yamane, 1973).

It should be remembered that, although our discussion has focussed on sample means, we could also set up confidence limits for many other population values, which for example are estimated by $v$, in the form $v \pm t\sqrt{V(v)}$. The quantity $t$ represents an appropriate constant which usually is obtained from the normal distribution or under certain conditions from the t distribution. For most sample estimates encountered in practical survey research, assumptions of normality lead to errors that are small compared to other sources of inaccuracy (Kish, 1965).

The approach to normality of the sampling distribution of a statistic may be faster for some variables than for others. If the statistic is the sample mean described above then a sample of at least 50 elements is generally sufficient; however, for correlation coefficients a much greater sample size would be required (Moser and Kalton, 1973).

THE ACCURACY OF INDIVIDUAL SAMPLE ESTIMATES

In the previous section we have discussed how the variance, $V(\bar{x})$, of an estimator may be used to make statements about the precision of individual sample estimates. In survey research we are usually dealing with a single sample of data and not with all possible samples from a population. Therefore we are unable to calculate the value of $V(\bar{x})$ exactly. Fortunately statisticians have derived some formulae, for certain sample designs, which allow us to make an estimate of $V(\bar{x})$ from the internal evidence of an individual sample of data.

For the simple random sample design, in which each sample element is randomly and independently selected from the population with equal probability of selection, the variance of the sample mean may be estimated from a single sample of data by using the formula:

$$\hat{V}(\bar{x}) = \frac{N - n}{N} \cdot \frac{s^2}{n}$$

where $N$ is the population size, $n$ is the sample size, and

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

is an unbiased estimate of the variance of the element values in the population.

For sufficiently large values of $n$ we may therefore estimate with 95 per cent confidence that the population mean $\mu$ will lie in the range

$$\bar{x} \pm 1.96 \sqrt{\frac{N - n}{N} \cdot \frac{s^2}{n}}$$

where $\bar{x}$ is the sample mean of a simple random sample of $n$ elements selected from a population of $N$ elements. Note that, for sufficiently large values of $N$, the variance of the sample mean may be estimated by $s^2/n$ since the term $(N - n)/N$, called the finite population correction, tends toward unity.
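The estimation formula above translates directly into a few lines of code. The following Python sketch is illustrative: the function name `srs_mean_interval`, the test scores and the population size are all hypothetical, and the 1.96 multiplier assumes the sample is large enough for the normal approximation to hold.

```python
import math


def srs_mean_interval(sample, population_size, z=1.96):
    """Estimate the mean, its standard error and a confidence interval
    from one simple random sample (hypothetical helper, not from the source)."""
    n = len(sample)
    x_bar = sum(sample) / n
    # s^2 with divisor n - 1: unbiased estimate of the element variance.
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    # Finite population correction (N - n) / N.
    fpc = (population_size - n) / population_size
    var_mean = fpc * s2 / n
    se = math.sqrt(var_mean)
    return x_bar, se, (x_bar - z * se, x_bar + z * se)


# Hypothetical scores for a sample of 5 students from a population of 50.
scores = [10.0, 12.0, 14.0, 16.0, 18.0]
x_bar, se, (lower, upper) = srs_mean_interval(scores, population_size=50)
print(x_bar, se, lower, upper)

# With a very large population the correction tends toward unity,
# so the estimated variance approaches s^2 / n.
_, se_large, _ = srs_mean_interval(scores, population_size=10**9)
```

A sample of five is far too small for the normal approximation in practice; it is used here only to keep the arithmetic easy to follow.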

Although there is general agreement among statistical authors about the formula for estimating the variance of the sample mean for a single simple random sample of elements, there are minor differences of opinion about the appropriate formulae for calculating the variance for more complex statistics. These minor differences generally become insignificant for the typically large population and sample sizes which are associated with survey research. Table 1.3 presents the formulae for calculating the standard error of a statistic, where $SE(v) = \sqrt{V(v)}$, from a simple random sample of elements for a range of complex statistics which are commonly employed in educational survey research. For this report the formulae were selected from one source (Guilford and Fruchter, 1973); however, the main results of the report would not be seriously altered by the use of formulae presented by any of the other recognised authors.

The formulae in Table 1.3 are based on a simple random sample of $n$ elements which are measured on $m$ variables, where variable $X$ has a standard deviation of $s$. The multiple correlation coefficient $R_{i.jkl}$ refers to the regression equation which uses variable $i$ as the criterion and variables $j$, $k$ and $l$ as predictors.

The formulae were derived on the assumption that the sample design used to collect the data consisted of a simple random sample of elements. However most social science research, especially survey research, is conducted with data obtained from complex sample designs which, as in the Australian sample described earlier, employ techniques such as stratification, clustering and varying probabilities of selection. Computational formulae are available for estimating the standard errors of means, aggregates and differences of means for a wide range of these sample designs (see Kish, 1965). Unfortunately the computational formulae required for estimating the standard error of multivariate statistics such as correlation coefficients, regression coefficients, etc. are not readily available for sample designs which depart from the model of simple random sampling. These formulae either become enormously complicated or, ultimately, they prove resistant to mathematical analysis (Frankel, 1971).

TABLE 1.3

Formulae for estimating standard errors when data are gathered with a simple random sampling procedure

Sample statistic                        Estimate of SE(v)              Source

Mean                                    [formula illegible in source]  Guilford and Fruchter, 1973:127
Correlation coefficient                 [formula illegible in source]  Guilford and Fruchter, 1973:145
Standardised regression coefficient     [formula illegible in source]  Guilford and Fruchter, 1973:368
Multiple correlation coefficient        [formula illegible in source]  Guilford and Fruchter, 1973:367

In the past many educational researchers have estimated standard errors for multivariate statistics by applying formulae which are appropriate only for data obtained from a simple random sample design despite the fact that they had not used simple random sampling in their research. The problems and some appropriate solutions associated with this misuse of computational formulae will be discussed at length in later sections of this report.

MULTI-STAGE COMPLEX SAMPLE DESIGNS

A population of elements can usually be described in terms of a hierarchy of sampling units of different sizes and types. For example a population of school students may be seen as being composed of a number of classes each of which is composed of a number of students. Further, the classes may be grouped into a number of schools.

In the previous discussion we have considered the use of simple random samples in which students were selected individually from the population. In practice we usually select the individual units of the population as clusters, or in several stages. These modifications in sample design are often used because they reduce the costs of a research study by minimising the geographical spread of the sample elements.

Consider the hypothetical population of school students described in Fig. 1.2. The population consists of eighteen students distributed among six classrooms (with three students per class) and three schools (with two classes per school).

[Fig. 1.2: hypothetical population of eighteen students arranged in three levels. Schools (psu's): three schools. Classrooms (ssu's): Classes 1 to 6, two per school. Students (tsu's): students 1 to 18, three per class.]

From this population we could select a simple random sample of four students (by the method described in a previous section) or we could employ a multi-stage cluster sample design to select a sample of the same size. In order to select a multi-stage cluster sample we consider the population to be divided into primary sampling units (schools), secondary sampling units (classrooms) and tertiary sampling units (students). At the first stage of sampling we could randomly select two schools; at the second stage of sampling we could randomly select one classroom from each of the selected schools; and at the third stage of sampling we could randomly select two students from each selected classroom. The actual mechanical procedures required for the selection of sampling units at different stages are discussed more fully in Chapter 2 and Chapter 6.

If we employed either the simple random design or the three stage cluster sample design described above to select a sample of four elements, then for both sample designs this would ensure that each population element had an equal chance of appearing in either of the samples (see Chapter 2). That is, simple sample estimates of population parameters, such as the population mean, would provide unbiased estimates for both sample designs.
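The claim that both designs give every student the same chance of selection can be checked by listing every possible three-stage sample. The sketch below is a hedged illustration: the assignment of students 1 to 18 to classes and schools in consecutive order is an assumption about the layout of Fig. 1.2, and the function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed layout of Fig. 1.2: school -> class -> student numbers.
population = {
    1: {1: [1, 2, 3], 2: [4, 5, 6]},
    2: {3: [7, 8, 9], 4: [10, 11, 12]},
    3: {5: [13, 14, 15], 6: [16, 17, 18]},
}


def three_stage_samples(pop):
    """Yield every equally likely three-stage sample of four students."""
    for schools in combinations(pop, 2):                      # stage 1: 2 of 3 schools
        for classes in product(*(pop[s] for s in schools)):   # stage 2: 1 class per school
            pair_sets = [
                list(combinations(pop[s][c], 2))              # stage 3: 2 of 3 students
                for s, c in zip(schools, classes)
            ]
            for pairs in product(*pair_sets):
                yield tuple(student for pair in pairs for student in pair)


samples = list(three_stage_samples(population))

# Inclusion probability of each student under the three-stage design.
inclusion = {
    student: Fraction(sum(student in s for s in samples), len(samples))
    for student in range(1, 19)
}

# Inclusion probability under a simple random sample of n = 4 from N = 18.
srs_probability = Fraction(4, 18)

print(len(samples), set(inclusion.values()), srs_probability)
```

The enumeration agrees with the product of the stage probabilities, (2/3) x (1/2) x (2/3), which equals 4/18.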

THE COMPARISON OF SAMPLE DESIGNS

In the above example we have seen that for a given sample size both the simple random sample design and a three stage cluster sample design may provide unbiased sample estimates of the population mean. However, as will be shown in later chapters, the variance of these estimates may vary greatly. Therefore to compare these two sample designs we need to examine the stability of the estimates which they provide for samples of the same size.

Fisher (1922) described and compared sample designs in terms of their 'efficiency'. Two sample designs A and B were compared by considering the inverse of their variances for the same size of sample. Using $E$ to denote the efficiency of a sample design for the sample mean and $n$ to denote the sample size we can compare the efficiency of two samples by the ratio:

$$\frac{E_A}{E_B} = \frac{V(\bar{x}_B)}{V(\bar{x}_A)} \qquad (n_A = n_B)$$

It is important to remember that for a given sample design, different characteristics may have different levels of efficiency.

More recently Kish (1965) has suggested the use of the simple random sample design as a baseline for quantifying the efficiency of complex sample designs. Kish introduced the word 'Deff' (design effect) to describe the ratio of the variance of the sample mean for a complex sample to the variance of a simple random sample of the same size (Kish, 1965). That is:

$$\mathrm{Deff} = \frac{V(\bar{x}_c)}{V(\bar{x}_{srs})}$$

where the sample sizes $n_c$ and $n_{srs}$ are both equal and where $V(\bar{x}_c)$ is the variance of the sample mean for a given complex sample design and $V(\bar{x}_{srs})$ is the variance of the sample mean for a simple random sample of equal size.

For many commonly used sample designs and for many commonly used statistics in survey research we find that Deff is greater than unity. Consequently the use of formulae based on the simple random sample model to estimate standard errors may result in gross underestimation of sampling errors. Some values of Deff for a range of sample designs and statistics have been calculated and presented in Chapter 4.
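A small enumeration shows how a Deff value greater than unity can arise when students within the same school resemble one another. The sketch below is illustrative only: the scores are invented, the population mirrors the eighteen-student layout described earlier, and all names are assumptions. It computes the exact variance of the sample mean over every possible three-stage cluster sample and over every possible simple random sample of four students, then takes their ratio.

```python
from itertools import combinations, product
from statistics import pvariance

# Hypothetical scores: students in the same school have similar values.
population = {
    "School 1": {"Class 1": [10.0, 11.0, 12.0], "Class 2": [12.0, 13.0, 14.0]},
    "School 2": {"Class 3": [20.0, 21.0, 22.0], "Class 4": [22.0, 23.0, 24.0]},
    "School 3": {"Class 5": [30.0, 31.0, 32.0], "Class 6": [32.0, 33.0, 34.0]},
}


def average(values):
    values = list(values)
    return sum(values) / len(values)


def three_stage_means(pop):
    """Yield the mean of every equally likely three-stage sample (n = 4):
    2 schools, 1 class per selected school, 2 students per selected class."""
    for schools in combinations(pop, 2):
        for classes in product(*(pop[s] for s in schools)):
            pair_sets = [
                list(combinations(pop[s][c], 2)) for s, c in zip(schools, classes)
            ]
            for pairs in product(*pair_sets):
                yield average(x for pair in pairs for x in pair)


all_scores = [x for school in population.values() for cls in school.values() for x in cls]

complex_means = list(three_stage_means(population))
srs_means = [average(s) for s in combinations(all_scores, 4)]

# Exact variances of the two sampling distributions of the mean.
v_complex = pvariance(complex_means)
v_srs = pvariance(srs_means)

deff = v_complex / v_srs
print(v_complex, v_srs, deff)
```

If the scores were instead scattered so that schools did not differ in their averages, the same code would return a Deff close to, or below, unity.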

THE DATA USED IN THIS REPORT

This report has been made possible through the availability of suitable educational survey research data associated with a population of Australian students. These data were collected as part of a study which examined the contributions of the home, the school and the peer group to change in the educational achievements of students during the first year at secondary school in the Australian Capital Territory (Keeves, 1972). Although Keeves focused his study on a sample of 215 students, he also gathered data on a group of variables for the whole population of first year students in order to assist with the examination of peer group influences.

The population data obtained in the study carried out by Keeves are used to provide empirical examples of the sampling concepts discussed throughout this report. The types of data employed and the analytic procedures examined are constrained both by the availability of suitable variables in this population data file, and also by the author's aim to test the influence of commonly used sample designs on sample estimates derived from commonly used data analysis procedures.

The desired population in this study consisted of all first year students in the schools of the Australian Capital Territory in 1969. The population was distributed amongst 15 secondary schools: nine Government high schools, four Catholic high schools and two Independent high schools.

TABLE 1.4

Type of school                    Desired target population    Survey population
                                  (source: CBCS records)       (source: Keeves' 1972 data)

Government schools                        1714                         1611
Non-government schools
(Catholic and Independent)                 764                          743

All Schools                               2478                         2354

By comparing the Commonwealth Bureau of Census and Statistics records (CBCS, 1970a, 1970b) with the population data achieved by Keeves we see that, as in most survey research studies, there have been some losses between the desired population and the survey population. The discrepancies in the two sets of figures in Table 1.4 may be attributed to absenteeism on the day of testing, the movement of students' families out of the Australian Capital Territory between the census date and the date of the data gathering programme, and the exclusion of one small classroom of children from the study because of the atypical nature of this class.

THE VARIABLES USED IN THIS REPORT

The term variable refers to a property whereby the members of a group or set differ one from another (Ferguson, 1971). Variables, as measured in educational survey research, are often crude indicators of the constructs which the researcher intended to measure. For example when a student takes an achievement test in mathematics, the resulting score is only an indicator of what he knows about mathematics. There is often a great deal of argument about the meaning of such an indicator with respect to the student's mathematical knowledge. Similarly, in an attempt to measure abstract concepts like the socioeconomic status of a student's family background, researchers often create composite indicators from a range of measures of family income, education, and occupation.

118

Evaluation

in Education

The questions raised in the above discussion are concerned with problems of reliability and validity of measurement in educational research and are fully examined in standard educational research textbooks. In this report we will be concerned only with the inter-relationship between sample design and indicators which are commonly employed in educational survey research.

TABLE 1.5  The constructs and their indicators in this report

Construct                                   Indicator name   Indicator definition

Sex of student                              SEX              Coded on a two point scale with male = 1, female = 2

Socioeconomic status of student's home      FATHERS OCC      The occupation of the student's father coded on a six point
                                                             occupational prestige scale (Broom, Jones and Zubrzycki, 1965, 1968)

Student's attitude toward school            LIKE SCHL        A 17 item scale designed to measure the student's attitude
                                                             towards school

Student's own expected level of education   EXP EDN          A seven point rating designed to measure the student's expected
                                                             final level of education

Student's mathematical ability              MATHS            A test of 55 mathematics items each of which has been scored
                                                             1 for correct and 0 for incorrect

In Table 1.5 the constructs, and their indicators, which were used in this report are described in detail. Further information concerning the operational definitions of the indicators can be obtained from the study by Keeves (1972). The constructs were selected so as to provide a range of commonly used measures: a dichotomously coded indicator (SEX), an occupational prestige indicator (FATHERS OCC), an attitudinal indicator (LIKE SCHL), a self-rated expectation indicator (EXP EDN), and an indicator of school achievement (MATHS).

THE SAMPLE DESIGNS IN THIS REPORT

The sample designs examined in this report were selected because they represent five sampling procedures which are commonly used in educational survey research. These sampling procedures lead to probability sample designs because they ensure that each element in the survey population has a known non-zero probability of selection. The characteristics of each design are discussed in detail in the following chapter.

THE ERROR ESTIMATION TECHNIQUES IN THIS REPORT

The estimation of sampling errors in this report has been carried out in two main ways:

1. The use of Student's (1908) technique of developing sampling distributions from sample estimates obtained by repeated independent applications of each sample design described above, and

2. The use of the empirical error estimation techniques of jackknifing (Tukey, 1958) and balanced repeated replication (McCarthy, 1966, 1969a, 1969b) to estimate the sampling errors of statistics obtained from single samples of data.

The results of the use of these techniques are described in Chapter 4 and Chapter 5.
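As a rough illustration of the second approach, the sketch below applies a delete-one jackknife to the mean of a small set of scores. The scores, the function names and the choice of deleting single elements are assumptions made for illustration only; they are not taken from the report, and for clustered designs the units deleted would normally be primary sampling units rather than individual students.

```python
import math
import statistics

def jackknife_se(values, statistic):
    """Delete-one jackknife estimate of the standard error of `statistic`."""
    n = len(values)
    # Recompute the statistic n times, leaving out one observation each time.
    replicates = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(replicates) / n
    variance = (n - 1) / n * sum((r - centre) ** 2 for r in replicates)
    return math.sqrt(variance)

def mean(xs):
    return sum(xs) / len(xs)

scores = [31, 27, 35, 22, 29, 40, 18, 33]   # hypothetical test scores
se_jack = jackknife_se(scores, mean)

# For the sample mean the jackknife reproduces the usual formula s / sqrt(n).
se_formula = statistics.stdev(scores) / math.sqrt(len(scores))
print(f"jackknife: {se_jack:.4f}   formula: {se_formula:.4f}")
```

The value of the technique lies in statistics such as regression coefficients, for which no simple standard error formula is available under complex sample designs; the same function can be passed any statistic that accepts a list of observations.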

2. The Five Sample Designs

PROBABILITY SAMPLES, RANDOMISATION AND SAMPLE FRAMES

The sample designs examined in this report are all probability samples with each element having a known non-zero probability of selection. These sample designs also represent designs which are commonly employed in educational survey research. Probability sampling requires that the actual selection of elements into the sample be made by a mechanical randomisation procedure that assigns the desired probabilities. If we wished to select a simple random sample without replacement by using the method described in Chapter 1, this mechanical procedure would become a difficult and cumbersome operation.

In most survey sampling operations we achieve our random selections by employing a table of random numbers in order to substitute for the shuffling process. Also, instead of a listing of names on cards, we often use a sampling frame as a means of deriving our sample selections with the assistance of the table of random numbers. For example, we could construct a sampling frame from enrolment lists obtained from the schools which constitute our population, assign an individual serial number to each student on these enrolment lists and, by reading off a set of numbers (ignoring duplicates) from a table of random numbers, obtain a simple random sample from our population.

In this report the survey population data consists of information obtained from the 2354 students (see Table 1.4) who participated in the data collection phase of the study carried out by Keeves (1972) in the Australian Capital Territory. This survey population has been arranged into a sampling frame in Table 2.1. The schools in the survey population have been broken into three strata: SYSTEM 1: Government schools, SYSTEM 2: Catholic schools, SYSTEM 3: Independent schools. Each school system has been listed by school and by class within school. Beside each class is given the number of students in the class and, at the end of each school list, the total number of students in the school is given in brackets. The sets of square brackets describe pairs of classes which have been combined into 'pseudoclasses' for the purpose of the application of the classroom cluster sample designs described later. A 'pseudoclass' is a combination of two or more classes which are combined because some classes contain fewer students than required for the sampling procedures being used.

Five sample designs are used in this report to draw samples of 150 from the sampling frame. The selected sample size was chosen as a balance between three competing requirements: the types of analyses which will be carried out on the data (Kerlinger and Pedhazur (1973) recommend between 100 and 200 subjects for regression analyses which do not involve large numbers of variables); the aim of considering a research model which would be within the economic and administrative resources of the typical educational research worker; and the desire to keep the sampling fraction (of 6.4 per cent) at a level which is considered to be small enough to minimise the finite population correction (Cochran, 1963).

TABLE 2.1  The sampling frame used in this report

School      Classes                         Students per class           (School total)

SYSTEM 1
SCHOOL 01   01 02 03 04 [05 06]             37 36 39 33 33 10            (183)
SCHOOL 02   07 08 09 10 11 12 13            34 33 28 25 30 28 17         (195)
SCHOOL 03   14 15 [16 17] 18                32 31 23 15 29               (130)
SCHOOL 04   19 20 21                        38 36 36
SCHOOL 05   24 25 26 27 28                  40 35 35 30 36               (176)
SCHOOL 06   29 30 31 32 33                  36 37 30 21 9                (133)
SCHOOL 07   34 35 36 37 38 39               35 39 37 36 33 21            (201)
SCHOOL 08   40 41 42 43 44 45 [46 47 48]    34 36 35 37 27 32 25 27      (274)
SCHOOL 09   49 50 51 52 53                  32 33 32 31 26               (154)

SYSTEM 2
SCHOOL 10   54 55 56                        38 40 35                     (113)
SCHOOL 11   57 58 59 60                     35 33 34 31                  (133)
SCHOOL 12   61 62 63                        38 37 38
SCHOOL 13   66 67 68                        40 44 48                     (132)

SYSTEM 3
SCHOOL 14   69 70 71                        26 26 29                     (81)
SCHOOL 15   72 73 74 75                     32 30 30 31                  (123)


SIMPLE RANDOM SAMPLING

Probability sampling requires that every element in the survey population has a known non-zero probability of selection. Simple random sampling represents the most basic type of probability sampling. However, due to variations in terminology between statistical authors, the term 'simple random sampling' is often used to describe either of two different sampling techniques. In this report the term 'simple random sampling' will refer to simple random sampling without replacement. Kish provides the following operational definition of simple random sampling:

"From a table of random digits select with equal probability n different selection numbers, corresponding to n of the N listing numbers of the population elements. The n listings selected from the list, on which each of the N population elements is represented separately by exactly one listing, must identify uniquely n different elements" (Kish, 1965:36).

The procedure of 'unrestricted simple random sampling' requires that the selected elements are placed in the selection pool again and may be reselected on subsequent draws. In educational survey research it is not practicable or useful to employ this technique because of the obtrusive measurement problems which result from measuring the same student, teacher, etc. on more than one occasion.

Simple random sampling is also preferable to unrestricted simple random sampling because it produces more precise estimators. The standard error of the sample mean for simple random sampling is reduced by a factor of $\sqrt{(N - n)/N}$ compared to unrestricted simple random sampling. However, for most educational survey research applications the difference between the two standard errors is small because the sampling fraction, $n/N$, is usually very small.

Both simple random sampling and unrestricted simple random sampling give an equal probability of selection to each member of the population. This characteristic, called 'epsem' sampling (equal probability of selection method), is not restricted solely to these two sampling techniques. Equal probability of selection can result from either equal probability selection throughout the sampling process, or from variable probabilities that compensate for each other through the several stages of multistage sampling. Epsem sampling is widely applied in survey research because it usually leads to self-weighting samples, in which the simple arithmetic mean obtained from the sample data is an unbiased estimate of the population mean.

In this report the simple random sample design (the SRS design) was applied to the survey population described in Table 2.1 in order to obtain a simple random sample of 150 elements.

TABLE 2.2  Calculations for the selection of elements for the simple random sample design (SRS design)

Population size      Sample size      Sampling fraction

     2354                150              150/2354

Probability of selecting each element for the SRS sample design = 150/2354
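The serial-number procedure for the SRS design can be sketched in a few lines. This is a minimal illustration, assuming that a pseudo-random number generator stands in for the table of random numbers and that students are simply numbered 1 to 2354; the seed and variable names are arbitrary and are not part of the original study.

```python
import math
import random

N = 2354   # survey population size (Table 1.4)
n = 150    # sample size used for every design in this report

rng = random.Random(1978)            # arbitrary seed, for reproducibility only
serial_numbers = range(1, N + 1)     # one serial number per student in the frame
# random.sample draws without replacement, so no serial number can repeat.
srs = sorted(rng.sample(serial_numbers, n))

sampling_fraction = n / N
# Factor by which the standard error of the mean is reduced relative to
# unrestricted (with replacement) simple random sampling.
fpc_factor = math.sqrt((N - n) / N)
print(f"sampling fraction = {sampling_fraction:.4f}")
print(f"sqrt((N - n)/N)   = {fpc_factor:.4f}")
```

With a sampling fraction of about 6.4 per cent the factor is close to 1, which is why the two forms of simple random sampling give similar standard errors here.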

STRATIFIED SAMPLING

The formulae for standard errors in Table 1.3 show that one way of increasing the precision of a simple random sample is to increase the sample size. Another way of increasing the precision of estimates in survey research is to use stratification. Stratification does not imply any departure from probability sampling - it merely requires that, before any selection takes place, the population be divided into a number of mutually exclusive groups called strata and that, following this division, a random sample is selected within each stratum.

Stratification may be used in survey research for reasons other than obtaining gains in precision. Strata may be formed in order to employ different sampling methods within strata, or because the subpopulations defined by the strata are designated as separate domains of study (Kish, 1965). Some typical variables which are used to stratify populations in educational survey research are school type (e.g. Government/Non-government) and school location (e.g. Metropolitan/Non-metropolitan).

Stratification does not necessarily require that the same sampling fraction is used within each stratum. If a uniform sampling fraction is used then the sample design is known as a proportionate stratified sample because the sample size from any stratum is proportional to the population size of the stratum. If the sampling fractions vary between strata then the obtained sample is a disproportionate stratified sample.

The gain in precision due to stratification may be examined by comparing the simple random sample design with a proportionate stratified simple random sample design when the sample size in each design is the same.

If we apply these two sample designs to the same population then, from the discussion of some theoretical considerations presented in Appendix A, we may write:

$$V(\bar{x}_{prop}) \approx V(\bar{x}_{srs}) - \frac{N - n}{nN} \sum_{h=1}^{L} \frac{N_h}{N} \left(\bar{X}_h - \bar{X}\right)^2$$

where

$V(\bar{x}_{prop})$ is the variance of the sample mean for the proportionate stratified simple random sample design

$V(\bar{x}_{srs})$ is the variance of the sample mean for the simple random sample design

$N_h$ is the size of the hth stratum (h = 1, ..., L)

$\bar{X}_h$ is the mean of the hth stratum

$\bar{X}$ is the population mean

$N$ is the population size

$n$ is the sample size for both designs.

This expression shows that the gain in sampling precision due to stratification depends on the magnitude of the differences between $\bar{X}_h$ and $\bar{X}$. That is, gains can be made in precision by choosing strata which exhibit a large amount of variation between stratum means and therefore also exhibit a large amount of homogeneity within strata. The precision of proportionate stratified simple random sampling will always be greater than simple random sampling, for a given sample size, except when all stratum means are equal to the population mean. In this special case $V(\bar{x}_{prop})$ will be equal to $V(\bar{x}_{srs})$.

The gains in precision discussed above have been concerned with the sample mean as an estimator of the population mean. Later in this report we will examine the influence of stratification on the precision of more complex estimators.

In this report the proportionate stratified sample design (the STR design) was applied to the survey population described in Table 2.1 by dividing elements in the sampling frame into three strata representing the three different school systems: Government, Catholic, Independent. A sample of 150 elements was then selected with simple random sampling from each stratum so that the number selected from each stratum was proportional to the stratum size.
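The dependence of the gain on the spread of the stratum means can be checked numerically. The sketch below uses an invented population of three strata (not the Keeves data) and the standard textbook variance formulas for the mean under simple random sampling and under proportionate stratified sampling; all names and values are assumptions made for illustration.

```python
from statistics import mean, variance

# Hypothetical strata: scores differ mainly BETWEEN strata, little within.
strata = {
    "A": [20 + (i % 11) for i in range(600)],
    "B": [30 + (i % 11) for i in range(300)],
    "C": [40 + (i % 11) for i in range(100)],
}
population = [x for values in strata.values() for x in values]
N, n = len(population), 100
f = n / N   # sampling fraction

# Simple random sample of n elements: V = (1 - f) * S^2 / n
v_srs = (1 - f) * variance(population) / n

# Proportionate stratified sample of n elements:
# V = (1 - f) / n * sum over strata of (N_h / N) * S_h^2
v_prop = (1 - f) / n * sum(len(v) / N * variance(v) for v in strata.values())

# Weighted squared deviations of the stratum means from the population mean.
grand_mean = mean(population)
between = sum(len(v) / N * (mean(v) - grand_mean) ** 2 for v in strata.values())
gain = (1 - f) / n * between

print(f"V(srs)  = {v_srs:.4f}")
print(f"V(prop) = {v_prop:.4f}")
print(f"V(srs) - V(prop) = {v_srs - v_prop:.4f}, between-strata term = {gain:.4f}")
```

In this constructed case almost all of the variation lies between strata, so the stratified design is far more precise; moving the stratum means closer together shrinks the gain towards zero.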

TABLE 2.3  Calculations for the selection of elements within strata for the proportionate stratified sample design (STR design)

Stratum           N_h        n_h      n_h (rounded)

Government       1611      102.7          103
Catholic          539       34.3           34
Independent       204       13.0           13

Total            2354      150            150

Probability of selecting each element for the STR sample design = 150/2354


Table 2.3 describes the calculations required for determining the number of elements required to be selected from each stratum. In this study we have N = 2354 and n = 150, therefore the number of elements, $n_h$, required to be selected from the hth stratum, containing $N_h$ elements, is $n_h = \frac{150\,N_h}{2354}$. The values of $n_h$ were rounded for use in this study.
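The allocation in Table 2.3 can be reproduced directly from the stratum sizes. The snippet below is a small check rather than part of the original study; the dictionary and variable names are illustrative.

```python
stratum_sizes = {"Government": 1611, "Catholic": 539, "Independent": 204}
N = sum(stratum_sizes.values())   # 2354
n = 150

# Proportionate allocation: n_h = n * N_h / N, then rounded to whole students.
allocation = {name: n * N_h / N for name, N_h in stratum_sizes.items()}
rounded = {name: round(n_h) for name, n_h in allocation.items()}

for name, N_h in stratum_sizes.items():
    print(f"{name:<12} N_h = {N_h:>4}   n_h = {allocation[name]:6.1f}   rounded = {rounded[name]:>3}")
print(f"{'Total':<12} N   = {N:>4}   n   = {sum(rounded.values())}")
```

Simple rounding happens to preserve the total of 150 here; in general the rounded allocations need not sum exactly to n and may require a small adjustment.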

DISPROPORTIONATE STRATIFIED SAMPLING AND WEIGHTING

The simple random sample design is called a self-weighting design because each element has the same probability of selection, equal to $n/N$. For this design each element has a weight of $1/n$ in the mean, 1 in the sample total, and $F = 1/f$ in the population total, where $f = n/N$ is the uniform sampling rate for all population elements (Kish, 1965).

In a disproportionate stratified sample design we employ different sampling fractions in the defined strata of the population. The chance of an element appearing in the sample is specified by the sampling fraction associated with the stratum in which that element is located. The reciprocals of the sampling fractions, which are sometimes called the raising factors, tell us how many elements in the population are represented by an element in the sample.

At the data analysis stage we may use either the raising factors, or any set of numbers proportional to them, to assign weights to the elements. The constant of proportionality makes no difference to our estimates. However, in order to avoid confusion for the readers of survey research reports, we usually choose the constant so that the sum of the weights is equal to the sample size (Peaker, 1968).

For example, consider a stratified sample design of n elements which is applied to a population of N elements by selecting a simple random sample of $n_h$ elements from the hth stratum containing $N_h$ elements. In the hth stratum the probability of selecting an element is $n_h/N_h$, and therefore the raising factor for this stratum is $N_h/n_h$. That is, each selected element represents $N_h/n_h$ elements in the population.

The sum of the raising factors over all n sample elements is equal to the population size. If we have two strata for our sample design then:

$$\underbrace{\left(\frac{N_1}{n_1} + \dots + \frac{N_1}{n_1}\right)}_{n_1\ \text{elements}} + \underbrace{\left(\frac{N_2}{n_2} + \dots + \frac{N_2}{n_2}\right)}_{n_2\ \text{elements}} = N_1 + N_2 = N$$


In order to make the sum of the weights equal the sample size, n, both sides of the above equation will have to be multiplied by a constant factor of $n/N$. Then we have:

$$\underbrace{\left(\frac{nN_1}{Nn_1} + \dots + \frac{nN_1}{Nn_1}\right)}_{n_1\ \text{elements}} + \underbrace{\left(\frac{nN_2}{Nn_2} + \dots + \frac{nN_2}{Nn_2}\right)}_{n_2\ \text{elements}} = n$$

Therefore the weight for an element in the hth stratum is $\frac{nN_h}{Nn_h}$. For the special case of proportionate stratified sampling which was discussed in the previous section we have $\frac{n}{N} = \frac{n_h}{N_h}$ for each stratum. The sample element weight is equal to 1 and we therefore describe this design as a self-weighting design.

In this report the disproportionate stratified sample design (the WTD design) was applied to the survey population described in Table 2.1 in order to obtain a disproportionate stratified sample of 150 elements. The stratification procedures used in the proportionate stratified design (the STR design) were also employed in this design. After dividing the sampling frame into three strata (Government, Catholic, Independent), a sample of 50 elements was drawn from each stratum. The selection of elements within each stratum was not carried out with simple random sampling for this design. Instead, a two stage selection procedure, consisting of the selection of two classrooms followed by the selection of 25 students within each selected classroom, was applied to each stratum. This multistage sampling procedure is discussed more fully in the following section. Within each stratum, this sampling procedure ensured that there was an equal probability of selection for the elements.

TABLE 2.4

Stratum           N_h      n_h      Raising factor      Weight

Government       1611       50          32.22           2.0531
Catholic          539       50          10.78           0.6869
Independent       204       50           4.08           0.2600

Total            2354      150

The probability of selecting an element for the WTD sample design = 50/N_h

Table 2.4 describes the calculation of the weights for the WTD design. Following Peaker's recommendation, the sum of the sample weights over the selected elements was made equal to the sample size. The weights were rounded to two decimal places for use in the analyses described in Chapters 4 and 5.
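The raising factors and weights for the WTD design follow mechanically from the stratum sizes, as the short check below shows. It is an illustrative sketch with assumed variable names; the two totals at the end confirm that the raising factors sum to the population size and the weights to the sample size.

```python
stratum_sizes = {"Government": 1611, "Catholic": 539, "Independent": 204}
n_h = 50                                   # elements drawn from every stratum
N = sum(stratum_sizes.values())            # 2354
n = n_h * len(stratum_sizes)               # 150

# Raising factor N_h / n_h: population elements represented by one sample element.
raising = {name: N_h / n_h for name, N_h in stratum_sizes.items()}
# Weight n * N_h / (N * n_h): raising factor rescaled so the weights sum to n.
weights = {name: (n / N) * r for name, r in raising.items()}

total_raising = sum(n_h * r for r in raising.values())   # sums to N
total_weight = sum(n_h * w for w in weights.values())    # sums to n

for name in stratum_sizes:
    print(f"{name:<12} raising factor = {raising[name]:6.2f}   weight = {weights[name]:.4f}")
print(f"sum of raising factors = {total_raising:.1f}, sum of weights = {total_weight:.1f}")
```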

CLUSTER SAMPLING

When data are gathered in educational survey research with a simple random sample design, the individual selection and measurement of population elements often becomes too expensive. In order to reduce costs by minimising the geographical spread of the selected sample, survey researchers often employ cluster sampling designs.

Cluster sampling involves the division of the population of elements into groups or clusters which serve as the initial units of selection. Sometimes the selection of clusters as the primary sampling units is followed by selecting a simple random sample of elements within the selected clusters. When there is more than one stage of selection we refer to the sample design as a multistage sample design. The simplest form of multistage sampling is the simple two stage cluster sample design.

The influence of the selection of elements in clusters on precision may be examined by comparing the simple random sample design with a two stage cluster sample design when the sample size in each design is the same.

Consider a sample of n elements drawn from a population of N elements divided into equal sized clusters. Select m clusters with simple random sampling and then, from each selected cluster, select $\bar{n}$ elements by using simple random sampling. If we apply these two sample designs to the same population then from Appendix A we obtain:

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\,[1 + (\bar{n} - 1)R]$$

where

$V(\bar{x}_{cl})$ is the variance of the sample mean for the above simple two stage cluster sample design

$V(\bar{x}_{srs})$ is the variance of the sample mean for the simple random sample design

$\bar{n}$ is the ultimate cluster size

$R$ is the coefficient of intraclass correlation.
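To give a feel for the size of the multiplier $1 + (\bar{n} - 1)R$, the sketch below evaluates it for the ultimate cluster size of 25 used later in this chapter and a few values of R. The values of R are hypothetical and are chosen only to show the pattern; the function name is likewise an assumption.

```python
def variance_multiplier(cluster_size, intraclass_corr):
    """Ratio V(cluster mean) / V(srs mean) for equal sized ultimate clusters."""
    return 1 + (cluster_size - 1) * intraclass_corr

n = 150        # total sample size
n_bar = 25     # ultimate cluster size

for R in (0.0, 0.05, 0.1, 0.2, 0.3):   # hypothetical intraclass correlations
    ratio = variance_multiplier(n_bar, R)
    # Size of a simple random sample that would give the same variance.
    equivalent_srs = n / ratio
    print(f"R = {R:4.2f}   variance ratio = {ratio:4.1f}   equivalent SRS size = {equivalent_srs:6.1f}")
```

Even a modest R of 0.2 multiplies the variance by 5.8 when 25 elements are taken per cluster, so that 150 clustered elements carry roughly the information of 26 elements selected by simple random sampling.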

The above expression shows that the sampling accuracy of the simple two stage cluster sample design depends, for a given ultimate cluster size, on the value of the coefficient of intraclass correlation. A discussion of the statistical properties and the historical background of this coefficient is presented in Appendix A of this report. At this stage we may briefly describe the coefficient of intraclass correlation as a measure of the degree of homogeneity within clusters. When the elementary units within clusters tend to be similar with respect to some characteristic, the intraclass correlation between elementary units within clusters for that characteristic will be high. Conversely, if the elementary units within clusters are relatively heterogeneous with respect to the characteristic, the intraclass correlation will be low positive or, in very unusual situations, even negative (Hansen et al, 1953).

In educational survey research R is generally positive for achievement measures within schools. That is, the homogeneity of students within schools with respect to achievement is greater than if students were assigned to them at random. The source of this homogeneity may be due to selective factors in grouping, to joint exposure to similar influences, to the effects of mutual interaction, or to some combination of these three sources. It is important to remember that the coefficient of intraclass correlation may take different values for different variables, different populations and different clustering units.

Since R is generally positive for a wide range of characteristics concerning students within schools, we find that the precision of the simple two stage cluster sample is less than for a simple random sample of the same size. When contemplating the selection of clusters rather than elements in an educational survey research study, the researcher must balance the losses in precision due to clustering against the advantages of reduced costs arising from the selection and measurement of fewer primary sampling units.

In this report two self-weighting two stage cluster sample designs were each applied to the survey population described in Table 2.1 to obtain samples of 150 elements for each design. In the first of these designs the school was used as the primary sampling unit (the SCL design); in the second design the classroom was used as the primary sampling unit (the CLS design).

For both of these sample designs the primary sampling units differ greatly in size. If we choose the primary sampling units with simple random sampling then a self-weighting design would require the use of the same sampling fraction within each selected cluster. Unfortunately, by using this procedure the final sample size would depend on which primary sampling units were chosen first. Since it was considered important to constrain each sample design used in the study so that each sample design selected exactly 150 elements, it was necessary to modify the simple two stage sample design described above.
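One method of obtaining greater control over the sample size and yet ensuring a self-weighting design is to select the primary sampling units with probability proportional to size (PPS), and then select equal sized ultimate clusters from the selected primary sampling units. The following formula indicates a given element's probability of selection in a PPS sample design:

$$\text{Element probability} = \left(\frac{\text{Number of primary units selected} \times \text{Size of primary unit}}{\text{Population size}}\right) \times \left(\frac{\text{Size of ultimate cluster}}{\text{Size of primary unit}}\right)$$

This formula simplifies to:

$$\text{Element probability} = \frac{\text{Number of primary units selected} \times \text{Size of ultimate cluster}}{\text{Population size}}$$

That is, if we have equal sized ultimate clusters then the 'Element probability' will be constant for all elements. Further, we have control over our sample size according to the following formula:

$$\text{Sample size} = (\text{Number of primary units selected}) \times (\text{Size of ultimate clusters selected})$$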
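The cancellation of the primary unit size can be verified with exact fractions. The sketch below uses the seven classes of the Independent stratum (classes 69 to 75 in Table 2.1), from which the WTD design takes two classes and then 25 students per class. The systematic selection routine is one common way of drawing units with probability proportional to size; it is an assumption made for illustration, since the report does not state which PPS selection method was used.

```python
import random
from fractions import Fraction

# Class sizes for the two Independent schools (classes 69-75 of Table 2.1).
class_sizes = {69: 26, 70: 26, 71: 29, 72: 32, 73: 30, 74: 30, 75: 31}
stratum_size = sum(class_sizes.values())   # 204 students
units_selected = 2                         # classes drawn with PPS
cluster_size = 25                          # students drawn within each class

# Element probability = (units selected x unit size / stratum size)
#                       x (cluster size / unit size)
element_prob = {
    c: Fraction(units_selected * m, stratum_size) * Fraction(cluster_size, m)
    for c, m in class_sizes.items()
}

def systematic_pps(sizes, k, rng):
    """Pick k units with probability proportional to size (systematic method)."""
    total = sum(sizes.values())
    interval = total / k
    start = rng.random() * interval              # random start in [0, interval)
    targets = [start + j * interval for j in range(k)]
    chosen, upper = [], 0
    for unit, size in sizes.items():
        lower, upper = upper, upper + size       # this unit covers [lower, upper)
        chosen.extend(unit for t in targets if lower <= t < upper)
    return chosen

selected = systematic_pps(class_sizes, units_selected, random.Random(1969))
sample_size = len(selected) * cluster_size
print("selected classes:", selected)
print("element probability in every class:", set(element_prob.values()))
print("sample size:", sample_size)
```

Every class yields the same element probability, 50/204, whatever its size, and the sample size is fixed at 2 x 25 = 50 before any selection is made.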


Table 2.5 describes the calculations for the two self-weighting two stage cluster sample designs examined in this study. The first sample design (the SCL design) used schools as the primary sampling units while the second sample design (the CLS design) used pseudoclasses as the primary sampling units. In both sample designs six primary sampling units were selected with probability proportional to size; within each selected primary sampling unit 25 elements were selected by using simple random sampling.

Note that the first stage of selection for the CLS and WTD sample designs required that the sampling frame be adjusted by the creation of some 'pseudoclasses' in order to ensure that each primary selection would contain at least 25 sample elements. The eight pseudoclasses which were created prior to the execution of the CLS and WTD designs are indicated by square brackets in Table 2.1.

TABLE 2.5  Calculations for the selection of elements for the two self-weighting two stage cluster sample designs (SCL design and the CLS design)

Sample design    Primary sampling units            Secondary sampling units          Sample size
                 (selected with probability        (selected with simple
                 proportional to size)             random sampling)

SCL design       6 schools                         25 elements                       150
CLS design       6 classes                         25 elements                       150

Probability of selecting each element for both the CLS and the SCL sample designs = 150/2354

The WTD sample design, which was described in the previous section, employed the same sampling procedures within strata as the CLS design. Within each of the three strata two pseudoclasses were selected with probability proportional to size, and then a simple random sample of 25 elements was selected from each of the selected pseudoclasses.

SUMMARY

The desired population in this report consisted of all first year students in the schools of the Australian Capital Territory in 1969. This population was distributed amongst 15 secondary schools: nine Government high schools, four Catholic high schools and two Independent high schools. This desired population was reduced, for the reasons presented in Chapter 1, to the survey population described by the sampling frame in Table 1.4 and Table 2.1.

Five probability sample designs which are commonly employed in educational research were selected for examination in this report. The five designs, and a summary of their characteristics, are listed in Table 2.6.

TABLE 2.6  Summary of the characteristics of the five sample designs used in this report

Design    Number of           Stratification     Probability of         Element weight
          selection stages    variable           element selection

SRS       One                 None               150/2354               Self-weighting
STR       One                 School system      150/2354               Self-weighting
SCL       Two                 None               150/2354               Self-weighting
CLS       Two                 None               150/2354               Self-weighting
WTD       Two                 School system      50/N_h                 150 N_h / (2354 n_h)

The survey population for all designs is described in Table 2.1. The sample size for all designs is 150 elements.

3. The Analytic Model

THE CAUSAL MODEL

In order to examine the influence of sample design on sampling errors in educational survey research, it was decided to compare the influence of the five sample designs discussed in the previous chapter on a commonly used multivariate analysis technique. The choice of an analysis technique was constrained by the data which were available for the survey population (see Table 1.5), and also by the author's wish to apply these data to a reasonably realistic educational research situation. Given these limitations, a simple causal model was designed which required the calculation of a range of often used statistics: means, correlation coefficients, regression coefficients and multiple correlation coefficients. The selected causal model is presented in Fig.3.1.

The causal model in Fig.3.1 contains four stages which assume the following causal sequence: antecedent student characteristics (SEX, FATHERS OCC) → attitudes toward school (LIKE SCHL) → expectations for further education (EXP EDN) → achievement in mathematics (MATHS). This causal sequence assumes that the causal flow in the model is unidirectional and therefore a variable cannot be both the cause and effect of another variable. The straight arrows in Fig.3.1 denote causal links between variables in the direction shown by the arrowheads. The curved bi-directional arrow indicates the possibility of noncausal correlation between variables. The analysis technique required for the evaluation of the magnitudes of the path coefficients between variables, according to certain special constraints which will be described below, has become known as 'path analysis' (Moser and Kalton, 1971).

The causal order described above would, in a well designed research study, need to be supported by a cogent argument based on available theory and previous research. However, for the purposes of this report, the validation of the causal sequence will have no effect on our study of the implications of choice of sample design - provided that we accept that the model leads to a reasonably realistic representation of analysis techniques which are commonly used in educational research.

The model assumes that the sex of a student (indicated by the variable SEX) and the socio-economic status of a student's home (indicated by the variable FATHERS OCC) are exogenous variables, that is, their variability is assumed to be determined by causes outside the causal model. No attempt will be made to explain the variability of the exogenous variables. The mediating causal influences (indicated by the variables LIKE SCHL and EXP EDN) and the achievement criterion for the model (indicated by MATHS) are endogenous variables in which variability is explained by exogenous and/or other endogenous variables in the system. Since it is never possible to account for the total variability of a variable, residual variables (a, b, c) are introduced to indicate the effects of variables not included in the model. It is assumed that a residual variable is neither correlated with other residuals nor with variables in the model to which it is not attached.

Fig. 3.1  The causal model

If we let $Z_i$ denote a standardised score on variable $i$ then we may represent each endogenous variable by an equation consisting of the variables upon which it is assumed to depend. For each independent variable there is a path coefficient, $p_{ij}$, indicating the amount of expected change in the dependent variable as a result of unit change in the independent variable. The equations are:

$$Z_L = p_{SL} Z_S + p_{FL} Z_F + p_{aL} Z_a$$

$$Z_E = p_{SE} Z_S + p_{FE} Z_F + p_{LE} Z_L + p_{bE} Z_b$$

$$Z_M = p_{SM} Z_S + p_{FM} Z_F + p_{LM} Z_L + p_{EM} Z_E + p_{cM} Z_c$$

By using the conditions that $E(Z_i Z_j) = r_{ij}$, $E(Z_i^2) = 1$, and the independence of residuals as described above, the above system of equations may be solved for the path coefficients (Kerlinger and Pedhazur, 1973). The solution of these equations demonstrates that the path coefficients are equal to standardised partial regression coefficients.
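As a worked illustration of this solution, consider only the equation for LIKE SCHL. Multiplying it in turn by $Z_S$ and $Z_F$ and taking expectations, with the residual $Z_a$ uncorrelated with both, gives two equations in the two unknown path coefficients. The derivation below is a sketch that uses only the conditions stated above:

$$
\begin{aligned}
r_{SL} &= p_{SL} + p_{FL}\, r_{SF} \\
r_{FL} &= p_{SL}\, r_{SF} + p_{FL}
\end{aligned}
\qquad\Longrightarrow\qquad
p_{SL} = \frac{r_{SL} - r_{SF}\, r_{FL}}{1 - r_{SF}^2},
\quad
p_{FL} = \frac{r_{FL} - r_{SF}\, r_{SL}}{1 - r_{SF}^2}
$$

Multiplying the same equation by $Z_L$ itself and using $E(Z_L^2) = 1$ gives the residual path:

$$
1 = p_{SL}\, r_{SL} + p_{FL}\, r_{FL} + p_{aL}^2
\qquad\Longrightarrow\qquad
p_{aL} = \sqrt{1 - \left(p_{SL}\, r_{SL} + p_{FL}\, r_{FL}\right)}
$$

These are the normal equations of a regression on standardised scores, and the bracketed quantity is the squared multiple correlation coefficient for LIKE SCHL.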

THE PARAMETERS OF THE CAUSAL MODEL

Prior to examining the influence of the five sample designs on the statistics required to describe the causal model, the parameters of the model were calculated from the complete population data. The following discussion briefly considers some of the more interesting facets of these analyses. The variables and their names (SEX, FATHERS OCC, LIKE SCHL, EXP EDN and MATHS) are described in Chapter 1. The results of applying the causal model to the population data are presented in Table 3.1 and Fig.3.2.

TABLE 3.1

Variable                             SEX   FATHERS OCC   LIKE SCHL   EXP EDN     MATHS
FATHERS OCC                      -0.0110
LIKE SCHL                         0.1491       -0.1424
EXP EDN                          -0.0969       -0.4148      0.3980
MATHS                            -0.0761       -0.3749      0.2200    0.5148
Mean                              1.4766        3.1252     21.3229    4.2616   29.3416
Multiple correlation coefficient                            0.2051    0.5596    0.5465
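Since the path coefficients are standardised partial regression coefficients, they may be recovered directly from the correlations in Table 3.1. The following is a minimal sketch of that calculation, not the program used for this report: the function name `path_coefficients` and the use of NumPy are our own assumptions. For each endogenous variable it solves the normal equations formed from the correlations among its predictors, and then obtains the multiple correlation coefficient and the residual path.

```python
import numpy as np

# Population correlations from Table 3.1.
# Variable order: SEX (S), FATHERS OCC (F), LIKE SCHL (L), EXP EDN (E), MATHS (M).
R = np.array([
    [ 1.0000, -0.0110,  0.1491, -0.0969, -0.0761],
    [-0.0110,  1.0000, -0.1424, -0.4148, -0.3749],
    [ 0.1491, -0.1424,  1.0000,  0.3980,  0.2200],
    [-0.0969, -0.4148,  0.3980,  1.0000,  0.5148],
    [-0.0761, -0.3749,  0.2200,  0.5148,  1.0000],
])

def path_coefficients(corr, predictors, outcome):
    """Hypothetical helper: solve R_xx p = r_xy for one endogenous variable.

    Returns the path coefficients, the multiple correlation coefficient,
    and the residual path sqrt(1 - R^2).
    """
    r_xx = corr[np.ix_(predictors, predictors)]   # correlations among predictors
    r_xy = corr[predictors, outcome]              # correlations with the outcome
    paths = np.linalg.solve(r_xx, r_xy)
    r_squared = float(paths @ r_xy)
    return paths, np.sqrt(r_squared), np.sqrt(1.0 - r_squared)

# One equation per endogenous variable, following the causal order S, F, L, E, M.
paths_L, mult_L, resid_L = path_coefficients(R, [0, 1], 2)
paths_E, mult_E, resid_E = path_coefficients(R, [0, 1, 2], 3)
paths_M, mult_M, resid_M = path_coefficients(R, [0, 1, 2, 3], 4)

print("p_SL = %.4f, p_SE = %.4f" % (paths_L[0], paths_E[0]))
print("multiple R: L = %.4f, E = %.4f, M = %.4f" % (mult_L, mult_E, mult_M))
```

Under these assumptions the sketch gives values that agree, to rounding, with the path coefficients pSL and pSE quoted in the text and with the multiple correlation coefficients in the last row of the table.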

When inspecting the path diagram in Fig.3.2 care must be taken to remember the coding conventions used to score the variables in the model. The variables LIKE SCHL, EXP EDN and MATHS were all scored in the 'usual' direction which assigns a high positive numerical value to a relatively high rating on the variable. For example a score of 50 (out of a possible 55 items) on the MATHS variable demonstrates that the individual has performed at a relatively high level. By convention FATHERS OCC has been coded in the 'opposite' direction to the above. That is, a high positive score on this variable assigns a low relative rating on the scale of occupational prestige. For the variable SEX a coding convention has been imposed by coding 1 for males and 2 for females. All of the above coding conventions must be kept in mind when we are considering the sign and magnitude of the 'causal' paths between variables. The path coefficient pSL = 0.1476 is positive which suggests that scoring high on the SEX variable (that is, being female) will cause a corresponding increase in an individual's score on the LIKE SCHL variable. However the path coefficient pSE = -0.1560 is negative and this suggests that a similar change in the SEX variable will cause a decrease in score on the EXP EDN variable.

Fig. 3.2 Path coefficients calculated from population data

In Fig.3.3 the path diagram has been redrawn by omitting path coefficients which are smaller in magnitude than 0.1000 (this level has been arbitrarily chosen in order to clarify the diagram). The presentation of the causal model in Fig.3.3 suggests that the variables SEX and LIKE SCHL have no direct influence on MATHS achievement. These two variables influence MATHS performance by working through the mediating variable EXP EDN. The variable FATHERS OCC has both a direct effect on MATHS performance and an indirect effect by working through the variable EXP EDN. The variable EXP EDN has the strongest effect on MATHS performance. However this variable is influenced by three earlier variables in the causal sequence. By considering the size of the path coefficients for the residual variables (a, b and c) it may be seen that a considerable amount of variation is left unexplained by the model. It should therefore be remembered that this path model, and most other path models in sociological and educational research, greatly oversimplify the nature of causation in the real world.

Fig. 3.3 Path diagram after omitting small path coefficients

To summarise the causal model presented in Fig.3.3 we may say that the higher the occupational prestige of the father of a student (which probably means the higher the educational and cultural climate of the home) the higher the mathematical achievement of the student. The influence of the father's occupational prestige occurs directly and also indirectly by encouraging a positive attitude towards school and an increased level of educational aspiration. The sex of a student has no direct link with mathematics achievement, although it has links through other mediating variables. These indirect influences work through a student's attitude towards school (with female students expressing a more positive attitude towards school which thereby leads to higher educational aspirations) and through a student's level of educational aspiration (with male students having a higher level of educational aspiration).

4. The Comparison of Sample Designs

STUDENT'S EMPIRICAL SAMPLING METHOD

In the previous chapter the parameters of the causal model were calculated from the complete population data. In order to find the parameters of the model it was necessary to calculate means, correlations, standardised regression coefficients and multiple correlation coefficients. We now turn our attention to the sampling errors associated with the estimation of these parameters from samples of data obtained by using the five sample designs described in Chapter 2. For the simple random sample design (the SRS design) a set of computational formulae is readily available for the calculation of the standard errors of all the statistics which are required for the causal model (see Table 1.3). However, similar complete sets of formulae are not readily available for the other four sample designs (the STR, SCL, CLS and WTD designs).

In order to circumvent the problem of the unavailability of deductive methods for these complex sample designs we will turn to the empirical sampling methods which were employed by Student in order to establish the sampling distribution of the mean for simple random sampling.

"Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W.R. Macdonell (Biometrika, Vol.1, p.219). The measurements were written out on 3000 pieces of cardboard which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book which thus contains the measurements of 3000 criminals in a random order. Finally each consecutive set of four was taken as a sample - 750 in all - and the mean, standard deviation, and correlation of each sample determined" (Student, 1908:13).

Student used the 750 sample estimates to generate the sampling distribution of the mean. This distribution assisted Student to correctly guess the mathematical form of the t distribution - it was not until 17 years later that the guess was analytically verified by R.A. Fisher (Mosteller and Tukey, 1968).
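Student's card-shuffling procedure is easily imitated by computer. The sketch below is an illustration only: the population of 3000 heights is generated artificially (it is not Macdonell's data), and the function name `student_experiment` and the numerical constants are our own assumptions. It shuffles the population, takes each consecutive set of four values as a sample, and compares the standard deviation of the 750 sample means with the value expected for simple random samples of size four.

```python
import random
import statistics

def student_experiment(population, sample_size=4, seed=0):
    """Shuffle the 'cards' and return the mean of each consecutive sample."""
    rng = random.Random(seed)
    cards = list(population)
    rng.shuffle(cards)                      # the thorough shuffle
    n_samples = len(cards) // sample_size   # 3000 cards give 750 samples of four
    return [
        statistics.fmean(cards[i * sample_size:(i + 1) * sample_size])
        for i in range(n_samples)
    ]

# Hypothetical population: 3000 normally distributed heights (assumed values).
rng = random.Random(1908)
population = [rng.gauss(166.0, 6.5) for _ in range(3000)]

means = student_experiment(population)

# Spread of the empirical sampling distribution of the mean ...
empirical_se = statistics.stdev(means)
# ... compared with sigma / sqrt(n) for simple random samples of four.
theoretical_se = statistics.pstdev(population) / 4 ** 0.5

print(len(means), round(empirical_se, 3), round(theoretical_se, 3))
```

The same logic underlies the empirical comparisons in this chapter: the spread of a set of independent sample estimates is taken as a measure of the sampling error of the design which produced them.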

THE EMPIRICAL GENERATION OF SAMPLING DISTRIBUTIONS IN THIS REPORT

Following Student's technique of developing sampling distributions, the five sample designs were each applied to the survey population, described in Table 2.1, on 25 independent occasions in order to obtain 25 independent samples of 150 elements for each sample design. The data sets derived from these independent replications were then used to estimate the parameters of the causal model. For each sample design there were 25 independent estimates obtained for each of the parameters which describe the causal model. The standard deviations of the sets of 25 estimates obtained from each sample design provided an estimate of the accuracy of the sample estimates obtained for each design.

The accuracy of a particular sample design in estimating a particular population parameter was measured by using the simple random design as a standard. Kish (1965a), concentrating mainly on the discrepancies which arise because research workers in the social sciences tend to assume that all samples are equivalent to simple random samples, has introduced the word 'Deff' (design effect) in order to compare the efficiency of a complex sample design with the efficiency of a simple random sample design.

DESIGN EFFECTS AND THE EFFECTIVE SAMPLE SIZE

Kish defined the design effect (for a statistic such as the sample mean $\bar{x}$) as: "the ratio of the actual variance of a sample to the variance of a simple random sample of the same number of elements" (Kish, 1965a:258). That is,

$$\mathrm{Deff} = \frac{V(\bar{x}_c)}{V(\bar{x}_{srs})}$$

where $V(\bar{x}_c)$ is the variance of the sample mean for a given complex sample design, and $V(\bar{x}_{srs})$ is the variance of the sample mean for a simple random sample of equal size.

For a simple random sample drawn without replacement we have

$$V(\bar{x}_{srs}) = \frac{N - n}{N} \cdot \frac{S^2}{n}$$

where N is the population size, n is the sample size, and $S^2$ is the variance of elements in the population. Substituting into the expression which defines Deff:

$$\mathrm{Deff} = \frac{V(\bar{x}_c)}{\dfrac{N - n}{N} \cdot \dfrac{S^2}{n}}$$

or

$$V(\bar{x}_c) = \frac{N - n}{N} \cdot \frac{S^2}{n} \cdot \mathrm{Deff}$$

Kish (1965a) established that $s^2$ computed from any large probability sample yields a good approximation of $S^2$. The approximation is quite accurate when Deff is near one; in other cases with smaller samples it neglects a term of order 1/n. By using an estimate of Deff, obtained mostly from past experience, and $s^2$ as an estimate of $S^2$ the above equation may be used to obtain an estimate of the sampling error of the sample mean when complex sample designs are used.

Sample designs have also been compared by using the concept of the 'effective sample size' (Kish, 1965a) or the 'simple equivalent sample' (Peaker, 1953, 1967b). From the above equation we have:

$$V(\bar{x}_c) = \frac{N - n}{N} \cdot \frac{S^2}{n} \cdot \mathrm{Deff}$$

Now consider a simple random sample of n* elements drawn from the same population. Let the variance of the sample mean for this sample, $V^*(\bar{x}_{srs})$, be equal to the variance of the sample mean for the complex sample design, $V(\bar{x}_c)$. Now, for a simple random sample of n* elements drawn without replacement:

$$V^*(\bar{x}_{srs}) = \frac{N - n^*}{N} \cdot \frac{S^2}{n^*}$$

But since $V^*(\bar{x}_{srs}) = V(\bar{x}_c)$ we may write

$$\frac{N - n}{N} \cdot \frac{S^2}{n} \cdot \mathrm{Deff} = \frac{N - n^*}{N} \cdot \frac{S^2}{n^*}$$

If N is large compared to n and n* then

$$n^* = \frac{n}{\mathrm{Deff}}$$

is the size of the simple equivalent sample (or the effective sample size).

The value of the design effect and the size of the simple equivalent sample for the sample mean were calculated for each statistic obtained from the five sample designs. The value of the numerator in the equation which defines 'Deff' was estimated from the sample variance of the empirical distribution of means obtained from the 25 replications of each sample design. The value of the denominator was estimated from the sample variance of the empirical distribution of means from the 25 replications of the simple random sample design. For statistics other than the sample mean the same formula was used with the appropriate empirical variance estimates.
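The empirical estimate of Deff described above is simply a ratio of two sample variances, each computed over a set of replicated estimates. The sketch below illustrates the calculation on an artificial clustered population; the population, the cluster sizes, the seed and the function name `empirical_deff` are all assumptions made for the illustration and are not the survey population of this report.

```python
import random
import statistics

def empirical_deff(complex_estimates, srs_estimates):
    """Ratio of the empirical variance of replicated estimates under a complex
    design to the empirical variance under simple random sampling."""
    return statistics.variance(complex_estimates) / statistics.variance(srs_estimates)

rng = random.Random(42)
CLUSTER_SIZE, N_CLUSTERS, N, REPS = 25, 200, 150, 25

# Hypothetical population: 200 intact groups of 25, each with its own level,
# so that elements within a group resemble one another.
clusters = []
for _ in range(N_CLUSTERS):
    centre = rng.gauss(0.0, 1.0)
    clusters.append([centre + rng.gauss(0.0, 1.0) for _ in range(CLUSTER_SIZE)])
elements = [y for group in clusters for y in group]

# 25 independent simple random samples of 150 elements.
srs_means = [statistics.fmean(rng.sample(elements, N)) for _ in range(REPS)]

# 25 independent cluster samples: 6 intact groups of 25 give 150 elements.
cluster_means = []
for _ in range(REPS):
    chosen = rng.sample(clusters, N // CLUSTER_SIZE)
    cluster_means.append(statistics.fmean([y for group in chosen for y in group]))

deff = empirical_deff(cluster_means, srs_means)
n_star = N / deff   # size of the simple equivalent sample

print(round(deff, 2), round(deff ** 0.5, 2), round(n_star, 1))
```

With only 25 replications the estimate of Deff is itself subject to considerable sampling fluctuation, which is one reason for averaging over several statistics before drawing conclusions.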

THE EMPIRICAL SAMPLING DISTRIBUTION OF MEANS

The results of these calculations for sample means are displayed in Table 4.1. This table describes values of √Deff for each of the sample designs. Values of √Deff in preference to Deff are presented because they are more meaningful when used in a discussion of sampling standard errors (as distinct from a discussion of sampling variance). We may rewrite the above expression which defines Deff in terms of standard error notation.

$$SE(\bar{x}_c) = \sqrt{\mathrm{Deff}} \cdot SE(\bar{x}_{srs})$$

The use of √Deff has also been preferred in the presentation of measures of the design effect because √Deff is less subject to extreme values (Kish, 1969; Kish and Frankel, 1970).

Kish and Frankel (1970) recommend that when reporting many design effect values it is useful to average a series of values of √Deff instead of considering particular individual values. The values of √Deff are averaged rather than variance measures because variance measures are subject to fluctuations due to differences in units of measurement and sample size.

Averaging may only be undertaken over particular statistics because different statistics exhibit systematically different values of the design effect. Similarly we only average over particular sample designs for particular samples because the design effect depends not only on the statistic considered and the sample design but also on the nature of the target population from which the data were obtained (Kish and Frankel, 1970).

TABLE 4.1

                                        Design
Variable                   SRS     STR     SCL     CLS     WTD
SEX                       1.00    0.87    2.53    3.27    1.72
FATHERS OCC               1.00    0.80    1.43    1.76    2.58
LIKE SCHL                 1.00    0.83    1.15    2.29    1.77
EXP EDN                   1.00    0.97    1.40    2.73    3.70
MATHS                     1.00    0.91    1.28    3.95    4.69
Average √Deff for means   1.00    0.88    1.56    2.80    2.89

The values of average √Deff are presented in Table 4.1 for each of the five sample designs (SRS - the simple random sample design, STR - the proportionate stratified simple random sample design, SCL - the self-weighting two stage cluster sample design which selected schools as the first stage of sampling, CLS - the self-weighting two stage cluster sample design which selected classrooms as the first stage of sampling, WTD - the stratified two stage cluster sample design which selected classrooms as the first stage of sampling).

The values of average √Deff in Table 4.1 may now be used to compare the accuracy of the four complex sample designs with the accuracy of a simple random sample design. The stratified sample design STR shows a gain in accuracy (which was theoretically demonstrated in Chapter 2) whereas the other complex sample designs show substantial losses in accuracy.

The potential danger of disregarding the design effect may be demonstrated by considering confidence limits which are calculated with and without adjustment for √Deff. For example, if we consider the probability proportional to size selection of classrooms, the CLS design, the average value of √Deff is 2.80. Assuming a normal distribution the use of ±1.96 standard errors in two tailed tests allows for errors 0.05 of the time. But if we ignore the value of √Deff this interval is really equivalent to ±0.70 standard errors which allows for errors 0.48 of the time. Table 4.2 gives values of the probability of incorrect statements when two sided confidence intervals are aimed at for each of the average values of √Deff.

TABLE 4.2  Probability of incorrect statements about the sample mean when the size of the design effect is ignored

                              Probability of incorrect statement when a
                              two sided confidence interval of p = 0.05
Design     Average √Deff      is aimed at
SRS            1.00               0.05
STR            0.88               0.03
SCL            1.56               0.21
CLS            2.80               0.48
WTD            2.89               0.50
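The probabilities in Table 4.2 follow from the normal distribution once the nominal interval is rescaled by √Deff. The following sketch reproduces that arithmetic; the function name `actual_error_rate` is our own, and the calculation assumes, as in the text, that the estimate is normally distributed.

```python
from statistics import NormalDist

def actual_error_rate(root_deff, nominal=0.05):
    """Probability of an incorrect statement when a two sided interval with the
    given nominal error rate is computed from simple random sampling formulae
    and the design effect is ignored."""
    standard_normal = NormalDist()
    z_nominal = standard_normal.inv_cdf(1 - nominal / 2)   # about 1.96 for 0.05
    z_actual = z_nominal / root_deff                       # interval in true standard errors
    return 2 * (1 - standard_normal.cdf(z_actual))

# Average values of sqrt(Deff) for means, taken from Table 4.2.
average_root_deff = {"SRS": 1.00, "STR": 0.88, "SCL": 1.56, "CLS": 2.80, "WTD": 2.89}

for design, root_deff in average_root_deff.items():
    print(design, round(actual_error_rate(root_deff), 2))
```

For the CLS design, for example, 1.96/2.80 = 0.70 standard errors, and the area outside ±0.70 under the normal curve is approximately 0.48.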

The magnitude of the design effect may also be considered by calculating n*, the size of the simple equivalent sample. The simple equivalent sample is that simple random sample of elements drawn from the population which would have the same variance of the sample mean as the given complex sample. From the derivation given in this chapter we have n* = n/Deff where n is the size of the complex sample.

TABLE 4.3

                                          Design
Variable                   SRS     STR     SCL           CLS           WTD
SEX                        150     198     23            14            51
FATHERS OCC                150     234     [illegible]   [illegible]   23
LIKE SCHL                  150     218     [illegible]   [illegible]   48
EXP EDN                    150     159     77            20            11
MATHS                      150     181     92            10            7
Size for average √Deff     150     194     62            19            18
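The sizes of the simple equivalent samples may be recomputed from the values of √Deff for means by applying n* = n/Deff with n = 150. The sketch below does this for the √Deff values reported in Table 4.1; the function name `simple_equivalent_sample` is our own, and because it starts from values of √Deff rounded to two decimal places its results could differ by one unit from values computed at full precision.

```python
N_COMPLEX = 150   # size of each complex sample

# Values of sqrt(Deff) for means from Table 4.1, in the order
# SEX, FATHERS OCC, LIKE SCHL, EXP EDN, MATHS.
ROOT_DEFF = {
    "STR": [0.87, 0.80, 0.83, 0.97, 0.91],
    "SCL": [2.53, 1.43, 1.15, 1.40, 1.28],
    "CLS": [3.27, 1.76, 2.29, 2.73, 3.95],
    "WTD": [1.72, 2.58, 1.77, 3.70, 4.69],
}

def simple_equivalent_sample(n, root_deff):
    """n* = n / Deff, rounded to the nearest whole element."""
    return round(n / root_deff ** 2)

n_star = {
    design: [simple_equivalent_sample(N_COMPLEX, value) for value in values]
    for design, values in ROOT_DEFF.items()
}

for design, sizes in n_star.items():
    print(design, sizes)
```

Read in this way, a clustered sample of 150 students may carry no more information about a mean than a simple random sample of a few dozen students.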


From the contents of Table 4.2 and Table 4.3 it can be seen how the use of clustered sample designs may destroy the validity of confidence limits if the value of the design effect is not taken into account. The influence of clustering is especially dangerous when data are gathered from intact classrooms, as in the CLS and WTD designs.

The stratified simple random sample design, STR, showed better sampling accuracy than the simple random sample design. However, this design is presented mainly for theoretical interest because the administrative difficulties associated with this design are such that very few researchers are able to apply it to the dispersed populations which are typically found in educational and sociological research.

The magnitude of the design effect for the CLS design is consistently greater than for the SCL design. Since the two designs are the same except for the nature of the primary sampling unit, this suggests that the clustering effect of students within classrooms is greater than that of students within schools. This in turn suggests that the value of the coefficient of intraclass correlation, R, is greater for classrooms than for schools.

Table 4.4 presents values of R calculated from the population data for classrooms and schools. The table also includes values of $\sqrt{1 + (\bar{n} - 1)R}$ which, in Appendix A, has been shown to be equivalent to the value of √Deff for means obtained from the simple cluster design under the assumptions of a one way random analysis of variance model. Since both designs have used a value of 25 for the ultimate cluster size, the value of √Deff has been calculated with $\bar{n}$ also set at a value of 25. The calculations of R and √Deff were carried out with the assistance of the computer program INTRA (Ross and Slee, 1975).

TABLE 4.4

                                 Schools                   Classrooms
Variable                    R      √Deff estimate      R      √Deff estimate
SEX                        0.33        2.99           0.31        2.91
FATHERS OCC                0.11        1.91           0.24        2.60
LIKE SCHL                  0.06        1.56           0.10        1.84
EXP EDN                    0.04        1.40           0.27        2.73
MATHS                      0.05        1.48           0.57        3.83
Average of √Deff for means
in simple cluster design               1.87                       2.78


The values of R demonstrate that for all variables, except SEX, there is a greater homogeneity of students within classrooms than students within schools. This clustering effect is especially noticeable for the MATHS variable which suggests that there has been some form of classroom ability streaming carried out within the schools. The average values of √Deff for the simple cluster design provide very accurate approximations to the similar SCL and CLS designs. This accuracy shows that provided we can obtain accurate estimates of R then the formula $\mathrm{Deff} = 1 + (\bar{n} - 1)R$ will provide a reasonably accurate method for estimating design effects for a variety of clustered sample designs. In the case of a stratified cluster design, as used in the IEA Science Project (Comber and Keeves, 1973), we would expect the formula to provide conservative estimates of error because of the gain in accuracy due to stratification. However, as can be seen from the √Deff values for the WTD design, if weighting is used to adjust for a highly disproportionate allocation to strata then the formula may provide an approximate but not a conservative estimate of the magnitude of the design effect.
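The relationship between the coefficient of intraclass correlation and the design effect can be checked against the entries of Table 4.4. The sketch below applies the formula Deff = 1 + (n̄ - 1)R with an ultimate cluster size of 25; the function name `root_deff_from_roh` is our own.

```python
import math

def root_deff_from_roh(roh, cluster_size):
    """sqrt(Deff) for means under the simple cluster design:
    Deff = 1 + (cluster_size - 1) * roh."""
    return math.sqrt(1 + (cluster_size - 1) * roh)

CLUSTER_SIZE = 25

# Values of R from Table 4.4, in the order
# SEX, FATHERS OCC, LIKE SCHL, EXP EDN, MATHS.
R_SCHOOLS = [0.33, 0.11, 0.06, 0.04, 0.05]
R_CLASSROOMS = [0.31, 0.24, 0.10, 0.27, 0.57]

schools = [root_deff_from_roh(r, CLUSTER_SIZE) for r in R_SCHOOLS]
classrooms = [root_deff_from_roh(r, CLUSTER_SIZE) for r in R_CLASSROOMS]

print([round(v, 2) for v in schools], round(sum(schools) / 5, 2))
print([round(v, 2) for v in classrooms], round(sum(classrooms) / 5, 2))
```

Even a small value of R has a marked effect: with R = 0.05 and clusters of 25 the standard error of a mean is inflated by almost one half.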

THE EMPIRICAL SAMPLING DISTRIBUTIONS OF CORRELATION COEFFICIENTS AND STANDARDISED REGRESSION COEFFICIENTS

The results of these calculations are displayed in Table 4.5 and Table 4.6. The STR design once again shows better accuracy than the SRS design for both correlation coefficients and standardised regression coefficients. The SCL design shows a minor loss in accuracy; however, the CLS and WTD designs provide average √Deff values which show that errors would be greatly underestimated by the use of the simple random sampling distribution.

TABLE 4.5  Empirical estimates of √Deff for correlation coefficients

                                        Design
Coefficient                SRS     STR     SCL     CLS     WTD
rSF                       1.00    1.11    1.33    1.95    1.12
rSL                       1.00    0.73    1.03    1.57    1.04
rSE                       1.00    1.14    1.76    2.18    1.81
rSM                       1.00    0.97    1.13    2.42    2.27
rFL                       1.00    1.06    0.87    0.81    1.81
rFE                       1.00    0.69    1.12    1.01    1.70
rFM                       1.00    0.97    1.12    1.36    3.03
rLE                       1.00    1.04    1.02    1.44    1.68
rLM                       1.00    0.84    0.89    0.82    1.30
rEM                       1.00    1.11    1.23    1.75    2.74
Average of √Deff for
correlation coefficients  1.00    0.97    1.15    1.53    1.85

TABLE 4.6

                                        Design
Coefficient                SRS     STR     SCL     CLS     WTD
[label illegible]         1.00    1.01    1.31    2.14    1.48
[label illegible]         1.00    1.01    0.86    0.93    1.86
[label illegible]         1.00    0.92    1.45    1.30    1.77
[label illegible]         1.00    0.65    1.04    1.04    1.56
[label illegible]         1.00    0.91    1.03    1.39    1.51
[label illegible]         1.00    0.92    1.05    1.66    2.27
[label illegible]         1.00    1.10    1.14    1.52    1.87
[label illegible]         1.00    0.83    1.05    1.06    1.28
[label illegible]         1.00    1.37    1.51    2.19    1.96
Average of √Deff for
standardised regression
coefficients              1.00    0.97    1.16    1.47    1.73

The magnitude of the underestimation of error may be examined by considering Table 4.7. This table is developed under the distribution assumptions which were stated prior to the presentation of a similar table for means in Table 4.2.

TABLE 4.7  Probability of incorrect statements about correlation coefficients and standardised regression coefficients when the size of the design effect is ignored

                                  Probability of incorrect statement when a
                                  two sided confidence interval of p = .05
             Average √Deff        is aimed at
Design        r        b             r        b
SRS          1.00     1.00          0.05     0.05
STR          0.97     0.97          0.04     0.04
SCL          1.15     1.16          0.09     0.09
CLS          1.53     1.47          0.20     0.18
WTD          1.85     1.73          0.28     0.26

One of the most striking features of Table 4.5 and Table 4.6 is the similarity for particular sample designs of the average √Deff values for correlation coefficients and standardised regression coefficients. The closeness of these values suggests that for a variety of complex sample designs the researcher might use the average √Deff for correlations as a reasonably accurate and conservative estimate of the average √Deff for standardised regression coefficients.


Although the probability of making incorrect statements about correlation coefficients and standardised regression coefficients is less than that for means, Table 4.7 shows that there is still an unacceptably high danger of making errors especially when samples based on clusters of classrooms, as in the CLS and WTD designs, are used.

THE EMPIRICAL SAMPLING DISTRIBUTION OF MULTIPLE CORRELATION COEFFICIENTS

The empirical estimates of the design effect for multiple correlation coefficients are presented in Table 4.8. The small average values of √Deff are similar to the values for correlation coefficients and standardised regression coefficients for the STR and SCL designs. However, the values of the average √Deff for the CLS design and especially the WTD design must be taken into account when using the previously described two sided probability statements presented in Table 4.9.

TABLE 4.8  Empirical estimates of √Deff for multiple correlation coefficients

                                        Design
Coefficient                SRS     STR     SCL     CLS     WTD
RL                        1.00    1.22    1.38    1.61    1.91
RE                        1.00    0.78    0.87    1.02    1.96
RM                        1.00    0.92    1.03    1.31    2.56
Average of √Deff for
multiple correlation
coefficients              1.00    0.97    1.09    1.31    2.14

TABLE 4.9  Probability of incorrect statements about multiple correlation coefficients when the size of the design effect is ignored

                              Probability of incorrect statement when a
                              two sided confidence interval of p = .05
Design     Average √Deff      is aimed at
SRS            1.00               0.05
STR            0.97               0.04
SCL            1.09               0.07
CLS            1.31               0.13
WTD            2.14               0.36


In Table 4.8 there seems to be no consistent relationship between the size of √Deff and the number of variables required to calculate the value of the multiple correlation coefficient. This result suggests that the complexity of the model may bear no relationship to the sampling stability of the multiple correlation coefficients. However, further research into this question is essential because of the growing use of recursive path models which not only contain 'composite variables' which are themselves constructed with regression equations (for example the measure of the socio-educational level of the home used by Comber and Keeves, 1973).

SUMMARY AND IMPLICATIONS FOR THE CAUSAL MODEL

By using the empirically established values of the standard errors for the standardised regression coefficients (the 'path coefficients') we may summarise the implications of choice of sample design on the evaluation of our causal model. The use of the type of 'path analysis' techniques described in Chapter 3 requires that a test of significance be carried out on the magnitude of the 'path coefficients' in the model. The test of significance usually examines the null hypothesis that a path coefficient is equal to zero for the population data by setting up 95 per cent confidence limits for the magnitude of the path calculated from sample data. The following table presents for each sample design the absolute magnitude of each of the path coefficients of the causal model which are required to reject the null hypothesis at the 95 per cent confidence level.

TABLE 4.10  The absolute magnitude of the path coefficients required to reject the hypothesis that a path coefficient is equal to zero for the population data

                      Absolute magnitude of path required to reject null hypothesis
Path Coefficient           SRS     STR     SCL     CLS     WTD
[label illegible]         0.14    0.14    0.18    0.29    0.20
[label illegible]         0.15    0.15    0.13    0.14    0.28
[label illegible]         0.12    0.11    0.17    0.15    0.21
[label illegible]         0.16    0.10    0.17    0.17    0.26
[label illegible]         0.14    0.13    0.14    0.20    0.21
[label illegible]         0.13    0.12    0.13    0.21    0.28
[label illegible]         0.14    0.15    0.16    0.22    0.26
[label illegible]         0.16    0.14    0.17    0.17    0.21
[label illegible]         0.11    0.15    0.16    0.23    0.21
Mean                      0.14    0.13    0.16    0.20    0.24


From Table 4.10 we see that the use of sample designs which depart from the model of simple random sampling may greatly influence tests of significance for path coefficients. The mean values of the absolute magnitudes of the paths required to reject the null hypothesis demonstrate that the dangers associated with the underestimation of sampling errors through the misuse of computational formulae for simple random sampling are most evident for those sample designs which employ classrooms as the primary sampling unit.

5. The Estimation of Sampling Errors from Sample Data

RANDOM SUBSAMPLE ERROR ESTIMATION TECHNIQUES

Because of the unavailability of deductive methods to calculate the sampling errors of multivariate statistics from complex samples, researchers have turned towards empirical methods. These methods employ subsampling, splitting or replication in order to generate multiple 'independent replications' of the chosen analysis procedure. The estimates which are produced by these replications are then used to generate estimates of sampling error.

The historical development of the use of random subsample techniques for the purpose of estimating sampling errors has been traced back to P.C. Mahalanobis' introduction in 1936 of interpenetrating samples for agricultural surveys in Bengal (Finifter, 1972). Acting on some suggestions from J.W. Tukey, Deming further developed the technique as the Tukey plan (Deming, 1960). This plan is executed prior to data collection by constructing a systematic sample of size n by drawing ten subsamples, each one-tenth the size of n, based on ten independent starts within the listing. The data are collected as ten independent samples which are drawn with ten repetitions of the chosen sample design. The variance of a statistic, say the sample mean, may be estimated as:

$$v(\bar{x}) = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{k(k-1)}$$

where $\bar{x}$ is the estimate for the subsamples combined, $x_i$ is the estimate for the i'th subsample, and k is the number of subsamples (ten for the Tukey plan).
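The random subsample estimator can be illustrated with a small simulation. In the sketch below the listing of 2000 values is artificial, and the sampling interval, the seed and the function name `subsample_variance` are assumptions made for the illustration. Ten systematic subsamples of 20 elements are drawn from independent random starts, and the variance of the mean is estimated from the spread of the ten subsample means.

```python
import random
import statistics

def subsample_variance(estimates):
    """Variance of the mean of k independent subsample estimates:
    sum of squared deviations from their mean, divided by k(k - 1)."""
    k = len(estimates)
    combined = statistics.fmean(estimates)
    return sum((x - combined) ** 2 for x in estimates) / (k * (k - 1))

rng = random.Random(1960)
listing = [rng.gauss(50.0, 10.0) for _ in range(2000)]   # hypothetical listing

K = 10           # number of subsamples
INTERVAL = 100   # sampling interval: each subsample takes every 100th element

subsample_means = []
for _ in range(K):
    start = rng.randrange(INTERVAL)                      # independent random start
    subsample_means.append(statistics.fmean(listing[start::INTERVAL]))

variance_of_mean = subsample_variance(subsample_means)
print(round(statistics.fmean(subsample_means), 2), round(variance_of_mean ** 0.5, 2))
```

The estimator requires nothing beyond the k subsample estimates themselves, which is the source of its computational simplicity.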

The variance estimated by this procedure refers to the variance of the mean of the subsample estimates and not to the estimate that might be prepared from the whole sample. However, in the special case when the estimates are linear in the individual observations the mean of the subsample estimates and the total sample estimate are equivalent. The use of Deming's method also permits the estimation of sampling bias (Finifter, 1972). The estimation of bias is carried out through a graphical approximation technique suggested by Tukey to Jones (1956) which is based on the assumption that the bias is inversely proportional to sample size. Deming's method presents several difficulties for sample designs in educational research which are typically stratified, clustered, and restricted in size because of administrative and cost considerations.


First, for statistics more complicated than weighted averages it may not be meaningful to calculate results from such small amounts of data that may arise from one-tenth of the total number of observations (for example, in order to fit a regression line Kerlinger and Pedhazur, 1973, suggest at least 100 to 200 observations, and if large numbers of variables are being used then more than 200 would be required). Therefore if a substantial amount of data is needed in each subsample, the number of possible independent groups may be severely restricted.

Second, many statistics based on small samples give biased estimates; typically the leading term in the bias is proportional to 1/n where n is the sample size. Consequently the mean of results based on several small subsamples is likely to be more biased than is a single result based on all the sample data (Mosteller and Tukey, 1968).

Third, to achieve sufficient stability a large number k of independent subsamples is needed. However, a large k sacrifices the intended computational simplicity and the full amount of stratification desired in many designs (Finifter, 1972). Within a stratum of a stratified cluster design there may be too few primary selections to allow the total sample to be divided into a large number of subsamples.

Fourth, rare characteristics in the total sample may have little chance of appearing in some of the subsamples if a large number of subsamples are used (Deming, 1956).

In order to surmount the problems associated with the use of Deming's approach for the type of complex sample designs which are commonly used in educational research, researchers have begun to show an increasing interest in several other subsample replication techniques: balanced repeated replication (McCarthy, 1966) and jackknifing (Mosteller and Tukey, 1968).

BALANCED REPEATED REPLICATION

This technique was developed by McCarthy (1966, 1969a, 1969b) to permit variance estimates to be made from sample designs which featured the maximum amount of stratification possible (two primary selections per stratum) and yet still permitted variance estimates to be computed from the sample data alone (Kish, 1965). The population is divided into h strata, and the primary sampling units in each stratum are divided into two random halves of equal size. Then a primary sampling unit is selected from each half stratum. A half-sample replicate is formed by randomly choosing one of these primary sampling units for each stratum. The number of possible half samples which can be drawn from the sample data is $2^h$. Variance estimates are then computed from the squared difference between the total sample estimate and the half-sample replicate estimate. McCarthy's (1966) contribution to this technique was to develop a method for choosing a subset of half-samples which contained all of the information which was available in the total set of half-samples.


Kish and Frankel (1970) point out that this technique is suitable for generating sampling errors for a wide variety of multivariate statistics provided the variance of repeated paired replicates is a good estimate of the variance of the statistic based on the entire body of sample data. That is, if we are considering a statistic such as the weighted mean then provided:

$$E\left[(y_j - y)^2\right] = V(y)$$

where $y$ is the estimate based on the total data, and $y_j$ is the estimate based on the jth subsample, then

$$\hat{v}(y) = \frac{1}{k} \sum_{j=1}^{k} (y_j - y)^2$$

is a good estimate of $V(y)$, where k is the number of half sample estimates.

The above conclusion was examined by Simmons and Baird (1968) who carried out an empirical investigation of the application of balanced repeated replication techniques for non-linear statistics such as regression and correlation coefficients. They concluded that provided the mean of the replicate statistics was closely equal to the corresponding statistic in the parent sample then nonlinearity would not greatly disturb the accuracy of the technique. A more detailed discussion of the technique is presented in Appendix A.
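The half-sample calculation may be sketched for a very small design. The example below is an illustration under stated assumptions and not the procedure applied in this report: it uses three hypothetical strata of equal weight, two primary selections per stratum, invented values for the primary sampling unit estimates, and a balanced set of four half-samples taken from a 4 x 4 Hadamard matrix. For a linear statistic such as the mean, the balanced set gives the same variance estimate as the full set of 2^3 = 8 half-samples.

```python
import statistics

# Hypothetical estimates for the two primary selections in each of 3 strata
# (equal stratum weights are assumed for simplicity).
strata = [(52.0, 47.0), (61.0, 66.0), (39.0, 45.0)]

# Rows of a 4 x 4 Hadamard matrix. Columns 1 to 3 are mutually orthogonal and
# define which primary selection each stratum contributes to each half-sample.
HADAMARD_4 = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]

def half_sample_estimates(pairs):
    """Mean of the chosen primary selections for each balanced half-sample."""
    estimates = []
    for row in HADAMARD_4:
        picks = [pair[0] if sign > 0 else pair[1]
                 for pair, sign in zip(pairs, row[1:])]
        estimates.append(statistics.fmean(picks))
    return estimates

def brr_variance(pairs):
    """Mean squared difference between the half-sample estimates and the
    estimate based on the total sample."""
    total = statistics.fmean([(a + b) / 2 for a, b in pairs])
    replicates = half_sample_estimates(pairs)
    return sum((y - total) ** 2 for y in replicates) / len(replicates)

# Direct formula for the same design: sum of squared within-stratum
# differences divided by 4 * H^2.
direct = sum((a - b) ** 2 for a, b in strata) / (4 * len(strata) ** 2)

print(round(brr_variance(strata), 4), round(direct, 4))
```

The agreement with the direct formula holds only for linear statistics; for regression and correlation coefficients the half-sample estimates are simply recomputed with the chosen analysis procedure, and the accuracy of the result rests on the condition stated above.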

JACKKNIFING

The jackknife technique may be traced back to a method developed by Quenouille (1956) to reduce the bias of estimates. Estimates of parameters are made on the total sample data, and then, after dividing the data into groups, the calculations are made for each of the slightly reduced bodies of data which are obtained by omitting each subgroup in turn.

Let $y_i$ be the estimate based on the data which remains after omitting the ith subgroup and let $y_{all}$ be the estimate based on the total sample data. Tukey (Tukey, 1958; Mosteller and Tukey, 1968) defined k 'pseudovalues' $y_i^*$ (i = 1, ..., k) based on the k complements.

$$y_i^* = k\,y_{all} - (k - 1)\,y_i$$

He also defined

$$y^* = \frac{1}{k} \sum_{i=1}^{k} y_i^*$$

the 'jackknife value'.

The variance of the jackknife value may be obtained from the pseudovalues,

$$s^2 = \frac{\sum_{i=1}^{k} (y_i^* - y^*)^2}{k(k-1)}$$


Tukey (1958) set forward the proposal that these pseudovalues could be treated as if they were approximately independent observations and that Student's t distribution could be applied (Mosteller and Tukey, 1968) to these estimates to construct approximate confidence intervals for $y$ or $y^*$ (Brillinger, 1966). Later empirical work by Frankel (1971) has substantially validated these proposals for both the jackknife technique and the balanced repeated replication technique for a variety of regression-related statistics. The jackknife procedure has been applied to several large cross-national educational research studies (Peaker, 1967b; 1975) conducted by the International Association for Educational Achievement. A more detailed discussion of the technique is presented in Appendix A.

APPLICATION OF THE BALANCED REPEATED REPLICATION TECHNIQUE AND THE JACKKNIFE TECHNIQUE

In the previous chapter the design effect was calculated for a variety of statistics obtained from complex sample designs by the empirical generation of sampling distributions. The researcher is generally unable to apply this approach when working with social science data and therefore requires suitable techniques for estimating the design effect from a single sample. In this chapter the sampling error estimation techniques of jackknifing and balanced repeated replication were each applied to one sample design in order to examine the accuracy with which they could be used to obtain measures of the design effect. The jackknife technique was applied to one sample randomly selected from the 25 independent replications of the CLS design. The balanced repeated replication technique was applied to one sample randomly selected from the 25 independent replications of the WTD design.

Each technique was applied to the sample design which was the most appropriate for the particular statistical features of the technique. The balanced repeated replication technique is derived on the assumption of two primary selections from each stratum of a stratified population and therefore this technique was most suited to the WTD sample design. The jackknife technique is derived on the assumption that the sample may be split into a number of subgroups which identically follow the design of the original sample and therefore was most suited to the CLS design.

EXAMPLE OF CALCULATIONS FOR THE BALANCED REPEATED REPLICATION TECHNIQUE

The balanced repeated replication technique was applied to one (randomly selected) sample from the 25 available independent replications of the WTD design. As described in Chapter 2, this sample design consisted of three strata which were obtained by the preliminary stratification of the population according to the 'school system' variable. Within each stratum two 'pseudoclassrooms' were selected with probability proportional to size; within each selected pseudoclass a simple random sample of 25 elements was selected. The sample chosen for this example may be represented diagrammatically:

Stratum          Classroom (ultimate cluster number)

Government       C11    C12
Catholic         C21    C22
Independent      C31    C32

The notation Cij refers to the jth ultimate cluster of 25 elements selected from the ith stratum.

From the discussion presented in Appendix A we must now form four sets of half-samples based on the method presented by Plackett and Burman (1946). The allocation of the selected ultimate clusters to the half-samples is presented in Table 5.1. This allocation follows the example provided in the discussion of balanced repeated replication in Appendix A.

TABLE 5.1 Allocation of ultimate clusters to half-samples for the WTD design

                 Stratum
Half-sample      Government    Catholic    Independent

1                C11           C21         C31
2                C11           C22         C32
3                C12           C22         C31
4                C12           C21         C32
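The balance property that motivates the Plackett and Burman allocation can be checked directly. The following is a minimal sketch, assuming the usual coding in which the first cluster of a stratum is scored +1 and the second -1; the function names are illustrative and do not come from the report. An allocation is treated here as balanced when every stratum column sums to zero and every pair of stratum columns is orthogonal.

```python
# Rows are half-samples 1 to 4 of Table 5.1. Each entry records which of the
# two ultimate clusters (1 or 2) is taken from the Government, Catholic and
# Independent strata respectively.
HALF_SAMPLES = [
    (1, 1, 1),
    (1, 2, 2),
    (2, 2, 1),
    (2, 1, 2),
]


def to_signs(allocation):
    """Code cluster 1 as +1 and cluster 2 as -1 (an assumed convention)."""
    return [[1 if cluster == 1 else -1 for cluster in row] for row in allocation]


def is_balanced(allocation):
    """Return True when every stratum column sums to zero and every pair of
    stratum columns has a zero inner product across the half-samples."""
    signs = to_signs(allocation)
    n_strata = len(signs[0])
    columns = [[row[h] for row in signs] for h in range(n_strata)]
    sums_ok = all(sum(column) == 0 for column in columns)
    pairs_ok = all(
        sum(a * b for a, b in zip(columns[g], columns[h])) == 0
        for g in range(n_strata)
        for h in range(g + 1, n_strata)
    )
    return sums_ok and pairs_ok


print(is_balanced(HALF_SAMPLES))
```

An allocation that repeats the same half-sample would fail the pairwise check, which is why four distinct half-samples are needed for three strata.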

In this example calculation we will consider the sampling error of the sample mean of the MATHS variable - the criterion variable for the causal model. This variable is an indicator of the mathematical ability of the students in our survey population. Table 5.2 presents the results of the calculations. The notation follows the discussion presented in Appendix A. Each half-sample consists of three ultimate clusters of 25 elements as described in Table 5.1. From Table 5.2 we see that the balanced repeated replication estimate of the standard error of the sample mean for this particular WTD sample of 150 elements is 5.30. Since Kish (1965) has established that $s^2$, the sample variance, computed from any large probability sample yields a good approximation to $S^2$, the population variance, we may estimate the standard error of the sample mean for a simple random sample of elements (of the same size) as $s/\sqrt{n}$. For this sample our estimate of this term is 0.82 (since for this particular sample s = 10.10).


TABLE 5.2 Balanced repeated replication calculations required for estimating the standard error of the sample mean obtained from one sample obtained by using the WTD design

Half-sample values
    $\bar{y}_1^*$                      25.5190
    $\bar{y}_2^*$                      24.2820
    $\bar{y}_3^*$                      34.6421
    $\bar{y}_4^*$                      36.1714

Total sample value
    $\bar{y}$                          30.1537

Difference squares
    $(\bar{y}_1^* - \bar{y})^2$        21.4804
    $(\bar{y}_2^* - \bar{y})^2$        34.4769
    $(\bar{y}_3^* - \bar{y})^2$        20.1457
    $(\bar{y}_4^* - \bar{y})^2$        36.2127

Mean of the difference squares
    $\hat{v}(\bar{y})$                 28.0789

Estimate of standard error
    $\sqrt{\hat{v}(\bar{y})}$          5.2990

We combine these two calculations into an estimate of $\sqrt{Deff}$ for the mean of the indicator MATHS associated with the WTD sample design.
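The arithmetic behind this estimate can be retraced with a short sketch. It assumes the estimator described earlier in the chapter (the mean of the squared differences between each half-sample estimate and the total sample estimate) and uses the half-sample means, total sample mean, s = 10.10 and n = 150 quoted in the text. The function name is illustrative, and the ratio is computed from unrounded intermediate values, so it is not a figure quoted from the report.

```python
import math


def brr_variance(half_sample_estimates, total_estimate):
    """Mean squared deviation of the half-sample estimates about the
    estimate based on the total sample."""
    k = len(half_sample_estimates)
    return sum((y - total_estimate) ** 2 for y in half_sample_estimates) / k


# Values taken from Table 5.2 and the surrounding text.
half_sample_means = [25.5190, 24.2820, 34.6421, 36.1714]
total_mean = 30.1537
s, n = 10.10, 150

se_brr = math.sqrt(brr_variance(half_sample_means, total_mean))
se_srs = s / math.sqrt(n)      # simple random sample standard error
root_deff = se_brr / se_srs    # square root of the design effect

print(f"BRR standard error: {se_brr:.4f}")
print(f"SRS standard error: {se_srs:.2f}")
print(f"sqrt(Deff):         {root_deff:.2f}")
```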

Similar calculations for the other statistics used in the causal model may also be carried out. In each case we employ the formulae presented in Table 1.3 to estimate the denominator of the equation which defines $\sqrt{Deff}$ - the standard error of the statistic under the conditions of simple random sampling. It may seem strange to some researchers that we can obtain reasonable estimates of the value of the standard errors of statistics under conditions of simple random sampling from sample data gathered with complex probability sample designs provided the sample size is sufficiently large. This becomes a little more intuitively obvious when we remember that a simple random sample consists of one of all possible $\frac{N!}{(N-n)!\,n!}$ different combinations of n different elements out of N such that each combination has the same probability of selection (Kish, 1965). Therefore a self-weighted (or properly weighted) complex probability sample from a large population represents one of the possible simple random samples (or a good model of one of the possible simple random samples) which may be selected from the same population (Kish, 1965).

The calculations described in Table 5.2 were carried out for all the statistics required to describe the causal model. The average values of $\sqrt{Deff}$ for the means, correlations, standardised regression coefficients and multiple correlation coefficients are presented in Table 5.3. Beside each of the values of average $\sqrt{Deff}$ the values of average $\sqrt{Deff}$ obtained from the empirical sampling techniques described in Chapter 4 are presented. In order to summarise the extent to which the balanced repeated replication technique was able to estimate the empirical value (taken as the 'correct' value), a list of percentage errors was also prepared.

TABLE 5.3 Balanced replication estimates of average $\sqrt{Deff}$

                                                            Average $\sqrt{Deff}$
                                          Number of     Balanced repeated      Empirical    Percentage
Statistic                                 estimates     replication estimate   estimate     error

Means                                     5             4.12                   2.89         42.6
Correlation coefficients                  10            1.66                   1.85         10.3
Standardised regression coefficients      9             1.63                   1.73         5.8
Multiple correlation coefficients         3             1.20                   2.14         43.9

The magnitude of the percentage error in average $\sqrt{Deff}$ is shown to be smaller when the number of estimates used to calculate average $\sqrt{Deff}$ is increased. The convergence of the estimate of average $\sqrt{Deff}$ towards the empirically established value shows that care should be taken in using $\sqrt{Deff}$ values which are obtained as individual estimates.

EXAMPLE OF CALCULATIONS FOR THE JACKKNIFE TECHNIQUE

The jackknife technique was applied to one (randomly selected) sample from the 25 available independent replications of the CLS design. The CLS design, as described in Chapter 2, consisted of a probability proportional to size selection of six 'pseudoclassrooms' followed by the selection of a simple random sample of 25 elements from each selected pseudoclass.

From the discussion presented in Appendix A we form six reduced samples of five ultimate clusters such that each reduced sample follows the design of the original sample. Each reduced sample is obtained by leaving out one of the ultimate clusters. The allocation of the selected ultimate clusters to the reduced samples is presented in Table 5.4. The notation Cj refers to the jth ultimate cluster of 25 elements selected from the population. There are six ultimate clusters (C1, C2, C3, C4, C5, C6) for each replication of the CLS design.


TABLE 5.4 Allocation of ultimate clusters to reduced samples for the CLS design

Reduced sample     Ultimate clusters

1                  C1    C2    C3    C4    C5
2                  C1    C2    C3    C4          C6
3                  C1    C2    C3          C5    C6
4                  C1    C2          C4    C5    C6
5                  C1          C3    C4    C5    C6
6                        C2    C3    C4    C5    C6

Total sample       C1    C2    C3    C4    C5    C6

For this example of the application of the jackknife technique we will again consider the sampling error of the sample mean of the MATHS variable. Table 5.5 presents the results of these calculations. The notation follows the discussion presented in the Appendix. From Table 5.5 the jackknife estimate of the standard error of the mean for this particular CLS sample of 150 elements is 4.41. Also, as described previously, the standard error of the sample mean for a simple random sample of elements (of the same size) is $s/\sqrt{n} = 0.99$ (since for this particular sample s = 12.14). By combining these two calculations we obtain the jackknife estimate of $\sqrt{Deff}$ for the sample mean of the indicator MATHS associated with the CLS sample design:

$$\sqrt{Deff} = \frac{4.41}{0.99} \approx 4.45$$

The calculations described in Table 5.5 were carried out for all the statistics required to describe the causal model. These calculations are summarised in Table 5.6. This table contains a list of percentage errors for the jackknife estimates of average $\sqrt{Deff}$.

The jackknife estimates of average $\sqrt{Deff}$ followed the pattern of the results for the balanced repeated replication technique by converging towards the empirically established values of average $\sqrt{Deff}$ when the number of estimates was larger.

By comparing the percentage errors of the two techniques we see that the jackknife technique consistently provided more accurate estimates of average $\sqrt{Deff}$. While it is difficult to make meaningful comparisons of percentage error because each technique was applied to a different sample design, this increased accuracy may be due to the jackknife estimates being based on six reduced samples from the CLS design while the balanced repeated replication estimates were based on only four half-samples.

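TABLE 5.5 Jackknife calculations for the standard error of the sample mean (one sample from the CLS design)

Reduced sample values
    $y_1$                              28.592
    $y_2$                              27.440
    $y_3$                              24.992
    $y_4$                              31.096
    $y_5$                              28.144
    $y_6$                              30.256

Total sample value
    $y_{all}$                          28.420

Pseudovalues
    $y_1^*$                            27.56
    $y_2^*$                            33.32
    $y_3^*$                            45.56
    $y_4^*$                            15.04
    $y_5^*$                            29.80
    $y_6^*$                            19.24

Jackknife value
    $y^*$                              28.42

Variance of the jackknife value
    $s^{*2}$                           19.4576

Estimate of the standard error
    $s^*$                              4.4111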
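The jackknife calculations can be retraced with a brief sketch. It assumes Tukey's pseudovalue definition, k times the total sample estimate minus (k - 1) times the reduced sample estimate, with the variance of the jackknife value taken as the sum of squared deviations of the pseudovalues divided by k(k - 1). The function name is illustrative; the input values are the six reduced sample means and the total sample mean for the CLS example.

```python
import math


def jackknife(reduced_estimates, total_estimate):
    """Return the pseudovalues, the jackknife value and its variance."""
    k = len(reduced_estimates)
    pseudovalues = [k * total_estimate - (k - 1) * y for y in reduced_estimates]
    jackknife_value = sum(pseudovalues) / k
    variance = sum((p - jackknife_value) ** 2 for p in pseudovalues) / (k * (k - 1))
    return pseudovalues, jackknife_value, variance


# Reduced sample means (one ultimate cluster omitted in turn) and the
# total sample mean for the MATHS variable.
reduced_means = [28.592, 27.440, 24.992, 31.096, 28.144, 30.256]
total_mean = 28.420

pseudovalues, y_star, var_star = jackknife(reduced_means, total_mean)
se_star = math.sqrt(var_star)

print([round(p, 2) for p in pseudovalues])
print(f"jackknife value: {y_star:.2f}")
print(f"variance:        {var_star:.4f}")
print(f"standard error:  {se_star:.4f}")
```

For a mean of equal-sized clusters the jackknife value coincides with the total sample mean, which is what the example shows; for non-linear statistics such as correlation coefficients the two would generally differ.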

TABLE 5.6 Jackknife estimates of average $\sqrt{Deff}$

                                                            Average $\sqrt{Deff}$
                                          Number of     Jackknife      Empirical    Percentage
Statistic                                 estimates     estimate       estimate     error

Means                                     5             3.09           2.80         10.4
Correlation coefficients                  10            1.63           1.53         6.5
Standardised regression coefficients      9             1.53           1.47         4.1
Multiple correlation coefficients         3             1.44           1.31         9.9
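The percentage errors reported for the two techniques can be reproduced with the sketch below. The report does not state the formula explicitly; the code assumes the absolute difference between the technique's estimate and the empirical estimate, expressed as a percentage of the empirical estimate, which agrees with the tabulated figures to one decimal place. The dictionary names are illustrative.

```python
def percentage_error(estimate, empirical):
    """Absolute error relative to the empirical ('correct') value, in per cent.
    This definition is an assumption that matches the tabulated figures."""
    return abs(estimate - empirical) / empirical * 100


# (technique estimate, empirical estimate) of average sqrt(Deff).
BRR = {
    "Means": (4.12, 2.89),
    "Correlation coefficients": (1.66, 1.85),
    "Standardised regression coefficients": (1.63, 1.73),
    "Multiple correlation coefficients": (1.20, 2.14),
}
JACKKNIFE = {
    "Means": (3.09, 2.80),
    "Correlation coefficients": (1.63, 1.53),
    "Standardised regression coefficients": (1.53, 1.47),
    "Multiple correlation coefficients": (1.44, 1.31),
}

for statistic in BRR:
    brr_error = percentage_error(*BRR[statistic])
    jk_error = percentage_error(*JACKKNIFE[statistic])
    print(f"{statistic:40s} BRR {brr_error:5.1f}   jackknife {jk_error:5.1f}")
```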


SUMMARY

In this chapter the jackknife and balanced repeated replication techniques were each applied to different samples obtained from the WTD and CLS sample designs. The techniques provided useful estimates of average $\sqrt{Deff}$ provided that a sufficiently large number of estimates were used to establish average $\sqrt{Deff}$. The convergence of the estimates of average $\sqrt{Deff}$ towards the empirically established values as the number of estimates is increased emphasises the possibility of the instability of individual estimates of $\sqrt{Deff}$ which may be obtained from the two techniques.

There appear to be no published results available concerning the degree of stability of the jackknife and balanced repeated replication estimates of sampling errors obtained in educational survey research studies. A research study which is being carried out by the Australian Council for Educational Research aims to provide further information about these questions in the near future.

6. A Worked Example

The choice of a suitable sample design for an educational survey research study is rarely a free choice. The researcher usually designs a sample which not only provides appropriate data for answering the research questions posed in the study, but is also within the financial and administrative resources available for the study. The 'best' sample design for a study is therefore the design which optimally satisfies the particular set of constraints which are placed on the study. In this chapter we will consider the pattern of reasoning which is required to arrive at a sample design which satisfies the constraints of a hypothetical educational survey research study.

THE HYPOTHETICAL STUDY

Let us assume that the Australian government has commissioned an evaluation study which seeks to establish:

(i) the proportion of Australian 14 year-old students who can master the items on a criterion referenced test of basic mathematical skills;

(ii) a list of schools (to be used in later case studies) which contain either an unusually high proportion of students who have mastered 80 per cent of the items on the test, or an unusually low proportion of students who have mastered 80 per cent of the items on the test.

CONSTRAINTS ON THE HYPOTHETICAL STUDY

The following list presents the major constraints which will influence our choice of sample design for the hypothetical study.

(i) Financial - the government has provided $100,000 to cover all possible costs for the study.

(ii) Desired population - the desired population consists of all 14 year-old students in Australian secondary schools in 1976.

(iii) Statistics required
- estimates of the proportion of the desired population who can master the items on the test
- estimates of the proportion of students in the selected schools who have mastered at least 80 per cent of the items on the test.

(iv) Units of analysis - students and schools.

(v) Error requirements
- the standard error of the estimates of the proportions of students in the desired population who can master the items on the test should not exceed 0.01. That is, in estimating the percentage of students who can master the items in the desired population we require 95 per cent confidence limits of ±2% surrounding the sample estimates
- the estimates of the proportion of students in the selected schools who have mastered at least 80 per cent of the items on the test should be sufficiently stable to compile a meaningful ordered list of schools on the basis of these proportions.

(vi) Domains of the study - there are no specific domains (that is, there are no parts of the desired population for which separate estimates are to be planned for in the sample design).

(vii) Sampling frame availability - a current sampling frame is available which lists the number of students in the desired population associated with each school in Australia.

THE SAMPLE DESIGN

The financial resources of the study place the first restriction on our sample design. In previous studies of this type it has cost around $10 for each student tested (Bourke and Keeves, 1977). Therefore our government grant of $100,000 places an upper limit of 10,000 on the number of students who can be tested for this study. We must now consider whether we can meet the error requirements for the study given this constraint on the maximum sample size. If we were to select a simple random sample of n students from the population in order to estimate the proportion p who can master an item on the test, then the standard error of this estimate could be estimated by the following formula (Kish, 1965:46):

$$se(p) = \sqrt{\frac{p(1-p)}{n}}$$

Note that because of the sufficiently large value of the population size we have ignored the finite population correction.

One of our stated error requirements is that the standard error of the estimated proportions of the population who can master the items on the test should not exceed 0.01. Therefore for a simple random sample design we would require that:

$$0.01 \geq \sqrt{\frac{p(1-p)}{n_{srs}}}$$

The maximum value of $p(1-p)$ occurs for $p = 0.5$, and thus to ensure that we could satisfy the error requirements for all items we would require that:

$$0.01 \geq \sqrt{\frac{0.25}{n_{srs}}} \quad \text{or} \quad n_{srs} \geq 2500$$

This sample size is well within our financial resources; however, the use of a simple random sample would not allow us to satisfy the second error requirement of selecting sufficient students per school to obtain reasonably stable estimates for the compilation of an ordered list of schools.
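The budget ceiling and the simple random sample requirement can be checked with a few lines of code. This is a minimal sketch using exact rational arithmetic so that the boundary case is not disturbed by floating point rounding; the function name is illustrative, and the finite population correction is ignored as in the text.

```python
import math
from fractions import Fraction


def srs_sample_size(p, max_se):
    """Smallest n for which sqrt(p(1 - p) / n) does not exceed max_se.
    Arguments are passed as strings so that they are converted exactly."""
    p, max_se = Fraction(p), Fraction(max_se)
    return math.ceil(p * (1 - p) / max_se ** 2)


budget, cost_per_student = 100_000, 10
max_students = budget // cost_per_student

# p = 0.5 maximises p(1 - p), so it gives the most demanding requirement.
n_required = srs_sample_size("0.5", "0.01")

print(max_students, n_required, n_required <= max_students)
```

Because p(1 - p) is largest at p = 0.5, any other item difficulty needs fewer students; for example, p = 0.2 requires 1600.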

In order to satisfy the second error requirement we will need to construct a two stage sample design by selecting schools first and then sampling students within schools. Therefore we must decide how many schools and how many students within schools must be selected to provide estimates of proportions which are at least as accurate as a simple random sample of 2500 students. From

Appendix

A we

know

that

for

a complex

sample

design

of nc elements

the

to "c , where n* is the size of the simple equivalent F Also from Table 4.4 we know that for a two stage cluster sample which sample. used schools as the primary sampling unit the design effect for a sample mean is approximately equal to l+(ii-l)R, where n is the ultimate cluster size and R is the coefficient of intraclass correlation. For our sample design we will therefore have:

design

effect

“c = 1

+ (fi

is equal

- 1)R

n*

or nc = Ti m = n* [I t (ii - l)R]

where

m = the

number

of schools

The above formulae, which have been derived for sample means, may also be applied to proportions because sample proportions are a special case of the sample mean in which the cases in the sample can be assigned two possible values: 1 for a correct response to an item and 0 for an incorrect response (Kish, 1965).

Table 6.1 summarises the values of $n_c$ and m, for several values of R and $\bar{n}$, which would be required to obtain the same standard error as a simple random sample of size 2500 when p = 0.5. These values of $n_c$ estimate the minimum sample size for two stage sampling which is required to satisfy the first error constraint. Note that the values of m have been rounded to integer values.

From Table 6.1 we can see that the value of R has a very strong influence on the size of the complex sample design which is required to have a precision that is equivalent to a simple random sample of size 2500. However, our main dilemma, which is common to all survey research which employs cluster sample designs, is to know the magnitude of R before we collect our data. Generally we resolve our problem by examining values of R from previous surveys which have examined similar variables with respect to similar populations by using similar sample designs.

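TABLE 6.1 Values of $n_c$ and m required to obtain the same standard error as a simple random sample of size 2500 when p = 0.5

                                 R = 0.1                                  R = 0.2
$\bar{n}$                        m                 $n_c$                  m                 $n_c$
(Number of students              (No. of           (Complex               (No. of           (Complex
per school)                      schools)          sample size)           schools)          sample size)

10                               475               4750                   700               7000
20                               363               7250                   600               12000
30                               325               9750                   567               17000
40                               306               12250                  550               22000
50                               295               14750                  540               27000
60                               287               17250                  533               32000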
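The entries for the complex sample size follow directly from the design effect relationship, and a short sketch makes the dependence on R easy to explore. It assumes the approximation Deff = 1 + (n̄ - 1)R with a simple equivalent sample of 2500; the function names are illustrative. The number of schools is shown unrounded because the rounding rule applied in the tabulated values is not stated.

```python
from fractions import Fraction


def complex_sample_size(n_srs, cluster_size, rho):
    """n_c = n* [1 + (cluster_size - 1) R], with R given as a decimal string
    so that the arithmetic is exact."""
    deff = 1 + (cluster_size - 1) * Fraction(rho)
    return n_srs * deff


def schools_needed(n_srs, cluster_size, rho):
    """Unrounded number of schools m = n_c / cluster_size."""
    return complex_sample_size(n_srs, cluster_size, rho) / cluster_size


for rho in ("0.1", "0.2"):
    for cluster_size in (10, 20, 30, 40, 50, 60):
        n_c = complex_sample_size(2500, cluster_size, rho)
        m = schools_needed(2500, cluster_size, rho)
        print(f"R={rho}  students per school={cluster_size:2d}  "
              f"n_c={int(n_c):6d}  m={float(m):7.2f}")
```

Doubling R from 0.1 to 0.2 raises the required sample at 30 students per school from 9750 to 17000, which is why a prior estimate of R matters so much at the planning stage.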

Table 4.4 provides an estimate of 0.05 for R for the sample mean of a mathematics achievement test using schools as the clustering unit. Unfortunately the population from which this estimate was obtained is more limited in coverage and younger than the desired population required for this study. Also the R value in this table is associated with total scores on a mathematics achievement test rather than scores on individual items for a test of basic mathematical skills. Let us assume that the estimate of R in Table 4.4 is the only one which is available because of the absence of previous similar studies. In this case we could adopt a value of R equal to 0.1 for proportions calculated from our design (remembering that a proportion calculated for students is the sample mean of a variable which takes only two values: zero or one). This is a similar but slightly more conservative value, chosen so that we can be on the 'safe' side with the precision of our final estimates.

If we estimate that R = 0.1 then, remembering that the upper limit for our sample size is 10,000, we see from Table 6.1 that a suitable sample design would consist of 325 schools with 30 students selected from each school giving a total sample size of 9750 students. A more precise sample design which is within our financial resources, and which would provide a complex sample size of exactly 9990, would be to select 333 schools with 30 students per school.

From our estimate of R = 0.1 we have therefore deduced that in order to satisfy the first error requirement we could use a two stage cluster sample design which selects 333 schools at the first stage and then selects 30 students per school at the second stage. The second error requirement requires that we have sufficient students per school in order to establish reasonably stable estimates of the mean school performance. This second requirement raises the question of whether 30 students per school will provide sufficiently stable estimates for schools. The answer to this question is that we have no choice but to accept this figure if we wish to remain within the limits set by the financial constraint of a total sample size which does not exceed 10,000 students.

The above discussion highlights the essential nature of sample design for educational survey research: the chosen sample design is often not the best possible design required to answer all the key questions posed in the study; rather it is usually the sample design which allows the researcher to answer as many of the key questions as possible given the finite set of resources which are available for the study.

THE EXECUTION OF THE SAMPLE DESIGN

The arguments presented in the previous section were based on a functional relationship between the design effect and the value of the coefficient of intraclass correlation for the simple cluster design. In most survey research studies we are able to improve the accuracy of our sample design by using stratification. The use of wise stratification will therefore ensure that our prior estimates of precision, based on the simple cluster design which has no stratification, can be achieved.

The information available for our sampling frame consists of a list of the number of students in the desired population associated with each school in Australia (see constraint (vii)). Previous studies have shown that there is considerable variation in mathematics achievement between Australian states and territories, and also between types of school within these states and territories. We thus have two possible candidates for our stratification variables: 'State/Territory' (which consists of the categories of New South Wales, Victoria, Queensland, South Australia, Western Australia, Tasmania, Australian Capital Territory, Northern Territory) and 'Type of school' (which consists of the categories of Government, Catholic, Independent). Let us assume that our list of schools provides only the name and address of the schools containing the desired population - then we would be unable to reliably discern which schools are Government, Catholic or Independent schools. Therefore despite the potential usefulness of this variable we are constrained to use only 'State/Territory' as a stratification variable because of a lack of information about 'Type of school' in our list of schools.

We therefore now proceed to divide our list of schools according to State/Territory. The resulting sample frame contains the eight strata described in Table 6.2.
This table also describes the designed sample which is selected according to the procedures outlined in the following discussion.

Since there are no separate domains of study we may use the self-weighting two stage cluster sample design (described in Chapter 2) in order to select our sample. This design requires that the schools be selected with probability proportional to their size with respect to the number of students in the survey population, followed by a simple random sample of students within the selected schools.

The usual technique for selecting a probability proportional to size selection of schools from a sample frame is to use a 'lottery' method described by Rosier and Williams (1973). Each school is allocated a number of 'tickets' equal to the number of students in the survey population in the school. Since 333 schools are required for our sample, it will be necessary to choose 333 'winning tickets'.

TABLE 6.2 Summary of the survey population and the designed sample

                                     Survey population           Designed sample
State/Territory                      Schools       Students      Schools     Students

New South Wales                      594           84894         115         3450
Victoria                             580           66550         89          2670
Queensland                           286           38106         51          1530
South Australia                      182           24152         33          990
Western Australia                    184           20842         28          840
Tasmania                             91            8290          11          330
Australian Capital Territory         [illegible]   3309          5           150
Northern Territory                   [illegible]   1275          1           30

Total                                1950          247418        333         9990

The ratio of the number of tickets to the number of winning tickets is 247418/333 or 743. Therefore approximately every 743rd ticket is a winning ticket which represents a school selected into the sample.

These winning tickets are selected by using a random start-constant interval procedure. A random number in the interval from 1 to 743 is selected from a table of random numbers and a list of 333 numbers is created by adding successive increments of 743. This list of numbers is used to select the sample schools by comparing these winning ticket numbers with a cumulated tally over schools of the numbers of students in the survey population.

Consider the following example based on the first few entries in the cumulative tally table. Assume that a random start of 100 is selected from the table of random numbers. The winning tickets would be 100, 843, 1586, 2329 ... etc. From Table 6.3 we see that the first two selected schools, corresponding to winning tickets 100 and 843, are School B and School G.

TABLE 6.3 Hypothetical cumulative tally table for the survey population

School       Population size     Cumulated tally     Ticket numbers

A            50                  50                  1-50
B            200                 250                 51-250*
C            50                  300                 251-300
D            300                 600                 301-600
E            150                 750                 601-750
F            50                  800                 751-800
G            250                 1050                801-1050*
etc.         etc.                etc.                etc.
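The random start-constant interval procedure lends itself to a compact sketch. The code below is an illustration under stated assumptions rather than the procedure of Rosier and Williams (1973): the function name and data layout are hypothetical, the school sizes are those of the hypothetical tally table, and the random start is passed in explicitly (in practice it would be drawn at random from 1 to the interval).

```python
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, interval, random_start, n_selections):
    """Select schools with probability proportional to size.

    sizes maps school name to its number of 'tickets' (students), in frame
    order. Winning tickets are random_start, random_start + interval, ...
    Each winning ticket selects the school whose cumulated tally first
    reaches it. Tickets beyond the end of the supplied frame are ignored.
    """
    names = list(sizes)
    tally = list(accumulate(sizes[name] for name in names))
    selected = []
    for i in range(n_selections):
        ticket = random_start + i * interval
        if ticket > tally[-1]:
            break
        selected.append((ticket, names[bisect_left(tally, ticket)]))
    return selected


# First few entries of the hypothetical frame.
SIZES = {"A": 50, "B": 200, "C": 50, "D": 300, "E": 150, "F": 50, "G": 250}

print(pps_systematic(SIZES, interval=743, random_start=100, n_selections=333))
```

A school with more students than the interval could hold more than one winning ticket; the sketch would then list it more than once, and how such schools are handled is a design decision that the text does not address.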

The schools in the cumulative tally table are grouped according to their State/ Territory which provides an implicit stratification for the selection process. That is, while there is not a strictly independent selection of schools from each stratum, the random start-constant interval method of selection ensures a reasonably accurate proportional distribution of schools and students across the implicit strata.

WEIGHTING

The probability of selecting a student in a given school from the survey population is:

$$\text{Probability of selecting a student} = \frac{30 \times 333}{247418} = \frac{30}{743}$$

That is, each student in the survey population has an equal chance of selection and therefore for between-student analyses our sample is a self-weighting sample. Although our sample design is self-weighting for between-student analyses, this is certainly not the case for between-school analyses. The probability of selecting a given school from the survey population is:

$$\text{Probability of selecting a school} = \frac{\text{School size}}{743}$$
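The self-weighting property can be verified numerically. The sketch below assumes the rounded selection interval of 743 and a fixed cluster of 30 students, and it assumes that every school has at least 30 and at most 743 students in the survey population so that both probabilities are well defined; the function names are illustrative.

```python
from fractions import Fraction

INTERVAL = 743      # rounded ratio 247418 / 333
CLUSTER_SIZE = 30   # students selected within each sampled school


def p_school(size):
    """First stage: probability proportional to school size."""
    return Fraction(size, INTERVAL)


def p_student_within(size):
    """Second stage: simple random sample of 30 students from the school."""
    return Fraction(CLUSTER_SIZE, size)


def p_student(size):
    """Overall selection probability of a student in a school of this size."""
    return p_school(size) * p_student_within(size)


for size in (50, 200, 300):
    print(size, p_school(size), p_student(size))
```

The school size cancels in the product, so the student probability is the same for every school, while the school probability itself grows with size. That contrast is the source of the weighting problem for between-school analyses.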

That is, the probability of selecting a school is directly proportional to the number of students in the school who are in the survey population. The above calculations show that we may conduct unweighted analyses in order to examine the first set of key questions associated with the proportion of students who are able to master the test items. However, since the schools in the sample did not have an equal probability of selection, we must emphasise the need for care in interpretation when reporting the ordered list of schools required for the second set of key questions. That is, in some circumstances, in order to generalise from the characteristics of the sample schools to the schools in the survey population it may be necessary to calculate suitable compensatory weights.

The problem of weighting data for between-schools analyses requires a firm definition of the word 'school'. If we weight each school in our sample design with a weight which is inversely proportional to the 'school' size, then we are able to represent the survey population of 1950 schools with our sample. However this weighting scheme often leads to problems of interpretation of results when there are, for example, a greater number of small schools compared to large schools in our survey population. In this situation we may create confusion in the minds of the readers of our research report because statements about 'half of the schools' in the study may only concern a very small percentage of students in the survey population.

An alternative weighting strategy has been put forward by Peaker (1973) as being suitable for sample designs which select schools with probability proportional to size. Peaker suggests that unit weights should be applied to the selected sample schools because probability proportional to size sampling followed by the selection of equal sized ultimate clusters amounts to sampling not 'schools' but 'pieces of schools', the pieces being of equal size, so that the appropriate weight for each piece is the same, and may therefore be taken as unity.

THE CALCULATION OF SAMPLING ERRORS

Having designed our sample around the particular set of constraints given for the study, we should be prepared to provide sampling error estimates for statistics provided for the study. For the estimates of proportions based on the total sample of data there are suitable formulae available (see Yamane, 1967). However, for other estimates of proportions which may be required for certain subclasses of the data, for example the subclasses males or females, the empirical techniques described in the previous chapter would be appropriate.

SUMMARY

In this chapter we have examined the pattern of reasoning and techniques required for the preparation of a sample design which is suitable for a hypothetical national evaluation study. A sample design was prepared within limits set down by a set of hypothetical financial and technical constraints. It is important to remember that this hypothetical study was a relatively simple study with respect to the given constraints. Typically, national evaluation studies involve disproportionate sampling of strata and more sophisticated multivariate analysis techniques. These extended constraints on sample design require consideration of many ideas which were discussed in earlier chapters of this report but which were omitted from this chapter in order to maintain simplicity in the presentation of the essential patterns of reasoning required for sample design in educational survey research.

7. Conclusion

In this report Student's (1908) empirical sampling approach has been used to assess the magnitude of the sampling errors of statistics used to describe a recursive causal model based on data gathered with four complex sample designs which are commonly used in educational research. The influence of the complex sample designs on sampling errors was shown to offer strong support for the argument presented by Kish (1957) that in the social sciences the use of simple random sample formulae on data from complex samples is the most frequent source of gross mistakes in the construction of confidence intervals and tests of hypotheses.

When applied to single samples of data gathered for educational survey research, the jackknife and the balanced half-sample error estimation techniques have been shown to be useful methods for calculating the average design effect (Kish, 1965) for means, correlation coefficients, standardised regression coefficients and multiple correlation coefficients when the deductive theory required for calculating sampling errors is not available. The accuracy of these estimating techniques for individual statistics was not high, although there was a noticeable convergence of the average design effect towards an empirically established value when the number of estimates was increased. These results suggest that a rounded average design effect should be used to approximately adjust sampling errors rather than using an individual estimate of the design effect which may be wildly inaccurate.

The influence of the complex sample designs used in this report on the sampling errors of standardised regression coefficients shows that a great deal of care must be taken in designing samples for recursive causal models. This influence was shown to be equally strong for correlation coefficients. Therefore the warnings expressed for the particular model in this report may be generalised to a great deal of current educational research because correlation matrices are the cornerstone of many modern multivariate analysis techniques.

Although this report was primarily concerned with data generated for survey research, the implications of the findings are equally important for experimental research. Observations which are gathered in clusters may lead to non-independence of observations which in turn leads to confusion with respect to decisions about both the available degrees of freedom in variance estimates and the choice of the unit of analysis (Guilford and Fruchter, 1973). These problems may often be circumvented in experimental designs by adjusting the analysis for the nesting and clustering effects which are present. Alternatively it is sometimes possible to ensure that the subjects in the study are not only randomly assigned to treatments but also that treatments are administered independently to each subject.

While this study has developed estimates of the design effect for a variety of commonly used statistics, it must be kept in mind that the value of the design effect depends upon the variables being used in the study and also upon the clustering effects which occur among the elements of the sampling frame. Further research into the value of the design effect in educational settings for different variables, different populations and different complex sample designs is urgently needed to assist the planning of samples for future educational research.

This report has demonstrated that the evaluation of the sampling stability of survey research findings in educational research is both necessary and possible. Hopefully, future educational research workers who use complex sample designs will make use of the procedures which have been discussed in this study in order to present their findings in association with the appropriate estimates of sampling stability.
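The recommendation to adjust sampling errors with a rounded average design effect can be illustrated with a minimal sketch. It assumes the usual definition of the design effect as a ratio of variances, so that a simple random sample standard error is scaled by the square root of the design effect. The function names and the numerical values are hypothetical and are not taken from the report.

```python
import math


def adjusted_standard_error(se_srs: float, average_deff: float) -> float:
    """Scale a simple-random-sample standard error by sqrt(Deff).

    Assumes Deff is a ratio of variances (complex design / simple random sample).
    """
    if se_srs < 0 or average_deff <= 0:
        raise ValueError("se_srs must be >= 0 and average_deff must be > 0")
    return se_srs * math.sqrt(average_deff)


def confidence_interval(estimate: float, se_srs: float,
                        average_deff: float, z: float = 1.96):
    """Approximate interval using the design-effect-adjusted standard error."""
    se = adjusted_standard_error(se_srs, average_deff)
    return estimate - z * se, estimate + z * se


# Hypothetical illustration: an SRS standard error of 0.5 and a rounded
# average design effect of 4 give an adjusted standard error of 1.0.
print(adjusted_standard_error(0.5, 4.0))
print(confidence_interval(10.0, 0.5, 4.0, z=2.0))
```

The same rounded design effect would be applied to every statistic of a given type, in line with the suggestion that individual design effect estimates may be too unstable to use one at a time.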

8. Summary

Educational survey research is often conducted with data gathered by employing sampling procedures which depart from the model of simple random sampling in which sample elements are selected individually, independently and with equal probability from the population under study. These sampling procedures usually incorporate such complexities as stratification, the selection of sample elements in clusters, and the use of multiple stages of selection. Unfortunately, either the computational formulae required to estimate the sampling errors of many statistics derived from these complex sampling procedures are enormously complicated or they prove resistant to mathematical analysis. This monograph examines the influence of these complexities on the sampling errors of statistics which are required to describe causal models based on systems of structural equations. The empirical sampling error estimation techniques of Balanced Repeated Replication and Jackknifing are applied to some educational survey research data in order to demonstrate their capacity to estimate the sampling errors of these statistics when suitable computational formulae are not readily available.

Acknowledgements

The author wishes to express his gratitude to Dr. John P. Keeves, the Director of the Australian Council for Educational Research, for his guidance and encouragement throughout the preparation of this report. It was through his earlier research work that I first became aware of the problems associated with the use of complex sample designs in educational research. I would also like to thank Professor S.S. Dunn who, as chairman of the Educational Research and Development Committee (ERDC), provided an opportunity for me to undertake an ERDC Visiting Fellowship programme with Professor Leslie Kish and Dr. Gerald Bachman at the Institute for Social Research (ISR), University of Michigan. This opportunity to meet and work with the survey research specialists of the ISR enabled me to clarify many of the issues which are discussed in the following pages. Several people have assisted with the preparation of this monograph by providing helpful suggestions and comments. In particular I would like to thank Dr. A.W. Davis, CSIRO Division of Mathematical Statistics, and Dr. M.J. Rosier, ACER Survey Section. KR


References

Bourke, S.F. & Keeves, J.P. (Eds.) Australian Studies in School Performance, Vol. III: The Mastery of Literacy and Numeracy. Hawthorn, Australia: Australian Council for Educational Research, Canberra, Australia: Australian Government Printing Office, 1977.

Brillinger, D.R., "The application of the jackknife to the analysis of sample surveys", Commentary, 8, pp. 74-80, 1966.

Broom, L., Jones, F.L. & Zubrzycki, J., "An occupational classification of the Australian workforce", Supplement to Australian and New Zealand Journal of Sociology, 1, (2), pp. 1-16, 1965.

Broom, L., Jones, F.L. & Zubrzycki, J., "Social stratification in Australia", In Jackson, J.A. (Ed.) Social Stratification: Sociological Studies I, Cambridge: Cambridge University Press, 1968.

Commonwealth Bureau of Census and Statistics (CBCS), Australian Capital Territory Statistical Summary 1970, Canberra: The Bureau, 1970a.

Commonwealth Bureau of Census and Statistics (CBCS), Schools 1969, Canberra: The Bureau, 1970b.

Cochran, W.G., Sampling Techniques, 2nd Edition, New York: Wiley, 1963.

Comber, L.C. & Keeves, J.P., Science Education in Nineteen Countries, Stockholm: Almqvist and Wiksell/New York: Wiley, 1973.

Deming, W.E., "On simplifications of sampling design through replication with equal probabilities and without stages", Journal of the American Statistical Association, 51, pp. 24-43, 1956.

Deming, W.E., Sample Design in Business Research, New York: Wiley, 1960.

Ferguson, C.A., Statistical Analysis in Psychology and Education, 3rd Edition, New York: McGraw-Hill, 1971.

Finifter, B.M., "The generation of confidence: Evaluating research findings by random subsample replication", In Costner, H.L. (Ed.) Sociological Methodology, 1972, San Francisco: Jossey-Bass, 1972.

Fisher, R.A., "On the mathematical foundations of theoretical statistics", Philosophical Transactions of the Royal Society, Series A, 222, pp. 309-368, 1922.

Frankel, M.R., Inference From Survey Samples: An Empirical Investigation, Ann Arbor, Michigan: Institute for Social Research, University of Michigan, 1971.

Gray, H.L. & Schucany, W.R., The Generalised Jackknife Statistic, New York: Marcel Dekker, 1972.

Guilford, J.P. & Fruchter, B., Fundamental Statistics in Psychology and Education, 5th Edition, New York: McGraw-Hill, 1973.

Gupta, H.C., "Intraclass correlation in educational research: an exploratory study into some of the possible uses of the technique of intraclass correlation in educational research". Unpublished PhD dissertation, University of Chicago, 1955.

Gurney, M., "McCarthy's orthogonal replications for estimating variances, with grouped strata", United States Bureau of the Census, Technical Notes - No. 3, Washington: United States Bureau of the Census, 1970.

Haggard, E.A., Intraclass Correlation and the Analysis of Variance, New York: Dryden Press, 1958.

Hansen, M.H., Hurwitz, W.N. & Madow, W.G., Sample Survey Methods and Theory, Vol. 1: Methods and Applications, New York: Wiley, 1953.

Harris, J.A., "On the calculation of intraclass and interclass coefficients of correlation from class moments when the number of possible combinations is large", Biometrika, 9, pp. 446-472, 1913.

Hays, W.L., Statistics for Psychologists, New York: Holt, Rinehart & Winston, 1963.

Jones, H.G., "Investigating the properties of a sample mean by employing random means", Journal of the American Statistical Association, 51, pp. 54-83, 1956.

Keeves, J.P., Educational Environment and Student Achievement, Melbourne: Australian Council for Educational Research, also Stockholm: Almqvist & Wiksell, 1972.

Kerlinger, F.N., Multiple Regression in Behavioral Research, New York: Holt, Rinehart & Winston, 1973.

Kish, L., "Confidence intervals for clustered samples", American Sociological Review, 22, pp. 154-165, 1957.

Kish, L., "Some statistical problems in research design", American Sociological Review, 24, pp. 328-338, 1959.

Kish, L., Survey Sampling, New York: Wiley, 1965.

Kish, L., "Design and estimation for subclasses, comparisons, and analytical statistics", In Johnson, N.L. & Smith, H. (Eds.) New Developments in Survey Sampling, New York: Wiley, 1969.

Kish, L. & Frankel, M.R., "Balanced repeated replications for standard errors", Journal of the American Statistical Association, 65, pp. 1071-1094, 1970.

Kish, L. & Frankel, M.R., Inference from complex samples. Mimeographed paper issued by the Survey Research Center, University of Michigan, 1973.

Marks, E.S., "Sampling in the revision of the Stanford-Binet Scale", Psychological Bulletin, 44, pp. 413-434, 1947.

McCarthy, P.J., "Replication: An approach to the analysis of data from complex surveys", National Center for Health Statistics, Series 2, No. 14, 1966.

McCarthy, P.J., "Pseudo-replication: Further evaluation and application of the balanced half-sample technique", National Center for Health Statistics, Series 2, No. 31, 1969a.

McCarthy, P.J., "Pseudo-replication: Half samples", Review of the International Statistical Institute, 37, pp. 239-264, 1969b.

McNemar, Q., The Revision of the Stanford-Binet Scale, Boston: Houghton Mifflin, 1942.

Moser, C.A. & Kalton, G., Survey Methods in Social Investigation, 2nd Edition, London: Heinemann, 1971.

Mosteller, F. & Tukey, J.W., "Data analysis including statistics", In Lindzey, G. & Aronson, E. (Eds.) The Handbook of Social Psychology, 2nd Edition, Reading, Massachusetts: Addison-Wesley, 1968.

Peaker, G.F., "A sampling design used by the Ministry of Education", Journal of the Royal Statistical Society, 116, pp. 140-165, 1953.

Peaker, G.F., The collection and analysis of survey evidence, IEA/B/9. Mimeographed paper issued by the International Association for the Evaluation of Educational Achievement, Stockholm, 1967a.

Peaker, G.F., "Sampling", In Husen, T. (Ed.) International Study of Achievement in Mathematics, Vol. 1, Stockholm: Almqvist & Wiksell/New York: Wiley, 1969b.

Peaker, G.F., The presentation and analysis of the IEA evidence, IEA/B/57. Mimeographed paper issued by the International Association for the Evaluation of Educational Achievement, Stockholm, 1968.

Peaker, G.F., Personal communication to J.P. Keeves, IEA/TR/123.

Peaker, G.F., An Empirical Study of Education in Twenty-One Countries: A Technical Report, Stockholm: Almqvist & Wiksell/New York: Wiley, 1975.

Pearson, K. et al, "Mathematical contributions to the theory of evolution", Philosophical Transactions of the Royal Society, Series A, 197, pp. 285-379, 1901.

Plackett, R.L. & Burman, P.J., "The design of optimal multifactorial experiments", Biometrika, 33, pp. 305-325.

Quenouille, M.J., "Notes on bias in estimation", Biometrika, 43, pp. 353-360, 1956.

Rosier, M.J. & Williams, W.H., The Sampling and Administration of the IEA Science Project in Australia 1970: A Technical Report, IEA (Australia) Report 1973:8. Hawthorn, Australia: Australian Council for Educational Research, 1973.

Ross, K.N. & Skee, C.W., A computer program for estimating the coefficient of intraclass correlation. Mimeographed paper, Hawthorn: Australian Council for Educational Research, 1975.

Simmons, W.R. & Baird, J.T., "Pseudo-replication in the NCHS health examination survey", Proceedings of the American Statistical Association, Social Statistics Section, 1968.

Student (W.S. Gosset), "The probable error of a mean", Biometrika, 6, pp. 1-25, 1908.

Terman, L.M. & Merrill, M.A., Measuring Intelligence, Boston: Houghton Mifflin, 1937.

Tukey, J.W., "Bias and confidence in not-quite large samples: Abstract", Annals of Mathematical Statistics, 29, p. 614, 1958.

Walsh, J.E., "Concerning the effect of intraclass correlation on certain significance tests", Annals of Mathematical Statistics, 18, pp. 88-96, 1947.

Weatherburn, C.E., A First Course in Mathematical Statistics, Cambridge: Cambridge University Press, 1946.

Yamane, T., Elementary Sampling Theory, Englewood Cliffs, New Jersey: Prentice-Hall, 1967.

Yamane, T., Statistics: An Introductory Analysis, 3rd Edition, New York: Harper & Row, 1973.

Appendix: Some Theoretical Considerations

This appendix contains a theoretical discussion of the sampling concepts which are used in this report. The discussion is only concerned with the mathematical theory underlying the calculation of the sampling errors of sample means because current theory has not been adequately extended to permit the estimation of the errors of sampling for multivariate statistics which are calculated from data obtained with complex sample designs (Kish and Frankel, 1973). Each topic presented is a summary of more complex theoretical statements which have been developed with a variety of notations and with a variety of intended applications by the authors mentioned in the main text of this report.

The topics examined have been divided into three main sections. The first two sections are concerned with the properties of the proportional stratified random sample design and the simple cluster design. Theoretical arguments are presented to show that the introduction of these complexities into random sample designs may be associated with both gains and losses in sampling accuracy.

The third section provides a theoretical background to the random subsample estimation techniques which are used in this report. This section also includes a proof of the efficiency of McCarthy's (1966) balanced orthogonal matrix when it is used to estimate the sampling errors of weighted means which are obtained from appropriately constructed sample designs.

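PROPORTIONAL STRATIFIED SAMPLING

Comparison of Simple Random Sampling with Proportional Stratified Random Sampling

Simple random sampling

The variance of the sample mean $\bar{x}_{srs}$ for a simple random sample of n elements drawn without replacement from a population of size N may be written as:

$$V(\bar{x}_{srs}) = \frac{N-n}{N}\,\frac{S^2}{n} \tag{A.1}$$

where

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2 \tag{A.2}$$

If the population is divided into L strata, each stratum containing $N_h$ (h = 1, ..., L) elements, we may write (A.2) as:

$$(N-1)S^2 = \sum_{h=1}^{L}\sum_{i=1}^{N_h}(X_{hi}-\bar{X})^2 = \sum_{h=1}^{L}\sum_{i=1}^{N_h}\left[(X_{hi}-\bar{X}_h)+(\bar{X}_h-\bar{X})\right]^2$$

Therefore

$$(N-1)S^2 = \sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)^2 + 2\sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)(\bar{X}_h-\bar{X}) + \sum_{h}\sum_{i}(\bar{X}_h-\bar{X})^2 \tag{A.3}$$

Consider the second expression on the right hand side of (A.3):

$$2\sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)(\bar{X}_h-\bar{X}) = 2\sum_{h}(\bar{X}_h-\bar{X})\sum_{i}(X_{hi}-\bar{X}_h)$$

But

$$\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h) = \sum_{i=1}^{N_h}X_{hi} - N_h\bar{X}_h = \sum_{i=1}^{N_h}X_{hi} - N_h\,\frac{1}{N_h}\sum_{i=1}^{N_h}X_{hi} = 0$$

Thus the second expression is also equal to zero. Equation (A.3) now becomes:

$$(N-1)S^2 = \sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)^2 + \sum_{h}N_h(\bar{X}_h-\bar{X})^2 = \sum_{h}(N_h-1)S_h^2 + \sum_{h}N_h(\bar{X}_h-\bar{X})^2$$

where

$$S_h^2 = \frac{1}{N_h-1}\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h)^2$$

is the variance for the $N_h$ elements within the hth stratum.

Now, when $N \gg 1$ and $N_h \gg 1$ we have $N-1 \to N$ and $N_h-1 \to N_h$. Then,

$$S^2 = \sum_{h}\frac{N_h}{N}S_h^2 + \sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2$$

Substituting for $S^2$ in (A.1) we obtain:

$$V(\bar{x}_{srs}) = \frac{N-n}{Nn}\left[\sum_{h}\frac{N_h}{N}S_h^2 + \sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2\right] \tag{A.4}$$

Stratified random sampling

Consider a stratified random sample in which samples of size $n_h$ are drawn independently from the hth stratum of $N_h$ elements by simple random sampling without replacement. Let the total population of N elements be divided into L strata. The population total is the sum of the stratum totals:

$$X = \sum_{h=1}^{L}X_h = \sum_{h=1}^{L}\sum_{i=1}^{N_h}X_{hi} \tag{A.5}$$

The mean of the hth stratum is:

$$\bar{X}_h = \frac{X_h}{N_h} \tag{A.6}$$

The population mean $\bar{X}$ is obtained from (A.5):

$$\bar{X} = \frac{X}{N} = \frac{1}{N}\sum_{h=1}^{L}X_h \tag{A.7}$$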

Now from (A.6) substitute for $X_h$ in (A.7):

$$\bar{X} = \frac{1}{N}\sum_{h}N_h\bar{X}_h$$

An estimate of the population mean based on the sample design described above becomes:

$$\bar{x}_{st} = \frac{1}{N}\sum_{h}N_h\bar{x}_h \tag{A.8}$$

This estimate is unbiased because we know from simple random sampling theory that each of the $\bar{x}_h$ is an unbiased estimate of the corresponding stratum mean. Also the variance of $\bar{x}_{st}$ may be written as (Yamane, 1967:175):

$$V(\bar{x}_{st}) = \sum_{h}\left(\frac{N_h}{N}\right)^2\frac{N_h-n_h}{N_h}\,\frac{S_h^2}{n_h} \tag{A.9}$$

where

$$S_h^2 = V(X_h) = \frac{1}{N_h-1}\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h)^2$$

If we consider the special case of proportional stratified random sampling then we impose the restriction that the number of elements drawn from a stratum be proportional to the size of that stratum. That is,

$$n_h = \frac{n}{N}N_h \qquad (h = 1, \ldots, L) \tag{A.10}$$

Substitute from (A.10) into (A.8) to obtain an expression for $\bar{x}_{prop}$:

$$\bar{x}_{prop} = \frac{1}{N}\sum_{h}N_h\bar{x}_h = \frac{1}{n}\sum_{h}n_h\bar{x}_h = \frac{1}{n}\sum_{h}\sum_{i=1}^{n_h}x_{hi}$$

That is, $\bar{x}_{prop}$ is equivalent to the sample mean. Therefore the sample mean, as in simple random sampling, is an unbiased estimate of the population mean. For this reason we call proportionate stratified random sampling a self-weighting design.

To obtain the variance of $\bar{x}_{st}$ under the restriction of proportionate sampling we substitute from (A.10) into (A.9):

$$V(\bar{x}_{prop}) = \sum_{h}\left(\frac{N_h}{N}\right)^2\frac{N-n}{N}\,\frac{N S_h^2}{nN_h}$$

Therefore

$$V(\bar{x}_{prop}) = \frac{N-n}{Nn}\sum_{h}\frac{N_h}{N}S_h^2 \tag{A.11}$$

We may now compare the efficiency (of the sample mean as an estimate of the population mean) for the two sample designs by substituting from equation (A.11) into equation (A.4):

$$V(\bar{x}_{srs}) = V(\bar{x}_{prop}) + \frac{N-n}{Nn}\sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2 \tag{A.12}$$

or

$$V(\bar{x}_{prop}) = V(\bar{x}_{srs}) - \frac{N-n}{Nn}\sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2$$

These relationships show that the gain in accuracy of $V(\bar{x}_{prop})$ over $V(\bar{x}_{srs})$ depends on the magnitudes of the differences between $\bar{X}_h$ and $\bar{X}$. That is, gains can be made by ensuring that the stratification provides homogeneity within strata.

Design Effect for Proportional Stratified Random Sampling

Consider a population of N elements divided into L strata. Samples of size $n_h$ are drawn independently from the hth stratum of $N_h$ elements such that a sample of n elements is drawn according to the restriction $n_h = \frac{n}{N}N_h$ (h = 1, ..., L).

Since this design is a self-weighting design, the sample mean $\bar{x}_{prop}$ is an unbiased estimate of the population mean. Also consider a simple random sample of n elements drawn without replacement from the same population of N elements. It has been shown that the variance of the sample mean for the proportionate stratified random sample design $V(\bar{x}_{prop})$ and the variance of the sample mean for the simple random sample without replacement design $V(\bar{x}_{srs})$ are related by the following equation:

$$V(\bar{x}_{srs}) = V(\bar{x}_{prop}) + \frac{N-n}{Nn}\sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2 \tag{A.14}$$

where $\bar{X}_h$ = the mean of the hth stratum and $\bar{X}$ = the population mean.

The second term on the right hand side of (A.14) is equal to zero when all the stratum means are equal, that is when $\bar{X}_h = \bar{X}$ for all values of h. Otherwise the term will always be greater than zero. By using this information we may establish the values of Deff for proportionate stratified random sampling.

When $\bar{X}_h = \bar{X}$ for all values of h: Deff = 1

and when $\bar{X}_h \neq \bar{X}$ for any values of h: Deff < 1

The design effect will always be close to one if the variable which is used for stratification is unrelated to the criterion being considered because the strata will consist of the pieces of a randomly divided population.
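The decomposition in (A.14) and the resulting values of Deff can be checked numerically. The sketch below is an illustration only: it assumes the large-population forms used above (variances computed with divisors N and N_h rather than N-1 and N_h-1), takes Deff to be the ratio of the two variances, and uses made-up stratum values. The function name is hypothetical.

```python
import numpy as np


def stratified_vs_srs(strata, n):
    """Compare V(srs) and V(prop) for a fully enumerated population.

    strata: list of sequences, one per stratum, holding every population value.
    n: total sample size under proportionate allocation.
    Uses the large-N forms (divisors N and N_h), an assumption of this sketch.
    """
    sizes = np.array([len(s) for s in strata], dtype=float)
    N = sizes.sum()
    weights = sizes / N                              # N_h / N
    means = np.array([np.mean(s) for s in strata])   # stratum means
    within = np.array([np.var(s) for s in strata])   # within-stratum variances
    everyone = np.concatenate([np.asarray(s, dtype=float) for s in strata])
    factor = (N - n) / (N * n)
    v_srs = factor * np.var(everyone)                # (A.1) with divisor N
    v_prop = factor * np.sum(weights * within)       # (A.11)
    between = factor * np.sum(weights * (means - everyone.mean()) ** 2)
    return v_srs, v_prop, between, v_prop / v_srs    # last item is Deff


# Two well separated strata: almost all variance lies between strata.
v_srs, v_prop, between, deff = stratified_vs_srs([[1, 2, 3], [11, 12, 13]], n=2)
print(v_srs, v_prop + between, deff)
```

When every stratum has the same mean, the between-strata term vanishes and the ratio returned as Deff equals one, which matches the two cases listed above.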

SIMPLE CLUSTER SAMPLING

The Coefficient of Intraclass Correlation

Standard statistical theory has mostly been developed with the assumption that the sample observations were obtained through independent random selection. However, most research in the social sciences has been carried out by using complex sample designs. The main features of complex sample designs are clustering, stratification, unequal probabilities of selection and systematic sampling. Kish (1957) examined the consequences of applying the usual textbook formulae for calculating confidence limits to data obtained by employing complex sampling designs. He concluded that:

"In the social sciences the use of s.r.s. (simple random sample) formulas on data from complex samples is now the most frequent source of gross mistakes in the construction of confidence statements and tests of hypotheses" (Kish, 1957:156).

The feature of complex sample designs which is responsible for these mistakes has usually been clustering - the selection of observational units in clusters or groups rather than individually. Marks (1947) provided an early warning of this influence in psychological research when he considered the effects of clustering on the sample design used in the revision of the Stanford-Binet Scale (Terman and Merrill, 1937; McNemar, 1942): "Ignoring the effects of cluster sampling on measures of sampling error has undoubtedly resulted in attaching importance to results which are statistically insignificant. In the testing field, failure to allow for cluster sampling has probably caused us to attach a measure of precision to our results considerably in excess of that warranted by sound statistical techniques" (Marks, 1947:413).

Marks estimated that the standard error of the reported mean score on the Stanford-Binet Scale was at least three times the error which would be calculated from the data by the use of the formula for unrestricted random sampling. The source of this discrepancy in error estimates could be traced to the fact that the researchers found it economical and convenient to use existing geographical clusters as the primary sampling unit. Since individuals within a particular sampling unit tended to resemble each other more than they resembled individuals from other units, the basic assumption of independent random selection of observations had broken down and the usual formulae failed to apply.

Kish (1957) points out that this homogeneity of individuals within sampling units may be due to common selective factors, or to joint exposure to the same effects, or to mutual influence (interaction), or to some combination of these. The magnitude of this homogeneity is usually measured by rho, the coefficient of intraclass correlation.
The coefficient of intraclass correlation was developed in connection with the estimation of fraternal resemblance, as in the calculation of correlation between the heights of brothers. To establish the correlation between brothers in general we have no reason for ordering the pairs of measurements. That is, the measurements are logically interchangeable in computing the correlation coefficient. Pearson (1901) suggested that this problem could be approached by the calculation of a product-moment correlation coefficient from a symmetrical table of measures consisting of two interchanged entries for each pair of measures. This method is suitable for a small number of pairs; however, the number of entries in the table rises rapidly as the number of pairs increases. To overcome the difficulties posed by working with very large symmetrical tables Harris (1913) developed a short cut based on sums of squares. This method was further refined (by using Fisher's approach which employs degrees of freedom rather than sample size to obtain population variance estimates) to allow the computation of the intraclass correlation to be made from analysis of variance tables. Haggard's (1958) comprehensive investigation of the relationship between intraclass correlation and analysis of variance considerably extended the range of applications in psychological research for the coefficient of intraclass correlation; further work by Gupta (1955) explored the suitability of this statistic for use in educational research.
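Pearson's symmetric-table approach can be sketched in a few lines. The example below is an illustration under stated assumptions: the function name is hypothetical, the data are made up, and the table is built by brute force from every ordered pair of distinct elements within each cluster, which is exactly why the method becomes unwieldy for large clusters.

```python
from itertools import permutations

import numpy as np


def intraclass_by_symmetric_table(clusters):
    """Product-moment correlation computed from Pearson's symmetric table.

    clusters: list of sequences, one per cluster, each with at least 2 values.
    Every ordered pair of distinct elements in a cluster is entered, so each
    unordered pair appears twice with its entries interchanged.
    """
    first_row, second_row = [], []
    for values in clusters:
        for a, b in permutations(values, 2):   # k(k-1) ordered pairs per cluster
            first_row.append(a)
            second_row.append(b)
    return float(np.corrcoef(first_row, second_row)[0, 1])


# Made-up heights of brothers in three families of three.
families = [[170, 172, 171], [180, 183, 181], [160, 162, 165]]
print(intraclass_by_symmetric_table(families))
```

With M clusters of k elements the table holds Mk(k-1) pairs, so the work grows roughly with the square of the cluster size; this is the difficulty that the short cut based on sums of squares avoids.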

It should be remembered that the value of the coefficient of intraclass correlation has no meaning for the individual except insofar as he is considered to be a member of a group. A high value implies that there is a high degree of homogeneity within the groups of observations.

The concept of consistency within groups may also be thought of as non-independence of observations within the groups. The presence of non-independence among observations was also shown to affect test statistics such as t, F, or $\chi^2$ (Walsh, 1947) because tests using these statistics are based on the assumption of the independence of observations within two or more samples. In the case of t, when the observations are not independent, and the t test based on the assumption of independence is used, the value of the obtained t will be overestimated if rho is positive and underestimated if rho is negative (Haggard, 1958).

The following discussion traces the definition of the coefficient of intraclass correlation from its description based on a symmetric correlation table (Weatherburn, 1946) to its functional relationship with the F statistic (Haggard, 1958).

Consider a population of elements divided into M clusters each containing k elements which are measured on a characteristic X. In order to consider the correlation between the elements in a cluster, without distinguishing between the order of the pairs of elements in each cluster, Pearson (1901) suggested the construction of a symmetric table consisting of all possible pairs for each cluster. There will be k(k-1) pairs for each cluster, and Mk(k-1) pairs of values in the table.

Let $X_{ij}$ (i = 1, ..., M; j = 1, ..., k) denote the measure on the characteristic of the jth element of the ith cluster. The mean for the population is:

$$\bar{X} = \frac{1}{Mk}\sum_{i}\sum_{j}X_{ij} \qquad (i = 1, \ldots, M;\ j = 1, \ldots, k)$$

The product moment correlation coefficient calculated between the values in the two rows of the symmetric table is the coefficient of intraclass correlation $R_I$:

$$R_I = \frac{\sigma_{ab}}{\sigma_a\sigma_b}$$

where $\sigma_a^2$ is the variance of the values in the first row, $\sigma_b^2$ is the variance of the values in the second row, and $\sigma_{ab}$ is the covariance between the rows.

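Now

$$\sigma_{ab} = \frac{1}{N}\sum_{i}\sum_{j}\sum_{l}(X_{ij}-\bar{X})(X_{il}-\bar{X}) \qquad (j, l = 1, \ldots, k;\ j \neq l;\ i = 1, \ldots, M)$$

where N is the number of pairs = Mk(k-1), and

$$\sigma_a^2 = \sigma_b^2 = \frac{1}{Mk}\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

Therefore

$$R_I = \frac{\sum_{i}\sum_{j}\sum_{l}(X_{ij}-\bar{X})(X_{il}-\bar{X})}{(k-1)\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2} \qquad (j, l = 1, \ldots, k;\ j \neq l;\ i = 1, \ldots, M) \tag{A.15}$$

Consider the numerator of the right hand side of (A.15):

$$\text{Numerator} = \sum_{i}\sum_{j}(X_{ij}-\bar{X})\sum_{l \neq j}(X_{il}-\bar{X})$$

The sum of the $X_{il}$ for all values of l includes all values for the ith cluster except $X_{ij}$. That is,

$$\sum_{l}X_{il} = k\bar{X}_i - X_{ij}$$

where $\bar{X}_i$ is the mean of the ith cluster. Also

$$\sum_{l}\bar{X} = (k-1)\bar{X}$$

since l takes (k-1) values. Therefore

$$\text{Numerator} = \sum_{i}\sum_{j}(X_{ij}-\bar{X})\left[k\bar{X}_i - X_{ij} - (k-1)\bar{X}\right]$$

$$= k\sum_{i}\sum_{j}(X_{ij}-\bar{X})(\bar{X}_i-\bar{X}) - \sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

$$= k\sum_{i}(k\bar{X}_i-k\bar{X})(\bar{X}_i-\bar{X}) - \sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

$$= k^2\sum_{i}(\bar{X}_i-\bar{X})^2 - \sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

$$= k\,SS_B - SS_T$$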

where $SS_T$ = total sum of squares, $SS_B$ = between clusters sum of squares, and $SS_W$ = within clusters sum of squares.

Now consider the denominator of the right hand side of (A.15):

$$\text{Denominator} = (k-1)\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2 = (k-1)SS_T$$

Therefore

$$R_I = \frac{k\,SS_B - (SS_B + SS_W)}{(k-1)(SS_B + SS_W)} = \frac{(k-1)SS_B - SS_W}{(k-1)SS_B + (k-1)SS_W}$$

Divide both the numerator and the denominator by M(k-1). Therefore

$$R_I = \frac{\dfrac{SS_B}{M} - \dfrac{SS_W}{M(k-1)}}{\dfrac{SS_B}{M} + (k-1)\dfrac{SS_W}{M(k-1)}} \tag{A.16}$$

Now if we were to estimate $R_I$ from a sample of data then we could rewrite (A.16) as follows:

$$R_H = \frac{BCMS^* - WCMS}{BCMS^* + (k-1)WCMS}$$

where BCMS is the between clusters mean square, WCMS is the within clusters mean square, * denotes a biased estimate, and $R_H$ is the estimate of $R_I$ used by Harris (1913).

This estimate of the coefficient of intraclass correlation is biased in the negative direction (Gupta, 1955). Haggard (1958) recommends replacing BCMS* with an unbiased estimate of the population value of the between clusters mean square in order to remove this bias. Then

$$R = \frac{BCMS - WCMS}{BCMS + (k-1)WCMS} \tag{A.17}$$

is an unbiased estimate of the coefficient of intraclass correlation.

Now if we divide the numerator and denominator of (A.17) by the WCMS, we may establish a functional relationship between R and the F statistic. Then

$$R = \frac{F-1}{F+(k-1)}$$
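The estimates discussed above can be computed directly from a one-way analysis of variance. The following sketch assumes equal cluster sizes, takes BCMS* to be the between clusters sum of squares divided by M and BCMS to be the same sum of squares divided by M-1, and uses made-up data. The function name is hypothetical.

```python
import numpy as np


def anova_intraclass(data):
    """Return (R_H, R, R_from_F) for equal-sized clusters.

    data: 2-D array-like with shape (M clusters, k elements per cluster).
    R_H uses BCMS* = SS_B / M (divisor M, the biased form).
    R uses BCMS = SS_B / (M - 1), as in (A.17).
    R_from_F uses F = BCMS / WCMS and R = (F - 1) / (F + k - 1).
    """
    data = np.asarray(data, dtype=float)
    M, k = data.shape
    grand_mean = data.mean()
    cluster_means = data.mean(axis=1)
    ss_between = k * np.sum((cluster_means - grand_mean) ** 2)
    ss_within = np.sum((data - cluster_means[:, None]) ** 2)
    wcms = ss_within / (M * (k - 1))
    bcms_biased = ss_between / M
    bcms = ss_between / (M - 1)
    r_harris = (bcms_biased - wcms) / (bcms_biased + (k - 1) * wcms)
    r_haggard = (bcms - wcms) / (bcms + (k - 1) * wcms)
    f_ratio = bcms / wcms                      # requires some within variation
    r_from_f = (f_ratio - 1) / (f_ratio + (k - 1))
    return float(r_harris), float(r_haggard), float(r_from_f)


r_h, r, r_f = anova_intraclass([[1, 2], [3, 5], [6, 6]])
print(r_h, r, r_f)
```

In this sketch the Harris form reproduces the symmetric-table coefficient of (A.15), while the form in (A.17) and the form based on F are algebraically identical; the former is the smaller of the two whenever the cluster means differ.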

In the case of unequal class membership the value of k is replaced by an adjusted average value $\bar{k}$ (Haggard, 1958):

$$\bar{k} = \frac{1}{M-1}\left[\sum_{i}k_i - \frac{\sum_{i}k_i^2}{\sum_{i}k_i}\right]$$

where M is the number of clusters and $k_i$ is the number of elements in the ith class.

Cluster Sampling and the Coefficient of Intraclass Correlation

Consider the following one-way random effect model (Hays, 1963):

$$X_{ij} = \mu + a_i + e_{ij}$$

where $X_{ij}$ is the jth element in the ith cluster, $\mu$ is the general mean, $a_i$ is the effect due to the ith cluster, and $e_{ij}$ is the error associated with the jth element in the ith cluster.

Assume that: $a_i$ has a normal distribution with a mean of zero and a variance of $\sigma_c^2$; the errors $e_{ij}$ have a normal distribution with a mean of zero and a variance of $\sigma_e^2$; and each pair of random variables $a_i$ and $e_{ij}$ are independent.

Further, assume that the same number of observations $\bar{n}$ are selected from m equal sized clusters. The following one-way table summarises the contributions to variance (Hays, 1963).

Source             Sum of squares   Degrees of freedom   Mean squares   Expected mean square
Between clusters   SS_B             m-1                  BCMS           $\bar{n}\sigma_c^2 + \sigma_e^2$
Within clusters    SS_W             m($\bar{n}$-1)       WCMS           $\sigma_e^2$
Total              SS_T             m$\bar{n}$-1

From (A.17):

$$R = \frac{BCMS - WCMS}{BCMS + (\bar{n}-1)WCMS}$$

Now if we substitute the expected values from the analysis of variance table, then we obtain the value of R under the assumptions of the one-way random model presented above.

$$R = \frac{\bar{n}\sigma_c^2 + \sigma_e^2 - \sigma_e^2}{\bar{n}\sigma_c^2 + \sigma_e^2 + (\bar{n}-1)\sigma_e^2} = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2} = \frac{\sigma_c^2}{\sigma^2}$$

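Now consider a two-stage sample and suppose that m clusters are selected from a total of M clusters each of size $\bar{N}$. Let the population size be N. Select $\bar{n}$ observations from each of the m clusters. The variance of the mean may be estimated from the following formula (Yamane, 1967).

$$\hat{V}(\bar{x}_c) = \frac{M-m}{M}\,\frac{s_b^2}{m} + \frac{\bar{N}-\bar{n}}{\bar{N}}\,\frac{1}{Mm\bar{n}}\sum_{i}s_i^2 \tag{A.18}$$

where $\bar{x}_i$ is the mean of the ith cluster, $\bar{x}$ is the overall sample mean (which is an unbiased estimate of the population mean),

$$s_b^2 = \frac{1}{m-1}\sum_{i}(\bar{x}_i-\bar{x})^2$$

and

$$s_i^2 = \frac{1}{\bar{n}-1}\sum_{j}(x_{ij}-\bar{x}_i)^2$$

Now consider the sums of squares from the above analysis of variance table. The total sum of squares may be expanded in the usual fashion (Edwards, 1967).

$$\sum_{i}\sum_{j}(x_{ij}-\bar{x})^2 = \sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)^2 + \bar{n}\sum_{i}(\bar{x}_i-\bar{x})^2 = SS_W + SS_B$$

The expression $SS_W$ may be rewritten in the following fashion:

$$SS_W = \sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)^2 = (\bar{n}-1)\sum_{i}\left[\frac{1}{\bar{n}-1}\sum_{j}(x_{ij}-\bar{x}_i)^2\right] = (\bar{n}-1)\sum_{i}s_i^2$$

Dividing both sides by $m(\bar{n}-1)$:

$$WCMS = \frac{1}{m}\sum_{i}s_i^2 \quad\text{or}\quad \frac{WCMS}{M\bar{n}} = \frac{1}{Mm\bar{n}}\sum_{i}s_i^2 \tag{A.19}$$

The expression $SS_B$ may also be rewritten:

$$SS_B = \bar{n}\sum_{i}(\bar{x}_i-\bar{x})^2 = \bar{n}(m-1)\left[\frac{1}{m-1}\sum_{i}(\bar{x}_i-\bar{x})^2\right] = \bar{n}(m-1)s_b^2$$

Dividing both sides by m-1:

$$BCMS = \bar{n}s_b^2 \quad\text{or}\quad \frac{BCMS}{m\bar{n}} = \frac{s_b^2}{m} \tag{A.20}$$

For large values of M and $\bar{N}$ the finite population corrections $\frac{M-m}{M}$ and $\frac{\bar{N}-\bar{n}}{\bar{N}}$ tend to unity. Therefore equation (A.18) may be rewritten as:

$$\hat{V}(\bar{x}_c) = \frac{s_b^2}{m} + \frac{1}{Mm\bar{n}}\sum_{i}s_i^2$$

Substituting from (A.19) and (A.20) for the two expressions on the right hand side of this equation gives:

$$\hat{V}(\bar{x}_c) = \frac{BCMS}{m\bar{n}} + \frac{WCMS}{M\bar{n}}$$

But M is considered to be very large; therefore the second expression on the right hand side will tend to zero. That is,

$$\hat{V}(\bar{x}_c) = \frac{BCMS}{m\bar{n}}$$

and

$$V(\bar{x}_c) = \frac{E(BCMS)}{m\bar{n}} = \frac{\bar{n}\sigma_c^2 + (\sigma^2-\sigma_c^2)}{m\bar{n}}$$

Now substitute for $\sigma_c^2$ by using the relationship that

$$R = \frac{\sigma_c^2}{\sigma^2}$$

Then

$$V(\bar{x}_c) = \frac{\bar{n}R\sigma^2 + (\sigma^2 - R\sigma^2)}{m\bar{n}} = \frac{\sigma^2}{m\bar{n}}\left[1 + (\bar{n}-1)R\right]$$

But

$$V(\bar{x}_{srs}) = \frac{\sigma^2}{m\bar{n}}$$

for a simple random sample of $m\bar{n}$ elements drawn from the same population. Therefore

$$V(\bar{x}_c) = V(\bar{x}_{srs})\left[1 + (\bar{n}-1)R\right] \tag{A.21}$$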
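The result in (A.21) can be checked by simulation under the one-way random model. This is a minimal sketch, not part of the original analysis: the function names, the seed, the number of replications and the parameter values are all assumptions chosen for illustration, and the general mean is set to zero because it does not affect the variance.

```python
import numpy as np


def simulate_cluster_mean_variance(m, n_bar, sigma_c2, sigma_e2,
                                   reps=20000, seed=0):
    """Monte Carlo variance of the mean of m clusters of n_bar elements."""
    rng = np.random.default_rng(seed)
    # One cluster effect per cluster, shared by all its elements.
    a = rng.normal(0.0, np.sqrt(sigma_c2), size=(reps, m, 1))
    # Independent element-level errors.
    e = rng.normal(0.0, np.sqrt(sigma_e2), size=(reps, m, n_bar))
    sample_means = (a + e).mean(axis=(1, 2))
    return float(sample_means.var())


def predicted_cluster_mean_variance(m, n_bar, sigma_c2, sigma_e2):
    """V(srs) * [1 + (n_bar - 1) R], following (A.21)."""
    sigma2 = sigma_c2 + sigma_e2
    R = sigma_c2 / sigma2
    v_srs = sigma2 / (m * n_bar)
    return v_srs * (1 + (n_bar - 1) * R)


simulated = simulate_cluster_mean_variance(10, 5, 1.0, 4.0)
predicted = predicted_cluster_mean_variance(10, 5, 1.0, 4.0)
print(simulated, predicted)
```

With 10 clusters of 5 elements and R = 0.2, the predicted variance is 0.18, which is 1.8 times the simple random sample value of 0.1; the simulated value should agree to within Monte Carlo error.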

It can be seen from the above expression that under the assumptions of the one-way random model the value of $V(\bar{x}_c)$ in relation to $V(\bar{x}_{srs})$ depends directly on the size of the coefficient of intraclass correlation. This relationship, and its resulting influence on the value of the design effect, is examined in a later discussion.

Sampling Error of the Mean in Relation to Cluster Size and Number of Clusters

Consider a sample of $n$ elements drawn from a population of $N$ elements divided into $M$ clusters each containing $N_i$ elements. Select $m$ clusters at random and from each cluster select $n_i$ elements (where $i = 1, \ldots, m$). If we assume that the cluster sizes and subsample sizes are equal, then we have:

$$N_i = \frac{N}{M} = \bar{N}, \qquad n_i = \frac{n}{m} = \bar{n}$$

Under these restrictions the sample mean $\bar{x}_{cl}$ becomes an unbiased estimate of the population mean (Yamane, 1967), where

$$\bar{x}_{cl} = \frac{1}{m\bar{n}}\sum_{i=1}^{m}\sum_{j=1}^{\bar{n}}x_{ij}$$

Also consider a simple random sample of $n$ elements drawn without replacement from the same population of $N$ elements. It may be shown (see equation A.21) that the variance of the sample mean for the simple cluster sample design, $V(\bar{x}_{cl})$, under the restrictions on $N_i$ and $n_i$, and the variance of the sample mean for the simple random sample without replacement design, $V(\bar{x}_{srs})$, are related by the following equation:

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1 + (k-1)R\right] \qquad (A.22)$$

where R is the coefficient of intraclass correlation under the conditions of the one-way random model,


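$k = \bar{n}$ is the ultimate cluster size ($k > 1$), and $\bar{N} \gg \bar{n}$, $M \gg m$.

Since

$$V(\bar{x}_{srs}) = \frac{\sigma^2}{n} = \frac{\sigma^2}{mk}$$

we have

$$V(\bar{x}_{cl}) = \frac{\sigma^2}{mk}\left[Rk + (1-R)\right]$$

$$\therefore\; V(\bar{x}_{cl}) = \frac{\sigma^2}{m}R + \frac{\sigma^2}{mk}(1-R) \qquad (A.23)$$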

Therefore in order to reduce the variance of the sample mean based on cluster sampling it is important to increase the size of m (the number of clusters) rather than k (the number of elements per cluster).
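A short numerical sketch of (A.23) illustrates this point. The values $\sigma^2 = 1$, $R = 0.2$, $m = 20$ and $k = 10$ are arbitrary assumptions chosen for illustration, and the function name is hypothetical.

```python
def cluster_mean_variance(sigma2, m, k, rho):
    """Variance of the cluster-sample mean from (A.23):
    sigma2 * rho / m + sigma2 * (1 - rho) / (m * k)."""
    return sigma2 * rho / m + sigma2 * (1.0 - rho) / (m * k)


# Illustrative values only: sigma^2 = 1, R = 0.2.
base = cluster_mean_variance(1.0, m=20, k=10, rho=0.2)
more_clusters = cluster_mean_variance(1.0, m=40, k=10, rho=0.2)    # double m
bigger_clusters = cluster_mean_variance(1.0, m=20, k=20, rho=0.2)  # double k

print(base, more_clusters, bigger_clusters)
```

Both alternatives double the total sample size, but doubling the number of clusters halves the variance, whereas doubling the cluster size leaves the first term of (A.23) untouched and reduces the variance only slightly.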

Relationship Between Deff and Rho for Simple Cluster Sampling

From previous discussion we find that there is a functional relationship between the F statistic and $R$:

$$R = \frac{F-1}{F+(k-1)}$$

The F statistic may take values in the range zero to infinity. By partially differentiating R with respect to F we obtain:

$$\frac{\partial R}{\partial F} = \frac{[F+(k-1)]\cdot 1 - [F-1]\cdot 1}{[F+(k-1)]^2} = \frac{k}{[F+(k-1)]^2}$$

which is always positive. Therefore the relationship between $R$ and $F$ is monotonic and the maximum and minimum values of $R$ must coincide with the maximum and minimum values of $F$. The maximum value of $R$ will therefore occur when $F \to \infty$ and the minimum value of $R$ will occur when $F \to 0$. That is, $R_{max} = 1 - \varepsilon$ and $R_{min} = -\frac{1}{k-1} + \delta$, where $\varepsilon$ and $\delta$ are small positive numbers such that $\varepsilon \to 0$ as $F \to \infty$, and $\delta \to 0$ as $F \to 0$. Now substitute these values into (A.22) to obtain two new relationships.

When $R = R_{max}$,

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1 + (k-1)(1-\varepsilon)\right] = V(\bar{x}_{srs})\left[k - \varepsilon(k-1)\right]$$

but $k \gg \varepsilon(k-1)$, and then

$$V(\bar{x}_{cl}) = k\,V(\bar{x}_{srs}) \qquad (A.24)$$

When $R = R_{min}$,

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1 + (k-1)\left(-\frac{1}{k-1} + \delta\right)\right] = V(\bar{x}_{srs})\left[\delta(k-1)\right] = \delta'\,V(\bar{x}_{srs}) \qquad (A.25)$$

where $\delta' = \delta(k-1)$ is also a small positive number because $(k-1) > 0$, and $\delta' \to 0$ when $F \to 0$.

From the two results (A.24) and (A.25) it can be seen by inspection that for $R = R_{max}$, $V(\bar{x}_{cl}) > V(\bar{x}_{srs})$, and for $R = R_{min}$, $V(\bar{x}_{cl}) < V(\bar{x}_{srs})$. Also from (A.22) it can be shown that for $R = 0$, $V(\bar{x}_{cl}) = V(\bar{x}_{srs})$.

By substituting these results back into the expression which defines Deff we may now inspect the values of Deff for the simple cluster sample design. These values may be presented with ranges of values for $R$ because $V(\bar{x}_{cl})$ also increases monotonically with $R$.

When $0 < R < 1$, Deff $> 1$;
when $R = 0$, Deff $= 1$; and
when $-\frac{1}{k-1} < R < 0$, Deff $< 1$.

For most cluster designs in survey research $R$ tends to be positive. That is, the individuals associated with human groups are more like their own group than any other group. Therefore we usually find that the value of Deff for a simple cluster design is greater than one.
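The limiting cases discussed above can be checked numerically. This sketch assumes the relationships $R = (F-1)/(F+(k-1))$ and Deff $= 1 + (k-1)R$ from (A.22); the function names and the choice $k = 10$ are illustrative assumptions.

```python
def rho_from_f(f_stat, k):
    """Intraclass correlation R implied by an F statistic for clusters of size k."""
    return (f_stat - 1.0) / (f_stat + (k - 1.0))


def deff(rho, k):
    """Design effect of a simple cluster sample, from (A.22)."""
    return 1.0 + (k - 1.0) * rho


k = 10
for f_stat in (0.0, 0.5, 1.0, 4.0, 1e9):
    r = rho_from_f(f_stat, k)
    print(f"F={f_stat:g}  R={r:.4f}  Deff={deff(r, k):.4f}")
```

As $F$ moves from 0 towards infinity, $R$ moves from $-1/(k-1)$ towards 1 and Deff moves from 0 towards $k$, passing through Deff $= 1$ at $F = 1$.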

RANDOM SUBSAMPLE ERROR ESTIMATION TECHNIQUES

The jackknife method

Consider an estimator $y$ based on a random sample of $n$ observations. Randomly partition this sample into $k$ 'independent' subsamples of size $m$ such that each subsample identically follows the design of the original sample.


Let $y_i$ be the estimate $y$ based on that portion of the data which omits the $i$th subsample, and let $y_{all}$ be the corresponding result for the entire sample. Mosteller and Tukey (1968) define $k$ pseudo values $y_i^*$ ($i = 1, \ldots, k$) based on the $k$ complements:

$$y_i^* = k\,y_{all} - (k-1)\,y_i \qquad (i = 1, \ldots, k)$$

They also defined the jackknife value as

$$y^* = \frac{1}{k}\sum_{i}y_i^* = k\,y_{all} - (k-1)\,\bar{y}_i \qquad (A.26)$$

where $\bar{y}_i$ is the mean of the $k$ values $y_i$.

Quenouille (1956) presented theoretical arguments to support his earlier deduction (Quenouille, 1949) that the jackknife value displayed less bias than the usual estimate. A complete theoretical discussion of Quenouille's method and a detailed discussion of the jackknife statistic has been presented by Gray and Schucany (1972); however a brief summary of Quenouille's main argument is set out below.

Assume that the bias in $y$ as an estimator of $Y$ is such that

$$E(y) = Y + \sum_{j=1}^{\infty}\frac{a_j}{n^j} = Y + \sum_{j=1}^{\infty}\frac{a_j}{(mk)^j}$$

where $a_j$ is not functionally related to $n$. Also

$$E(\bar{y}_i) = E(y_i) = Y + \sum_{j=1}^{\infty}\frac{a_j}{(n-m)^j}$$

since $y_i$ is based on $(n-m)$ elements. Also $(n-m) = km - m = m(k-1)$; then

$$E(\bar{y}_i) = Y + \sum_{j=1}^{\infty}\frac{a_j}{m^j(k-1)^j}$$

From equation (A.26) the expected value of $y^*$ is

$$E(y^*) = k\,E(y_{all}) - (k-1)\,E(\bar{y}_i)$$

$$= kY + \frac{k\,a_1}{mk} + \frac{k\,a_2}{m^2k^2} + \cdots - (k-1)Y - \frac{(k-1)\,a_1}{m(k-1)} - \frac{(k-1)\,a_2}{m^2(k-1)^2} - \cdots$$

$$= Y - \frac{a_2}{m^2k(k-1)} - \frac{(2k-1)\,a_3}{m^3k^2(k-1)^2} - \cdots$$

$$\therefore\; E(y^*) = Y - \frac{a_2}{n(n-m)} - \frac{(2n-m)\,a_3}{n^2(n-m)^2} - \cdots$$

Therefore the bias in $y$ is of the order $\frac{1}{n}$ while the bias of the jackknife value $y^*$ is of the order $\frac{1}{n^2}$. Also, if the bias in $y$ is exactly of the order $\frac{1}{n}$ then $a_2 = a_3 = \cdots = 0$ and $y^*$ is unbiased.

The variance of the jackknife value, $s^{*2}/k$, may now be obtained from the pseudovalues, where

$$s^{*2} = \frac{\sum_{i}y_i^{*2} - \frac{1}{k}\left(\sum_{i}y_i^*\right)^2}{k-1}$$

Following Tukey's (1958) proposal, the pseudovalues are used as approximately independent observations in order to estimate the stability of the jackknife value.
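The jackknife steps described above can be sketched in a few lines. This is a minimal illustration, assuming a simple random sample that is split into $k$ equal groups by random allocation; the function name `grouped_jackknife`, the `seed` parameter and the example data are hypothetical and not taken from the original text.

```python
import random
from statistics import mean, pvariance


def grouped_jackknife(data, k, estimator, seed=0):
    """Return (jackknife value y*, estimated variance of y*).

    The sample is randomly partitioned into k subsamples of equal size m.
    Pseudovalues follow y_i* = k * y_all - (k - 1) * y_i, where y_i is the
    estimate computed with the i-th subsample omitted.
    """
    n = len(data)
    if k < 2 or n % k != 0:
        raise ValueError("k must be >= 2 and divide the sample size")

    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    groups = [shuffled[i::k] for i in range(k)]  # k groups of size n // k

    y_all = estimator(shuffled)
    pseudo = []
    for i in range(k):
        complement = [x for g, group in enumerate(groups) if g != i for x in group]
        pseudo.append(k * y_all - (k - 1) * estimator(complement))

    y_star = mean(pseudo)
    # Variance of the pseudovalues, then divided by k for the jackknife value.
    s2 = (sum(p * p for p in pseudo) - sum(pseudo) ** 2 / k) / (k - 1)
    return y_star, s2 / k


sample = [1, 2, 3, 4, 5, 6]
# pvariance divides by n, so its bias is exactly of order 1/n.
y_star, var_y_star = grouped_jackknife(sample, k=6, estimator=pvariance)
```

With `pvariance` as the estimator the bias is exactly of order $1/n$, so the jackknife value coincides with the usual unbiased sample variance, which illustrates the bias-reduction argument summarised above.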

THE HALF-SAMPLE REPLICATION METHOD

Half-sample replication is a technique for the estimation of variance from stratified sample designs which have two selections per stratum. These designs are valuable in that they permit the utmost stratification consistent with more than one independent replicate per stratum needed for computing variances. The following discussion is a summary of a detailed theoretical development by McCarthy (1966).

Consider a stratified sample design which is composed of two independent selections from each of $L$ strata. The following table describes the details of this design.

Stratum   Weight   Population mean   Population variance   Sample points         Sample mean
1         $W_1$    $\mu_1$           $\sigma_1^2$          $y_{11}, y_{12}$      $\bar{y}_1$
2         $W_2$    $\mu_2$           $\sigma_2^2$          $y_{21}, y_{22}$      $\bar{y}_2$
h         $W_h$    $\mu_h$           $\sigma_h^2$          $y_{h1}, y_{h2}$      $\bar{y}_h$

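An unbiased estimate of the population mean $\mu$ is given by

$$\bar{y} = \sum_{h}W_h\bar{y}_h$$

where $\sum_{h}W_h = 1$. Also, the usual estimate of the variance of $\bar{y}$, $v(\bar{y})$, is (Kish, 1965a:78)

$$v(\bar{y}) = \sum_{h}W_h^2\,\frac{s_h^2}{2}$$

where

$$s_h^2 = \frac{\sum_{i=1}^{2}(y_{hi}-\bar{y}_h)^2}{(2-1)} = \left(y_{h1} - \frac{y_{h1}+y_{h2}}{2}\right)^2 + \left(y_{h2} - \frac{y_{h1}+y_{h2}}{2}\right)^2 = \frac{1}{2}(y_{h1}-y_{h2})^2 = \frac{1}{2}d_h^2$$

with $d_h = y_{h1} - y_{h2}$. Then

$$v(\bar{y}) = \frac{1}{4}\sum_{h}W_h^2d_h^2$$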

Now consider a 'half-sample replicate'. That is, the sample obtained by selecting one of the sample points in each stratum. There are $2^L$ distinct half-samples. For the $j$th half-sample, the estimate of $\mu$ is

$$\bar{y}_j^* = \sum_{h}W_hy_{hi}$$

where $i = 1$ or $2$ for each $h$. The deviation of this half-sample estimate from the overall sample mean may be written as

$$(\bar{y}_j^* - \bar{y}) = \sum_{h}W_hy_{hi} - \frac{1}{2}\sum_{h}W_h(y_{h1}+y_{h2}) = \frac{1}{2}(\pm W_1d_1 \pm W_2d_2 \pm \cdots \pm W_Ld_L)$$

where the deviation for each stratum is determined by making an appropriate choice of a plus or minus sign for each stratum. Therefore

$$(\bar{y}_j^* - \bar{y})^2 = \frac{1}{4}\sum_{h}W_h^2d_h^2 + \frac{1}{2}\sum_{h<k}(\pm)\,W_hW_kd_hd_k$$


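and since selections within strata are independent, $E(d_hd_k) = 0$ for $h \neq k$. Then

$$E\left[(\bar{y}_j^* - \bar{y})^2\right] = E\left[\frac{1}{4}\sum_{h}W_h^2d_h^2\right] = V(\bar{y})$$

Select a simple random sample of $k$ half-samples. Since the expected value of each squared term is equal to $V(\bar{y})$, the expected value of their average is also equal to $V(\bar{y})$. That is,

$$\hat{v}(\bar{y}) = \frac{1}{k}\sum_{j=1}^{k}(\bar{y}_j^* - \bar{y})^2 \qquad (A.27)$$

is an unbiased estimate of $V(\bar{y})$.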

BALANCED HALF-SAMPLE REPLICATION

From above, the variance of the weighted mean, $V(\bar{y})$, for a stratified sample design with two independent selections per stratum may be estimated as follows:

$$v(\bar{y}) = \frac{1}{4}\sum_{h}W_h^2d_h^2 \qquad (A.28)$$

Also, for a random half-sample, the variance of $\bar{y}$ may be estimated by considering the deviation of the half-sample estimate from the overall sample estimate:

$$(\bar{y}_j^* - \bar{y})^2 = \frac{1}{4}\sum_{h}W_h^2d_h^2 + \frac{1}{2}\sum_{h<k}(\pm)\,W_hW_kd_hd_k \qquad (A.29)$$

The between stratum contributions to variance come from the cross product terms which involve $d_hd_k$. These terms cancel out in (A.29) when we consider the entire set of $2^L$ half samples. The question now arises whether one can choose a relatively small subset of half-samples for which these terms will cancel out. If this can be done, then the corresponding half-sample estimates of variance will contain all the information available in the total sample. McCarthy (1966, 1969) has shown that by selecting orthogonally balanced patterns of half samples a smaller set of half samples may be selected which will produce estimates of the variance equal to the estimate that would be produced by considering all possible half samples. This technique relies on the property of the statistic under consideration being linear in the replicate values. The weighted mean satisfies the condition of linearity because $(\bar{y}_j^* - \bar{y})^2 = (\bar{y}_j^{*\prime} - \bar{y})^2$, where $\bar{y}_j^{*\prime}$ is the value of the weighted mean based on the data which form the complement of the data used to establish $\bar{y}_j^*$. Despite the lack of analytical proofs for non-linear statistics such as regression coefficients, Kish and Frankel (1970) provide results which suggest that this technique is also suitable for non-linear statistics.

Consider the three strata design which is to be used in this study. Let the strata have two independent observations per stratum: $(y_{11}, y_{12})$, $(y_{21}, y_{22})$ and $(y_{31}, y_{32})$. There are $2^3 = 8$ possible half samples. Now consider the following subset of four half samples.

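Half sample   Stratum 1   Stratum 2   Stratum 3   $(\bar{y}_j^* - \bar{y})$
1             $y_{11}$    $y_{21}$    $y_{31}$    $\frac{1}{2}(W_1d_1 + W_2d_2 + W_3d_3)$
2             $y_{11}$    $y_{22}$    $y_{32}$    $\frac{1}{2}(W_1d_1 - W_2d_2 - W_3d_3)$
3             $y_{12}$    $y_{22}$    $y_{31}$    $\frac{1}{2}(-W_1d_1 - W_2d_2 + W_3d_3)$
4             $y_{12}$    $y_{21}$    $y_{32}$    $\frac{1}{2}(-W_1d_1 + W_2d_2 - W_3d_3)$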

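The signs of the terms in the right hand column are determined by the definition $d_h = (y_{h1} - y_{h2})$. By multiplying out each entry in the right hand column we obtain

$$(\bar{y}_1^* - \bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2 + W_2^2d_2^2 + W_3^2d_3^2) + \tfrac{1}{4}(2W_1W_2d_1d_2 + 2W_1W_3d_1d_3 + 2W_2W_3d_2d_3)$$

$$(\bar{y}_2^* - \bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2 + W_2^2d_2^2 + W_3^2d_3^2) + \tfrac{1}{4}(-2W_1W_2d_1d_2 - 2W_1W_3d_1d_3 + 2W_2W_3d_2d_3)$$

$$(\bar{y}_3^* - \bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2 + W_2^2d_2^2 + W_3^2d_3^2) + \tfrac{1}{4}(2W_1W_2d_1d_2 - 2W_1W_3d_1d_3 - 2W_2W_3d_2d_3)$$

$$(\bar{y}_4^* - \bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2 + W_2^2d_2^2 + W_3^2d_3^2) + \tfrac{1}{4}(-2W_1W_2d_1d_2 + 2W_1W_3d_1d_3 - 2W_2W_3d_2d_3)$$

Then

$$\frac{1}{4}\sum_{j=1}^{4}(\bar{y}_j^* - \bar{y})^2 = \frac{1}{4}\sum_{h}W_h^2d_h^2$$

But from previous discussion we know that

$$\frac{1}{4}\sum_{h}W_h^2d_h^2 = v(\bar{y})$$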

Therefore by selecting the above pattern of four half samples we have obtained all the information which would be available by using all of the eight possible half samples. McCarthy (1966) summarises this pattern of half samples as a matrix of signs whose columns are orthogonal to one another.

    +  +  +
    +  -  -
    -  -  +
    -  +  -

A plus sign denotes $y_{h1}$ and a minus sign denotes $y_{h2}$.

In order to obtain a set of half samples which have the property of cross-product balance it is necessary to construct a matrix whose columns are orthogonal and whose rows must be a multiple of four in number. The method of constructing these orthogonal matrices is described by Plackett and Burman (1946).

Gurney (1970) developed a formula which compared the variance estimates obtained from McCarthy's orthogonal method with the unbalanced random selection of replications. Considerable gains were shown to be associated with the use of estimates based on the McCarthy method.
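The three-stratum example can be verified numerically. The sketch below assumes the four-row sign pattern of the balanced design discussed in this section (plus selects $y_{h1}$, minus selects $y_{h2}$); the weights and observations are invented for illustration, and the function names are hypothetical.

```python
# Balanced pattern for three strata: +1 selects y_h1, -1 selects y_h2.
SIGNS = [
    (+1, +1, +1),
    (+1, -1, -1),
    (-1, -1, +1),
    (-1, +1, -1),
]


def direct_variance(weights, pairs):
    """v(y) = (1/4) * sum of W_h^2 * d_h^2, with d_h = y_h1 - y_h2."""
    return 0.25 * sum((w * (y1 - y2)) ** 2 for w, (y1, y2) in zip(weights, pairs))


def balanced_half_sample_variance(weights, pairs, signs=SIGNS):
    """Average squared deviation of the half-sample means from the full mean."""
    y_bar = sum(w * (y1 + y2) / 2.0 for w, (y1, y2) in zip(weights, pairs))
    total = 0.0
    for row in signs:
        y_star = sum(w * (pair[0] if s > 0 else pair[1])
                     for w, pair, s in zip(weights, pairs, row))
        total += (y_star - y_bar) ** 2
    return total / len(signs)


# Invented illustrative data: stratum weights summing to one, two points each.
weights = [0.5, 0.3, 0.2]
pairs = [(10.0, 14.0), (20.0, 17.0), (30.0, 36.0)]

print(direct_variance(weights, pairs))
print(balanced_half_sample_variance(weights, pairs))
```

Because the columns of the sign pattern are mutually orthogonal, the cross-product terms cancel across the four half samples and the two printed values agree.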