Evaluation in Education, Vol. 2, pp. 105-195. © Pergamon Press Ltd. 1978. Printed in Great Britain.

SAMPLE DESIGN FOR EDUCATIONAL SURVEY RESEARCH
CONTENTS

1. INTRODUCTION                                              106
2. THE FIVE SAMPLE DESIGNS                                   120
3. THE ANALYTIC MODEL                                        131
4. THE COMPARISON OF SAMPLE DESIGNS                          136
5. THE ESTIMATION OF SAMPLING ERRORS FROM SAMPLE DATA        147
6. A WORKED EXAMPLE                                          157
7. CONCLUSION                                                165
8. SUMMARY                                                   167
ACKNOWLEDGEMENTS                                             168
REFERENCES                                                   169
APPENDIX: SOME THEORETICAL CONSIDERATIONS                    173
1. Introduction
"Apples that grow on the same tree resemble one another more than they resemble appZes on other trees, particularly if the trees are of different kinds. In the same way the students, parents, and teachers of one school tend to resemble one another more closely than they resemble those of diJ"ferent schools, particuZarly when the schools are of different kinds. These facts are the key to our problem. They mean that if we are swnpling an orchard it is essential to have enough trees in our sample; if we are sampling students cnzd teachers it is essential to have enough schooZs. We cannot make up for a Lack of trees, or schools, by increasing the number of apples from each tree, or increasing the number of students taken from each school". (G.F. Peaker, 7367ai
Social science research is aimed at developing useful generalisations about society and the ways in which individuals behave in society. However, due to practical constraints on research resources, the social scientist is rarely able to study a complete coverage of the individuals or groups for whom these generalisations are appropriate. Provided that scientific sampling procedures are employed, the use of samples rather than a census or complete coverage often provides many advantages for the social scientist without limiting opportunities for the development of wide generalisations. Cochran (1963) lists the major advantages of sampling compared with a complete coverage as: reduced costs, reduced requirements for specialised equipment and personnel, greater accuracy due to closer supervision of data gathering procedures, and greater speed in data collection and analysis.

Kish (1959) divides the social science research situations in which samples are used into three broad categories: experiments - in which all extraneous sources of variation are controlled through randomisation; surveys - in which all members of a defined population have a known positive probability of selection into the sample; and investigations - in which data are collected without either the randomisation of experiments or the probability sampling of surveys. Experiments are strong with respect to internal validity, because extraneous variation is controlled through randomisation, whereas surveys are strong with respect to external validity, involving the question of the extent to which the findings may be generalised to some wider population. Investigations are weak on both types of validity, and their use is due frequently to convenience or low cost and sometimes to the need for measurements in natural settings.

This report is concerned with the design of samples for educational survey research. In particular, it seeks to examine the problems associated with the evaluation of the degree of confidence which may be attributed to sample estimates of population characteristics obtained from a variety of sample designs which are commonly used in educational survey research.
POPULATIONS AND SAMPLES
The populations which are of interest to educational and social science researchers may be defined jointly with the elements which they contain. The population is the aggregate of the elements and the elements are the basic units that comprise and define the population (Kish, 1965). The elements of the population are usually the units of analysis - the elementary units comprising the population about which inferences are to be drawn.

Kish (1965) states that a population must be defined in terms of:
1. content
2. units
3. extent
4. time.

For example, in a study of the characteristics of Australian students we may wish to specify the desired population as:
1. all 14 year-old students
2. in secondary schools
3. in Australia
4. in 1977.
In order to prepare a description of the population to be considered in a study it is useful to distinguish between the population for which the results are required, the desired target population, and the population actually covered, the survey population. The survey population may differ from the desired target population. This difference may be due to non-coverage (for example, in the study referred to above we may compile a list of schools during early 1977 which accidentally omits some new schools which will begin operating later in the year) or the difference may be due to non-response (for example, several schools which have some handicapped students falling within the desired target population definition may be unwilling to allow these students to participate in the study). Strictly speaking, only the survey population is represented by the sample, but this population may be difficult to describe exactly, and it is often easier to write about the defined target population (Kish, 1965).
DEFINING A TARGET POPULATION - AN AUSTRALIAN EXAMPLE
During a cross-national study of science achievement carried out in 1970 by the International Association for the Evaluation of Educational Achievement (IEA), one of the desired IEA target populations for the study was described as: "All students aged 14.0-14.11 years at the time of testing. This was the last point in most of the school systems in IEA where 100 per cent of an age group were still in compulsory schooling" (Comber and Keeves, 1973:10).

In Australia it was decided that, for certain administrative reasons (Rosier and Williams, 1973), the study would be conducted only within the six states of Australia and not within the smaller Australian territories. It was also decided that only students in those school grade levels which contained the majority of 14 year-old students would be tested.

The above desired IEA target population description was therefore reworded in the following fashion to obtain the defined Australian target population description: "All students aged 14.0-14.11 years on 1 August 1970 in the following Australian states and secondary school grades:

New South Wales      Forms I, II and III
Victoria             Forms I, II, III and IV
Queensland           Grades 8, 9 and 10
South Australia      1st year, 2nd year and 3rd year
West Australia       Years 1, 2 and 3
Tasmania             Years I, II, III and IV" (Rosier and Williams, 1973:3).

TABLE 1.1  The desired IEA target population and the defined Australian target population

States                Desired IEA          Defined Australian     Number of
                      target population    target population      excluded students
New South Wales            78163                76317                  1846
Victoria                   62573                62030                   543
Queensland                 33046                31839                  1207
South Australia            22381                21632                   749
West Australia             19128                18708                   420
Tasmania                    7868                 7789                    79
Other Territories           3427                    0                  3427
Total                     226586               218315                  8271
The date used in the defined Australian target population description, 1 August 1970, was chosen to coincide with the date of the annual school census so that official census statistics could be used in weighting and other sampling calculations.
The number of Australian students in the desired IEA target population and the defined Australian target population are presented in Table 1.1 (source: Rosier and Williams, 1973).
For Australia overall, the excluded population (the students excluded from the desired target population to yield the defined target population) was less than four per cent of the desired target population.
Most of the excluded students in the six states were 14 year-olds who were in grade levels which were either lower or higher than those included in the target definitions or who were in special schools. The students in the 'other territories' of Australia (Australian Capital Territory and Northern Territory) were excluded from the target population because of certain administrative difficulties associated with testing the students in these territories. The Australian Capital Territory was excluded because a major study, which will be referred to later in this report, was in progress in the schools of this territory. The Northern Territory was also excluded because the administrative costs would be high for the small number of students which would be tested in this territory.
A sampling frame was constructed from a list of all schools containing students who were members of the defined Australian target population. This sampling frame was stratified by grouping the list of schools according to the state in which they were situated, and also by grouping schools within each state according to the system under which the school operated (Government/Catholic/Independent) and according to the geographical location of the school (Metropolitan/Non-Metropolitan).

The use of stratification procedures in the preparation of a sampling frame is usually undertaken because of the desire to increase the accuracy of sample estimates or because of the need to provide separate estimates for certain designed strata of the target population. These issues are discussed in more detail in the following chapter of this report.

The selection of the sample was constrained by the researchers' desire to obtain information which would allow comparisons to be made using both students and schools as the units of analysis. These comparisons were required to be carried out with the same degree of sampling accuracy for each state. As a result of these constraints the sample was selected as a stratified sample in two stages - with the designed sample requiring the selection within each state of approximately 38 schools followed by the selection of 25 students within each selected school. The calculations and the mechanical selection procedures which are needed to arrive at the required number of first stage selections (schools) and the required number of second stage selections (students within schools) are described in the presentation of a hypothetical national sample design in Chapter 6.

Clearly this sample design differs in its degree of complexity compared with a simple random sample in which all 218315 students within the defined Australian target population would be listed and the required number of students would be selected at random from the list. This report aims to examine the nature of these departures from the model of simple random sampling, and also to assess the consequences of these departures for commonly used procedures of analysis.
THE ACCURACY OF SAMPLES

There are usually two main objectives involved in the conduct of sample surveys:
1. The estimation of certain population values (parameters). In many educational research surveys we are interested in obtaining estimates of the mean level of achievement for the population and various percentile points of the distribution of the achievement for the population.
2. The testing of a statistical hypothesis about a population. As well as estimates of population parameters we may be interested, for example, in testing the hypothesis that there is no difference between the average achievement of certain subgroups in our sample.

Our capacity to examine sample data with respect to these two objectives depends directly upon our knowledge of the accuracy of sample estimates with respect to the population parameters. Knowledge of the accuracy of estimates is derived in turn from statistical theory which requires that each member of the population has a known, and non-zero, probability of being selected into the sample. The accuracy of samples selected without using probability sampling methods cannot be discovered from the internal evidence of the sample data. Therefore, since non-probability samples are not suitable for dealing with the objectives of estimation and hypothesis testing, they will not be considered in this report.
ACCURACY, BIAS AND PRECISION
The sample estimate derived from any one sample is inaccurate to the extent that it differs from the population parameter. If we seek to estimate the mean of the population performance in a test norming program then the difference between the population mean and the sample estimate of the population mean measures the accuracy of the sample estimate of the population mean performance. Generally the value of the population parameter is not known and therefore the actual accuracy of an individual sample estimate cannot be assessed. Instead, through a knowledge of the behaviour of estimates derived from all possible samples which can be drawn from the population by using the same sample design we are sometimes able to assess the probable accuracy of the obtained sample estimate.

For example, consider the population of two schools described in Fig. 1.1, each containing two students. The two students in the first school are aged x1 and x2 years, while the two students in the second school are aged x3 years and x4 years.

Fig. 1.1. A hypothetical population of four students: School 1 (students aged x1 and x2) and School 2 (students aged x3 and x4).

Consider, for example, that our survey objective is to estimate the mean age of the population of students from a sample of size n = 2 drawn from the population. In order to select our sample we could write the name of each student on a ball, place the four balls in an urn, and then thoroughly mix the balls before randomly drawing out the names of the two students in our sample. This sampling procedure is called simple random sampling and it forms the basis of the more complicated sampling procedures which will be discussed later in this report.
If the above sampling procedure was continued indefinitely then each of the six samples listed in Table 1.2 would be drawn over and over again. With this sampling procedure each sample would, by definition, have an equal chance of being selected. The average of the estimates derived from all possible samples would be $\sum \bar{X}_i / 6$.

TABLE 1.2  All possible samples of size n = 2 from the population described in Fig. 1.1

Possible samples of n = 2 (ages of students selected)    $\bar{X}_i$ (estimate of $\mu$)
x1 and x2                                                $\bar{X}_1$
x1 and x3                                                $\bar{X}_2$
x1 and x4                                                $\bar{X}_3$
x2 and x3                                                $\bar{X}_4$
x2 and x4                                                $\bar{X}_5$
x3 and x4                                                $\bar{X}_6$
The average of the estimates of the population parameter derived from an infinite number of samples, $E(\bar{X})$, is called the expected value of the estimator which is used to describe the population parameter. In our example the expected value is equal to the population value:

$$E(\bar{X}) = \frac{\bar{X}_1 + \cdots + \bar{X}_6}{6} = \frac{3x_1 + \cdots + 3x_4}{6 \times 2} = \frac{x_1 + x_2 + x_3 + x_4}{4} = \mu$$

The mean value $E(\bar{X})$ may or may not be equal to the population value $\mu$. The difference between the two we call the sampling bias $= E(\bar{X}) - \mu$. A sample design is unbiased if $E(\bar{X}) = \mu$. Note that this is not a property of a single sample, but of the entire sampling distribution, and that it belongs neither to the selection nor the estimation procedure alone, but to both jointly (Kish, 1965).

In our example we have seen that the sample mean is an unbiased estimate of the population mean - despite the variations in the estimates obtained from individual samples. Since the accuracy of the sample estimates depends (on the average) upon the variations of the individual sample estimates, we require a measure of the spread of the distribution of sample estimates. The usual measure of the spread of a distribution is the variance of the distribution. For our example the variance of the distribution of sample means, $V(\bar{X})$, is given by:

$$V(\bar{X}) = \frac{(\bar{X}_1 - \mu)^2 + \cdots + (\bar{X}_6 - \mu)^2}{6}$$
The variance of the sampling distribution of sample means provides a measure of the probable accuracy or precision of any one sample estimate. The calculation of precision levels will be discussed in a later section of this report.
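As a concrete illustration of these definitions, the short sketch below enumerates all six possible samples of size n = 2 from a hypothetical population of four ages and confirms that the mean of the sample means equals the population mean, while their spread gives $V(\bar{X})$. The four ages used here are illustrative values only, not data from this report.

```python
from itertools import combinations

# Hypothetical ages for the four students (x1, x2 in School 1; x3, x4 in School 2).
ages = [13.0, 14.0, 14.5, 15.5]

population_mean = sum(ages) / len(ages)

# All possible simple random samples of size n = 2 (Table 1.2).
sample_means = [sum(pair) / 2 for pair in combinations(ages, 2)]

expected_value = sum(sample_means) / len(sample_means)              # E(X-bar)
sampling_variance = sum((m - population_mean) ** 2
                        for m in sample_means) / len(sample_means)  # V(X-bar)

print(f"population mean          = {population_mean:.3f}")
print(f"expected value of X-bar  = {expected_value:.3f}")   # equals the population mean: unbiased
print(f"variance of sample means = {sampling_variance:.3f}")
```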
In order to incorporate the two aspects of variance and bias into one statement associated with the accuracy of an individual estimate, statisticians have developed the concept of mean square error (MSE). The MSE is defined as the average of the squares of the deviations of the possible sample estimates from the value being estimated (Hansen et al, 1953). It can be shown (Yamane, 1967) that the mean square error can be written as:

$$MSE(\bar{X}) = E(\bar{X} - \mu)^2 = E[\bar{X} - E(\bar{X})]^2 + [E(\bar{X}) - \mu]^2 = \text{Variance of } \bar{X} + (\text{Bias of } \bar{X})^2$$
For most well-designed samples in survey research the sampling bias is either zero (as in our example) or small, tending towards zero with increasing sample size. The estimates used in this report exhibit this property and therefore our examination of the accuracy of these estimates will concentrate on sampling precision as measured by the variance term.
SAMPLING DISTRIBUTIONS AND STANDARD ERRORS
Since the accuracy of the estimates used in this report depends principally on precision we now turn to consider how we may use the measure of variance to obtain measures of the probable accuracy of our estimates.

In many practical survey research situations the sampling distribution of the estimated mean is approximately normally distributed. The approximation improves with increasing sample size even though the distribution of elements in the parent population may be far from normal. This characteristic of the sampling distribution of the sample mean is associated with the Central Limit Theorems and it occurs not only for the mean but for most estimators commonly used to describe survey research results (Kish, 1965).

From a knowledge of the properties of the normal distribution we know that we can be '68 per cent confident' that the range $\bar{X} \pm \sqrt{V(\bar{X})}$ includes the population mean, where $\bar{X}$ is the sample mean obtained from one sample from the population. The quantity $\sqrt{V(\bar{X})}$ is called the standard error, $SE(\bar{X})$, of the sampling distribution of $\bar{X}$. Similarly we know that the range $\bar{X} \pm 1.96\,SE(\bar{X})$ will include the population mean with 95 per cent confidence.

The calculation of confidence limits for estimates as described above allows us to satisfy the estimation objective of survey research. Also, through the construction of difference scores $d = \bar{X}_1 - \bar{X}_2$, and using a knowledge of the standard errors $SE(\bar{X}_1)$ and $SE(\bar{X}_2)$, we may satisfy the statistical hypothesis objective (Yamane, 1973).

It should be remembered that, although our discussion has focussed on sample means, we could also set up confidence limits for many other population values, which for example are estimated by $v$, in the form $v \pm t\sqrt{V(v)}$. The quantity t represents an appropriate constant which usually is obtained from the normal distribution or under certain conditions from the t distribution. For most sample estimates encountered in practical survey research, assumptions of normality lead to errors that are small compared to other sources of inaccuracy (Kish, 1965).

The approach to normality of the sampling distribution of a statistic may be faster for some variables than for others. If the statistic is the sample mean described above then a sample of at least 50 elements is generally sufficient, however for correlation coefficients a much greater sample size would be required (Moser and Kalton, 1973).

THE ACCURACY OF INDIVIDUAL SAMPLE ESTIMATES
In the previous section we have discussed how the variance $V(\bar{X})$ of an estimator may be used to make statements about the precision of individual sample estimates. In survey research we are usually dealing with a single sample of data and not with all possible samples from a population. Therefore we are unable to calculate the value of $V(\bar{X})$ exactly. Fortunately statisticians have derived some formulae, for certain sample designs, which allow us to make an estimate of $V(\bar{X})$ from the internal evidence of an individual sample of data. For the simple random sample design, in which each sample element is randomly and independently selected from the population with equal probability of selection, the variance of the sample mean may be estimated from a single sample of data by using the formula:

$$\hat{V}(\bar{X}) = \frac{N - n}{N} \cdot \frac{s^2}{n}$$

where N is the population size, n is the sample size, and

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{X})^2$$

is an unbiased estimate of the variance of the element values in the population.

For sufficiently large values of n we may therefore estimate with 95 per cent confidence that the population mean $\mu$ will lie in the range

$$\bar{X} \pm 1.96 \sqrt{\frac{N - n}{N} \cdot \frac{s^2}{n}}$$

where $\bar{X}$ is the sample mean of a simple random sample of n elements selected from a population of N elements. Note that, for sufficiently large values of N, the variance of the sample mean may be estimated by $s^2/n$ since the term $(N - n)/N$, called the finite population correction, tends toward unity.
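A minimal sketch of these formulae is given below, using made-up sample values rather than data from the report: it estimates the variance of the sample mean under simple random sampling, applies the finite population correction, and forms the 95 per cent confidence interval.

```python
import math
import random

def srs_confidence_interval(sample, population_size, confidence_z=1.96):
    """95 per cent confidence interval for the population mean under
    simple random sampling without replacement."""
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased element variance: s^2 = sum((x - mean)^2) / (n - 1)
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Variance of the sample mean with the finite population correction.
    fpc = (population_size - n) / population_size
    se = math.sqrt(fpc * s2 / n)
    return mean - confidence_z * se, mean + confidence_z * se

# Illustrative data only: 150 hypothetical test scores from a population of 2354.
random.seed(1)
scores = [random.gauss(30, 8) for _ in range(150)]
print(srs_confidence_interval(scores, population_size=2354))
```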
Although there is general agreement among statistical authors about the formula for estimating the variance of the sample mean for a single simple random sample of elements, there are minor differences of opinion about the appropriate formulae for calculating the variance for more complex statistics. These minor differences generally become insignificant for the typically large population and sample sizes which are associated with survey research.

Table 1.3 presents the formulae for calculating the standard error of a statistic, where $SE(v) = \sqrt{V(v)}$, from a simple random sample of elements for a range of complex statistics which are commonly employed in educational survey research. For this report the formulae were selected from one source (Guilford and Fruchter, 1973), however the main results of the report would not be seriously altered by the use of formulae presented by any of the other recognised authors.
The formulae in Table 1.3 are based on a simple random sample of n elements which are measured on m variables, where variable X has a standard deviation of s. The multiple correlation coefficient $R_{i.jkl}$ refers to the regression equation which uses variable i as the criterion and variables j, k and l as predictors.

The formulae were derived on the assumption that the sample design used to collect the data consisted of a simple random sample of elements. However most social science research, especially survey research, is conducted with data obtained from complex sample designs which, as in the Australian sample described earlier, employ techniques such as stratification, clustering and varying probabilities of selection. Computational formulae are available for estimating the standard errors of means, aggregates and differences of means for a wide range of these sample designs (see Kish, 1965). Unfortunately the computational formulae required for estimating the standard error of multivariate statistics such as correlation coefficients, regression coefficients, etc. are not readily available for sample designs which depart from the model of simple random sampling. These formulae either become enormously complicated or, ultimately, they prove resistant to mathematical analysis (Frankel, 1971).
TABLE 1.3  Formulae for estimating standard errors when data are gathered with a simple random sampling procedure

Sample statistic                        Estimate of SE(v)           Source
Mean                                    $s/\sqrt{n}$                (Guilford and Fruchter, 1973:127)
Correlation coefficient                 $(1 - r^2)/\sqrt{n - 1}$    (Guilford and Fruchter, 1973:145)
Standardised regression coefficient                                 (Guilford and Fruchter, 1973:368)
Multiple correlation coefficient                                    (Guilford and Fruchter, 1973:367)
In the past many educational researchers have estimated standard errors for multivariate statistics by applying formulae which are appropriate only for data obtained from a simple random sample design, despite the fact that they had not used simple random sampling in their research. The problems, and some appropriate solutions, associated with this misuse of computational formulae will be discussed at length in later sections of this report.
MULTI-STAGE COMPLEX SAMPLE DESIGNS
A population of elements can usually be described in terms of a hierarchy of sampling units of different sizes and types. For example a population of school students may be seen as being composed of a number of classes each of which is composed of a number of students. Further, the classes may be grouped into a number of schools. In the previous discussion we have considered the use of simple random samples in which students were selected individually from the population. In practice we usually select the individual units of the population as clusters, or in several stages. These modifications in sample design are often used because they reduce the costs of a research study by minimising the geographical spread of the sample elements.

Consider the hypothetical population of school students described in Fig. 1.2. The population consists of eighteen students distributed among six classrooms (with three students per class) and three schools (with two classes per school).

Fig. 1.2. A hypothetical population of eighteen students (tertiary sampling units) grouped into six classrooms (secondary sampling units) within three schools (primary sampling units).
From this population we could select a simple random sample of four students (by the method described in a previous section) or we could employ a multi-stage cluster sample design to select a sample of the same size. In order to select a multi-stage cluster sample we consider the population to be divided into primary sampling units (schools), secondary sampling units (classrooms) and tertiary sampling units (students). At the first stage of sampling we could randomly select two schools; at the second stage of sampling we could randomly select one classroom from each of the selected schools; and at the third stage of sampling we could randomly select two students from each selected classroom. The actual mechanical procedures required for the selection of sampling units at different stages are discussed more fully in Chapter 2 and Chapter 6.

If we employed either the simple random design or the three stage cluster sample design described above to select a sample of four elements, then for both sample designs this would ensure that each population element had an equal chance of appearing in either of the samples (see Chapter 2). That is, sample estimates of population parameters, such as the population mean, would provide unbiased estimates for both sample designs.
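The three stage selection just described can be sketched in a few lines of code. The nested dictionary below is a stand-in for the hypothetical population of Fig. 1.2 (three schools, two classrooms each, three students per classroom); the student identifiers are arbitrary.

```python
import random

# Hypothetical population of Fig. 1.2: school -> classroom -> student ids.
population = {
    f"school_{s}": {
        f"class_{s}{c}": [f"student_{s}{c}{i}" for i in range(1, 4)]
        for c in range(1, 3)
    }
    for s in range(1, 4)
}

def three_stage_cluster_sample(pop, n_schools=2, n_classes=1, n_students=2, seed=7):
    """First stage: schools; second stage: classrooms within selected schools;
    third stage: students within selected classrooms."""
    rng = random.Random(seed)
    sample = []
    for school in rng.sample(sorted(pop), n_schools):
        for classroom in rng.sample(sorted(pop[school]), n_classes):
            sample.extend(rng.sample(pop[school][classroom], n_students))
    return sample

print(three_stage_cluster_sample(population))  # a sample of 2 x 1 x 2 = 4 students
```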
THE COMPARISON OF SAMPLE DESIGNS
In the above example we have seen that for a given sample size both the simple random sample design and a three stage cluster sample design may provide unbiased sample estimates of the population mean. However, as will be shown in later chapters, the variance of these estimates may vary greatly. Therefore to compare these two sample designs we need to examine the stability of the estimates which they provide for samples of the same size.

Fisher (1922) described and compared sample designs in terms of their 'efficiency'. Two sample designs A and B were compared by considering the inverse of their variances for the same size of sample. Using E to denote the efficiency of a sample design for the sample mean and n to denote the sample size we can compare the efficiency of two samples by the ratio:

$$\frac{E_A}{E_B} = \frac{V(\bar{X}_B)}{V(\bar{X}_A)} \qquad (n_A = n_B)$$

It is important to remember that for a given sample design, different characteristics may have different levels of efficiency.
More recently Kish (1965) has suggested the use of the simple random sample design as a baseline for quantifying the efficiency of complex sample designs. Kish introduced the word 'Deff' (design effect) to describe the ratio of the variance of the sample mean for a complex sample to the variance of a simple random sample of the same size (Kish, 1965). That is:

$$Deff = \frac{V(\bar{X}_c)}{V(\bar{X}_{srs})}$$

where the sample sizes $n_c$ and $n_{srs}$ are both equal, $V(\bar{X}_c)$ is the variance of the sample mean for a given complex sample design, and $V(\bar{X}_{srs})$ is the variance of the sample mean for a simple random sample of equal size.
For many commonly used sample designs and for many commonly used statistics in survey research we find that Deff is greater than unity. Consequently the use of formulae based on the simple random sample model to estimate standard errors may result in gross underestimation of sampling errors. Some values of Deff for a range of sample designs and statistics have been calculated and presented in Chapter 4.
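A short sketch of how a design effect is used in practice is given below. The numeric values are invented for illustration: given a Deff estimated for a particular statistic and sample design, the standard error computed from the simple random sampling formula is inflated by the square root of Deff.

```python
import math

def corrected_standard_error(se_srs, deff):
    """Inflate an SRS-based standard error to reflect a complex sample design.
    Since Deff = V_complex / V_srs, the corrected SE is sqrt(Deff) * SE_srs."""
    return math.sqrt(deff) * se_srs

# Illustrative values only (not results from this report).
se_srs = 0.50   # standard error from the simple random sampling formula
deff = 2.5      # design effect for this statistic under the complex design

print(corrected_standard_error(se_srs, deff))   # about 0.79: the SRS formula understates the error
```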
THE DATA USED IN THIS REPORT
This report has been made possible through the availability of suitable educational survey research data associated with a population of Australian students. These data were collected as part of a study which examined the contributions of the home, the school and the peer group to change in the educational achievements of students during the first year at secondary school in the Australian Capital Territory (Keeves, 1972). Although Keeves focused his study on a sample of 215 students, he also gathered data on a group of variables for the whole population of first year students in order to assist with the examination of peer group influences.
The population data obtained in the study carried out by Keeves are used to provide empirical examples of the sampling concepts discussed throughout this report. The types of data employed and the analytic procedures examined are constrained both by the availability of suitable variables in this population data file, and also by the author's aim to test the influence of commonly used sample designs on sample estimates derived from commonly used data analysis procedures.

The desired population in this study consisted of all first year students in the schools of the Australian Capital Territory in 1969. The population was distributed amongst 15 secondary schools: nine Government high schools, four Catholic high schools and two Independent high schools.
TABLE 1.4  The desired target population and the survey population

Type of school                                        Desired target population    Survey population
                                                      (source: CBCS records)       (source: Keeves' 1972 data)
Government schools                                            1714                         1611
Non-government schools (Catholic and Independent)              764                          743
All schools                                                   2478                         2354
By comparing the Commonwealth Bureau of Census and Statistics records (CBCS, 1970a, 1970b) with the population data achieved by Keeves we see that, as in most survey research studies, there have been some losses between the desired population and the survey population. The discrepancies in the two sets of figures in Table 1.4 may be attributed to absenteeism on the day of testing, the movement of students' families out of the Australian Capital Territory between the census date and the date of the data gathering programme, and the exclusion of one small classroom of children from the study because of the atypical nature of this class.

THE VARIABLES USED IN THIS REPORT
The term variable refers to a property whereby the members of a group or set differ one from another (Ferguson, 1971). Variables, as measured in educational survey research, are often crude indicators of the constructs which the researcher intended to measure. For example when a student takes an achievement test in mathematics, the resulting score is only an indicator of what he knows about mathematics. There is often a great deal of argument about the meaning of such an indicator with respect to the student's mathematical knowledge. Similarly, in an attempt to measure abstract concepts like the socio-economic status of a student's family background, researchers often create composite indicators from a range of measures of family income, education, and occupation.
The questions raised in the above discussion are concerned with problems of reliability and validity of measurement in educational research and are fully examined in standard educational research textbooks. In this report we will be concerned only with the inter-relationship between sample design and indicators which are commonly employed in educational survey research.
TABLE 1.5  The constructs and their indicators used in this report

Construct                                   Indicator name    Indicator definition
Sex of student                              SEX               Coded on a two point scale with male = 1, female = 2
Socioeconomic status of student's home      FATHERS OCC       The occupation of the student's father coded on a six point occupational prestige scale (Broom, Jones and Zubrzycki, 1965, 1968)
Student's attitude toward school            LIKE SCHL         A 17 item scale designed to measure the student's attitude towards school
Student's own expected level of education   EXP EDN           A seven point rating designed to measure the student's expected final level of education
Student's mathematical ability              MATHS             A test of 55 mathematics items each of which has been scored 1 for correct and 0 for incorrect
In Table 1.5 the constructs, and their indicators, which were used in this report are described in detail. Further information concerning the operational definitions of the indicators can be obtained from the study by Keeves (1972). The constructs were selected so as to provide a range of commonly used measures: a dichotomously coded indicator (SEX), an occupational prestige indicator (FATHERS OCC), an attitudinal indicator (LIKE SCHL), a self-rated expectation indicator (EXP EDN), and an indicator of school achievement (MATHS).
THE SAMPLE DESIGNS IN THIS REPORT

The sample designs examined in this report were selected because they represent five sampling procedures which are commonly used in educational survey research. These sampling procedures lead to probability sample designs because they ensure that each element in the survey population has a known non-zero probability of selection. The characteristics of each design are discussed in detail in the following chapter.
THE ERROR ESTIMATION TECHNIQUES IN THIS REPORT

The estimation of sampling errors in this report has been carried out in two main ways:

1. The use of Student's (1908) technique of developing sampling distributions from sample estimates obtained by repeated independent applications of each sample design described above, and

2. The use of the empirical error estimation techniques of jackknifing (Tukey, 1958) and balanced repeated replication (McCarthy, 1966, 1969a, 1969b) to estimate the sampling errors of statistics obtained from single samples of data.

The results of the use of these techniques are described in Chapter 4 and Chapter 5.
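As a sketch of the second approach, the function below computes a delete-one jackknife estimate of the standard error of an arbitrary statistic from a single sample. It is a generic illustration of the jackknife idea, not a reproduction of the specific replication schemes used later in the report; the data are invented.

```python
import math

def jackknife_standard_error(data, statistic):
    """Delete-one jackknife estimate of the standard error of `statistic`."""
    n = len(data)
    # Recompute the statistic n times, each time leaving one observation out.
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    # Jackknife variance: (n - 1)/n times the sum of squared deviations of the replicates.
    variance = (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
    return math.sqrt(variance)

# Usage: jackknife standard error of the mean for illustrative data.
scores = [28.0, 31.5, 25.0, 34.0, 30.5, 27.5, 33.0, 29.0]
print(jackknife_standard_error(scores, lambda d: sum(d) / len(d)))
```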
2. The Five Sample Designs
PROBABILITY SAMPLES, RANDOMISATION AND SAMPLE FRAMES
The sample designs examined in this report are all probability samples, with each element having a known non-zero probability of selection. These sample designs also represent designs which are commonly employed in educational survey research. Probability sampling requires that the actual selection of elements into the sample be made by a mechanical randomisation procedure that assigns the desired probabilities. If we wished to select a simple random sample without replacement by using the method described in Chapter 1, this mechanical procedure would become a difficult and cumbersome operation.
In most survey sampling operations we achieve our random selections by employing a table of random numbers in order to substitute for the shuffling process. Also, instead of a listing of names on cards, we often use a sampling frame as a means of deriving our sample selections with the assistance of the table of random numbers. For example, we could construct a sampling frame from enrolment lists obtained from the schools which constitute our population, assign an individual serial number to each student on these enrolment lists and, by reading off a set of numbers (ignoring duplicates) from a table of random numbers, obtain a simple random sample from our population.

In this report the survey population data consists of information obtained from the 2354 students (see Table 1.4) who participated in the data collection phase of the study carried out by Keeves (1972) in the Australian Capital Territory. This survey population has been arranged into a sampling frame in Table 2.1. The schools in the survey population have been broken into three strata; SYSTEM 1: Government schools, SYSTEM 2: Catholic schools, SYSTEM 3: Independent schools. Each school system has been listed by school and by class within school. Beside each class is given the number of students in the class and, at the end of each school list, the total number of students in the school is given in brackets. The sets of square brackets describe pairs of classes which have been combined into 'pseudoclasses' for the purpose of the application of the classroom cluster sample designs described later. A 'pseudoclass' is a combination of two or more classes which are combined because some classes contain fewer students than required for the sampling procedures being used.

Five sample designs are used in this report to draw samples of 150 from the sampling frame. The selected sample size was chosen as a balance between three competing requirements: the types of analyses which will be carried out on the data (Kerlinger and Pedhazur (1973) recommend between 100 and 200 subjects for regression analyses which do not involve large numbers of variables); the aim of considering a research model which would be within the economic and administrative resources of the typical educational research worker; and the desire to keep the sampling fraction (of 6.4 per cent) at a level which is considered to be small enough to minimise the finite population correction (Cochran, 1963).
TABLE 2.1  The sampling frame used in this report

SYSTEM 1 (Government schools): Schools 01 to 09, Classes 01 to 53
SYSTEM 2 (Catholic schools): Schools 10 to 13, Classes 54 to 68
SYSTEM 3 (Independent schools): Schools 14 and 15, Classes 69 to 75

(The frame lists the number of students beside each class, gives each school total in brackets, and marks with square brackets the pairs of classes combined into 'pseudoclasses'.)
SIMPLE RANDOM SAMPLING

Probability sampling requires that every element in the survey population has a known non-zero probability of selection. Simple random sampling represents the most basic type of probability sampling. However due to variations in terminology between statistical authors, the term 'simple random sampling' is often used to describe either of two different sampling techniques. In this report the term 'simple random sampling' will refer to simple random sampling without replacement. Kish provides the following operational definition of simple random sampling:

"From a table of random digits select with equal probability n different selection numbers, corresponding to n of the N listing numbers of the population elements. The n listings selected from the list, on which each of the N population elements is represented separately by exactly one listing, must identify uniquely n different elements" (Kish, 1965:36).

The procedure of 'unrestricted simple random sampling' requires that the selected elements are placed in the selection pool again and may be reselected on subsequent draws. In educational survey research it is not practicable or useful to employ this technique because of the obtrusive measurement problems which result from measuring the same student, teacher, etc. on more than one occasion. Simple random sampling is also preferable to unrestricted simple random sampling because it produces more precise estimators. The standard error of the sample mean for simple random sampling is reduced by a factor of $\sqrt{(N - n)/N}$ compared to unrestricted simple random sampling. However for most educational survey research applications the difference between the two standard errors is small because the sampling fraction, n/N, is usually very small.

Both simple random sampling and unrestricted simple random sampling give an equal probability of selection to each member of the population. This characteristic, called 'epsem' sampling (equal probability of selection method), is not restricted solely to these two sampling techniques. Equal probability of selection can result from either equal probability selection throughout the sampling process, or from variable probabilities that compensate for each other through the several stages of multistage sampling. Epsem sampling is widely applied in survey research because it usually leads to self-weighting samples, in which the simple arithmetic mean obtained from the sample data is an unbiased estimate of the population mean.

In this report the simple random sample design (the SRS design) was applied to the survey population described in Table 2.1 in order to obtain a simple random sample of 150 elements.
TABLE 2.2  Calculations for the selection of elements for the simple random sample design (SRS design)

Population size       N = 2354
Sample size           n = 150
Sampling fraction     n/N = 150/2354

Probability of selecting each element for the SRS sample design = 150/2354
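The selection step for the SRS design can be sketched as below, where a pseudo-random number generator stands in for the table of random digits. The frame is represented simply as the serial numbers 1 to 2354; in the report these serial numbers would correspond to the students listed in Table 2.1.

```python
import random

def select_srs(frame_size, sample_size, seed=1978):
    """Simple random sample without replacement: draw `sample_size` distinct
    serial numbers from a frame numbered 1..frame_size."""
    rng = random.Random(seed)            # stands in for a table of random digits
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))

selected = select_srs(frame_size=2354, sample_size=150)
print(len(selected), selected[:10])      # 150 distinct serial numbers, epsem with p = 150/2354
```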
STRATIFIED SAMPLING
The formulae for standard errors in Table 1.3 show that one way of increasing the precision of a simple random sample is to increase the sample size. Another way of increasing the precision of estimates in survey research is to use stratification. Stratification does not imply any departure from probability sampling - it merely requires that, before any selection takes place, the population be divided into a number of mutually exclusive groups called strata, and then following this division a random sample is selected within each stratum. Stratification may be used in survey research for reasons other than obtaining gains in precision. Strata may be formed in order to employ different sampling methods within strata, or because the subpopulations defined by the strata are designated as separate domains of study (Kish, 1965). Some typical variables which are used to stratify populations in educational survey research are school type (e.g. Government/Non-government) and school location (e.g. Metropolitan/Non-metropolitan).

Stratification does not necessarily require that the same sampling fraction is used within each stratum. If a uniform sampling fraction is used then the sample design is known as a proportionate stratified sample because the sample size from any stratum is proportional to the population size of the stratum. If the sampling fractions vary between strata then the obtained sample is a disproportionate stratified sample.

The gain in precision due to stratification may be examined by comparing the simple random sample design with a proportionate stratified simple random sample design when the sample size in each design is the same.
If we apply these two sample designs to the same population then, from the discussion of some theoretical considerations presented in Appendix A, we may write:

$$V(\bar{X}_{srs}) = V(\bar{X}_{prop}) + \frac{1}{n} \sum_{h=1}^{L} \frac{N_h}{N} (\bar{X}_h - \bar{X})^2$$

where

$V(\bar{X}_{prop})$ is the variance of the sample mean for the proportionate stratified simple random sample design,
$V(\bar{X}_{srs})$ is the variance of the sample mean for the simple random sample design,
$N_h$ is the size of the hth stratum (h = 1, ..., L),
$\bar{X}_h$ is the mean of the hth stratum,
$\bar{X}$ is the population mean,
$N$ is the population size, and
$n$ is the sample size for both designs.

This expression shows that the gain in sampling precision due to stratification depends on the magnitude of the differences between $\bar{X}_h$ and $\bar{X}$. That is, gains can be made in precision by choosing strata which exhibit a large amount of variation between stratum means and therefore also exhibit a large amount of homogeneity within strata.

The precision of proportionate stratified simple random sampling will always be greater than simple random sampling, for a given sample size, except when all stratum means are equal to the population mean. In this special case $V(\bar{X}_{prop})$ will be equal to $V(\bar{X}_{srs})$.

The gains in precision discussed above have been concerned with the sample mean as an estimator of the population mean. Later in this report we will examine the influence of stratification on the precision of more complex estimators.

In this report the proportionate stratified sample design (the STR design) was applied to the survey population described in Table 2.1 by dividing elements in the sampling frame into three strata representing the three different school systems: Government, Catholic, Independent. A sample of 150 elements was then selected with simple random sampling from each stratum so that the number selected from each stratum was proportional to the stratum size.
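Under the decomposition above, the relative precision of the two designs can be computed from stratum summary statistics. The sketch below does this for entirely hypothetical stratum sizes, means and element variances; it simply evaluates the two formulas and ignores the finite population correction.

```python
# Hypothetical strata: (stratum size N_h, stratum mean, stratum element variance S_h^2)
strata = [(1600, 28.0, 60.0), (550, 24.0, 55.0), (200, 33.0, 50.0)]
n = 150                                   # total sample size for both designs

N = sum(N_h for N_h, _, _ in strata)
grand_mean = sum(N_h * m for N_h, m, _ in strata) / N

# Proportionate stratified design: V_prop = sum(W_h * S_h^2) / n
v_prop = sum((N_h / N) * s2 for N_h, _, s2 in strata) / n

# Simple random sampling: V_srs = V_prop + (1/n) * sum(W_h * (mean_h - grand_mean)^2)
v_srs = v_prop + sum((N_h / N) * (m - grand_mean) ** 2 for N_h, m, _ in strata) / n

print(f"V_prop = {v_prop:.4f}, V_srs = {v_srs:.4f}, ratio = {v_srs / v_prop:.3f}")
```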
TABLE 2.3  Calculations for the selection of elements within strata for the proportionate stratified sample design (STR design)

Stratum          N_h      n_h      n_h (rounded)
Government       1611     102.7    103
Catholic          539      34.3     34
Independent       204      13.0     13
Total            2354     150      150

Probability of selecting each element for the STR sample design = n_h/N_h = 150/2354
Table 2.3 describes the calculations required for determining the number of elements required to be selected from each stratum. In this study we have N = 2354 and n = 150, therefore the number of elements, $n_h$, required to be selected from the hth stratum, containing $N_h$ elements, is

$$n_h = \frac{150\,N_h}{2354}$$

The values of $n_h$ were rounded for use in this study.
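This allocation is a one-line calculation, shown below with the stratum sizes from Table 2.3.

```python
# Proportionate allocation n_h = n * N_h / N for the STR design (Table 2.3).
strata_sizes = {"Government": 1611, "Catholic": 539, "Independent": 204}
n, N = 150, sum(strata_sizes.values())   # N = 2354

for stratum, N_h in strata_sizes.items():
    n_h = n * N_h / N
    print(f"{stratum:12s} n_h = {n_h:6.1f} -> {round(n_h)}")
```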
DISPROPORTIONATE STRATIFIED SAMPLING AND WEIGHTING

The simple random sample design is called a self-weighting design because each element has the same probability of selection, equal to n/N. For this design each element has a weight of 1/n in the mean, 1 in the sample total, and 1/f in the population total, where f = n/N is the uniform sampling rate for all population elements (Kish, 1965).

In a disproportionate stratified sample design we employ different sampling fractions in the defined strata of the population. The chance of an element appearing in the sample is specified by the sampling fraction associated with the stratum in which that element is located. The reciprocals of the sampling fractions, which are sometimes called the raising factors, tell us how many elements in the population are represented by an element in the sample. At the data analysis stage we may use either the raising factors, or any set of numbers proportional to them, to assign weights to the elements. The constant of proportionality makes no difference to our estimates. However, in order to avoid confusion for the readers of survey research reports, we usually choose the constant so that the sum of the weights is equal to the sample size (Peaker, 1968).

For example, consider a stratified sample design of n elements which is applied to a population of N elements by selecting a simple random sample of $n_h$ elements from the hth stratum containing $N_h$ elements. In the hth stratum the probability of selecting an element is $n_h/N_h$, and therefore the raising factor for this stratum is $N_h/n_h$. That is, each selected element represents $N_h/n_h$ elements in the population. The sum of the raising factors over all n sample elements is equal to the population size. If we have two strata for our sample design then:

$$\underbrace{\left(\frac{N_1}{n_1} + \cdots + \frac{N_1}{n_1}\right)}_{n_1 \text{ elements}} + \underbrace{\left(\frac{N_2}{n_2} + \cdots + \frac{N_2}{n_2}\right)}_{n_2 \text{ elements}} = N_1 + N_2 = N$$

In order to make the sum of the weights equal the sample size, n, both sides of the above equation will have to be multiplied by a constant factor of n/N. Then we have:

$$\underbrace{\left(\frac{nN_1}{Nn_1} + \cdots + \frac{nN_1}{Nn_1}\right)}_{n_1 \text{ elements}} + \underbrace{\left(\frac{nN_2}{Nn_2} + \cdots + \frac{nN_2}{Nn_2}\right)}_{n_2 \text{ elements}} = n$$

Therefore the weight for an element in the hth stratum is $\frac{nN_h}{Nn_h}$. For the special case of proportionate stratified sampling which was discussed in the previous section we have $n_h/N_h = n/N$ for each stratum. The sample element weight is then equal to 1 and we therefore describe this design as a self-weighting design.

In this report the disproportionate stratified sample design (the WTD design) was applied to the survey population described in Table 2.1 in order to obtain a disproportionate stratified sample of 150 elements. The stratification procedures used in the proportionate stratified design (the STR design) were also employed in this design. After dividing the sampling frame into three strata: Government, Catholic, Independent, a sample of 50 elements was drawn from each stratum. The selection of elements within each stratum was not carried out with simple random sampling for this design; instead a two stage selection procedure of two classrooms followed by the selection of 25 students within each selected classroom was applied to each stratum. This multistage sampling procedure is discussed more fully in the following section. Within each stratum, this sampling procedure ensured that there was an equal probability of selection for the elements.
TABLE 2.4  Calculations for the selection of elements within strata for the disproportionate stratified sample design (WTD design)

Stratum          N_h      n_h     Raising factor    Weight
Government       1611      50        32.22          2.0531
Catholic          539      50        10.78          0.6869
Independent       204      50         4.08          0.2600
Total            2354     150

The probability of selecting an element for the WTD sample design = 50/N_h
Table 2.4 describes the calculation of the weights for the WTD design. Following Peaker's recommendation, the sum of the sample weights over the selected elements was made equal to the sample size. The weights were rounded to two decimal places for use in the analyses described in Chapters 4 and 5.
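The weights in Table 2.4 follow directly from the raising factors, scaled so that they sum to the sample size. A sketch of the calculation:

```python
# Weights for the WTD design (Table 2.4): weight_h = (n / N) * (N_h / n_h).
strata = {"Government": (1611, 50), "Catholic": (539, 50), "Independent": (204, 50)}
N = sum(N_h for N_h, _ in strata.values())        # 2354
n = sum(n_h for _, n_h in strata.values())        # 150

total = 0.0
for stratum, (N_h, n_h) in strata.items():
    raising_factor = N_h / n_h                    # population elements represented by one selection
    weight = raising_factor * n / N               # scaled so the weights sum to n
    total += weight * n_h
    print(f"{stratum:12s} raising factor = {raising_factor:5.2f}, weight = {weight:.4f}")

print(f"sum of weights over the sample = {total:.1f}")   # equals 150
```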
CLUSTER SAMPLING
When data are gathered in educational survey research with a simple random sample design, the individual selection and measurement of population elements often becomes too expensive. In order to reduce costs by minimising the geographical spread of the selected sample, survey researchers often employ cluster sampling designs. Cluster sampling involves the division of the population of elements into groups or clusters which serve as the initial units of selection. Sometimes the selection of clusters as the primary sampling units is followed by selecting a simple random sample of elements within the selected clusters. When there is more than one stage of selection we refer to the sample design as a multistage sample design. The most simple form of multistage sampling is the simple two stage cluster sample design.

The influence of the selection of elements in clusters on precision may be examined by comparing the simple random sample design with a two stage cluster sample design when the sample size in each design is the same. Consider a sample of n elements drawn from a population of N elements divided into equal sized clusters. Select m clusters with simple random sampling and then select $\bar{n}$ elements from each selected cluster by using simple random sampling. If we apply these two sample designs to the same population then from Appendix A we obtain:

$$V(\bar{X}_{cl}) = V(\bar{X}_{srs})\,[1 + (\bar{n} - 1)R]$$

where

$V(\bar{X}_{cl})$ is the variance of the sample mean for the above simple two stage cluster design,
$V(\bar{X}_{srs})$ is the variance of the sample mean for the simple random sample design,
$\bar{n}$ is the ultimate cluster size, and
$R$ is the coefficient of intraclass correlation.

The above expression shows that the sampling accuracy of the simple two stage cluster sample design depends, for a given ultimate cluster size, on the value of the coefficient of intraclass correlation. A discussion of the statistical properties and the historical background of this coefficient is presented in Appendix A of this report. At this stage we may briefly describe the coefficient of intraclass correlation as a measure of the degree of homogeneity within clusters. When the elementary units within clusters tend to be similar with respect to some characteristic, the intraclass correlation between elementary units within clusters for that characteristic will be high. Conversely, if the elementary units within clusters are relatively heterogeneous with respect to the characteristic, the intraclass correlation will be low positive or, in very unusual situations, even negative (Hansen et al, 1953).

In educational survey research R is generally positive for achievement measures within schools. That is, the homogeneity of students within schools with respect to achievement is greater than if students were assigned to them at random.
The source of this homogeneity may be due to selective factors in grouping, to joint exposure to similar influences, to the effects of mutual interaction, or to some combination of these three sources. It is important to remember that the coefficient of intraclass correlation may take different values for different variables, different populations and different clustering units.

Since R is generally positive for a wide range of characteristics concerning students within schools, we find that the precision of the simple two stage cluster sample is less than for a simple random sample of the same size. When contemplating the selection of clusters rather than elements in an educational survey research study, the researcher must balance the losses in precision due to clustering against the advantages of reduced costs arising from the selection and measurement of fewer primary sampling units.

In this report two self-weighting two stage cluster sample designs were each applied to the survey population described in Table 2.1 to obtain samples of 150 elements for each design. In the first of these designs the school was used as the primary sampling unit (the SCL design); in the second design the classroom was used as the primary sampling unit (the CLS design). For both of these sample designs the primary sampling units differ greatly in size. If we choose the primary sampling units with simple random sampling then a self-weighting design would require the use of the same sampling fraction within each selected cluster. Unfortunately by using this procedure the final sample size would depend on which primary sampling units were chosen first. Since it was considered important to constrain each sample design used in the study so that each sample design selected exactly 150 elements, it was necessary to modify the simple two stage sample design described above.

One method of obtaining greater control over the sample size, and yet ensuring a self-weighting design, is to select the primary sampling units with probability proportional to size (PPS), and then select equal sized ultimate clusters from the selected primary sampling units. The following formula indicates a given element's probability of selection in a PPS sample design:

Element probability = (number of primary units selected) x (size of the primary unit / population size) x (number of elements selected within the primary unit / size of the primary unit)

This formula simplifies to:

Element probability = (number of primary units selected) x (number of elements selected within each primary unit) / (population size)

That is, if we have equal sized ultimate clusters then the 'Element probability' will be constant for all elements. Further, we have control over our sample size according to the following formula:

Sample size = (number of elements selected within each primary unit) x (number of primary units selected)
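The sketch below illustrates this two stage PPS procedure on a small invented frame of clusters, using systematic selection with a random start on the cumulated size list. Each cluster's chance of selection is proportional to its size, 25 elements are then drawn within each selected cluster, and the element probability works out to (clusters selected x 25) / population size for every element, as in the formula above.

```python
import random

def pps_systematic(clusters, n_select, seed=3):
    """Select `n_select` clusters with probability proportional to size,
    using a random start and a fixed interval on the cumulated size list."""
    total = sum(size for _, size in clusters)
    interval = total / n_select
    rng = random.Random(seed)
    start = rng.uniform(0, interval)
    targets = iter(start + k * interval for k in range(n_select))
    target = next(targets)
    selected, cumulated = [], 0
    for name, size in clusters:
        cumulated += size
        while target is not None and target < cumulated:
            selected.append(name)
            target = next(targets, None)
    return selected

# Invented frame: (pseudoclass, number of students); population of 600 students.
frame = [(f"class_{i:02d}", size) for i, size in
         enumerate([28, 35, 31, 26, 30, 33, 29, 36, 27, 34, 30, 25,
                    28, 32, 27, 29, 31, 35, 26, 28], start=1)]

chosen = pps_systematic(frame, n_select=6)
print(chosen)
# Element probability = (6 * 25) / 600 = 0.25 for every student,
# and the sample size is fixed at 6 * 25 = 150 elements.
```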
Table 2.5 describes the calculations for the two self-weighting two stage cluster sample designs examined in this study. The first sample design (the SCL design) used schools as the primary sampling units while the second sample design (the CLS design) used pseudoclasses as the primary sampling units. In both sample designs six primary sampling units were selected with probability proportional to size; within each selected primary sampling unit 25 elements were selected by using simple random sampling.
Note that the first stage of selection for the CLS and WTD sample designs required that the sampling frame be adjusted by the creation of some 'pseudoclasses' in order to ensure that each primary selection would contain at least 25 sample elements. The eight pseudoclasses which were created prior to the execution of the CLS and WTD designs are indicated by square brackets in Table 2.1.

TABLE 2.5  Calculations for the selection of elements for the two self-weighting two stage cluster sample designs (SCL design and the CLS design)

Sample design    Primary sampling units                  Secondary sampling units                  Sample size
                 (selected with probability              (selected with simple
                 proportional to size)                   random sampling)
SCL design       6 schools                               25 elements                               150
CLS design       6 classes                               25 elements                               150

Probability of selecting each element for both the CLS and the SCL sample designs = (6 x 25)/2354 = 150/2354
The WTD sample design, which was described in the previous section, employed the same sampling procedures within strata as the CLS design. Within each of the three strata two classes were selected with probability proportional to size, and then a simple random sample of 25 elements was selected from each of the selected pseudoclasses.
SUMMARY The desired population in this report consisted of all first year students in the schools of the Australian Capital Territory in 1969. This population was distributed amongst 15 secondary schools: nine Government high schools, four Catholic high schools and two Independent high schools. This desired population was reduced, for the reasons presented in Chapter 1, to the survey population described by the sampling frame in Table 1.4 and Table 2.1. Five probability sample designs which are commonly employed in educational research were selected for examination in this report. The five designs, and a summary of their characteristics are listed in Table 2.6.
Evaluation in Education TABLE 2.6
Dasign
of the characteristics in this report
Swnnnry
used
N&or
of selection; stager
Stratification variable
of the five
Probability of element selection
smpla
designs
Element weight
SRS
Ofl@
None
Self-weighting
STR
One
School system
Self-weighting
SCL
Two
None
Self-weighting Self-weighting
EITD
Two
School systes
50 %
150Nh n54nh
_ __ The survey Population for all designs is described in Table 2.1. The sallplesize for all designs is 150 elements.
3. The Analytic
THE
CAUSAL
Model
MODEL
In order to examine the influence of sample design on sampling errors in educational survey research, it was decided to compare the influence of the five sample designs discussed in the previous chapter on a commonly used multivariate analysis technique. The choice of an analysis technique was constrained by the data which were available for the survey population (see Table 1.5), and also by the author's wish to apply these data to a reasonably realistic educational research situation. Given these limitations, a simple causal model was designed which required the calculation of a range of often used statistics: means, correlation coefficients, regression coefficients and multiple correlation coefficients. The selected causal model is presented in Fig.3.1. The causal model in Fig.3.1 contains four stages which assume the following causal sequence: antecedent student characteristics (SEX, FATHERS OCC) -+ attitudes toward school (LIKE SCHL) + expectations for further education (EXP EDN) + achievement in mathematics (MATHS). This causal sequence assumes that the causal flow in the model is unidirectional and therefore a variable cannot be both the cause and effect of another variable. The straight arrows in Fig.3.1 denote causal links between variables in the direction shown by the arrowheads. The curved bi-directional arrow indicates the possibility of noncausal correlation between variables. The analysis technique required for the evaluation of the magnitudes of the path coefficients between variables, according to certain special constraints which will be described below, has become known as 'path analysis' (Moser and Kalton, 1971). The causal order described above would, in a well designed research study, require to be supported by a cogent argument based on available theory and previous research. However, for the purposes of this report, the validation of the causal sequence will have no effect on our study of the implications of choice of sample design - provided that we accept that the model leads to a reasonably realistic representation of analysis techniques which are commonly used in educational research. The model assumes that the sex of a student (indicated by the variable SEX) and the socio-economic status of a student's home (indicated by the variable FATHERS OCC) are exogenous variables, that is their variability is assumed to be determined by causes outside the causal model. No attempt will be made to explain the variability of the exogenous variables. The mediating causal influences (indicated by the variables LIKE SCHL and EXP EDN) and the achievement criterion for the model (indicated by MATHS) are endogenous variables in which variability is explained by exogenous and/or other endogenous variables 131
Evaluation in Education
132
in the system. Since it is never possible to account for the total variability of a variable, residual variables (a,b,c) are introduced to indicate the effects of variables not included in the model. It is assumed that a residual variable is neither correlated with other residuals nor with variables in the model to which it is not attached.
Fig.
3.1
The
causal
mode 2
If we let Zi denote a standardised score on variable i then we may represent each endogenous variable by an equation consisting of the variables upon which it is assumed to depend. For each independent variable there is a path coefficient, pi, indicating the amount of expected change in the dependent variable as a result of unit change in the independent variable. ZL
=
'SL 'S + 'FL
ZE
=
PSE 'S + PFE 'F + pLE 'L + PbE ‘b
zM
=
PSM 'S + PFM 'F + pLM 'L + PEM 'E + pcM 'c
'F + paL 'a
The equations are:
Sample
By using
the
conditions
that
E(ZiZj)
= rij,
Design for Educational
E(Zf)
= 1, and
Survey Research
the
independence
133
of
residuals as described above, the above system of equations may be solved for the path coefficients (Kerlinger and Pedhazur, 1973). The solution of these equations demonstratesthat the path coefficients are equal to standardised partial regression coefficients.
THE
PARAMETERS
OF THE
CAUSAL
MODEL
Prior to examining the influence of the five sample designs on the statistics of the model were calcurequired to describe the causal model, the parameters The following discussion briefly lated from the complete population data. considers some of the more interesting facets of these analyses. The variables and their names (SEX, FATHERS OCC, LIKE SCHL, EXP EDN and MATHS) The results of applying the causal model to the are described in Chapter 1. population data are presented in Table 3.1 and Fig.3.2.
Variable LIKE SCHL
SEX
FATHERS OCC
FATHERS OCC LIKE SCHL EXP EDN MATHS
-0.0110 0.1491 -0.0969 -0.0761
-0.1424 -0.4148 -0.3749
0.3980 0.2200
0.5148
Mean Multiple correlation coefficient
1.4766
3.1252
21.3229
4.2616
29.3416
0.2051
0.5596
0.5465
EXP EDN
MATHS
When inspecting the path diagram in Fig.3.2 care must be taken to remember the coding conventions used to score the variables in the model. The variables LIKE SCHL, EXP EDN and MATHS were all scored in the 'usual' direction which assigns a high positive numerical value to a relatively high rating on the variable. For example a score of 50 (out of a possible 55 items) on the MATHS variable demonstrates that the individual has performed at a relatively high level. By convention FATHERS OCC has been coded in the 'opposite' direction to the above. That is a high positive score on this variable assigns a low relative rating on the scale of occupational prestige. For the variable SEX a coding convention has been imposed by coding 1 for males and 2 for females. All of the above coding conventions must be kept in mind when we are considering the sign and magnitude of the 'causal' paths between variables. The path coefficient pSL = 0.1476 is positive which suggests that scoring high on the SEX variable individual's
(that score
is being female) on the LIKE SCHL
will cause variable.
a corresponding increase in an However the path coefficient
134
Evaluation
in Education
pSE = -0.1560
is negative
variable
cause
will
and this
a decrease
suggests
in score
that
a similar
on the EXP EDN
change
in the SEX
variable.
0.8375
u
Path coefficier.ts calcuZated from population data
In Fig.3.3 the path diagram has been redrawn by omitting path coefficients which are smaller in lllagnitude than 0.1000 (this level has been arbitrarily chosen in order to clarify the diagram). The presentation of the causal model in Fig.3.3 suggests that the variables SEX and LIKE SCHL have no direct influence on MATHS achievement. These two variables influence MATHS performance by working through the mediating variable EXP EDN. The variable FATHERS OCC has both a direct effect on MATHS performance and an indirect effect by working through the variable EXP EDN. The variable EXP EDN has the strongest effect on MATHS performance. However this variable is influenced by three earlier variables in the causal sequence. By considering the size of the path coefficients for the residual variables (a,b and c) it may be seen that a considerable amount of variation is left unexplained by the model. It should therefore be remembered that this path model, and most other path models in sociological and educational research, greatly oversimplify the nature of causation in the real world.
Sample
Fig.
3.3
Design for Educational
Survey Research
Path diagram after omitting smal2 path coefficients
To summarise the causa 1 model presented in Fig.3.3 we may say that the higher the occupational prest i ge of the father of a student (which probably means the higher the educational and cultural climate of the home) the higher the mathematical achievement of the student. The jnfluence of the father's occupational prestige occurs direct y and also indirectly by encouraging a positive attitude towards school and an ncreased level of educational aspiration. The sex of a student has no direct link with mathematics achievement, although it has links through other mediating variables. These indirect influences work through a student's attitude towards school (with female students expressing a more positive attitude towards school which thereby leads to higher educational aspirations) and through a student's level .of educational aspiration (with male students having a higher level of educational aspiration).
4. The Comparison
STUDENT'S
of Sample Designs
EMPIRICAL
SAMPLING
METHOD
In the previous chapter the parameters of the causal model were calculated from the complete population data. In order to find the parameters of the model it was necessary to calculate means, correlations, standardised regression coefficients and multiple correlation coefficients. We now turn our attention to the sampling errors associated with the estimation of these parameters from samples of data obtained by using the five sample designs described in Chapter 2. For the simple random sample design (the SRS design) a set of computational formulae are readily available for the calculationof the standard errors of all the statistics which are required for the causal model (see Table 1.3). However, similar complete sets of formulae are not readily available for the other four sample designs (the STR, SCL, CLS and WTD designs).
In order to circumvent the problem of the unavailability of deductive methods for these complex sample designs we will turn to the empirical sampling methods which were employed by Student in order to establish the sampling distribution of the mean for simple random sampling. "Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W.R. Macdonell (Biometrika, Vol.1, The measurements were written out on 3000 pieces of cardboard p.219). which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book which thus contains the measurements of 3000 criminals in a random order. Finally each consecutive set of four was taken as a sample - 750 in all - and the mean, standard deviation, and correlation of each sample determined" (Student, 1908:13). Student used the 750 sample estimates to generate the sampling distribution of This distribution assisted Student to correctly guess the mathematithe mean. cal form of the t distribution - it was not until 17 years later that the guess was analytically verified by R.A. Fisher (Mosteller and Tukey, 1968).
136
Sample
THE
EMPIRICAL
GENERATION
OF SAMPLING
Design for Educational
Survey Research
137
IN THIS REPORT
DISTRIBUTIONS
Following Student's technique of developing sampling distributions, the five sample designs were each applied to the survey population, described in Table 2.1, on 25 independent occasions in order to obtain 25 independent samples of The data sets derived from these inde150 elements for each sample design. pendent replications were then used to estimate the parameters of the causal model. For each sample design there were 25 independent estimates obtained for each of the parameters which describe the causal model. The standard deviations of the sets of 25 estimates obtained from each sample design provided an estimate of the accuracy of the sample estimates obtained for each design. The accuracy of a particular sample design in estimating a particular population parameter was measured by using the simple random design as a standard. Kish (1965a), concentrating mainly on the discrepancies which arise because research workers in the social sciences tend to assume that all samples are has introduced the word 'Deff' (design equivalent to simple random samples, effect) in order to compare the efficiency of a complex sample design with the efficiency of a simple random sample design.
DESIGN
Kish
defined
the
EFFECTS
design
AND
effect
(for
THE
EFFECTIVE
a statistic
SAMPLE
such
SIZE
as the sample
mean
"the ratio of the actual variance of a sample to the variance of a simple random sample of the same number of elements"(Kish, 1965a:258). That
is, Deff
= v(n
)
Tg-) where
V(Yc)
is the
sample and
variance
of the
is the
variance
sample
of equal
size.
random
V(Rsrs)
sample
= N - n u-n
population
n is the
sample
and
S2 is the into =
a given
S2 n
drawn
mean
without
for
complex
of the
population
which
defines
elements. Deff:
a simple
replacement
size,
the expression V(T,)
sample
size,
variance
N-n -sN
of the
of elements
N is the
Deff
for
5'
where
Substituting
mean
design,
V(Fsrs)
For a simple
sample
random
we have
X) as:
138
Evaluation
or
in Education
V(Zc) = N - n N
lg,Deff n
Kish (1965a) established that s2 computed from any large probability sample yields a good approximation of S?. The approximation is quite accurate when Deff is near one; in other cases with smaller samples it neglects a term of order 1. By using an estimate of Deff, obtained mostly from past experience, ii and s* as an estimate of S2 the above equation may be used to obtain an estimate of the sampling error of the sample mean when complex sample designs are used. Sample designs have also been compared by using the concept of the 'effective sample size' (Kish, 1965a) or the 'simple equivalent sample' (Peaker, 1953, 1967b). From the above equation we have: V(E,) = N - n N.ii‘
S2 Deff
Now consider a simple random sample of n* elements drawn from the same population. Let the variance of the sample mean for this sample, V*(iisrs), be equal to the variance of the sample mean for the complex sample design, V(Zc). Now, for a simple random sample of n* elements drawn without replacement: V*(Xsrs) = N - n N'i;-
S2
But since V*(Xs,,) = V(X,) we may write N - n N'ii'
S? Deff = -. N - n* N
S2 F
If N is large compared to n or n* is the size of the simple equivalent sample (or the effecthen n* = _g_ Deff tive sample size). The value of the design effect and the size of the simple equivalent sample for the sample mean was calculated for each statistic obtained from the five sample designs. The value of the numerator in the equation which defines 'Deff' was estimated from the sample variance of the empirical distribution of means obtained from the 25 replications of each sample design. The value of the denominator was estimated from the sample variance.of the empirical distribution of means from the 25 replications of the simple random sample design. For statistics other than the sample mean thesame formula was used with the appropriate empirical variance estimates.
Sample Design for Educational
THE
The
restilts of these
This
tables
EMPIRICAL
SAMPLING
calculations
describes
values
DISTRIBUTION
for
sample
of mf
means
for each
139
Survey Research
OF MEANS
are displayed
of the sample
in Table
designs.
4.1.
Values
of JDeff in preference to Deff are presented because they are more meaningful when used in a discussion of sampling standard errors (as distinct from a discussion of sampling variance). We may rewrite the above expression which defines Deff in terms of standard error notation. SE(xc) The
= JDeff.SE(xSrS)
use of JDeff
has
also
design effect because m Kish and Frankel, 1970). Kish
and
Frankel
it is useful
(1970)
to average
been
preferred
is less
recommend a series
in the
subject
that
when
of values
presentation
to extreme
reporting of JDeff
of measures
values
many
(Kish,
design
instead
of the
1969;
effect
values
of consi dering
The values of mf are averaged for particular individual values. rather than variance measures because variance measures are subject ations due to differences in units of measurement and sample size.
statistics to fluctu-
different Averaging may only be undertaken over particular statistics because statistics exhibit systematically different values of the design eff ect. Similarly we only average over particular sample designs for particular samples because the design effect depends not only on the statistic considered and the sample design but also on the nature of the target population from which the data were obtained (Kish and Frankel, 1970).
Design SCL
Variable
SRS
STR
SEX FATHERS OCC LIKE SCHL EXP EON MATHS
1.00 1.00 1.00 1.00 1.00
0.87 0.80 0.83 0.97 0,91
2.53 1.43 1.15 1.40 1.28
3.27 1.76 2.29 2.73 3.95
1.72 2.58 1.77 3.70 4.69
Average JDeff for means
1.00
0.88
1.56
2.80
2.89
CLS
WTD
The values of average JDeff are presented in Table 4.1 for each of the five sample designs (SRS - the simple random sample design, STR - the proportionate stratified simple random sample design, SCL the self-weighting two stage cluster sample design which selected schools as the first stage of sampling, CLS - the self-weighting two stage cluster sample design which selected classrooms as the first stage of sampling, WTD - the stratified two stage cluster sample design which selected classrooms as the first stage of sampling).
140 The
Evaluation
in Education
of average JDeff in Table 4.1 may now be used to compare the accurthe four complex sample designs with the accuracy of a simple random The stratified sample design STR shows a gain in accuracy design. was theoretically demonstrated in Chapter 2) whereas the other complex designs show substantial losses in accuracy.
values
acy of sample (which sample
The potential danger of disregarding considering confidence limits which for
JDeff.
For
example,
if we
the design effect may be demonstrated by are calculated with and without adjustment
consider
the
probability
proportional
to size
selection of classrooms, the CLS design, the average value of JDeff is 2.80. Assuming a normal distribution the use of lt1.96 standard errors in two tailed tests allows for errors 0.05 of the time. But if we ignore the value of m this interval is really equivalent to kO.70 standard errors which allows for errors 0.48 of the time. Table 4.2 gives values of the probability of incorrect statements when two sided confidence intervals are aimed at for each of the average
values TABLE
of JDeff. 4.2
Design
Probability the size
of
oJw incorrect the design
Average JDeff
SRS STR SCL CLS WTD
is
statements ignored
about
smple
warn
lzhen
Probability of incorrect statement when a two sided confidence interval of p = 0.05 is aimed at 0.05 0.03 0.21 0.48 0.50
1.00
0.88 1.56
2.80
2.89
The magnitude of the design effect may also be considered by calculating n* the size of the simple equivalent sample. The simple equivalent sample is that simple random sample of elements drawn from the population which would have the From the derisame variance of the sample mean as the given complex sample. where n is the size of the vation given in this chapter we have n* =n complex sample. Deff
Design SCL
CLS
WTD
23 1:;
14 ii
181
77 92
20 10
51 23 48 11 7
194
62
19
18
Variable
SRS
STR
SEX FATHERS OCC LIKE SCHL EXP ED8 MATHS
150 150 150 150 150
198 234 218 159
Size for average /Deff
150
Sample Design for Educational
Survey Research
141
From the contents of Table 4.2 and Table 4.3 it can be seen how the use of clustered sample designs may destroy the validity of confidence limits if the value of the design effect is not taken into account. The influence of clustering is especially dangerous when data are gathered from intact classrooms, as in the CLS and WTD designs. accuThe stratified simple random sample design, STR, showed better sampling However, this design is presented racy than the simple random sample design. interest because the administrative difficulties assomainly for theoretical ciated with this design are such that very few researchers are able to apply it to the dispersed populations which are typically found in educational and sociological research. The magnitude of the design effect for the CLS design is consistently greater than for the SCL design. Since the two designs are the same except for the nature of the primary sampling unit, this suggests that the clustering effect of students within classrooms is greater than that of students within schools. This in turn suggests that the value of the coefficient of intraclass correlation, R, is greater for classrooms than for schools. Table
4.4
presents
rooms
and schools.
values The
of R calculated
table
also
from
includes
the population
values
data
of Jl+(ii-l)R
for
which,
classin
Appendix A, has been shown to be equivalent to the value of JDeff for means obtained from the simple cluster design under the assumptions of a one way random analysis of variance model. Since both designs have used a value of 25 for also with
the ultimate set the
cluster
at a value assistance
size,
of 25. of the
the
value
of JDeff
The calculation computer program
R
SEX FATHERS OCC LIKE SCHL EXP EON MATHS
Average of SbefTffor means in simple cluster design
0.33 0.11 0.06 0.04 0.05
has been
calculated
with
of R and mwere carried out INTRA (Ross and Slee, 1975).
Schools JDeff -estimate 2.99 1.91 1.56 1.40 1.48
1.87
R
Classrooms JDeff estimate
0.31 0.24 0.10 0.27 0.57
2.91 2.60 1.84 2.73 3.83
2.78
ii
142
Evaluation
in Education
The values of R demonstrate that for all variables, except SEX, there is a greater homogeneity of students within classrooms than students within schools. This clustering effect is especially noticeable for the MATHS variable which suggests that there has been some form of classroom ability streaming carried out within the schools. The average values of JDeff for the simple cluster design provide very accurate approximations to the similar SCL and CLS designs. This accuracy shows that provided we can obtain accurate estimates of R then the formula Deff = l+(n-l)R will provide a reasonably accurate method for estimating design effects for a variety of clustered sample designs. In the case of a stratified cluster design, as used in the IEA Science Project (Comber and Keeves, 1973), we would expect the formula to provide conservative estimates of error because of the gain in accuracy due to stratification. However, as can be seen from the mf values for the WTD design, if weighting is used to adjust for a highly disproportionate allocation to strata then the formula may provide an approximate but not a conservative estimate of the magnitude of the design effect.
THE EMPIRICAL SAMPLING DISTRIBUTIONS OF CORRELATION COEFFICIENTS AND STANDARDISED REGRESSION COEFFICIENTS The results of these calculations are displayed in Table 4.5 and Table 4.6. The STR design once again shows better accuracy than the SRS design for both correlation coefficients and standardised regression coefficients. The SCL design shows a minor loss in accuracy, however, the CLS and WTD designs provide average m values which show that errors would be greatly underestimated by the use of the simple random sampling distribution.
TABLE 4.5
Empirical
Coefficient
estimates of JDeff for corwlation
SRS
STR
Design SCL
coefficienE
CLS
WTD
'SF
1.00
1.11
1.33
1.95
1.12
'SL
1.00
0.73
1.03
1.57
1.04
rSE
1.00
1.14
1.76
2.18
1.81
'SM
1.00
0.97
1.13
2.42
2.27
'FL
1.00
1.06
0.87
0.81
1.81
'FE
1.00
0.69
1.12
1.01
1.70
'FM
1.00
0.97
1.12
1.36
3.03
'LE
1.00
1.04
1.02
1.44
1.68
rLM
1.00
0.84
0.89
0.82
1.30
'EM
1.00
1.11
1.23
1.75
2.74
Average of JDeff for correlation coefficients
1.00
0.97
1.15
1.53
1.85
Sample Design for E~ucatj~nal Survey Research
Design XL
CLS
WTD
SRS
STR
1.00
1.01
1.31
2.14
1.48
1.00
1.01
0.86
0.93
1.86
1.00
0.92
1.45
1.30
1.77
1.00
0.65
1.04
1.04
1.56
1.00
0.91
1.03
1.39
1.51
1.00
0.92
1.05
1.66
2.27
1.00
1.10
1.14
1.52
1.87
1.00
0.83
1.05
1.06
1.28
1.00
1.37
1.51
2.19
1.96
Average of JDeff for standardised regression 1.00 coefficients
0.97
1.16
1.47
1.73
Coefficient
143
The magnitude of the underestimation of error may be examined by considering Table 4.7. This table is developed under the distribution assumptions which were stated prior to the presentation of a similar table for means in Table 4.2. TABLE 4.7
Design
SRS STR SCL US WTO
Probability of incorrect statements about comeZation coefficients and standardized regression coefficients when the size of the design effect is ignored
Probability of incorrect statement when a two sided confidence interval of p = .05 is aimed at r b
Avera e m *
1.00 0.97 1.15 1.53 1.85
1.00 0.97 1.16 1.47 1.73
0.05 0.04 0.09 0.20 0.28
0.05 0.04 0.09 0.18 0.26
One of the most striking features of Table 4.5 and Table 4.6 is the similarity for particular sample designs of the average JDeff values for correlation coefficients and standardised regression coefficients. The closeness of these values suggests that for a variety of complex sample designs the researcher might use the average JDeff for correlations as a reasonably accurate and conservative estimate of the average mf efficients.
for standardised regression co-
144
Evaluation
in Education
Although the probability of making incorrect statements about correlation coefficients and standardised regression coefficients is less than that for means, Table 4.7 shows that there is still an unacceptably high danger of making errors especially when samples based on clusters of classrooms, as in the CLS and WTD designs, are used.
THE
The
EMPIRICAL
empirical
SAMPLING
estimates
DISTRIBUTION
of the
design
OF MULTIPLE
effect
CORRELATION
for multiple
COEFFICIENTS
correlation
co-
The small average values of m are efficients are presented in Table 4.8. similar to the values for correlation coefficients and standardised regression However, the values of the average coefficients for the STR and SCL designs. JDeff for the CLS design and especially the account when using the previously described presented in Table 4.9.
FABLE
4.8
Empirical estimates of /Deff
Coefficient
SRS
STR
WTD design must be taken into two sided probability statements
fornuitiple
correlation coefficients
Design SCL
CLS
WTD
RL
1.00
1.22
1.38
1.61
RE
1.00
0.78
0.87
1.02
1.96
RM
1.00
0.92
1.03
1.31
2.56
Average of JDeff for multiple correlation coefficient5
1.00
0.97
1.09
1.31
2.14
TABLE
Design
SRS STR SCL CLS WTD
4.9
1.91
Probability of incorrect statements about multiple correlation coefficients when the size of the design effectis ignored
Average JzTt;f7
1.00 0.97 1.09 1.31 2.14
Probability of incorrect statements when a two sided confidence interval of p = .05 is aimed at 0.05 0.04 0.07 0.13 0.36
Sample Design for Educational
Survey Research
145
In Table 4.8 there seems to be no consistent relationship between the size of t/Deff and the number of variables required to calculate the value of the multiple correlation coefficient. This result suggests that the complexity of the model may bear no relationship to the sampling stability of the multiple correlation coefficients. However, further research into this question is essential because of the growing use of recursive path models which not only contain 'composite variables' which are themselves constructed with regression equations (for example the measure of the socio-educational level of the home used by Comber and Keeves, 1973).
SUMMARY AND IMPLICATIONS FOR THE CAUSAL MODEL By using the empirically established values of the standard errors for the standardised regression coefficients (the 'path coefficients') we may summarise the implications of choice of sample design on the evaluation of our causal model. The use of the type of 'path analysis' techniques described in Chapter 3 requires that a test of significance be carried out on the magnitude of the 'path coefficients' in the model. The test of significance usually examines the null hypothesis that a path coefficient is equal to zero for the population data by setting up 95 per cent confidence limits for the magnitude of the path calculated from sample data. The following table presents for each sample design the absolute magnitude of each of the path coefficients of the causal model which are required to reject the null hypothesis at the 95 per cent confidence level. TABLE 4.10
Path Coefficient
Mean
The absolute magnitude of the path coefficients required to reject the hypothesis'that a path coefficient is equal to zero for the population data
Absolute magnitude of path required to reject null hypothesis SRS
STR
SCL
CLS
WTD
0.14
0.14
0.18
0.29
0.15
0.15
0.13
0.14
0.28
0.12
0.11
0.17
0.15
0.21
0.20
0.16
0.10
0.17
0.17
0.26
0.14
0.13
0.14
0.20
0.21
0.13
0.12
0.13
0.21
0.28
0.14
0.15
0.16
0.22
0.26
0.16
0.14
0.17
0.17
0.21
0.11
0.15
0.16
0.23
0.21
0.14
0.13
0.16
0.20
0.24
146
Evaluation
in Education
From Table 4.10 we see that the use of sample designs which depart from the model of simple random sampling may greatly influence tests of significance The mean values of the absolute magnitudes of the paths for path coefficients. required to reject the null hypothesis demonstrate that the dangers associated with the underestimation of sampling errors through the misuse of computational formulae for simple random sampling are most evident for those sample designs which employ classrooms as the primary sampling unit.
5. The Estimation of Sampling
Errors
from Sample Data
RANDOM
SUBSAMPLE
ERROR
ESTIMATION
TECHNIQUES
Because of the unavailability of deductive methods to calculate the sampling errors of multivariate statistics from complex samples, researchers have turned These methods employ subsampling, splitting or towards empirical methods. 'independent replications' of the replication in order to generate multiple The estimates which are produced by these replicachosen analysis procedure. tions are then used to generate estimates of sampling error. The historical development of the use of random subsample techniques for the purpose of estimating sampling errors has been traced back to P.C. Mahalanobis' introduction in 1936 of interpenetrating samples for agricultural surveys in Acting on some suggestions from J.W. Tukey, Deming Bengal (Finifter, 1972). further developed the technique as the Tukey plan (Deming, 1960). This plan is executed prior to data collection by constructing a systematic sample of each one-tenth the size of n, based on ten size n by drawing ten subsamples, The data are collected as ten indepenindependent starts within the listing. dent samples which are drawn with ten repetiiions of the chosen sample design. The variance of a statistic, say the sample mean, may be estimated as: k 1(X+)* ;(:) where
and
= ;(k-1) K is the estimate xi is the
estimate
k
number
is the
for for
the the
subsamples
combined,
i'th subsample,
of subsamples
(ten
for
the Tukey
plan).
The variance estimated by this procedure refers to the variance of the mean of the subsample estimates and not to the estimate that might be prepared from the whole sample. However, in the special case when the estimates are linear in the individual observations the mean of the subsample estimates and the total sample estimate are equivalent. The use of Deming's method also permits the estimation of sampling bias (Finifter, 1972). The estimation of bias is carried out through a graphical approximation technique suggested by Tukey to Jones (1956) which is based on the assumption that the bias is inversely proportional to sample size. Deming's method presentJsevera1 difficulties for sample designs in educational research which are typically stratified, clustered, and restricted in size because of administrative and cost considerations. 147
148
Evaluation
in Education
First, for statistics more complicated than weighted averages it may not be meaningful to calculate results from such small amounts of data that may arise from one-tenth of the total number of observations (for example, in order to fit a regression line Kerlinger and Pedhazur, 1973, suggest at least 100 to 200 observations, and if large numbers of variables are being used then more than 200 would be required): Therefore if a substantial amount of data is needed in each subsample, the number of possible independent groups may be severely restricted. Second, many statistics based on small samples give biased estimates, typically the leading term in the bias is proportional tolwhere n is the sample size. Consequently the mean of results based on severa? small subsamples to be more biased than is a single result based on all the sample (Mosteller and Tukey, 1968).
is likely data
Third, to achieve sufficient stability a large number k of independent subsamples is needed. However, a large k sacrifices the intended computational simplicity and the full amount of stratification desired in many designs (Finifter, 1972). Within a stratum of a stratified cluster design there may be too few primary selections to allow the total sample to be divided into a large number of subsamples. Fourth, rare characteristics in the appearing in some of the subsamples (Deming, 1956).
total sample may have little chance of if a large number of subsamples are used
In order to surmount the problems associated with the use of Deming's approach for the type of complex sample designs which are commonly used in educational research, researchers have begun to show an increasing interest in several other subsample replication techniques: balanced repeated replication (McCarthy, 1966) and jackknifing (Mosteller and Tukey, 1968).
BALANCED
REPEATED
REPLICATION
This technique was developed by McCarthy (1966, 1969a, 1969b) to permit variance estimates to be made from sample designs which featured the maximum amount of stratification possible (two primary selections per stratum) and yet still permitted variance estimates to be computed from the sample data alone (Kish, 1965). The population is divided into h strata, the primary sampling units in each stratum are divided into two random halves of equal size. Then a primary sampling unit is selected from each half stratum. A half-sample replicate is formed by randomly choosing one of these primary sampling units for each stratum. The number of possible half samples which can be drawn from the sample data is 2h. Variance estimates are then computed from the squared difference between the total sample estimate and the half-sample replicate estimate. McCarthy's (1966) contribution to this technique was to develop a method choosing a subset of half-samples which contained all of the information was available in the total set of half-samples.
for which
Sample
Design for Educational
149
Survey Research
Kish and Frankel (1970) point out that this technique is suitable for generating sampling errors for a wide variety of multivariate statistics provided the variance of repeated paired replicates is a good estimate of the variance of the statistic based on the entire body of sample data. That is, if we are considering a statistic such as the weighted mean then provided:
E [U;
- Y,‘]
where
y is the estimate
and "3 then
= V(Y)
IS the estimate
v^(g) = 1 kj
is a good
k 1
estimate
based based
on the on the
total jth
data,
subsample,
(y;* - y)' J of V(p),
where
k is the
number
of half
sample
estimates.
The above conclusion was examined by Simmons and Baird (1968) who carried out an empirical investigation of the application of balanced repeated replication techniques for non-linear statistics such as regression and correlation coefficients. They concluded that provided the mean of the replicate statistics was closely equal to the corresponding statistic in the parent sample then nonlinearity would not greatly disturb the accuracy of the technique. A more detailed discussion of the technique is presented in Appendix A.
JACKKNIFING
The jackknife technique may be traced back to a method developed by Quenouille (1956) to reduce the bias of estimates. Estimates of parameters are made on the total sample data, and then, after dividing the data into groups, the calculations are made for each of the slightly reduced bodies of data which are obtained by omitting each subgroup in turn. Let yi be the estimate subgroup
and
let yall
based
on the
data
be the estimate
which
based
Tukey (Tukey, 1958; Mosteller and Tukey;1968) (i = 1, . . . k) based on the k complements.
y'! = ky 1
He also
defined
Y *=The
variance
;
all
remains
on the
after
total
defined
omitting
sample
the
ith
data.
k 'pseudovalues'
y?
- (k - 1) Yi
the
'jackknife
value'.
!~,t i ’ Sag of the jackknife
k(k-1)
value
may
be obtained
from
the
pseudovalues,
150
Evaluation
in Education
Tukey (1958) set forward the proposal that these pseudovalues could be treated as if they were approximately independent observations and that Student's t distribution could be applied (Mosteller and Tukey, 1968) to these estimates to construct approximate confidence intervals for y or y* (Brillinger, 1966). Later emoirical work bv Frankel (1971) has substantiallv validated these oroposals for both the jackknife technique (and also the balanced repeated replication technique) for a variety of regression-related statistics. The jackknife procedure has been applied to several large cross-national educational research studies (Peaker, 1967b; 1975) conducted.by the International Association for Educational Achievement. A more detailed discussion of the technique is presented in Appendix A.
APPLICATION OF THE TION TECHNIQUE AND
BALANCED REPEATED REPLICATHE JACKKNIFE TECHNIQUE
In the previous chapter the design effect was calculated for a variety of ._ statistics obtained from complex sample designs by the empirical generation of sampling distributions. The researcher is generally unable to apply this approach when working with social science data and therefore requires suitable In this techniques for estimating the design effect from a single sample. chapter the sampling error estimation techniques of jackknifing and balanced repeated replication were each applied to one sample design in order to examine the accuracv with which thev could be used to obtain measures of the desian The jackknife tech';lique was applied to one sample randomly selected effect. from the 25 independent replications of the CLS design. The balanced repeated technique was applied to one sample randomly selected from the 25 independent replications of the WTD design. Each technique was applied to a sample design which was the most appropriate for the particular statistical features of the technique. The balanced repeated replication technique is derived on the assumption of two primary selections from each stratum of a stratified population and therefore this technique was The jackknife technique is derived on most suited to the WTD sample design. the assumption that the sample may be split into a number of subgroups which identically follow the design of the original sample and therefore was most suited to the CLS design.
EXAMPLE
OF CALCULATIONS
FOR THE
BALANCED
REPEATED
REPLICATION
TECHNIQUE
The balanced repeated replication technique was applied to one (randomly selected) sample from the 25 available independent replications of the WTD As described in Chapter 2, this sample design consisted of three design. strata which were obtained by the preliminary stratification of the population according to the 'school system' variable. Within each stratum two 'pseudowere selected with probability proportional to size; within each classrooms' selected pseudoclass a simple random sample of 25 elements was selected. The
sample
chosen
for
this
example
may
be represented
diagrammatically:
Sample
Design for Educational
Classroom
Stratum
The
notation
Cij
refers
Cl1
Cl2
Catholic
c21
c22
Independent
c3*
c32
ultimate
cluster
151
number
Government
to the jth
Survey Research
of 25 elements
selected
From the discussion presented in Appendix A we must now from the ith stratum. form four sets of half-samples based on the method presented by Plackett and Burman (1946). The allocation of the selected ultimate clusters to the half-samples is presented in Table 5.1. This allocation follows the example provided in discussion of balanced repeated replication in Appendix A. TABLE 5.1 Allocation of ultimate clusters to half-samples for the WTD design
Half-sample
Stratum Government
Catholic
Independent
Cl1
c21
c31
2
Cl1
c22
c32
3
Cl2
CZ?
c3 1
4
Cl2
cz 1
c32
1
In this example calculation we will consider the sampling error of the sample mean of the MATHS variable - the criterion variable for the causal model. This variable is an indicator of the mathematical ability of the students in our survey population. Table 5.2 presents the results of the calculations. The notation follows the discussion presented in Appendix A. Each half-sample consists of three ultimate clusters of 25 elements as described in Table 5.1. From Table 5.2 we see that the balanced repeated replication estimate of the standard error of the sample mean for this particular WTD sample of 150 elements is 5.30. Since Kish (1965) has established that s2, the sample variance, computed from any large probability sample yields a good approximation to S2, the population variance, we may estimate the standard error of the sample mean for a simple random sample of elements (of the same size) as For this sample Ji%' our estimate of this term is 0.82 (since for this particular sample s = 10.10).
152
Evaluation
in Education
TABLE 5.2 Balanced repeated repZication calculations required for estimating the standard error of the sample macroobtained from one sanpte obtained by using the WTD design
Half-sample values 25.5190
rl* 3;
24.2820
jf
34.6421
7;
36.1714
Total sample value 30.1537
7 Difference squares (Y: - Y?
21.4804
G;
34.4769
- Y?
(7: - YJ2
20.1457
(Y,*- Y?
36.2127
Mean of the difference squares 28.0789
V(Y) Estimate of standard error
5.2990
V(Y)
We combine these two calculations into an estimate of JDeff for the mean of the indicator MATHS associated with the WTD sample design.
Similar calculations be carried out.
for the
In each case we employ
the
other
formulae
statistics
presented
used
in the
in Table
causal
sample
model
1.3 to estimate
may
also
the
denominator of the equation which defines JDeff - the standard error of the It may seem strange statistic .Inder the conditions of simple random sampling. to some researchers that we can obtain reasonable estimates of the value of the standard errors of statistics under conditions of simple random sampling from sample data gathered with complex probability sample designs provided the This becomes a little more intuitively sample size is sufficiently large. obvious when we remember that a simple random sample consists of one of all possible different combinations of n different elements out of N N (N-ni!n! such that Therefore
each combination a self-weighted
has the same probability of selection (Kish, 1965). (or properly weighted) complex probability sample
Sample Desgn
for Educational
Survey Research
153
from a large population represents one of the possible simple random (or a good model of one of the possible simple random samples) which selected from the same population (Kish, 1965).
samples may be
The
statistics
calculations
described
in Table
5.2 were
carried
out
for
all
the
required to describe the causal model. The average values of m for the means, correlations, standardised regression coefficients and multiple correlation coefficients are presented in Table 5.3 Beside each of the values of average Jm the values of average JDeff obtained from the empirical sampling techniques described in Chapter 4 are presented. In order to summarise the extent to which the balanced repeated replication technique was able to estimate the empirical value (taken as the 'correct'value), a list of percentage error was also prepared.
TABLE 5.3
Ealmced replication estimates of average JDeff
Statistic
Average m Balanced repeated replication estimate
Empiricaf estimate
5 10
4.12 1.66
2.89 1.85
9
1.63
1.73
5.8
3
1.20
2.14
43.9
Means Correlation coefficients Standardized regression coefficients Multiple correlation coefficients
The magnitude when The
the
of the
number
convergence
percentage
of estimates
error
used
of the estimate
The jackknife 25 available described in tion of six sample of 25
in average
to calculate
of average
lished value shows that care should obtained as individual estimates.
EXAMPLE
Percentage error
Nunber of estimates
OF CALCULATIONS
average
/l%%=
be taken
FOR
JDeffis
THE
Jnis
towards
in using
JACKKNIFE
shown
42.6 10.3
to be smaller increased.
the empirically
~'Deff values
which
estabare
TECHNIQUE
technique was applied to one (randomly selected) sample from the independent replications of the CLS design. The CLS design, as Chapter 2, consisted of a probability proportional to size selec'pseudoclassrooms' followed by the selection of a simple random elements from each selected pseudoclass.
From the discussion presented in Appendix A we form six reduced samples of five ultimate clusters such that each reduced sample follows the design of the original sample. Each reduced sample is obtained by leaving out one of the ultimate clusters. The allocation of the selected ultimate clusters to the reduced samples is presented in Table 5.4. The notation Cj refers to the jth ultimate cluster of 25 elements selected from the population. There are six ultimate clusters (C,, C,, C,, C,, C,, C,) for each replication of the CLS design.
154
Evaluation
in Education
TABLE 5.4
Allocation of ultimate clusters to redil~cd scnnples ;br the CLS design
Reduced samples
Ultimate clusters
1
Cl
L
c3
c,
2
C!
C,
c3
C,
3
Cl
CZ
C3
4
Cl
C?
5
C!
6
Total sample
Cl
C5 c, c5
c,
G
CL>
c,
CA
C4
C5
c,
C_,
C3
G
C5
c,
CZ
c3
C4
C5
Cf>
For this example of the application of the jackknife technique we will again Table consider the sampling error of the sample mean of the MATHS variable. 5.5 presents the results of these calculations. The notation follows the discussion presented in the Appendix. From Table 5.5 the jackknife estimate of the standard error of the mean for this particular CLS sample of 150 elements is 4.41. Also, as described oreviouslv, the standard error of sample mean for a simple random sample of (since for this particular sample elements (of the same size) is k. = 0.99 s = 12.14). By combining these two calculations we obtain the jackknife for the sample mean of the indicator MATHS associated with design:
estimate of 6% the CLS sample
The calculations described in Table 5.5 were carried out for all the statistics These calculations are summarised in required to describe the causal model. This table contains a list of percentage error for the jackknife Table 5.6. estimates
of average
JDeff.
The jackknife estimates of average JOeff followed the pattern for the balanced repeated replication technique by converging empirically was larger. By comparing
established
the
values
percentage
of average
errors
of the
mwhen
two
of the results towards the
the number
techniques
we
of estimates
see that
the jack-
knife technique consistently provided more accurate estimates of average mf While it is difficult to make meaningful comparisons of percentage error because each lechnique was applied to a different sample design, this increased accuracy of the jackknife estimates being based on six reduced samples from the CLS design while the balanced repeated replication estimates were based on only four half-samples.
Reduced sample values 28.592
Yl
27.440
Y2 Y, Y4 Y, Y6
24.992 31.096 28.144 30.256
Total sample value 28.420
Ya11 Pseudovalues *
27.56
y2* * Y;
33.32 45.56
YQ*
15.04
ys*
29.80
Y6*
19.24
Y1
Jackknife value 28.42
Y*
Variance of the jackknife value 5*
2
19.4576
Estimate of the standard error s*
4.4111
TABLE 5.6 Jackknife estimates
Statistic
Means Correlation coefficients Standardised regression coefficients Multiple correlation coefficients
of average mf
Number of estimates
Average JDeff Jackknife tmpiricaT estimate estimate
Percentage error
5 10
3.09 1.63
2.80 1.53
10.4 6.5
9
1.53
1.47
4.1
3
1.44
1.31
9.9
156
Evaluation in Education
SUMMARY
In this chapter the jackknife and balanced repeated replication techniques were each applied to different samples obtained from the WTD and CLS sample designs. The techniques provided useful estimates that a sufficiently large number of estimates were
of average m provided used to establish average
JDeff. The convergence of the estimates established values as the number
of average JDeff towards the empirically of estimates are increased emphasises the
possibility of the instability of individual obtained from the two techniques.
estimates
of JDeff
which
may
be
There appear to be no published results available concerning questions about the degree of stability of the jackknife and balanced repeated replication estimates of sampling errors obtained in educational survey research studies. A research study which is being carried out by the Australian 'Council for Educational Research aims to provide further information about these questions in the near future.
6. A Worked
Example
The choice of a suitable sample design for an educational survey research study is rarely a free choice. The researcher usually designs a sample which not only provides appropriate data for answering the research questions posed in the study, but is also within the financial and administrative resources available for the study. The 'best' sample design for a study is therefore the design which optimally satisfies the particular set of constraints which are placed on the study. In this chapter we will consider the pattern of reasoning arrive at a sample design which satisfies the constraints educational survey research study.
THE
HYPOTHETICAL
Let us assume that the Australian study which seeks to establish:
government
which is required of a hypothetical
STUDY
has
commissioned
an evaluation
(i) the proportion of Australian 14 year-old students who can master the items on a criterion referenced test of basic mathematical skills; (ii) a list of schools (to be used in later case studies) which contain either an unusually high proportion of students who have mastered 80 per cent of the items on the test, or an unusually low proportion of students who have mastered 80 per cent of the items on the test.
CONSTRAINTS
The following list presents choice of sample design for
ON THE
(ii) Desired population - the year-old students in Australian
the selected on the test.
schools
who
STUDY
the major constraints which the hypothetical study.
(i) Financial - the government possible costs for the study.
(iii Statistics required Popu 1 ation who can master
HYPOTHETICAL
has
provided
$100,000
desired population secondary schools
will
influence
to cover
consists in 1976.
our
all
of all
14
- estimates of the proportion of the desired the items on the test - estimates of the proportion of students in have mastered at least 80 per cent of the items
157
to
158
Evaluation
(iv)
in Education
Units of analysis - students and schools.
(v) Error requirements - the standard error of the estimates of the proportions of students in the desired population who can master the items on the test should not exceed 0.01. That is, in estimating the percentage of students who can master the items in the desired population we require 95 per cent confidence limits of +2% surrounding the sample estimates - the estimates of the proportion of students in the selected schools who have mastered at least 80 per cent of the items on the test should.be sufficiently stable to compile a meaningful ordered list of schools on the basis of these proportions. (vi) Domains of the study - there are no specific domains {that is, there are no parts of the desired population for which separate estimates are to be planned for in the sample design). (vii) Sampling frame availability - a current sampling frame is available which lists the number of students in the desired population associated with each school in Australia.
THE SAMPLE DESIGN The financial resources of the study place the first restriction on our sample design. In previous studies of this type it has cost around f10 for each student tested (Bourke and Keeves, 1977). Therefore our government grant of $100,000 places an upper limit of 10,000 students who can be tested for this study. We must now consider whether we can meet the error requirements for the study given this constraint on the maximum sample size. If we were to select a simple random sample of n students from the population in order to estimate the proportion p who can master an item on the test, then the standard error of this estimate could be estimated by the following formula (Kish, 1965:46):
se(p) =
rlE&
Note that because of the sufficiently large value of the population size we have ignored the finite population correction. of our stated error requi~ments is that the standard error of the estimated proportions of the population who can master the items on the test should not exceed 0.01. Therefore for a simple random sample design we would require that: One
O_Ol>,
P(l-P) 4 -ii--srs
The maximum value of p(l-p) occurs for p=O.S, and thus to ensure that we could satisfy the error requirements for all items we would require that:
Sample
Design for Educational
159
Survey Research
c0.25
O.Ol>,
nsrs
or
n srs
> 2500
This sample size is well within our financial resources, however, the a simple random sample would not allow us to satisfy the second error ment of selecting sufficient students per school to obtain reasonably estimates for the compilation of an ordered list of schools.
use of requirestable
In order to satisfy the second error requirement we will need to construct a two stage sample design by selecting schools first and then sampling students within schools. Therefore we must decide how many schools and how many students within schools must be selected to provide estimates of proportions which are at least as accurate as a simple random sample of 2500 students. From
Appendix
A we
know
that
for
a complex
sample
design
of nc elements
the
to "c , where n* is the size of the simple equivalent F Also from Table 4.4 we know that for a two stage cluster sample which sample. used schools as the primary sampling unit the design effect for a sample mean is approximately equal to l+(ii-l)R, where n is the ultimate cluster size and R is the coefficient of intraclass correlation. For our sample design we will therefore have:
design
effect
“c = 1
+ (fi
is equal
- 1)R
n*
or nc = Ti m = n* [I t (ii - l)R]
where
m = the
number
of schools
The above formulae, which have applied to proportions because sample mean in which the cases 1 for a correct response to an 1965). Table
6.1
summarises
the
values
in the
complex
sample.
been derived for sample means, may also be sample proportions are a special case of the in the same can be assigned two possible va?ues: item and 0 for an incorrect response (Kish,
of nc and m, for
which would be required to obtain sample of size 2500 when p = 0.5.
several
values
of R and
n,
the same standard error as a simple random These values of nc estimate the minimum
sample size for two stage sampling which is required to satisfy the first error constraint. Note that the values of m have been rounded to integer values. From Table 6.1 we can see that the size of the complex sample that is equivalent to a simple
the value of R has a very strong influence on design which is required to have a precision random sample of size 2500. However, our main
160
Evaluation in Education
dilemma, which is common to all survey research which employs cluster sample designs, is to know the magnitude of R before we collect our data. Generally we resolve our problem by examining values of R from previous surveys which have examined similar variables with respect to similar populations by using similar sample designs.
ii (Number of students per school)
m (No. of schools) 475 363 325 306 295 207
R =O.l "c
(Complex sample size} 4750 7250 9750 12250 14750 17250
illD% (No. of schools) 700 600 567 550 540 533
R 10.2
n
(Comp:;::fample
7000 12000 17000 22000 27000 32000
Table 4.4 provides an estimate of 0.05 for R for the sample mean of a mathematics achievement test using schools as the clustering unit. Unfortunately the population from which this estimate was obtained is more limited in coverage and younger than the desired population required for this study. Also the R value in this table is associated with total scores on a mathematics achievement test rather than scores on individual items for a test of basic mathematical skills. Let us assume that the estimate of R in Table 4.4 is the only one which is available because of the absence of previous similar studies. In this case we could estimate a value of R equal to 0.1 for proportions calculated from our design (remembering that a proportion calculated for students is the sample mean of a variable which takes only two values: zero or one) which is a similar but slightly more conservative value so that we can be on the 'safe' side with the precision of our final estimates. If we estimate that R = 0.1 then, remembering that the upper limit for our sample size is 10,000, we see from Table 6.1 that a suitable sample design would consist of 325 schools with 30 students selected from each school giving a total sample size of 9750 students. A more precise sample design which is within our financial resources and that would provide a complex sample size of 9990exactly would be to select 333 schools with 30 students per school. From our estimate of R = 0.1 we have therefore deduced that in order to satisfy the first error requirement we could use a two stage cluster sample design which selects 333 schools at the first stage and then selects 30 students per school at the second stage. The second error requirement requires that we have sufficient students per school in order to establish reasonably stable estimates of the mean school performance. This second requirement raises the question of whether 30 students per school will provide sufficiently stable estimates for schools. The answer to this question is that we have no choice but to accept this figure if we wish to remain within the limits set by the
financial constraint of a total sample size which does not exceed 10,000 students.

The above discussion highlights the essential nature of sample design for educational survey research: the chosen sample design is often not the best possible design required to answer all the key questions posed in the study; rather, it is usually the sample design which allows the researcher to answer as many of the key questions as possible given the finite set of resources which are available for the study.
THE EXECUTION OF THE SAMPLE DESIGN

The arguments presented in the previous section were based on a functional relationship between the design effect and the value of the coefficient of intraclass correlation for the simple cluster design. In most survey research studies we are able to improve the accuracy of our sample design by using stratification. The use of wise stratification will therefore ensure that our prior estimates of precision, based on the simple cluster design which has no stratification, can be achieved.

The information available for our sampling frame consists of a list of the number of students in the desired population associated with each school in Australia (see constraint (vii)). Previous studies have shown that there is considerable variation in mathematics achievement between Australian states and territories, and also between types of school within these states and territories. We thus have two possible candidates for our stratification variables: 'State/Territory' (which consists of the categories of New South Wales, Victoria, Queensland, South Australia, Western Australia, Tasmania, Australian Capital Territory and Northern Territory) and 'Type of school' (which consists of the categories of Government, Catholic and Independent).

Let us assume that our list of schools provides only the name and address of the schools containing the desired population - then we would be unable to reliably discern which schools are Government, Catholic or Independent schools. Therefore, despite the potential usefulness of this variable, we are constrained to use only 'State/Territory' as a stratification variable because of a lack of information about 'Type of school' in our list of schools.

We therefore now proceed to divide our list of schools according to State/Territory. The resulting sample frame contains the eight strata described in Table 6.2. This table also describes the designed sample which is selected according to the procedures outlined in the following discussion.

Since there are no separate domains of study we may use the self-weighting two stage cluster sample design (described in Chapter 2) in order to select our sample. This design requires that the schools be selected with probability proportional to their size with respect to the number of students in the survey population, followed by a simple random sample of students within the selected schools.

The usual technique for selecting a probability proportional to size selection of schools from a sample frame is to use a 'lottery' method described by Rosier and Williams (1973). Each school is allocated a number of 'tickets'
equal to the number of students in the survey population in the school. Since 333 schools are required for our sample, it will be necessary to choose 333 'winning tickets'.

TABLE 6.2 Summary of the survey population and the designed sample

                                      Survey population          Designed sample
State/Territory                      Schools     Students       Schools    Students

New South Wales                         594        84894           115       3450
Victoria                                580        66550            89       2670
Queensland                              286        38106            51       1530
South Australia                         182        24152            33        990
Western Australia                       184        20842            28        840
Tasmania                                 91         8290            11        330
Australian Capital Territory             ..         3309             5        150
Northern Territory                       ..         1275             1         30

Total                                  1950       247418           333       9990
The ratio of the number of tickets to the number of winning tickets is 247418/333 or 743. Therefore approximately every 743rd ticket is a winning ticket which represents a school selected into the sample. These winning tickets are selected by using a random start-constant interval procedure. A random number in the interval from 1 to 743 is selected from a table of random numbers and a list of 333 numbers is created by adding successive increments of 743. This list of numbers is used to select the sample schools by comparing these winning ticket numbers with a cumulated tally over schools of the numbers of students in the survey population.

Consider the following example based on the first few entries in the cumulative tally table. Assume that a random start of 100 is selected from the table of random numbers. The winning tickets would be 100, 843, 1586, 2329, etc. From Table 6.3 we see that the first two selected schools, corresponding to winning tickets 100 and 843, are School B and School G.
TABLE 6.3 Hypothetical cumulative tally table for the survey population

School    Population size    Cumulated tally    Ticket numbers
A                50                  50              1-50
B               200                 250             51-250*
C                50                 300            251-300
D               300                 600            301-600
E               150                 750            601-750
F                50                 800            751-800
G               250                1050            801-1050*
etc.            etc.                etc.              etc.
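The 'lottery' method illustrated in Table 6.3 is straightforward to mechanise. The sketch below is our own illustration, not part of the report: the function name is hypothetical, the data are the hypothetical schools of Table 6.3, and a continuous random start is used in place of an integer drawn from a table of random numbers.

```python
# Sketch: random start - constant interval selection of schools with
# probability proportional to size (the 'lottery' method).
import random

def pps_systematic(schools, n_schools, seed=None):
    """schools: list of (name, population_size); returns selected school names."""
    rng = random.Random(seed)
    total = sum(size for _, size in schools)      # total number of tickets
    interval = total / n_schools                  # constant interval
    start = rng.uniform(0, interval)              # random start within the interval
    winning = iter(start + i * interval for i in range(n_schools))
    next_ticket = next(winning, None)
    selected, tally = [], 0.0
    for name, size in schools:
        tally += size                             # cumulated tally of students
        while next_ticket is not None and next_ticket <= tally:
            selected.append(name)                 # winning ticket falls in this school
            next_ticket = next(winning, None)
    return selected

# Hypothetical frame from Table 6.3.
frame = [("A", 50), ("B", 200), ("C", 50), ("D", 300), ("E", 150), ("F", 50), ("G", 250)]
print(pps_systematic(frame, 2, seed=1))
```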
The schools in the cumulative tally table are grouped according to their State/ Territory which provides an implicit stratification for the selection process. That is, while there is not a strictly independent selection of schools from each stratum, the random start-constant interval method of selection ensures a reasonably accurate proportional distribution of schools and students across the implicit strata.
WEIGHTING

The probability of selecting a student in a given school from the survey population is:

Probability of selecting a student = (333 x 30) / 247418 = 30/743

That is, each student in the survey population has an equal chance of selection and therefore for between-student analyses our sample is a self-weighting sample. Although our sample design is self-weighting for between-student analyses, this is certainly not the case for between-school analyses. The probability of selecting a given school from the survey population is:

Probability of selecting a school = School size / 743
That is, the probability of selecting a school is directly proportional to the number of students in the school who are in the survey population. The above calculations show that we may conduct unweighted analyses in order to examine the first set of key questions associated with the proportion of students who are able to master the test items. However, since the schools in the sample did not have an equal probability of selection, we must emphasise the need for care in interpretation when reporting the ordered list of schools required for the second set of key questions. That is, in some circumstances, in order to generalise from the characteristics of the sample schools to the
schools in the survey population it may be necessary to calculate suitable compensatory weights.
The problem of weighting data for between-schools analyses requires a firm definition of the word 'school'. If we weight each school in our sample design with a weight which is inversely proportional to the 'school' size, then we are able to represent the survey population of 1950 schools with our sample. However this weighting scheme often leads to problems of interpretation of results when there are, for example, a greater number of small schools compared to large schools in our survey population. In this situation we may create confusion in the minds of the readers of our research report because statements about 'half of the schools' in the study may only concern a very small percentage of students in the survey population.

An alternative weighting strategy has been put forward by Peaker (1973) as being suitable for sample designs which select schools with probability proportional to size. Peaker suggests that unit weights should be applied to the selected sample schools because probability proportional to size sampling followed by the selection of equal sized ultimate clusters amounts to sampling not 'schools' but 'pieces of schools', the pieces being of equal size, so that the appropriate weight for each piece is the same, and may therefore be taken as unity.
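To make the two weighting options concrete, the sketch below (our own illustration; the school sizes are hypothetical and not from the report) computes, for a few selected schools, the school-level weight that is inversely proportional to school size alongside Peaker's unit weight, and verifies that the student-level selection probability is constant at 30/743, so that between-student analyses need no weights.

```python
# Sketch (our illustration): weights for a PPS design in which a school of
# size S is selected with probability S / 743 and 30 students are then drawn.
SAMPLING_INTERVAL = 743          # 247418 tickets / 333 winning tickets
STUDENTS_PER_SCHOOL = 30

selected_school_sizes = [820, 455, 1210, 150]   # hypothetical sizes

for size in selected_school_sizes:
    p_school = size / SAMPLING_INTERVAL                   # P(school selected)
    p_student = p_school * (STUDENTS_PER_SCHOOL / size)   # P(student selected)
    inverse_size_weight = 1 / p_school                    # represents all 1950 schools
    unit_weight = 1.0                                     # Peaker's 'pieces of schools'
    print(f"size={size:5d}  P(student)={p_student:.5f}  "
          f"w(inverse size)={inverse_size_weight:6.2f}  w(unit)={unit_weight}")
# P(student) prints as 30/743 = 0.04038 for every school: the design is
# self-weighting for between-student analyses.
```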
THE CALCULATION OF SAMPLING ERRORS
Having designed our sample around the particular set of constraints given for the study, we should be prepared to provide sampling error estimates for statistics provided for the study. For the estimates of proportions based on the total sample of data there are suitable formulae available (see Yamane, 1967). However, for other estimates of proportions which may be required for certain subclasses of the data, for example the subclasses males or females, the empirical techniques described in the previous chapter would be appropriate.
SUMMARY
In this chapter we have examined the pattern of reasoning and techniques required for the preparation of a sample design which is suitable for a hypothetical national evaluation study. A sample design was prepared within limits set down by a set of hypothetical financial and technical constraints. It is important to remember that this hypothetical study was a relatively simple study with respect to the given constraints. Typically, national evaluation studies involve disproportionate sampling of strata and more sophisticated multivariate analysis techniques. These extended constraints on sample design require consideration of many ideas which were discussed in earlier chapters of this report but which were omitted from this chapter in order to maintain simplicity in the presentation of the essential patterns of reasoning required for sample design in educational survey research.
7. Conclusion
In this report Student's (1908) empirical sampling approach has been used to assess the magnitude of the sampling errors of statistics used to describe a recursive causal model based on data gathered with four complex sample designs which are commonly used in educational research. The influence of the complex sample designs on sampling errors was shown to offer strong support for the argument presented by Kish (1957) that in the social sciences the use of simple random sample formulae on data from complex samples is the most frequent source of gross mistakes in the construction of confidence intervals and tests of hypotheses.

When applied to single samples of data gathered for educational survey research, the jackknife and the balanced half-sample error estimation techniques have been shown to be useful methods for calculating the average design effect (Kish, 1965) for means, correlation coefficients, standardised regression coefficients and multiple correlation coefficients when the deductive theory required for calculating sampling errors is not available. The accuracy of these estimating techniques for individual statistics was not high, although there was a noticeable convergence of the average design effect towards an empirically established value when the number of estimates was increased. These results suggest that a rounded average design effect should be used to approximately adjust sampling errors rather than using an individual estimate of the design effect which may be wildly inaccurate.

The influence of the complex sample designs used in this report on the sampling errors of standardised regression coefficients shows that a great deal of care must be taken in designing samples for recursive causal models. This influence was shown to be equally strong for correlation coefficients. Therefore the warnings expressed for the particular model in this report may be generalised to a great deal of current educational research because correlation matrices are the cornerstone of many modern multivariate analysis techniques.

Although this report was primarily concerned with data generated for survey research, the implications of the findings are equally important for experimental research. Observations which are gathered in clusters may lead to non-independence of observations which in turn leads to confusion with respect to decisions about both the available degrees of freedom in variance estimates and the choice of the unit of analysis (Guilford and Fruchter, 1973). These problems may often be circumvented in experimental designs by adjusting the analysis for the nesting and clustering effects which are present. Alternatively it is sometimes possible to ensure that the subjects in the study are not only randomly assigned to treatments but also that treatments are administered independently to each subject.

While this study has developed estimates of the design effect for a variety of commonly used statistics, it must be kept in mind that the value of the design
effect depends upon the variables being used in the study and also upon the clustering effects which occur among the elements of the sampling frame. Further research into the value of the design effect in educational settings for different variables, different populations and different complex sample designs is urgently needed to assist the planning of samples for future educational research.

This report has demonstrated that the evaluation of the sampling stability of survey research findings in educational research is both necessary and possible. Hopefully, future educational research workers who use complex sample designs will make use of the procedures which have been discussed in this study in order to present their findings in association with the appropriate estimates of sampling stability.
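As a simple illustration of the recommendation to adjust sampling errors by a rounded average design effect, the following sketch (our own; the figures are hypothetical and not results from the report) inflates a simple random sample standard error by the square root of the design effect.

```python
# Sketch: adjusting a simple-random-sample standard error by a rounded
# average design effect (hypothetical figures).
import math

p, n = 0.62, 9990                # a sample proportion and the complex sample size
se_srs = math.sqrt(p * (1 - p) / n)
deff = 4.0                       # rounded average design effect (assumed value)
se_adjusted = se_srs * math.sqrt(deff)
print(f"s.r.s. s.e. = {se_srs:.4f}, adjusted s.e. = {se_adjusted:.4f}")
```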
8. Summary
Educational survey research is often conducted with data gathered by employing sampling procedures which depart from the model of simple random sampling in which sample elements are selected individually, independently and with equal probability from the population under study. These sampling procedures usually incorporate such complexities as stratification, the selection of sample elements in clusters, and the use of multiple stages of selection. Unfortunately, either the computational formulae required to estimate the sampling errors of many statistics derived from these complex sampling procedures are enormously complicated or they prove resistant to mathematical analysis. This monograph examines the influence of these complexities on the sampling errors of statistics which are required to describe causal models based on systems of structural equations. The empirical sampling error estimation techniques of Balanced Repeated Replication and Jackknifing are applied to some educational survey research data in order to demonstrate their capacity to estimate the sampling errors of these statistics when suitable computational formulae are not readily available.
Acknowledgements
The author wishes to express his gratitude to Dr. John P. Keeves, the Director of the Australian Council for Educational Research, for his guidance and encouragement throughout the preparation of this report. It was through his earlier research work that I first became aware of the problems associated with the use of complex sample designs in educational research.

I would also like to thank Professor S.S. Dunn who, as chairman of the Educational Research and Development Committee (ERDC), provided an opportunity for me to undertake an ERDC Visiting Fellowship programme with Professor Leslie Kish and Dr. Gerald Bachman at the Institute for Social Research (ISR), University of Michigan. This opportunity to meet and work with the survey research specialists of the ISR enabled me to clarify many of the issues which are discussed in the following pages.

Several people have assisted with the preparation of this monograph by providing helpful suggestions and comments. In particular I would like to thank Dr. A.W. Davis, CSIRO Division of Mathematical Statistics, and Dr. M.J. Rosier, ACER Survey Section.

K.R.
References

Bourke, S.F. & Keeves, J.P. (Eds.), Australian Studies in School Performance, Vol. III: The Mastery of Literacy and Numeracy, Hawthorn, Australia: Australian Council for Educational Research / Canberra, Australia: Australian Government Printing Office, 1977.

Brillinger, D.R., "The application of the jackknife to the analysis of sample surveys", Commentary, 8, pp. 74-80, 1966.

Broom, L., Jones, F.L. & Zubrzycki, J., "An occupational classification of the Australian workforce", Supplement to Australian and New Zealand Journal of Sociology, 1 (2), pp. 1-16, 1965.

Broom, L., Jones, F.L. & Zubrzycki, J., "Social stratification in Australia", In Jackson, J.A. (Ed.) Social Stratification: Sociological Studies I, Cambridge: Cambridge University Press, 1968.

Commonwealth Bureau of Census and Statistics (CBCS), Australian Capital Territory Statistical Summary 1970, Canberra: The Bureau, 1970a.

Commonwealth Bureau of Census and Statistics (CBCS), Schools 1969, Canberra: The Bureau, 1970b.

Cochran, W.G., Sampling Techniques, 2nd Edition, New York: Wiley, 1963.

Comber, L.C. & Keeves, J.P., Science Education in Nineteen Countries, Stockholm: Almqvist & Wiksell / New York: Wiley, 1973.

Deming, W.E., "On simplifications of sampling design through replication with equal probabilities and without stages", Journal of the American Statistical Association, 51, pp. 24-43, 1956.

Deming, W.E., Sample Design in Business Research, New York: Wiley, 1960.

Ferguson, C.A., Statistical Analysis in Psychology and Education, 3rd Edition, New York: McGraw-Hill, 1971.

Finifter, B.M., "The generation of confidence: Evaluating research findings by random subsample replication", In Costner, H.L. (Ed.) Sociological Methodology 1972, San Francisco: Jossey-Bass, 1972.

Fisher, R.A., "On the mathematical foundations of theoretical statistics", Philosophical Transactions of the Royal Society, Series A, 222, pp. 309-368, 1922.

Frankel, M.R., Inference from Survey Samples: An Empirical Investigation, Ann Arbor, Michigan: Institute for Social Research, University of Michigan, 1971.

Gray, H.L. & Schucany, W.R., The Generalised Jackknife Statistic, New York: Marcel Dekker, 1972.

Guilford, J.P. & Fruchter, B., Fundamental Statistics in Psychology and Education, 5th Edition, New York: McGraw-Hill, 1973.

Gupta, H.C., "Intraclass correlation in educational research: an exploratory study into some of the possible uses of the technique of intraclass correlation in educational research", Unpublished PhD dissertation, University of Chicago, 1955.

Gurney, M., "McCarthy's orthogonal replications for estimating variances, with grouped strata", United States Bureau of the Census, Technical Notes No. 3, Washington: United States Bureau of the Census, 1970.

Haggard, E.A., Intraclass Correlation and the Analysis of Variance, New York: Dryden Press, 1958.

Hansen, M.H., Hurwitz, W.N. & Madow, W.G., Sample Survey Methods and Theory, Vol. I: Methods and Applications, New York: Wiley, 1953.

Harris, J.A., "On the calculation of intraclass and interclass coefficients of correlation from class moments when the number of possible combinations is large", Biometrika, 9, pp. 446-472, 1913.

Hays, W.L., Statistics for Psychologists, New York: Holt, Rinehart & Winston, 1963.

Jones, H.G., "Investigating the properties of a sample mean by employing random subsample means", Journal of the American Statistical Association, 51, pp. 54-83, 1956.

Keeves, J.P., Educational Environment and Student Achievement, Melbourne: Australian Council for Educational Research, also Stockholm: Almqvist & Wiksell, 1972.

Kerlinger, F.N., Multiple Regression in Behavioral Research, New York: Holt, Rinehart & Winston, 1973.

Kish, L., "Confidence intervals for clustered samples", American Sociological Review, 22, pp. 154-165, 1957.

Kish, L., "Some statistical problems in research design", American Sociological Review, 24, pp. 328-338, 1959.

Kish, L., Survey Sampling, New York: Wiley, 1965.

Kish, L., "Design and estimation for subclasses, comparisons, and analytical statistics", In Johnson, N.L. & Smith, H. (Eds.) New Developments in Survey Sampling, New York: Wiley, 1969.

Kish, L. & Frankel, M.R., "Balanced repeated replications for standard errors", Journal of the American Statistical Association, 65, pp. 1071-1094, 1970.

Kish, L. & Frankel, M.R., Inference from complex samples, Mimeographed paper issued by the Survey Research Center, University of Michigan, 1973.

Marks, E.S., "Sampling in the revision of the Stanford-Binet Scale", Psychological Bulletin, 44, pp. 413-434, 1947.

McCarthy, P.J., "Replication: An approach to the analysis of data from complex surveys", National Center for Health Statistics, Series 2, No. 14, 1966.

McCarthy, P.J., "Pseudo-replication: Further evaluation and application of the balanced half-sample technique", National Center for Health Statistics, Series 2, No. 31, 1969a.

McCarthy, P.J., "Pseudo-replication: Half samples", Review of the International Statistical Institute, 37, pp. 239-264, 1969b.

McNemar, Q., The Revision of the Stanford-Binet Scale, Boston: Houghton Mifflin, 1942.

Moser, C.A. & Kalton, G., Survey Methods in Social Investigation, 2nd Edition, London: Heinemann, 1971.

Mosteller, F. & Tukey, J.W., "Data analysis including statistics", In Lindzey, G. & Aronson, E. (Eds.) The Handbook of Social Psychology, 2nd Edition, Reading, Massachusetts: Addison-Wesley, 1968.

Peaker, G.F., "A sampling design used by the Ministry of Education", Journal of the Royal Statistical Society, 116, pp. 140-165, 1953.

Peaker, G.F., The collection and analysis of survey evidence, IEA/B/9, Mimeographed paper issued by the International Association for the Evaluation of Educational Achievement, Stockholm, 1967a.

Peaker, G.F., The presentation and analysis of the IEA evidence, IEA/B/57, Mimeographed paper issued by the International Association for the Evaluation of Educational Achievement, Stockholm, 1968.

Peaker, G.F., "Sampling", In Husen, T. (Ed.) International Study of Achievement in Mathematics, Vol. 1, Stockholm: Almqvist & Wiksell / New York: Wiley, 1969b.

Peaker, G.F., Personal communication to J.P. Keeves, IEA/TR/123.

Peaker, G.F., An Empirical Study of Education in Twenty-One Countries: A Technical Report, Stockholm: Almqvist & Wiksell / New York: Wiley, 1975.

Pearson, K. et al., "Mathematical contributions to the theory of evolution", Philosophical Transactions of the Royal Society, Series A, 197, pp. 285-379, 1901.

Plackett, R.L. & Burman, P.J., "The design of optimal multifactorial experiments", Biometrika, 33, pp. 305-325, 1946.

Quenouille, M.J., "Notes on bias in estimation", Biometrika, 43, pp. 353-360, 1956.

Rosier, M.J. & Williams, W.H., The Sampling and Administration of the IEA Science Project in Australia 1970: A Technical Report, IEA (Australia) Report 1973:8, Hawthorn, Australia: Australian Council for Educational Research, 1973.

Ross, K.N. & Skee, C.W., A computer program for estimating the coefficient of intraclass correlation, Mimeographed paper, Hawthorn: Australian Council for Educational Research, 1975.

Simmons, W.R. & Baird, J.T., "Pseudo-replication in the NCHS Health Examination Survey", Proceedings of the American Statistical Association, Social Statistics Section, 1968.

Student (W.S. Gosset), "The probable error of a mean", Biometrika, 6, pp. 1-25, 1908.

Terman, L.M. & Merrill, M.A., Measuring Intelligence, Boston: Houghton Mifflin, 1937.

Tukey, J.W., "Bias and confidence in not-quite large samples" (Abstract), Annals of Mathematical Statistics, 29, p. 614, 1958.

Walsh, J.E., "Concerning the effect of intraclass correlation on certain significance tests", Annals of Mathematical Statistics, 18, pp. 88-96, 1947.

Weatherburn, C.E., A First Course in Mathematical Statistics, Cambridge: Cambridge University Press, 1946.

Yamane, T., Elementary Sampling Theory, Englewood Cliffs, New Jersey: Prentice-Hall, 1967.

Yamane, T., Statistics: An Introductory Analysis, 3rd Edition, New York: Harper & Row, 1973.
Appendix: Some Theoretical Considerations

This appendix contains a theoretical discussion of the sampling concepts which are used in this report. The discussion is only concerned with the mathematical theory underlying the calculation of the sampling errors of sample means because current theory has not been adequately extended to permit the estimation of the errors of sampling for multivariate statistics which are calculated from data obtained with complex sample designs (Kish and Frankel, 1973). Each topic presented is a summary of more complex theoretical statements which have been developed with a variety of notations and with a variety of intended applications by the authors mentioned in the main text of this report.

The topics examined have been divided into three main sections. The first two sections are concerned with the properties of the proportional stratified random sample design and the simple cluster design. Theoretical arguments are presented to show that the introduction of these complexities into random sample designs may be associated with both gains and losses in sampling accuracy.
The third section provides a theoretical background to the random subsample estimation techniques which are used in this report. This section also includes a proof of the efficiency of McCarthy's (1966) balanced orthogonal matrix when it is used to estimate the sampling errors of weighted means which are obtained from appropriately constructed sample designs.
PROPORTIONAL STRATIFIED SAMPLING

Comparison of Simple Random Sampling with Proportional Stratified Random Sampling

Simple random sampling

The variance of the sample mean $\bar{x}_{srs}$ for a simple random sample of n elements drawn without replacement from a population of size N may be written as:

$$V(\bar{x}_{srs}) = \frac{N-n}{N}\cdot\frac{S^2}{n} \qquad (A.1)$$

where

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2 \qquad (A.2)$$
If the population is divided into L strata, each stratum containing $N_h$ $(h=1,\dots,L)$ elements, we may write (A.2) as:

$$(N-1)S^2 = \sum_{h=1}^{L}\sum_{i=1}^{N_h}(X_{hi}-\bar{X})^2 = \sum_{h=1}^{L}\sum_{i=1}^{N_h}\left[(X_{hi}-\bar{X}_h)+(\bar{X}_h-\bar{X})\right]^2$$

$$\therefore\quad (N-1)S^2 = \sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)^2 + 2\sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)(\bar{X}_h-\bar{X}) + \sum_{h}N_h(\bar{X}_h-\bar{X})^2 \qquad (A.3)$$

Consider the second expression on the right hand side of (A.3):

$$\sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)(\bar{X}_h-\bar{X}) = \sum_{h}(\bar{X}_h-\bar{X})\sum_{i}(X_{hi}-\bar{X}_h)$$

But

$$\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h) = \sum_{i=1}^{N_h}X_{hi} - N_h\bar{X}_h = 0$$

Thus the second expression is equal to zero, and equation (A.3) becomes:

$$(N-1)S^2 = \sum_{h}\sum_{i}(X_{hi}-\bar{X}_h)^2 + \sum_{h}N_h(\bar{X}_h-\bar{X})^2 = \sum_{h}(N_h-1)S_h^2 + \sum_{h}N_h(\bar{X}_h-\bar{X})^2$$
where

$$S_h^2 = \frac{1}{N_h-1}\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h)^2$$

is the variance for the $N_h$ elements within the hth stratum.

Now, when $N \gg 1$ and $N_h \gg 1$ we have $N_h-1 \to N_h$ and $N-1 \to N$. Substituting for $S^2$ in (A.1) we obtain:

$$V(\bar{x}_{srs}) = \frac{N-n}{Nn}\left[\sum_{h}\frac{N_h}{N}S_h^2 + \sum_{h}\frac{N_h}{N}(\bar{X}_h-\bar{X})^2\right] \qquad (A.4)$$

Stratified random sampling

Consider a stratified sample in which the population of N elements is divided into L strata, the hth stratum containing $N_h$ elements. Independent samples of size $n_h$ are drawn from the hth stratum by simple random sampling without replacement. The total of the population is the sum of the stratum totals:

$$X = \sum_{h=1}^{L}\sum_{i=1}^{N_h}X_{hi} \qquad (A.5)$$

The mean of the hth stratum is:

$$\bar{X}_h = \frac{1}{N_h}\sum_{i=1}^{N_h}X_{hi} \qquad (A.6)$$

The population mean $\bar{X}$ is obtained from (A.5):

$$\bar{X} = \frac{X}{N} = \frac{1}{N}\sum_{h=1}^{L}\sum_{i=1}^{N_h}X_{hi} \qquad (A.7)$$
Now from (A.6) substitute for $\bar{X}_h$ in (A.7):

$$\bar{X} = \frac{1}{N}\sum_{h=1}^{L}N_h\bar{X}_h$$

An estimate of the population mean based on the sample design described above becomes:

$$\bar{x}_{st} = \frac{\sum_h N_h\bar{x}_h}{N} \qquad (A.8)$$

This estimate is unbiased because we know from simple random sampling theory that each of the $\bar{x}_h$ are unbiased estimates of the stratum means. Also the variance of $\bar{x}_{st}$ may be written as (Yamane, 1967:175):

$$V(\bar{x}_{st}) = \sum_{h}\left(\frac{N_h}{N}\right)^2\frac{N_h-n_h}{N_h}\cdot\frac{S_h^2}{n_h} \qquad (A.9)$$

where

$$S_h^2 = \frac{1}{N_h-1}\sum_{i=1}^{N_h}(X_{hi}-\bar{X}_h)^2$$

If we consider the special case of proportional stratified random sampling then we impose the restriction that the number of elements drawn from a stratum be proportional to the size of that stratum. That is,

$$\frac{n_h}{n} = \frac{N_h}{N} \qquad (h=1,\dots,L) \qquad (A.10)$$

Substitute from (A.10) into (A.8) to obtain an expression for $\bar{x}_{prop}$:

$$\bar{x}_{prop} = \frac{1}{N}\sum_h N_h\bar{x}_h = \frac{1}{N}\sum_h N_h\cdot\frac{1}{n_h}\sum_i x_{hi} = \frac{1}{n}\sum_h\sum_i x_{hi}$$

That is, $\bar{x}_{prop}$ is equivalent to the sample mean. Therefore the sample mean, as in simple random sampling, is an unbiased estimate of the population mean. For this reason we call proportionate stratified random sampling a self-weighting design.

To obtain the variance of $\bar{x}_{st}$ under the restriction of proportionate sampling we substitute from (A.10) into (A.9):

$$V(\bar{x}_{prop}) = \sum_h\left(\frac{N_h}{N}\right)^2\frac{N_h-n_h}{N_h}\cdot\frac{S_h^2}{n_h} = \frac{N-n}{Nn}\sum_h\frac{N_h}{N}S_h^2 \qquad (A.11)$$
We may now compare the efficiency (of the sample mean as an estimate of the population mean) for the two sample designs by substituting from equation (A.11) into equation (A.4):

$$V(\bar{x}_{srs}) = V(\bar{x}_{prop}) + \frac{N-n}{Nn}\sum_h\frac{N_h}{N}(\bar{X}_h-\bar{X})^2 \qquad (A.12)$$

or

$$V(\bar{x}_{prop}) = V(\bar{x}_{srs}) - \frac{N-n}{Nn}\sum_h\frac{N_h}{N}(\bar{X}_h-\bar{X})^2 \qquad (A.13)$$

These relationships show that the gain in accuracy of $V(\bar{x}_{prop})$ over $V(\bar{x}_{srs})$ depends on the magnitudes of the differences between the $\bar{X}_h$ and $\bar{X}$. That is, gains can be made by ensuring that the stratification provides homogeneity within strata.

Design Effect for Proportional Stratified Random Sampling

Consider a population of N elements divided into L strata of $N_h$ elements. Samples of size $n_h$ are drawn independently from the hth stratum such that a sample of n elements is drawn according to the restriction $n_h/n = N_h/N$ $(h=1,\dots,L)$. Since this design is a self-weighting design, the sample mean $\bar{x}_{prop}$ is an unbiased estimate of the population mean. Also consider a simple random sample of n elements drawn without replacement from the same population of N elements. It has been shown that the variance of the sample mean for the proportionate stratified random sample design $V(\bar{x}_{prop})$ and the variance of the sample mean for the simple random sample without replacement design $V(\bar{x}_{srs})$ are related by the following equation:

$$V(\bar{x}_{srs}) = V(\bar{x}_{prop}) + \frac{N-n}{Nn}\sum_h\frac{N_h}{N}(\bar{X}_h-\bar{X})^2 \qquad (A.14)$$

where $\bar{X}_h$ is the mean of the hth stratum and $\bar{X}$ is the population mean.

The second term on the right hand side of (A.14) is equal to zero when all the stratum means are equal, that is when $\bar{X}_h = \bar{X}$ for all values of h. Otherwise the term will always be greater than zero. By using this information we may establish the values of Deff for proportionate stratified random sampling:

when $\bar{X}_h = \bar{X}$ for all values of h, Deff = 1; and when $\bar{X}_h \neq \bar{X}$ for any values of h, Deff < 1.
The design effect will always be close to one if the variable which is used for stratification is unrelated to the criterion being considered because the strata will consist of the pieces of a randomly divided population.
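These relationships can be checked numerically. The sketch below is our own illustration (the strata are invented and the function name is hypothetical): it computes the design effect of proportionate stratified sampling as the ratio $V(\bar{x}_{prop})/V(\bar{x}_{srs})$ implied by equations (A.4) and (A.11), in which the common factor (N-n)/(Nn) cancels.

```python
# Sketch: design effect of proportionate stratified random sampling,
# Deff = V(x_prop) / V(x_srs), from equations (A.4) and (A.11).
def deff_proportionate(strata):
    """strata: list of (N_h, stratum mean, within-stratum variance S2_h)."""
    N = sum(nh for nh, _, _ in strata)
    grand_mean = sum(nh * mh for nh, mh, _ in strata) / N
    within = sum((nh / N) * s2 for nh, _, s2 in strata)
    between = sum((nh / N) * (mh - grand_mean) ** 2 for nh, mh, _ in strata)
    return within / (within + between)

# Invented strata: (size, stratum mean, within-stratum variance).
strata = [(500, 52.0, 90.0), (300, 47.0, 110.0), (200, 60.0, 80.0)]
print(round(deff_proportionate(strata), 3))   # less than 1 when stratum means differ
```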
SIMPLE CLUSTER SAMPLING
The Coefficient of Intraclass Correlation
Standard statistical theory has mostly been developed with the assumption that the sample observations were obtained through independent random selection. However, most research in the social sciences has been carried out by using complex sample designs. The main features of complex sample designs are clustering, stratification, unequal probabilities of selection and systematic sampling. Kish (1957) examined the consequences of applying the usual textbook formulae for calculating confidence limits to data obtained by employing complex sampling designs. He concluded that:

"In the social sciences the use of s.r.s. (simple random sample) formulas on data from complex samples is now the most frequent source of gross mistakes in the construction of confidence statements and tests of hypotheses" (Kish, 1957:156).
The feature of complex sample designs which is responsible for these mistakes has usually been clustering - the selection of observational units in clusters or groups rather than individually. Marks (1947) provided an early warning of this influence in psychological research when he considered the effects of clustering on the sample design used in the revision of the Stanford-Binet Scale (Terman and Merrill, 1937; McNemar, 1942):

"Ignoring the effects of cluster sampling on measures of sampling error has undoubtedly resulted in attaching importance to results which are statistically insignificant. In the testing field, failure to allow for cluster sampling has probably caused us to attach a measure of precision to our results considerably in excess of that warranted by sound statistical techniques" (Marks, 1947:413).

Marks estimated that the standard error of the reported mean score on the Stanford-Binet Scale was at least three times the error which would be calculated from the data by the use of the formula for unrestricted random sampling. The source of this discrepancy in error estimates could be traced to the fact that the researchers found it economical and convenient to use existing geographical clusters as the primary sampling unit. Since individuals within a particular sampling unit tended to resemble each other more than they resembled individuals from other units, the basic assumption of independent random selection of observations had broken down and the usual formulae failed to apply. Kish (1957) points out that this homogeneity of individuals within sampling units may be due to common selective factors, or to joint exposure to the same effects, or to mutual influence (interaction), or to some combination of these. The magnitude of this homogeneity is usually measured by rho, the coefficient of intraclass correlation.

The coefficient of intraclass correlation was developed in connection with the estimation of fraternal resemblance, as in the calculation of correlation between the heights of brothers. To establish the correlation between brothers in general we have no reason for ordering the pairs of measurements. That is, the measurements are logically interchangeable in computing the correlation coefficient. Pearson (1901) suggested that this problem could be approached by the calculation of a product-moment correlation coefficient from a symmetrical table of measures consisting of two interchanged entries for each pair of measures. This method is suitable for a small number of pairs; however the number of entries in the tables rises rapidly as the number of pairs increases. To overcome the difficulties posed by working with very large symmetrical tables Harris (1913) developed a short cut based on sums of squares. This method was further refined (by using Fisher's approach which employs degrees of freedom rather than sample size to obtain population variance estimates) to allow the computation of the intraclass correlation to be made from analysis of variance tables. Haggard's (1958) comprehensive investigation of the relationship between intraclass correlation and analysis of variance considerably extended the range of applications in psychological research for the coefficient of intraclass correlation; further work by Gupta (1955) explored the suitability of this statistic for use in educational research.
It should be remembered that the value of the coefficient of intraclass correlation has no meaning for the individual except insofar as he is considered to be a member of a group. A high value implies that there is a high degree of homogeneity within the groups of observations.

The concept of consistency within groups may also be thought of as non-independence of observations within the groups. The presence of non-independence among observations was also shown to affect test statistics such as t, F, or chi-square (Walsh, 1947) because tests using these statistics are based on the assumption of the independence of observations within two or more samples. In the case of t, when the observations are not independent, and the t test based on the assumption of independence is used, the value of the obtained t will be overestimated if rho is positive and underestimated if rho is negative (Haggard, 1958).

The following discussion traces the definition of the coefficient of intraclass correlation from its description based on a symmetric correlation table (Weatherburn, 1946) to its functional relationship with the F statistic (Haggard, 1958).

Consider a population of elements divided into M clusters each containing k elements which are measured on a characteristic X. In order to consider the correlation between the elements in a cluster, without distinguishing between the order of the pairs of elements in each cluster, Pearson (1901) suggested the construction of a symmetric table consisting of all possible pairs for each cluster. There will be k(k-1) pairs for each cluster, and Mk(k-1) pairs of values in the table.

Let $X_{ij}$ $(i=1,\dots,M;\ j=1,\dots,k)$ denote the measure on the characteristic X of the jth element of the ith cluster. The mean for the population is:

$$\bar{X} = \frac{1}{Mk}\sum_{i=1}^{M}\sum_{j=1}^{k}X_{ij}$$

The product moment correlation coefficient calculated between the two rows of the symmetric table is the coefficient of intraclass correlation $R_I$:

$$R_I = \frac{\sigma_{ab}}{\sigma_a\sigma_b}$$

where $\sigma_a^2$ is the variance of the values in the first row, $\sigma_b^2$ is the variance of the values in the second row, and $\sigma_{ab}$ is the covariance between the rows.

Now

$$\sigma_{ab} = \frac{1}{N}\sum_{i}\sum_{j\neq l}(X_{ij}-\bar{X})(X_{il}-\bar{X}) \qquad (j,l=1,\dots,k;\ j\neq l;\ i=1,\dots,M)$$

where N is the number of pairs, Mk(k-1), and

$$\sigma_a^2 = \sigma_b^2 = \frac{1}{Mk}\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

Therefore

$$R_I = \frac{\displaystyle\sum_{i}\sum_{j\neq l}(X_{ij}-\bar{X})(X_{il}-\bar{X})}{(k-1)\displaystyle\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2} \qquad (A.15)$$

Consider the numerator of the right hand side of (A.15):

$$\text{Numerator} = \sum_{i}\sum_{j}(X_{ij}-\bar{X})\sum_{l\neq j}(X_{il}-\bar{X})$$

The sum of the $X_{il}$ over all values of l includes all values for the ith cluster except $X_{ij}$. That is,

$$\sum_{l\neq j}X_{il} = k\bar{X}_i - X_{ij}$$

where $\bar{X}_i$ is the mean of the ith cluster. Also $\sum_{l\neq j}\bar{X} = (k-1)\bar{X}$ since l takes (k-1) values.

$$\therefore\ \text{Numerator} = \sum_{i}\sum_{j}(X_{ij}-\bar{X})\left[k\bar{X}_i - X_{ij} - (k-1)\bar{X}\right] = k\sum_{i}\sum_{j}(X_{ij}-\bar{X})(\bar{X}_i-\bar{X}) - \sum_{i}\sum_{j}(X_{ij}-\bar{X})^2$$

$$= k^2\sum_{i}(\bar{X}_i-\bar{X})^2 - \sum_{i}\sum_{j}(X_{ij}-\bar{X})^2 = k\,SS_B - SS_T$$

where $SS_T$ is the total sum of squares, $SS_B$ is the between clusters sum of squares and $SS_W$ is the within clusters sum of squares.

Now consider the denominator of the right hand side of (A.15):

$$\text{Denominator} = (k-1)\sum_{i}\sum_{j}(X_{ij}-\bar{X})^2 = (k-1)SS_T$$

Therefore

$$R_I = \frac{k\,SS_B - (SS_B+SS_W)}{(k-1)(SS_B+SS_W)} = \frac{(k-1)SS_B - SS_W}{(k-1)SS_B + (k-1)SS_W}$$

Dividing the numerator and the denominator by M(k-1):

$$R_I = \frac{\dfrac{SS_B}{M} - \dfrac{SS_W}{M(k-1)}}{\dfrac{SS_B}{M} + \dfrac{SS_W}{M}} \qquad (A.16)$$

Now if we were to estimate $R_I$ from a sample of data then we could rewrite (A.16) as follows:

$$R_H = \frac{BCMS^* - WCMS}{BCMS^* + (k-1)WCMS}$$

where BCMS is the between clusters mean square, WCMS is the within clusters mean square, and the asterisk denotes a biased estimate. $R_H$ is the estimate used by Harris (1913).

This estimate of the coefficient of intraclass correlation is biased in the negative direction (Gupta, 1956). Haggard (1958) recommends replacing BCMS* with an unbiased estimate of the population value of the between clusters mean square in order to remove this bias. Then

$$R = \frac{BCMS - WCMS}{BCMS + (k-1)WCMS} \qquad (A.17)$$

is an unbiased estimate of the coefficient of intraclass correlation.

Now if we divide the numerator and denominator of (A.17) by the WCMS, we may establish a functional relationship between R and the F statistic.
Then

$$R = \frac{F-1}{F+(k-1)}$$

where F = BCMS/WCMS. In the case of unequal class membership, the value of k is replaced by an adjusted average value computed from M, the number of clusters, and $k_i$, the number of elements in the ith class (Haggard, 1958).

Cluster Sampling and the Coefficient of Intraclass Correlation

Consider the following one-way random model (Hays, 1963):

$$X_{ij} = \mu + a_i + e_{ij}$$

where $X_{ij}$ is the jth element in the ith cluster, $\mu$ is the general mean, $a_i$ is the effect due to the ith cluster, and $e_{ij}$ is the error associated with the jth element in the ith cluster. Assume that $a_i$ has a normal distribution with a mean of zero and a variance of $\sigma_c^2$; that $e_{ij}$ has a normal distribution with a mean of zero and a variance of $\sigma_e^2$; and that the random variables $a_i$ and $e_{ij}$ are independent.

Further, assume that the same number of observations $\bar{n}$ are selected from each of m equal sized clusters. The following one-way table summarises the contributions to variance (Hays, 1963).

Source              Sum of squares    Degrees of freedom    Mean square    Expected mean square
Between clusters    SSB               m - 1                 BCMS           n_bar.sigma_c^2 + sigma_e^2
Within clusters     SSW               m(n_bar - 1)          WCMS           sigma_e^2
Total               SST               m.n_bar - 1

From (A.17):

$$R = \frac{BCMS - WCMS}{BCMS + (\bar{n}-1)WCMS}$$

Now if we substitute the expected values from the analysis of variance table, then we obtain the value of R under the assumptions of the one-way random model presented above:

$$R = \frac{\bar{n}\sigma_c^2 + \sigma_e^2 - \sigma_e^2}{\bar{n}\sigma_c^2 + \sigma_e^2 + (\bar{n}-1)\sigma_e^2} = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2}$$
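Under the one-way random model the coefficient of intraclass correlation can therefore be estimated directly from the two mean squares, as in equation (A.17). The sketch below is our own illustration (the data are invented and the function name is hypothetical); it computes BCMS, WCMS and R for a small set of equal sized clusters.

```python
# Sketch: estimate of the intraclass correlation from a one-way analysis of
# variance with m equal sized clusters of n_bar elements (equation A.17).
def intraclass_correlation(clusters):
    m = len(clusters)
    n_bar = len(clusters[0])                      # equal cluster sizes assumed
    grand = sum(sum(c) for c in clusters) / (m * n_bar)
    means = [sum(c) / n_bar for c in clusters]
    ssb = n_bar * sum((mi - grand) ** 2 for mi in means)
    ssw = sum((x - mi) ** 2 for c, mi in zip(clusters, means) for x in c)
    bcms = ssb / (m - 1)                          # between clusters mean square
    wcms = ssw / (m * (n_bar - 1))                # within clusters mean square
    return (bcms - wcms) / (bcms + (n_bar - 1) * wcms)

# Invented scores for four clusters (e.g. schools) of five students each.
data = [[12, 14, 13, 15, 14], [8, 9, 7, 10, 9],
        [11, 12, 12, 13, 11], [15, 16, 14, 17, 15]]
print(round(intraclass_correlation(data), 3))
```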
Now consider a two-stage sample and suppose that m clusters are selected from a total of M clusters, each of size $\bar{N}$. Let the population size be N. Select $\bar{n}$ observations from each of the m clusters. The variance of the mean may be estimated from the following formula (Yamane, 1967):

$$\hat{V}(\bar{x}_c) = \frac{M-m}{M}\cdot\frac{s_1^2}{m} + \frac{\bar{N}-\bar{n}}{\bar{N}}\cdot\frac{1}{Mm\bar{n}}\sum_{i=1}^{m}s_i^2 \qquad (A.18)$$

where $\bar{x}_i$ is the mean of the ith cluster, $\bar{x}$ is the overall sample mean (which is an unbiased estimate of the population mean),

$$s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(\bar{x}_i-\bar{x})^2 \qquad\text{and}\qquad s_i^2 = \frac{1}{\bar{n}-1}\sum_{j=1}^{\bar{n}}(x_{ij}-\bar{x}_i)^2$$

Now consider the sums of squares from the above analysis of variance table. The total sum of squares may be expanded in the usual fashion (Edwards, 1967):

$$\sum_{i}\sum_{j}(x_{ij}-\bar{x})^2 = \sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)^2 + \bar{n}\sum_{i}(\bar{x}_i-\bar{x})^2 = SS_W + SS_B$$

The expression $SS_W$ may be rewritten in the following fashion:

$$SS_W = \sum_{i}\sum_{j}(x_{ij}-\bar{x}_i)^2 = (\bar{n}-1)\sum_{i}\left[\frac{1}{\bar{n}-1}\sum_{j}(x_{ij}-\bar{x}_i)^2\right] = (\bar{n}-1)\sum_{i}s_i^2$$

Dividing both sides by $m(\bar{n}-1)$:

$$WCMS = \frac{1}{m}\sum_{i}s_i^2 \qquad (A.19)$$

The expression $SS_B$ may also be rewritten:

$$SS_B = \bar{n}\sum_{i}(\bar{x}_i-\bar{x})^2 = \bar{n}(m-1)\left[\frac{1}{m-1}\sum_{i}(\bar{x}_i-\bar{x})^2\right] = \bar{n}(m-1)s_1^2$$

Dividing both sides by m - 1:

$$BCMS = \bar{n}s_1^2 \qquad\text{or}\qquad s_1^2 = \frac{BCMS}{\bar{n}} \qquad (A.20)$$

For large values of M and $\bar{N}$ the finite population corrections $\frac{M-m}{M}$ and $\frac{\bar{N}-\bar{n}}{\bar{N}}$ tend to unity. Therefore equation (A.18) may be rewritten as:

$$\hat{V}(\bar{x}_c) = \frac{s_1^2}{m} + \frac{1}{M\bar{n}}\cdot\frac{1}{m}\sum_{i}s_i^2$$

Substituting from (A.19) and (A.20) for the two expressions on the right hand side of this equation gives:

$$\hat{V}(\bar{x}_c) = \frac{BCMS}{m\bar{n}} + \frac{WCMS}{M\bar{n}}$$

But M is considered to be very large; therefore the second expression on the right hand side will tend to zero. That is,

$$\hat{V}(\bar{x}_c) = \frac{BCMS}{m\bar{n}} \qquad\text{and}\qquad E\left[\hat{V}(\bar{x}_c)\right] = \frac{E(BCMS)}{m\bar{n}} = \frac{\bar{n}\sigma_c^2 + \sigma_e^2}{m\bar{n}}$$

Now substitute for $\sigma_c^2$ by using the relationship $R = \sigma_c^2/(\sigma_c^2+\sigma_e^2)$, so that $\sigma_c^2 = R\sigma^2$ and $\sigma_e^2 = (1-R)\sigma^2$ where $\sigma^2 = \sigma_c^2+\sigma_e^2$. Then

$$V(\bar{x}_c) = \frac{\bar{n}R\sigma^2 + (\sigma^2 - R\sigma^2)}{m\bar{n}} = \frac{\sigma^2\left[1+(\bar{n}-1)R\right]}{m\bar{n}}$$

But

$$V(\bar{x}_{srs}) = \frac{\sigma^2}{m\bar{n}}$$

for a simple random sample of $m\bar{n}$ elements drawn from the same population. Therefore

$$V(\bar{x}_c) = V(\bar{x}_{srs})\left[1+(\bar{n}-1)R\right] \qquad (A.21)$$
It can be seen from the above expression that under the assumptions of the one-way random model the value of $V(\bar{x}_c)$ in relation to $V(\bar{x}_{srs})$ depends directly on the size of the coefficient of intraclass correlation. This relationship, and its resulting influence on the value of the design effect, is examined in a later discussion.

Sampling Error of the Mean in Relation to Cluster Size and Number of Clusters

Consider a sample of n elements drawn from a population of N elements divided into M clusters each containing $N_i$ elements. Select m clusters at random and from each cluster select $n_i$ elements (where $i=1,\dots,m$). If we assume that the cluster sizes and subsample sizes are equal, then we have:

$$N_i = \frac{N}{M} = \bar{N}, \qquad n_i = \frac{n}{m} = \bar{n}$$

Under these restrictions the sample mean $\bar{x}_{cl}$ becomes an unbiased estimate of the population mean (Yamane, 1967), where

$$\bar{x}_{cl} = \frac{1}{n}\sum_{i}\sum_{j}x_{ij}$$
Also consider a simple random sample of n elements drawn without replacement from the same population of N elements. It may be shown (see equation A.21) that the variance of the sample mean for the simple cluster sample design $V(\bar{x}_{cl})$, under the restrictions on $N_i$ and $n_i$, and the variance of the sample mean for the simple random sample without replacement design $V(\bar{x}_{srs})$ are related by the following equation:

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1+(k-1)R\right] \qquad (A.22)$$

where R is the coefficient of intraclass correlation under the conditions of the one-way random model, k = n/m is the ultimate cluster size (greater than 1), and $\bar{N}\gg\bar{n}$, $M\gg m$.
Since

$$V(\bar{x}_{srs}) = \frac{N-n}{N}\cdot\frac{S^2}{n} \approx \frac{S^2}{n}$$

we have

$$V(\bar{x}_{cl}) = \frac{S^2}{n}\left[Rk + (1-R)\right]$$

$$\therefore\quad V(\bar{x}_{cl}) = \frac{S^2R}{m} + \frac{S^2(1-R)}{n} \qquad (A.23)$$
Therefore in order to reduce the variance of the sample mean based on cluster sampling it is important to increase the size of m (the number of clusters) rather than k (the number of elements per cluster).
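A short numerical check of equation (A.23) makes this point clear. In the sketch below (our own; the figures are invented) doubling the number of clusters m roughly halves the dominant term $S^2R/m$, whereas doubling the cluster size k leaves that term untouched.

```python
# Sketch: V(x_cl) = S^2 * R / m + S^2 * (1 - R) / n, with n = m * k  (A.23).
def cluster_variance(S2, R, m, k):
    n = m * k
    return S2 * R / m + S2 * (1 - R) / n

S2, R = 100.0, 0.1                                       # hypothetical S^2 and rho
base = cluster_variance(S2, R, m=100, k=30)
more_clusters = cluster_variance(S2, R, m=200, k=30)     # double the number of clusters
bigger_clusters = cluster_variance(S2, R, m=100, k=60)   # double the cluster size
print(base, more_clusters, bigger_clusters)              # 0.13, 0.065, 0.115
```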
Relationship Between Deff and Rho for Simple Cluster Sampling

From previous discussion we find that there is a functional relationship between the F statistic and R:

$$R = \frac{F-1}{F+(k-1)}$$

The F statistic may take values in the range zero to infinity. By partially differentiating R with respect to F we obtain:

$$\frac{\partial R}{\partial F} = \frac{[F+(k-1)] - [F-1]}{[F+(k-1)]^2} = \frac{k}{[F+(k-1)]^2}$$

which is always positive. Therefore the relationship between R and F is monotonic and the maximum and minimum values of R must coincide with the maximum and minimum values of F. The maximum value of R will therefore occur when $F\to\infty$ and the minimum value of R will occur when $F\to 0$. That is, $R_{max} = 1-\epsilon$ and $R_{min} = -\frac{1}{k-1}+\delta$, where $\epsilon$ and $\delta$ are small positive numbers such that $\epsilon\to 0$ as $F\to\infty$, and $\delta\to 0$ as $F\to 0$.

Now substitute these values into (A.22) to obtain two new relationships. When $R = R_{max}$,

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1+(k-1)(1-\epsilon)\right]$$

but $k \gg \epsilon k$ and $k \gg \epsilon$, so that

$$V(\bar{x}_{cl}) \approx k\,V(\bar{x}_{srs}) \qquad (A.24)$$

When $R = R_{min}$,

$$V(\bar{x}_{cl}) = V(\bar{x}_{srs})\left[1+(k-1)\left(-\frac{1}{k-1}+\delta\right)\right] = \delta(k-1)\,V(\bar{x}_{srs}) = \delta'\,V(\bar{x}_{srs}) \qquad (A.25)$$

where $\delta' = \delta(k-1)$ is also a small positive number because $\delta\to 0$ when $F\to 0$ and (k-1) > 0.

From the two results (A.24) and (A.25) it can be seen by inspection that for $R = R_{max}$, $V(\bar{x}_{cl}) \gg V(\bar{x}_{srs})$, and for $R = R_{min}$, $V(\bar{x}_{cl}) \ll V(\bar{x}_{srs})$. Also from (A.22) it can be shown that for R = 0, $V(\bar{x}_{cl}) = V(\bar{x}_{srs})$.

By substituting these results back into the expression which defines Deff we may now inspect the values of Deff for the simple cluster sample design. These values may be presented with ranges of values for R because $V(\bar{x}_{cl})$ increases monotonically with R:

when $0 < R \le R_{max}$, Deff > 1;  when R = 0, Deff = 1;  when $R_{min} \le R < 0$, Deff < 1.

For most cluster designs in survey research R tends to be positive. That is, the individuals associated with human groups are more like their own group than any other group. Therefore we usually find that the value of Deff for a simple cluster design is greater than one.
RANDOM SUBSAMPLE ERROR ESTIMATION TECHNIQUES

The jackknife method

Consider an estimator y based on a random sample of n observations. Randomly partition this sample into k 'independent' subsamples of size m such that each subsample identically follows the design of the original sample.
Let $y_i$ be the estimate of y based on that portion of the data which omits the ith subsample, and let $y_{all}$ be the corresponding result for the entire sample. Mosteller and Tukey (1968) define k pseudovalues $y_i^*$ (i = 1, ..., k) based on the k complements:

$$y_i^* = k\,y_{all} - (k-1)\,y_i \qquad (i=1,\dots,k)$$

They also defined the jackknife value as

$$y^* = \frac{1}{k}\sum_{i=1}^{k}y_i^* = k\,y_{all} - (k-1)\,\bar{y}_i \qquad (A.26)$$

where $\bar{y}_i$ is the mean of the $y_i$.
Quenouille (1956) presented theoretical arguments to support his earlier deduction (Quenouille, 1949) that the jackknife value displayed less bias than the usual estimate. A complete theoretical discussion of Quenouille's method and a detailed discussion of the jackknife statistic has been presented by Gray and Schucany (1972); however a brief summary of Quenouille's main argument is set out below.

Assume that the bias in y as an estimator of Y is such that

$$E(y) = Y + \sum_{i}\frac{a_i}{n^i} = Y + \sum_{i}\frac{a_i}{(mk)^i}$$

where $a_i$ is not functionally related to n. Also

$$E(\bar{y}_i) = E(y_i) = Y + \sum_{i}\frac{a_i}{(n-m)^i}$$

since each $y_i$ is based on (n - m) elements. Also (n - m) = km - m = m(k - 1), so that

$$E(\bar{y}_i) = Y + \sum_{i}\frac{a_i}{m^i(k-1)^i}$$
From equation (A.26) the expected value of y* is

$$E(y^*) = k\,E(y_{all}) - (k-1)\,E(\bar{y}_i)$$

$$= kY + \frac{a_1}{m} + \frac{a_2}{m^2k} + \dots - (k-1)Y - \frac{a_1}{m} - \frac{a_2}{m^2(k-1)} - \dots$$

$$\therefore\quad E(y^*) = Y - \frac{a_2}{m^2k(k-1)} - \dots = Y - \frac{a_2}{n(n-m)} - \dots$$

Therefore the bias in y is of the order 1/n while the bias of the jackknife value y* is of the order $1/n^2$. Also, if the bias in y is exactly of the order 1/n then $a_2 = a_3 = \dots = 0$ and y* is unbiased.
The variance of the jackknife value may now be obtained from the pseudovalues:

$$s_{y^*}^2 = \frac{1}{k-1}\left[\sum_{i=1}^{k}y_i^{*2} - \frac{1}{k}\left(\sum_{i=1}^{k}y_i^*\right)^2\right]$$
Following Tukey's (1958) proposal, the pseudovalues are used as approximately independent observations in order to estimate the stability of the jackknife value.
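A minimal sketch of the jackknife procedure described above is given below. It is our own illustration, not part of the report: the estimator chosen is the sample mean, the data are invented, the partition into subsamples is taken as given (the randomisation step is omitted), and k is assumed to divide n exactly. The variance of y* is taken as the pseudovalue variance divided by k, following Tukey's proposal to treat the pseudovalues as approximately independent observations.

```python
# Sketch: jackknife pseudovalues, jackknife value and variance (equation A.26).
def jackknife(values, estimator, k):
    n = len(values)
    m = n // k                                    # subsample size (assumes k divides n)
    subsamples = [values[i * m:(i + 1) * m] for i in range(k)]
    y_all = estimator(values)
    pseudo = []
    for i in range(k):
        omit_i = [x for j, s in enumerate(subsamples) if j != i for x in s]
        y_i = estimator(omit_i)                   # estimate with the ith subsample omitted
        pseudo.append(k * y_all - (k - 1) * y_i)  # pseudovalue y_i*
    y_star = sum(pseudo) / k                      # jackknife value
    s2 = sum((p - y_star) ** 2 for p in pseudo) / (k - 1)
    return y_star, s2 / k                         # jackknife value and its estimated variance

mean = lambda xs: sum(xs) / len(xs)
data = [3, 5, 4, 6, 8, 7, 5, 4, 6, 9, 7, 5]       # invented observations
print(jackknife(data, mean, k=4))
```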
THE HALF-SAMPLE REPLICATION METHOD

The technique of half-sample replication is a method for the estimation of variance from stratified sample designs which have two selections per stratum. These designs are valuable in that they permit the utmost stratification consistent with having more than one independent replicate per stratum, which is needed for computing variances. The following discussion is a summary of a detailed theoretical development by McCarthy (1966).

Consider a stratified sample design which is composed of two independent selections from each of L strata. The following table describes the details of this design.
Stratum   Weight   Population mean   Population variance   Sample points    Sample mean
1         W_1      mu_1              sigma_1^2             y_11, y_12       y_bar_1
2         W_2      mu_2              sigma_2^2             y_21, y_22       y_bar_2
...
h         W_h      mu_h              sigma_h^2             y_h1, y_h2       y_bar_h
...
L         W_L      mu_L              sigma_L^2             y_L1, y_L2       y_bar_L
An unbiased estimate of the population mean $\mu$ is given by

$$\bar{y} = \sum_{h=1}^{L}W_h\bar{y}_h$$

where $\sum_h W_h = 1$ and $\bar{y}_h = \tfrac{1}{2}(y_{h1}+y_{h2})$. Also, the usual estimate of the variance of $\bar{y}$ is (Kish, 1965:78):

$$v(\bar{y}) = \sum_{h=1}^{L}W_h^2\,v(\bar{y}_h) \qquad\text{where}\qquad v(\bar{y}_h) = \frac{\sum_{j}(y_{hj}-\bar{y}_h)^2}{2(2-1)}$$

Since $y_{h1}-\bar{y}_h = \tfrac{1}{2}(y_{h1}-y_{h2})$ and $y_{h2}-\bar{y}_h = -\tfrac{1}{2}(y_{h1}-y_{h2})$, we may write $d_h = y_{h1}-y_{h2}$ and obtain

$$v(\bar{y}) = \tfrac{1}{4}\sum_{h=1}^{L}W_h^2 d_h^2$$
Now consider a 'half-sample replicate', that is, the sample obtained by selecting one of the two sample points in each stratum. There are $2^L$ distinct half-samples. For the jth half-sample, the estimate of $\mu$ is

$$\bar{y}_j = \sum_{h}W_h\,y_{hi} \qquad (i = 1\ \text{or}\ 2\ \text{for each}\ h)$$

The deviation of this half-sample estimate from the overall sample mean may be written as

$$(\bar{y}_j-\bar{y}) = \sum_h W_h y_{hi} - \tfrac{1}{2}\sum_h W_h(y_{h1}+y_{h2}) = \tfrac{1}{2}\left(\pm W_1 d_1 \pm W_2 d_2 \pm \dots \pm W_L d_L\right)$$

where the deviation for each stratum is determined by making an appropriate choice of a plus or minus sign for each stratum. Therefore

$$(\bar{y}_j-\bar{y})^2 = \tfrac{1}{4}\left[\sum_h W_h^2 d_h^2 + 2\sum_{h<k}(\pm)W_hW_k d_h d_k\right]$$

and since selections within strata are independent, $E(d_hd_k) = 0$ for $h\neq k$. Then

$$E\left[(\bar{y}_j-\bar{y})^2\right] = E\left[\tfrac{1}{4}\sum_h W_h^2 d_h^2\right] = V(\bar{y})$$

Select a simple random sample of k half-samples. Since the expected value of each squared term is equal to $V(\bar{y})$, the expected value of their average is also equal to $V(\bar{y})$. That is,

$$\hat{v}(\bar{y}) = \frac{1}{k}\sum_{j=1}^{k}(\bar{y}_j-\bar{y})^2 \qquad (A.27)$$

is an unbiased estimate of $V(\bar{y})$.
BALANCED HALF-SAMPLE REPLICATION

From above, the variance of the weighted mean, $V(\bar{y})$, for a stratified sample design with two independent selections per stratum may be estimated as follows:

$$v(\bar{y}) = \tfrac{1}{4}\sum_h W_h^2 d_h^2 \qquad (A.28)$$

Also, for a random half-sample, the variance of $\bar{y}$ may be estimated by considering the deviation of the half-sample estimate from the overall sample estimate:

$$(\bar{y}_j-\bar{y})^2 = \tfrac{1}{4}\left[\sum_h W_h^2 d_h^2 + 2\sum_{h<k}(\pm)W_hW_k d_h d_k\right] \qquad (A.29)$$
The between stratum contributions to variance come from the cross product terms which involve $d_hd_k$. These terms cancel out in (A.29) when we consider the entire set of $2^L$ half-samples. The question now arises whether one can choose a relatively small subset of half-samples for which these terms will cancel out. If this can be done, then the corresponding half-sample estimates of variance will contain all the information available in the total sample. McCarthy (1966, 1969) has shown that by selecting orthogonally balanced patterns of half-samples a smaller set of half-samples may be selected which will produce estimates of the variance equal to the estimate that would be produced by considering all possible half-samples. This technique relies on the property of the statistic under consideration being linear in the replicate values. The weighted mean satisfies the condition of linearity because $(\bar{y}_j-\bar{y})^2 = (\bar{y}_j'-\bar{y})^2$, where $\bar{y}_j'$ is the value of the weighted mean based on the data which form the complement of the data used to establish $\bar{y}_j$.
Despite the lack of analytical proofs for non-linear statistics such as regression coefficients, Kish and Frankel (1970) provide results which suggest that this technique is also suitable for non-linear statistics.

Consider the three strata design which is to be used in this study. Let the strata have two independent observations per stratum: (y11, y12), (y21, y22) and (y31, y32). There are 2^3 = 8 possible half-samples. Now consider the following subset of four half-samples.
Half sample    Stratum 1    Stratum 2    Stratum 3    Deviation (y_bar_j - y_bar)
1              y11          y21          y31          (1/2)(W1.d1 + W2.d2 + W3.d3)
2              y11          y22          y32          (1/2)(W1.d1 - W2.d2 - W3.d3)
3              y12          y22          y31          (1/2)(-W1.d1 - W2.d2 + W3.d3)
4              y12          y21          y32          (1/2)(-W1.d1 + W2.d2 - W3.d3)

The signs of the terms are determined by the definition $d_h = y_{h1}-y_{h2}$. By multiplying out each entry in the right hand column we obtain:

$$(\bar{y}_1-\bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2+W_2^2d_2^2+W_3^2d_3^2) + \tfrac{1}{4}(2W_1W_2d_1d_2+2W_1W_3d_1d_3+2W_2W_3d_2d_3)$$

$$(\bar{y}_2-\bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2+W_2^2d_2^2+W_3^2d_3^2) + \tfrac{1}{4}(-2W_1W_2d_1d_2-2W_1W_3d_1d_3+2W_2W_3d_2d_3)$$

$$(\bar{y}_3-\bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2+W_2^2d_2^2+W_3^2d_3^2) + \tfrac{1}{4}(2W_1W_2d_1d_2-2W_1W_3d_1d_3-2W_2W_3d_2d_3)$$

$$(\bar{y}_4-\bar{y})^2 = \tfrac{1}{4}(W_1^2d_1^2+W_2^2d_2^2+W_3^2d_3^2) + \tfrac{1}{4}(-2W_1W_2d_1d_2+2W_1W_3d_1d_3-2W_2W_3d_2d_3)$$

The cross product terms cancel out when the four expressions are summed. Then

$$\frac{1}{4}\sum_{j=1}^{4}(\bar{y}_j-\bar{y})^2 = \frac{1}{4}\sum_{h=1}^{3}W_h^2 d_h^2$$

and from the previous discussion we know that $\tfrac{1}{4}\sum_h W_h^2 d_h^2 = v(\bar{y})$.
Therefore by selecting the above pattern of four half-samples we have obtained all the information which would be available by using all of the eight possible half-samples. McCarthy (1966) summarises this pattern of half-samples as a matrix of signs whose columns are orthogonal to one another:

+ + +
+ - -
- - +
- + -

A plus sign denotes $y_{h1}$ and a minus sign denotes $y_{h2}$. In order to obtain a set of half-samples which have the property of cross-product balance it is necessary to construct a matrix whose columns are orthogonal and whose rows must be a multiple of four in number. The method of constructing these orthogonal matrices is described by Plackett and Burman (1946).

Gurney (1970) developed a formula which compared the variance estimates obtained from McCarthy's orthogonal method with the unbalanced random selection of replications. Considerable gains were shown to be associated with the use of estimates based on the McCarthy method.
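The balanced scheme is easily expressed as a small program. The sketch below is our own illustration (the weights and observations are invented): it applies the 4 x 3 orthogonal sign matrix given above to a three-stratum design with two selections per stratum and checks that the balanced estimate equals the textbook estimate in (A.28).

```python
# Sketch: balanced half-sample replication for three strata with two
# selections per stratum, using the 4 x 3 orthogonal sign matrix above.
SIGNS = [(+1, +1, +1), (+1, -1, -1), (-1, -1, +1), (-1, +1, -1)]

def brr_variance(weights, pairs):
    """weights: W_h summing to one; pairs: [(y_h1, y_h2)] for each stratum."""
    y_bar = sum(w * (y1 + y2) / 2 for w, (y1, y2) in zip(weights, pairs))
    squared_deviations = []
    for signs in SIGNS:
        # A plus sign selects y_h1, a minus sign selects y_h2.
        y_j = sum(w * (y1 if s > 0 else y2)
                  for s, w, (y1, y2) in zip(signs, weights, pairs))
        squared_deviations.append((y_j - y_bar) ** 2)
    return sum(squared_deviations) / len(squared_deviations)

W = [0.5, 0.3, 0.2]                              # invented stratum weights
pairs = [(12.0, 10.0), (9.0, 11.0), (14.0, 13.0)]  # invented sample points
d = [y1 - y2 for y1, y2 in pairs]
direct = 0.25 * sum(w * w * dh * dh for w, dh in zip(W, d))   # equation (A.28)
print(brr_variance(W, pairs), direct)            # the two estimates agree
```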