Experimental Design: Overview

Many scientific discoveries are made by observing how a change in a stimulus that is presented to a subject or object affects the response (measurement) given by that subject or object. In an experiment, the investigator has direct control over which stimuli are presented to which subjects in which time periods. This control, when exercised correctly, enables the investigator to deduce a 'cause and effect' relationship, that is, the investigator can deduce that a given change in a stimulus causes a given change in the measured response. The plan of how an experiment is to proceed is called the 'design of the experiment.' The art of experimental design is the art of devising an experimental plan which maximizes the information that can be obtained on the effects of the stimuli.
1. Terminology

Experimentation is used in almost every branch of science, with the result that the terminology used in experimental design is not quite standardized. For example, in some fields, the subject or object which is to be presented with a stimulus and then to be measured is called a 'unit' or 'experimental unit.' The stimulus itself may be called the 'treatment' or the 'level of a factor' or the 'level of an independent variable.' In factorial experiments, where a subject is presented with a combination of different types of stimuli (such as a particular light intensity together with a particular noise level), the combination may be called a 'treatment combination' but, for simplicity, the term 'treatment' may be taken to mean either a single stimulus or a combination of stimuli, depending on the context. In some experiments, measurements are made on each subject over several time periods. These are known as 'repeated measurements.' The terminology concerning the associated designs again differs between disciplines. The term 'repeated measurements design' may refer solely to a design involving repeated
measurements on a subject to whom a single treatment has been administered, or it may include designs in which the treatment is changed before each measurement. The latter type of design is also known as a 'within subjects design' or a 'block design' or a 'cross-over design.' All of these designs are grouped under the heading of 'split-plot designs' by some authors, while others reserve this last term for a design with two types of treatment, one of which is held constant throughout the repeated measurements on a subject and the other of which changes before each measurement (see Sect. 9.2).

2. The Purpose of Experimental Design

The pioneer in statistical experimental design was Sir R. A. Fisher (Fisher 1951), who was concerned with maximizing the amount of information about agricultural crop production. In the social and behavioral sciences, the questions of scientific interest are very different, but the art of good experimental design is similar. Every experiment has a budget (time and money) and a limit on the number of subjects that can be recruited. Also, every experiment has inherent variability; subjects differ from one another in fundamental ways, technicians differ in how they read measuring instruments, subjects and instrument readings vary over time, and so on. Variability translates into uncertainty, and uncertainty reduces the amount of useful information.

Information gained from an experiment may be viewed either from a sampling perspective or from a Bayesian perspective. In the former, hypothesis tests and confidence intervals are generally used, and the most informative experimental designs yield the most powerful tests and the shortest intervals, whilst producing unbiased results (e.g., Estimation: Point and Interval; Hypothesis Testing in Statistics). For a Bayesian analysis, the most informative designs are those which maximize the expected utility (e.g., Experimental Design: Bayesian Designs). The purpose of designing an experiment is (a) to maximize the amount of information that can be gained within a given budget or (b) to minimize the budget required to obtain a given amount of information. Maximizing information is done by controlling and reducing the effect of extraneous variables and by the removal of confounding variables and bias, as described in Sects. 3.2 and 3.3.
3. Features of Good Design

3.1 Comparison
Experiments, by their nature, tend to be comparative. Questions of interest tend to be of the type 'does this treatment elicit a "better" response than that treatment?' and, if so, 'how much better?' Even if a single treatment appears to be the only one of interest,
information about its effect on a subject is usually of no value without a comparison with other treatments. For instance, Moore and McCabe (1999, chap. 3) cite an example of a medical study that showed a gastric freezing technique to be a good treatment for ulcer pain, but in later comparative experiments it was shown that the pain relief using freezing was no better than the relief achieved using the same technique but with no freezing solution. Apparently, the pain relief was due to nothing more than doctors showing concern for their patients (a placebo effect) and the freezing technique was abandoned. A good design for evaluating the effect of a single treatment will, therefore, always include a second treatment, called a control or control treatment, for comparison purposes. In some experiments, the control is the treatment in current use, and in other experiments it is the 'absence of a stimulus.' An experiment with more than one treatment needs no control since the experimental treatments can be compared among themselves. Nevertheless, a control can often add extra information. For example, in an experiment on how different types of background music (such as 'classical,' 'rock,' 'rap,' etc.) affect the time taken to learn a new task, a control treatment might be the absence of background music.
3.2 Control of Extraneous and Confounding Variables

A good experimental design allows a particular set of stimuli to be compared with each other with high accuracy. Therefore, any other variable (or factor) that causes the experimental measurements to vary at best reduces the efficiency of the experiment and at worst completely masks the true effect being investigated. For example, in an experiment to investigate the effect of employing different types of memory aid in memorization, the IQ of the subjects may play an important role. The variability of IQ of subjects in the experiment would then contribute to the variability in the measured memorization scores. One way of controlling the effect of such an extraneous variable is to hold the variable fixed during the experiment. For instance, IQ could be held more-or-less fixed by using as subjects only people with a tested IQ within a certain range. Although it reduces variability and avoids masking the effects of the memory aids, this strategy limits the applicability of the conclusions of the experiment since the results would apply only to the people in the population with IQ within this range. A preferable method of controlling extraneous variability is to use a matched design (see below). An extraneous variable whose effect is completely muddled with that of the factor(s) of interest is called a confounding variable. In the above example, if all the subjects with high IQ were to be tested using the first memory aid and all those with low IQ tested using
the second, then, if high IQ is correlated with memorization ability, the first memory aid will inevitably appear to be the better, regardless of its true merits. The masking effect of a confounding variable can be reduced by randomization (e.g., Experimental Design: Randomization and Social Experiments). The simplest form of randomization leads to a 'completely randomized design' (see Sect. 7.1); subjects are recruited from the general public as randomly as is possible (see Sect. 5) and then assigned to the stimuli at random in such a way that every subject has the same chance of being assigned to any one of the stimuli. In such a design, it is likely, although not guaranteed, that each stimulus will receive roughly the same distribution of values of the extraneous variable. If a completely randomized design is used in the above example, the memory aids should receive roughly similar ranges of subject IQs.

In all experiments there are confounding variables that have small effects which are ignored by the experimenter, and confounding variables that are accidentally overlooked. The use of randomization helps to spread out the effects of these variables so that the response given to any one stimulus is less likely to be shifted upward or downward due to extraneous factors. For a discussion of random assignment using a random number table or a computer random number generator, see, for example, Dean and Voss (1999, chaps. 1, 3) (e.g., Random Numbers).

A more foolproof method of ensuring similar distributions of IQ levels for each memory aid in the above example is to divide the subjects into groups so that subjects within the same group have similar IQs, and then to make the random assignment of subjects to stimuli within each group separately. The division into groups provides the control necessary for removing the variability and masking due to the extraneous IQ variable, while maintaining the applicability to the general population. This type of design is called a 'matched design' or 'block design' (see Sect. 7.2). Matched designs and completely randomized designs are examples of 'between subjects designs' (Sect. 7). A further possibility is to measure each subject under a sequence of different treatments—a 'cross-over design' or 'within subjects design' (see Sect. 8). The use of such a design in the above example would completely control the extraneous variable since it would ensure that the distribution of IQs assigned to each memory aid is identical. However, new extraneous variables have now been introduced, such as fatigue on the part of the subject over the course of the experiment. If the effects of the new extraneous variables are likely to be small, they can be ignored, but if they are large, then the variables should be controlled by making sure that subjects are assigned the memory aids in different orders. The orders can be assigned at random for each subject separately—a
'block design,' or by deliberately making sure that each stimulus is viewed by the same number of subjects in each time period—a 'latin square design.'

3.3 Removal of Bias

Since each experiment is run with a particular purpose in mind, experimenters tend to have inbuilt, although perhaps subconscious, biases towards or against certain treatments. A random assignment of subjects to treatments and a random ordering of observations ensure that experimenter bias cannot consciously or unconsciously favor one treatment above another. Subjects' own biases towards the treatments can affect their responses. It may or may not be possible or ethical to conceal the true nature of the experiment from the subjects. It may be possible, however, to mask from both the subjects and the person(s) running the experiment which of the stimuli is the experimental treatment(s) and which is the control (a 'double blind experiment'). Leach (1991, Sect. 24, Appendix 1) discusses ethical issues of concealing information from subjects and lists the guidelines published by the British Psychological Society. Kirk (1982, Sect. 1.5) gives references to ethical guidelines put out by the American Psychological Association, the American Sociological Association, and other bodies.
4. Planning an Experiment

Guides to planning experiments can be found in many texts; for example, Myers (1979, chap. 1), Cox (1958), Dean and Voss (1999, chap. 2), and Leach (1991, Sects. 5–9). A protocol, which gives in great detail, step by step, how the experiment is to proceed, is usually prepared well in advance. The protocol includes details about subject selection, measurement and data collection methods, preparation of materials, preparation of subjects, and a draft statistical analysis. A pilot experiment, in which a small number of observations is collected, is often run early in the planning stage. Although these observations will usually be discarded when the main experiment commences, the pilot experiment gives an opportunity to check that the experimental procedure will work as planned and that the required analyses are possible. It also gives indications of unexpected, important confounding variables and of the likely accuracy of the results. It allows problems to be detected and corrected before they arise in the main experiment.
5. Selection of Subjects

Ideally, the subjects taking part in the experiment are selected at random from the population to which the conclusions of the experiment will be applied (the 'target population'). Since this is not always possible,
the experimenter may be forced to use volunteers, who will inevitably come from a subset of the population. The results of the experiment may or may not then apply to the entire target population.

The number of subjects required to achieve the goals of powerful hypothesis tests and short confidence intervals, or of maximum expected utility, can be calculated using statistical formulae (e.g., see Estimation: Point and Interval; Hypothesis Testing in Statistics; Experimental Design: Bayesian Designs). The required number of subjects depends upon the design selected for the experiment, on the comparisons of interest, and on the variability of the responses of subjects when assigned the same treatment under identical experimental conditions. In general, 'between subjects designs' require many more subjects than 'within subjects designs.' Between subjects designs are used in fields where subjects can be assigned only a single treatment (as in the evaluation of different teaching methods) and/or where subjects are sufficiently plentiful to offset the subject-to-subject variability. Within subjects designs are preferred in fields where subjects are scarce or highly variable. The length of the sequence of treatments that can be presented to any one subject depends upon the nature of the experiment. For example, in experimentation involving multiple visits to a laboratory, subject tolerance can be as low as two or three visits. In experiments in which treatments can be changed rapidly with no long-term effect on the subject, a much longer sequence of treatments can be used, requiring fewer subjects in total. There are exceptional circumstances in which an experiment is conducted on a single subject (see Wilson 1995, Kratochwill and Levin 1992), but such experiments cannot give conclusive information about the population as a whole and are not used in standard experimentation.
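To make such a sample-size calculation concrete, the following is a minimal sketch in Python using the power routines in the statsmodels library, assuming a two-treatment comparison with a two-sided t test, a hypothesized standardized effect size of 0.5, a 5 percent significance level, and 80 percent power. These numerical values are illustrative assumptions only, not recommendations, and the contrast between the two calls illustrates the point made above that within subjects designs generally require fewer subjects than between subjects designs.

```python
# A minimal sketch of a sample-size calculation for a two-treatment comparison.
# The effect size, significance level, and power are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower, TTestPower

effect_size = 0.5   # hypothesized difference in standard-deviation units
alpha = 0.05        # significance level of the hypothesis test
power = 0.80        # desired probability of detecting the hypothesized effect

# Between subjects design: each subject receives only one of the two treatments.
n_between = TTestIndPower().solve_power(effect_size=effect_size,
                                        alpha=alpha, power=power)

# Within subjects design: each subject receives both treatments, so the
# analysis is based on within-subject differences (a paired comparison).
n_within = TTestPower().solve_power(effect_size=effect_size,
                                    alpha=alpha, power=power)

print(f"subjects per group (between subjects design): {n_between:.0f}")
print(f"subjects in total (within subjects design):   {n_within:.0f}")
```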
6. Models and Analysis

The model links the dependent (response) variable(s) to all of the factors (independent variables) that could influence the response, such as the various stimuli, the subjects, time periods, extraneous variables that were used in determining the design, and other important variables that can be taken into account only during the analysis (called 'concomitant variables' or 'covariates'; see, e.g., Maxwell and Delaney 1990, chap. 9). All extraneous variables that were ignored in the design are grouped together in a single 'error variable.' A measurement on a subject during a time period before the experiment begins is called a 'baseline measurement' and can be used to increase the accuracy of the results (see, e.g., Jones and Kenward 1989, chap. 2, Sect. 4.4; Cotton 1998, chaps. 9, 10). If the error variables have identical and approximately normal distributions, then standard analysis of variance techniques can be used (e.g., Analysis of Variance and Generalized Linear
Models). Designs in which treatments are assigned the same numbers of subjects and the same numbers of time periods are the easiest to analyse and interpret. They are also the least sensitive to assumptions about normality of the error distributions and equal variances (e.g., Errors in Statistical Assumptions, Effects of; Statistical Analysis, Special Problems of: Transformations of Data). When the errors do not follow a normal distribution, other types of analysis are needed, such as analysis of generalized linear models (e.g., Analysis of Variance and Generalized Linear Models), nonparametric analysis (e.g., Nonparametric Statistics: Rank-based Methods), and categorical data analysis (e.g., see Ratkowski et al. 1993, chap. 7; Jones and Kenward 1989, chap. 3; Crowder and Hand 1990, chap. 8). Repeated measurements on a subject under a particular treatment may require a time series or regression analysis (see Crowder and Hand 1990) (e.g., Linear Hypothesis: Regression (Basics)). Bayesian techniques for analyzing cross-over designs are mentioned by Jones and Kenward (1989, pp. 80, 235) (e.g., Experimental Design: Bayesian Designs). In the case of correlated responses per subject, multivariate analysis is used for normally distributed errors (see, e.g., Winer et al. 1991, chap. 4; Johnson and Wichern 1992; Maxwell and Delaney 1990, chaps. 13, 14; Crowder and Hand 1990, chap. 4; Jones and Kenward 1989, chap. 7; Myers 1979, chap. 18) (e.g., Multivariate Analysis: Overview).
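As an illustration of the standard analysis of variance mentioned above, the following minimal sketch fits a one-way fixed-effects model to a small, entirely hypothetical data set for the background-music example of Sect. 3.1. The column names 'score' and 'treatment' are arbitrary choices, and the statsmodels formula interface is only one of several ways such an analysis could be carried out.

```python
# A minimal sketch of a one-way analysis of variance for a completely
# randomized design. The data below are invented purely for illustration.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "treatment": ["classical"] * 4 + ["rock"] * 4 + ["no music"] * 4,
    "score":     [12, 15, 14, 13,    10, 9, 11, 10,     8, 9, 7, 10],
})

# Fit a fixed-effects model with treatment as a categorical factor
# and print the ANOVA table (sums of squares, F statistic, p value).
model = smf.ols("score ~ C(treatment)", data=df).fit()
print(anova_lm(model))
```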
7. Between Subjects Designs

7.1 Completely Randomized Designs

In a completely randomized design, each subject is assigned to just one treatment. The assignment is done completely at random so that each subject has exactly the same chance of being assigned to each possible treatment (stimulus or combination of stimuli). Often a restriction is applied so that each stimulus receives the same number of subjects. Completely randomized designs are simple to use and simple to analyse. They are most suited to situations where subjects are plentiful, where subjects' responses would not be too variable if they were all given the same stimulus under the same experimental conditions, and where experimental conditions can be held constant throughout the experiment. If there are extraneous variables which add variation to the responses and which can be measured during the experiment (such as age or IQ), their effects can be removed during the analysis (analysis of covariance) (e.g., Winer et al. 1991, chap. 10). However, where it is possible for the extraneous variables to become confounding variables in an unfortunate randomization, a matched pairs design, block design, or within subjects design would be preferred (see Sects. 7.2 and 8).
As in all designs, it is possible to take repeated measurements in completely randomized designs. Each subject is measured for some number of time intervals after administration of the single treatment. Cotton (1998) calls such a design a 'multigroup split-plot design.'
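The random assignment step of a completely randomized design with equal group sizes can be written in a few lines of code. The sketch below, in Python, is illustrative only: the subject labels, the treatments (taken from the background-music example of Sect. 3.1), the group sizes, and the fixed random seed are all assumptions made for the example.

```python
# A minimal sketch of random assignment for a completely randomized design
# with equal group sizes; all labels are hypothetical.
import random

subjects = [f"S{i:02d}" for i in range(1, 13)]      # 12 recruited subjects
treatments = ["classical", "rock", "no music"]      # 3 stimuli, 4 subjects each

random.seed(2024)                # fix the seed so the plan can be reproduced
random.shuffle(subjects)         # a random permutation of the subjects

n_per_treatment = len(subjects) // len(treatments)
assignment = {
    t: subjects[i * n_per_treatment:(i + 1) * n_per_treatment]
    for i, t in enumerate(treatments)
}
for t, group in assignment.items():
    print(t, group)
```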
7.2 Matched Pairs and Block Designs
If there are just two treatments of interest, the variability in the responses (observations) due to differences in the subjects themselves can be reduced by pairing the subjects so that, within each pair, the subjects are as alike as possible. For each pair separately, the two subjects are assigned at random to the two treatments (a matched pairs design). When there are t treatments with t > 2, the subjects are matched into groups (or blocks) of size t and, within each group, the subjects are assigned at random to the t treatments. This type of design is usually known as a 'randomized block design.'

Block designs are also appropriate when the extraneous variation is due to variables unrelated to the subjects. Experimental conditions cannot always be held constant throughout the experiment; there may, for example, be changes in the weather, use of different testing centers, use of different laboratory technicians, etc. To combat this, the subjects would be put into groups of size t (not necessarily matched) and, apart from the assignment to different treatments, all subjects within a group would be tested under the same experimental conditions. If the number of treatments, t, to be compared is large (as is often the case in a factorial experiment where the treatments are combinations of several stimuli), there may not be enough subjects to form groups of t similar subjects, nor may it be possible to hold conditions constant for t measurements. In this case, an incomplete block design can be used, where each group of s (< t) subjects is assigned at random to a preselected subset of s treatments (see, e.g., Winer 1971, chap. 9; Dean and Voss 1999, chaps. 11, 13).
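For comparison with the completely randomized assignment sketched above, the following is a minimal sketch of the randomization for a randomized block design, continuing the memory-aid example of Sect. 3.2: the t treatments are randomized separately within each block of matched subjects. The block labels, subject labels, and treatment names are hypothetical.

```python
# A minimal sketch of a randomized block design: subjects are first matched
# into blocks of size t (here by an IQ grouping) and the treatments are then
# randomized separately within each block. All labels are hypothetical.
import random

t_treatments = ["memory aid 1", "memory aid 2", "memory aid 3"]
blocks = {                         # blocks of t subjects with similar IQ
    "high IQ":   ["S01", "S02", "S03"],
    "medium IQ": ["S04", "S05", "S06"],
    "low IQ":    ["S07", "S08", "S09"],
}

random.seed(7)
plan = {}
for block, members in blocks.items():
    order = random.sample(t_treatments, k=len(t_treatments))  # random permutation
    plan[block] = dict(zip(members, order))   # subject -> treatment within block

for block, within_block_assignment in plan.items():
    print(block, within_block_assignment)
```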
8. Within Subjects Designs

In a within subjects design, each subject is essentially matched with himself or herself and assigned a sequence of some or all of the treatments. The order of presentation is decided using randomization for each subject separately. The comparison of any two treatments is made for each subject ('within' each subject) and then averaged over all the subjects. This has the advantage that subject-to-subject variability does not play a part in the comparison of the treatments. In experiments where stimuli are presented to each subject in quick succession, 'carry-over effects' can be a problem. For example, a subject asked to work in bright light followed by normal light may perceive the normal light to be darker than if it had been preceded
by dim light. These carry-over effects can be mitigated by separating the trials by a period of time, called a 'washout period,' in which the subject is asked to do something completely different in some control state. If a long enough washout period cannot be arranged, then the experiment is usually counterbalanced (see Sect. 8.3). Carry-over effects are also known as 'residual effects,' and the effect of the treatment administered in the current time period is called the 'direct treatment effect.'

8.1 Cross-over Designs

In the simplest within subjects design, any randomization of the order of treatments for any subject is accepted. However, if time period effects or carry-over effects are thought to be important and are to be included in the model, then it is desirable to exercise control over which treatment sequences are used. In most cases, the carry-over effect from a given treatment will be assumed to be the same no matter which treatment follows it. If the treatments interact, then this assumption may not be valid and larger designs and more complicated analyses are needed.

When there is a small number of treatments, say two or three, and the subject can be measured over several time periods, then each subject can be assigned some or all of the treatments more than once. For three time periods and two treatments (one assigned to one period and one assigned to two periods), there are six possible treatment sequences; with four time periods and two treatments (each assigned to two periods), there are fourteen possible treatment sequences; with four time periods and four treatments (each assigned to one period), there are twelve possible sequences, and so on. If there are sufficient subjects, all the possible sequences can be used an equal number of times. If the number of subjects is small, however, it may not even be possible to use each sequence once. Information must then be used about which set of sequences provides the best design. Among the possibilities are 'variance balanced designs' (Sect. 8.2) and 'latin squares' (Sect. 8.3). In general, if it is possible to avoid them, two-period designs are not recommended. Not only can the carry-over effects not be estimated independently of the treatment by period interaction, but also three-period designs are considerably more efficient (Jones and Kenward 1989, Sect. 4.16).

8.2 Balanced Cross-over Designs

Cross-over designs that allow differences between all pairs of treatments to be estimated with the same precision are called 'variance balanced.' Variance balanced designs include cross-over designs that use all possible treatment sequences, and counterbalanced latin square designs (see Sect. 8.3). Variance balanced designs, efficient for comparing all pairs of treatments,
are tabulated by Ratkowski et al. (1993, chap. 5) and also by Jones and Kenward (1989, pp. 212–4, 223–4). The latter authors also list designs efficient for comparing test treatments with a control. The treatment given in the last period of a cross-over design can be repeated in an extra period. Such designs, called 'extra period designs,' allow the carry-over from the last treatment to be measured, thus increasing the precision of the treatment comparisons. Balanced designs typically require a large number of subjects when the number of treatments is large and so cannot always be used. An alternative is to use an efficient 'partially balanced' design. These designs allow treatment differences to be estimated with two or three different precisions that are fairly close in value. When the treatments are factorial in nature, the effects of the individual factors (main effects) and the interactions between the factors are usually of primary interest. Variance balance is then desirable for the comparisons of the levels of each factor separately (see Jones and Kenward 1989, pp. 222–8; Ratkowski et al. 1993, chap. 6, for tabulated designs).

8.3 Latin Squares

A latin square design is ideal for any experiment in which it is possible to measure each subject under every treatment and in which, in addition, it is necessary to control for changing conditions over the course of the experiment. A latin square is a design in which each treatment is assigned to each time period the same number of times and to each subject the same number of times (see Dean and Voss 1999, chap. 12). If there are t treatments, t time periods, and mt subjects, then m latin squares (each with t treatment sequences) would be used. Carry-over effects are controlled by using latin squares that are 'counterbalanced' (Cotton 1993). This means that, looking at the sequences of treatments assigned to all the subjects taken together, every treatment is preceded by every other treatment for the same number of subjects. Counterbalanced latin squares exist for any even number of treatments and for some odd numbers of treatments (e.g., t = 9, 15, 21, 27; see Jones and Kenward 1989, Sect. 5.2.2, for references). For other odd numbers, a pair of latin squares can be used which between them give a set of 2t counterbalanced sequences. If a carry-over effect is expected to persist for more than one time period, then the counterbalancing must be extended to treatments occurring more than one time period prior to the current treatment.
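One well-known way to construct a counterbalanced latin square for an even number of treatments is the Williams construction: the first treatment sequence interleaves the treatments as 0, 1, t-1, 2, t-2, ..., and the remaining sequences are its cyclic shifts. The sketch below implements this construction; it is offered as one standard method under that assumption, not necessarily as the particular squares tabulated in the references cited above.

```python
# A minimal sketch of the Williams construction of a counterbalanced latin
# square for an even number t of treatments (rows are treatment sequences
# for subjects, columns are time periods, treatments are labeled 0..t-1).
def williams_square(t):
    """Return a t x t latin square in which every treatment is immediately
    preceded by every other treatment exactly once across the t sequences."""
    if t % 2 != 0:
        raise ValueError("this construction requires an even number of "
                         "treatments; odd t generally needs a pair of squares")
    # First sequence: 0, 1, t-1, 2, t-2, 3, ...
    first, up, down, take_up = [0], 1, t - 1, True
    while len(first) < t:
        first.append(up if take_up else down)
        up, down = (up + 1, down) if take_up else (up, down - 1)
        take_up = not take_up
    # Remaining sequences are cyclic shifts of the first sequence.
    return [[(x + i) % t for x in first] for i in range(t)]

for sequence in williams_square(4):   # 4 treatments, 4 periods, 4 sequences
    print(sequence)                   # e.g. [0, 1, 3, 2], [1, 2, 0, 3], ...
```

The construction works because each nonzero difference between adjacent treatments occurs exactly once in the first sequence, so cyclic shifting makes every ordered pair of distinct treatments occur exactly once as an adjacent pair across the t sequences.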
9. Other Designs

9.1 Nested or Hierarchical Designs

It is not unusual for extraneous variables to be 'nested.' For example, if subjects are recruited and tested
separately at different testing centers, the subjects are 'nested within testing center.' If subjects are animals such as mice or piglets, then the subjects are naturally nested within litters, which are nested within parents, which may be nested within laboratories. The nesting information can be used in matched designs, since the nesting forms natural groupings of like subjects. For within subjects designs, the nesting information can be used during the analysis for examining the different sources of extraneous variation (e.g., Hierarchical Models: Random and Fixed Effects). Designs in which different levels of nesting are assigned different treatment factors are called 'split-plot designs' (see Sect. 9.2). A second type of nesting is a nesting structure within the treatment factors being examined. Examples given by Myers (1979) include memorization of words within grammatical class and time taken to complete problems within difficulty levels. Models and analyses used in such experiments must reflect the nested treatment structure.
9.2 Split-plot Designs

An experiment with more than one type of stimulus (factor) can be run as a split-plot design, with a level of one or more factors being applied to a subject throughout the course of the experiment (as for a between subjects design), and the levels of the other factor(s) being changed for each time period (as for a within subjects design). Such designs are sometimes called 'mixed designs.' The stimuli applied as the within subjects factors will generally be measured more accurately than those applied as the between subjects factors, since subject-to-subject variability enters into the comparison of the latter. Split-plot designs are useful when it is difficult to change the levels of one of the factors. For example, Dean and Voss (1999, chap. 19) cite an example of an optokinetic experiment on the drift of focus of a subject's eyes from the center of a rotating drum, measured under two different lighting conditions. The change of lighting conditions was a time-consuming process, whereas it was simple to change the speed of rotation of the drum. Consequently, each subject was assigned a single lighting condition throughout an entire viewing session, and during the viewing session the subject was assigned a sequence of different speeds.
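A split-plot layout of this kind can be generated along the following lines. The sketch assigns the hard-to-change (between subjects) factor to whole subjects with equal allocation, and randomizes the order of the easy-to-change (within subjects) factor separately for each subject. The subject labels and factor levels are hypothetical stand-ins for the lighting conditions and drum speeds of the example.

```python
# A minimal sketch of a split-plot layout: one factor is held constant for
# each subject, the other is varied within each subject in a random order.
# All labels and levels are hypothetical.
import random

random.seed(11)
subjects = ["S1", "S2", "S3", "S4"]
lighting = ["dark", "lit"]        # between subjects (whole-plot) factor
speeds = [5, 10, 15]              # within subjects (sub-plot) factor

plan = {}
shuffled = subjects[:]
random.shuffle(shuffled)
for i, subject in enumerate(shuffled):
    level = lighting[i % len(lighting)]            # equal allocation of lighting
    order = random.sample(speeds, k=len(speeds))   # random order of speeds
    plan[subject] = {"lighting": level, "speed order": order}

for subject, layout in plan.items():
    print(subject, layout)
```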
10. Optimality and Efficiency of Designs

As pointed out by Cotton (1998), designs that are best (i.e., optimal) for one purpose are not necessarily best for another purpose, and compromises may need to be made. The optimal design for investigating the effects in one model may be totally unsuitable for a different model. For example, in a cross-over design with two
treatments and two time periods, a set of counterbalanced latin squares provides an optimal design for estimating direct treatment effects and carry-over effects. This design does not, however, allow estimation of both a carry-over effect and an interaction between treatments and time periods. Thus, if both of these effects are required in the model, then more than two time periods must be used in the experiment (e.g., Statistical Identification and Estimability). As a general rule, the most balanced designs are optimal when interest lies equally in all treatment comparisons. The following features are typical characteristics of balance: every treatment is assigned the same number of subjects, every treatment is observed in every time period the same number of times, and every treatment is preceded by every other treatment (including itself, if possible) for the same number of subjects and in the same number of time periods. When balance is not achievable, computer programs for generating optimal designs are commercially available. For other settings, where comparison of all treatments is not the main goal of the experiment, more sophisticated algorithms are needed; see, for example, Atkinson and Donev (1992). For a discussion of design optimality from a Bayesian perspective, see, for example, Experimental Design: Bayesian Designs.
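The balance properties listed above can be checked mechanically for any proposed cross-over plan. The following is an illustrative sketch only (a check of a given plan, not an algorithm for generating optimal designs); the plan is supplied as one treatment sequence per subject, with one entry per time period, and the example plan is hypothetical.

```python
# A minimal sketch of checking the balance properties described above for a
# cross-over plan: equal replication of treatments, equal occurrence of each
# treatment in every period, and equal carry-over counts between treatments.
from collections import Counter

def balance_report(plan):
    treatment_counts = Counter(t for row in plan for t in row)
    period_counts = [Counter(row[j] for row in plan) for j in range(len(plan[0]))]
    carry_over = Counter((row[j], row[j + 1])
                         for row in plan for j in range(len(row) - 1))
    return {
        "same number of observations per treatment":
            len(set(treatment_counts.values())) == 1,
        "same count of each treatment in every period":
            all(len(set(c.values())) == 1 for c in period_counts),
        "each treatment preceded by each other treatment equally often":
            len(set(carry_over[(a, b)]
                    for a in treatment_counts for b in treatment_counts
                    if a != b)) == 1,
    }

# A counterbalanced pair of sequences for two treatments A and B:
print(balance_report([["A", "B"], ["B", "A"]]))
```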
Bibliography

Atkinson A C, Donev A N 1992 Optimum Experimental Design. Oxford Science Publications, Oxford, UK
Breakwell G M, Hammond S, Fife-Shaw C (eds.) 1995 Research Methods in Psychology. Sage, London
Cotton J W 1993 Latin square designs. In: Edwards L K (ed.) Applied Analysis of Variance. Dekker, New York
Cotton J W 1998 Analyzing Within-subjects Experiments. Erlbaum, Mahwah, NJ
Cox D R 1958 Planning of Experiments. Wiley, New York
Crowder M J, Hand D J 1990 Analysis of Repeated Measures. Chapman and Hall, London
Dean A M, Voss D T 1999 Design and Analysis of Experiments. Springer Verlag, New York
Edwards L K (ed.) 1993 Applied Analysis of Variance in Behavioral Sciences. Dekker, New York
Fisher R A 1951 The Design of Experiments, 6th edn. Oliver and Boyd, Edinburgh, UK
Harris P 1986 Designing and Reporting Experiments. Open University Press, Milton Keynes, UK
Johnson R A, Wichern D W (eds.) 1992 Applied Multivariate Statistics, 3rd edn. Prentice Hall, Englewood Cliffs, NJ
Jones B, Kenward M G 1989 Design and Analysis of Cross-over Trials. Chapman and Hall, London
Keppel G 1982 Design and Analysis: A Researcher's Handbook, 2nd edn. Prentice Hall, Englewood Cliffs, NJ
Kirk R E 1982 Experimental Design: Procedures for the Behavioral Sciences, 2nd edn. Brooks/Cole, Belmont, CA
Kratochwill T R, Levin J R 1992 Single-case Research Design and Analysis: New Directions for Psychology and Education. Erlbaum, Hillsdale, NJ
Kuehl R O 1994 Statistical Principles of Research Design and Analysis. Duxbury, Belmont, CA
Leach J 1991 Running Applied Psychology Experiments. Open University Press, Milton Keynes, UK
Maxwell S E, Delaney H D 1990 Designing Experiments and Analyzing Data. A Model Comparison Perspective. Wadsworth, Belmont, CA
Moore D S, McCabe G P 1999 Introduction to the Practice of Statistics, 3rd edn. Freeman, New York
Myers J L 1979 Fundamentals of Experimental Design, 3rd edn. Allyn and Bacon, Boston
Ratkowski D A, Evans M A, Alldredge J R 1993 Cross-over Experiments. Design, Analysis and Application. Dekker, New York
Senn S 1993 Cross-over Trials in Clinical Research. Wiley, New York
Wilson S L 1995 Single case experimental designs. In: Breakwell G M, Hammond S, Fife-Shaw C (eds.) Research Methods in Psychology. Sage, London
Winer B J 1971 Statistical Principles in Experimental Design, 2nd edn. McGraw-Hill, New York
Winer B J, Brown D R, Michels K M 1991 Statistical Principles in Experimental Design, 3rd edn. McGraw-Hill, New York
A. M. Dean
Experimental Design: Randomization and Social Experiments

1. Definition, Functions, and Rationale

In a randomized field trial, a sample of individuals or of entities, such as schools or hospitals, is randomly assigned to one of two or more interventions. These interventions may be different programs or variations on the same program, different intensities of social service, different mixes of therapies, and so on. The interventions may include a 'control' condition in which no services, or services that are ordinarily available, are provided (see Experimental Design: Overview). For example, to understand whether a new employment and training program works well, one randomly allocates some eligible individuals to the new program. All or some of the remaining eligible individuals are randomly allocated to the alternative intervention. This alternative may be a normally available training service, i.e., a control condition, or a program with lower levels of service, or a different service. This random assignment of individuals to a new employment and training program and to an alternative program permits one to make a fair comparison of the wage rates achieved by the people who were assigned to each group. The value that is added by a new program can be estimated well because the people who were assigned to the conventional services and to the new program are equivalent on account of the random assignment. Furthermore, one can take into
account the chance statistical differences among individuals who were assigned to each group, a fact that is important in science and law (Kaye and Freeman 1994).

The logic that underlies a randomized trial is as follows. To estimate the relative effect of a particular intervention, one has to estimate the status of individuals (or entities) had they not had the intervention. At times, a precise forecast can be made about how individuals or entities would fare in the absence of the intervention. Also at times, one may be able to construct what appears to be a fair comparison group that represents the state of people (or institutions) in the absence of the particular intervention. Under these circumstances, quasi-experiments or observational studies might then be employed to generate evidence about the relative effectiveness of programs. These approaches try to make a fair comparison of the outcome of alternative interventions based on sophisticated statistical models of how people and entities behave, and on conscientious attention to competing explanations about why a difference in the effects of interventions might appear. See Maynard and Chalmers (1997) and Quasi-Experimental Designs. Often, the researcher cannot make good forecasts, construct defensible ad hoc comparison groups, or account for competing explanations for a difference among interventions in a quasi-experiment. A randomized trial may then be warranted because it assures that the prediction is fair: the comparison is fair simply because people or entities are randomly allocated into two or more equivalent groups. It assures that competing explanations for people's (or entities') behaviors are taken into account because both groups are subject to the same influences, apart from the interventions.

Social and behavioral scientists and statisticians use phrases other than 'randomized field trial' to describe this approach to generating evidence. These phrases include randomized experiment, randomized trial, randomized clinical trial (RCT) in medical research, and controlled experiment. Different phrases are used to identify trials in which entities such as schools or hospitals are randomly assigned to different interventions. These descriptors include cluster-randomized trials, group randomized experiments, macro-experiments, and place-based trials (Murray 1998, Boruch and Foley 2000).
2. Examples of Randomized Field Trials

Good examples of randomized trials are not difficult to locate, although such trials are in the minority of studies on the effects of interventions. Boruch (1997) gives many such examples. In Switzerland, for example, randomized trials have been carried out to understand the effects of heroin therapy vs. conventional treatment of drug addicts and to estimate