Experimental Design: Overview Angela M Dean, The Ohio State University, Columbus, OH, USA © 2001 Elsevier Ltd. All rights reserved. This article is reproduced from the previous edition, volume 8, pp. 5090–5096, © 2001, Elsevier Ltd.
Abstract Experimental design is the art of devising an experimental plan that maximizes the information that can be obtained on the effects of stimuli (treatments) presented to subjects. The article discusses basic features of good design, including comparison of treatments, randomization, avoidance of bias, and reduction of variability. Designs are classified as between-subjects designs, where subjects are assigned one treatment each, and within-subjects designs, where subjects are assigned sequences of treatments. Between-subjects designs include completely randomized designs, where subjects are assigned completely at random to treatments, and matched designs or block designs, where subjects are divided into groups (blocks) of like subjects and random assignment is done within each block separately. Within-subjects designs include crossover designs and split-plot designs. In these, there can be problems of carryover effects, where each measurement may depend not only upon the treatment given in that time period but also on the preceding treatment(s). Counterbalanced Latin squares, which have every treatment preceding every other treatment the same number of times, can be used to isolate treatment effects from carryover effects.
Many scientific discoveries are made by observing how a change in a stimulus that is presented to a subject or object affects the response (measurement) given by that subject or object. In an experiment, the investigator has direct control over which stimuli are presented to which subjects in which time periods. This control, when exercised correctly, enables the investigator to deduce a ‘cause and effect’ relationship, that is, the investigator can deduce that a given change in a stimulus causes a given change in the measured response. The plan of how an experiment is to proceed is called the ‘design of the experiment.’ The art of experimental design is the art of devising an experimental plan that maximizes the information that can be obtained on the effects of the stimuli.
Terminology Experimentation is used in almost every branch of science, with the result that the terminology used in experimental design is not quite standardized. For example, in some fields, the subject or object which is to be presented with a stimulus and then to be measured is called a ‘unit’ or an ‘experimental unit.’ The stimulus, itself, may be called the ‘treatment’ or the ‘level of a factor’ or the ‘level of an independent variable.’ In factorial experiments, where a subject is presented with a combination of different types of stimuli (such as a particular light intensity together with a particular noise level), the combination may be called a ‘treatment combination’ but, for simplicity, the term ‘treatment’ may be taken to mean either a single stimulus or a combination of stimuli, depending on the context. In some experiments, measurements are made on each subject over several time periods. These are known as ‘repeated measurements.’ The terminology concerning the associated designs again differs among disciplines. The term ‘repeated measurements design’ may refer solely to a design involving repeated measurements on a subject to whom a single treatment has been administered or it may include designs in which the
treatment is changed before each measurement. The latter type of design is also known as a ‘within-subjects design’ or a ‘block design’ or a ‘crossover design.’ All of these designs are grouped under the heading of ‘split-plot designs’ by some authors, while others reserve this last term for a design with two types of treatment, one of which is held constant throughout the repeated measurements on a subject and the other of which changes before each measurement (see Section Split-Plot Designs).
International Encyclopedia of the Social & Behavioral Sciences, 2nd edition, Volume 8
The Purpose of Experimental Design The pioneer in statistical experimental design was Sir R.A. Fisher (Fisher, 1951), who was concerned with maximizing the amount of information about agricultural crop production. In the social and behavioral sciences, the questions of scientific interest are very different, but the art of good experimental design is similar. Every experiment has a budget (time and money) and a limit as to the number of subjects that can be recruited. Also, every experiment has inherent variability: subjects differ from one another in fundamental ways, technicians differ in how they read measuring instruments, subjects and instrument readings vary over time, and so on. Variability translates into uncertainty and uncertainty reduces the amount of useful information. Information gained from an experiment may be viewed either from a sampling perspective or from a Bayesian perspective. In the former, hypothesis tests and confidence intervals are generally used, and the most informative experimental designs yield the most powerful tests and the shortest intervals, while producing unbiased results. For a Bayesian analysis, the most informative designs are those that maximize the expected utility. The purpose of designing an experiment is (1) to maximize the amount of information that can be gained within a given budget or (2) to minimize the budget required to obtain a given amount of information. Maximizing information is done by controlling and reducing the effect of extraneous
variables and by removing confounding variables and bias, as described in Sections Control of Extraneous and Confounding Variables and Removal of Bias.
http://dx.doi.org/10.1016/B978-0-08-097086-8.42039-8
Features of Good Design Comparison Experiments, by their nature, tend to be comparative. Questions of interest tend to be of the type ‘does this treatment elicit a “better” response than that treatment?’ and, if so, ‘how much better?’ Even if a single treatment appears to be the only one of interest, information about its effect on a subject is usually of no value without a comparison with other treatments. For instance, Moore and McCabe (1999: Chapter 3) cite an example of a medical study that showed a gastric freezing technique to be a good treatment for ulcer pain, but later comparative experiments showed that the pain relief using freezing was no better than the relief achieved using the same technique with no freezing solution. Apparently, the pain relief was due to nothing more than doctors showing concern for their patients (a placebo effect), and the freezing technique was abandoned. A good design for evaluating the effect of a single treatment will, therefore, always include a second treatment, called a control or control treatment, for comparison purposes. In some experiments, the control is the treatment in current use, and in other experiments it is the ‘absence of a stimulus.’ An experiment with more than one experimental treatment does not strictly need a control, since the experimental treatments can be compared among themselves. Nevertheless, a control can often add extra information. For example, in an experiment on how different types of background music (such as ‘classical,’ ‘rock,’ ‘rap,’ etc.) affect the time taken to learn a new task, a control treatment might be the absence of background music.
Control of Extraneous and Confounding Variables A good experimental design allows a particular set of stimuli to be compared with each other with high accuracy. Therefore, any other variable (or factor) that causes the experimental measurements to vary at best reduces the efficiency of the experiment and at worst completely masks the true effect being investigated. For example, in an experiment to investigate the effect of different types of memory aid on memorization, the IQ of the subjects may play an important role. The variability of IQ among the subjects in the experiment would then contribute to the variability in the measured memorization scores. One way of controlling the effect of such an extraneous variable is to hold the variable fixed during the experiment. For instance, IQ could be held more or less fixed by using as subjects only people with a tested IQ within a certain range. Although this strategy reduces variability and avoids masking the effects of the memory aids, it limits the applicability of the conclusions of the experiment, since the results would apply only to the people in the population with IQ within this range. A preferable method of controlling extraneous variability is to use a matched design (see below). An extraneous variable whose effect cannot be separated from that of the factor(s) of interest is called a confounding variable. In the above example, if all the subjects with high IQ
were to be tested using the first memory aid and all those with low IQ were to be tested using the second, then, if high IQ is correlated with memorization ability, the first memory aid will inevitably appear to be the better one, regardless of its true merits. The masking effect of a confounding variable can be reduced by randomization. The simplest form of randomization leads to a ‘completely randomized design’ (see Section Completely Randomized Designs); subjects are recruited from the general public as randomly as possible (see Section Selection of Subjects) and then assigned to the stimuli at random in such a way that every subject has the same chance of being assigned to any one of the stimuli. In such a design, it is likely, although not guaranteed, that each stimulus will receive roughly the same distribution of values of the extraneous variable. If a completely randomized design is used in the above example, the memory aids should receive roughly similar ranges of subject IQs. In all experiments there are confounding variables that have small effects, which are ignored by the experimenter, and confounding variables that are accidentally overlooked. The use of randomization helps to spread out the effects of these variables so that the response given to any one stimulus is less likely to be pushed upward or downward by extraneous factors. For a discussion of random assignment using a random number table or a computer random number generator, see, for example, Dean and Voss (1999: Chapters 1 and 3). A more foolproof method of ensuring similar distributions of IQ levels for each memory aid in the above example is to divide the subjects into groups so that subjects within the same group have similar IQs, and then to make the random assignment of subjects to stimuli within each group separately.
The division into groups provides the control necessary for removing the variability and masking due to the extraneous IQ variable, while maintaining the applicability to the general population. This type of design is called a ‘matched design’ or ‘block design’ (see Section Matched Pairs and Block Designs). Matched designs and completely randomized designs are examples of ‘between-subjects designs’ (see Section Between-Subjects Designs). A further possibility is to measure each subject under a sequence of different treatments – a ‘crossover design’ or ‘within-subjects design’ (see Section Within-Subject Designs). The use of such a design in the above example would completely control the extraneous variable, since it would ensure that the distribution of IQs assigned to each memory aid is identical. However, new extraneous variables have now been introduced, such as fatigue on the part of the subject over the course of the experiment. If the effects of the new extraneous variables are likely to be small, they can be ignored, but if they are large, then the variables should be controlled by making sure that subjects are assigned the memory aids in different orders. The orders can be assigned at random for each subject separately – a ‘block design,’ or by deliberately making sure that each stimulus is viewed by the same number of subjects in each time period – a ‘Latin square design.’
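The blocking strategy described above (group subjects with similar IQs, then randomize within each group) can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function name `blocked_assignment`, the subject IDs, the IQ scores, and the aid labels are all hypothetical, and blocks are formed simply by sorting subjects on IQ and cutting the sorted list into consecutive groups whose size equals the number of treatments.

```python
import random

def blocked_assignment(iq_by_subject, treatments, seed=None):
    """Randomized block design (hypothetical helper).

    Sort subjects by IQ, cut them into blocks of len(treatments) subjects with
    similar IQs, and randomly permute the treatments within each block.
    Assumes the number of subjects is a multiple of the number of treatments.
    """
    t = len(treatments)
    if len(iq_by_subject) % t != 0:
        raise ValueError("number of subjects must be a multiple of the number of treatments")
    rng = random.Random(seed)
    # After sorting, subjects with similar IQs sit next to each other.
    ordered = sorted(iq_by_subject, key=iq_by_subject.get)
    assignment = {}
    for start in range(0, len(ordered), t):
        block = ordered[start:start + t]
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a fresh random permutation for every block
        for subject, treatment in zip(block, shuffled):
            assignment[subject] = treatment
    return assignment

# Hypothetical data: eight subjects and two memory aids.
iq = {"s1": 95, "s2": 130, "s3": 102, "s4": 118,
      "s5": 99, "s6": 125, "s7": 110, "s8": 108}
print(blocked_assignment(iq, ["aid A", "aid B"], seed=1))
```

Because randomization happens only inside each block, every pair of subjects with adjacent IQs receives one of each aid, so the two aids necessarily see nearly the same spread of IQ values.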
Removal of Bias Since each experiment is run with a particular purpose in mind, experimenters tend to have built-in, although perhaps subconscious, biases toward or against certain treatments. A random
assignment of subjects to treatments and a random ordering of observations ensure that experimenter bias cannot consciously or unconsciously favor one treatment above another. Subjects’ own biases toward the treatments can affect their responses. It may or may not be possible or ethical to conceal the true nature of the experiment from the subjects. It may be possible, however, to mask from both the subjects and the person(s) running the experiment which of the stimuli is the experimental treatment(s) and which is the control (a ‘double-blind experiment’). Leach (1991: Section 24, Appendix 1) discusses ethical issues of concealing information from subjects and lists the guidelines published by the British Psychological Society. Kirk (1982: Section 1.5) gives references to ethical guidelines put out by the American Psychological Association, the American Sociological Association, and other bodies.
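A random run order and coded treatment labels, the two devices mentioned above, can be generated together. The sketch below is one possible approach under assumed conventions, not a standard procedure: the function `blind_schedule`, the single-letter codes, and the example assignment are hypothetical. The person running the sessions would receive only the coded schedule, while the key linking codes to treatments is held back until the data are collected; the codes conceal treatment identity only if the materials themselves look the same.

```python
import random
import string

def blind_schedule(assignment, seed=None):
    """Randomize the run order and hide treatment names behind codes.

    `assignment` maps subject -> treatment name (hypothetical input).
    Returns (run_order, coded_schedule, key), where key maps code -> treatment.
    """
    rng = random.Random(seed)
    treatments = sorted(set(assignment.values()))
    # One distinct random letter per treatment.
    codes = rng.sample(string.ascii_uppercase, len(treatments))
    key = dict(zip(codes, treatments))
    to_code = {treatment: code for code, treatment in key.items()}
    run_order = list(assignment)
    rng.shuffle(run_order)  # random ordering of observations
    coded_schedule = [(subject, to_code[assignment[subject]]) for subject in run_order]
    return run_order, coded_schedule, key

# Hypothetical assignment of four subjects to a control and a treatment.
assignment = {"s1": "control", "s2": "treatment", "s3": "control", "s4": "treatment"}
order, schedule, key = blind_schedule(assignment, seed=7)
for subject, code in schedule:
    print(subject, code)
```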
Planning an Experiment Guides to planning experiments can be found in many texts, for example, Myers (1979: Chapter 1), Cox (1958), Dean and Voss (1999: Chapter 2), and Leach (1991: Sections 5–9). A protocol, which gives in great detail, step-by-step, how the experiment is to proceed, is usually prepared well in advance. The protocol includes details about subject selection, measurement and data collection methods, preparation of materials, preparation of subjects, and a draft statistical analysis. A pilot experiment in which a small number of observations are collected is often run early in the planning stage. Although these observations will usually be thrown away when the main experiment commences, the pilot experiment gives an opportunity to check that the experimental procedure will work as planned and that the required analyses are possible. It also gives indications of unexpected important confounding variables and the likely accuracy of the results. It allows problems to be detected and corrected before they arise in the main experiment.
Selection of Subjects Ideally, the subjects taking part in the experiment are selected at random from the population to which the conclusions of the experiment will be applied (the ‘target population’). Since this is not always possible, the experimenter may be forced to use volunteers who will inevitably come from a subset of the population. The results of the experiment may or may not then apply to the entire target population. The number of subjects required to achieve the goals of powerful hypothesis tests and short confidence intervals, or of maximum expected utility, can be calculated using statistical formulae. The required number of subjects depends upon the design selected for the experiment, on the comparisons of interest, and on the variability of the responses of subjects when assigned the same treatment under identical experimental conditions. In general, ‘between-subjects designs’ require many more subjects than ‘within-subjects designs.’ Between-subjects designs are used in fields where subjects can be assigned only a single treatment (as in the evaluation of different teaching methods) and/or where subjects are sufficiently plentiful to offset the subject-to-subject variability.
Within-subjects designs are preferred in fields where subjects are scarce or highly variable. The length of the sequence of treatments that can be presented to any one subject depends upon the nature of the experiment. For example, in experimentation involving multiple visits to a laboratory, subject tolerance can be as low as two or three visits. In experiments in which treatments can be changed rapidly with no long-term effect on the subject, a much longer sequence of treatments can be used, requiring fewer subjects in total. There are exceptional circumstances in which an experiment is conducted on a single subject (see Wilson, 1995; Kratochwill and Levin, 1992), but such experiments cannot give conclusive information about the population as a whole and are not used in standard experimentation.
Models and Analysis The model links the dependent (response) variable(s) to all of the factors (independent variables) that could influence the response, such as the various stimuli, the subjects, time periods, extraneous variables that were used in determining the design, and other important variables that can be taken into account only during the analysis (called ‘concomitant variables’ or ‘covariates’: see, e.g., Maxwell and Delaney, 1990: Chapter 9). All extraneous variables that were ignored in the design are grouped together in a single ‘error variable.’ A measurement on a subject during a time period before the experiment begins is called a ‘baseline measurement’ and can be used to increase the accuracy of the results (see, e.g., Jones and Kenward, 1989: Chapter 2, Section 4.4; Cotton, 1998: Chapters 9, 10). If the distributions of the error variables are identical and approximately normal, then standard analysis of variance techniques can be used. Designs in which treatments are assigned the same numbers of subjects and the same numbers of time periods are the easiest to analyze and interpret. They are also the least sensitive to assumptions about normality of the error distributions and equal variances. When the errors do not follow a normal distribution, other types of analysis are needed such as analysis of generalized linear models and categorical data analysis (e.g., see Ratkowski et al., 1993: Chapter 7; Jones and Kenward, 1989: Chapter 3; Crowder and Hand, 1990: Chapter 8). Repeated measurements on a subject under a particular treatment may require a time series or regression analysis (see Crowder and Hand, 1990). Bayesian techniques in analyzing crossover designs are mentioned by Jones and Kenward (1989: pp. 80, 235).
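As an illustration only (the notation is chosen for this sketch and is not taken from the article), a model for a randomized block design with a baseline covariate might be written as below. Here y_{hi} is the response of the subject in block h who receives treatment i, μ is an overall mean, τ_i is the effect of treatment i, β_h is the effect of block h (the extraneous variable used in forming the design), x_{hi} is that subject's baseline measurement with slope γ, and ε_{hi} is the error variable that collects all extraneous variables ignored in the design.

```latex
y_{hi} = \mu + \tau_i + \beta_h + \gamma\, x_{hi} + \varepsilon_{hi},
\qquad \varepsilon_{hi} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad h = 1, \dots, b; \; i = 1, \dots, t.
```

The assumption that the ε_{hi} are independent, normal, and of equal variance is the condition under which the standard analysis of variance (here, analysis of covariance) applies; crossover models would add period and carryover terms.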
In the case of correlated responses per subject, multivariate analysis is used for normally distributed errors (see, e.g., Winer et al., 1991: Chapter 4; Johnson and Wichern, 1992; Maxwell and Delaney, 1990: Chapters 13, 14; Crowder and Hand, 1990: Chapter 4; Jones and Kenward, 1989: Chapter 7; Myers, 1979: Chapter 18).
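The statement in Selection of Subjects that between-subjects designs need many more subjects than within-subjects designs can be made concrete with the usual normal-approximation sample-size formulas for a two-sided comparison of two treatments. This sketch is not from the article and rests on simplifying assumptions: a known common standard deviation σ, a correlation ρ between a subject's two responses in the within-subjects case, and no period or carryover effects. The function names and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def _z(p):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)

def n_between(delta, sigma, alpha=0.05, power=0.8):
    """Subjects per group for two independent groups (normal approximation)."""
    k = (_z(1 - alpha / 2) + _z(power)) ** 2
    return math.ceil(2 * k * sigma ** 2 / delta ** 2)

def n_within(delta, sigma, rho, alpha=0.05, power=0.8):
    """Total subjects when every subject receives both treatments.

    A subject's within-subject difference has variance 2 * sigma**2 * (1 - rho),
    so positive correlation rho shrinks the required number of subjects.
    Period and carryover effects are ignored in this sketch.
    """
    k = (_z(1 - alpha / 2) + _z(power)) ** 2
    return math.ceil(k * 2 * sigma ** 2 * (1 - rho) / delta ** 2)

# Hypothetical: detect a 5-point difference when sigma = 10 and rho = 0.7.
print(2 * n_between(5, 10), n_within(5, 10, 0.7))  # 126 19
```

With ρ = 0 the within-subjects design needs as many subjects in total as the between-subjects design needs per group, which still halves the total; the larger ρ is, the bigger the saving.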
Between-Subjects Designs Completely Randomized Designs In a completely randomized design, each subject is assigned to just one treatment. The assignment is done completely at random so that each subject has exactly the same chance of
being assigned to each possible treatment (stimulus or combination of stimuli). Often a restriction is applied so that each stimulus receives the same number of subjects. Completely randomized designs are simple to use and analyze. They are most suited to situations where subjects are plentiful, where subjects’ responses would not be too variable if they were all given the same stimulus under the same experimental conditions, and where experimental conditions can be held constant throughout the experiment. If there are extraneous variables that add variation to the responses and that can be measured during the experiment (such as age or IQ), their effects can be removed during the analysis (analysis of covariance) (e.g., Winer et al., 1991: Chapter 10). However, where it is possible for the extraneous variables to become confounding variables in an unfortunate randomization, a matched pairs design, block design, or within-subjects design would be preferred (see Sections Matched Pairs and Block Designs and Within-Subject Designs). As in all designs, it is possible to take repeated measurements in completely randomized designs. Each subject is measured for some number of time intervals after administration of the single treatment. Cotton (1998) calls such a design a ‘multigroup split-plot design.’
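A completely randomized design with the equal-numbers restriction can be generated by writing out the treatment labels the required number of times and shuffling them across the subjects. The sketch below assumes the number of subjects is a multiple of the number of treatments; the function name `completely_randomized` and the subject IDs are hypothetical, and the treatments reuse the background-music example from the Comparison section.

```python
import random
from collections import Counter

def completely_randomized(subjects, treatments, seed=None):
    """Completely randomized design with equal replication (hypothetical helper).

    Each subject receives exactly one treatment and each treatment receives the
    same number of subjects. Shuffling the full list of labels makes every
    subject equally likely to end up with any given treatment.
    """
    if len(subjects) % len(treatments) != 0:
        raise ValueError("number of subjects must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(subjects) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    return dict(zip(subjects, labels))

subjects = [f"s{i}" for i in range(1, 13)]
groups = completely_randomized(subjects, ["classical", "rock", "none"], seed=3)
print(Counter(groups.values()))  # each treatment appears 4 times
```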
Matched Pairs and Block Designs If there are just two treatments of interest, the variability in the responses (observations) due to differences in the subjects themselves can be reduced by pairing the subjects so that, within each pair, the subjects are as alike as possible. For each pair separately, the two subjects are assigned at random to the two treatments (a matched pairs design). When there are t treatments with t > 2, the subjects are matched into groups (or blocks) of size t, and within each group the subjects are assigned at random to the t treatments. This type of design is usually known as a ‘randomized block design.’ Block designs are also appropriate when the extraneous variation is due to variables unrelated to the subjects. Experimental conditions cannot always be held constant throughout the experiment because of, for example, changes in the weather, the use of different testing centers, or the use of different laboratory technicians. To combat this, the subjects would be put into groups of size t (not necessarily matched) and, apart from the assignment to different treatments, all subjects within a group would be tested under the same experimental conditions. If the number of treatments, t, to be compared is large (as is often the case in a factorial experiment where the treatments are combinations of several stimuli), there may not be enough subjects to form groups of t similar subjects, and it may not be possible to hold conditions constant for t measurements. In this case, an incomplete block design can be used, where each group of s(
Within-Subject Designs In a within-subjects design, each subject is essentially matched with himself or herself and assigned a sequence of some or all
of the treatments. The order of presentation is decided using randomization for each subject separately. The comparison of any two treatments is made for each subject (‘within’ each subject) and then averaged over all the subjects. This has the advantage that subject-to-subject variability does not play a part in the comparison of the treatments. In experiments where stimuli are presented to each subject in quick succession, ‘carryover effects’ can be a problem. For example, a subject asked to work in bright light followed by normal light may perceive the normal light to be darker than if it had been preceded by dim light. These carryover effects can be mitigated by separating the trials by a period of time, called a ‘washout period,’ in which the subject is asked to do something completely different in some control state. If a long enough washout period cannot be arranged, then the experiment is usually counterbalanced (see Section Latin Squares). Carryover effects are also known as ‘residual effects,’ and the effect of the treatment administered in the current time period is called the ‘direct treatment effect.’
Crossover Designs In the simplest within-subjects design, any randomization of the order of treatments for any subject is accepted. However, if time period effects or carryover effects are thought to be important and are to be included in the model, then it is desirable to exercise control over which treatment sequences are used. In most cases, the carryover effect from a given treatment will be assumed to be the same no matter which treatment follows it. If the treatments interact, then this assumption may not be valid and larger designs and more complicated analyses are needed. When the number of treatments is small, say two or three, and the subject can be measured over several time periods, then each subject can be assigned some or all of the treatments more than once. For three time periods and two treatments (one assigned to one period and one assigned to two periods), there are 6 possible treatment sequences; with four time periods and two treatments (each assigned to two periods), there are 6 possible treatment sequences; with four time periods and four treatments (each assigned to one period), there are 24 possible sequences, and so on. If there are sufficient subjects, all the possible sequences can be used an equal number of times. If the number of subjects is small, however, it may not even be possible to use each sequence once. The experimenter must then decide which set of sequences provides the best design. Among the possibilities are ‘variance balanced designs’ (see Section Balanced Crossover Designs) and ‘Latin squares’ (see Section Latin Squares). In general, two-period designs are best avoided when possible: not only can the carryover effects not be estimated independently of the treatment-by-period interaction, but three-period designs are also considerably more efficient (Jones and Kenward, 1989: Section 4.16).
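Counts of possible treatment sequences are easy to get wrong by hand, so a direct enumeration is a useful check. The sketch below is illustrative only (the helper name `distinct_sequences` is hypothetical): repeating a label means that treatment occupies that many periods, and the set of distinct orderings gives the number of sequences. The last line shows a different way of counting, all four-period sequences of two treatments that use both treatments at least once, which gives 2^4 − 2 = 14.

```python
from itertools import permutations, product

def distinct_sequences(*labels):
    """All distinct orderings of the given treatment labels (hypothetical helper).

    A label listed twice is given in two periods, so the sequence length
    equals the number of labels passed in.
    """
    return sorted(set(permutations(labels)))

# Three periods, two treatments: one treatment in one period, the other in two.
three_period = distinct_sequences("A", "B", "B") + distinct_sequences("A", "A", "B")
# Four periods, two treatments, each in two periods.
four_period_two = distinct_sequences("A", "A", "B", "B")
# Four periods, four treatments, each in one period.
four_period_four = distinct_sequences("A", "B", "C", "D")
# Four periods, two treatments, with no restriction except that both appear.
both_used = [s for s in product("AB", repeat=4) if len(set(s)) == 2]

print(len(three_period), len(four_period_two), len(four_period_four), len(both_used))  # 6 6 24 14
```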
Balanced Crossover Designs Crossover designs that allow differences between all pairs of treatments to be estimated with the same precision are
called ‘variance balanced.’ Variance balanced designs include crossover designs that use all possible treatment sequences, and counterbalanced Latin square designs (see Section Latin Squares). Variance balanced designs, efficient for comparing all pairs of treatments, are tabulated by Ratkowski et al. (1993: Chapter 5) and also by Jones and Kenward (1989: pp. 212–214, 223–224). The latter authors also list designs efficient for comparing test treatments with a control. The treatment given in the last period of a crossover design can be repeated in an extra period. Such designs, called ‘extra-period designs,’ allow the carryover from the last treatment to be measured, thus increasing the precision of the treatment comparisons. Balanced designs typically require a large number of subjects when the number of treatments is large and so cannot always be used. An alternative is to use an efficient ‘partially balanced’ design. These designs allow treatment differences to be estimated with two or three different precisions that are fairly close in value. When the treatments are factorial in nature, the effects of the individual factors (main effects) and the interactions between the factors are usually of primary interest. Variance balance is then desirable for the comparisons of the levels of each factor separately (see Jones and Kenward, 1989: pp. 222–228; Ratkowski et al., 1993: Chapter 6, for tabulated designs).
Latin Squares A Latin square design is ideal for any experiment in which it is possible to measure each subject under every treatment and, in addition, it is necessary to control for changing conditions over the course of the experiment. A Latin square is a design in which each treatment is assigned to each time period and to each subject the same number of times (see Dean and Voss, 1999: Chapter 12). If there are t treatments, t time periods, and mt subjects, then m Latin squares (each with t treatment sequences) would be used. Carryover effects are controlled by using Latin squares that are ‘counterbalanced’ (Cotton, 1993). This means that, looking at the sequences of treatments assigned to all the subjects taken together, every treatment is preceded by every other treatment for the same number of subjects. Counterbalanced Latin squares exist for any even number of treatments and for some odd numbers of treatments (e.g., t = 9, 15, 21, 27; see Jones and Kenward, 1989: Section 5.2.2, for references). For other odd numbers, a pair of Latin squares can be used, which between them give a set of 2t counterbalanced sequences. If a carryover effect is expected to persist for more than one time period, then the counterbalancing must be extended to treatments occurring more than one time period prior to the current treatment.
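One widely used way to build such sequences, not described in the article itself, is the cyclic construction usually attributed to E. J. Williams: the first sequence is 0, 1, t−1, 2, t−2, …, and each further sequence adds 1 (mod t) to every entry. For even t this single square is counterbalanced for first-order carryover; for odd t the square together with its mirror image (every sequence reversed) gives the pair of squares, 2t sequences in all, mentioned above. The sketch below, with hypothetical function names and treatments numbered 0 to t−1, builds the sequences and counts how often each treatment immediately precedes each other one.

```python
from collections import Counter

def williams_square(t):
    """Cyclic Latin square whose first row is 0, 1, t-1, 2, t-2, ...

    Row i is the first row with i added (mod t), so each treatment appears once
    in every row (subject) and once in every column (period).
    """
    first = [0]
    low, high = 1, t - 1
    while len(first) < t:
        first.append(low)
        low += 1
        if len(first) < t:
            first.append(high)
            high -= 1
    return [[(x + i) % t for x in first] for i in range(t)]

def counterbalanced_sequences(t):
    """One square for even t; for odd t, the square plus its mirror image."""
    square = williams_square(t)
    if t % 2 == 0:
        return square
    return square + [row[::-1] for row in square]

def predecessor_counts(sequences):
    """How often treatment a is immediately followed by treatment b."""
    return Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))

for t in (4, 5):
    seqs = counterbalanced_sequences(t)
    counts = predecessor_counts(seqs)
    print(t, len(seqs), len(counts), set(counts.values()))
# 4 4 12 {1}
# 5 10 20 {2}
```

For t = 4 every ordered pair of distinct treatments occurs exactly once in adjacent periods; for t = 5 each occurs twice across the 10 sequences, which is the counterbalancing property described in the text.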
Other Designs Nested or Hierarchical Designs It is not unusual for extraneous variables to be ‘nested.’ For example, if subjects are recruited and tested separately at different testing centers, the subjects are ‘nested within testing center.’ If subjects are animals such as mice or piglets, then the
subjects are naturally nested within litters, which are nested within parents, which may be nested within laboratory. The nesting information can be used in matched designs, since the nesting forms natural groupings of like subjects. For within-subjects designs, the nesting information can be used during the analysis for examining the different sources of extraneous variation. Designs in which different levels of nesting are assigned different treatment factors are called ‘split-plot designs’ (see Section Split-Plot Designs). A second type of nesting is a nesting structure within the treatment factors being examined. Examples given by Myers (1979) include memorization of words within grammatical class, time taken to complete problems within difficulty levels, and so on. Models and analyses used in such experiments must reflect the nested treatment structure.
Split-Plot Designs An experiment with more than one type of stimulus (factor) can be run as a split-plot design, with the level of one or more factors being held fixed for a subject throughout the course of the experiment (as for a between-subjects design), and the levels of the other factor(s) being changed for each time period (as for a within-subjects design). Such designs are sometimes called ‘mixed designs.’ The effects of the factors that change within subjects will generally be estimated more precisely than the effects of the factors held fixed for each subject, since subject-to-subject variability enters into the comparisons of the latter. Split-plot designs are useful when it is difficult to change the levels of one of the factors. For example, Dean and Voss (1999: Chapter 19) cite an example of an optokinetic experiment on the drift of focus of a subject’s eyes from the center of a rotating drum, measured under two different lighting conditions. The change of lighting conditions was a time-consuming process, whereas it was simple to change the speed of rotation of the drum. Consequently, each subject was assigned a single lighting condition throughout an entire viewing session, and during the viewing session the subject was assigned a sequence of different speeds.
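The two-stage randomization of a split-plot design like the drum example can be sketched as follows. This is a simplified illustration under assumed conditions: the function `split_plot_schedule`, the subject IDs, and the labels ‘lighting 1’, ‘lighting 2’, and ‘speed 1’ to ‘speed 3’ are hypothetical, the hard-to-change factor is assigned completely at random with equal numbers of subjects per level, and every subject sees each level of the easy-to-change factor once in a separately randomized order.

```python
import random

def split_plot_schedule(subjects, whole_plot_levels, subplot_levels, seed=None):
    """Randomize a simple split-plot (mixed) design (hypothetical helper).

    Whole-plot factor: each subject keeps one level for the whole session;
    levels are shuffled across subjects with equal numbers per level.
    Subplot factor: each subject receives every level once, in a random
    order chosen separately for that subject.
    """
    if len(subjects) % len(whole_plot_levels) != 0:
        raise ValueError("need an equal number of subjects per whole-plot level")
    rng = random.Random(seed)
    reps = len(subjects) // len(whole_plot_levels)
    whole = [level for level in whole_plot_levels for _ in range(reps)]
    rng.shuffle(whole)
    schedule = {}
    for subject, level in zip(subjects, whole):
        order = list(subplot_levels)
        rng.shuffle(order)
        schedule[subject] = (level, order)
    return schedule

subjects = ["s1", "s2", "s3", "s4", "s5", "s6"]
plan = split_plot_schedule(subjects, ["lighting 1", "lighting 2"],
                           ["speed 1", "speed 2", "speed 3"], seed=11)
for subject, (lighting, speeds) in plan.items():
    print(subject, lighting, speeds)
```

Comparisons between lighting conditions are made between subjects, whereas comparisons between speeds are made within each subject, which is why the speed effects would be expected to be estimated more precisely.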
Optimality and Efficiency of Designs As pointed out by Cotton (1998), designs that are best (i.e., optimal) for one purpose are not necessarily best for another purpose and compromises may need to be made. The optimal design for investigating the effects in one model may be totally unsuitable for a different model. For example, in a crossover design with two treatments and two time periods, a set of counterbalanced Latin squares provides an optimal design for estimating direct treatment effects and carryover effects. This design does not, however, allow estimation of both a carryover effect and an interaction between treatments and time periods. Thus, if both of these effects are required in the model, then more than two time periods must be used in the experiment. As a general rule, the most balanced designs are optimal when interest lies equally in all treatment comparisons. The following features are typical characteristics of balance: every treatment is assigned the same number of subjects, every treatment is observed in every time period the same number of
times, and every treatment is preceded by every other treatment (including itself, if possible) for the same number of subjects and in the same number of time periods. When balance is not achievable, computer programs for generating optimal designs are commercially available. For other settings, where comparison of all treatments is not the main goal of the experiment, more sophisticated algorithms are needed; see, for example, Atkinson and Donev (1992) for a discussion of design optimality from a Bayesian perspective.
Bibliography
Atkinson, A.C., Donev, A.N., 1992. Optimum Experimental Designs. Oxford Science Publications, Oxford, UK.
Breakwell, G.M., Hammond, S., Fife-Schaw, C. (Eds.), 1995. Research Methods in Psychology. Sage, London.
Cotton, J.W., 1993. Latin square designs. In: Edwards, L.K. (Ed.), Applied Analysis of Variance in Behavioral Sciences. Dekker, New York.
Cotton, J.W., 1998. Analyzing Within-Subjects Experiments. Erlbaum, Mahwah, NJ.
Cox, D.R., 1958. Planning of Experiments. Wiley, New York.
Crowder, M.J., Hand, D.J., 1990. Analysis of Repeated Measures. Chapman and Hall, London.
Dean, A.M., Voss, D.T., 1999. Design and Analysis of Experiments. Springer-Verlag, New York.
Edwards, L.K. (Ed.), 1993. Applied Analysis of Variance in Behavioral Sciences. Dekker, New York.
Fisher, R.A., 1951. The Design of Experiments, sixth ed. Oliver and Boyd, Edinburgh, UK.
Harris, P., 1986. Designing and Reporting Experiments. Open University Press, Milton Keynes, UK.
Johnson, R.A., Wichern, D.W., 1992. Applied Multivariate Statistical Analysis, third ed. Prentice Hall, Englewood Cliffs, NJ.
Jones, B., Kenward, M.G., 1989. Design and Analysis of Cross-Over Trials. Chapman and Hall, London.
Keppel, G., 1982. Design and Analysis: A Researcher’s Handbook, second ed. Prentice Hall, Englewood Cliffs, NJ.
Kirk, R.E., 1982. Experimental Design: Procedures for the Behavioral Sciences, second ed. Brooks/Cole, Belmont, CA.
Kratochwill, T.R., Levin, J.R., 1992. Single-Case Research Design and Analysis: New Directions for Psychology and Education. Erlbaum, Hillsdale, NJ.
Kuehl, R.O., 1994. Statistical Principles of Research Design and Analysis. Duxbury, Belmont, CA.
Leach, J., 1991. Running Applied Psychology Experiments. Open University Press, Milton Keynes, UK.
Maxwell, S.E., Delaney, H.D., 1990. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Wadsworth, Belmont, CA.
Moore, D.S., McCabe, G.P., 1999. Introduction to the Practice of Statistics, third ed. Freeman, New York.
Myers, J.L., 1979. Fundamentals of Experimental Design, third ed. Allyn and Bacon, Boston.
Ratkowski, D.A., Evans, M.A., Alldredge, J.R., 1993. Cross-Over Experiments: Design, Analysis and Application. Dekker, New York.
Senn, S., 1993. Cross-Over Trials in Clinical Research. Wiley, New York.
Wilson, S.L., 1995. Single case experimental designs. In: Breakwell, G.M., Hammond, S., Fife-Schaw, C. (Eds.), Research Methods in Psychology. Sage, London.
Winer, B.J., 1971. Statistical Principles in Experimental Design, second ed. McGraw-Hill, New York.
Winer, B.J., Brown, D.R., Michels, K.M., 1991. Statistical Principles in Experimental Design, third ed. McGraw-Hill, New York.