Evaluation in Education, 1980, Vol. 4, pp. 5-10. Pergamon Press Ltd. Printed in Great Britain.

RESEARCH INTEGRATION: AN INTRODUCTION AND OVERVIEW

Herbert J. Walberg and Edward H. Haertel

University of Illinois at Chicago Circle, College of Education, Chicago, Illinois, USA

"The importance of a scientific work can be measured by the number of previous publications it makes superfluous to read." David Hilbert, Mathematician
It has often been forgotten in the flood of publications of the last decade or two that two esthetic values in science are parsimony and replication. In educational research and evaluation, for example, the number of possible causes and effects seems to have multiplied; and recent claims that instructional and psychological treatments are narrowly effective only under certain conditions, at certain times, and for particular classes of people have encouraged the false equating of complex interactions with scientific progress. Too often, moreover, single studies, perhaps done most recently or covered in the press, are cited as definitively supporting or discrediting a hypothesis or practical technique while dozens of other relevant studies are ignored or dismissed. It is understandable that such things have occurred. Investigators naturally tend to speak on the basis of research they have personally conducted; and they wish to convey, quite appropriately perhaps, the limited scope of their studies as well as possible complexities in the findings. Most often the investigator has only conducted a single study rather than a decade-long series of programmatic efforts and a comprehensive integration of previous work of others.

To concentrate our energies on more primary research and evaluation studies without systematic integration of previous studies is scientifically and educationally wasteful. Systematic integration allows us to bring less experienced research students and workers to the frontiers of knowledge. It permits us to assign degrees of confidence to conclusions, to estimate their scope of applicability, and to point to the most fruitful directions for subsequent empirical study. To the extent that education is or should be theoretical, quantitative research integration facilitates the comparison of theoretical inferences with comprehensive sets of relevant data. Education, moreover, is certainly an applied field, requiring policy and practical decisions. Such decisions should be based on an explicit summary and assessment of the complete corpus of relevant evidence. If simple, widely generalisable conclusions can be reasonably and fairly given, so much the better for policy makers and those who will implement the decision. If there are important uncertainties and complexities, and limited generality in the conclusions, they suggest subsequent research and caution in decision making and implementation.
PURPOSE AND SCOPE
The purpose of the present issue of Evaluation in Education is to provide a forum of ideas, methods, and findings of investigators who are currently carrying out research integration in a variety of substantive areas, mostly in education but also in psychology, medicine, and child rearing. All have implications for educational policy or research-integration methods. The authors of the papers have kindly responded to our invitation to present their current work concisely even though some are beginning their work and others are midway or nearing completion. We believe that most of the pioneers in the young and rapidly growing movement of research integration are represented. Since about 80 investigators and 45 papers are included, the reader may grasp the scope and direction of much of the current work in educational research integration.

The paper authors have been held to the limited amount of space we could afford to offer them; yet a considerable amount of concrete numerical information may be found in these pages. In addition, a great number of references to published and unpublished works and to papers in preparation are cited; and the addresses of the authors are given to encourage correspondence among authors and readers who wish to gain further details and subsequent results of on-going work. It is our hope that this issue will increase the speed of communication and development of research integration around the world while encouraging constructive, yet critical attitudes and standards.

A few words on terminology may be in order. The various authors of the papers in this issue and of papers cited here have employed a variety of terms such as 'combining results', 'accumulating evidence', 'meta-analysis', and 'research synthesis'. Each of these and the way it is used by a particular author requires an operational definition or citation of one. The term 'research integration' as used in the title of this issue is intended to suggest the inclusion of all of these methods (and several others such as conceptual and programmatic synthesis and reviews of reviews discussed in the last set of papers in this issue). These approaches have the common aim of bringing together and weighing the evidence on a topic, and represent different but related techniques, likely to converge in the long run on the same conclusions.

The remainder of this introductory paper is divided into two parts. The first of these defines and gives examples of the chief quantitative techniques of research integration, to help to make the remaining papers comprehensible. The concluding part provides an overview of the remainder of the volume.
QUANTITATIVE METHODS OF RESEARCH INTEGRATION
The basic idea in quantitative research integration is to apply statistical methods with the published statistics in previous studies as the data. In the simplest case, all studies are collected that permit a given comparison of interest to the investigator. From each of these studies, a single value is taken, expressing the result of the comparison for that study. These values, one from each study, are statistically combined to yield an interpretable summary. The ways of extracting single values from studies and ways of combining these values are closely related. Perhaps the three most common procedures are what may be called the voting method, the joint probability method, and the effect size method.

To illustrate these procedures, consider a specific situation. Suppose that a new method for teaching long division has been developed, and five studies have appeared in which the new method is contrasted with the traditional method.
In each study, an achievement test has been given to groups of students taught by each method. These studies have differed somewhat in methodology, and in some, the comparison of the two methods was not even the major focus of the study, but taken together, the five studies have yielded the data shown in the table below. Of course, research integration is more useful with larger collections of studies, but five studies may constitute the universe of studies available and, at any rate, suffice for purposes of illustration.
Data provided by five studies of a new long-division method

             Treatment Group          Control Group
Study      N     Mean    S.D.       N     Mean    S.D.     Summary Statistic        Group Favoured
  1        25    42.0    4.2        28    39.7    3.8      t(51)=2.00, p<.05        treatment
  2       100     6.2    2.1       254     6.0    2.1      F(1,348)=.65, N.S.       treatment
  3        50      -      -         50      -      -       χ²(1)=7.11, p<.01        treatment
  4        36    90.1     -         32    96.7     -       t(66)=-3.99, p<.001      control
  5         -      -      -          -      -      -       t(118)=3.44, p<.001      treatment

The Voting Method
The simplest method of integrating findings is to determine, for each study, only whether or not a statistically significant difference was found, and, if so, its direction (Light and Smith, 1971; Glass, 1978). Four of the five studies in the table yielded statistically significant findings, three favouring the new method (treatment) and one favouring the traditional method (control). By the voting method, the new long division procedure would be declared the winner. Using a more refined version of voting, which might be called the sign approach, one would begin with a tabulation of the direction of effects in all studies, without regard to statistical significance, and then compute the probability of the results obtained under the assumption that the two long division methods were equally effective. In our hypothetical example, four of the five studies favoured the treatment group. Assuming a 50 per cent chance of favouring this group in each study, the (two-tailed) probability of four or more comparisons in the same direction is .375. Since this is far from the conventional .05 level of significance, the sign approach would not lead us to infer the superiority of either teaching method. It should be pointed out that the power of the sign approach increases rapidly with the number of studies included. If, instead of four results out of five, twelve results out of fifteen had been found to favour one method, we could reject the null hypothesis at the .05 level.
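The sign-approach arithmetic can be verified directly from the binomial distribution. The following sketch (in Python, not part of the original article) computes the two-tailed probability of a split at least as lopsided as the one observed, assuming each study is equally likely to favour either method.

```python
from math import comb

def sign_test_two_tailed(favouring_treatment, n_studies):
    # Probability, under a 50/50 null hypothesis, of a split at least as
    # extreme as the one observed, doubled for a two-tailed test.
    k = max(favouring_treatment, n_studies - favouring_treatment)
    one_tail = sum(comb(n_studies, i) for i in range(k, n_studies + 1)) / 2 ** n_studies
    return min(1.0, 2 * one_tail)

print(sign_test_two_tailed(4, 5))    # .375, as in the example above
print(sign_test_two_tailed(12, 15))  # about .035, below the .05 level
```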
The Probability Method
The voting and sign methods consider only the direction and statistical significance of each comparison. In the probability approach, the exact one-tailed probabilities obtained in each comparison are pooled. The directions of effects are taken into account by reversing (subtracting from 1.00) the probabilities for any comparisons opposite the expected direction. A number of different approaches have been developed to combine such separate probabilities. Two basic methods were discussed in an early paper by Jones and Fiske (1953); and more recently, nine different approaches were described by Rosenthal (1978), who notes that appropriately chosen methods
are rarely in serious disagreement. The best known among these is the Fisher method, which makes use of the fact that the natural logarithm of a probability, multiplied by -2, is distributed as chi-square with two degrees of freedom. Since a sum of independent chi-squares is also distributed as chi-square with the sum of the degrees of freedom of the chi-squares added, the following result can be derived:

\chi^2(df = 2N) = \sum_{i=1}^{N} (-2 \log_e p_i)        (1)
Let us apply this method to the data in the table. The first study yielded a t of 2.00 on 51 degrees of freedom. The exact one-tailed probability for this statistic is .0254. Similarly, for studies 2, 3, 4 and 5 the exact probabilities are .2103, .0077, .9999 and .0004. Applying formula (1) yields χ²(10)=35.85, p<.0001. Where just a few results have extremely low probabilities, Fisher's method generally gives a significant result. It can also easily yield significance when none of the original studies were individually significant. For example, for three studies each with a probability of .10, Fisher's method yields a joint probability of .032.
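As a check on formula (1), the sketch below pools the five one-tailed probabilities reported above by Fisher's method; it assumes the scipy library is available for the chi-square tail probability.

```python
from math import log
from scipy.stats import chi2

def fisher_combined(p_values):
    # -2 times the sum of natural logs of the one-tailed probabilities is
    # distributed as chi-square with 2N degrees of freedom (formula 1).
    statistic = sum(-2.0 * log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, chi2.sf(statistic, df)

statistic, p = fisher_combined([.0254, .2103, .0077, .9999, .0004])
print(statistic, p)                      # about 35.85 on 10 df, p < .0001
print(fisher_combined([.10, .10, .10]))  # joint probability of about .032
```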
The Effect Size Method
The third, and most powerful, of the three approaches is the effect size method. While it involves the most work in preparing the data for analysis, this method permits the application of a broad range of techniques, from frequency distributions to multiple regressions. The basic idea here is to express the actual magnitude of group differences from different studies on a common scale, so that findings from studies employing different measures and different methods can be meaningfully compared. One straightforward way of arriving at such a scale is to express, for each comparison, the mean difference in standard deviation units. (Either the control group standard deviation or the pooled within-group standard deviation may be used.) Often, an estimate of the effect size can be obtained by algebraic manipulations of available summary statistics even when the group means and standard deviations are not presented.

Let us return to the table. For the first study, the effect size would be

E_1 = \frac{42.0 - 39.7}{4.0} = .58
Note that this is simply the difference between the two means divided by their pooled standard deviation. In study 2, an effect size of .10 is found. The remaining three studies do not permit direct computation of effect sizes, but by a variety of algebraic paths they may be approximated. Glass (1978) presents methods which yield the estimates E_3=.55, E_4=-.97 and E_5=.63. Thus the mean effect size is .18, or somewhat less than two tenths of a standard deviation. The standard error of this mean is .30, so it is clearly not significantly different from zero. Note that in contrast to voting and combining probabilities, the method of calculating effect sizes provides an interpretable estimate of the size of an effect, as well as its significance.
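The mean effect size and its standard error quoted above can be reproduced from the five estimates; the sketch below simply averages them and divides the sample standard deviation by the square root of the number of studies.

```python
from math import sqrt
from statistics import mean, stdev

# E1 from the means and pooled S.D. of study 1; E2 from study 2's summary
# statistics; E3-E5 approximated by the algebraic conversions of Glass (1978).
effect_sizes = [(42.0 - 39.7) / 4.0, .10, .55, -.97, .63]

mean_effect = mean(effect_sizes)                                 # about .18
standard_error = stdev(effect_sizes) / sqrt(len(effect_sizes))   # about .30
print(mean_effect, standard_error)
```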
Variations on the Basic Methods
The most significant extension to the basic method is to record, for each study, other characteristics in addition to the effect size, and to relate these to the effect size using techniques of analysis of variance or regression. If methodological quality is of concern, for example, a quality judgement might be recorded for each study, and tests might be done to see if effect sizes run larger in higher quality studies or in those of lower quality. If no such differences were found,
studies of different degrees of quality might be pooled. Likewise, the times and settings of different studies, the subject areas or grade levels used, or a host of other potential explanatory variables might be recorded systematically and related to the effect sizes obtained.

A second variation concerns the inclusion of more than one comparison from some of the studies. This may introduce violations of the assumption that all the effect sizes are independent, leading to inflated Type I error rates if standard methods are indiscriminately applied. Several ways of dealing with this problem suggest themselves, including the use of nested analysis of variance models to compare within-study variation. Another useful approach to this problem has been via Tukey's jackknife procedure (Glass, 1978).

Finally, other summary measures besides the effect size might be chosen. Glass (1978) gives a number of useful formulae for converting various statistics to the metric of product-moment correlations, and for some bodies of literature these may be more easily derived and interpreted than effect sizes. For statistical procedures assuming normal distributions, the effect size is probably to be preferred.
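The first of these variations, relating effect sizes to coded study characteristics, amounts to an ordinary regression with studies as the units of analysis. The sketch below uses hypothetical quality ratings (the article reports no such codes; they are invented purely for illustration) and assumes numpy is available.

```python
import numpy as np

# Hypothetical quality ratings (1 = low, 2 = medium, 3 = high) for the five
# studies; these codes are invented for illustration only.
quality = np.array([2, 3, 1, 2, 3])
effect_sizes = np.array([.58, .10, .55, -.97, .63])

# Slope and intercept of a simple regression of effect size on rated quality;
# a slope near zero would suggest the studies may be pooled across quality levels.
slope, intercept = np.polyfit(quality, effect_sizes, 1)
print(slope, intercept)
```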
Combining Original Data Across Studies
A fourth quantitative procedure for resolving disparities among studies addressing the same question was proposed by Light and Smith (1971). Their method goes beyond the use of published statistics and requires access to the original data collected in each investigation. These authors suggest that where the conclusions reached in two or more studies diverge, the studies probably differ along one or more relevant dimensions. These dimensions might be regarded as additional factors influencing the outcomes of the studies. Within any single study, since the levels of these factors would be constant, the factor could not be detected. Across studies, since levels of the factor would vary, its relation to the outcome could be examined. To detect and incorporate such factors, Light and Smith (1971) recommend a systematic procedure for combining original data across studies, which they label the 'cluster approach'. Where it is possible to apply this procedure, there is much to recommend it.
Comparison

The four methods and variations just described differ in the amount of information each requires of the studies included, in the strength of assumptions required, and in the types of generalisations which they permit. It must be emphasised that none of these methods is uniformly superior. Depending upon the interests of the investigator and the nature of the question addressed, the number and types of studies available, and the completeness of the information reported in these studies, any of the techniques, or even some completely different technique, might be most appropriate. The information taken from each study for these various techniques ranges from the direction of each effect, through their exact probabilities or magnitudes in some standard metric, all the way to the original data from which each was derived. While the use of more information can lead to stronger conclusions, it can also sharply limit the studies which can be included. While some estimate of an effect size can often be obtained from summary statistics which appear quite unpromising, the assumptions required may render such an estimate quite imprecise.
OVERVIEW
The papers in this issue are divided into four parts: Research Integration Methodology; Curriculum and Instruction; Individual Differences and Special Effects; and Programmatic Research Integration.

The first of the four parts, concerning the methodology of research integration, includes general discussions of this emerging field, as well as explorations of some specific problems that arise in adapting traditional statistical procedures to these new purposes. These problems include the effects of bias toward the publication of significant findings; studies of different combinations of treatments within a single analysis; and the choice of outcome measures appropriate to the treatments considered. One paper reports a study comparing the conclusions reached by reviewers applying quantitative versus traditional techniques to the same set of papers. As with the papers in subsequent parts, all of these authors were subject to stringent space limitations. Despite these limitations, they have provided an interesting and useful introduction to this important new area of research methods.

In the second part, Curriculum and Instruction, the papers concern some standard topics in educational research and evaluation, particularly those manipulable aspects of the educational environment that are intended to bring about learning and the attainment of curricular goals. The topics include subject matter effects in nutrition, process learning in science and instructional modes in mathematics; advance organisers; teacher behaviour; and open and individualised education.

The papers in the third part, Individual Differences and Special Effects, are concerned with characteristics of the individual such as gender and socio-economic status as well as special programmes suited to particular classes of students. Some of these papers concern how characteristics of the individual such as prior attainments and communication skills affect learning; others assess the effectiveness of training programmes geared to special problems of learning. The remaining papers in the third part concern effects of television, drugs, medicine, and surgery; these papers indicate interesting substantive and methodological developments with implications for research integration in education.

The fourth and last part of this issue, Programmatic Research Integration, surveys the work of the Chicago group on synthesis of research concerning a psychological theory of educational productivity, and two integrative programmes sponsored by the National Institute of Education. The Chicago group, using several integrative techniques, is attempting to synthesise theory and research on the chief determinants of school learning such as student motivation and ability; amount and quality of instruction; and home, classroom, and peer-group environments. The NIE efforts are aimed at: synthesising research-based and practice-based knowledge; dealing with problems of conflicting evidence, values and interpretations; and inquiring into the relation of research synthesis and policy formation.