Some issues in the replication of social science research

Labour Economics 4 (1997) 121-123

Robert Rosenthal
Department of Psychology, Harvard University, William James Hall, 33 Kirkland Street, Cambridge, MA 02138, USA

JEL classification: C1
Keywords: Counternull value; Effect size; Meta-analysis; Replication

Labour Economics is to be congratulated on its decision to stimulate the conduct of replication research in its field. The process of replication is at the very heart of the scientific process, yet the social sciences have been slow to encourage the early and systematic replication of new research findings (Neuliep and Crandall, 1990).

1. Exactness of replication

The editors have also shown wisdom in defining replications quite broadly, from (a) reanalyses of essentially the same data set as that analyzed in the original study, to (b) fairly exact replications of studies conducted, to (c) fairly loose replications of an underlying hypothesis employing different operational definitions of the independent, mediating, and dependent variables. All replications falling anywhere along this dimension of exactness of replication have their own value in the overall process of science-building, a process wonderfully likened, by the philosopher of science Otto Neurath, to the process of rebuilding a boat, plank by plank, not in dry dock but at sea. While all degrees of exactness of replication have value, it has been suggested that their value may be maximized by the use of a replication battery of two or more replications that differ in their degree of exactness of replication (Rosenthal, 1990).


If both or neither of the replications in the battery were 'successful' (to be defined shortly), we would have a considerable strengthening or weakening of the tenability of the original result. If the more exact replication was 'successful' while the less exact replication was not, the basic finding might be seen as bolstered, but its robustness to changed conditions might be called into question. More complex batteries of replications can often be constructed to yield a good bit more information than could be obtained by a less coordinated series of replications.

2. Defining the success of replication

How shall we decide whether a replication has been successful? Based on experience in my own discipline of psychology, we have not always been wise in choosing our definition of a 'successful' replication. Too often we have defined a successful replication of an obtained relationship between variables (e.g., a Pearson r) by the replicator's being able to reject the null hypothesis at a given level of significance (often p < .05) when the original investigation was also able to reject the null hypothesis. This practice can lead to the peculiar situation in which a replication obtains almost exactly the same size of relationship as the original research but, because of a smaller sample size, fails to reject the null hypothesis, leading to a claim of a 'failure to replicate' (Rosenthal, 1990; for a valuable recent comment on the common uses of significance testing see Cohen, 1994).

A newer, more useful view of the success of a replication (a) focuses on the magnitude of relationships rather than on statistical significance tests and (b) indexes the success of replication in a continuous rather than a dichotomous fashion. For example, if a Pearson r is the index of relationship investigated in the original research and in its replication, the metric Cohen's q would be appropriate (Cohen, 1988). The quantity q is simply the difference in Fisher Z-transformed rs (Z_r) between the original research and its replication. The quantity q is immediately informative, and it is easy to put confidence intervals around q so that the precision of our q (which depends on the sample sizes of the original research and its replication) can be evaluated.
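
As an illustration only (the correlations, sample sizes, and function names below are hypothetical and are not drawn from any study discussed here), a minimal sketch of computing q and an approximate confidence interval might look as follows:

```python
# A minimal illustrative sketch (not code from the article): Cohen's q as an
# index of replication success, with an approximate 95% confidence interval.
# All names, correlations, and sample sizes below are hypothetical.
import math


def fisher_z(r):
    """Fisher Z-transformation of a Pearson r."""
    return math.atanh(r)


def cohens_q(r_original, r_replication):
    """Cohen's q: the difference between the Fisher Z-transformed correlations."""
    return fisher_z(r_original) - fisher_z(r_replication)


def q_confidence_interval(r1, n1, r2, n2, z_crit=1.96):
    """q and an approximate 95% CI, using the standard error of the
    difference between two independent Fisher Z values."""
    q = cohens_q(r1, r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return q, (q - z_crit * se, q + z_crit * se)


# Hypothetical example: original study r = .30 with n = 80;
# replication r = .28 with n = 40.
q, (low, high) = q_confidence_interval(0.30, 80, 0.28, 40)
print(f"q = {q:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Note how wide the interval around q is at these hypothetical sample sizes; that width is precisely the information concealed by a dichotomous verdict of 'replicated' or 'failed to replicate'.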

3. The counternull value of an effect size

The quantity q can of course also be tested for statistical significance, but if sample sizes are very large, even very similar estimates (e.g., of Z_r) may be declared significantly different. Researchers must then decide whether the statistical difference is of practical significance. For example, suppose Z_r1 - Z_r2 = q = 0.02, a modest value but significant at p < 0.001. A useful procedure then is to compute the counternull value of q (Rosenthal and Rubin, 1994). The counternull value of any effect size is the nonnull magnitude of effect size that is supported by exactly the same amount of evidence as supports the null value of the effect size. When the null value of an effect size is 0.00, as it usually is in the case of replication efforts, the counternull value is twice the effect size obtained (q = 0.02, counternull = 2(0.02) = 0.04). If even the counternull value of q is deemed by the researchers to be a trivial practical difference, the replication may be regarded as 'successful' despite the significant difference between Z_r1 and Z_r2 (i.e., q).
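
A hypothetical one-line computation may make the idea concrete (the function below is my own illustration; the general form, counternull = 2 x observed effect - null value, follows Rosenthal and Rubin, 1994, and reduces to twice the obtained effect size when the null value is 0.00, as in the text):

```python
# Illustrative sketch only (the function name and numbers are my own):
# the counternull value of an effect size, following the general form
# counternull = 2 * observed effect - null value (Rosenthal and Rubin, 1994).


def counternull(effect_size, null_value=0.0):
    """The nonnull effect size supported by exactly as much evidence
    as the null value is."""
    return 2.0 * effect_size - null_value


# The worked example from the text: q = 0.02 against the usual null of 0.00.
print(counternull(0.02))  # 0.04 -- if even this is practically trivial,
                          # the replication may be counted as 'successful'.
```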


The counternull statistic can also be of value when the value of q seems large but is not statistically significant. For example, suppose Z_r1 - Z_r2 = q = 0.50, a large value but not significant because of small sample size. Before declaring a 'successful' replication, it would be useful to compute the counternull (2(0.50) = 1.00) to remind us that a true q of 1.00 is exactly as likely as a true q of 0.00. Thus the use of the counternull can keep us from overoptimistically claiming a 'successful' replication as well as from overpessimistically believing we have 'failed' to replicate. The counternull can still be employed when there are more than two studies, the original and its replications. Also, as replications accumulate we will likely want to employ the coefficient of robustness of replication (the mean effect size divided by the standard deviation of the effect sizes) and other meta-analytic metrics (Rosenthal, 1991).

In conclusion, the decision of the editors systematically to foster the conduct of replication research in labour economics is to be applauded. The accumulation of replications, and the resulting meta-analyses, in all the social sciences will contribute greatly to making the social sciences more truly cumulative.

References

Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences (2nd Ed.). Lawrence Erlbaum Associates, Hillsdale, NJ.
Cohen, J., 1994. The earth is round (p < .05). American Psychologist 49, 997-1003.
Neuliep, J.W., Crandall, R., 1990. Editorial bias against replication research. In: Neuliep, J.W. (Ed.), Replication Research in the Social Sciences. SAGE, Newbury Park, CA, pp. 85-90.
Rosenthal, R., 1990. Replication in behavioral research. Journal of Social Behavior and Personality 5, 1-30.
Rosenthal, R., 1991. Meta-analytic Procedures for Social Research (Rev. Ed.). SAGE, Newbury Park, CA.
Rosenthal, R., Rubin, D.B., 1994. The counternull value of an effect size: A new statistic. Psychological Science 5, 329-334.