Issue of statistical power in comparative evaluations of minimal and intensive controlled drinking interventions

Addictive Behaviors, Vol. 16, pp. 83-87, 1991. Printed in the USA. All rights reserved.

0306-4603/91 $3.00 + .00 Copyright © 1991 Pergamon Press plc

BRIEF REPORT

ISSUE OF STATISTICAL POWER IN COMPARATIVE EVALUATIONS OF MINIMAL AND INTENSIVE CONTROLLED DRINKING INTERVENTIONS

WAYNE HALL and NICK HEATHER

National Drug and Alcohol Research Centre, University of New South Wales

Abstract - An analysis of recent studies of minimal and intensive cognitive-behavioural treatments for problem drinking was undertaken to decide whether a lack of statistical power explains the failure of the majority of studies to find a difference in outcome between these two types of treatment. Although the sample sizes have typically been small (n = 12-21), the analysis suggests that low statistical power is unlikely to be the explanation for the majority of null findings. It seems more likely that the difference in outcome between one positive study and the majority of null results reflects some combination of differences in the type of clients who were treated, the therapists’ experience, and the type of intensive therapy that was provided. The low power of these studies demonstrates the desirability of researchers calculating the sample size required to detect an effect before commencing an outcome study. If they continue to undertake studies with small sample sizes, then they should refrain from inferring that the failure to reject a null hypothesis means that there is no difference between treatments.

We would like to thank Richard Mattick and Jenny Tebbut for their comments on an earlier version of this article, and Jenny Tebbut for her assistance in the preparation of the figures. Requests for reprints should be sent to Associate Professor Wayne Hall, National Drug and Alcohol Research Centre, University of New South Wales, P.O. Box 1, Kensington, N.S.W., 2033, Australia.

It is now widely accepted among behavioural psychologists that controlled drinking rather than abstinence is an appropriate treatment goal for some problem drinkers, especially those who show a low level of dependence on alcohol (Heather, 1989; Heather & Robertson, 1983). Given such a goal, and a cognitive-behavioural approach to treatment, the question then arises: What is the most cost-effective method of intervention for achieving this goal? Is it a minimal form of intervention such as simple advice to reduce consumption accompanied by printed information based on cognitive-behavioural principles? Or is a more intensive form of intervention required in which therapists provide supervised practice in the use of cognitive-behavioural principles to control problem drinking?

A small number of controlled outcome studies (Miller & Baca, 1983; Miller, Gribskov, & Mortell, 1981; Miller & Taylor, 1980; Miller, Taylor, & West, 1980; Robertson, Heather, Dzialdowski, Crawford, & Winton, 1986; Skutle & Berg, 1984; Vogler & Weissbach, 1977) have compared the outcome of intensive and minimal behavioural interventions for problem drinkers. All but one of these studies (Robertson et al., 1986) have failed to find a difference in outcome in favour of the intensive treatment. The results of the “null” studies have been widely interpreted as demonstrating that minimal and intensive behavioural interventions are equally effective in achieving controlled drinking (Miller et al., 1980).

The major problem with this argument lies with the inference that the absence of a statistically significant difference in outcome between minimal and intensive interventions establishes that they are equally effective (Heather, 1989). At least two conditions should be met before we conclude that there is no difference in effectiveness between two forms of treatment (Hall & Einfeld, in press). First, there should be no statistically significant difference between the two interventions. Second, we should be able to exclude plausible rival explanations of the failure to observe a difference. The most serious of these rival explanations are that the study had insufficient statistical power to detect a difference and that confounding variables (e.g., measurement artifacts) operated to suppress a true difference between treatments.

Insufficient statistical power is the first possibility that should be considered when a study fails to reject the null hypothesis. Did the study have an adequate chance of detecting a specified difference, if one existed? If it did not, we cannot justifiably conclude that there is no difference in the population from which we sampled (Bird & Hall, 1986).

Lack of statistical power is prima facie the most plausible explanation of the failure to find differences between intensive and minimal forms of behavioural treatment. The sample sizes used in many of these studies have been small (ranging from 12 to 21), and the size of effect (i.e., the average difference between the two treatment groups on the outcome measures) that these studies aimed to detect was likely to be small because they were comparing two versions of the same type of treatment, one of which was applied less intensively than the other. The low statistical power of these studies becomes apparent if we examine their chances of detecting each of Cohen’s three conventional effect sizes (measured in population standard deviation units): “small” (d = .2), “medium” (d = .5), and “large” (d = .8) (Cohen, 1977).
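As a rough check on how little power such samples carry, the sketch below approximates the power of a one-tailed two-sample t test at Cohen's three conventional effect sizes, for the design considered in this report (15 subjects per group, Type I error rate of .05). It is an illustration only: it substitutes a normal approximation for the noncentral t distribution, and the function name `approx_power` is an assumption introduced here, not something taken from the studies.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed two-sample t test.

    Assumption: normal approximation, power ~ Phi(d * sqrt(n / 2) - z_(1 - alpha)),
    where d is the effect size in standard deviation units and n is the
    number of subjects in each of two equal-sized groups.
    """
    z = NormalDist()  # standard normal distribution
    noncentrality = d * sqrt(n_per_group / 2)
    critical_value = z.inv_cdf(1 - alpha)
    return z.cdf(noncentrality - critical_value)


# Cohen's conventional effect sizes, 15 subjects per group.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d = {d}: approximate power = {approx_power(d, 15):.2f}")
```

Under this approximation the three values come out at roughly .14, .39, and .71. These are slightly higher than figures based on the t distribution, because the normal approximation ignores the extra uncertainty from estimating the standard deviation in small samples.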
For a two-sample t test with a one-tailed Type I error rate of .05 and a sample size of 15 subjects per group, these studies have at best a 69% chance of detecting a “large” difference in treatment outcome, a 38% chance of detecting a “medium” effect, and less than a 13% chance of detecting a “small” difference. The purpose of this article is to address the question: To what extent is the difference in outcome between the majority of null studies and the single positive study attributable to low statistical power on the part of the null studies?

METHOD

Definitions of treatment

We compared one minimal and one intensive form of treatment within each of the studies that were submitted to analysis (Miller & Baca, 1983; Miller et al., 1980, 1981; Miller & Taylor, 1980; Robertson et al., 1986; Skutle & Berg, 1984; Vogler & Weissbach, 1977). The minimal form of treatment in each study usually consisted of educational information and simple advice to cut down drinking accompanied by self-help guides. The intensive form of treatment in each study usually involved a series of sessions with a therapist delivering some variety of cognitive-behavioural treatment.

To simplify analysis when there was more than one form of intensive treatment, we pooled the results of all intensive interventions within each study. For example, Miller et al. (1980) compared “bibliotherapy” with three forms of treatment that consisted of (a) six sessions based upon the bibliotherapy manual, (b) the same treatment as in (a) with the addition of a discussion of life problems, and (c) the same as in (b) with the client selecting the life problems that were discussed. Because the variations between the three intensive forms of treatment were so slight, we pooled the results of all forms of intensive treatment into a single intensive condition. The same procedure was followed in all the other studies by Miller and his colleagues.

To further simplify presentation of the Miller et al. studies, the results of the minimal and intensive interventions in each of these studies have been separately pooled. That is, the consumption figures for the minimal and intensive conditions are the average results for all subjects observed under these conditions in the complete series of studies conducted by Miller and his colleagues. Although we recognize that such a pooling of data from different studies can be misleading, we did so because these interventions describe similar forms of treatment that were administered to patients who had been recruited in the same way in each study, and who had been treated by therapists who had been trained in the same way.

Outcome measure

The outcome measure selected for detailed analysis was self-reported alcohol consumption. This measure was taken for each time period reported in each of the studies (e.g., intake, termination of treatment, and 3, 6, 12, 15, and 24 months follow-up periods). In all studies except that of Robertson et al. (1986), alcohol consumption was measured in standard ethanol content units (SEC) (0.5 fl oz) consumed per week. When necessary, mean consumption was estimated from graphs (Skutle & Berg, 1984). In the Robertson et al. study, consumption was measured in units of 8 g of ethanol consumed per month. In order to make consumption units equivalent, we divided monthly alcohol consumption by 4 to give weekly consumption and multiplied this figure by a correction factor of 0.59.

RESULTS

We performed two sets of calculations to evaluate lack of power as an explanation of the discrepancy between the findings of Robertson et al. and Miller et al. (see Table 1 and Fig. 1). First, we calculated the chances that the pooled studies of Miller and colleagues had of detecting a difference as large as that reported by Robertson et al. Second, we calculated the chances that the Robertson et al. study had of detecting the average difference in consumption observed in the studies of Miller and colleagues. Both calculations were performed because we did not want to assume that the effect size of either group of studies was the “correct” one.

These calculations suggest that lack of power may be excluded as an explanation of the differences in outcome. First, the size of effect in the Robertson et al. study (1.49 SD units) was much larger than the average observed across the null studies (-0.06 SD units). Because the Robertson et al. effect size was so large, the pooled results of the studies of Miller and colleagues had a better than 99% chance of detecting it. Second, the consumption difference between the combined intensive and minimal interventions in the studies of Miller and colleagues and that of Skutle and Berg (1984) was exceedingly small. The largest effect size in favour of the intensive treatment was 0.21 standard deviation units, and the average difference between intensive and minimal treatments was only -0.06 standard deviation units (if we use the SD unit for the minimal intervention group at 24 months follow-up). These results suggest that any difference between minimal and intensive treatments in their effects on alcohol consumption is very small.
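The logic of these two calculations can be illustrated with a short sketch. It expresses a consumption difference in SD units of the minimal group, with positive values favouring the intensive treatment, and then approximates one-tailed power for the two effect sizes quoted above (1.49 and 0.21). The group sizes and the consumption figures passed to `standardized_difference` are hypothetical, chosen only for illustration; the function names are likewise assumptions, and the power formula is a normal approximation rather than the method used in the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def standardized_difference(mean_minimal, mean_intensive, sd_minimal):
    """Difference in mean consumption in SD units of the minimal group.

    Positive values favour the intensive treatment (lower consumption).
    """
    return (mean_minimal - mean_intensive) / sd_minimal


def approx_power(d, n_per_group, alpha=0.05):
    """One-tailed power under a normal approximation (an assumption)."""
    z = NormalDist()
    return z.cdf(d * sqrt(n_per_group / 2) - z.inv_cdf(1 - alpha))


# Hypothetical consumption figures, for illustration only.
example_d = standardized_difference(30.0, 24.0, 20.0)
print(f"hypothetical effect size: {example_d:.2f} SD units")

# Effect sizes quoted in the text; group sizes are hypothetical.
for d in (1.49, 0.21):
    for n in (15, 30, 60):
        print(f"d = {d:4.2f}, n = {n:2d} per group: power ~ {approx_power(d, n):.2f}")
```

Even at the hypothetical lower bound of 15 subjects per group, the approximate power to detect an effect of 1.49 SD units is about .99, whereas an effect of 0.21 SD units would be detected less than a third of the time with 60 subjects per group.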
DISCUSSION

The large disparity in effect sizes between the studies with positive and negative outcomes suggests that the discrepancy probably reflects a real difference between the single positive study and the majority of null studies rather than a lack of power on the part of the null studies. Any one of the following possibilities could explain the difference in outcome.

First, there were differences between studies in the way in which subjects were recruited for treatment (i.e., self-referral in response to a newspaper advertisement as against referral by medical practitioners), which may have produced clients with very different prognoses. Second, there were differences between the studies in therapist experience. The therapists in the Miller et al. series of studies, for example, were undergraduate and graduate psychology students who had been given a short course of training. The therapists in the Robertson et al. study were experienced clinical psychologists who had specialized in the management of drug and alcohol problems. Third, there may have been differences in the nature of the intensive interventions that were used. Fourth, since these possibilities are not mutually exclusive, one or more of them could have operated jointly.

All of these factors are confounded in the differences between the results of Robertson et al. and of Miller and colleagues, thereby making it impossible to quantify their exact contribution to the difference in average outcome. It is nonetheless worth noting that a variant of these hypotheses has been previously suggested by Orford, Oppenheimer, and Edwards (1976). It is that self-referred, and presumably well-motivated, problem drinkers (the clientele of Miller et al. and Skutle & Berg) have a better prognosis than problem drinkers who have been referred for treatment by other professionals and treatment agencies (the clientele of Robertson et al.). The average effectiveness of self-treatment with well-motivated clients is not likely to be greatly enhanced by the addition of more therapist time. By contrast, problem drinkers who have been referred by professionals are presumably less well motivated and may well benefit from intensive interventions provided by skilled therapists.

IMPLICATIONS FOR FUTURE RESEARCH

It would be a mistake to interpret our results as a justification of current practice in which researchers compare the outcomes of similar forms of treatment using small sample sizes. As Kazdin and Bass (1989) have recently documented, psychotherapy outcome studies that compare one form of treatment with another have less than adequate power to detect the average effect sizes reported in the literature. The same was true of the individual studies included in our analysis. In our analysis it simply proved to be the case that the pooled results indicated that low statistical power of individual studies was unlikely to be the explanation of a failure to find differences in outcome between minimal and intensive treatments. There is no guarantee that this happy result will always emerge. We accordingly end by endorsing the recommendations made by Kazdin and Bass: that power analyses should be performed in the planning of comparative outcome studies to “ensure that sample sizes can detect an effect of a given magnitude” (p. 146), and that data from meta-analyses and descriptive studies should be used to estimate the approximate effect sizes that researchers should be attempting to detect.
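A planning calculation of the kind recommended here might look like the following sketch, which estimates the number of subjects needed in each group to detect a given effect size with a one-tailed two-sample test. The target power of .80 is a common convention adopted here as an assumption, not a figure from this report; the function name `n_per_group` is hypothetical, and the formula is a normal approximation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate subjects needed per group for a one-tailed two-sample test.

    Assumption: normal approximation,
    n = 2 * ((z_(1 - alpha) + z_power) / d) ** 2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Cohen's conventional effect sizes.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d = {d}: about {n_per_group(d)} subjects per group")
```

Under these assumptions the sketch gives about 310, 50, and 20 subjects per group for small, medium, and large effects respectively. A calculation based on the t distribution may add a subject or so per group, so these figures are best read as lower bounds for planning.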

REFERENCES

Bird, K.D., & Hall, W. (1986). Statistical power in psychiatric research. Australian and New Zealand Journal of Psychiatry, 20, 189-200.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.

Hall, W., & Einfeld, S. (in press). On doing the impossible: Inferring the nonexistence of causal relationships. Australian and New Zealand Journal of Psychiatry.

Heather, N. (1989). Psychology and brief interventions. British Journal of Addiction, 84, 357-370.

Heather, N., & Robertson, I. (1983). Controlled drinking (rev. ed.). London: Methuen.

Kazdin, A.E., & Bass, D. (1989). Power to detect differences between alternative treatments in comparative psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 57, 138-147.

Miller, W.R., Taylor, C.A., & West, J.C. (1980). Focused versus broad-spectrum behavior therapy for problem drinkers. Journal of Consulting and Clinical Psychology, 48, 590-601.

Miller, W., & Taylor, C.A. (1980). Relative effectiveness of bibliotherapy, individual and group self-control training in the treatment of problem drinkers. Addictive Behaviors, 5, 13-24.

Miller, W.R., Gribskov, C.J., & Mortell, R.L. (1981). Effectiveness of a self-control manual for problem drinkers with and without therapist contact. International Journal of the Addictions, 16, 1247-1254.

Miller, W.R., & Baca, L.M. (1983). Two-year follow-up of bibliotherapy and therapist-directed controlled drinking training for problem drinkers. Behavior Therapy, 14, 441-448.

Orford, J., Oppenheimer, E., & Edwards, G. (1976). Abstinence or control: The outcome for excessive drinkers two years after consultation. Behaviour Research and Therapy, 14, 409-418.

Robertson, I., Heather, N., Dzialdowski, A., Crawford, J., & Winton, M. (1986). A comparison of minimal versus intensive controlled drinking treatment interventions for problem drinkers. British Journal of Clinical Psychology, 25, 185-194.

Skutle, A., & Berg, G. (1984). Training in controlled drinking for early-stage problem drinkers. British Journal of Addiction, 82, 493-501.

Vogler, R.E., Weissbach, T.A., Compton, J.V., & Martin, G.T. (1977). Integrated behavior change techniques for problem drinkers in the community. Journal of Consulting and Clinical Psychology, 45, 267-279.