J Clin Epidemiol Vol. 42, No. 5, pp. 473-476, 1989
Printed in Great Britain. All rights reserved
0895-4356/89 $3.00 + 0.00
Copyright © 1989 Pergamon Press plc
Second Thoughts

THE ALTERNATIVE HYPOTHESIS: ONE-SIDED OR TWO-SIDED?

KARL E. PEACE

Technical Operations, Parke-Davis Research Division, Warner-Lambert Company, 2800 Plymouth Rd, Ann Arbor, MI 48105, U.S.A.
(Received in revised form 16 August 1988)
Abstract: The appropriateness of a one-sided alternative hypothesis rather than the more conservative, boiler-plate, two-sided hypothesis is discussed and examples provided. It is concluded that confirmatory efficacy clinical trials of pharmaceutical compounds should always be viewed within the one-sided alternative hypothesis testing framework.

Hypothesis tests    One-sided    Two-sided    p-Values    Research objective

INTRODUCTION
Anyone who has had one or more courses in statistics will recall some discussion of elements of hypothesis testing. For some, this would have included the symbolic representations of the null hypothesis (H0: μ1 = μ2) and the alternative hypothesis (Ha), the symbolic specification as to whether Ha is two-sided (≠) or one-sided (< or >), the error (Type I) associated with making a wrong decision regarding H0, the magnitude (α) associated with that decision, the need for a test statistic, and the determination of the critical region. (For reader clarification, my use of the term "sided" is consistent with that of the term "tailed", used by many investigators.) For others, additional elements would have included the need to know the distribution of the test statistic under H0 and the importance of the error (Type II) associated with making a wrong decision about H0. Yet a few others would have additionally been exposed to the importance of deciding the direction of the alternative hypothesis, the importance of choosing an efficient experimental design, and the importance of determining the necessary number of experimental units needed to have reasonable expectations that the experiment would provide an answer to the research question. On the last point, appropriate instruction would have emphasized that Ha should be the embodiment of the research question.

Over the years, there have been many discussions in various arenas concerning the appropriate direction of Ha, and whether a two-sided test or a one-sided test is more appropriate at the analysis stage. However, it appears that many persons abandon intellectual discussion as to which is more appropriate, and consider only two-sided tests as appropriate. Those of us in the pharmaceutical industry are (often painfully) aware that the Food and Drug Administration (FDA) has adopted this position. The position appears to have support in academia, as attested in a recent letter by Fleiss [1].
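The elements recalled above can be assembled into a minimal one-sided test. The following is a sketch only, with a standard normal test statistic and hypothetical numbers, not a prescription for any particular trial:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal null distribution of the test statistic

# One-sided setting: H0: mu1 = mu2 versus Ha: mu1 > mu2 (hypothetical data).
alpha = 0.05                             # Type I error rate
critical_value = nd.inv_cdf(1 - alpha)   # critical region: z > 1.645 (upper tail only)

mean_diff, se = 1.2, 0.5                 # hypothetical observed difference and its SE
z = mean_diff / se                       # test statistic, ~N(0, 1) under H0

reject_h0 = z > critical_value           # decision rule
p_value = 1 - nd.cdf(z)                  # one-sided p-value
```

A two-sided test of the same data would instead use the critical region |z| > 1.96 and double the tail probability.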
APPROPRIATENESS OF ONE-SIDED TESTS
As I have argued previously [2], there are many situations where a one-sided test is the appropriate test, provided that hypothesis testing itself is meaningful to do. On this point, I shall only state that fundamentally I believe that hypothesis testing is appropriate only in those trials designed to provide a confirmatory answer to a medically and/or scientifically important
question. Otherwise, at the analysis stage, one is either in an estimation mode or a hypothesis-generating mode. I should underscore that helping to define the question at the protocol development stage is one of the most important contributions the statistician can make. The question to be answered is the research objective. I believe most persons would agree that the research objective is expressed by the alternative hypothesis (Ha) in the hypothesis testing framework. The idea is that contradicting the null hypothesis (H0), at some relatively small level of significance, offers evidence in support of the research objective.

One point in support of one-sided tests, therefore, is that if the main question toward which the research is directed is uni-directional, the significance tests should be one-sided.

A second point is that we should have internal consistency in our construction of the alternative hypothesis. An example of what I mean here is the dose-response or dose comparison trial. Few statisticians (none that I have asked) would disagree that dose-response, as a research objective, is captured in the hypothesis specification framework as Ha: μp ≤ μd1 ≤ μd2. For simplicity in this example, I have assumed that there are two doses d1 and d2 of the test drug, a placebo (p) control, and that the effect of drug is expected to be non-decreasing. If this is the case and if, for some reason, the research is conducted in the absence of the d1 group, then there is no reason why Ha: μp ≤ μd2 should become Ha: μp ≠ μd2.

A third point is that if the trial is designed to be confirmatory, then the alternative cannot be two-sided and still be logical. I believe this point holds for positive controlled trials as well as for placebo controlled trials. However, the idea is more likely to gain broader acceptance for placebo controlled trials.
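One common way a monotone (non-decreasing) dose-response alternative is operationalized is with an ordered contrast across the groups, whose p-value is taken from the upper tail only. The sketch below uses hypothetical means and standard errors, and one conventional choice of contrast weights; it is illustrative, not a recommendation of a specific trend test:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

# Hypothetical group estimates for placebo and two doses:
means = {"placebo": 10.0, "d1": 12.0, "d2": 15.0}
ses = {"placebo": 1.0, "d1": 1.0, "d2": 1.0}

# Ordered contrast encoding the uni-directional (non-decreasing) alternative:
contrast = {"placebo": -1.0, "d1": 0.0, "d2": 1.0}

estimate = sum(contrast[g] * means[g] for g in means)
se = sqrt(sum(contrast[g] ** 2 * ses[g] ** 2 for g in ses))

z = estimate / se                  # large positive z supports the ordered alternative
p_one_sided = 1 - nd.cdf(z)        # upper-tail p-value only; no lower tail is spent
```

The point of the construction is that the statistic, like the research question, looks in one direction: evidence that higher doses do worse counts against the alternative, not for it.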
To elaborate on the third point, suppose the (confirmatory) trial has one dose group and one placebo group, and a two-sided alternative was assumed appropriate at the design stage. Suppose further that the analyst is blinded and that the only results known are the F-statistic and corresponding p-value (which is significant). So the analyst knows that a difference exists, but not whether the difference favors drug or placebo. The analyst then has to search for where the difference lies, a situation no different from those which invoke multiple range tests. That search fundamentally precludes a confirmatory conclusion even if the direction favors the drug. On this point, I refer
back to and quote the last two sentences of Fleiss [1]: "At a minimum, others should be advised that A was found to be inferior to B so that they not, in ignorance, conduct the same kind of trial. Only a two-tailed test will permit the investigator to distinguish between 'A may be no different from B' and 'A may be worse than B'."

The point in the penultimate sentence is likely to be more consistent with failure to report negative results than with one-sidedness. The point in the last sentence, however, is more likely due to insufficient education of investigators by statisticians than to sidedness. In proper reporting, the magnitude and direction of the difference between the groups, and the associated standard error of the difference, should be included as well as the p-value. A difference favoring placebo with a (two-sided) p-value of 0.02, had the alternative hypothesis been two-sided, or the same difference favoring placebo with a one-sided p-value of 0.99 (for a symmetric distribution), had the alternative hypothesis been that drug is superior to placebo, should be equally informative to the investigator. The distinction is that the latter does not permit a confirmatory conclusion of the superiority of placebo. However, the one-sided alternative of drug superiority would have been confirmed had the same magnitude been observed favoring drug (which the two-sided setting would not permit).

A fourth point, and perhaps the most important for those of us who support drug development programs in a regulatory climate, recognizes the regulatory review/market approval process. The Type I error for efficacy trials may be appropriately viewed as the "regulatory risk". Parenthetically, the Type II error is the "sponsor's risk". Over the years, I believe that regulatory authorities, particularly the FDA, have accepted as tolerable that as many as 5% of the drugs which they approve for marketing as efficacious may not be. However, the regulatory risk is conditional.
It is based upon those applications which the regulators review; and the applications they review are those that the sponsor believes support the drug. Therefore, the regulatory risk reflects a one-sided decision process. There should be consistency between the sidedness of the regulatory risk and the design of the trials submitted to support the application. For the FDA or any other regulatory authority to operate with a two-sided 5% regulatory risk is in fact to operate at a 2.5% level.
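The numerical claims in the third and fourth points can be checked with the standard normal distribution: the sign-blindness of the F-statistic, the 0.02 two-sided versus 0.99 one-sided p-value for the same observed difference, and the 2.5% effective level of a two-sided 5% criterion. A minimal sketch (the z values are hypothetical):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal null distribution of the test statistic (assumed)

# 1. In a two-group comparison F = t**2, so F discards the sign of t,
#    i.e. the direction of the observed difference.
t_drug_better, t_placebo_better = 2.5, -2.5
f_1, f_2 = t_drug_better ** 2, t_placebo_better ** 2   # identical F either way

# 2. The same observed difference, reported two ways.
z = -2.326                           # hypothetical statistic; negative = favors placebo
p_two_sided = 2 * nd.cdf(-abs(z))    # two-sided p-value, about 0.02
p_one_sided = 1 - nd.cdf(z)          # one-sided p for Ha: drug superior, about 0.99

# 3. A two-sided 5% criterion spends only 2.5% of the Type I error
#    in the direction that supports the drug.
z_crit_two_sided = nd.inv_cdf(1 - 0.05 / 2)                  # about 1.96
risk_in_favorable_direction = 1 - nd.cdf(z_crit_two_sided)   # 0.025
```

The two p-values in part 2 describe the same data; only the one-sided framing makes explicit that drug superiority has not been confirmed.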
DISCUSSION
The previous points made in support of one-sided tests are ones which I have used over the last 10 years in supporting drug development programs in the pharmaceutical industry. Only the last three may be considered as examples explicitly supporting a one-sided alternative. The first one simply says that the direction of the alternative should be consistent with the research question, a position so logical as to appear unassailable.

In turning to the literature, apologetically, I discovered that Feinstein [3] also dealt with the issue of direction in 1975. He argued that the investigator should be obligated to justify whether his scientific hypothesis is uni-directional or bi-directional; the statistical hypothesis should then be appropriately one-sided or two-sided, and the uni-lateral or bi-lateral aspects of probability follow accordingly. He further acknowledges that from time to time the statistician will encounter an investigator who cannot state any form of direction. He recognizes that the difficulty is not so much one of indecision about whether to go one way or two ways in the statistical hypothesis, but that the scientific hypothesis itself is either malformed or amorphous. He rightfully concludes that the statistician can do everyone a service, not by adopting the conservative two-tailed approach, but by advising the investigator to re-think the goals and aims of the research itself.

A further MEDLINE search of the literature from 1 January 1976 through 16 August 1988 was performed. The search parameters specified were one- or two-sided (or -tailed) tests or p-values. The search produced 200 articles reporting one-sided or two-sided results. Of the 81 articles reporting one-sided results, nine were statistically theoretical, two reported experimental results using both one-sided and two-sided approaches, and two were not of a statistical nature. (One of the latter two studies referred to a single-tailed goldfish and the other to a one-sided cerebral contusion.)
Thus, 68 [4] of the 200 articles reporting one-sided or two-sided results were of experiments where the researchers deemed the one-sided approach appropriate. Therefore, as a conservative estimate, 34% of experimental results reported since 1975 reflected a one-sided direction. These articles span a wide range of experimental settings: both basic research and clinical trials, prospective and retrospective studies, and a variety of disease areas.
The published literature can help readers draw their own interpretations about the appropriate direction of the alternative hypothesis. My only point is that a sizable segment of the research community believes that there are settings where a one-sided alternative hypothesis is appropriate. My view again, as embodied in my first point, is that the objective of the research should be well thought out and well formulated. If hypothesis testing is a meaningful framework to address the objective, then the alternative hypothesis should embody the objective both in substance and direction.

There may be situations where two-sided alternatives are appropriate. But my experience in the clinical development of drugs suggests few, if any. First of all, Phase I and Phase II trials should be viewed as pilots for Phase III. Since the state of knowledge of the pharmaceutical compound is evolving at these stages, these trials are seldom, if ever, designed to provide definitive information about study objectives. The objectives themselves are often difficult to state precisely. Therefore, these trials are exploratory and do not argue for decisions reflecting study objectives to be based upon hypothesis testing. The aim should be to provide estimates associated with biologically and/or medically meaningful events, which can be used to design confirmatory Phase III trials.

I have already argued that hypothesis testing provides a meaningful decision framework in confirmatory trials, and further that the alternative hypothesis should be one-sided, without exception, if the control is placebo. I have no problems, generally, with a one-sided direction being taken if the control is positive. For these trials, a conclusion of clinical equivalence is desired. To me this means "clinically as good as", where "as good as" may in fact be superiority or numerically "a little worse", but not so much as to be of clinical importance. There may be some exceptions to this.
For example, in the development of generic drugs via blood-level bioequivalence trials, there may be legitimate clinical concerns about a generic which demonstrates markedly superior bioavailability. In this setting, it appears reasonable to bracket equivalence in both directions.

Finally, a last point worthy of mention is that one-sided vs two-sided may be much ado about nothing if we would move to confidence intervals exclusively as the basis for decision making. However, as long as the regulatory decision risk is 5%, 90% two-sided confidence intervals rather than 95% two-sided ones should be used.
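The recommendation of 90% rather than 95% two-sided intervals can be illustrated with a small sketch: concluding superiority when the lower limit of a 90% two-sided interval excludes zero is exactly the one-sided 5% decision. The difference and standard error below are hypothetical:

```python
from statistics import NormalDist

nd = NormalDist()

alpha = 0.05                  # one-sided regulatory risk
z = nd.inv_cdf(1 - alpha)     # about 1.645: also the multiplier for a 90% two-sided CI

diff, se = 4.0, 2.0           # hypothetical drug-placebo difference and its SE
lower_90 = diff - z * se      # lower limit of the 90% two-sided interval

# The interval-based decision agrees with the one-sided 5% test:
one_sided_significant = (diff / se) > z
interval_excludes_zero = lower_90 > 0
```

A 95% two-sided interval would instead use the multiplier 1.96, implicitly imposing a 2.5% one-sided risk.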
REFERENCES

1. Fleiss JL. Some thoughts on two-sided tests. Controlled Clin Trials 1987; 8: 394.
2. Peace KE. Some thoughts on one-sided tests. Biometrics 1988; 44: 911-912.
3. Feinstein AR. Clinical Biostatistics. XXXII. Biological dependency, hypothesis testing, uni-lateral probabilities, and other issues in scientific direction vs statistical duplexity. Clin Pharm Ther 1975; 17: 499-513.
4. Not included as references to conserve space. Interested readers may contact the author for the complete bibliography.