Postmarketing surveillance: Statistics and augury

J Chron Dis Vol. 37, No. 12, pp. 949–952, 1984. Printed in Great Britain. 0021-9681/84 $3.00 + 0.00. Copyright © 1984 Pergamon Press Ltd.

Second Thoughts

POSTMARKETING SURVEILLANCE: STATISTICS AND AUGURY*

ARTHUR F. JOHNSON

Smith, Kline and French Laboratories, 1500 Spring Garden Street, P.O. Box 7929, Philadelphia, PA 19101, U.S.A.

(Received in revised form 16 July 1984)

How can we organize and execute surveillance programs for selected new drugs that will allow us to safely resolve the more difficult and/or time-consuming issues of drug safety and efficacy after conditional approval for marketing has been granted?

I shall start with an aside. For me, drug efficacy in law and in science is established by the outcome of controlled clinical trials. By the time we reach the post-marketing period, then, the drug is efficacious by definition. Our concern is to determine the types or classes of patients who respond to the efficacious drug. The importance of this distinction is that the determination of responsiveness of a class of subjects to an efficacious drug, like the determination of safety for a class of subjects (or for subjects in general), cannot be made solely on scientific grounds, because there is often no opportunity in these situations for an unbiased, controlled clinical trial.

The studies that can be made in the post-marketing surveillance situation, whether based on spontaneous reporting, controlled release, cohort observation or case-control designs, are all open to bias of unknown magnitude and even of unknown direction. Sackett [1] has provided a catalog of biases which, just in the area of specifying and selecting the study sample, considers 22 separately named, described and referenced kinds of bias.

As statisticians we are interested in providing our clients with a basis for decision that has quantifiable risk. In the face of inescapable bias of unknown magnitude and direction, we cannot quantify risks, and any asterisks we may assign require the supposition that perhaps the bias is not too bad, or that somehow two biased studies are better than one. Despite our asterisks being blurred, we as statisticians still have a role in post-marketing surveillance.
Our role is especially important where a randomized controlled clinical trial is not possible. We must be clear, however, as to what that role is and how it is to be correctly carried out. Let me take you back to ancient Rome, where in matters of decision on public policy there were no statisticians to consult. To whom did one then turn but the augur (we are here today trying to in-augur-ate good public policy as regards drug surveillance) or the auspex (drug efficacy, you will note, is determined under the auspices of the Food & Drug Administration).

*Presented at a panel discussion on Postmarketing Surveillance, American Statistical Association, Houston, Texas, August 11–14, 1980. Sponsored by the Biopharmaceutical Subsection.


Superficially we dismiss divination, the contribution of the augurs (the highest class of official diviners in ancient Rome), as having no sound basis and as being subject to the whim of the client, who kept looking until the omens were good. This, however, is to misunderstand what is being done in these circumstances. I quote from the Encyclopædia Britannica article on divination: “Ethnographic studies suggest that what a client seeks from the diviner is information upon which he can confidently act. He is seeking, in so doing, public credibility for his own course of action. Consistent with this motive, he should set aside any finding that he thinks would lead him into doubtful action and continue his consultations until they suggest a course that he can take with confidence.”

Now I submit that this is the same task that we as statisticians are being asked to perform in public policy matters such as PMS: provide information on which the client can confidently act, information which will provide public credibility for the course of action taken. If public credibility can be gained by studying the entrails of animals (not the function of the augur or the auspex, but of the haruspex), so be it; but in our time we have other activities which we deem more relevant. What then can we as statisticians do?

1. Establish PMS as a hard problem

Any one of us can calculate that if we are looking for a low-incidence adverse event, we may have to look a long time before we encounter one. If an event occurs at a frequency of one in a thousand cases, then I must be prepared to look at up to 3000 cases before I am assured of observing at least one event (p = 0.05). Based on a few largely untenable assumptions, we are each able to calculate the immense sample size required for a clinical trial to distinguish a moderate elevation in the incidence of an adverse effect over an uncertain, unknown or unknowable background incidence. Are we thus necessarily proposing to execute these manoeuvres? What we are doing is establishing interactively with all the involved publics (for our own benefit as individuals, for the manufacturer's benefit, for the FDA's benefit, for the medical community's benefit, for the benefit of consumer advocates) public agreement that PMS is a hard problem and that, no matter what anybody does, things may not turn out right.

2. Set out limitations on what can be done

For this hard problem, what can the statistician propose? First must come a clarification of the adverse event(s). Is there a target event of concern, or is there simply a concern that there may be as yet unknown adverse events? For a specified target, a case-control strategy is of interest. The statistician's contribution is to be familiar, from the literature and personal experience, with the problems of planning, executing and interpreting case-control studies. Using Sackett's 22 sources of bias, the statistician would presumably examine any proposed study protocol to ensure that it avoids at least these problems, or, if the problems cannot be avoided, at least point out their presence in the protocol and, if possible, add some conjecture as to their importance. For unknown adverse events, the cohort study strategy is a possibility. Many of the possible biases in case-control work can occur in cohort studies as well.
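The sample-size arithmetic in point 1 can be sketched as a short calculation (a modern illustration added here, not part of the original text; the function name is ours):

```python
import math

def n_for_at_least_one(incidence, alpha=0.05):
    # Smallest cohort size n such that the chance of seeing NO events,
    # (1 - incidence)**n, falls below alpha -- i.e. we are assured
    # (with probability 1 - alpha) of observing at least one event.
    return math.ceil(math.log(alpha) / math.log(1.0 - incidence))

# An event occurring at a frequency of one in a thousand:
# just under 3000 cases are needed for assurance at p = 0.05.
print(n_for_at_least_one(1 / 1000))  # 2995
```

The same formula shows how quickly the required numbers grow as the incidence falls, which is the sense in which PMS is a hard problem.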
The statistician can also provide assistance with the data collection, storage and manipulation problems that arise with the larger numbers of subjects in a cohort study.

3. Control groups

For both case-control and cohort studies the prime difficulty is the construction of a reference or control group of subjects. A word on replication is useful here. Replication of a randomized experiment with an outcome similar to the previous experiment is a compelling piece of evidence as to the authenticity of the conclusions. Replication of the outcome of a survey, a non-randomized trial, may simply confirm that the biases have been reproduced. In a non-randomized trial situation, the most powerful evidence is similar outcomes from situations as divergent as possible. Of course the catch is that if divergent situations give divergent results, either or neither may be an indication of the correct, and unknowable, state of affairs. Similarly, in the construction of control groups for case-control or cohort studies, more than one control group may be a reasonable recommendation, but the problem of interpreting divergent control groups remains. Of course, that problem is not solved simply by confining the investigation to a single control group.

The suggestion has been made that a protocol should include a defense of the control group(s) proposed. Such a defense would preclude the ad hoc acceptance or rejection of a control group at the analysis stage. Perhaps more importantly, presentation of a defense would reduce the self-justification by use and publication that accompanies any report of a control group. It is definitely not the case that any control group is better than no control, and there are situations in which a conscious decision not to incorporate a control group was based on the inability to defend any proposed control group.

In the previous discussion I have indicated some of the difficulties involved in the studies that can be done. There appear to be compelling reasons why studies should be done in spite of the attendant difficulties. The question then remains as to what is to be done with the results. “We must do the best we can” is a frequently advanced justification. This slogan raises two questions for which empirical information can be accumulated: Are we doing the best we can? What happens when we do what we do?

In my opinion the most important contribution the statistician can make is the avoidance of pretense. The armamentarium of techniques for analysis, and for design, is continually increasing. Unfortunately, the introduction of a new technique is not often accompanied by any convincing demonstration that the results obtained are in any determinable sense more useful than results obtained by previously available procedures. While it is essential that new ways of proceeding be explored, it is doubtful whether they should be promulgated without some demonstrated benefit. The statistician must know not only what to do and when to do it, but also what not to do. Never forget that the surface appeal of a procedure (what the behavioral scientist calls face validity) is very likely simply surface appeal. Never forget that the congruence of results with an immediately developed substantive scenario is a tribute to human ingenuity and imagination, not to the validity of the methodology.

Criteria for what constitutes an adequate epidemiologic trial have been and are being proposed. The list of biases mentioned earlier is but one example. The application of the criteria should of course be part of protocol development, and the results of the application should themselves constitute part of the protocol and any subsequent literature report. One recent retrospective application of criteria is that reported by Horwitz and Feinstein [2].
They identified 17 episodes, such as reserpine and breast cancer, in which an adverse effect was attributed to a drug, and then examined subsequent reports for support or lack of support. One half of the studies supported the original conjecture and one half did not. A set of twelve criteria for adequate studies was applied, and the average proportion of studies meeting the criteria was found to be 40%. Certainly this report does not leave a positive impression about the utility of what we do, nor about how well we do it.

There exists, and will continue to exist, an empirically testable question: Is the best we can do with cohort or case-control studies worth doing? We must not let the fact that we have ever more esoteric procedures and a deeper understanding of what can go wrong distract us from the consideration of this question. I submit that it is the type of question which it would be proper for the Biopharmaceutical Section of the American Statistical Association itself to address, as the directly concerned group in our professional organization.

A CASE STUDY: THE “TAGAMET” POST-MARKETING SURVEILLANCE PROGRAM

‘Tagamet’® is the SmithKline Beckman brand name for cimetidine, a histamine H2-receptor antagonist which inhibits basal and stimulated gastric acid secretion. As favorable clinical results accumulated in the management of ulcer and other gastric disorders, it became apparent to all concerned that, upon marketing, a very large number of subjects would be exposed to the drug; indeed, the number of subjects at present is estimated as well into eight figures.

As senior statistician in the ‘Tagamet’ program I participated in the consideration and development of our post-marketing surveillance effort. Discussions were held with in-house personnel, with outside statistical and medical consultants, and with the regulatory agency's statistical and medical personnel. From these discussions came a shared acceptance of the magnitude of the problems of this specific post-marketing surveillance program. Since there was no indication that any particular adverse response existed that would call for a case-control effort, it was logical to prepare and submit for discussion a cohort investigation. Worldwide experience at the time was on the order of 2000 patients, and it was felt that a cohort of 10,000, together with subjects accruing from ongoing studies, would provide an order-of-magnitude increase in supervised drug-treated subjects. On the basis of the usual arithmetic, one should then have seen at least one case of any adverse effect with an incidence of 1/5000.

Considerable discussion centered on the question of the construction of control groups. It was my personal feeling, on the basis of this and other discussions, that there seemed to be no way, in this instance, to construct a defensible control group. On that basis I maintained that it was better science to proceed with an uncontrolled cohort study than to promulgate a control group for which there was no sound defense.
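The “usual arithmetic” for the 10,000-patient cohort can be checked directly (a quick sketch added here, not part of the original):

```python
# Chance of observing at least one case of an adverse effect with
# incidence 1/5000 in a supervised cohort of 10,000 patients.
incidence = 1 / 5000
n = 10_000
p_at_least_one = 1.0 - (1.0 - incidence) ** n
print(round(p_at_least_one, 3))  # 0.865
```

Note that "should have seen at least one case" holds only probabilistically: the cohort gives roughly an 86% chance of catching such an effect at least once, not a guarantee.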
Discussion on this point was thorough, and ultimately we proceeded with the proposed cohort study without attempting to construct a reference group. It was also discussed and agreed that no formal statistical inference could be developed, and that surveillance data from the cohort, from ongoing trials (and, in the event, from voluntary reporting as well) would be tabulated and reported for evaluation from a clinical perspective, both internally and by the regulatory agency.

From the standpoint of mechanics, it was agreed that our medical affairs staff would elicit participation from 1000 physicians, each of whom would report on ten consecutive patients treated with ‘Tagamet’ on both a short-term and a follow-up basis as regards adverse effects. The physicians would be selected to provide representation of both general and specialty practice. A pilot program was run locally to shake down the methodology. As with any effort of this magnitude, there was a strong temptation to try to acquire various other kinds of data in addition to reports of adverse events. Discussion of the problems of interpreting even the minimal data sought successfully discouraged any embroidering of the effort.

The cohort survey proceeded smoothly, with responses on the promised numbers of patients from almost all physicians for both early and later follow-up. To date, information on reports of adverse effects has been compatible for all three sources: the cohort study, voluntary reporting and clinical trials. (Since presentation, data have been reported [3].)

There are one or two interesting technical questions that might be raised for meditative purposes. What is the difference between a cohort of ten patients from each of 1000 physicians and a cohort from some other selection procedure? Would there be a possibility of, and a benefit from, an introduction of randomness in physician selection?

REFERENCES

1. Sackett DL: J Chron Dis 32: 51–63, 1979
2. Horwitz RI, Feinstein AR: Am J Med 66: 556–564, 1979
3. Gifford LM, Aeugle ME, Myerson RM: JAMA 243: 1532–1535, 1980