J Clin Epidemiol Vol. 49, No. 1, pp. 121-123, 1996. Copyright © 1996 Elsevier Science Inc.
0895-4356/96/$15.00 SSDI 0895-4356(95)00537-E
Comparing the Toxicity of Two Drugs in the Framework of Spontaneous Reporting: A Confidence Interval Approach
Pascale Tubert-Bitter,¹ Bernard Bégaud,²* Yola Moride,² Anicet Chaslerie,² and Françoise Haramburu²
¹INSERM U169, 16 Avenue Paul Vaillant Couturier, 94807 Villejuif, France; and ²Département de Pharmacologie-Centre de Pharmacovigilance, Hôpital Pellegrin, Université de Bordeaux II, France.
ABSTRACT. Spontaneous reporting remains the most frequently used technique in post-marketing surveillance. Decision-making usually depends on comparisons between the numbers of adverse drug reactions (ADRs) reported for two drugs on the basis of an equivalent number of prescriptions. The validity of such comparisons is likely to be jeopardized by the probable underreporting of ADR cases. This problem is accentuated when it cannot be assumed that the magnitude of underreporting is the same for both drugs. Differences in reporting ratios can overemphasize, cancel, or reverse the conclusions of a statistical comparison based on the number of reports. We propose a single method for (1) calculating confidence intervals for relative risks estimated in the context of spontaneous reporting and (2) deriving the range of reporting ratios for which the conclusion of the statistical comparison remains statistically valid. J CLIN EPIDEMIOL 49;1:121-123, 1996.
KEY WORDS. Confidence intervals, pharmacoepidemiology, pharmacovigilance, Poisson distribution, spontaneous reporting

INTRODUCTION
The estimation of the incidence of adverse drug reactions (ADRs) in pharmacovigilance is commonly based on spontaneously reported data. For a given ADR, the number of reports received during a given period of time is divided by the number of treated patients, or by the corresponding number of patient-months roughly estimated from drug sales data [1]. This incidence estimate can be jeopardized by many biases [2], particularly underestimation of the numerator due to the expected underreporting of ADR cases. For a given drug-event association, the proportion of reported cases out of the total number of cases occurring is usually unknown and is likely to be influenced by numerous factors [3]. This rules out the use of operational correction factors.

In current practice, drug toxicity comparisons are made on the basis of such incidence estimates. When reporting rates can be considered identical for the two drugs compared, underreporting does not affect the validity of the toxicity comparison. Conversely, when they cannot be assumed equal, it is difficult to know whether the observed difference in toxicity reflects what would have been measured in the absence of underreporting [4]. The latter case is the most common in practice. For it, a sensitivity analysis has been recommended, studying the influence of different values of the reporting rate for each of the drugs compared [4]. We propose a simple method (1) for calculating a confidence interval (CI) for the ratio of two risks estimated from spontaneous reporting data and (2) for deriving the difference in underreporting that would lead to a change in the conclusions on the difference in toxicity between the drugs.

*All correspondence should be addressed to: Bernard Bégaud, Centre de Pharmacovigilance, Hôpital Pellegrin, Université de Bordeaux 2, 33076 Bordeaux Cedex, France.
METHODS
Let $k_1$ and $k_2$ be the numbers of reported cases of a given adverse reaction among $n_1$ and $n_2$ patients treated with Drug 1 and Drug 2, respectively. The quantities $n_1$ and $n_2$ can also be expressed as numbers of treatments, patient-months, etc. We consider the risk ratio $RR = r_1/r_2$, where $r_1$ and $r_2$ are the risks of the adverse event associated with Drug 1 and Drug 2, respectively. The null hypothesis of interest is $H_0\colon RR = 1$, which is tested against the alternative hypothesis $H_1\colon RR \neq 1$ at a given significance level $\alpha$. This is done by calculating the two-sided confidence interval for RR at the $1 - \alpha$ confidence level; $H_0$ is rejected if the resulting interval does not include 1.

Let $1/u_1$ and $1/u_2$ be the proportions of ADR cases occurring with Drug 1 and Drug 2, respectively, that are actually reported. $U = u_1/u_2$ is the ratio of these proportions (reported proportion for Drug 2 over that for Drug 1): the larger $U$, the greater the underreporting for Drug 1. The risk ratio would then be estimated by

$$\widehat{RR} = \frac{u_1 k_1/n_1}{u_2 k_2/n_2} = U \times \frac{k_1 n_2}{k_2 n_1}.$$
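A minimal Python sketch of this point estimate follows. It is not taken from the paper, and the function name `rr_estimate` is illustrative. It uses the piroxicam and diclofenac counts quoted in the Example, with exposures in millions of prescriptions. Any common unit works, since only the ratio $n_2/n_1$ enters.

```python
def rr_estimate(k1: int, n1: float, k2: int, n2: float, U: float = 1.0) -> float:
    """Point estimate RR = U * (k1 * n2) / (k2 * n1).

    k1, k2 -- numbers of reports for Drug 1 and Drug 2
    n1, n2 -- exposures in the same unit (patients, prescriptions, patient-months)
    U      -- assumed ratio u1/u2; U = 1 means equal reporting for both drugs
    """
    if k2 <= 0 or n1 <= 0:
        raise ValueError("k2 and n1 must be positive")
    return U * (k1 * n2) / (k2 * n1)


# Piroxicam (Drug 1) vs diclofenac (Drug 2), CSM figures from the Example;
# exposures in millions of prescriptions.
rr_equal = rr_estimate(538, 9.16, 68, 3.25)   # assumes U = 1
print(f"{rr_equal:.2f}")                      # 2.81

# The estimate is linear in U, so any other assumed U simply rescales it.
print(f"{rr_estimate(538, 9.16, 68, 3.25, U=2.0):.2f}")
```

Because $U$ enters only as a multiplicative factor, the estimate for any assumed reporting ratio is the $U = 1$ value rescaled.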
Since post-marketing surveillance generally involves rare events and large populations, it can be assumed that the numbers of reported cases, $k_1$ and $k_2$, independently follow the Poisson distributions $\mathcal{P}(\lambda_1)$ and $\mathcal{P}(\lambda_2)$. Here $\lambda_1 = n_1 r_1/u_1$ and $\lambda_2 = n_2 r_2/u_2$ are the expected numbers of reports for Drug 1 and Drug 2, respectively [5]. Consequently, the proportion of reports involving Drug 1, $k_1/(k_1 + k_2)$, which estimates $p = \lambda_1/(\lambda_1 + \lambda_2)$, can be considered a binomial proportion [6,7]. It is then possible to calculate a CI for $p$.

Noting that $\lambda_1/\lambda_2 = p/(1 - p)$, and hence $RR = U \times n_2 p/[n_1(1 - p)]$, the lower ($p_{lo}$) and upper ($p_{up}$) confidence limits obtained for $p$ can be substituted into this equation. This gives the lower ($RR_{lo}$) and upper ($RR_{up}$) limits of the CI for RR at the same confidence level:

$$RR_{lo} = U \times \frac{n_2}{n_1} \times \frac{p_{lo}}{1 - p_{lo}} \quad\text{and}\quad RR_{up} = U \times \frac{n_2}{n_1} \times \frac{p_{up}}{1 - p_{up}} \tag{1}$$

Assuming that $k_1 + k_2$ is large enough, the normal approximation to the conditional distribution of the observed proportion $\hat{p} = k_1/(k_1 + k_2)$ is valid. Thus, the most accurate unbiased lower and upper limits of the two-sided confidence interval for $p$ are [8]:

$$p_{lo} = \frac{k_1}{k_1 + k_2} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{k_1 + k_2}} \quad\text{and}\quad p_{up} = \frac{k_1}{k_1 + k_2} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{k_1 + k_2}},$$

where $z_{\alpha/2}$ is the normal deviate corresponding to the desired confidence level $1 - \alpha$ (i.e., 1.96 for a 95% two-sided CI). From equation (1), the two-sided normal CI for RR can be calculated:

$$RR_{lo} = U \times \frac{n_2}{n_1} \times \frac{k_1 - z_{\alpha/2}\sqrt{k_1 k_2/(k_1 + k_2)}}{k_2 + z_{\alpha/2}\sqrt{k_1 k_2/(k_1 + k_2)}} \quad\text{and}\quad RR_{up} = U \times \frac{n_2}{n_1} \times \frac{k_1 + z_{\alpha/2}\sqrt{k_1 k_2/(k_1 + k_2)}}{k_2 - z_{\alpha/2}\sqrt{k_1 k_2/(k_1 + k_2)}}.$$

Critical Values
The null hypothesis $H_0\colon RR = 1$ is rejected at the significance level $\alpha$ if the $1 - \alpha$ CI for RR does not include 1, that is, if $RR_{lo} > 1$ or $RR_{up} < 1$. As the preceding formula shows, both limits are proportional to $U$: $RR_{lo} = U \times L_{lo}$ and $RR_{up} = U \times L_{up}$, where $L_{lo}$ and $L_{up}$ are the limits obtained for $U = 1$. Thus, the interval excludes 1, and $H_0$ is rejected, if and only if $U > 1/L_{lo}$ or $U < 1/L_{up}$.

Example
The Committee on Safety of Medicines (CSM) [9] has published complete data on post-marketing surveillance of nonsteroidal anti-inflammatory drugs (NSAIDs) in the United Kingdom. Consider two drugs launched at approximately the same date, piroxicam (1980) and diclofenac (1979). The numbers of serious gastrointestinal reactions reported to the CSM during the same time interval (5 years) were 538 for 9.16 million prescriptions of piroxicam and 68 for 3.25 million prescriptions of diclofenac. The RR estimated from these data is

$$\widehat{RR} = U \times \frac{538 \times 3.25}{68 \times 9.16} = 2.81\,U,$$

and the corresponding 95% two-sided CI is calculated as:

$$RR_{lo} = U \times \frac{3.25}{9.16} \times \frac{538 - 1.96\sqrt{538 \times 68/606}}{68 + 1.96\sqrt{538 \times 68/606}} = 2.23\,U, \qquad RR_{up} = U \times \frac{3.25}{9.16} \times \frac{538 + 1.96\sqrt{538 \times 68/606}}{68 - 1.96\sqrt{538 \times 68/606}} = 3.72\,U,$$

that is, $[2.23\,U;\ 3.72\,U]$.

Assuming that the reporting ratios were the same for the two drugs ($U = 1$), the CI for RR, [2.23; 3.72], does not include 1. Thus, $H_0$ ($RR = 1$) will be rejected, and piroxicam considered more gastrotoxic than diclofenac, as long as $U > 1/2.23$. Reporting 2.23 times lower for diclofenac than for piroxicam would have precluded this conclusion. Reporting 3.72 times lower for diclofenac would have reversed it (diclofenac more gastrotoxic than piroxicam).

DISCUSSION
Most published comparisons of drug toxicity, and the decisions based on them, rely on spontaneously reported data. In practice, it is seldom certain that the proportions of cases reported for the two drugs compared are of the same order of magnitude. A difference as small as a factor of 1.5 may destroy the validity of a statistical comparison [4]. It is therefore essential to calculate a CI for any RR estimated from a number of reports. Using the binomial distribution to calculate confidence limits for the ratio of the means of two Poisson variables ($\lambda_1/\lambda_2$) is unusual in epidemiology [6,7]. Nevertheless, the approach suggested in this paper has three main advantages.
1. The estimated value of the RR and the corresponding CI are both expressed as a function of $U$. This makes it possible to derive the RR estimate and the lower and upper limits of the interval directly for any assumed value of $U$. In the above example, the RR estimate is 2.81 and the 95% CI for RR is [2.23; 3.72] if the proportion of reported cases is assumed to be roughly the same for Drug 1 and Drug 2. If this proportion is twice as large for diclofenac as for piroxicam ($U = 2$), the RR estimate is 2.81 × 2 = 5.62 and the 95% CI for RR is [4.46; 7.44].
2. The calculation does not require a priori values for the reported proportions $1/u_1$ and $1/u_2$ for the two drugs compared. The only assumption required concerns the order of magnitude of the ratio $U = u_1/u_2$, regardless of the individual and generally unknown values of $u_1$ and $u_2$. The most straightforward approach is to perform the calculation assuming equal reporting for the two drugs ($U = 1$), then to derive the values of $U$ that would reverse the conclusion.

In the above example, the conclusion that the two drugs carry risks of different magnitudes is statistically significant but not particularly robust. A difference by a ratio of 1/2.23 in reporting rates leads to an RR estimate of 2.81/2.23 = 1.26, with a 95% two-sided CI of [1.00; 1.67], which includes 1. Depending on the context (e.g., dates of marketing, differences in promotion, media interest), the question is whether it is plausible that the reporting rate would have been more than 2.2 times greater for Drug 1 than for Drug 2.

The extent of underreporting is generally considered to be large, particularly for labelled and/or nonserious reactions [1, 10-12]. Yet, to our knowledge, no study quantifying its relative magnitude for two or more drugs has so far been conducted. It is generally accepted that underreporting is roughly of the same order of magnitude for two drugs under the following conditions: they belong to the same therapeutic class, were launched at approximately the same date, are compared for the same types of events, and do not differ with regard to the information provided to potential reporters [3]. Under these conditions, it would seem acceptable to consider that their reporting rates do not differ by more than a ratio of 2 or 3. In other words, a CI for RR that excludes 1 even when $U$ varies by a ratio of 1 to 4 or 5 lends some support to the existence of a real difference in toxicity.

For instance, in the CSM data [9], benoxaprofen was launched in the United Kingdom at approximately the same date (1980) as diclofenac (1979). The number of serious reactions (of any type) involving benoxaprofen was 332 for 1.47 million prescriptions, versus 128 for 3.25 million prescriptions of diclofenac. This gives an RR estimate of 5.73U, with a 95% two-sided CI ranging from 4.72U to 7.11U. This lends some credibility to the decision to withdraw benoxaprofen from the market in 1984 because of unacceptable excess toxicity.
3. The use of the normal approximation makes the calculation of the CI for RR particularly easy. This approximation remains valid provided $p(k_1 + k_2)$ is at least 15, or at least 5 if $(k_1 + k_2)$ is large enough and $p$ is not too far from 0.5 [13-15].
These criteria are generally met in pharmacovigilance, where $(k_1 + k_2)$ is expected to be greater than 30 and the proportion $p = \lambda_1/(\lambda_1 + \lambda_2)$ to range between 0.1 and 0.9. For instance, in the earlier example (piroxicam vs. diclofenac), $(k_1 + k_2) = 606$ and $p$, estimated by $k_1/(k_1 + k_2)$, is 0.89. When the criteria for the proper use of the normal approximation are not met, the exact confidence limits $p_{lo}$ and $p_{up}$ for $p$ can be obtained from tables for binomial proportions [7], and the corresponding interval for RR derived from equation (1).

We are indebted to the Association Française pour la Recherche Thérapeutique for its financial support of this work.
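The whole procedure can be sketched in Python: normal-approximation limits for $p$, back-transformation through equation (1), and the critical values of $U$. This is an illustrative sketch, not the authors' implementation, and every function name is hypothetical. $z$ is computed with `statistics.NormalDist` (Python 3.8+) rather than the rounded 1.96, so results may differ in the third decimal. The `exact` option stands in for the binomial tables of [7]. It uses Clopper-Pearson limits found by bisection, which is an assumption about the exact method intended.

```python
import math
from statistics import NormalDist


def p_limits_normal(k1, k2, conf=0.95):
    """Normal-approximation limits for p = lambda1 / (lambda1 + lambda2)."""
    n = k1 + k2
    p_hat = k1 / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% CI
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamped to [0, 1]; needing the clamp means the approximation is unsuitable.
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, with terms built in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    c = math.lgamma(n + 1)
    return sum(
        math.exp(c - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


def _bisect_increasing(f, target, iters=50):
    """Solve f(p) = target on (0, 1) for f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # 50 halvings keep every midpoint strictly inside (0, 1)
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def p_limits_exact(k1, k2, conf=0.95):
    """Exact (Clopper-Pearson) limits for p, found by bisection."""
    n = k1 + k2
    a = (1 - conf) / 2
    # P(X >= k1) increases with p; P(X <= k1) decreases with p.
    p_lo = 0.0 if k1 == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(k1 - 1, n, p), a)
    p_up = 1.0 if k1 == n else _bisect_increasing(
        lambda p: -binom_cdf(k1, n, p), -a)
    return p_lo, p_up


def _odds(p):
    return math.inf if p >= 1 else p / (1 - p)


def rr_interval(k1, n1, k2, n2, U=1.0, conf=0.95, exact=False):
    """CI for RR via equation (1): RR = U * (n2 / n1) * p / (1 - p)."""
    if k1 + k2 == 0 or n1 <= 0 or n2 <= 0:
        raise ValueError("need at least one report and positive exposures")
    p_lo, p_up = (p_limits_exact if exact else p_limits_normal)(k1, k2, conf)
    scale = U * n2 / n1
    return scale * _odds(p_lo), scale * _odds(p_up)


def critical_U(k1, n1, k2, n2, conf=0.95, exact=False):
    """H0: RR = 1 is rejected iff U > 1/L_lo or U < 1/L_up (limits at U = 1)."""
    L_lo, L_up = rr_interval(k1, n1, k2, n2, U=1.0, conf=conf, exact=exact)
    return (math.inf if L_lo == 0 else 1 / L_lo), 1 / L_up


# Piroxicam (Drug 1) vs diclofenac (Drug 2); exposures in millions of prescriptions.
lo, up = rr_interval(538, 9.16, 68, 3.25)
print(f"[{lo:.2f}; {up:.2f}]")      # [2.23; 3.72], to be multiplied by U
u_sig, u_rev = critical_U(538, 9.16, 68, 3.25)
print(f"{u_sig:.2f} {u_rev:.2f}")   # 0.45 0.27, i.e. 1/2.23 and 1/3.72
lo, up = rr_interval(332, 1.47, 128, 3.25)  # benoxaprofen vs diclofenac
print(f"[{lo:.2f}; {up:.2f}]")      # [4.72; 7.11]
# Exact (Clopper-Pearson) alternative, for comparison with the normal limits.
print(rr_interval(538, 9.16, 68, 3.25, exact=True))
```

The limits are computed at $U = 1$ and simply multiplied by $U$. The same call therefore answers the sensitivity question in point 2. For instance, passing `U=2.0` reproduces the interval [4.46; 7.44] quoted in point 1.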
References
1. Fletcher AP. Spontaneous adverse drug reaction reporting versus event monitoring: a comparison. J Roy Soc Med 1991; 84: 341-344.
2. Rawlins MD, Fracchia GN, Rodriguez-Farre E. Euro-ADR: Pharmacovigilance and Research. An European perspective. Pharmacoepidemiology and Drug Safety 1992; 1: 261-268.
3. Haramburu F. Estimation of underreporting. In: ARME-P, ed. Methodological Approaches in Pharmacoepidemiology. Application to Spontaneous Reporting. Amsterdam: Elsevier Science Publishers; 1993; 39-49.
4. Bégaud B, Tubert P, Haramburu F, Moride Y, Salame G, Péré JC. Comparing toxicity of drugs: use and misuse of spontaneous reporting. Post-Marketing Surveillance 1991; 5(1): 59-67.
5. Tubert P, Bégaud B, Péré JC, Haramburu F, Lellouch J. Power and weakness of spontaneous reporting: a probabilistic approach. J Clin Epidemiol 1992; 45: 283-286.
6. Ederer F, Mantel N. Confidence limits on the ratio of two Poisson variables. Am J Epidemiol 1974; 100: 165-167.
7. Daly LE, Bourke GJ, McGilvray J. Interpretation and Uses of Medical Statistics, 4th edition. Oxford: Blackwell Scientific Publications; 1991.
8. Lehmann EL. Testing Statistical Hypotheses, 2nd edition. New York: John Wiley & Sons; 1986.
9. Committee on Safety of Medicines. Non-steroidal anti-inflammatory drugs and serious adverse reactions-2. Br Med J 1986; 292: 1190-1191.
10. Feely J, Moriarty S, O'Connor P. Stimulating reporting of adverse drug reactions by using a fee. Br Med J 1990; 300: 22-23.
11. Griffin JP. Is better feedback a major stimulus to spontaneous adverse drug reaction monitoring? Lancet 1984(II): 1098.
12. Bégaud B, Haramburu F, Moride Y, Tubert-Bitter P, Alvarez-Requejo A, Carvajal A, Vega T, Chaslerie A. Assessment of reporting and underreporting in pharmacovigilance. In: Fracchia GN, ed. European Medicines Research. Perspectives in Pharmacotoxicology and Pharmacovigilance. Amsterdam: IOS Press; 1994; 276-283.
13. Snedecor GW, Cochran WG. Statistical Methods, 8th edition. Ames: Iowa State University Press; 1989.
14. Miller I, Freund JE. Probability and Statistics for Engineers. Englewood Cliffs: Prentice Hall; 1965.
15. Gardner MJ, Altman DG. Statistics with Confidence. Confidence Intervals and Statistical Guidelines. London: British Medical Journal Editions; 1989.