How does the WHI study alter the risk–benefit ratio of HT?


IN PERSPECTIVE

John A. Collins, M.D.
Department of Obstetrics and Gynecology, McMaster University

Good clinical practice depends on knowledge of the current best medical care research evidence, but clinicians must be able to determine what is the best evidence and whether this evidence is relevant to their own patients. At the heart of evidence-based medicine is the assessment of the validity, importance, and relevance of a given study. These may be evaluated by asking key questions; here these questions are applied to the WHI study.

• For now, the answer for most physicians is to fine-tune therapy for each patient

Medical practice is often beset by questions about the interpretation of evidence. In the summer of 2002 the Women’s Health Initiative (WHI) study of the long-term benefits and risks of estrogen and progestin among healthy postmenopausal women was terminated early because the risks outweighed the benefits (1). Media treatment of the early termination distorted the public’s understanding of the study findings, most of which were not news at all. Many clinicians had to answer questions from patients before they had a chance to evaluate the study report. It was natural, too, that many were suspicious of medical care evidence that was delivered by press headlines and televised talking heads. Good clinical practice depends on knowledge of the current best medical care research evidence, but how can clinicians quickly determine what is best? And how are they to determine whether the best evidence is relevant to the patients in their practices?

Of course, evidence from medical care research is only one factor in clinical decision making: the patient’s clinical details, social characteristics, and personal preferences are also key factors, together with the clinician’s judgment about what is feasible and worthwhile. Evidence is the essence of the discussion, however, and appraisal of the literature and assessment of its relevance to individual patients are

Sexuality, Reproduction & Menopause, Vol. 1, No. 1, October 2003 © 2003 American Society for Reproductive Medicine Published by Elsevier Inc.


not as easy as they may seem. Many knowledge gaps remain, not all evidence is sound, and not all sound evidence is relevant to the present patient. When the published evidence is relevant, the results of various studies may not be concordant. Even if all the study results were in agreement, the size of the effects might differ. It is not surprising that uncertainty prevails in some domains. For example, only two of three excellent commentaries in this issue agree with the primary conclusion of the WHI authors: combined estrogen plus progestin should not be initiated or continued for primary prevention of coronary heart disease. Dr. Speroff, however, believes that the WHI cannot be viewed as a primary prevention trial until the results are analyzed according to years from menopause. In his view, initiating estrogen and progestin at an average age of 63 years in asymptomatic women (as in the WHI trial) misses the opportunity to test early prevention of coronary heart disease in women who are truly estrogen deficient. Of course that question would require another study, and that study seems unlikely to be funded, given that other preventive treatments (especially statins) are able to reduce coronary heart disease incidence even in moderate risk patients in whom the atherosclerotic process is presumably well-established, although subclinical (2). The need to interpret expanding new knowledge that may be conflicting is a challenge for reproductive medicine clinicians and others. Fortunately, evaluating the medical literature has been made easier by evidence-based medicine (EBM), which is the “judicious and conscientious use of current best evidence from medical care research for making medical decisions” (3–5). Medical care research involves studies that are done in typical clinical practices among patients who have typical clinical problems. 
While the hypotheses tested in these clinical studies may originate with laboratory and animal research, only the results obtained among patients are applicable in clinical practice. Naturally, time prevents clinicians from pursuing every uncertainty in their practices, but by using EBM when the priority is high, such as the appropriate use of hormone replacement therapy, clinicians can improve their efficiency in the assessment of evidence. At the heart of EBM is the assessment of the validity, importance, and relevance of a given study. Valid studies use methods that reduce bias and uncertainty. Important effects are large enough to matter in clinical practice. If the methods are valid and the results are important, the study will be most useful when the results are relevant to typical patients and feasible in the clinician’s practice setting.

Validity

What was the validity of the WHI study of the benefits and risks of long-term use of estrogen and progestin among asymptomatic postmenopausal women? Was it “definitive,” “unequivocal,” and “solid,” or “problematic” with “unanswered questions,” as Dr. Speroff wonders? Briefly, the WHI study was a randomized, double-blinded, controlled trial that involved 16,608 healthy postmenopausal women aged 50–79 who were allocated to treatment with placebo or 0.625 mg of conjugated equine estrogen and 2.5 mg of medroxyprogesterone acetate daily. Stopping rules were published in 1996, shortly after the trial started (6). The statistical analysis was by time to event, yielding estimates of hazard ratios (HR) or relative hazards that are essentially equivalent to relative risks over time. Validity is addressed by questions about whether the two groups began the study with the same prognosis and were managed through treatment and follow-up in a similar fashion. The most important questions (Table 1, questions 1–4) are similar in most EBM sources. Supplementary questions (Table 1, questions 5–7) differ somewhat among sources, but these are relatively straightforward and will be passed over here.




Table 1: Questions Concerning the Validity of Treatment Studies
1. Was the assignment of patients to treatments randomized?
2. Was the randomization list (allocation sequence) concealed?
3. Was follow-up sufficiently long and complete?
4. Were all patients analyzed in the group to which they were randomized?
Supplementary questions might include some or all of the following:
5. Were patients, clinicians, and outcome assessors blind to allocation?
6. Were the groups similar at the start of the trial?
7. Were groups treated equally apart from the experimental treatment?

Was the assignment of patients to treatments randomized? Randomization is important because in designs lacking random allocation the patient groups might not be similar in all respects other than the intervention. In the WHI study eligible women were randomly assigned to receive either estrogen plus progestin or placebo by means of a randomization procedure developed at the Clinical Coordinating Center and implemented locally, using a block algorithm that was stratified by clinical center site and age group. Randomization in blocks ensures that allocations will be evenly distributed over time, while stratification ensures that allocations are balanced within each site and age group.
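The idea of stratified block randomization can be illustrated with a minimal sketch. This is not the WHI algorithm itself: the block size of 4, the 1:1 ratio within each block, the site and age-group labels, and the class and function names are all assumptions chosen for illustration.

```python
import random


def permuted_block(block_size, rng):
    """Return one shuffled block containing equal numbers of each arm."""
    half = block_size // 2
    block = ["HT"] * half + ["placebo"] * half
    rng.shuffle(block)
    return block


class StratifiedBlockRandomizer:
    """Keeps a separate sequence of permuted blocks for each stratum.

    Hypothetical illustration; the block size and strata are assumptions.
    """

    def __init__(self, block_size=4, seed=0):
        if block_size % 2:
            raise ValueError("block_size must be even for two arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = {}  # stratum -> allocations left in the current block

    def assign(self, site, age_group):
        stratum = (site, age_group)
        # Start a new shuffled block when this stratum has none left.
        if not self.pending.get(stratum):
            self.pending[stratum] = permuted_block(self.block_size, self.rng)
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(block_size=4, seed=2002)
arms = [randomizer.assign("site A", "50-59") for _ in range(8)]
print(arms.count("HT"), arms.count("placebo"))  # 4 4
```

Because each stratum draws from its own blocks, the two arms are equal in size after every completed block within a site and age group, which is the balance property described above.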



Was the randomization list concealed? Concealment is important to avoid unconscious or deliberate steering of the next patient to be randomized away from the study when study and health care professionals have knowledge of the next allocation and a feeling that it might not be best for the next patient. In the WHI study, effective concealment was achieved by arranging for all study medication bottles to have a unique bottle number and bar code to allow for blinded dispensing.



Was follow-up sufficiently long and complete? The trial was well managed, with only 3.5% loss to follow-up, but given the early termination, it is legitimate to question whether the average of 5.2 years of follow-up was sufficiently long. Trends in coronary heart disease and breast cancer incidence were not stable in the sixth and seventh years of observation.



Were all patients analyzed in the group to which they were randomized? This would be an analysis by intention to treat, where subjects are analyzed in the group to which they were randomly allocated regardless of whether they dropped out, discontinued the assigned treatment, or switched to another treatment. Analysis by intention to treat resembles clinical practice where patients frequently change their plans about continuing on a prescribed drug. In the WHI estrogen and progestin group, 42% discontinued the study drug and 6% switched to another drug; in the placebo group, 38% discontinued and 11% switched to an HT drug. Both rates are similar to the rates in clinical practice (7).

Importance

As the WHI trial of estrogen and progestin appears to score well on the validity questions, the next questions concern importance: Are the results of this valid study important enough to matter in clinical practice? Would patients be interested in hearing about this outcome? Is the effect large enough to make a difference in their lives? If the treatment effect is not large enough to interest patients, the P value is of no importance. The questions to ask are the following: [1] How large is the treatment effect? and [2] How precise is the estimate of the treatment effect? As Dr. Wenger has given a detailed evaluation of the coronary heart disease outcomes in the WHI study, this section will address the breast cancer outcomes.

The main WHI hazard ratios are shown in Table 2. There were three primary outcomes: the primary benefit outcome was nonfatal myocardial infarction and cardiac deaths (CHD); the primary adverse outcome was breast cancer; the third primary outcome was a global index comprising the sum of each other outcome in Table 2. The secondary outcomes were stroke and venous thromboembolism, endometrial and colorectal cancer, and hip and vertebral fractures. The precision of the primary estimates was expressed by means of nominal 95% confidence intervals (CI). This presentation provides more information than does a simple P value: when the confidence interval includes one, the hazard ratio does not represent a significant difference


Table 2: Relative Effect of Estrogen and Progestin on Important Clinical Outcomes During an Average 5.2 Years of Use

Outcome                   HR*    Nominal** or Adjusted 95% CI
Coronary heart disease    1.29   (1.02, 1.63)**
Breast cancer             1.26   (1.00, 1.59)**
Global index              1.15   (1.03, 1.28)**
Stroke                    1.41   (0.86, 2.31)
Venous thromboembolism    2.11   (1.26, 3.55)
Endometrial cancer        0.83   (0.29, 2.32)
Colorectal cancer         0.63   (0.32, 1.24)
Hip fracture              0.66   (0.33, 1.33)
Vertebral fracture        0.66   (0.32, 1.34)

*HR = hazard ratio.
**Nominal confidence intervals apply to the three primary outcomes. All other confidence intervals are adjusted by a factor to allow for multiple significance testing.

in risk between the hormone and placebo groups. For secondary outcomes, however, adjusted confidence intervals were given because of the number of statistical tests that were done; in essence, the usual α (5%) was divided by 7. After this adjustment, the only secondary outcome that was significantly altered by estrogen and progestin use was the increase in venous thromboembolism incidence.

How large is the treatment effect? In clinical trials the treatment event rate can be compared with the control event rate relatively (the relative risk) or absolutely (the risk difference). The hazard ratios presented in the WHI study correspond to the relative risk. In clinical practice, the measure of effect that usually makes the most sense is the risk difference, because it is a natural description of the difference between groups (the experimental event rate minus the control event rate). The WHI study presented absolute differences in risk as annualized percentages, but these small fractions are easier to understand when expressed as events per 10,000 women per year (Table 3). The inverse of risk difference is the number needed to treat (NNT), which describes how many persons would need to receive the intervention before there would be one additional or one less event, as compared with the controls. The NNT is usually defined within a period of time, which would be 1 year in the WHI example.

For invasive breast cancer, the hazard ratio was 1.26 (95% CI 1.00, 1.59) in the estrogen and progestin group compared with the placebo group. There were 38 and 30 cases per 10,000 women per year, respectively, in the hormone and placebo groups, an absolute difference of 8 cases per 10,000 women per year. The NNT is 10,000/8 = 1,250: for every year of treatment there would be one additional breast cancer for every 1,250 patients assigned to estrogen and progestin, compared with placebo.

How precise is the treatment effect? The result of a clinical trial is a single estimate (that is, an approximate calculation) of the true difference between experimental and control treatment. It would be useful to know how close this point estimate might lie to the unknown true value; for this purpose, medical care research usually relies on the 95% CI, which is analogous to the 5% level of significance. In the WHI study the relative hazard for invasive breast cancer in the HT group was 1.26 (95% CI 1.00, 1.59); if 20 RCTs evaluated the effect of estrogen and progestin on breast cancer incidence with similar methods and in a similar setting, the 95% CIs from about 19 of them would be expected to include the true relative hazard. The confidence interval just includes unity, but it makes no sense to quibble over significance, given that the power of the trial was reduced by stopping early, and that this estimate is consistent with the small effect in the epidemiological literature to date, as noted by Dr. Speroff.
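The rule that a hazard ratio is not significant when its confidence interval includes one can be checked mechanically against the Table 2 values. The sketch below is only an illustration: the dictionary layout and function name are assumptions, and the division of 5% by 7 simply restates the adjustment described in the text rather than reproducing the trial's own computation.

```python
# Hazard ratios and 95% confidence limits as printed in Table 2.
# The first three rows carry nominal intervals; the rest are adjusted.
table2 = {
    "Coronary heart disease": (1.29, 1.02, 1.63),
    "Breast cancer": (1.26, 1.00, 1.59),
    "Global index": (1.15, 1.03, 1.28),
    "Stroke": (1.41, 0.86, 2.31),
    "Venous thromboembolism": (2.11, 1.26, 3.55),
    "Endometrial cancer": (0.83, 0.29, 2.32),
    "Colorectal cancer": (0.63, 0.32, 1.24),
    "Hip fracture": (0.66, 0.33, 1.33),
    "Vertebral fracture": (0.66, 0.32, 1.34),
}


def excludes_unity(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


significant = [
    outcome
    for outcome, (_hr, lower, upper) in table2.items()
    if excludes_unity(lower, upper)
]
print(significant)
# ['Coronary heart disease', 'Global index', 'Venous thromboembolism']

# The usual alpha of 5% divided by 7, as described for secondary outcomes.
adjusted_alpha = 0.05 / 7
print(round(adjusted_alpha, 4))  # 0.0071
```

Breast cancer does not appear in the list because its lower limit is exactly 1.00, which is the "just includes unity" case discussed above.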


Table 3: Absolute Effect of Estrogen and Progestin on Important Clinical Outcomes During an Average 5.2 Years of Use

Number of events per 10,000 woman-years

Outcome                   HT     Placebo   HT Excess
Coronary heart disease    37     30        7
Breast cancer             38     30        8
Global index              170    151       19
Stroke                    29     21        8
Venous thromboembolism    34     16        18
Endometrial cancer        5      6         −1
Colorectal cancer         10     16        −6
Hip fracture              10     15        −5
Vertebral fracture        9      15        −6
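The risk difference and number needed to treat can be reproduced from the event rates in Table 3. The following is a small sketch; the function names are illustrative and not taken from the WHI report.

```python
def risk_difference(treated_rate, control_rate):
    """Excess events in the treated group, in the same units as the inputs."""
    return treated_rate - control_rate


def number_needed_to_treat(treated_rate, control_rate, per=10_000):
    """Women treated for one year per one additional (or one fewer) event.

    Rates are events per `per` women per year, as in Table 3.
    """
    difference = risk_difference(treated_rate, control_rate)
    if difference == 0:
        raise ValueError("NNT is undefined when the rates are equal")
    return per / abs(difference)


# Breast cancer: 38 versus 30 cases per 10,000 women per year.
print(risk_difference(38, 30))                # 8
print(number_needed_to_treat(38, 30))         # 1250.0

# Venous thromboembolism: 34 versus 16.
print(round(number_needed_to_treat(34, 16)))  # 556

# Hip fracture: 10 versus 15, a reduction, so one fewer event per 2,000.
print(risk_difference(10, 15))                # -5
print(number_needed_to_treat(10, 15))         # 2000.0
```

The sign of the risk difference shows whether the figure counts additional events (positive) or events prevented (negative).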

Table 4: Questions Concerning the Relevance of Valid, Important Results to Patients
1. Were the study patients similar to the patients in my practice?
2. Were all clinically important outcomes considered?
3. Are the likely treatment benefits worth the harm and cost?

If the hazard ratio was not significantly different, why was the trial stopped early? The z value associated with the “design-specified weighted” log-rank test for breast cancer was −3.19, which is consistent with a P value of .001. The small P value suggests that the test statistic incorporated the trend in the semiannual analyses that were done to decide whether the trial should be stopped early. Thus, the increased breast cancer incidence with use of estrogen and progestin by healthy postmenopausal women was marginally significant. Also, the absolute effect was small when assessed from the perspective of a single patient or a single clinician. However, eight additional breast cancer cases per 10,000 users would be important at a national level. Therefore, the WHI study results should lead to a change in prescribing policies, unless these policies already reflect the similar information from epidemiological studies. The small additional risk with estrogen and progestin treatment should not induce fear or panic among individuals. The WHI study results also included reassuring details about the breast cancer risk that will be considered in the section on clinical reality.
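The correspondence between a z value of −3.19 and a P value of about .001 can be verified with the standard normal distribution. This sketch assumes that the weighted log-rank statistic is referred to a standard normal distribution and that the test is two-sided; it does not reproduce the trial's weighting scheme.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_sided_p(-3.19)
print(f"{p_value:.4f}")
```

The result is roughly 0.0014, which rounds to the .001 quoted above.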

Relevance

Finally, in evaluating medical care research evidence, judgments are made about whether the valid, important study results are relevant to clinical practice. The questions are listed in Table 4.

Were the study patients similar to the patients in my practice? The WHI study involved long-term use of estrogen and progestin among women aged 50–79 years, most of whom were healthy and not having troublesome vasomotor symptoms. Because the enrollment and analyses were stratified by age group, the results apply directly to symptom-free women aged 50–59 years even though only 33% of the women were in this age group. In contrast, the most common indication for estrogen and progestin is vasomotor symptoms. While the WHI study results are not directly relevant to symptomatic patients, these high-quality results cannot be


ignored unless more relevant studies become available that have similar quality and directly evaluate important clinical outcomes. Thus, the WHI finding that estrogen and progestin is not indicated for the primary prevention of heart disease applies generally in clinical practice.

Were all clinically important outcomes considered? The WHI study reported on cardiovascular outcomes, including nonfatal myocardial infarction, cardiac deaths, stroke, and venous thromboembolism (VTE); cancers of the breast, endometrium, and large bowel; and osteoporotic fractures. It did not report on common side effects such as bleeding, breast discomfort, and weight gain, and by design it cannot report on severe menopausal symptoms. As Dr. Wenger notes, among women who considered themselves healthy, there were no meaningful effects of estrogen and progestin on general health, vitality, mental health, depressive symptoms, or sexual satisfaction, and only small (but statistically significant because of the large numbers) improvements in sleep disturbance, physical functioning, and bodily pain after 1 year (8). Until now, the WHI has been silent on outcomes such as urinary incontinence, senile dementia, and ovarian cancer.

What alternative treatments are available? Typically, hot flash frequency declines by 80% within 1 month of initiating estrogen and progestin treatment, even among women with severe and frequent hot flashes (9). Other choices are less effective: evidence on exercise and reductions in caffeine and alcohol intake is of poor quality (10, 11); herbal remedies, although widely used, are ineffective (12); and the effects of nonhormonal pharmacological treatments are not impressive (13–15). When life is disturbed by hot flashes, there is no reasonable alternative to estrogen and progestin treatment.

Could other estrogen–progestin preparations, lower dosages, or different routes of delivery have a better balance of benefits and risks than the WHI study medication? Theoretically that may be possible, but the relevant studies involve only metabolic, short-term, or surrogate outcomes. There are no studies of important clinical outcomes among large numbers of women followed for a clinically relevant length of time. Until such studies emerge, the WHI results are the benchmark for the balance of benefits and risks in the class of drugs involving estrogen combined with progestin.

The Clinical Reality

The most common indication for estrogen and progestin is symptomatic estrogen deficiency in women aged 50–59, who typically will use the prescription for less than 3 years. Do the WHI study results add to knowledge about adverse events for such women who have severe vasomotor symptoms? The three commentaries remind us that the WHI results include: [a] a small but significant increase in coronary heart disease; [b] a small increase in breast cancer risk; [c] a twofold increase in VTE risk that confirms the effect identified in epidemiological studies; and [d] a potential increased stroke risk that was not significant after adjustment. The VTE incidence increases from 16 to 34 cases and stroke incidence from 21 to 29 cases per 10,000 women per year. At age 50–59 the baseline incidences of cardiovascular disease and breast cancer are lower than at age 63, the average age at screening for entry into the WHI trial. Thus the absolute risks would be slightly smaller than the average WHI estimates.
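Why a lower baseline incidence implies a smaller absolute risk can be seen from the approximate relation between the two effect measures: excess events ≈ baseline rate × (HR − 1). The sketch below first checks this relation against Tables 2 and 3, then applies it to a lower baseline. It assumes the hazard ratio is the same at every age, and the baseline of 20 cases per 10,000 women per year is a hypothetical figure chosen only for illustration, not a WHI result.

```python
def excess_per_10000(baseline_rate, hazard_ratio):
    """Approximate excess events per 10,000 women per year.

    Assumes the hazard ratio applies unchanged at the given baseline rate.
    """
    return baseline_rate * (hazard_ratio - 1.0)


# Consistency check against Table 3 (placebo rates) and Table 2 (HRs).
print(round(excess_per_10000(30, 1.26), 1))  # 7.8, close to the 8 in Table 3
print(round(excess_per_10000(16, 2.11), 1))  # 17.8, close to the 18 in Table 3

# Hypothetical lower baseline breast cancer rate for a younger group.
hypothetical_baseline = 20  # assumed value, not from the WHI report
print(round(excess_per_10000(hypothetical_baseline, 1.26), 1))  # 5.2
```

Under these assumptions the relative effect is unchanged while the absolute excess shrinks in proportion to the baseline rate.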


Even so, these risks are not inconsequential. Healthy oral contraceptive users aged 40–49 have only two more cardiovascular events per 10,000 women per year than nonusers (16). In the treatment of disease, however, higher risks may be tolerated. Among arthritic patients treated with anti-inflammatory drugs for up to 1 year, the rate of death from cardiovascular disease was 0.2%, or 20 per 10,000 per year (17). In the WHI study there were 15 and 13 cardiovascular deaths, respectively, in the estrogen and progestin and placebo groups. Therefore, the balance of benefits over risks with estrogen and progestin may be reasonable for women with moderate to severe estrogen deficiency symptoms. For many nonhormonal options, adverse event profiles have yet to be determined in appropriate studies involving large numbers of women.

With respect to breast cancer risk, the WHI story is not complete without an examination of additional information. Dr. Speroff reminds readers that in situ breast cancer incidence was not significantly different between hormone and placebo groups. A further salient WHI finding was the 4-year delay before breast cancer incidence was significantly higher in the estrogen and progestin group than the placebo group. Moreover, breast cancer risk was negligible with estrogen plus progestin use for women who never used postmenopausal hormones before enrolment in the WHI study (HR, 1.06; 95% CI, 0.81–1.38). The risk was twofold or higher for the minority who were prior users of estrogen and progestin. It is also important to recall that diagnostic bias cannot be ruled out for breast cancer: although follow-up rates for mammography were comparable in the estrogen plus progestin and placebo groups, breast density is increased in so many estrogen and progestin users that mammography readings were unlikely to be blinded (18, 19). Whether the increased breast cancer incidence in the estrogen and progestin group was due to earlier detection of breast cancer or to induction of new disease may become clear when the analysis of longer follow-up indicates whether breast cancer mortality is increased in the estrogen and progestin group.

Given that hormonal treatment is the best current choice for estrogen deficiency symptoms, what are the choices? Women who have had a hysterectomy can take unopposed estrogen: the WHI is continuing its study of estrogen only, in which the balance of benefits and risks has not yet been established. For women with a uterus, however, the increased endometrial cancer risk is too high to consider use of unopposed estrogen in the absence of regular endometrial evaluation. Women with a uterus can consider short-term use of estrogen and progesterone with annual review of the balance of their personal benefits and risks.

John A. Collins, M.D. 400 Mader’s Cove Road RR # 1, Mahone Bay Nova Scotia B0J 2E0, Canada

References

1. Writing Group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA 2002;288:321-33.
2. Pignone M, Phillips C, Mulrow CD. Use of lipid lowering drugs for primary prevention of coronary heart disease: meta-analysis of randomised trials. BMJ 2000;321:983-6.
3. Guyatt GH, Rennie D. Users’ guides to the medical literature. JAMA 1993;270:2096-7.
4. The Evidence-Based Medicine Working Group. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, AMA Press, 2002.
5. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence-based medicine: what it is and what it isn’t. BMJ 1996;312:56-7.
6. Freedman L, Anderson G, Kipnis V, Prentice R, Wang CY, Rossouw J, et al. Approaches to monitoring the results of long-term disease prevention trials: examples from the Women’s Health Initiative. Controlled Clin Trials 1996;17:509-25.
7. The ESHRE Capri Workshop Group. Continuation rates for oral contraceptives and hormone replacement therapy. Hum Reprod 2000;15:1865-71.
8. Hays J, Ockene JK, Brunner RL, Kotchen JM, Manson JE, Patterson RE, et al. Effects of estrogen plus progestin on health-related quality of life. N Engl J Med 2003;348:1839-54.
9. Speroff L, Symons J, Kempfert N, Rowan J, femhrt Study Investigators. The effect of varying low-dose combinations of norethindrone acetate and ethinyl estradiol (femhrt) on the frequency and intensity of vasomotor symptoms. Menopause 2000;7:383-90.
10. Brewer D, Nashelsky J, Hansen LB. What nonhormonal therapies are effective for postmenopausal vasomotor symptoms? J Fam Pract 2003;52:324-7.
11. Kronenberg F, Fugh-Berman A. Complementary and alternative medicine for menopausal symptoms: a review of randomized, controlled trials. Ann Intern Med 2002;137:805-13.
12. Davis SR, Briganti EM, Chen RQ, Dalais FS, Bailey M, Burger HG. The effects of Chinese medicinal herbs on postmenopausal vasomotor symptoms of Australian women. A randomised controlled trial. Med J Aust 2001;174:68-71.
13. Clayden J, Bell J, Pollard P. Menopausal flushing: double-blind trial of a non-hormonal medication. BMJ 1974;1:409-12.
14. Loprinzi CL, Kugler JW, Sloan JA, Mailliard JA, Lavasseur BI, Barton DL, et al. Venlafaxine in management of hot flashes in survivors of breast cancer: a randomised controlled trial. Lancet 2000;356:2059-63.
15. Loprinzi CL, Sloan JA, Perez EA, Quella SK, Stella PJ, Mailliard JA, et al. Phase III evaluation of fluoxetine for treatment of hot flashes. J Clin Oncol 2002;20:1578-83.
16. Farley TMM, Collins JA, Schlesselman JJ. Hormonal contraception and risk of cardiovascular disease: an international perspective. Contraception 1998;57:211-30.
17. Bombardier C, Laine L, Reicin A, Shapiro D, Burgos-Vargas R, Davis B, et al. Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. N Engl J Med 2000;343:1520-8. VIGOR Study Group.
18. Greendale GA, Reboussin BA, Sie A, Singh HR, Olson LK, Gatewood O, et al. Effects of estrogen and estrogen-progestin on mammographic parenchymal density. Ann Intern Med 1999;130:262-9.
19. Rutter CM, Mandelson MT, Laya MB, Seger DJ, Taplin S. Changes in breast density associated with initiation, discontinuation, and continuing use of hormone replacement therapy. JAMA 2001;285:171-6.
