Progesterone, Inhibin, and hCG Multiple Marker Strategy to Differentiate Viable From Nonviable Pregnancies

MAUREEN GLENNON PHIPPS, MD, JOSEPH W. HOGAN, ScD, JEFFREY F. PEIPERT, MD, MPH, GERALYN M. LAMBERT-MESSERLIAN, PhD, JACOB A. CANICK, PhD, AND DAVID B. SEIFER, MD

Objective: To determine whether a combination of serum and urine biomarkers drawn from symptomatic pregnant women can help differentiate viable from nonviable pregnancies early in gestation.

Methods: We conducted a prospective cohort study of 220 women who presented in the first trimester of pregnancy with complaints of pain, cramping, bleeding, or spotting. Serum samples for progesterone, inhibin A, and hCG, and urine samples for beta-core hCG, were collected at presentation. To evaluate whether these biomarkers could predict viable and nonviable pregnancy outcomes, we used likelihood ratios to compare the operating characteristics of single- and multiple-biomarker strategies.

Results: Of 220 pregnancies studied, 98 were viable and 122 nonviable. Among single biomarkers, progesterone alone appeared to have the greatest utility (area under the receiver operating characteristic curve = 0.923). Among dual-biomarker strategies, progesterone plus hCG and progesterone plus inhibin A improved specificity but not sensitivity. At 95% sensitivity, the combination of progesterone and hCG improved specificity from 0.29 to 0.66 (improvement = 0.37 [95% confidence interval 0.23, 0.52]). A triple-biomarker combination did not show substantial improvement over the dual-biomarker strategy, and combinations that used urine beta-core hCG did not improve diagnostic accuracy.
From the University of Michigan Health System, Robert Wood Johnson Clinical Scholars Program and Department of Obstetrics and Gynecology, Ann Arbor, Michigan; the Center for Statistical Sciences, Department of Community Health, Brown University, Providence, Rhode Island; Department of Obstetrics and Gynecology and Department of Pathology and Laboratory Medicine, Women and Infants Hospital of Rhode Island, Providence, Rhode Island; and the University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, Department of Obstetrics and Gynecology, New Brunswick, New Jersey. The following companies provided assay reagents for this study: Diagnostic Products Corp., Los Angeles, California and Chiron Diagnostics Corp., Alameda, California.
VOL. 95, NO. 2, FEBRUARY 2000
Conclusion: Serum progesterone appeared to be the single most useful biomarker for distinguishing viable from nonviable pregnancies. Combining serum progesterone with hCG in a dual-biomarker strategy significantly improved specificity, which suggests that a multiple-biomarker strategy might help distinguish viable from nonviable pregnancies in early gestation. (Obstet Gynecol 2000;95:227–31. © 2000 by The American College of Obstetricians and Gynecologists.)
Many biomarkers in serum, including hCG, progesterone, estradiol (E2), alpha-fetoprotein, fetal fibronectin, and inhibin A, have been studied to determine whether they could help diagnose ectopic pregnancy.1–4 The utility of progesterone as a biomarker has been well demonstrated.5–7 Inhibin A levels have been shown to be lower in ectopic than in intrauterine pregnancies.3,8 Urine β-core hCG, the major metabolite of hCG in maternal urine, has been studied as a potential biomarker for distinguishing ectopic from normal pregnancies.9 Combinations of biomarkers have also been studied to support rapid diagnosis of ectopic pregnancy.1 The purpose of this study was to determine whether a combination of multiple serum and urine biomarkers, measured in symptomatic women at first-trimester clinical presentation, could differentiate a viable from a nonviable pregnancy. We conducted a prospective cohort study of women who presented with complaints of pain, cramping, bleeding, or spotting. Serum quantitative hCG, serum progesterone, serum inhibin A, and urine β-core hCG were evaluated independently and in combination to determine their accuracy in predicting the viability of a pregnancy.
0029-7844/00/$20.00 PII S0029-7844(99)00480-9
Materials and Methods

We used an observational cohort of 238 pregnant women who presented to the Women and Infants Hospital of Rhode Island urgent care unit between June 1996 and March 1997. Women were eligible if they presented with complaints of bleeding, spotting, pain, or cramping in the first trimester (less than 13 weeks' gestation). We limited our sample to spontaneously conceived pregnancies. Pregnancy outcomes, determined by review of the medical records, were known for all subjects included in the analysis. Of the 238 women initially enrolled, 18 were excluded from analysis because pregnancy outcome could not be definitively established in 12 and gestational age was unknown in six. Seven women who had therapeutic abortions were included in the analysis because there was documentation of a viable pregnancy before termination. The Women and Infants Hospital Institutional Review Board approved the research in June 1996.

To determine marker values, each eligible subject had serum drawn and a urine sample collected at presentation to the urgent care unit. Serum and urine samples were aliquoted and frozen at −20°C until assays were done. Commercially available assays were used to measure quantitative hCG (Immulite; Diagnostic Products Corp., Los Angeles, CA), progesterone (Immulite; Diagnostic Products Corp.), inhibin A (enzyme-linked immunosorbent assay; Serotec, Ltd., Oxford, United Kingdom), and urine β-core hCG (Titron; Chiron Diagnostics Corp., Alameda, CA). Assays were done without knowledge of pregnancy outcomes, and results did not influence treatment. Besides pregnancy status and marker data, we recorded several demographic and historical variables, including maternal age, gestational age, gravidity, parity, race, insurance status, and reproductive history.
For our primary analysis, we compared the operating characteristics (ie, measures of diagnostic accuracy) of progesterone only (P), progesterone plus inhibin A (P+I), progesterone plus serum hCG (P+H), and progesterone plus inhibin A plus serum hCG (P+I+H). Our goals were to quantify the information gained by using multiple-biomarker strategies compared with progesterone alone and to characterize the differences between single- and multiple-biomarker strategies. Following Haddow et al,10 we used likelihood ratios to quantify the maternal- and gestational-age-adjusted odds of a nonviable pregnancy for a given combination of marker values. We used multivariate normal regression models11 to adjust for variability in biomarkers due to differences in maternal and gestational age and to account for correlations between biomarkers. Spearman nonparametric correlation was determined for each marker combination.

Phipps et al
Multiple Marker Strategy

We estimated the area under the receiver operating characteristic (ROC) curve based on the likelihood ratios calculated for each screening strategy. To understand the differences between strategies, we compared specificity at given values of sensitivity and sensitivity at given values of specificity. For example, because progesterone alone was a very sensitive test, we evaluated the multiple-biomarker strategies by comparing their specificity at 95% sensitivity. This involved determining the likelihood ratio cutoff value corresponding to 95% sensitivity, estimating specificity at that cutoff, and estimating the difference in specificity (and the associated 95% confidence interval [CI]) between biomarker combinations. Comparisons of sensitivity at given values of specificity were done similarly. Confidence intervals for the differences in sensitivity and specificity were calculated using the normal approximation to the sampling distribution of each estimated difference, with appropriate adjustment for within-subject correlation.12

An important component of our analysis is the calculation of maternal- and gestational-age-adjusted likelihood ratios. Conditional on maternal and gestational age, the likelihood ratio for a woman with a particular set of marker values is the odds that she will have a nonviable pregnancy, denoted by $\mathrm{LR} = \Pr(\mathrm{NV} \mid M, C)/\Pr(\mathrm{V} \mid M, C)$, where $\Pr(A \mid B)$ is the conditional probability of A given B; NV and V denote nonviable and viable, respectively; M represents a set of marker values; and C denotes individual characteristics that might affect marker values (maternal and gestational age). Likelihood ratios can be computed by logistic regression,13 but using correlated markers can lead to loss of efficiency and problems with collinearity. Instead, we computed likelihood ratios by modeling the marker distribution separately by viability status.
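Because each woman's likelihood ratio is used only through its rank ordering, the ROC quantities described above can be computed nonparametrically. The following is a minimal Python sketch, not the authors' SAS code: the function names are hypothetical, the area under the ROC curve is estimated with the Mann–Whitney statistic (tied pairs count one half), and the cutoff for a target sensitivity is taken as the largest score that still keeps at least that fraction of the positive group at or above it. Which outcome counts as "positive" is a labeling choice; the sketch assumes the score has been oriented so that larger values point toward the positive group.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Nonparametric ROC area: the probability that a randomly chosen
    positive subject scores higher than a randomly chosen negative one
    (tied pairs count 1/2). O(len(pos) * len(neg)), fine for ~220 subjects."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(
    pos: Sequence[float], neg: Sequence[float], target_sens: float
) -> Tuple[float, float, float]:
    """Return (cutoff, achieved sensitivity, specificity).

    A subject is called positive when score >= cutoff. The cutoff is the
    largest score that still classifies at least `target_sens` of the
    positive group as positive; specificity is the fraction of the
    negative group falling below it."""
    ranked = sorted(pos, reverse=True)
    # Number of positives that must sit at or above the cutoff; the small
    # epsilon guards against floating-point noise in target_sens * n.
    k = max(1, math.ceil(target_sens * len(ranked) - 1e-9))
    cutoff = ranked[k - 1]
    sens = sum(p >= cutoff for p in pos) / len(pos)
    spec = sum(n < cutoff for n in neg) / len(neg)
    return cutoff, sens, spec


if __name__ == "__main__":
    # Toy scores for illustration only (not study data).
    positives = [0.9, 0.8, 0.7, 0.6]
    negatives = [0.1, 0.2, 0.65, 0.85]
    print(auc_mann_whitney(positives, negatives))                  # 0.75
    print(specificity_at_sensitivity(positives, negatives, 0.75))  # (0.7, 0.75, 0.75)
```

In the study's setting, the score for each woman would be her likelihood ratio (or its reciprocal, depending on which outcome is labeled positive), and the same functions would be run once per biomarker strategy.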
Using Bayes theorem, the likelihood ratio can be expressed in terms of the (multivariate) marker distributions:

$$\mathrm{LR} = \frac{\Pr(M \mid \mathrm{NV}, C)\,\Pr(\mathrm{NV} \mid C)}{\Pr(M \mid \mathrm{V}, C)\,\Pr(\mathrm{V} \mid C)}.$$

In general, covariates such as maternal and gestational age might affect both the marker distribution and viability status. If we assume that maternal and gestational age do not affect belief about viability status before marker data are collected, so that $\Pr(\mathrm{NV} \mid C)/\Pr(\mathrm{V} \mid C)$ is the same for every woman, then the likelihood ratio is proportional to $\Pr_{\theta_1}(M \mid \mathrm{NV}, C)/\Pr_{\theta_0}(M \mid \mathrm{V}, C)$. The parameters $\theta_1$ and $\theta_0$ emphasize that marker values follow separate models by viability status. Individual likelihood ratios are calculated by estimating $\theta_1$ and $\theta_0$, then evaluating $\Pr_{\theta_1}(M \mid \mathrm{NV}, C)/\Pr_{\theta_0}(M \mid \mathrm{V}, C)$ using individual marker values, maternal age, and gestational age. Models for the numerator (NV) and denominator (V) of the likelihood ratios were fit under the assumptions that, after a suitable transformation, each set of marker values follows a multivariate normal distribution and that the mean marker value varies linearly with maternal age and possibly as up to a cubic function of gestational age. We used natural log transformations for progesterone and inhibin A and a 1/6 power transformation for hCG. The estimated parameters $\hat\theta_1$ and $\hat\theta_0$ contain the regression parameters, marker standard deviations, and marker correlations for the fitted multivariate models. Each woman's likelihood ratio was computed as $\mathrm{LR}_i = f_1(M_i \mid \mathrm{NV}, C_i)/f_0(M_i \mid \mathrm{V}, C_i)$, where i indexes subjects and f is the multivariate normal density function, with dimension equal to the number of markers. Under our assumptions, $\mathrm{LR}_i$ is proportional to (ie, has the same rank ordering across individuals as) the odds of having a nonviable pregnancy for a given set of marker values, adjusted for maternal and gestational age. Consequently, it can be used for nonparametric calculations of sensitivity, specificity, and ROC curves. Fitted models used for likelihood ratio calculations were checked using residual plots, and orthogonal polynomials were used to avoid overfitting the marker distributions to covariates. All analyses were done using SAS Version 6.12 (SAS Institute, Cary, NC); multivariate models were fit using SAS Proc Mixed.

Obstetrics & Gynecology

Table 1. Demographic and Clinical Characteristics
Characteristic        Viable (n = 98)    Nonviable (n = 122)
Age (y)               24.1 ± 5.5         27.9 ± 6.6
Race
  White               48 (49)            70 (57)
  Black               27 (28)            20 (16)
  Hispanic            23 (23)            28 (23)
  Other               0 (0)              4 (4)
Gravidity
  1                   34 (35)            34 (28)
  2                   15 (15)            21 (17)
  ≥3                  49 (50)            67 (55)
Parity
  0                   43 (44)            50 (41)
  1                   25 (26)            37 (30)
  ≥2                  30 (31)            35 (29)
Symptoms
  Pain                57 (58)            48 (39)
  Cramping            34 (35)            53 (43)
  Bleeding            21 (21)            73 (60)
  Spotting            22 (22)            34 (28)
Table entries are total n (%) except for age, which is mean ± standard deviation.

Table 2. Biomarker Distribution (Percentiles)
                              Viable (n = 98)              Nonviable (n = 122)
Biomarker                     25%      50%      75%        25%     50%      75%
Serum hCG (mIU/mL)            1,512    23,650   62,100     339     2,457    7,680
Progesterone (ng/mL)          16.8     19.0     27.0       3.0     5.7      9.8
Inhibin A (pg/mL)             19.5     51.8     128.9      8.0     8.0      27.2
Urine β-core hCG (fmol/mL)    0.1      17.5     81.9       0.1     1.2      10.8
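The likelihood ratio calculation described in Materials and Methods can be sketched in Python with NumPy as below. This is an illustration under stated assumptions, not the study's SAS Proc Mixed code: the function and marker names are hypothetical, and the group-specific mean vectors and covariance matrices (the fitted parameters for the nonviable and viable groups) are assumed to have already been evaluated at a woman's maternal and gestational age. The regression step that produces those means (linear in maternal age, up to cubic in gestational age) is not shown. The marker transformations follow the text, and the ratio is formed on the log scale for numerical stability.

```python
import numpy as np


def transform(marker: str, value: float) -> float:
    """Transformations stated in the text: natural log for progesterone and
    inhibin A, 1/6 power for serum hCG. Marker names here are hypothetical."""
    if marker in ("progesterone", "inhibin_a"):
        return float(np.log(value))
    if marker == "hcg":
        return float(value ** (1.0 / 6.0))
    raise ValueError(f"unknown marker: {marker}")


def mvn_logpdf(x, mean, cov) -> float:
    """Log density of a multivariate normal, computed with a Cholesky factor."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    chol = np.linalg.cholesky(cov)       # cov = chol @ chol.T
    z = np.linalg.solve(chol, x - mean)  # whitened residual; z @ z is the squared Mahalanobis distance
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return float(-0.5 * (x.size * np.log(2.0 * np.pi) + log_det + z @ z))


def likelihood_ratio(x, mean_nv, cov_nv, mean_v, cov_v) -> float:
    """LR_i = f1(M_i | NV, C_i) / f0(M_i | V, C_i).

    mean_* and cov_* are assumed to be the fitted group-specific mean vector
    (already evaluated at the woman's covariates) and covariance matrix.
    Both groups share the same marker transformation, so its Jacobian
    cancels and transformed-scale densities can be used directly."""
    return float(np.exp(mvn_logpdf(x, mean_nv, cov_nv) - mvn_logpdf(x, mean_v, cov_v)))


if __name__ == "__main__":
    # Made-up P + H example; these numbers are illustrative, not study estimates.
    x = [transform("progesterone", 6.0), transform("hcg", 2500.0)]
    mean_nv = [transform("progesterone", 5.0), transform("hcg", 2500.0)]
    mean_v = [transform("progesterone", 20.0), transform("hcg", 25000.0)]
    cov = [[0.3, 0.05], [0.05, 0.8]]
    print(likelihood_ratio(x, mean_nv, cov, mean_v, cov))
```

Working on the log scale avoids underflow when the two densities are very small, and the Cholesky factorization fails loudly if a fitted covariance matrix is not positive definite.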
Results

Pregnancy outcomes included 98 viable intrauterine pregnancies, 85 first-trimester spontaneous abortions, and 37 ectopic pregnancies. Viable intrauterine pregnancies were classified as viable (45%); spontaneous abortions and ectopic pregnancies were classified as nonviable (55%). Subjects with viable pregnancies were younger on average than subjects with nonviable pregnancies (mean age 24.1 versus 27.9 years, P < .01). Other demographic information and clinical characteristics are given in Table 1; there were no significant differences in race, gravidity, or parity.

The distribution of each biomarker, by viable or nonviable outcome, is summarized by percentiles in Table 2. For progesterone, the 25th-percentile value in viable pregnancies was well above the 75th-percentile value in nonviable pregnancies (16.8 versus 9.8 ng/mL); the distributions of the other biomarkers overlapped more. As a single marker, progesterone had the greatest area under the ROC curve (0.923, compared with 0.795 for inhibin A, 0.736 for serum hCG, and 0.646 for urine β-core hCG). In formulating multiple-marker strategies, we chose serum hCG rather than urine β-core hCG because the two were so strongly correlated (Spearman rank correlation 0.90) and because serum hCG is the biomarker most commonly used by practitioners to evaluate pregnancy viability. For the other biomarker pairs, the rank correlation ranged from 0.32 (progesterone and urine β-core hCG) to 0.73 (serum hCG and inhibin A).

Multiple-marker strategies increased the area under the ROC curve. With maternal- and gestational-age-adjusted likelihood ratios, the area under the ROC curve for P was .91. Adding serum hCG increased it to .95 (P+H), adding inhibin A increased it to .94 (P+I), and the triple-marker combination had an area of .95 (P+I+H).
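Spearman rank correlation, used above to compare serum hCG with urine β-core hCG, is the ordinary Pearson correlation applied to ranks. A minimal standard-library Python sketch (function names are hypothetical) follows. Tied values receive the average of the ranks they span; that matters for data like these, since Table 2 reports the same 25th-percentile urine β-core value (0.1 fmol/mL) in both groups, suggesting that tied observations may be common.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, using average ranks for ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Any strictly increasing relationship gives a rank correlation of 1.
    print(spearman([1, 2, 3], [1, 8, 27]))
```

Because only ranks enter the calculation, the result does not depend on the log or 1/6 power transformations used for the likelihood ratio models.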
Area under the ROC curve is a global measure of utility, and the clinical implications of differences in area can be difficult to interpret. To examine the differences in more detail, we compared sensitivity at fixed specificity and specificity at fixed sensitivity. Table 3 compares specificity at fixed sensitivity. At 95% sensitivity, the specificity for P was 29%. At the same sensitivity, the specificity of the dual-biomarker strategy P+I increased to 57%, and the specificity of P+H was 66%. Based on 95% confidence intervals for the difference in specificity, both combinations represent statistically significant gains in specificity over P. Specificity for P+H was 9 percentage points higher than for P+I (95% CI −0.02, 0.20). Table 4 compares specificity for P with that of combinations with the other biomarkers at 85% and 95% sensitivity; these comparisons indicate that P+H is the superior choice. At 95% sensitivity, P+I+H did not show an appreciable gain in specificity over P+H (0.69 compared with 0.66). At 95% specificity, P had a sensitivity of 83%. Table 5 shows minimal gains in sensitivity among the various biomarker strategies at fixed specificity values between 0.75 and 0.95.

Table 3. Estimated Specificity at Given Sensitivity for Biomarker Combinations
Sensitivity   P      P+I    P+H    P+I+H
.95           .29    .57    .66    .69
.90           .70    .83    .87    .87
.85           .88    .87    .94    .93
.80           .90    .95    .97    .97
.75           .97    .99    >.99   >.99
P = progesterone alone; P+I = progesterone plus inhibin A; P+H = progesterone plus serum hCG; P+I+H = progesterone plus inhibin A plus serum hCG.

Table 5. Estimated Sensitivity at Given Specificity for Biomarker Combinations
Specificity   P      P+I    P+H    P+I+H
.95           .83    .82    .83    .83
.90           .89    .85    .87    .89
.85           .92    .90    .92    .92
.80           .93    .91    .93    .93
.75           .93    .92    .93    .93
Abbreviations as in Table 3.
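Because each pair of strategies is evaluated on the same women, the two specificity estimates are correlated, and the variance of their difference must account for that. The sketch below shows one standard way to do this: the normal approximation for a difference of paired proportions, with an optional Bonferroni adjustment such as the six comparisons noted under Table 4. It is written in Python with hypothetical names; it treats the likelihood ratio cutoffs as fixed (ignoring any extra uncertainty from choosing them on the same data), and the authors' exact variance formula is not given in the text.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def paired_difference_ci(
    correct_a: Sequence[int],
    correct_b: Sequence[int],
    alpha: float = 0.05,
    n_comparisons: int = 1,
) -> Tuple[float, float, float]:
    """Estimate and CI for (proportion correct under B) - (proportion correct under A).

    correct_a[i] and correct_b[i] are 1 if subject i is correctly classified
    by strategy A or B (e.g., a nonviable pregnancy called nonviable when
    estimating specificity), else 0. Both sequences refer to the same subjects.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(correct_a)
    # Only discordant subjects contribute to the difference.
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b == 1 and a == 0) / n
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a == 1 and b == 0) / n
    diff = b_only - a_only
    # Variance of the mean of d_i = b_i - a_i, where each d_i is -1, 0, or 1.
    var = (a_only + b_only - diff ** 2) / n
    # Bonferroni: split the two-sided alpha across all comparisons.
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_comparisons))
    half_width = z * math.sqrt(var)
    return diff, diff - half_width, diff + half_width


if __name__ == "__main__":
    # Toy classifications of ten negatives under two strategies (not study data).
    a = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
    b = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
    print(paired_difference_ci(a, b, n_comparisons=6))
```

Subjects classified the same way by both strategies add nothing to the estimated difference but still enter the denominator, which is how the pairing narrows the interval relative to treating the two specificities as independent.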
Discussion

We evaluated the diagnostic usefulness of biomarkers for distinguishing viable from nonviable early pregnancies in symptomatic women who presented for urgent care. It is well established that serial measurement of serum quantitative hCG is helpful in managing symptomatic women in early gestation.14 In clinical practice, however, the delay needed to distinguish a viable from a nonviable pregnancy is often distressing to women and practitioners when women present with symptoms in an emergency setting. Adding serum progesterone to the diagnostic tests can help clinical management if the level is under 10 ng/mL, suggesting a nonviable pregnancy, or at least 25 ng/mL, suggesting a viable pregnancy.4,5 However, in our study, 41% of subjects had progesterone levels between these two thresholds (10–25 ng/mL).

Table 4. Difference in Specificity (95% Confidence Interval) at Given Sensitivity
Comparison       Sensitivity 0.85        Sensitivity 0.95
P vs P+I         −0.01 (−0.08, 0.06)     0.28 (0.14, 0.43)
P vs P+H         0.06 (−0.02, 0.14)      0.37 (0.23, 0.52)
P+I vs P+H       0.07 (0.00, 0.14)       0.09 (−0.02, 0.20)
P = progesterone alone; P+I = progesterone plus inhibin A; P+H = progesterone plus serum hCG. 95% confidence interval widths adjusted for the six multiple comparisons using the Bonferroni method.

A highly sensitive and specific test to determine viability would be useful in a clinical setting when women present with acute symptoms and a treatment decision must be made quickly. A highly sensitive biomarker more accurately identifies viable pregnancies (true positives), and a highly specific biomarker more accurately identifies nonviable pregnancies (true negatives). Treatment of women who present with cramping and spotting in the first trimester would be better guided by a sensitive and specific test that reliably categorizes the prognosis of the pregnancy. Thus far, no single test accurately predicts pregnancy outcome in urgent situations.

Of the single biomarkers, progesterone had the greatest utility as measured by area under the ROC curve. Although dual-biomarker strategies improved the area under the ROC curve, the multiple-biomarker strategies improved specificity but not sensitivity. For improving specificity, the dual-biomarker combination of progesterone plus serum hCG (P+H) is better than progesterone plus inhibin A (P+I) and as good as the triple-biomarker combination of progesterone plus inhibin A plus serum hCG (P+I+H). Adding urine β-core hCG was not helpful in this study.

Our study had several limitations. It was conducted in a convenience sample of symptomatic women in early gestation who conceived spontaneously. Thus, the results cannot be generalized to asymptomatic pregnant women or to women who conceived with ovulation induction or assisted reproductive technologies. Our computation of likelihood ratios relied on a (carefully constructed) parametric model to adjust for differences in maternal and gestational age, both of which are associated with various marker values. These assumptions were needed because of the limited sample size; data from large, population-based samples would reduce the need to rely on them.
Our study suggests that a multiple-biomarker strategy for distinguishing viable from nonviable pregnancies should be formalized. Clinically, such an approach might help in assessing the prognosis of a pregnancy or in identifying women with ectopic pregnancy. Validating our multiple-biomarker screening strategy in a large population database might aid in predicting pregnancy outcomes.
References
1. Grosskinsky CM, Hage ML, Tyrey L, Christakos AC, Hughes CL. hCG, progesterone, alpha-fetoprotein, and estradiol in the identification of ectopic pregnancy. Obstet Gynecol 1993;81:705–9.
2. Ness RB, McLaughlin MT, Heine RP, Bass DC, Mortimer L. Fetal fibronectin as a biomarker to discriminate between ectopic and intrauterine pregnancies. Am J Obstet Gynecol 1998;179:697–702.
3. Seifer DB, Lambert-Messerlian GM, Canick JA, Frishman GN, Schneyer AL. Serum inhibin levels are lower in ectopic than intrauterine spontaneously conceived pregnancies. Fertil Steril 1996;65:667–9.
4. Stovall TG, Ling FW, Andersen RN, Buster JE. Improved sensitivity and specificity of a single measurement of serum progesterone over serial quantitative beta-human chorionic gonadotropin in screening for ectopic pregnancy. Hum Reprod 1992;7:723–5.
5. Cowan BD, Vandermolen DT, Long CA, Whitworth NS. Receiver-operator characteristic, efficiency analysis, and predictive value of serum progesterone concentration as a test for abnormal gestations. Am J Obstet Gynecol 1992;166:1729–37.
6. McCord ML, Muram D, Buster JE, Arheart KL, Stovall TG, Carson SA. Single serum progesterone as a screen for ectopic pregnancy: exchanging specificity and sensitivity to obtain optimal test performance. Fertil Steril 1996;66:513–6.
7. Dart R, Dart L, Segal M, Page C, Brancato J. The ability of a single serum progesterone to identify abnormal pregnancies in patients with beta-human chorionic gonadotropin values less than 1,000 mIU/mL. Acad Emerg Med 1998;5:304–9.
8. D'Antona D, Mamers PM, Lowe PJM, Balazs N, Groome NP, Wallace EM. Evaluation of serum inhibin-A as a surveillance biomarker after conservative management of tubal pregnancy. Hum Reprod 1998;12:2305–7.
9. Cole LA, Kardana A, Seifer DB, Bohler HCL. Urine hCG β-subunit core fragment, a sensitive test for ectopic pregnancy. J Clin Endocrinol Metab 1994;78:497–9.
10. Haddow JE, Palomaki GE, Knight GJ, Williams J, Miller WA, Johnson A. Screening of maternal serum for fetal Down's syndrome in the first trimester. N Engl J Med 1998;338:955–61.
11. Crowder MJ, Hand DJ. Analysis of repeated measures. 1st ed. London: Chapman & Hall, 1990.
12. Agresti A. Categorical data analysis. New York: Wiley & Sons, 1990.
13. Irwig L. Modeling result-specific likelihood ratios. J Clin Epidemiol 1992;45:1335–6.
14. Speroff L, Glass RH, Kase NG. Clinical gynecologic endocrinology and infertility. 5th ed. Baltimore: Williams & Wilkins, 1994.
Address reprint requests to:
Maureen G. Phipps, MD
University of Michigan Health System
Robert Wood Johnson Clinical Scholars Program
6312 Medical Science Building I
1150 West Medical Center Drive
Ann Arbor, MI 48109-0604
Received March 23, 1999. Received in revised form June 18, 1999. Accepted July 15, 1999.
Copyright © 2000 by The American College of Obstetricians and Gynecologists. Published by Elsevier Science Inc.