A comparison of visual analyses of intrapartum fetal heart rate tracings according to the new National Institute of Child Health and Human Development guidelines with computer analyses by an automated fetal heart rate monitoring system

Lawrence Devoe, MD,a Steven Golde, MD,a Yevgeny Kilman, MD,a Debra Morton, RN,b Kimberly Shea, RN, CNM,b and Jennifer Waller, PhDc
Augusta, Georgia

OBJECTIVES: The aim of this study was to compare the visual analyses of fetal heart rate tracings by observers according to recent National Institute of Child Health and Human Development interpretative guidelines both with each other and with those of a computerized fetal heart rate analysis and alerting system.

STUDY DESIGN: One-hour sections of intrapartum fetal heart rate records were analyzed by a computerized monitoring system (Hewlett-Packard TraceVue; HP GmbH, Böblingen, Germany) and by 4 observers (a registered obstetric nurse, a certified nurse-midwife, an obstetrics resident physician, and a physician maternal-fetal medicine faculty member) instructed to use the new National Institute of Child Health and Human Development guidelines. We compared specific alerts, baseline rates, frequencies of accelerations and decelerations, and signal quality assessments generated by the TraceVue system and the observers. Power analysis indicated that 50 tracings were required to detect interobserver and observer-computer agreement levels of 80% ± 10%. Statistical comparisons used the κ coefficient, the χ2 test, and analysis of variance with repeated measures as appropriate.

RESULTS: Levels of agreement between observer pairs and the computer did not vary significantly across successive 10-minute intervals. Overall levels of interobserver agreement for baseline rate, tracing quality assessment, frequencies of accelerations and decelerations, and alerts ranged from 45% to 99% and were highest for baseline rate and signal loss and lowest for acceleration and deceleration counts. Interobserver agreement for alerts was relatively high (range, 72%-84%), with virtually no difference between any of the observers and the computer (range, 76.9%-79.2%; κ = 0.25).

CONCLUSION: Use of the National Institute of Child Health and Human Development guidelines for visual fetal heart rate interpretation did not increase agreement on most fetal heart rate features beyond that expected by chance or noted in previous reports. The guidelines did appear to blunt some interpretive differences, possibly as a result of observer background. Although levels of agreement on fetal heart rate features differed, agreement on clinical alerts was similar among all observers and a computerized fetal heart rate monitoring system. Computer analysis of fetal heart rate tracings could eliminate the interobserver variation that results from visual analysis and could produce more consistent clinical responses to normal and abnormal fetal heart rate patterns. (Am J Obstet Gynecol 2000;183:361-6.)

Key words: Computerized fetal heart rate analysis, intrapartum monitoring, visual assessment

From the Departments of Obstetrics and Gynecology,a Nursing,b and Biostatistics,c Medical College of Georgia. Supported by a grant from the Hewlett-Packard Company, Palo Alto, California. Presented at the Sixty-second Annual Meeting of The South Atlantic Association of Obstetricians and Gynecologists, St Petersburg, Florida, January 22-25, 2000. Reprint requests: Lawrence Devoe, MD, Department of Obstetrics and Gynecology, Medical College of Georgia, 1120 15th St, Augusta, GA 30912. Copyright © 2000 by Mosby, Inc. 0002-9378/2000 $12.00 + 0 6/6/107665 doi:10.1067/mob.2000.107665

For the past 3 decades electronic fetal heart rate (FHR) monitoring has been used for antepartum and intrapartum fetal surveillance. Although challenges to the ability of electronic FHR monitoring to improve fetal outcomes first appeared >20 years ago,1 the reliability of visual interpretation of FHR tracings remains a source of concern.2-10 Standards for FHR interpretation and clinical management have been published previously by authoritative organizations.11, 12 However, the perceived need to restudy FHR interpretation led the National Institute of Child Health and Human Development (NICHD) to convene a panel of clinical experts and charge them with developing standardized definitions for the visual interpretation of FHR tracings. These proceedings have been widely promulgated among both physicians and nurses.13 Recommendations for future research included the investigation of computerized FHR analysis.

362 Devoe et al

August 2000 Am J Obstet Gynecol

Table I. List of major conditions for which FHR alerts are provided by computer system

Rule set type: Basic alerts
  Signal loss
  Baseline level
  Bradycardia
  Tachycardia

Rule set type: Advanced intrapartum alerts
  Pattern-based alerts
    Bradycardia
    Tachycardia with reduced variability
    Baseline undeterminable
  Variability and deceleration alerts
    Absent variability with decelerations
    Decreased variability with decelerations
    Absent variability
    Sinusoidal pattern
    Decreased variability
    Late decelerations
    Prolonged decelerations
    Severe variable decelerations
    Repetitive variable decelerations
    Tachycardia with decelerations
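The mapping from FHR conditions to alerts can be pictured as a small rule table. The sketch below is purely illustrative: the function name, the 110/160 beats/min thresholds, and the 20% signal-loss cutoff are our assumptions drawn from common clinical conventions, not the TraceVue system's proprietary rule base.

```python
# Illustrative sketch of basic FHR alert rules. Thresholds are assumptions for
# illustration only; the actual TraceVue rule base is proprietary.

def basic_alerts(baseline_bpm, signal_loss_pct):
    """Return basic alerts for one summarized FHR window.

    baseline_bpm: estimated baseline in beats/min, or None if undeterminable.
    signal_loss_pct: percentage of the window with lost signal.
    """
    alerts = []
    if signal_loss_pct > 20:          # assumed cutoff for a signal-loss alert
        alerts.append("signal loss")
    if baseline_bpm is None:
        alerts.append("baseline undeterminable")
    elif baseline_bpm < 110:          # conventional bradycardia threshold
        alerts.append("bradycardia")
    elif baseline_bpm > 160:          # conventional tachycardia threshold
        alerts.append("tachycardia")
    return alerts
```

In the actual system such rules were organized into modules (validation, classification, and user information), as described in the System overview below; this sketch collapses that pipeline into a single function for brevity.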

Several recent studies14-16 have shown that computerized FHR analysis systems could minimize diagnostic errors and interpretative variability associated with the visual assessment of FHR patterns. For the past 6 years we have collaborated with the Hewlett-Packard Company to develop the online computer system for FHR analysis and alerting (TraceVue System; HP GmbH, Böblingen, Germany) that is described in the Material and methods section. The coincidental publication of the previously mentioned NICHD consensus criteria for FHR interpretation13 and the completed development and validation of the initial TraceVue System prompted this study. Our goals were to test the following hypotheses: (1) use of the new FHR interpretative criteria would increase interobserver levels of agreement on the major intrapartum FHR features and on alerts indicating a possibly abnormal fetal condition; (2) for selected FHR features, individual observers would exhibit high levels of agreement with a computerized system that used similar standard rules and criteria. We also expected that a standardized approach to FHR interpretation would enhance the reliability and reproducibility of clinical alerts used to influence obstetric management.

Material and methods

This study was conducted at the Medical College of Georgia from May 1 to July 1, 1998. Fifty consecutive tracings were acquired from patients who consented to enrollment after admission to the labor and delivery unit, according to a protocol approved by the human assurance committee on April 30, 1998. The inclusion requirements were as follows: (1) obstetric or medical indications for continuous electronic FHR monitoring according to The American College of Obstetricians and

Gynecologists’ criteria,11 (2) singleton gestation, (3) pregnancy ≥32 weeks’ gestation, (4) maternal age between 15 and 45 years, and (5) entry into the active phase of labor (cervical dilatation >3 cm).

After consent was obtained, an electronic FHR monitoring system (Hewlett-Packard M1350A; Hewlett-Packard GmbH) was interfaced with a specially programmed computer (Hewlett-Packard Vectra; Hewlett-Packard Company, Palo Alto, Calif) with a Windows NT (Microsoft Corporation, Redmond, Wash) operating system. This system acquired digitized cardiotocographic signals, maintained an on-line display verifying the continuous recording and analysis of data, and stored the data to hard disk. After all studies were completed, the initial hour of each tracing was reprinted in both annotated (on-line indications of clinical alerts) and unannotated (no alerts) forms. Summary sheets of the entire monitoring session containing baseline rate, variability measures, and acceleration and deceleration counts were also obtained but concealed from clinical observers.

System overview. The analytic software used a proprietary algorithm for determination of FHR baseline, computation of variability, and rules for detection and validation of episodic and periodic events and signal loss. Clinical alerting functions were triggered by a rule-based expert system. The expert system was built from reviews of the pertinent world literature and numerous contact sessions with domain experts. Although criteria for detecting and classifying accelerations and decelerations were virtually the same as those of the recent NICHD criteria,13 the FHR baseline was defined more precisely with a moving 6-minute statistical routine. FHR variability was also determined by analysis of mathematical arrays across 5- to 30-minute windows; its classification levels corresponded approximately but not exactly with the variability templates used for visual analysis.
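A moving-window statistical baseline of the kind described above can be sketched as follows. The choice of the median as the window statistic, the sampling rate, and the edge handling are our illustrative assumptions; the actual TraceVue routine is proprietary and is not reproduced here.

```python
# Illustrative moving-window FHR baseline. Assumes one FHR sample every 2 s,
# so a 6-minute window is 180 samples; the median is used as a robust window
# statistic. These choices are assumptions, not the proprietary algorithm.
from statistics import median

def moving_baseline(fhr, window=180):
    """Median FHR over a centered moving window.

    fhr: list of FHR samples in beats/min.
    Returns one baseline estimate per input sample (window truncated at edges).
    """
    half = window // 2
    out = []
    for i in range(len(fhr)):
        lo, hi = max(0, i - half), min(len(fhr), i + half + 1)
        out.append(median(fhr[lo:hi]))
    return out
```

Because the median is insensitive to brief excursions, a short deceleration leaves such a baseline essentially unchanged, which is the behavior that distinguishes a baseline estimator from a raw smoother.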
The expert system rule base was validated with a database of 1400 patients and >3600 hours of FHR tracings. Rule sets were organized as modules: validation operations (identification of patterns or events), classification (event-specific), and user information (explanatory). The electronic FHR monitor provided the raw signal to the first module (cardiotocographic source); valid ranges of FHR signal were then determined by the second module (FHR validity). Next the FHR baseline module calculated the baseline with the proprietary algorithm and sent the data to a detector module that recognized events when present. An interpreter module validated and classified the events found and routed these data to the alert module, which submitted them to the rule base. Clinical conditions that could trigger alerts to users are listed in Table I.

Observer protocol. Four observers, an obstetrics resident physician (Yevgeny Kilman, MD), an experienced obstetric nurse (Debra Morton, RN), a certified nurse-midwife (Kimberly Shea, RN, CNM), and a senior maternal-fetal medicine faculty member (Steven Golde, MD), were given the standardized study sheets and an instruction protocol shown in Table II. A reprint of the NICHD workshop’s clinical opinion13 was appended to the protocol for reference purposes. Each observer received a scripted tutorial from Lawrence Devoe, MD, and was told only that the intrapartum FHR records were obtained from singleton gestations ≥32 weeks in duration.

Table II. Instructions for review of tracing
1. These tracings are minimally annotated and will be identified only by sequence number.
2. Each tracing will have about 60 minutes of continuous record for review. You are to review the tracing in sequence, a segment at a time, until all segments have been reviewed. Please note the approximate starting and finishing times as indicated on the recordings in the time column.
3. The responses in the boxes should be filled out after each individual segment has been reviewed. Do not go back to the beginning after viewing the entire tracing and revise your responses. You should view this tracing as you would see it generated by an FHR monitoring unit.
4. The key to filling out each box is geared to the National Institutes of Health consensus statement included in your packet. Please read this before you begin the review. Regardless of any preconceived ideas about FHR monitoring, keep these definitions and standards in mind when you look at the tracings. It might be useful to write a brief summary of the standard criteria on a separate sheet to keep with you.
5. Baseline rate is to be approximated (within 5 beats/min) for each segment (120, 125, 130, 135, etc).
6. Accelerations should be counted if their peak exceeds 15 beats/min.
7. Decelerations should be counted whenever they are apparent and >10 beats/min in depth. They should be categorized as early, variable, late, or prolonged.
8. Variability should be categorized according to the closest match to the published template.
9. Tracing quality can be rated as “OK” (I can interpret this tracing as is) or “not OK” (there is too much artifact or signal loss to assign a baseline or determine any of the other properties).
10. Assess alert level. Imagine that you are using a central alerting system to tell you about the patient’s status. You do not want to be bothered by inconsequential and relatively normal variants, such as isolated variable decelerations or a brief episode of tachycardia. Nevertheless, in your clinical judgment there are situations that should get your attention. A number of alert situations have been programmed into the computer. We want to see how these alerts compare with your own clinical assessment. If you think that no alert should be triggered, simply mark the box with an N for no; if you think that an alert should be triggered, mark the box with a Y for yes and indicate why.*

*Reasons for alert: T, tachycardia; B, bradycardia; S, poor signal quality; V, decreased or absent variability; VD, persistent variable decelerations; LD, persistent late decelerations; PD, prolonged deceleration; O, other (describe at bottom of page).

Study end points and comparisons

Sample size. From previous personal and reported experience with studies involving observer-computer levels of agreement,14-16 we estimated that levels of agreement for visual interpretation of FHR baseline rate and alerting should be ≥80%. Subsequent power analysis determined that ≥50 tracings were needed to achieve this level of agreement with a tolerance of 10%.

Outcomes examined. For the 6 interobserver pairings, levels of agreement for baseline FHR, acceleration and deceleration frequency, signal loss (tracing quality), and alert activations were compared in each of the 300 windows of 10 minutes. Because the computer routines and the visual templates for variability differed considerably, we elected to exclude those data from the final comparisons. An overall comparison of the previously mentioned parameters for all of the tracings was also performed for each observer pair. The observer-computer comparisons were conducted somewhat differently, because the computer-generated alerts could be delayed for as long as 4 minutes after the condition prompting their activation began. This feature was intended to enable a reliable determination of the FHR baseline and to avoid premature alerting for trivial perturbations of the FHR signal. Consequently,

alert comparisons viewed both the current and the subsequent 10-minute FHR tracing window to allow for computer-generated delays in alerting. Statistical comparisons used the κ coefficient, the χ2 test, and analysis of variance with repeated measures where appropriate, and significance was set at P < .05.

Results

The mean levels of agreement among observer pairs for FHR baseline, acceleration and deceleration frequencies, tracing quality, and alerts were compared with analysis of variance models across successive 10-minute intervals of the tracings. Each model contained the tracing effect and the 2-factor interaction between group and time to assess for differences across time within rater pairs, with a Bonferroni adjustment to the α level for the number of comparisons made. After each model was run, no significant group-by-time interactions were found; similarly, determination of the differences across time within rater pairs showed no significant differences for any agreement outcome. Consequently, for all remaining analyses we compared the overall levels of interobserver agreement and observer-computer agreement for all of the tracings.

Table III shows overall levels of agreement among observers for baseline FHR, acceleration and deceleration frequencies, tracing quality, and alerts. Although the agreements for baseline FHR and signal quality were uniformly high among all observer pairs, there were relatively poor agreements for counts of accelerations and decelerations. There was an approximately 80% level of agreement


Table III. Overall levels of interobserver agreement for visual analysis of FHR records (Lower % / Agree % / Higher %)

Registered nurse vs obstetrics resident
  Baseline heart rate: 0.7 / 99.0 / 0.3
  Acceleration: 30.5 / 49.3 / 20.1
  Deceleration: 52.0 / 44.6 / 3.4
  Tracing quality: 1.7 / 95.3 / 3.0
  Alert: 22.2 / 71.7 / 6.1

Certified nurse-midwife vs obstetrics resident
  Baseline heart rate: 1.7 / 98.0 / 0.3
  Acceleration: 28.6 / 55.7 / 15.7
  Deceleration: 34.0 / 56.9 / 9.1
  Tracing quality: 1.6 / 95.0 / 3.5
  Alert: 17.9 / 76.7 / 5.3

Certified nurse-midwife vs registered nurse
  Baseline heart rate: 1.4 / 98.3 / 0.3
  Acceleration: 24.8 / 51.3 / 23.9
  Deceleration: 14.4 / 43.1 / 42.5
  Tracing quality: 2.6 / 95.1 / 2.3
  Alert: 6.9 / 83.8 / 9.8

Maternal-fetal medicine faculty member vs obstetrics resident
  Baseline heart rate: 1.0 / 98.7 / 0.3
  Acceleration: 8.5 / 61.8 / 29.8
  Deceleration: 18.8 / 66.5 / 14.7
  Tracing quality: 2.5 / 94.9 / 2.5
  Alert: 9.1 / 82.1 / 8.8

Maternal-fetal medicine faculty member vs registered nurse
  Baseline heart rate: 1.7 / 97.3 / 1.0
  Acceleration: 8.8 / 47.2 / 44.0
  Deceleration: 5.5 / 43.6 / 50.8
  Tracing quality: 2.0 / 96.7 / 1.3
  Alert: 3.3 / 78.0 / 18.7

Maternal-fetal medicine faculty member vs certified nurse-midwife
  Baseline heart rate: 0.3 / 98.7 / 1.0
  Acceleration: 8.0 / 48.0 / 44.0
  Deceleration: 9.5 / 60.0 / 30.5
  Tracing quality: 3.7 / 93.8 / 2.5
  Alert: 3.4 / 81.0 / 15.6

Table IV. Overall levels of observer agreement for visual analysis of FHR records versus computer system (Lower % / Agree % / Higher %)

Obstetrics resident vs computer system
  Baseline heart rate: 7.1 / 88.1 / 4.8
  Acceleration: 11.3 / 58.6 / 30.1
  Deceleration: 37.3 / 51.1 / 11.6
  Tracing quality: 5.6 / 94.4 / 0
  Alert: 18.7 / 76.9 / 4.4

Registered nurse vs computer system
  Baseline heart rate: 11.9 / 84.4 / 3.7
  Acceleration: 10.7 / 49.5 / 39.8
  Deceleration: 43.1 / 35.8 / 22.1
  Tracing quality: 6.4 / 93.6 / 0
  Alert: 10.7 / 79.2 / 10.1

Certified nurse-midwife vs computer system
  Baseline heart rate: 10.3 / 83.5 / 6.1
  Acceleration: 10.7 / 50.5 / 38.8
  Deceleration: 28.7 / 49.2 / 22.1
  Tracing quality: 7.3 / 92.7 / 0
  Alert: 12.5 / 77.1 / 10.3

Maternal-fetal medicine faculty member vs computer system
  Baseline heart rate: 9.6 / 85.6 / 4.8
  Acceleration: 19.8 / 62.3 / 17.9
  Deceleration: 36.2 / 49.1 / 14.8
  Tracing quality: 5.7 / 94.3 / 0
  Alert: 18.1 / 77.2 / 4.7

for alerts among most observer pairs, close to that predicted in the determination of the study sample size. The χ2 tests applied to the levels of agreement among observers showed the following: baseline FHR, χ2 = 3.26 (P = .66); acceleration frequency, χ2 = 19.6 (P = .001); deceleration frequency, χ2 = 62.5 (P = .001); signal quality, χ2 = 2.87 (P = .72); and alert frequency, χ2 = 15.9 (P = .007).

Comparisons of the observers with the computerized system are shown in Table IV. At α = .05, the registered nurse and certified nurse-midwife observers had significantly lower agreements for acceleration counts than did the obstetrics resident or the maternal-fetal medicine faculty member. All observers had similar levels of disagreement for deceleration counts (lower when compared with the computer). Observers showed a lower level of agreement with the computer for baseline FHR. However, the computer reported actual calculated baseline values, whereas the observers approximated their readings to the nearest 5 beats/min, a lower level of precision. The overall κ statistic for alerting was 0.25 (range, 0.19-0.30), which suggests a modest level of agreement for alerting between the observers and the computer system.
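The κ statistic reported above corrects raw agreement for the agreement expected by chance. A minimal computation from a 2 × 2 yes/no alert table looks like the sketch below; the counts in the usage example are invented for illustration and are not the study's data.

```python
# Cohen's kappa for a 2x2 agreement table:
#   kappa = (p_o - p_e) / (1 - p_e)
# where p_o is the observed proportion of agreement and p_e is the agreement
# expected by chance from the marginal proportions of each rater.

def cohen_kappa(table):
    """table[i][j] = number of windows where rater A gave category i and
    rater B gave category j (e.g., 0 = no alert, 1 = alert)."""
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(2)) / n
    p_e = sum((sum(table[i]) / n) * (sum(row[i] for row in table) / n)
              for i in range(2))
    return (p_o - p_e) / (1 - p_e)
```

For example, a table with 80% raw agreement and balanced marginals, such as [[40, 10], [10, 40]], yields κ = 0.6; this illustrates why the study's roughly 77%-79% raw observer-computer agreement on alerts can coexist with a much lower κ of 0.25 when alerts are rare and chance agreement is correspondingly high.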

Comment

The use of the new standardized guidelines for FHR interpretation failed to reduce interobserver differences for important features of intrapartum electronic FHR monitoring recordings. Previous studies have addressed the validity of visual interpretation of FHR tracings.2-10, 14-16 Although study designs, interpretative criteria, observer experience, and sample sizes have differed, these studies have shown that unaided visual analyses of FHR records appear to have limited reliability, reproducibility, or both. Various factors contribute to these findings, including the lack of standardized interpretative criteria8 and observer bias related to experience.9 More recent studies comparing human and computer analysis14-16 have shown that inconsistencies in visual interpretation could be virtually eliminated through the use of automated systems. Our principal findings are similar to those of Gagnon et al,14 who compared visual and computer analysis of antepartum records with an algorithmic rather than an expert system. Observers, whether given strict interpretative templates and instructions (as in this study) or a set of selected tracings with questionnaires (as in the study of


Gagnon et al14), behaved similarly. Periodic FHR events were undercounted, with lower detection of decelerations. As a consequence, observers in the study of Gagnon et al14 classified more tracings as reassuring than did the computer system, whereas observers in our study invoked fewer clinical alerts for FHR abnormalities than did the expert system. Although there were significant differences among interobserver agreements for specific FHR patterns, there were no significant differences in the directions of interobserver agreements for alerts on the basis of FHR patterns.

Our study differs from its predecessors in several important respects. First, we chose to examine only interobserver variation rather than intraobserver variation. The consistency of intraobserver agreement has been challenged by most prior surveys2-5, 10 irrespective of the criteria for assessment of FHR patterns. Further, our rationale was to simulate practice conditions under which clinicians get a “single pass” at FHR tracings before making management decisions. Next, we limited the patient-specific medical and obstetric data available to clinicians. In at least one previous study, by Beaulieu et al,2 the addition of specific clinical data did not significantly affect interobserver reproducibility. Although clinical context is extremely important in achieving an overall understanding of and perspective on FHR interpretation,17 we believed that adding other layers of data might compromise our ability to evaluate our primary end points in visual assessment of objective FHR features.

The obstetric community has eagerly awaited the development of standardized and commonly accepted criteria for visual interpretation of intrapartum FHR tracings. A study of this limited size should not be considered a no-confidence vote on the ability of the new NICHD guidelines to address the reliability of visual inspection of FHR tracings.
However, it should raise concerns that, regardless of the refinement of rules governing FHR interpretation, human observers perform in predictable ways that differ in direction and degree from those of computers programmed with similar background data and analogous interpretative criteria. It was interesting that, regardless of individual differences in visual analysis of FHR records, our observers tended to be most consistent in their triggering of clinical alerts, whether compared among themselves or with a computer system. This suggests that a principal value of visual assessment of electronic FHR monitoring data may be to provide cues that will prompt closer attention to patient care. The actual outcomes resulting from such attention will likely remain inconsistent without a fundamental change in how electronic FHR monitoring is viewed and used by the clinical community, as suggested by Cibils.18

There is a small but growing body of evidence, as summarized recently by Devoe,19 that computerized evaluation of FHR records may obviate the interpretive problems attributable to observer inconsistency. The groundbreaking work of Keith et al20 in launching a clinical trial of an expert system for intrapartum assessment shows a possible direction for electronic FHR monitoring in the new millennium. If subsequent similar trials of such advanced monitoring systems prove valid, then the transition of electronic FHR monitoring from a visual to a computer-assisted modality becomes a question not of why but of when.

REFERENCES

1. Banta HD, Thacker SB. Assessing the costs and benefits of electronic fetal monitoring. Obstet Gynecol Surv 1979;34:627-40.
2. Beaulieu MD, Fabia J, Leduc B, Brisson J, Bastide A, Blouin D, et al. The reproducibility of intrapartum cardiotocogram assessments. Can Med Assoc J 1982;127:214-6.
3. Nielsen PV, Stigsby B, Nickelsen C, Nim J. Intra- and interobserver variability in the assessment of intrapartum cardiotocograms. Acta Obstet Gynecol Scand 1987;66:421-4.
4. Borgatta L, Shrout PE, Divon MY. Reliability and reproducibility of nonstress test readings. Am J Obstet Gynecol 1988;159:554-8.
5. Lotgering FK, Wallenburg HC, Schouten HJ. Interobserver and intraobserver variation in the assessment of antepartum cardiotocograms. Am J Obstet Gynecol 1982;144:701-5.
6. Peck T. Physician’s subjectivity in evaluating oxytocin challenge tests. Obstet Gynecol 1980;56:13-6.
7. Hage ML. Interpretation of nonstress test. Am J Obstet Gynecol 1985;153:490-5.
8. Donker DK, van Geijn HP, Hasman A. Interobserver variation in the assessment of fetal heart rate recordings. Eur J Obstet Gynecol Reprod Biol 1993;52:21-8.
9. Helfand M, Morton K, Ueland K. Factors involved in the interpretation of fetal monitor tracings. Am J Obstet Gynecol 1985;151:737-44.
10. Trimbos JB, Keirse MN. Observer variability in assessment of antepartum cardiotocograms. Br J Obstet Gynaecol 1978;85:900-6.
11. American College of Obstetricians and Gynecologists. Fetal heart rate patterns: monitoring, interpretation and management. Washington: The College; 1995. Technical Bulletin No. 207.
12. Society of Obstetricians and Gynaecologists of Canada. Fetal health surveillance in labor. J Soc Obstet Gynaecol Can 1995;17:865-901.
13. Electronic fetal heart rate monitoring: research guidelines for interpretation. National Institute of Child Health and Human Development Research Planning Workshop. Am J Obstet Gynecol 1997;177:1385-90.
14. Gagnon R, Campbell MK, Hunse C. A comparison of visual and computer analysis of antepartum fetal heart rate tracings. Am J Obstet Gynecol 1993;168:842-7.
15. Hiett AK, Devoe LD, Youssef A, Gardner P, Black M. A comparison of visual and automated methods of analyzing fetal heart rate tests. Am J Obstet Gynecol 1993;168:1517-21.
16. Schneider E, Schulman H, Farmakides G, Paksima S. Comparison of the interpretation of antepartum fetal heart rate tracings between a computer program and experts. J Matern Fetal Invest 1991;1:205-8.
17. Alonso-Betanzos A, Moret-Bonillo V, Devoe LD, et al. Computerized antenatal assessment. Automedica 1992;14:3-22.
18. Cibils L. On intrapartum fetal monitoring. Am J Obstet Gynecol 1996;174:1382-9.
19. Devoe LD. Computerized approaches to fetal surveillance. In: Maulik D, editor. Asphyxia and fetal brain damage. New York: Wiley-Liss; 1998. p. 241-54.
20. Keith RD, Beckley S, Garibaldi JM, Westgate JA, Ifeachor EC, Greene KR. A multicenter comparative study of 17 experts and an intelligent computer system for managing labor using the cardiotocogram. Br J Obstet Gynaecol 1995;102:688-700.


Discussion

DR DAVID SHAVER, Charlotte, North Carolina. Monitoring of the FHR during labor has been a standard part of management of the obstetric patient for >100 years. The primary goal of monitoring is to detect fetal hypoxia and thus prevent complications of asphyxia, namely, perinatal death and long-term neurologic morbidity (primarily cerebral palsy). The introduction of electronic FHR monitoring in the 1960s and 1970s was met with widespread enthusiasm by the obstetric community. Retrospective reports suggested a decrease in the incidence of both intrapartum stillbirths and neonatal deaths among monitored pregnancies. Predictions of a subsequent decrease in cerebral palsy appeared justified.

Despite the early optimism, the publication of numerous randomized controlled trials during the next 2 decades has cast doubt on the benefits of electronic FHR monitoring. A meta-analysis of 12 randomized clinical trials by Thacker et al1 suggested that the only clinical benefit of electronic FHR monitoring was a reduction in the incidence of neonatal seizures, with no difference in perinatal mortality rate. This limited benefit came at the expense of a significant increase in operative deliveries. As a result of the published data, The American College of Obstetricians and Gynecologists has stated that electronic FHR monitoring and intermittent auscultation of the fetal heart during labor are equivalent. Nevertheless, strong proponents of the benefit of electronic FHR monitoring persist. The lack of consistent definitions and recommendations for intervention for fetal jeopardy makes interpretation of the various studies difficult. In addition, wide intraobserver and interobserver variability of FHR monitor interpretation has been repeatedly demonstrated.
It has been suggested that it is the human factor of poor interpretation and clinical decision making that accounts for the mediocre results.2 Improvements in our ability to interpret the monitoring results appear to be required, and the development of expert computer systems such as the one described by Devoe et al may be one answer. This study addressed the ability of various obstetric providers to interpret FHR records, compared with a computerized monitoring system, according to the new NICHD guidelines. The results are disappointing in that the new guidelines do not appear to improve the ability of the observers to interpret FHR monitoring tracings. I have the following questions for Dr Devoe:

1. Although only 1 hour of tracing was reviewed for each patient, were there any clinically significant episodes in which the observers and the computer differed in interpretation and management of findings?

2. Why was the initial hour of monitoring chosen for review? Presumably the last hour before delivery would have had the highest likelihood of containing significant abnormalities and would seem to be most informative.

3. Finally, because many of the clinical alerts, such as decreased variability and repetitive variable decelerations, occur with great frequency during normal labor, is there any concern about “information overload” and an increase in unnecessary intervention?

REFERENCES

1. Thacker SB, Stroup DF, Peterson HB. Efficacy and safety of intrapartum electronic fetal monitoring: an update. Obstet Gynecol 1995;86:613-20.
2. Cibils L. On intrapartum fetal monitoring. Am J Obstet Gynecol 1996;174:1382-9.

DR GENE BURKETT, Miami, Florida. The problem with electronic FHR monitor tracings is more the overall pattern than the individual changes that occur (whether there are accelerations or decelerations), and interpretation may be affected by this. Does your computer take these patterns into account so that we are better able to gauge whether they represent stress or distress?

DR DEVOE (Closing). I appreciate Dr Shaver’s helpful discussion of my paper and am happy to address the 3 questions that he has posed.

His first question was whether there were clinically significant episodes during the hour of monitoring that was presented to the observers. The answer is yes. Not all these tracings were perfectly normal, but what is clinically significant in terms of FHR pattern perturbation and what is clinically significant in terms of clinical outcome are two different issues. Although I did not present the obstetric outcome data for these pregnancies, all patients had live-born infants, none of whom required neonatal intensive care unit admission.

With respect to the choice of the initial hour of the FHR recording, we tried to create a best-case scenario. That is, if an observer who used the standardized criteria had difficulties in a clinical setting in which things were least likely to be complex or confusing, then we knew that we would have problems when data became more complex later in labor.

The final question dealt with the nature of the clinical alerts and the information provided. There were disagreements between the computer and the observers. As I tried to indicate, in the main these were disagreements that led to more alerts being invoked by the computer than by the observers. One might therefore theoretically be concerned about use of the computer leading to information overload. However, I submit that the natural human tendency in most of the studies that I have reviewed is for human observers to classify more tracings as normal than abnormal and to miss pathologic states. This has also been validated by the initial 1995 report by Keith et al1 in the United Kingdom with their automated system.

In response to Dr Burkett’s question, the computer system synthesizes interpretative patterns and performs time trend analyses rather than strictly focusing on isolated events out of context.

REFERENCE

1. Keith RD, Beckley S, Garibaldi JM, Westgate JA, Ifeachor EC, Greene KR. A multicenter comparative study of 17 experts and an intelligent computer system for managing labor using the cardiotocogram. Br J Obstet Gynaecol 1995;102:688-700.