Reliability of a Perinatal Outcomes Measure: The Optimality Index–US

Julia S. Seng, CNM, PhD, Emeline Mugisha, and Janis M. Miller, APRN, PhD

The Optimality Index–US, a recently developed perinatal clinimetric index, has been validated with both clinical and research databases. Documentation of the reliability of the instrument for medical record abstraction is needed. This paper reports outcomes of interrater reliability assessments conducted for two projects. Abstraction was supervised by the same investigator but staffed by different coders with a variety of qualifications (perinatal nurse, nurse-midwife, clinical trial professional, student research assistants). Medical records were entirely paper at one site and partially electronic at another. Reliability (reproducibility) was assessed via percent agreement between pairs of coders on charts randomly selected for audits. Mean percentage agreement was 92.7% in both projects, with a range from 89.1% to 97.8% in the first project and from 88.5% to 96.2% in the second. The sources of error differed between clinician and lay abstractors, but the number of errors did not. The average time per chart was assessed in the first project: once proficiency was achieved, the average time needed to complete coding was 24 minutes, with some additional time needed for ordering paper charts. These analyses indicate that excellent reproducibility can be achieved with the Optimality Index–US.

J Midwifery Womens Health 2008;53:110–114 © 2008 by the American College of Nurse-Midwives.

keywords: chart abstraction, clinimetric index, interrater reliability, optimality, perinatal outcomes

EDITOR’S NOTE

This article presents the outcomes of a research project that generated evidence in support of the reliability of the Optimality Index-US. An article reporting research that produced evidence in support of the validity of the Optimality Index-US will appear in the July/August 2008 issue (Volume 53, Number 4) of this Journal. These two articles augment the already published evidence about the clinimetric and psychometric properties of this maternity care measurement tool, which can be found on the ACNM Division of Research website.

INTRODUCTION

Perinatal research is frequently hampered by the lack of a global, interval-level index to represent outcomes of maternity care. Birth weight, gestational age, and infant mortality are reliable and valid outcome indicators, but often are not adequately sensitive to differences in processes and outcomes among low-risk women. Investigators can create indices specific to their research questions, but these can be subject to bias and do not permit comparison across studies. Two recently developed instruments based on World Health Organization (WHO) standards, one with 78 items and one with 5 items, are available to study the processes and outcomes of labor. However, these do not include antepartum factors that may affect labor processes and outcomes, nor do they include neonatal or postpartum information.1,2 A reliable, valid index encompassing the whole of maternity care processes and outcomes that could be selected a priori is needed. The Optimality Index–US (OI-US) is an evidence-based, clinimetric index3 designed to include heterogeneous information to reflect overall clinical status.4

Address correspondence to Julia S. Seng, CNM, PhD, University of Michigan Institute for Research on Women and Gender, G120 Lane Hall, Ann Arbor, MI 48109-1290. E-mail: [email protected]


It was adapted from a European measure originating in the 1960s and validated in two studies of midwifery care in the Netherlands.5–9 The US version was modified from the original Dutch version to reflect the US context for childbearing and to include evidence-based items whose significance in relation to outcomes is supported by the literature.4 The OI-US is designed to yield a summary score reflective of processes and outcomes of maternity care, taking preexisting risk into account. It is based on the premise that “optimal” maternity care obtains the best outcomes with the least intervention required.6 Details on the development of the measure have been published, and links to these papers, the coding documents, guidelines, and the bibliography for the evidence base of the items are available via the American College of Nurse-Midwives, Division of Research, Optimality Index Work Group Web site.

The OI-US, current as of July 2007 (Optimality Working Group, personal correspondence, 2007), is a 2-part instrument comprising 54 items in total. The optimality index score, which is the outcome index, has 40 items distributed over four clinical domains: antepartum, intrapartum, neonatal, and postpartum. There is also a 14-item perinatal background index (PBI), comprising demographic, social history, and obstetric history items, that can be used with the outcome index to control for pre-existing risk. The outcome index items include conditions (e.g., preeclampsia) and interventions (e.g., fetal surveillance, medication, electronic fetal monitoring, induction, and epidural anesthesia) that incrementally deduct from the total score in a self-weighting manner. Also included are elements of labor management (e.g., auscultation of fetal heart tones, use of non-supine position in labor, and skin-to-skin contact) that contribute to optimality.7
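To make the self-weighting, percent-style scoring concrete, the sketch below shows one way such a score could be computed. It is a minimal illustration only: the item names, the simple 1/0/None coding, and the rule of dropping missing or non-applicable items from the denominator are our assumptions (the denominator adjustment is described in the Methods section below), not the published OI-US scoring rules.

```python
# Illustrative sketch only: item names and coding are hypothetical,
# not the published OI-US item set or scoring rules.

# Each abstracted item is coded 1 (optimal), 0 (not optimal),
# or None (missing or not applicable for this case).
chart_items = {
    "preeclampsia_absent": 1,
    "no_induction": 0,           # labor was induced
    "no_epidural": 0,
    "spontaneous_vaginal_birth": 1,
    "episiotomy_avoided": None,  # logically missing: first-stage cesarean
}

def optimality_score(items: dict) -> float:
    """Percent of applicable items coded optimal.

    Items coded None (missing or not applicable) are dropped from the
    denominator, mirroring the denominator adjustment described in the
    Methods section below.
    """
    applicable = [v for v in items.values() if v is not None]
    if not applicable:
        raise ValueError("no applicable items for this case")
    return 100.0 * sum(applicable) / len(applicable)

print(f"{optimality_score(chart_items):.1f}%")  # 50.0%
```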

Aspects of the validity of the OI-US have been assessed in several studies. The validation of individual items was done using archived data on 1286 cases from a research study of planned home birth.10 A pilot study in which overall (outcome index and PBI) OI-US scores were strongly negatively correlated with post-traumatic stress disorder symptom level (n = 22; Pearson’s r = −.725; P < .001) suggested predictive validity for use in outcomes research.11 Another validation study was conducted using the archived clinical dataset (n = 3425) of a midwifery group practice composed primarily of women at low and moderate risk for obstetric complications.12 This analysis showed convergent validity in that scores were equivalent when comparing two randomly selected halves of the sample. It also demonstrated discriminant validity, with a large effect size (Cohen’s d = −1.4), by distinguishing the scores of women who remained in midwifery care throughout their course (mean = 84%, standard deviation [SD] = 8%) from those whose condition changed to require obstetrician consultation, co-management, or transfer of care (mean = 71%, SD = 10%). An outcome study examining outcome index scores in a sample of moderate- and high-risk midwifery and physician clients (n = 375) extends validation information across the low- to high-risk continuum.13

Thus, the OI-US instrument has support for item validity and evidence of convergent, discriminant, and predictive validity. Importantly, it demonstrates sensitivity to differences in low-risk as well as high-risk samples, and in low- as well as high-intervention models of care. These qualities make the OI-US a very suitable outcome measure for research on processes and outcomes of maternity care, regardless of the risk profile of the women, the level of intervention in the setting, or the type of maternity care provider. The sensitivity to differences, even within low-risk samples, suggests that it is ideally suited to outcomes research studies of midwifery care. To date, however, the validation studies of both the European and US versions relied on data abstracted mostly from standard reporting summary forms routinely completed by midwives or from archived electronic data of mostly low-risk samples of women.

Julia S. Seng, CNM, PhD, is a research associate professor at the University of Michigan Institute for Research on Women and Gender and School of Nursing and research assistant professor in the School of Medicine Department of Obstetrics and Gynecology. Emeline Mugisha is completing a self-designed Community Health Sciences major in the College of Literature, Science and the Arts at the University of Michigan. She is interested in global and women’s health. Janis M. Miller, APRN, PhD, is a women’s health nurse practitioner and associate research scientist and assistant professor at the University of Michigan School of Nursing and research assistant professor in the School of Medicine, Department of Obstetrics and Gynecology.


Reliability of the measure for data collected by full chart abstraction has not been previously assessed. Abstraction can be an error-prone and costly process. Therefore, for both scientific and pragmatic reasons, we tested the data abstraction process for efficiency and reliability. The most important reliability characteristic for a clinimetric index is reproducibility, wherein abstractors locate the same information in a medical record and assign it the same numerical code. The purpose of this paper is to report on interrater assessments of reliability of chart abstraction using the OI-US data collection form, with its accompanying coding guidelines, from two projects where (1) the full range of low- to high-risk care is occurring, (2) chart data are abstracted by research assistants, and (3) electronic and paper records both contribute necessary data. A secondary purpose is to report reasons for error and time estimates for abstraction.

METHODS

Based on recommendations for reliable abstraction of medical record data,14 we used an iterative, interactive process during start-up of both projects. Training included orientation to the optimality concept and item-by-item discussion of the definition and significance of each item in the index, as well as orientation to the paper and electronic components of the charts. On items where technical knowledge would be required, supporting documentation was included as a supplementary note sheet. A group process was used to edit definitions and augment the coding tools until coding, including application of rules and judgments made in unusual cases, was generally consistent.

Project 1 (principal investigator Sampselle, R01 NR04007) was a study of pelvic floor outcomes in relation to spontaneous versus directed bearing-down efforts. Ten charts were selected early in the project to verify reliability. The random digit assignment resulted in the selection of charts representing eight vaginal births and two cesarean births. The original coding was performed by a graduate student research assistant, and the second coding was done by a nurse-midwife (J.S.S.). The research assistant was a social worker with no clinical education or experience related to maternity care. The computerized obstetric prenatal record was reviewed by scrolling through screens. The labor, neonatal, and postpartum documentation was on paper chart forms. Project 1 charts were drawn from a single tertiary-level hospital and included a range of high- to low-risk women. Three-quarters were clients of physicians, and one-quarter were clients of certified nurse-midwives. The rate of cesarean birth was 25%.

Project 2 (principal investigator Seng, R01 NR008767) is an ongoing study of the effects of post-traumatic stress on perinatal outcomes. Eleven charts were selected by random digit assignment as part of a periodic 5% audit, including eight vaginal and three cesarean births. The original coders were a labor and delivery nurse and a certified clinical research professional, both of whom routinely abstract obstetric charts for other studies. The second coding was performed by an undergraduate research assistant (E.M.) for the audit. Project 2 charts came from two hospitals. At the first, records are maintained entirely on paper and are coded by the nurse abstractor. At the second, which is the same site as that in project 1, the records are in mixed formats and are coded by the certified clinical research professional abstractor. Approximately half the participants in project 2 come from each site. This study’s sample also included low- and high-risk women, cared for by both nurse-midwives (30%) and physicians (67.8%; 2.2% missing data). The rate of cesarean section in the project 2 sample was 36%.

Because 6 of the 54 items pertained only to multiparous women (regarding obstetric history) and both of our studies included only nulliparous women, we eliminated those 6 items from the chart abstraction process. Three additional items were eliminated before abstraction because the information is not reliably charted at these institutions (skin-to-skin contact, non-supine position, and non-directed pushing). Therefore, 45 items could be consistently abstracted in each of the two projects.

Items were coded using the potential values of yes, no, missing data, or not applicable. Missing data referred to instances where information was randomly absent (e.g., someone failed to record it in the chart). Not applicable was coded when the data were logically (systematically) missing, as would occur for some items relevant for a vaginal birth if delivery was by cesarean. For example, all second-stage items relevant to a vaginal birth (e.g., episiotomy) are coded as logically missing if the cesarean birth occurred in first-stage labor. The number of items that have missing data or are not applicable is subtracted from that case’s denominator. We considered there to be lack of agreement if the codes assigned to an item were not identical. Interrater reliability was assessed by calculating percent agreement.
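As an illustration of the agreement calculation just described, the following sketch compares two coders’ item codes for one chart. The data structures are hypothetical, and the treatment of items that both coders marked missing or not applicable reflects one plausible reading of the denominator adjustment described above.

```python
# Illustrative sketch: percent agreement between two coders on one
# chart. Any non-identical pair of codes counts as a disagreement.
YES, NO, MISSING, NA = "yes", "no", "missing", "n/a"

def percent_agreement(coder_a: dict, coder_b: dict) -> float:
    """Percent of compared items on which both coders assigned the
    identical code. Items that both coders marked missing or not
    applicable are dropped from the chart's denominator (one plausible
    reading of the Methods description)."""
    excluded = {MISSING, NA}
    shared = coder_a.keys() & coder_b.keys()
    compared = [k for k in shared
                if not (coder_a[k] in excluded and coder_b[k] in excluded)]
    if not compared:
        raise ValueError("no comparable items on this chart")
    agreed = sum(coder_a[k] == coder_b[k] for k in compared)
    return 100.0 * agreed / len(compared)

# Hypothetical codes for three items on one audited chart.
coder_a = {"induction": YES, "episiotomy": NA, "fhr_abnormality": NO}
coder_b = {"induction": YES, "episiotomy": NA, "fhr_abnormality": YES}
print(f"{percent_agreement(coder_a, coder_b):.1f}%")  # 50.0%
```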

The types and rates of errors were studied, and the types of errors made by lay assistants and clinical professionals were compared. In the first study, the graduate student abstractor logged her start and end times so that we could calculate the average amount of time needed per record reviewed. Both of these parent project studies were conducted with the full informed consent of participants, under approval of the university’s institutional review board.

RESULTS

Interrater Assessment

In project 1, a total of 28 disagreements occurred across the set of 10 audited records, which included two first-stage cesarean deliveries. The mean percentage agreement was 92.7%, and the range covered 8.7 percentage points, from a low of 89.1% to a high of 97.8%.

In project 2, a total of 36 disagreements occurred across the 11 audited records, three of which were cesarean births. The mean percent agreement again was 92.7%, with a range of 7.7 percentage points, from 88.5% to 96.2%. The chart with 88.5% agreement was an outlier in the second project, with the low agreement mostly caused by differing coding judgments about a very complicated twin case. If this case is removed from the analysis, the mean percent agreement for project 2 is 93.1%, with a tighter range of 5.8 percentage points, from 90.4% to 96.2%.

Error Analysis

Although clinician and lay abstractor rates of errors were similar, the reasons for errors differed. Clinicians tended to rely more heavily on the narrative notes and applied clinical knowledge. For example, the clinician captured an instance of fetal heart tone abnormality from a numerical description in a provider progress note (“baseline 190s”) that the lay research assistant missed because she had been trained to look for the word “tachycardia.” Conversely, the nurse-midwife clinician missed social history information listed on a separate screen of the electronic prenatal record when she failed to navigate to that part of the record because the provider’s labor admission note stated “negative social history.” The greatest proportion of errors (approximately half) occurred in the labor and delivery items, largely because of difficulty in reading the handwriting in narrative notes.

Time Estimate and Staffing Choice

Data to estimate the time needed per chart were logged by the research assistant throughout project 1. The average was 41 minutes in the early part of implementation. The average time per chart toward the end of that 200-chart project was 24 minutes. Additional time was needed to order paper charts from the medical records department and to access the electronic files when the system was working slowly, making 30 minutes per record a reasonable estimate of the time required over the life of a large project.

DISCUSSION

Results of these two small reliability assessments of the OI-US for chart abstraction demonstrate satisfactory agreement. There is no standard for acceptable reliability of medical record data coding. While one set of authors considers 95% to be an “extraordinarily stringent benchmark,”14 another group recommends 79.5% agreement as a minimum standard.15 Thus, the interrater agreement rate of 93% achieved in both of these projects would be considered excellent. This rate was obtained for both paper-only and mixed electronic and paper records. Rates of error were similar whether abstraction was done by a trained lay student research assistant or by experienced clinical professionals.

This 93% reliability level compares favorably with a more ideal process, used in the outcome research study cited above.13 In that study, the OI-US items were included on data collection forms routinely completed at discharge by clinicians or by medical or midwifery students involved with the client. Only missing data needed to be retrieved by the researchers via medical record abstraction. An interrater reliability assessment done at the start of that study, comparing the clinician-completed forms with researcher-completed record review on a subset of 10 cases, demonstrated 95% agreement.

Because the sources of error and the best abstraction process are likely to be idiosyncratic to each institution’s medical record system, a reliability assessment unique to each study may be necessary. Our process demonstrated successful reliability in studies crossing two hospital systems, involving a mix of electronic and paper charts, and employing abstractors with a variety of backgrounds.

In the future, there are potential modifications that could further improve the reliability of the OI-US for chart abstraction, which other investigators may wish to consider. First, new electronic medical record software programs that permit queries could be programmed to report a large proportion of the data used in the OI-US, thereby avoiding a proportion of human error. Second, collection and entry of raw data that retain the original units or values recorded in the chart may further reduce error. The raw data can later be converted to “0” (not optimal) and “1” (optimal) using software programming to recode. For example, a diagnosis of fetal tachycardia, bradycardia, or late decelerations would all fall under the single OI-US variable (coded value) of fetal heart rate abnormality (0 = not optimal); the original discrete diagnosis is lost in the data abstraction. Similarly, there is loss of interval-level data for particular variables, such as Apgar score ≥ 8 (1 = optimal), if the more precise 0 to 10 Apgar score itself is not abstracted.
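Below is a minimal sketch of the 2-step approach suggested here, using the two examples given in the text (collapsing discrete fetal heart rate diagnoses into a single abnormality variable, and dichotomizing the Apgar score at ≥ 8). The field names and the exact recoding rules are illustrative assumptions, not a published specification.

```python
# Illustrative sketch of the 2-step approach: abstract raw values
# first, then recode to OI-US-style 0/1 by rule. Field names and
# thresholds are assumptions for illustration (the Apgar >= 8 rule is
# the example given in the text).

raw_record = {
    "fhr_diagnosis": "tachycardia",  # as charted; diagnosis is retained
    "apgar_5min": 9,                 # original 0-10 score is retained
}

def recode_fhr(diagnosis: str) -> int:
    # Any charted abnormality collapses to the single OI-US-style
    # variable "fetal heart rate abnormality" (0 = not optimal).
    abnormal = {"tachycardia", "bradycardia", "late decelerations"}
    return 0 if diagnosis in abnormal else 1

def recode_apgar(score: int) -> int:
    # Interval-level Apgar is preserved in raw_record; only the
    # recoded variable is dichotomized (>= 8 is optimal).
    return 1 if score >= 8 else 0

coded = {
    "fhr_abnormality": recode_fhr(raw_record["fhr_diagnosis"]),
    "apgar_ge_8": recode_apgar(raw_record["apgar_5min"]),
}
print(coded)  # {'fhr_abnormality': 0, 'apgar_ge_8': 1}
```

Because the raw values are retained, recoding rules of this kind can be revised and rerun without re-abstracting charts, which is one motivation for the 2-step process.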

The effect on reliability, efficiency, and cost of a 2-step rather than 1-step process of abstracting and coding data into the OI-US format should be evaluated within the context of the broader goals of the research study. Third, some error could be reduced by not orienting the abstractors to the optimality concept. We found improvement in reliability and speed when the notion of judging what is “optimal” was set aside during data collection and replaced with a simple coding system indicating “yes” or “no” for item occurrence. This is especially likely to be the case if clinicians do the reviews, because overinterpretation can lead to errors.14 Finally, based on our pattern of errors, assigning the relatively easier but more tedious coding of the prenatal, neonatal, and postpartum items to lay research assistants, and assigning the more challenging coding of intrapartum abstraction to clinicians, might marginally increase reliability but would likely decrease efficiency.

All of these suggestions assume that data are not routinely summarized. Of course, including OI-US items in summary reports that clinicians complete at the time of delivery and update before discharge would likely enhance efficiency and could further increase reliability, assuming adequate training in coding for the clinicians themselves.

Although we used a relatively small number of charts (a 5% audit, 21 charts in total) and a small number of coders (five), the results of this study document that high interrater reliability rates can be achieved in chart abstraction for the OI-US.

We thank Holly C. Baier, RN, BSN, Diane Dengate, BSN, RNC, and Suzanne Daley, CCRP, for their work on this project; Abigail G. Hanson, MSW, for her substantial contributions to the training materials for the chart abstraction process; Carolyn Sampselle, RNC, PhD, for conceptual and editorial contributions; Yoram Sorokin, MD, and Cathy Collins Fullea, CNM, for essential support for the STACY Project; and Judith Fullerton, CNM, PhD, and Patricia A. Murphy, CNM, DrPH, for technical review. Funded as part of National Institutes of Health grants R01 NR04007, principal investigator Carolyn Sampselle, and R01 NR008767, principal investigator Julia Seng. Ms. Mugisha’s work was supported by the University of Michigan Undergraduate Research Opportunity Program (UROP).

REFERENCES

1. Sandin Bojo AK, Hall-Lord ML, Axelsson O, Uden G, Wilde Larsson B. Midwifery care: Development of an instrument to measure quality based on the World Health Organization’s classification of care in normal birth. J Clin Nurs 2004;13:75–83.

2. Chalmers B, Porter R. Assessing effective care in normal labor: The Bologna score. Birth 2001;28:79–83.

3. Wright J, Feinstein A. A comparable contrast of clinimetric and psychometric methods for constructing indexes and rating scales. J Clin Epidemiol 1992;45:1201–18.

4. Murphy PA, Fullerton JT. Measuring outcomes of midwifery care: Development of an instrument to assess optimality. J Midwifery Womens Health 2001;46:274–84.

5. Prechtl HF. Neurological findings in newborn infants after pre- and perinatal complications. In: Jonxis JHP, Visser HKA, Troelstra JA, editors. Aspects of prematurity and dysmaturity, Proceedings of Nutricia Symposium. Leiden: Stenfert Kroese, 1968:305–21.

6. Prechtl HF. The optimality concept. Early Hum Dev 1980;4:201–5.

7. Low LK, Miller J. A clinical evaluation of evidence based maternity care using the optimality index. J Obstet Gynecol Neonatal Nurs 2006;35:786–93.

8. Wiegers TA, Keirse M, Berghs G, van der Zee J. An approach to measuring quality of midwifery care. J Clin Epidemiol 1996;49:319–25.

9. Wiegers TA, Keirse M, van der Zee J, Berghs G. Outcome of planned home and planned hospital births in low risk pregnancies: Prospective study in midwifery practices in the Netherlands. BMJ 1996;313:1309–13.

10. Murphy PA, Fullerton J. Outcomes of intended home births in nurse-midwifery practice: A prospective descriptive study. Obstet Gynecol 1998;92:461–70.

11. Seng JS, Low LK, Ben Ami D, Liberzon I. Cortisol level and perinatal outcome in pregnant women with posttraumatic stress disorder: A pilot study. J Midwifery Womens Health 2005;50:392–8.

12. Low LK, Seng JS, Miller JM. The use of the Optimality Index-US in perinatal outcomes research: A validation study. J Midwifery Womens Health 2008;53(4): in press.

13. Cragin L, Kennedy HP. Linking obstetric and midwifery practice with optimal outcomes. J Obstet Gynecol Neonatal Nurs 2006;35:779–85.

14. Eder C, Fullerton J, Benroth R, Lindsay SP. Pragmatic strategies that enhance the reliability of data abstracted from medical records. Appl Nurs Res 2005;18:50–4.

15. Waltz C, Strickland O, Lenz E. Measurement in nursing research. Philadelphia: F.A. Davis, 1991.
