Designing an Abstraction Instrument: Lessons from Efforts to Validate the AHRQ Patient Safety Indicators

The Joint Commission Journal on Quality and Patient Safety, Research Methods. January 2011, Volume 37, Number 1. Copyright 2011 © The Joint Commission.

Garth H. Utter, M.D., M.Sc.; Ann M. Borzecki, M.D., M.P.H.; Amy K. Rosen, Ph.D.; Patricia A. Zrelak, Ph.D., R.N.; Banafsheh Sadeghi, M.D., Ph.D.; Ruth Baron, R.N., B.Sc.N.; Joanne Cuny, R.N., M.B.A.; Haytham M. A. Kaafarani, M.D., M.P.H.; Jeffrey J. Geppert, J.D.; Patrick S. Romano, M.D., M.P.H.

The medical record serves many purposes: patient care, payment for care, legal documentation, medical research, and, with increasing prominence, efforts to ensure or improve the quality and safety of care. Although some quality measurement and improvement programs, such as The Joint Commission’s core measures1 and the American College of Surgeons’ National Surgical Quality Improvement Project,2 involve case-by-case scrutiny of the medical record, others, such as the U.S. Agency for Healthcare Research and Quality (AHRQ) Quality Indicators (QIs), instead leverage administrative data with International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis and procedure codes abstracted from the record by professional coders. The latter approach is attractive both because it is cost-efficient and because the data are collected by almost all acute care hospitals. These advantages make administrative data, and thereby indicators such as the AHRQ QIs, particularly suitable for use in state or national health care quality improvement policies, such as pay-for-performance and public reporting of hospital outcomes. Nonetheless, important disadvantages must also be acknowledged: Coding may be inaccurate, either because the documentation on which it is based is incomplete or misleading, or because coders do not select the most precise, meaningful codes. Furthermore, the logic of such indicators is constrained by the lack of detailed clinical data.
Article-at-a-Glance

Background: The U.S. Agency for Healthcare Research and Quality (AHRQ) and other organizations have developed quality indicators based on hospital administrative data. Characteristics of effective abstraction instruments were identified for determining both the positive predictive value (PPV) of Patient Safety Indicators (PSIs) and the extent to which hospitals and clinicians could have prevented adverse events.
Methods: Through an iterative process involving nurse abstractors; physicians and nurses with quality improvement experience; and health services researchers, 25 abstraction instruments were designed for 12 AHRQ provider-level morbidity PSIs. Data were analyzed from 13 of these instruments, and data are being collected using several more.
Findings: Common problems in designing the instruments included avoiding uninformative questions and premature termination of the abstraction process, anticipating misinterpretation of questions, allowing an appropriate range of response options, using clear terminology, optimizing the flow of the abstraction process, balancing the utility of data against abstractor burden, and recognizing the needs of end users, such as hospitals, quality improvement professionals, and researchers, for the abstracted information.
Conclusions: Designing medical record abstraction instruments for quality improvement research involves several potential pitfalls. Understanding how we addressed these challenges might help both investigators and users of outcome indicators to appreciate the strengths and limitations of outcome-based quality indicators and of tools designed to validate or investigate such indicators within provider organizations.

The Patient Safety Indicators (PSIs) are a subset of the AHRQ QIs that focus on potentially preventable complications for patients treated in hospitals.3 Of the 16 provider-level morbidity indicators in this subset, 8 have been endorsed (some with conditions) by the National Quality Forum, and 3 were adopted for public reporting by the Centers for Medicare & Medicaid Services (CMS).4 It is likely that others will also be endorsed and/or adopted by quality evaluators as the United States grapples with continuously increasing health care expenditures and variation in provider practices that affect resource use. The AHRQ PSI Validation Pilot Project is currently evaluating the validity of 10 PSIs (Table 1), with initial attention being paid to their positive predictive value (PPV), that is, the proportion of cases flagged by the indicator that represent the adverse outcome of interest. Other projects associated with the U.S. Department of Veterans Affairs (VA) and the nonprofit University HealthSystem Consortium (UHC) are also assessing the validity of several AHRQ PSIs, bringing the total number of PSIs undergoing validation among the three organizations to 12. As contributors to these efforts, the authors have accrued experience5–10 in developing medical record abstraction instruments to help train users to navigate the record and collect the information necessary to determine whether a flagged case represents an iatrogenic event and, if so, a deficiency in quality. To date, we have analyzed data from 13 of these instruments and are currently collecting or in the process of analyzing data using several more. In this article, we report our experience in developing abstraction instruments for the AHRQ PSIs, which may be useful to other investigators and organizations who wish to develop similar abstraction instruments based on these PSIs, future AHRQ QIs, or other indicators based on administrative data. Although development of an abstraction instrument to identify whether a complication occurred seems simple, the AHRQ, UHC, and VA projects identified several recurrent pitfalls and unanticipated challenges.
We describe what we have learned about creating an optimal set of questions for a skilled abstractor to answer to allow accurate validation of quality measures based on administrative data, building on previous work related to conducting chart review studies.11–13 We also wish to inform future users of the PSIs and other similar measures based on administrative data of the challenges inherent in examining flagged cases. Regardless of whether the PSIs are used for such potentially contentious purposes as public reporting and pay-for-performance, organizations with high complication rates may want to use these or similar abstraction instruments to learn more about the causation and preventability of complications. Future users will be able to make the best use of these abstraction instruments, and better appreciate the strengths and limitations of the PSIs, if they understand the process by which the instruments were developed and the challenges therein.

Validation of the AHRQ Patient Safety Indicators

The PSIs are explained in detail elsewhere.3 In brief, each indicator is constructed as a proportion based primarily on relevant ICD-9-CM diagnosis and procedure codes. The denominator represents all hospitalizations at risk for the complication, and the numerator represents the subset of those hospitalizations in which the complication occurred. Depending on the indicator, certain types of cases are excluded from the denominator, including those deemed by the developers to be difficult to distinguish from normal uncomplicated events (for example, “iatrogenic pneumothorax” with a diagnosis of chest trauma), those that pertain to complications during a separate episode of care (infection of a dialysis catheter present on admission), and those that represent complications that seem more difficult to prevent (postoperative wound dehiscence in a patient with severe malnutrition) or that are addressed in separate indicators (complications of care in pediatric patients).

Efforts to develop quality indicators typically focus on the following three criteria14:
1. The indicator should be both sensitive and specific in detecting true adverse events (“criterion validity”).
2. The indicator should identify cases for which there is an opportunity to improve care (“construct validity”).
3. Use of the indicator should foster real quality improvement rather than just changes in documentation and coding practices (“usability” or “feasibility”).

Our projects have focused on determining one aspect of criterion validity—PPV—and assessing the construct validity of indicators by collecting information concerning the potential preventability of the event. These are the most salient concerns for quality improvement professionals and other stakeholders who wish to use administrative data to screen for potential quality problems or to track variation in performance across providers and over time.
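The arithmetic behind these definitions can be sketched in code. The following is an illustrative simplification only; the case records, field names, and the exclusion rule are hypothetical and do not reproduce the actual PSI specifications.

```python
# Hypothetical sketch of how an administrative-data indicator rate and its
# positive predictive value (PPV) are computed. Field names and the
# exclusion rule are invented for illustration.

def indicator_rate(cases, at_risk, flagged, excluded):
    """Rate = flagged (numerator) cases / eligible (denominator) cases."""
    denominator = [c for c in cases if at_risk(c) and not excluded(c)]
    numerator = [c for c in denominator if flagged(c)]
    return len(numerator) / len(denominator), numerator

def ppv(flagged_cases, is_true_event):
    """PPV = proportion of flagged cases confirmed as true adverse events
    (for example, by medical record abstraction)."""
    confirmed = sum(1 for c in flagged_cases if is_true_event(c))
    return confirmed / len(flagged_cases)

# Toy data: four hospitalizations; one excluded (analogous to a chest
# trauma exclusion), two flagged, one confirmed on abstraction.
cases = [
    {"id": 1, "surgical": True, "trauma": False, "flag": True,  "confirmed": True},
    {"id": 2, "surgical": True, "trauma": False, "flag": True,  "confirmed": False},
    {"id": 3, "surgical": True, "trauma": True,  "flag": True,  "confirmed": True},
    {"id": 4, "surgical": True, "trauma": False, "flag": False, "confirmed": False},
]
rate, flagged = indicator_rate(
    cases,
    at_risk=lambda c: c["surgical"],
    flagged=lambda c: c["flag"],
    excluded=lambda c: c["trauma"],
)
print(rate)                                    # 2 of 3 eligible cases flagged
print(ppv(flagged, lambda c: c["confirmed"]))  # 1 of 2 flagged confirmed
```

Note that the PPV is computed only over flagged cases, which is why samples limited to flagged cases (as in these projects) can estimate PPV but not sensitivity.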

Designing the Abstraction Instruments

THE ABSTRACTION SETTING

The AHRQ project included 47 hospitals in the United States that represent a variety of sizes, ownership types, and academic affiliations across 29 states. The VA project focused on 28 geographically diverse acute care VA hospitals encompassing a broad spectrum of PSI rates. The UHC projects focused on 18–41 academic medical centers throughout the United States. Participation of all centers in the AHRQ and UHC projects was voluntary and without compensation. The AHRQ and UHC projects began in 2006 and the VA project in 2007; all three currently continue for at least one of the PSIs. For the AHRQ and UHC projects, we trained abstractors (typically, on-site nurses with interest and experience in the quality improvement process) via Web-based teleconferences and instructional resources. The VA project used nurse abstractors at a single VA hospital to access data from all study hospitals through the VA’s centralized electronic medical record system (VistAWeb). Training included review of the PSI and the rationale underlying the instrument, as well as specific guidance as to definitions of terminology, acceptable sources of information within the medical record, and the meaning of and rationale for particular questions in the data collection instrument.

Table 1. Agency for Healthcare Research and Quality (AHRQ) Provider-Level Morbidity Patient Safety Indicators (Excluding Obstetrical Indicators) Evaluated by the AHRQ, VA, and UHC Project Teams*

PSI   Complication
3     Decubitus Ulcer
5     Foreign Body Left During Procedure
6     Iatrogenic Pneumothorax
7     Selected Infections Due to Medical Care
8     Postoperative Hip Fracture
9     Postoperative Hemorrhage or Hematoma
10    Postoperative Physiologic and Metabolic Derangement
11    Postoperative Respiratory Failure
12    Postoperative Pulmonary Embolism or Deep Vein Thrombosis
13    Postoperative Sepsis
14    Postoperative Wound Dehiscence
15    Accidental Puncture or Laceration

Of these 12 indicators, the AHRQ team evaluated 10, the VA team evaluated all 12, and the UHC team evaluated 3.
* VA, Department of Veterans Affairs; UHC, University HealthSystem Consortium.

CREATING THE INSTRUMENTS

Although the AHRQ, VA, and UHC teams communicated during the process of developing instruments, each project team developed and tailored an instrument for each PSI to the needs of its organization. Many aspects of instrument development were similar across organizations, as follows:
■ Project teams consisted of 6–10 investigators, including health services researchers, physicians, nurses, and other medical specialists.
■ Instruments were developed through an iterative review process that involved pilot testing.
■ A companion document was developed for each instrument to provide data definitions and guidance to ensure uniform interpretation by abstractors.
Each organization evaluated a specific subset of PSIs (Table 1). The VA team formally assessed interrater reliability among its four nurse abstractors and continued formal pilot testing until > 90% interrater reliability was achieved on all items. All instruments were structured with a similar basic format (Sidebar 1; a more detailed version is available in the online article). The section for ascertainment of the event constituted the bulk of the instruments. We included both questions with menus of response options and questions with open-ended text fields, to help ensure standardization while still allowing abstractors the opportunity to describe details about the events. Because hospitals often use multiple electronic applications to store real-time clinical information, we encouraged abstractors to use all approved components of the record system. Because abstractors were unable to query physicians directly, we designed the instruments to include response options to indicate that the data in question were not available. We frequently asked abstractors to answer questions using specific sources, such as a relevant operative note or radiologic report. If the abstractor encountered conflicting information, we instructed him or her to report the findings from the final report or the most senior member of the team (for example, an attending physician’s note would take priority over a resident’s note) or to follow a specific hierarchy (for an estimate of intraoperative blood loss, the anesthesiologist’s note should take precedence over the surgeon’s note). To the extent possible, we focused on “explicit” criteria for each indicator15 to minimize subjectivity. The AHRQ instruments and guidelines are available online16; VA and UHC instruments and guidelines are available by request from the authors.

Sidebar 1. Basic Format of the PSI-Specific Abstraction Instruments Developed by the AHRQ, VA, and UHC Project Teams for Verification of Cases Flagged Positive for a Complication*

Section 1. Record Identification
Section 2. Demographic/Hospitalization Characteristics
Section 3. Ascertainment of the Event
—Exclusion Criteria
—Whether the Event Itself Occurred
—How/Why the Event Occurred
Section 4. Preventive Measures/Risk Factors
Section 5. Evaluation and Treatment
Section 6. Outcomes

* A more detailed version is available in the online article. AHRQ, Agency for Healthcare Research and Quality; VA, Department of Veterans Affairs; UHC, University HealthSystem Consortium.

Principles of an Effective Abstraction Instrument

FOCUSING ON THE MOST USEFUL INFORMATION

As each project team designed abstraction instruments for the PSIs, a natural tendency was to include questions that were actually unrelated to the aims of the project. For example, questions concerning risk factors for the complication seemed appropriate but served little purpose in assessing the PPV of the PSIs, given that our samples were limited to cases that flagged positive. On occasion, such questions might be of interest to determine whether the PPV of the indicator varies depending on certain patient characteristics, but this rationale rarely justified the extra abstraction burden.

LIMITING PREMATURE TERMINATION

Because abstractor time and effort are valuable, it is desirable to position questions that identify exclusionary criteria (that is, reasons why the case does not meet denominator eligibility criteria) or failure to meet numerator criteria (that is, evidence that the event in question did not occur) early in the instrument and to have the abstractor terminate data collection at that point. However, early termination of abstraction may prevent the collection of sufficient information to judge the appropriateness of exclusion. For example, in the AHRQ project, abstractors terminated early in 96 cases of PSI 13 (Postoperative Sepsis), which often seemed inappropriate (Table 2). In at least 9 of these cases, the abstractors erred in considering an elective procedure (for example, laparoscopic segmental colectomy) to have been urgent, thus prematurely terminating abstraction. Abstractors also frequently answered “no” (thus prompting termination) to a summary question of whether there was documentation that any bacteremia, septicemia, sepsis, or systemic inflammatory response syndrome (SIRS) occurred during the hospital stay. We suspect that some of these responses were inaccurate because little explanation was offered for why the record was professionally coded such that it was flagged for “postoperative sepsis.” We learned that it would have been better to have the abstractor continue collecting information until it became clearer that the case was falsely positive. For example, premature termination of data collection for PSI 13 could have been prevented by asking specifically about physician documentation, culture results, SIRS criteria, and treatment typically reserved for sepsis (for example, drotrecogin alfa). Question-response pairings that result in early termination should be clearly written to represent cases in which additional data are obviously not worth obtaining. The VA team largely avoided this early termination problem by using a single team of abstractors who were also involved in instrument development.
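This principle of guarding against premature termination can be sketched as a skip-logic rule: rather than terminating on a single summary "no," abstraction continues until several independent items are affirmatively negative. The question names and the two-negative threshold below are invented for illustration; they are not taken from the actual project instruments.

```python
# Illustrative skip logic for an abstraction instrument: early termination
# on a suspected false positive requires multiple concordant negative
# findings, not a single summary "no." Question names and the threshold
# are hypothetical.

SEPSIS_EVIDENCE_QUESTIONS = [
    "physician_documented_sepsis",        # explicit note of sepsis/SIRS
    "positive_blood_culture",
    "met_sirs_criteria",
    "received_sepsis_specific_treatment",
]

def may_terminate(responses, required_negatives=2):
    """Allow early termination only when enough independent items are
    affirmatively "no" (as opposed to merely undocumented)."""
    negatives = sum(
        1 for q in SEPSIS_EVIDENCE_QUESTIONS if responses.get(q) == "no"
    )
    return negatives >= required_negatives

# A single unanswered or undocumented summary item does not justify stopping...
print(may_terminate({"physician_documented_sepsis": "no documentation available"}))  # False
# ...but concordant explicit negatives do.
print(may_terminate({"met_sirs_criteria": "no", "positive_blood_culture": "no"}))    # True
```

The design choice here mirrors the text: a distinction is drawn between an explicit negative response and missing documentation, so that absence of evidence alone never triggers termination.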

AVOIDING QUESTIONS SUSCEPTIBLE TO MISINTERPRETATION

Another potential problem, also highlighted by Allison et al.,11 is that questions may be susceptible to misinterpretation. In the AHRQ instrument for PSI 5 (Foreign Body Left In), we included the question, “Did the patient have a foreign body that was unintentionally left in during a procedure or operation during this hospitalization?” In pretesting the instrument, we became concerned that abstractors might consider a retained foreign body to be intentional (and thus answer “no” to the question) if the foreign body was left in place because it was impossible or undesirable to retrieve. For example, a misdirected embolization coil in the cerebrovasculature might be recognized immediately and yet left in place because retrieving it would risk (further) damage to the brain. Clearly, such a situation would still be of interest as a true complication. A “no” response to a complex question does not necessarily identify which aspect of the question is incorrect. For example, with the “foreign body” question, it is possible that a “no” response means “No, no foreign body was left in,” “No, a foreign body was left in intentionally,” “No, a foreign body was left in but it did not occur as a result of a ‘procedure or operation,’” or “No, a foreign body was left in during a previous encounter.” If such information is desired, either additional questions or additional response options (including “unable to determine”) are necessary to ascertain the details. For example, in the instrument for PSI 8 (Postoperative Hip Fracture), we separated a question about documentation of any “impaired mobility on admission or a history of falls in the month prior to admission or prior to the first operation” into two questions: one addressing impaired mobility on admission and one, any history of falls prior to admission or the first operation. We also found it helpful to nest certain questions within others so that only one concept at a time would be addressed. This allowed us to classify occasional cases differently than the abstractors did, both in the direction from true-positive to false-positive and vice versa (Table 2). These revisions occurred less frequently in the VA project, perhaps because of the abstractors’ involvement in instrument design and our attention to interrater reliability.

Table 2. Frequency of Specific Events with Abstraction of Cases Flagged Positive for Specific Provider-Level Morbidity Patient Safety Indicators*

PSI  Org.   Total Cases  Early        Early Term. Known   Text Field  Text Field   Reclassified      Reclassified
            Abstracted   Termination  to Be Inappropriate Used        Relevant†    TP to FP‡         FP to TP‡
6    AHRQ   200          50 (25)      6 (12)              22 (15)     22 (100)     0/150             6/50
7    AHRQ   191          38 (20)      5 (13)              49 (32)     49 (100)     6/119             2/72
9    VA§    112          N/A          N/A                 61 (54)     61 (100)     3/82              2/30
11   UHC    609          N/A          N/A                 203 (33)    198 (98)     N/A               N/A
11   VA§    112          N/A          N/A                 83 (74)     74 (100)     0/89              1/23
12   AHRQ   155          32 (21)      Unable to assess    42 (34)     28 (67)      2/123             0/32
13   AHRQ   164          96 (59)      9 (9)||             19 (28)     19 (100)     Unable to assess  Unable to assess
15   AHRQ   249          22 (9)       4 (18)              53 (23)     32 (60)      5/227             4/22

Note: This table includes data for all eight instruments for which we have already analyzed data. Data collection is planned or in progress for the remaining instruments we have developed. “Early Termination” by the abstractor is expressed as a percentage of all cases; “Early Termination Known to Be Inappropriate,” as a percentage of all early-termination cases; use of the text field by the abstractor to describe general aspects of the case (“Text Field Used”), as a percentage of all cases not involving early termination; and “Text Field Relevant,” as a percentage of all cases in which the text field was used.
* PSI, Patient Safety Indicator; N/A, not applicable to this particular abstraction instrument; AHRQ, Agency for Healthcare Research and Quality; UHC, University HealthSystem Consortium; VA, Department of Veterans Affairs.
† We deemed information “relevant” if it pertained to any topic addressed in the abstraction instrument.
‡ Changes in classification by the project team are shown out of all cases reported by the abstractors to be true-positive (TP to FP) or false-positive (FP to TP). The project teams evaluated only abstracted data (i.e., not augmented by other sources) when evaluating the abstractors’ classification. Differences in classification frequently were attributable to discrepancies between a technical interpretation of the indicator’s criteria (the project team perspective) and a more purely intuitive one (the abstractor’s perspective).
§ The VA projects included the abstractors as part of the investigator team.
|| We were unable to assess the appropriateness of early termination for some such cases for this indicator.

DIFFERENTIATING BETWEEN “ABSENCE OF EVIDENCE” AND “EVIDENCE OF ABSENCE”

Incomplete responses are a potential pitfall even with trained abstractors, especially for long and/or complex instruments. Responses may be obviously incomplete, such as skipped questions or pages, but they can also be subtly incomplete if, for example, a question allows more than one response and not all correct responses are checked. In these cases, it is possible either that the abstractor meant for an unchecked item to indicate a negative response or that the abstractor neglected to record a truly positive response. For questions that required certainty about negative responses (for example, questions about evidence-based measures that bear on the preventability of the complication), we included response options forcing abstractors to respond with “yes,” “no,” or (if appropriate) “no documentation available.” For questions with response options that were not mutually exclusive, we included both “yes” and “no” check boxes for each option.

CHOOSING AN APPROPRIATE RANGE OF RESPONSE OPTIONS

Questions of a clinical nature can be designed for a wide range of responses. At one extreme are questions with binary responses (for example, “yes”/“no”); at the other are questions that require richly descriptive responses and should perhaps include all available documentation verbatim from the medical record. However, the optimal balance for many questions falls between these two extremes. For example, it may be important to allow the abstractor to indicate that the requested information is unavailable or to qualify simple “yes”/“no” answers. For open-ended topics, the investigator may want to constrain response options to a specific set of possible answers to encourage uniformity in data collection and to facilitate data analysis. The challenge in constraining response options is determining which options are sufficiently frequent or, if rare, represent important distinctions. For example, PSI 15 (Accidental Puncture or Laceration) pertains to a wide variety of surgical procedures, so the types of events detected might involve different anatomic regions and circumstances. In actuality, this indicator primarily detects bowel, bladder, and spinal dural injuries. If we had better characterized the anatomic distribution of these events, using either ICD-9-CM procedure codes in the administrative data or narrative reports from previous PSI validation efforts,17 then we might have been able to emphasize pertinent topics in the abstraction instrument. Unless the possible range of response options is highly circumscribed, we recommend a two-pronged approach: one question forcing the abstractor to select one (or more) response(s) from a constrained list of options and another question allowing the abstractor to cite exact wording from the medical record (subject to a character count). We frequently asked the abstractors to answer both questions even if they thought that their answer to one was sufficient. We also included a text box at the end of each instrument to allow abstractors to explain any details that might otherwise not have been apparent. Abstractors used this text box in 15%–74% of cases (Table 2). On average, abstractors presented relevant information in 91% of the cases in which they used the text box. For example, one abstractor identified diffuse intraoperative mediastinal bleeding during mitral valve repair as an accidental puncture or laceration; the description in the text box allowed us to reclassify this case as falsely positive.
Similarly, in one of the cases from UHC’s analysis of postoperative respiratory failure, the abstractor used the final text box to indicate that the physician’s diagnosis of acute respiratory failure appeared to represent a misdiagnosis based on clinical criteria.
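The two-pronged approach might be represented as a paired data structure: a constrained, analyzable response plus a character-capped verbatim excerpt, with explicit options separating "no" from "no documentation available." The class name, option lists, and character limit below are hypothetical, invented solely to illustrate the design.

```python
# Hypothetical representation of the "two-pronged" question design:
# a constrained response list plus a capped verbatim excerpt, with
# options distinguishing "unable to determine" and "no documentation
# available" from substantive answers. All names and limits are invented.
from dataclasses import dataclass

@dataclass
class TwoProngedQuestion:
    prompt: str
    options: list             # constrained, analyzable responses
    max_excerpt_chars: int = 500

    def record(self, selected, excerpt=""):
        allowed = self.options + ["unable to determine",
                                  "no documentation available"]
        if selected not in allowed:
            raise ValueError(f"response must be one of {allowed}")
        # Truncate rather than reject, so verbatim wording is preserved.
        return {"selected": selected,
                "excerpt": excerpt[: self.max_excerpt_chars]}

q = TwoProngedQuestion(
    prompt="Anatomic site of accidental puncture or laceration",
    options=["bowel", "bladder", "spinal dura", "other"],
)
resp = q.record("bowel", excerpt="Small enterotomy noted in the distal ileum...")
print(resp["selected"])  # bowel
```

Pairing the two fields in one record keeps the constrained answer analyzable while the excerpt preserves the clinical nuance that, as in the mitral valve example above, can justify reclassifying a case.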

CLARIFYING CONFUSING LANGUAGE

Medical terminology can be complex, duplicative, and imprecise. For example, medication names can be confused because of generic versus proprietary names, different formulations, and similar-sounding names or ambiguous abbreviations. Similarly, two diagnoses may refer to the same pathophysiologic entity but represent different clinical manifestations (for example, deep venous thrombosis versus “thrombophlebitis”). Generally, these problems can be addressed by clear guidance and appropriate warnings to increase the abstractor’s awareness.


Sometimes the names of complications represented by the PSIs are inherently vague, such as “respiratory failure” and “wound dehiscence.” Even with ubiquitous terms such as “elective admission,” abstractors frequently had different interpretations, for example, “the hospitalization was planned” versus “the patient desired the operation performed during this hospitalization.” Furthermore, the elective status of some hospitalizations can be ambiguous. For example, an admission for a revascularization procedure for critical limb ischemia might be nonelective if necessary for wet gangrene but elective in other circumstances. In some cases, the admission was elective but the qualifying operation was not. We found that clear definitions of terminology in guidelines and other training materials are helpful to prevent confusion but that the usefulness of these measures might be limited by “guideline fatigue.” An equally challenging problem is ensuring that abstractors accurately identify clinical information from physician narratives. For example, surgeons may document technical complications such as accidental lacerations with obtuse language such as “a defect in the bowel was encountered,” which may make it difficult to distinguish a potential complication from the underlying disease process. Sometimes, a normal step in an operation might even be mistaken for an accidental puncture (for example, an enterotomy created for access of a stapling instrument). Assessing interrater reliability, as the VA team did, is worthwhile to reduce variability in abstractor judgment. Also, VA experience suggests that having ready access to physician specialists for review of medical records may facilitate interpretation of ambiguous notation. However, there will always be matters that simply cannot be judged on the basis of the medical record alone and thus remain unknown.

MINING THE CONTENTS OF THE MEDICAL RECORD EFFICIENTLY

For efficiency, we favor ordering questions such that the abstractor does not have to repeatedly review the same section of the record. However, we learned that it is often best to include some basic questions related to the nature of the complication early in the instrument so that the abstractor gets an overview of the case, even if that involves later returning to the same section(s) of the record. This approach can simplify subsequent portions of the instrument by clarifying the context of the questions and allowing the abstractor to focus on the relevant portion of the hospitalization. For example, PSI 9 (Postoperative Hemorrhage or Hematoma) can pertain to any procedure performed during a hospital stay, not necessarily the operation that most immediately preceded it. Although we




could have asked detailed questions about each operation in sequence, we instead found it more efficient to collect detailed information on the first operation (to assess the preventability of the complication and to confirm that the hemorrhage/hematoma did not precede the first operation) and any subsequent operation thought to have led to the hemorrhage/hematoma. In general, when more than one operation occurred during an elective hospitalization, we found that the first operation almost always played at least an indirect role in any eventual complications and thus warranted attention. Similarly, when multiple episodes of a complication occurred during the same hospitalization (for example, postoperative respiratory failure), we preferred to restrict abstraction to the first episode because it typically was directly related to subsequent episodes. A related challenge is how to guide the abstractor to find the pertinent information. For example, the most relevant information about an Accidental Puncture or Laceration (PSI 15) is usually in the physician’s operative note, so we asked abstractors to copy the description of the complication verbatim from this source. Notes from nephrologists were particularly useful to ascertain the etiology and chronicity of renal failure detected by PSI 10. With increased adoption of electronic medical records, abstractors can often search free text for specific phrases or words, a technique on which the VA investigators capitalized.
This functionality helped identify some coding errors for PSI 12 in which the abbreviation “PE” was intended to represent “physical examination” but was misinterpreted by the coder as “pulmonary embolism.” However, corresponding challenges in abstracting electronic medical records include the possibilities that some information may not be apparent to abstractors (that is, they may not know where in the electronic record to look for specific items, they may not recognize when information exists but is hidden from view, or the information may still be captured in paper form) and that information may be duplicated (and thus of dubious time attribution) with “copy-and-paste” functions. In one example, an abstractor failed to recognize that several patients had received appropriate thromboembolism prophylaxis with pneumatic compression devices because they were prescribed by nursing protocol rather than by physician order.
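The kind of free-text screen the VA abstractors applied might look like the following sketch, which pulls a window of context around each standalone occurrence of "PE" so that a human reviewer can judge whether it means "pulmonary embolism" or "physical examination." This is an illustrative regular-expression example, not the actual VistAWeb search tooling.

```python
# Illustrative free-text screen: find each standalone "PE" in a clinical
# note and return surrounding context for human review. Not the VA's
# actual search tooling; the note text is invented.
import re

def pe_mentions(note_text, window=30):
    """Return context snippets around standalone occurrences of 'PE'."""
    snippets = []
    for m in re.finditer(r"\bPE\b", note_text):
        start = max(0, m.start() - window)
        snippets.append(note_text[start : m.end() + window].strip())
    return snippets

note = "PE: afebrile, abdomen soft. No evidence of PE on CT angiogram."
for snippet in pe_mentions(note):
    print(snippet)
```

Because the abbreviation is genuinely ambiguous, the sketch deliberately returns context for human judgment rather than attempting to classify the meaning automatically.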

BALANCING THE UTILITY OF DATA VERSUS ABSTRACTOR BURDEN

Some information, although desirable, is not worth the incremental effort involved in abstracting it. For example, PSI 11 (Postoperative Respiratory Failure) detects both cases that include procedure codes for mechanical ventilation and reinsertion of an endotracheal tube and cases that include a diagnosis code for acute respiratory failure. The abstraction instrument for this indicator addressed many details about the timing of endotracheal tube placement and removal, but it also had to assess the clinical evidence for respiratory failure in the 18% of cases that did not meet the procedure code criteria. Although we wanted to ask about details of noninvasive positive pressure ventilation in these cases, we chose not to because this information can be difficult to locate in the record and to summarize. Similarly, for PSI 10 (Postoperative Physiologic and Metabolic Derangement), it proved more efficient—and virtually as informative—for abstractors to answer whether the blood glucose level was ever documented to be below a threshold value (for example, 40 mg/dL) rather than to find the lowest value documented.
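The glucose-threshold shortcut has a direct computational analogue: checking whether any value crosses a cutoff can stop at the first qualifying value, whereas finding the minimum requires examining every value. A minimal sketch, in which the function name and the 40 mg/dL default are illustrative:

```python
def glucose_ever_below(values_mg_dl: list, threshold: float = 40.0) -> bool:
    """Return True if any documented glucose value falls below the threshold.

    any() short-circuits at the first qualifying value, mirroring the
    abstractor's task of scanning for one value below the cutoff rather
    than exhaustively determining the minimum across the whole record.
    """
    return any(v < threshold for v in values_mg_dl)
```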

ANTICIPATING THE NEEDS OF THE CONSUMERS OF STUDY FINDINGS

The main focus of the AHRQ and VA projects (and a secondary aim of the UHC projects) was to estimate the PPVs of the indicators, but individual hospitals or providers typically want to understand whether evidence-based preventive measures or local protocols were followed. Ultimately, knowledge of whether a complication occurred is most useful when it is linked to information about processes of care that might have prevented the complication, so we considered such questions relevant both for establishing the construct validity of the indicator and for informing quality improvement processes at the hospital and provider levels. Although the medical record alone is insufficient to identify many of the root causes of adverse events, information gleaned from review of PSI cases can serve as a useful springboard for quality investigators and hospitals to identify and address such root causes in follow-up efforts.

Different users of the PSIs can arrive at different interpretations of PPV depending on their perspective. Users focused on the narrow issue of coding accuracy might evaluate flagged cases primarily on the basis of whether coding criteria were met, whereas users interested in the clinical utility of the indicators are typically more interested in whether flagged cases represent true deficiencies in the quality of care. For example, many of the events detected by PSI 15 (Accidental Puncture or Laceration) were coded correctly (high coding PPV) but did not represent serious instances of patient harm (modest clinical PPV). To complicate matters, indicators with complex logic, such as PSI 11 (Postoperative Respiratory Failure), can be falsely positive by one criterion and falsely negative by a different criterion in the same patient. For example, if a hospitalization was coded as involving procedure 96.72 (Continuous mechanical ventilation for 96 consecutive hours or more) two days after the qualifying operation, when in fact the patient was reintubated on that date and underwent mechanical ventilation for less than 96 hours, the case would be falsely positive for the 96.72 criterion for PSI 11 but falsely negative for the 96.04 (Insertion of endotracheal tube) criterion. In general, we recommend designing the instruments to assess both coding and clinical criteria, with recognition that most PSI users will be interested primarily in clinical validity.

Table 3. Guidelines for Designing Medical Record Abstraction Instruments to Evaluate Cases Flagged by the Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicators

Guideline: Focus on the most useful information
Rationale: There is a tendency to ask about topics extraneous to whether the event in question truly happened and how it might have been prevented, but these topics serve little use for quality improvement.

Guideline: Limit premature termination
Rationale: Questions that allow early termination of the abstraction process because a case is identified as an exclusion or a false positive are prone to error unless safeguards are built in to ensure correct responses.

Guideline: Avoid questions susceptible to misinterpretation
Rationale: Questions that contain multiple clauses frequently lead to ambiguous responses; instead, questions should be simplified and presented in nested format to dissect out the salient issues.

Guideline: Differentiate between "absence of evidence" and "evidence of absence"
Rationale: The meaning of a lack of response to a question can be ambiguous; instead, force abstractors to specify negative responses.

Guideline: Choose an appropriate range of response options
Rationale: Both the abstraction process and the analysis of abstracted data are facilitated when response options closely match what abstractors prefer to record.

Guideline: Clarify confusing language
Rationale: Assume that an abstractor is unfamiliar with key terminology and define it in guidance to render responses as meaningful as possible.

Guideline: Mine the contents of the medical record efficiently
Rationale: Consider how abstractors will navigate the medical record as they progress through the abstraction instrument, but recognize that putting questions in the optimal context frequently involves nonlinear examination of the record.

Guideline: Balance the utility of data versus abstractor burden
Rationale: The medical record only yields certain kinds of information readily; to ask abstractors to look for rarely documented or burdensome information invites fatigue and may compromise the overall quality of data collection.

Guideline: Anticipate the needs of consumers of the study findings
Rationale: Addressing the needs of quality-of-care researchers as well as hospitals and clinicians involves some forethought but increases the utility of abstracted data.

Guideline: Arrange for oversight by expert clinicians
Rationale: Evaluation of quality of care and patient safety, particularly when restricted to the information available from a medical record, hinges on astute clinical judgment.
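The reintubation scenario for PSI 11 can be made concrete with a small sketch that compares coded criteria against chart-abstracted facts. The function and field names are hypothetical simplifications; the actual PSI 11 logic involves additional codes and exclusions.

```python
def evaluate_psi11_criteria(coded_procedures: set,
                            ventilation_hours: float,
                            reintubated: bool) -> dict:
    """Compare two coded criteria against chart-abstracted facts,
    returning per-criterion error flags for the same hospitalization."""
    coded_9672 = "96.72" in coded_procedures  # ventilation >= 96 consecutive hr
    coded_9604 = "96.04" in coded_procedures  # insertion of endotracheal tube
    return {
        # Coded as prolonged ventilation, but the chart shows < 96 hours:
        "false_positive_96.72": coded_9672 and ventilation_hours < 96,
        # Chart documents reintubation, but no 96.04 code was assigned:
        "false_negative_96.04": reintubated and not coded_9604,
    }
```

In the scenario described above (a 96.72 code, actual ventilation under 96 hours, and a reintubation that was never coded as 96.04), both flags are true for the same patient, illustrating why the instrument must assess each criterion separately.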

OVERSIGHT BY EXPERT CLINICIANS

Whatever the context in which the instruments are being used, the abstraction process and analysis must be overseen by expert clinicians, preferably with an interest in patient safety. Such leaders are necessary to evaluate the myriad issues that arise but that the abstraction guidelines cannot address comprehensively. The AHRQ and UHC projects provided such oversight only if it was requested by "frontline" abstractors, whereas the VA projects used specialist physicians as consultants working side by side with research staff during the abstraction process, which likely improved data quality. Similarly, hospitals that use the abstraction instruments for internal quality improvement efforts should have skilled clinicians—preferably representing the relevant disciplines—providing oversight.

Discussion

The AHRQ PSIs, as well as other indicators based on administrative data, are publicly available and are increasingly used in quality assurance, quality improvement, public reporting, and pay-for-performance programs. Eight of the provider-level morbidity PSIs (5, 6, 11, 12, 14–17) have been endorsed (some with conditions) by the National Quality Forum.4 Three of these indicators (6, 14, and 15) have been adopted by CMS as part of its hospital pay-for-reporting program. Even as efforts are under way to convert to International Classification of Diseases, 10th Revision, Clinical Modification,18 there is interest in and progress toward developing a new generation of indicators based on this classification system.19,20 The need to determine what types of events such indicators detect and how such complications might be reduced will persist for some time.

The process of designing abstraction instruments for such purposes is not as simple as it might seem. As Allison et al. stated,11 the precise specification of abstraction variables for quality assessment is a challenging process. Both explicit and implicit choices have to be made regarding what information is worth collecting, how abstractors will interpret questions, and whether the medical record will consistently provide the information needed to answer those questions. Because abstracting records is expensive and laborious, the instruments designed for this work need to be clear, logical, efficient, and accurate. By carefully considering how questions are worded and the logic of how they are ordered and presented, it is possible to anticipate many of the inherent challenges. Yet, even with scrupulous preparation, some pitfalls become apparent only in retrospect, after examining the data that abstractors collect. Through our experience validating selected AHRQ PSIs, we have identified some, but clearly not all, such problems (Table 3). Other quality improvement researchers and professionals may have different experiences, based on the application of other administrative data-based quality indicators in other settings of care. By increasing the transparency of the process by which quality indicators are validated, all stakeholders in the quality improvement field can gain a better understanding of how the indicators function, which uses they might support, and how physicians, hospitals, and other health care providers should respond to the results.

Garth H. Utter, M.D., M.Sc., is Assistant Professor, Department of Surgery, University of California, Davis, Medical Center, Sacramento, California. Ann M. Borzecki, M.D., M.P.H., is Research Scientist, Center for Health Quality, Outcomes and Economic Research, Bedford Veterans Affairs (VA) Medical Center; and Assistant Professor, Department of Health Policy and Management, Boston University School of Public Health, and Section of Internal Medicine, Boston University School of Medicine. Amy K. Rosen, Ph.D., is VA Research Career Scientist, VA Boston Healthcare System; and Professor, Department of Health Policy and Management, Boston University School of Public Health. Patricia A. Zrelak, Ph.D., R.N., is Administrative Nurse Researcher, Center for Healthcare Policy and Research, University of California, Davis. Banafsheh Sadeghi, M.D., Ph.D., is Assistant Adjunct Professor, Department of Internal Medicine, University of California, Davis, Medical Center. Ruth Baron, R.N., B.Sc.N., is Nurse Researcher, Center for Healthcare Policy and Research, University of California, Davis. Joanne Cuny, R.N., M.B.A., formerly Director of Quality, University HealthSystem Consortium, Oak Brook, Illinois, is Director of Measure Testing and Performance Improvement, Physician Consortium for Performance Improvement, American Medical Association, Chicago. Haytham M.A. Kaafarani, M.D., M.P.H., is Chief Resident, Department of Surgery, Tufts University School of Medicine, Boston. Jeffrey J. Geppert, J.D., is Research Leader, Battelle Memorial Institute, Sacramento. Patrick S. Romano, M.D., M.P.H., is Professor, Departments of Internal Medicine and Pediatrics, University of California, Davis, Medical Center. Please address correspondence to Garth H. Utter, M.D., M.Sc., [email protected].


References

1. The Joint Commission: Performance Measurement Initiatives. http://www.jointcommission.org/core_measure_set/ (last accessed Nov. 30, 2010).
2. American College of Surgeons: National Surgical Quality Improvement Project. http://www.acsnsqip.org/ (last accessed Nov. 19, 2010).
3. U.S. Agency for Healthcare Research and Quality: AHRQuality Indicators. http://www.qualityindicators.ahrq.gov/psi_download.htm (last accessed Nov. 19, 2010).
4. National Quality Forum: Press Release: National Quality Forum Endorses Consensus Standards for Quality of Hospital Care. May 15, 2008. http://www.qualityforum.org/News_And_Resources/Press_Releases/2008/National_Quality__Forum__Endoreses__Consensus_Standards_for__Quality__of__Hospital__Care.aspx (last accessed Nov. 22, 2010).
5. Sadeghi B., et al.: Cases of iatrogenic pneumothorax can be identified from ICD-9-CM coded data. Am J Med Qual 25:218–224, May–Jun. 2010.
6. Utter G.H., et al.: Detection of postoperative respiratory failure: How predictive is the Agency for Healthcare Research and Quality's Patient Safety Indicator? J Am Coll Surg 211:347–354, Sep. 2010. Epub Jul. 13, 2010.
7. Utter G.H., et al.: Positive predictive value of the AHRQ accidental puncture or laceration patient safety indicator. Ann Surg 250:1041–1045, Dec. 2009.
8. White R.H., et al.: How valid is the ICD-9-CM based AHRQ patient safety indicator for postoperative venous thromboembolism? Med Care 47:1237–1243, Dec. 2009.
9. White R.H., et al.: Evaluation of the predictive value of ICD-9-CM coded administrative data for venous thromboembolism in the United States. Thromb Res 126:61–67, Jul. 2010. Epub Apr. 28, 2010.
10. Kaafarani H.M., et al.: Validity of selected Patient Safety Indicators: Opportunities and concerns. J Am Coll Surg, in press. Epub Sep. 2010.
11. Allison J.J., et al.: The art and science of chart review. Jt Comm J Qual Improv 26:115–136, Mar. 2000.
12. Banks N.J.: Designing medical record abstraction forms. Int J Qual Health Care 10:163–167, Apr. 1998.
13. Gilbert E.H., et al.: Chart reviews in emergency medicine research: Where are the methods? Ann Emerg Med 27:305–308, Mar. 1996.
14. Davies S., et al.: Refinement of the HCUP Quality Indicators. Technical Review Number 4. Report No.: 01-0035. Agency for Healthcare Research and Quality, May 2001. http://www.qualityindicators.ahrq.gov/downloads/technical/qi_technical_summary.pdf (last accessed Nov. 19, 2010).
15. Weingart S.N., et al.: Discrepancies between explicit and implicit review: Physician and nurse assessments of complications and quality. Health Serv Res 37:483–498, Apr. 2002.
16. Agency for Healthcare Research and Quality (AHRQ): AHRQ Quality Indicators Validation Pilot. http://qualityindicators.ahrq.gov/validationpilot.htm (last accessed Nov. 19, 2010).
17. Scanlon M.C., et al.: Evaluation of the Agency for Healthcare Research and Quality Pediatric Quality Indicators. Pediatrics 121:e1723–e1731, Jun. 12, 2008. Epub May 12, 2008.
18. U.S. Department of Health & Human Services (HHS): Press Release: HHS Proposes Adoption of ICD-10 Code Sets and Updated Electronic Transaction Standards. Aug. 15, 2008. http://www.hhs.gov/news/press/2008pres/08/20080815a.html (last accessed Nov. 19, 2010).
19. De Coster C., et al.: Identifying priorities in methodological research using ICD-9-CM and ICD-10 administrative data: Report from an international consortium. BMC Health Serv Res 6:77, Jun. 15, 2006.
20. Quan H., et al.: Adaptation of AHRQ Patient Safety Indicators for Use in ICD-10 Administrative Data by an International Consortium. Aug. 6, 2008. http://www.ahrq.gov/downloads/pub/advances2/vol1/AdvancesQuan_52.pdf (last accessed Nov. 19, 2010).


Online-Only Content

Sidebar 1. Basic Format of the PSI-Specific Abstraction Instruments Developed by the AHRQ, VA, and UHC Project Teams for Verification of Cases Flagged Positive for a Complication*

Section 1. Record Identification
Abstractor identification, date of abstraction, patient/hospitalization identification

Section 2. Demographic/Hospitalization Characteristics
Dates of birth, admission, discharge; gender

Section 3. Ascertainment of the Event
— Exclusion Criteria
Is it clear that each of the inclusion criteria did apply and that each of the exclusion criteria did not apply? If not, why not? Did the event in question happen prior to this hospitalization?
— Whether the Event Itself Occurred
Did the adverse event truly occur? What specific information objectively supports whether it did or did not occur? How severe an event was it? Which part of the body was affected?
— How/Why the Event Occurred
What unique circumstances contributed to the occurrence of the adverse event? Where in the facility did it occur? Who (position, level of experience) contributed to its occurrence? What system or organizational factors might have contributed to its occurrence?

Section 4. Preventive Measures/Risk Factors
Were evidence-based practices (or consensus standards) or local protocols to prevent the adverse event followed? If not, why not? What risk factors did the patient have?

Section 5. Evaluation and Treatment
What kinds of additional testing/monitoring and treatment were necessary as a result of the adverse event?

Section 6. Outcomes
Did the patient suffer as a result of the adverse event? To where was the patient discharged? Did the patient require rehospitalization? Did the patient die?

* AHRQ, Agency for Healthcare Research and Quality; VA, Department of Veterans Affairs; UHC, University HealthSystem Consortium.


January 2011

Volume 37 Number 1

Copyright 2011 © The Joint Commission