Joint Commission
Journal on Quality and Safety
Performance Measures
Methodological Issues in Public Reporting of Patient Perspectives on Hospital Quality
Judith K. Barr, Sc.D.; Sara Banks, Ph.D.; William J. Waters, Ph.D.; Marcia Petrillo, M.A.
Increasing attention is being focused on public reporting of patient satisfaction and experience with hospital care, both across the United States and at the state level. Currently, the Centers for Medicare & Medicaid Services (CMS) has initiated pilot projects and special studies to measure patient perspectives on hospital quality with a standardized instrument, the Consumer Assessment of Health Plans (CAHPS®), that could be used to compare hospitals in local areas, statewide, and nationally.1 This measurement addresses one traditional outcome measure relating to the patient-centeredness component of hospital quality.2,3

Although hospitals throughout the United States are using vendor-provided or "homegrown" measures of patient satisfaction, comparative reports on hospital patient satisfaction and experience using a standard survey are few. A small but growing number of states and localities have, using standard instruments, already undertaken hospital patient surveys and reported the results to the public in a comparative format. However, relatively little is known about the underlying methodological approaches.4

To learn about the different approaches used in developing such reports, we conducted an in-depth review of comparative public reports on hospital patient satisfaction and experience with care. The major purposes of this review were as follows:
■ Identify existing public reports on hospital patient satisfaction and experience with care
■ Assess similarities and differences in survey and statistical methodologies used in these public reports
■ Discuss the advantages and disadvantages of various methodologies
Article-at-a-Glance
Background: Increasing attention is being focused on public reporting of patient satisfaction and experience with hospital care, both nationally and at the state level. Comparative reports on hospital patient satisfaction use a standard survey, but little is known about the underlying methodological approaches for reporting these quality measures.
Methods: Literature, Web sites, and key informants were used to identify nine public reports. In-depth reviews were conducted to determine approaches to collecting, analyzing, and publicly reporting comparative data. Data were grouped into four analytic categories: survey, sampling, computation of scores, and reporting of scores.
Results: The reports were similar in response rates and sampling procedures but differed in the number of hospitals included, the survey instrument, and survey procedure. The reports varied considerably in the techniques for computing hospital scores and decisions about reporting scores.
Conclusions: Reports from nine locales illustrate the decision making necessary to produce comparative reports on hospital patient satisfaction. Differences stem from decisions about the survey instrument and statistical decisions about how to interpret and report data. These issues should be clearly delineated as part of any public reporting process.
Methods
The initial step in the review process was to identify all reports of hospital patient satisfaction that present comparative data and make the data available to the public. The next step was to compare and contrast these reports on several key dimensions relevant to data collection and public reporting of patient satisfaction survey data.

We used a combination of a literature search, Web-site searches, and key informant telephone calls to determine and verify the existence of public reports comparing patient satisfaction across hospitals as of June 2002. Reports were solicited from the source or downloaded from Web sites. If a technical report had been produced, it was obtained whenever available. For this review, we selected only public reports that met the following three criteria:
1. Data based on patient surveys of satisfaction and experience with care received in hospitals
2. Data reported in a format that shows comparisons among hospitals
3. Report disseminated to the public (including employees), either hard copy or Web-based

For the review of methodological approaches, we focused first on information in the public reports and the companion technical reports, when available. We reviewed four categories of information: (1) survey facts, (2) sampling procedures, (3) computation of scores, and (4) reporting of scores. To clarify and complete information not found in either the public or technical reports, key informant telephone calls were made to individuals knowledgeable about the hospital reports in the local area. The gaps found were primarily for information on methods of data collection and statistical analysis of scores.

Nine* public reports in the United States and Canada, as shown in the Appendix (page 578), met the selection criteria:
■ Rhode Island Department of Health report (RI)
■ California Institute for Health Systems Performance/California HealthCare Foundation report (CA)
■ Massachusetts Health Quality Partnership report (MA)
■ Ontario Hospital Association report (ONT)
■ Four Hospital Profiling Project reports: Southeast Michigan (SEMI), Buffalo (BUF), Indianapolis (IND), and Cleveland (CLE)
■ Niagara (Western New York) Health Quality Coalition report (WNY)

In addition to the public report, RI, CA, MA, and ONT produced companion technical reports available to the public on Web sites.

* Similar public reporting has been published by the Victoria (Australia) Department of Human Services: Victorian Patient Satisfaction Monitor, Annual Survey Report: Year Two, 1 Sep. 2001–31 Aug. 2001. http://www.health.vic.gov.au/patsat/ (last accessed Jul. 27, 2004).
Results

Survey Facts
Table 1 (page 569) displays the characteristics of the nine public reports and patient surveys. Four reports were sponsored by an employer (SEMI, BUF, IND, CLE), two by a coalition (MA, WNY), one by a hospital association (ONT), one by a health department (RI), and one by an independent public corporation/philanthropic organization (CA). These reports were state/province-based, regional, or city-based. The number and timing of report cycles varied, as of June 2002, from four rounds for SEMI and WNY to the first round for MA (1998), CA (2001), and RI (2001).†

Locales also varied in the number of hospitals surveyed. RI achieved 100% participation from acute care and selected specialty hospitals (an adult psychiatric and a rehabilitation hospital) in the state because of its legislative mandate.5 All other locales relied on voluntary participation, ranging from 9% of invited hospitals in IND to 95% of the acute care hospital systems in ONT and 100% of the larger hospitals in the Buffalo-Niagara region of WNY.

All but RI used a three-wave survey mailing strategy to maximize response rates. This process included an initial mailing of the survey, a reminder postcard mailing in one to two weeks, and a remailing of the survey two to three weeks later. RI opted for a two-wave mailing procedure, an initial mailing plus a reminder postcard one week later. (For discharges from the psychiatric facility, RI followed an alternative procedure, handing the survey and a business-reply envelope to patients with no mailed reminders.)

† Additional reports have become available since this work was completed: Ontario (2002), Rhode Island (2003), California (2003), and Western New York (2004).
Table 1. Survey Facts
Locale | Sponsoring Organization | Survey Vendor* | Number of Hospitals in the Survey | Survey Procedure | Mean Patient Response Rate | Alternate Language Surveys
California (CA) | California Institute for Health Systems Performance and California HealthCare Foundation | Picker | 113 (30% of eligible hospitals) | 3-wave mailing | 43% | Spanish and Chinese surveys available through 3 options: patient requests from cover letter, hospital pre-identifies, or hospital double mails. Most hospitals declined options or chose patient request.
Massachusetts (MA) | Massachusetts Health Quality Partnership | Picker | 58 (76% of acute care hospitals) | 3-wave mailing | 47% | Spanish, Russian, Khmer, and Portuguese surveys available. Some hospitals pre-identified patients; others provided request option in cover letter.
Ontario (ONT) | Ontario Hospital Association | Parkside | 95 hospital systems (95% of eligible acute care hospital systems in ONT) | 3-wave mailing | 40% | English and bilingual (English/French) surveys available. Hospitals selected their preferred version.
Rhode Island (RI) | RI Department of Health | Parkside | 11 general, 2 specialty (100% of general hospitals in RI) | 2-wave mailing (for psych: one-wave hand-out) | 43% | Spanish survey available by patient request from cover letter. Patient must call 800 number to get alternate copy.
SE Michigan (SEMI) | Hospital Profiling Project | Picker | 15 (52% of invited hospitals in region) | 3-wave mailing | No info | Determined no need.
Buffalo (BUF) | same | same | 18 (78% of invited hospitals in city) | same | same | same
Indianapolis (IND) | same | same | 2 (9% of invited hospitals in city) | same | same | same
Cleveland (CLE) | same | same | 7 (30% of invited hospitals in city) | same | same | same
Buffalo-Niagara region (WNY) | Niagara Health Quality Coalition | Picker | 15 (100% of larger hospitals in region) | 3-wave mailing | 49% | Determined no need.
* The Picker Institute was acquired in 2001 by National Research Corporation (NRC); Parkside Associates was acquired in December 2000 by Press Ganey Associates, Inc.
The average response rates across the locales were consistently in the 40% range for those that were available.

Four locales made alternate-language surveys available for patients who preferred this option, in the form of a bilingual cover letter with a toll-free number (RI) or a bilingual survey that hospitals could select depending on their patient characteristics (ONT). CA provided three options for the use of alternate-language surveys: a request option in the survey cover letter, hospitals pre-identifying patients whose primary language is not English, or hospitals mailing surveys in both English and a second language. In MA, some hospitals pre-identified patients, and other hospitals had patients request an alternate-language survey. The four locales in the Hospital Profiling Project and WNY determined that there was no need for alternate-language surveys.
Table 2. Sampling Procedures
Locale | Number of Patients Sampled | Inclusion Criteria | Sampling Stratification
California (CA) | 600 patients per hospital; 300 patients per hospital if < 2,500 annual discharges; 100% of eligible patients if < 300 eligible discharges | (1) medical, surgical, or obstetrics (OB) (2) adult (3) overnight stay (4) discharged to home | Selected equal numbers from 3 services
Massachusetts (MA) | 600 patients per hospital | (1) medical, surgical, or OB (2) adult (3) overnight stay (4) discharged to home | Selected equal numbers from 3 services
Ontario (ONT) | Approximately half of participating hospitals surveyed 500 patients per hospital; the other half surveyed > 500 patients per hospital | (1) medical or surgical (2) adult and pediatric (3) overnight stay | Some larger hospitals used a stratified sampling procedure to survey multiple programs or sites, yielding larger sample sizes; otherwise, simple random sample of 500 per hospital
Rhode Island (RI) | 325 patients per service type per hospital; 100% of patients if < 25 patients per service type per week | (1) medical, surgical, OB, rehab, or psychiatry (2) adult (3) overnight stay (4) discharged to home | Selected equal numbers from 3 services plus a rehabilitation and a psychiatric hospital
SE Michigan (SEMI) | 600 patients per hospital (200 medical, 200 surgical, 200 OB) | (1) medical, surgical, or OB (2) adult (3) overnight stay (4) discharged to home | Selected equal numbers from 3 services
Buffalo (BUF) | same | same | same
Indianapolis (IND) | same | same | same
Cleveland (CLE) | same | same | same
Buffalo-Niagara region (WNY) | 600 patients per hospital | (1) medical, surgical, or OB (2) adult (3) overnight stay | Selected equal numbers from 3 services
Sampling Procedures
Table 2 (above) displays the sampling procedures for the nine public reports. All but ONT used a sampling design in which equal numbers of patients were randomly selected from each hospital service type (for example, medical, surgical, obstetrical) surveyed. In ONT, most hospitals selected equal numbers of medical and surgical patients, although some larger hospitals used a stratified sampling procedure. RI's sampling plan also included patients from two specialty hospitals (rehabilitation and psychiatric). The number of patients sampled ranged from 600 patients per hospital (that is, 200 per service type) in seven locales to 325 per service type per hospital in RI. For smaller hospitals, 100% of discharges were selected.

All locales required an overnight stay in either a medical or surgical unit to be eligible for sampling, and all but ONT required the sampled patient to be an adult (age 18 years or older). Only ONT excluded obstetrical patients, and only RI included patients from the psychiatric and rehabilitation hospitals. All locales except ONT and WNY specified inclusion only of those patients discharged to a personal residence.
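The equal-allocation design described above is simple to express in code. The sketch below is purely illustrative and is not drawn from any of the nine reports: the record structure, field names, and the 200-per-service target are assumptions used only to show the selection logic, including the 100% sample taken when a service has fewer eligible discharges than the target.

```python
import random

def sample_discharges(discharges, services=("medical", "surgical", "obstetric"),
                      per_service=200, seed=None):
    """Equal-allocation random sample of eligible discharges by service type.

    `discharges` is a list of dicts, each with a "service" key (an assumed,
    illustrative structure).  If a service has fewer eligible discharges than
    `per_service`, all of them are taken, mirroring the 100% sampling rule
    described for smaller hospitals.
    """
    rng = random.Random(seed)
    sample = []
    for service in services:
        eligible = [d for d in discharges if d["service"] == service]
        if len(eligible) <= per_service:
            sample.extend(eligible)                      # take every discharge
        else:
            sample.extend(rng.sample(eligible, per_service))
    return sample
```

Under these assumptions, a hospital with 2,400 eligible medical discharges but only 150 eligible obstetric discharges would contribute 200 medical patients and all 150 obstetric patients to the sample.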
Computation of Scores
The variations in the way hospital scores were calculated are displayed in Table 3 (page 572). All locales used a stratified reporting approach, although the stratification variable differed. All locales calculated and reported scores separately by hospital service, except ONT, which calculated and reported scores separately by hospital type.

The locales that used the Picker survey computed hospital-level "problem scores" (BUF, CLE, IND, and SEMI) or their arithmetic inverse, "performance scores" (CA, MA, and WNY). This scoring system dichotomizes a 3-point scale into two categories: never a problem versus always/sometimes a problem. The locales using the Parkside survey (RI and ONT) computed hospital-level scores based on the 5-point response scale, transformed from an ordinal to an interval scale by assigning numerical values to the categorical responses (excellent = 100, good = 75, fair = 50, poor = 25, very poor = 0). Patient-level domain scores were computed as the mean of responses to questions in each domain, and hospital-level domain scores were calculated as the mean of the patient-level domain scores.

All except RI and WNY chose to case-mix-adjust hospital-level scores for key patient characteristics, including age, sex, self-reported health status, and education. In addition, ONT adjusted for whether the patient or a proxy completed the survey and for the number of hospitalizations in the previous two years. RI tested these characteristics and determined that case-mix adjustment was unnecessary because of the minimal effect on scores. WNY decided against case-mix adjusting hospital scores so that the scores would represent the actual case mix of the hospitals. The seven locales that adjusted for case mix employed multiple regression, using separate adjustment models for each hospital-service by survey-domain combination or by each survey domain (ONT). Overall, patient characteristics accounted for a small portion of the variance in patient satisfaction, and the degree of shift in hospital scores due to case-mix adjustment was minimal. For example, CA and MA reported an average shift of less than one percentage point between unadjusted and adjusted hospital scores.
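To make the two scoring conventions concrete, the sketch below shows one way the calculations could be implemented. It is a minimal illustration, not the vendors' actual code: the data structures are assumptions, the Picker-style dichotomization is represented as a simple boolean per question, and equal question weights are used because the reports do not specify the weighting.

```python
from statistics import mean

# Parkside-style mapping of the 5-point categorical scale to interval values
PARKSIDE_SCALE = {"excellent": 100, "good": 75, "fair": 50, "poor": 25, "very poor": 0}

def parkside_hospital_domain_score(patients):
    """Mean-of-means scoring described for RI and ONT: each patient's domain
    score is the mean of his or her rescaled item responses, and the hospital
    score is the mean of those patient-level scores.
    `patients` is a list of per-patient response lists (assumed structure)."""
    patient_scores = [mean(PARKSIDE_SCALE[r] for r in responses)
                      for responses in patients if responses]
    return mean(patient_scores)

def picker_problem_score(responses_by_question):
    """Picker-style 'problem score' for one domain: the percent of respondents
    whose dichotomized answer indicates a problem, averaged over the questions
    in the domain (equal weights assumed here).  The 'performance score' used
    by CA, MA, and WNY is its inverse, 100 minus this value.
    `responses_by_question` maps question id -> list of booleans, where True
    means a problem was reported (assumed structure)."""
    per_question = [100.0 * sum(answers) / len(answers)
                    for answers in responses_by_question.values() if answers]
    return mean(per_question)

# Hypothetical example: two patients, two questions in one domain.
# parkside_hospital_domain_score([["excellent", "good"], ["fair", "good"]])
#   -> mean(87.5, 62.5) = 75.0
```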
Reporting of Scores
Table 4 (page 573) displays similarities and variations in the criteria and formats for reporting. All locales except MA established a minimum number of hospital survey responses required to compute a hospital's score and include the hospital in the report. ONT had the most requirements, and nine hospitals not meeting the standard were omitted from the report. The Hospital Profiling Project and WNY required a 30% response rate, eliminating 2 of the 15 SEMI hospitals and 6 of the 18 BUF hospitals. RI required 40 responses per service; all hospitals met the minimum, except one hospital for the surgical service. CA required two survey returns per hospital in order to include as many hospitals as possible in the report; all hospitals met this minimum. MA had no minimum required for this round of reporting. The number of hospitals reported ranged from CA (N = 113) to IND (N = 2).

In all locales, each hospital's score (a composite score based on item domain rather than on individual questions) was compared with a normative score, rather than with other hospitals' scores, for the purpose of computing comparative ratings. The "normative score" refers to an average score based on a normative distribution calculated from survey data in the vendor's client database. In most locales, domain-specific normative scores were calculated as the arithmetic mean of domain scores by hospital service. In addition to the national norm, CA and MA also used state averages, and CA used only the state normative mean for presenting hospital-specific comparisons in the public report. RI used averages computed from the vendor's national client database for normative scores, and ONT used Province averages. Most locales did not display the normative scores, except for MA (in the public report) and CA (in the technical report).
Table 3. Computation of Scores*
Locale | Calculation of Hospital-Level Scores | Variables Tested | Adjustment Performed | Degree of Shift in Hospital Scores due to Adjustment
California (CA) | Hospital-level domain score = "Performance score." | Age; sex; self-reported health status; education; additional nonsignificant variables | Hospital score adjusted separately for each service-domain combination (n = 21). Multiple regression: age, sex, self-reported health, and education. | Average shift < 1 percentage point. Maximum shift = 5 percentage points.
Massachusetts (MA) | Hospital-level domain score = "Performance score." | Age; sex; self-reported health status; education | Hospital score adjusted separately for each service-domain combination (n = 21). Multiple regression: all tested variables. | Average shift < 1 percentage point. Maximum shift = 3.4 percentage points.
Ontario (ONT) | Patient-level domain scores = mean of responses to questions in domain. Hospital-level domain scores = mean of case-mix-adjusted patient-level scores. The 10th indicator is a weighted sum of 9 scales. | Age; sex; self-reported health status; whether other filled out survey; number of hospitalizations in previous 2 years | Hospital score adjusted separately for each domain (n = 10). Multiple regression: 6 domains adjusted by all 5 patient characteristics; remaining 4 domains adjusted by first 4 characteristics. | Average shift = 2.0 percentage points. Maximum shift = 3.0 percentage points.
Rhode Island (RI) | Patient-level domain score = mean of responses to questions in domain. Hospital-level domain score = mean of patient-level domain scores. | Age; sex; self-reported health status; insurance; LOS; service type | No | NA
SE Michigan (SEMI) | Hospital-level domain score = "Problem score." | Age; sex; self-reported health status; education | Hospital score adjusted separately for each service-domain combination (n = 21). Multiple regression: all tested variables. | Statistically insignificant shift.
Buffalo (BUF) | same | same | same | same
Indianapolis (IND) | same | same | same | same
Cleveland (CLE) | same | same | same | same
Buffalo-Niagara region (WNY) | Hospital-level domain score = "Performance score." | NA | No | NA
* LOS, length of stay; NA, not applicable. Performance score = weighted average of percent of patients who gave the best possible response for any question within a given domain (inverse of "Problem score"). Problem score = weighted average of percent of patients who reported a problem for any question within a given domain.
In the public reports, eight locales displayed comparative ratings on the basis of the position of each hospital's score in relation to the normative mean, assigning hospitals to a performance category. MA used a graphic display of hospital scores in the public report and presented comparative ratings only in the technical report.
Table 4. Reporting of Scores*
Locale | Minimum # of Returns to Report Scores | # Hospitals in Report | Normative Score | Comparative Reporting of Hospital Scores (rating categories based on normative score)
California (CA) | 2 returns per hospital | 113 | (1) State average = mean performance scores of all patients. (2) National average, from Picker client database; domain- and hospital-service-specific (includes CA). | 3 rating levels: 95% CI computed around each hospital score and compared to the normative mean. Hospitals with CI overlapping the normative mean designated average; hospitals with CI fully above or below the norm designated above or below average, respectively. Reporting stratified by hospital service.
Massachusetts (MA) | No minimum | 58 | (1) State average = mean scores of all participating hospitals. (2) National average, from Picker client database; domain- and hospital-service-specific. | 3 rating levels: 95% CI computed around each hospital score and compared to the normative mean. Hospitals with CI overlapping the normative mean designated average; hospitals with CI fully above or below the norm designated above or below average, respectively. Reporting stratified by hospital service.
Ontario (ONT) | (1) 100 returns per hospital; (2) 50% of 65 questions and at least 1 question on 5 of 9 indicators completed | 86 | Province average = mean scores of all participating hospitals. | 5 rating levels: Hospitals with 99.9% CI fully above the normative mean designated above average; those with 95% CI fully above the norm designated somewhat above average; hospitals with 95% and 99.9% CIs fully below the norm and with scores lower than scores of all average hospitals assigned somewhat below and below average, respectively. Remaining hospitals designated average. Reporting stratified by hospital type (i.e., teaching, community, small).
Rhode Island (RI) | 40 returns per service type | 13 | National average, from Parkside client database; domain- and hospital-service-specific. | 3 rating levels: Hospitals with scores within a 1-standard-deviation (SD) tolerance region computed around the normative mean designated average. Hospitals with scores outside 1 SD designated above or below average only if the 95% CI computed around the hospital score does not overlap the normative mean; if the CI overlaps, hospital designated average. Reporting stratified by hospital service.
SE Michigan (SEMI) | 30% response per hospital | 13 | National average, from Picker client database; domain- and hospital-service-specific. | 3 rating levels: 2-SD tolerance region computed around the normative mean; no CIs computed around hospital scores. Hospitals with scores above or below 2 SD designated above or below average, respectively. Remaining hospitals designated average. Reporting stratified by hospital service.
Buffalo (BUF) | same | 12 | same | same
Indianapolis (IND) | same | 2 | same | same
Cleveland (CLE) | same | 7 | same | same
Buffalo-Niagara region (WNY) | 30% response per hospital | 15 | National average, from Picker client database; domain- and hospital-service-specific. | 3 rating levels: 95% CI computed around each hospital score and compared to the normative mean. Hospitals with CI overlapping the normative mean designated average; hospitals with CI fully above or below the norm designated above or below average, respectively. Reporting stratified by hospital service.
* CI, confidence interval; SD, standard deviation.
Eight locales chose a three-level categorization scheme corresponding to three performance categories (for example, below average, average, above average). ONT used three levels in the first public report, released in 1999, but a five-level categorization scheme in the second report, released in 2001. The comparative ratings of hospital scores were visually displayed in the public reports as symbols presented in tabular format. When symbols were used, stars were chosen by all locales except RI, which used diamonds. Only MA used a graphic format in the public report to compare hospital scores (plotted as solid circles along with their 95% confidence intervals) and normative scores (displayed as vertical bars). None of the locales displayed numerical hospital scores in the public reports, although CA, MA, and RI displayed them in technical reports on their Web sites.

The method for assigning the comparative ratings to hospitals relied either on a confidence interval around each hospital's score or on a tolerance region around the normative mean. Using the first approach, CA and WNY computed a 95% confidence interval around each hospital's score; if the confidence interval did not overlap the normative mean, that hospital was flagged as above or below average, accordingly. MA used the same approach for the technical report. ONT followed the same methodology but added a wider 99.9% confidence interval around each hospital's score; comparing the two confidence intervals to the normative mean produced five performance categories. Using the other approach, the four Hospital Profiling Project locales assigned hospitals to three performance categories by comparing the hospital's observed score to a tolerance region around the normative mean: hospitals were designated "average" if the observed score fell within two standard deviations of the normative mean, and those with scores falling outside the two standard deviations were flagged "above" or "below" average, accordingly. RI blended these two approaches by computing one standard deviation around the normative mean and a 95% confidence interval around each hospital's observed score. Hospitals with observed scores falling within the one standard deviation were designated "average"; hospitals were flagged as "below" or "above" average only if the observed score fell outside one standard deviation of the normative mean and the 95% confidence interval around the hospital's score was completely below or above the normative mean.
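The classification rules just described can be summarized in a short sketch. This is an illustration under assumptions (a normal approximation for the confidence interval with z = 1.96, and simple summary inputs), not the statistical code used by any of the locales.

```python
from math import sqrt

def rate_by_hospital_ci(hosp_mean, hosp_sd, n, norm_mean, z=1.96):
    """CA/MA/WNY-style rule: build an approximate 95% confidence interval
    around the hospital's score and compare it with the normative mean."""
    half_width = z * hosp_sd / sqrt(n)
    if hosp_mean - half_width > norm_mean:
        return "above average"
    if hosp_mean + half_width < norm_mean:
        return "below average"
    return "average"

def rate_by_tolerance_region(hosp_mean, norm_mean, norm_sd, k=2.0):
    """Hospital Profiling Project-style rule: flag a hospital only if its
    observed score falls outside a k-SD tolerance region around the
    normative mean (k = 2 in those reports)."""
    if hosp_mean > norm_mean + k * norm_sd:
        return "above average"
    if hosp_mean < norm_mean - k * norm_sd:
        return "below average"
    return "average"

def rate_rhode_island(hosp_mean, hosp_sd, n, norm_mean, norm_sd):
    """RI's blended rule: the observed score must fall outside a 1-SD
    tolerance region AND the 95% CI around the hospital's score must lie
    entirely on the same side of the normative mean."""
    region = rate_by_tolerance_region(hosp_mean, norm_mean, norm_sd, k=1.0)
    if region == "average":
        return "average"
    ci = rate_by_hospital_ci(hosp_mean, hosp_sd, n, norm_mean)
    return region if ci == region else "average"
```

ONT's five-level scheme could be expressed the same way by adding a second, wider interval around each hospital's score (z of roughly 3.29 for 99.9%) and comparing both intervals to the normative mean.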
Discussion
This review of nine public reports of hospital patient satisfaction raises a number of issues that can inform decision making about the public reporting of health care quality data. With major national reporting initiatives currently underway, including health plan member experience and hospital clinical measures, these state and local reports provide examples of concerns that developers of public reports continue to confront. The issues are especially relevant because of the current three-state pilot test of Hospital CAHPS, part of a national initiative to develop a standardized survey to measure the patient's experience with hospital care.1 Two major areas of concern will continue to need attention as public reporting initiatives evolve: (1) the representation of the hospital population and (2) the translation of hospital scores into a meaningful and usable public report. This discussion focuses on the advantages and disadvantages of methods relevant to these topics and considers the extent to which the issues raised by these nine examples of public reporting on patient satisfaction are consistent with prior research and are being addressed in current initiatives.
Representation of the Population
Two issues, response rates and risk adjustment, concern how adequately the population is represented in data collection and reporting.

Response Rates. Hospitals continue to address ways to increase response rates in patient surveys, particularly among minority populations. Generic approaches publicize the survey within hospitals through posters and other communications that alert patients to an upcoming survey. Efforts to increase responses among minority patients include making surveys available in different languages. Yet there is no agreement on the best way to determine who needs an alternate-language questionnaire. Current methods, such as a message in the survey cover letter about how to obtain the survey in other languages, identify relatively few patients, perhaps because the patient is required to take some action to get the survey. Moreover, in most locales, hospitals do not have a consistent or reliable method for recording race/ethnicity
or language needs. Lack of consistent information on whether health plans offered both Spanish and English versions of the CAHPS survey to their members may have accounted for the finding of no difference in satisfaction between whites and Hispanics, if less satisfied Spanish-speaking Hispanics were not reached in the survey process.6 This issue is not limited to non-English languages but may extend to low literacy, compromised vision, or other impairments.

Risk Adjustment. A second population issue is the extent to which variations in patient characteristics are taken into account in determining hospital scores on patient satisfaction. Studies have consistently found patient age and self-reported health status to be significantly related to hospital satisfaction scores: in general, older patients tended to report greater satisfaction, and patients with poorer health status tended to be less satisfied.7–12 Other patient characteristics significantly related to lower levels of hospital patient satisfaction include being female,8–10 being covered by Medicaid or uninsured,7,9 and not having a regular physician.9 Results were mixed for education level,7,9 minority race/ethnicity,7,9,12 and income.9,12 Differences by hospital service have also been noted, with obstetrical patients the most satisfied and surgical patients more satisfied than medical patients with the experience of care.3,9,12

In our review, all locales recognized the importance of reporting results separately by type of service. All but two locales also used risk or case-mix adjustment and acknowledged that adjustment made little difference in hospital scores, consistent with prior literature.13 An advantage of case-mix adjustment is that it provides a level of comfort to hospitals that their ratings are not the result of differences in patient mix.8 It can also reduce response bias related to differences in patient populations by accounting for differential distributions of patients across hospitals. However, case-mix adjustment has several disadvantages. For example, it may imply that adjustment was needed to correct for real differences when the adjustment actually had little or no effect on the scores. Further, case-mix effects may vary from one hospital to another,13–15 and adjustment may mask real population differences that need to be addressed, undermining the goals of customer feedback.12
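A regression-based case-mix adjustment of the kind described in the reports can be sketched as follows. This is a generic illustration, not the vendors' models: it fits a single ordinary-least-squares model and expresses each hospital's adjusted score as its observed mean minus its expected mean plus the overall mean, one common convention; the covariates and data layout are assumptions.

```python
import numpy as np

def case_mix_adjusted_scores(scores, covariates, hospital_ids):
    """Illustrative case-mix adjustment of hospital-level domain scores.

    scores       : 1-D array of patient-level domain scores
    covariates   : 2-D array of patient characteristics (e.g., age, sex,
                   self-reported health status, education), one row per patient
    hospital_ids : array of hospital labels, one per patient
    Returns a dict mapping hospital id -> adjusted mean score.
    """
    scores = np.asarray(scores, dtype=float)
    X = np.column_stack([np.ones(len(scores)), np.asarray(covariates, dtype=float)])
    hospital_ids = np.asarray(hospital_ids)

    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)   # ordinary least squares
    expected = X @ beta                                  # predicted score per patient
    grand_mean = scores.mean()

    adjusted = {}
    for h in np.unique(hospital_ids):
        mask = hospital_ids == h
        adjusted[h] = scores[mask].mean() - expected[mask].mean() + grand_mean
    return adjusted
```

The reports describe fitting separate models for each hospital-service by survey-domain combination; under these assumptions, that simply means applying a function like this once per combination.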
Reporting Hospital Performance to the Public
The second overriding consideration is how comparative hospital patient satisfaction results are presented to the public. In translating and presenting hospital scores in ways that are easy for the public to understand and use, the issues include selection of a benchmark, the visual display, and the assignment of rating categories.

Selection of a Benchmark. Currently, there is no universally accepted benchmark of patient satisfaction or experience for comparing individual hospitals. All nine public reports used normative scores for comparison, although the norms varied from national to state/province or regional averages. An advantage of using normative scores is that they provide a "criterion" to which individual hospitals can be compared, rather than comparing hospitals directly to one another.16 A national norm provides a broader comparative value; on the other hand, state/province or regional norms may be viewed as more relevant because they reflect performance in the same locale. Another consideration is whether the national norm reflects populations and conditions similar to those in the reporting area. Providing data on the characteristics of the hospitals that constitute the national norm may help to alleviate such concerns.

Visual Display. Once the decision is made to compare scores to a normative score, the question becomes how to present that comparison. Two common approaches are graphs and symbols. Only one of the nine public reports on hospital patient satisfaction used a graphic format to display the data; all others displayed symbols. Graphs depict the scores being compared and include the normative score used for comparison. Symbols summarize the data and typically classify scores into categories in relation to the normative score (for example, three stars for above the norm, two stars for the same as the norm, one star for below), displaying the data in tabular format. The difficulty of determining the most appropriate visual presentation is addressed in the extensive research on consumer responses to reports of Medicare CAHPS survey results.17–19 Prior research and guides to report construction and usability17–22 have considered data presentation issues in terms of the consumer's ability to process and retain the information and to get information in a usable format. Although some research shows that stars were more effective than graphs when measuring
consumer comprehension of comparative health plan reports,19 other work indicates how star charts can be misinterpreted by Medicare beneficiaries.17

Rating Categories. If symbols are used, a related issue identified in this review is the way the symbols are assigned, that is, the criteria for deciding whether a hospital or health plan receives one, two, or three of the selected symbols. All nine reports reviewed classified hospital scores as above average, average, or below average, but the classification methodology varied. Sampling variability was addressed by applying one or both of the following conditions: (1) placing a "tolerance region" around the normative mean or (2) placing a confidence interval around each individual hospital score. The tolerance region sets a boundary (for example, one standard deviation) around the normative mean to delineate the minimum distance a hospital score must be from the mean to be flagged as above or below average. The confidence interval around each hospital score takes into account both the sampling variability for that score and the size of the hospital sample on which the score is based, reflected in the width of the confidence interval.

A question related to the use of these criteria is how accurately each method classifies hospitals as average, or as above or below the norm. Any classification scheme has the potential for statistical error in assigning a hospital to these categories. An average hospital may be misclassified as above or below average (Type I error), and an above- or below-average hospital may be misclassified as average (Type II error); a simple simulation of these misclassification rates is sketched at the end of this section. Given the lack of consensus on the ideal balance of types of error, a consideration is what classification outcomes are perceived as acceptable for public reporting. For example, from the hospital's perspective, misclassification of a below-average hospital score as average, or an average hospital score as above average, would be desirable. Yet, from a public-policy perspective, misclassification of hospitals as average means less discrimination among the hospitals and less opportunity for consumers to discern differences.

Current public reports of hospital performance measures have detailed the ranking methodology used. At the national level, the Leapfrog Group and Health Grades each use symbols: circles for the Leapfrog reports on patient safety23 and stars (one, three, or five) for the Health Grades reports on patient outcomes (mortality and complications).24 The Alliance, a Wisconsin employer health care cooperative, promotes the reporting of Leapfrog Group results,25 and the Wisconsin Hospital Association sponsors a voluntary program for reporting hospital scores on consensus clinical measures.26 The Pacific Business Group on Health reports the California Patients' Evaluation of Performance (PEP-C) Survey data using one, two, and three stars.27 Despite this growing body of reports of clinical process and outcome measures for hospitals and the extensive work with CAHPS reporting of patient experience for health plans, there is no standardized methodology for assigning and reporting comparative ratings of patient perceptions of hospital care. The National Quality Forum has established consensus on hospital clinical measures but has not yet applied the consensus approach to hospital patient experience measures or to issues in the display of hospital measures.28*
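As a rough illustration of the Type I side of this tradeoff, the sketch below simulates how often a truly "average" hospital would be flagged under a tolerance-region rule. Every parameter value is invented for demonstration, and the simulation is not drawn from any of the reports reviewed.

```python
import random
from statistics import mean

def simulated_type_i_rate(norm_mean=80.0, norm_sd=5.0, patient_sd=20.0,
                          n_patients=200, k=2.0, trials=5000, seed=0):
    """Monte Carlo estimate of the chance that a hospital whose true score
    equals the normative mean is nonetheless flagged as above or below
    average by a k-SD tolerance-region rule (all inputs are hypothetical)."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        # simulate one hospital's observed score from n sampled patients
        observed = mean(rng.gauss(norm_mean, patient_sd) for _ in range(n_patients))
        if abs(observed - norm_mean) > k * norm_sd:
            flagged += 1
    return flagged / trials
```

An analogous simulation with the true score set away from the normative mean would estimate the Type II rate, that is, how often a genuinely different hospital is labeled average.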
Summary and Policy Implications
Nine public reports that compare hospitals on patient satisfaction or experience with care were identified. Although the approaches for sampling patients were similar, the use of different vendors meant different questionnaires and different scoring of responses. All nine reports used a normative comparison for reporting each hospital's score, but the methods for assigning a hospital to a comparison category varied in important ways.

Public reporting of comparative patient satisfaction data for hospitals is still in its relative infancy. As this segment of public reporting by health care providers continues to develop and grow, methodological and data presentation issues will require consideration. For example, decisions should reflect all participants fairly. However, does this mean that risk adjustment of scores should be used, or should data be based on the actual case mix of each facility to better reflect the patient population being served? To make the reports understandable and useful to the public, should data displays include both symbols and actual scores, and how should ranking categories be determined?

* An excellent and extensive overview of topics and resources relevant to designing a health care quality report for consumers, Talking to Consumers about Health Care Quality, can be found at http://www.talkingquality.gov (last accessed Jul. 27, 2004).
In the absence of a standard benchmark and until there is a national normative database for hospital patient satisfaction, other critical questions remain for public reporting: What normative comparisons should be used for hospitals, and how should decisions be made about the specifics of comparative approaches (for example, source of the normative score, use and size of a tolerance region around the normative mean, and use and size of a confidence interval around each hospital’s score)? If data reporting is overly homogenized in an attempt to portray fair comparisons among hospitals, the effectiveness of the reports for use by the public might be limited. To date, there have been no completed evaluations on public response to the nine reports included in this review. Although systematic public feedback will help when
considering these reporting issues, decision makers at local and national levels will have to wrestle with the answers.

The work on which this manuscript is based was carried out by Qualidigm, as part of its contract with the Rhode Island Department of Health. The authors appreciate the helpful feedback on earlier versions from Tierney Sherwin, Thomas Van Hoof, Yun Wang, and Anne Zauber.
Judith K. Barr, Sc.D., is Senior Scientist, Qualidigm, Middletown, Connecticut. Sara Banks, Ph.D., is Senior Health Information Analyst, Quality Partners of Rhode Island, Providence, Rhode Island. William J. Waters, Ph.D., is Deputy Director, Rhode Island Department of Health, Providence. Marcia Petrillo, M.A., is Chief Executive Officer, Qualidigm. Please address correspondence to Judith K. Barr, Sc.D., [email protected].
References
1. Centers for Medicare & Medicaid Services: Hospital Quality Initiative. http://www.cms.hhs.gov/quality/hospital (last accessed Jul. 27, 2004).
2. Institute of Medicine: Envisioning the National Health Quality Report. Washington, D.C.: National Academy Press, 2001.
3. Cleary P.D., et al.: Patients evaluate their hospital care: A national survey. Health Aff 10:254–267, Winter 1991.
4. Mehrotra A., Bodenheimer T., Dudley R.A.: Employers' efforts to measure and improve hospital quality: Determinants of success. Health Aff 22:60–71, Mar.–Apr. 2003.
5. Barr J.K., et al.: Public reporting of hospital patient satisfaction: The Rhode Island experience. Health Care Financ Rev 23:51–70, Summer 2002.
6. Morales L.S., et al.: Differences in CAHPS® adult survey reports and ratings by race and ethnicity: An analysis of the National CAHPS® benchmarking data 1.0. Health Serv Res 36:595–617, Jul. 2001.
7. Finkelstein B.S., et al.: Patient and hospital characteristics associated with patient assessments of hospital obstetrical care. Med Care 36:AS68–AS78, Aug. 1998.
8. Hargraves J.L., et al.: Adjusting for patient characteristics when analyzing reports from patients about hospital care. Med Care 39:635–641, Jun. 2001.
9. Rogut L., Newman L.S., Cleary P.D.: Variability in patient experiences at 15 New York City hospitals. Bull N Y Acad Med 73:314–334, Winter 1996.
10. Rosenheck R., Wilson N.J., Meterko M.: Influence of patient and hospital factors on consumer satisfaction with inpatient mental health treatment. Psychiatr Serv 48:1553–1561, Dec. 1997.
11. Nguyen Thi P.L., et al.: Factors determining inpatient satisfaction with care. Soc Sci Med 54:493–504, Feb. 2002.
12. Young G.J., Meterko M., Desai K.R.: Patient satisfaction with hospital care: Effect of demographic and institutional characteristics. Med Care 38:325–334, Mar. 2000.
13. Zaslavsky A.M., et al.: Does the effect of respondent characteristics on consumer assessments vary across health plans? Med Care Res Rev 57:379–394, Sep. 2000.
14. Zaslavsky A.M., et al.: Impact of sociodemographic case mix on the HEDIS measures of health plan quality. Med Care 38:981–992, Nov. 2000.
15. Elliott M.N., et al.: Case-mix adjustment of the National CAHPS® benchmarking data 1.0: A violation of model assumptions? Health Serv Res 36:555–573, Jul. 2001.
16. California Health Care Foundation: What Patients Think of California Hospitals: Results from the Patients' Evaluation of Performance (PEP-C) Survey, 2003 Edition Technical Report. http://www.chcf.org/documents/consumer/PEPCTechReport.pdf (last accessed Jul. 27, 2004).
17. Goldstein E., Fyock J.: Reporting of CAHPS® quality information to Medicare beneficiaries. Health Serv Res 36:477–488, Jul. 2001.
18. Harris-Kojetin L.D., et al.: Creating more effective health plan quality reports for consumers: Lessons from a synthesis of qualitative testing. Health Serv Res 36:447–476, Jul. 2001.
19. Hibbard J.H., et al.: Making health care quality reports easier to use. Jt Comm J Qual Improv 27:591–604, Nov. 2001.
20. McGee J.: Writing and Designing Print Materials for Beneficiaries: A Guide for State Medicaid Agencies. Baltimore: Health Care Financing Administration, Centers for Medicaid and State Operations, HCFA Publication No. 10145, Oct. 1999.
21. National Committee for Quality Assurance (NCQA): Ten Steps to a Successful Report Card Project: Producing Comparative Health Plan Reports for Consumers. Washington, D.C.: NCQA, Oct. 30, 1998.
22. Vaiana M.E., McGlynn E.A.: What cognitive science tells us about the design of reports for consumers. Med Care Res Rev 59:3–35, Mar. 2002.
23. The Leapfrog Group for Patient Safety: Survey Results. http://www.leapfroggroup.org/consumer_intro1.htm (last accessed Jul. 27, 2004).
24. HealthGrades®: Hospital Report Cards™ Methodology. http://www.healthgrades.com/public/index.cfm?fuseaction=mod&modtype=content&modact=Hrc_Methodology (last accessed Jul. 27, 2004).
25. Quality Counts™: Consumer Information for Better Health Care. http://www.qualitycounts.org/quality_reports.htm (last accessed Jul. 27, 2004).
26. CheckPoint℠: Wisconsin Hospitals Accountable for Quality. http://www.wicheckpoint.org (last accessed Jul. 27, 2004).
27. HealthScope: Hospital Ratings. http://www.healthscope.org/Interface/hospitals/default.asp (last accessed Aug. 19, 2004).
28. National Quality Forum (NQF): A Comprehensive Framework for Hospital Care Performance Evaluation: A Consensus Report. Washington, D.C.: NQF, 2003. http://www.qualityforum.org (last accessed May 10, 2004).
Appendix. List of Public Reports Reviewed
1. Rhode Island Department of Health report (RI). Providence, RI: Rhode Island Department of Health, 2001. A Report of Patient Satisfaction with Hospital Care in Rhode Island. http://www.health.ri.gov/chic/performance/quality/quality10.pdf (last accessed Jul. 28, 2004).
2. California Institute for Health Systems Performance/California HealthCare Foundation report (CA). Oakland, CA: California HealthCare Foundation and California Institute for Health Systems Performance, 2001. Results from the Patients' Evaluation of Performance (PEP-C) Survey: What Patients Think of California Hospitals. http://www.chcf.org/documents/consumer/PEPCTechReport.pdf.*
3. Massachusetts Health Quality Partnership report (MA). Watertown, MA: Massachusetts Health Quality Partners, Inc., 1998. Statewide Patient Survey Project. http://www.mhqp.org/statewidesurvey.html.
4. Ontario Hospital Association report (ONT). Toronto, Canada: Ontario Hospital Association: Hospital Report 2001: Acute Care. http://secure.cihi.ca/hreports/research/HospitalReport2001.pdf.
5. Niagara (Western New York) Health Quality Coalition report (WNY). Niagara Health Quality Coalition, 2002: Quality of Care as Reported by Hospital Patients. http://www.myhealthfinder.com/hospital_care/Quality_Reports/Patient_Survey/April%202004/picker_introduction_winter1_new.htm.†
Four Hospital Profiling Project reports:‡
6. Southeast Michigan (SEMI)
7. Buffalo (BUF)
8. Indianapolis (IND)
9. Cleveland (CLE)
* The Web site has been updated with the 2003 report, and the 2002 report is no longer available. Last accessed July 29, 2004.
† The Web site has been updated with the 2004 report, and the 2002 report is no longer available. Last accessed July 29, 2004.
‡ The hospital data are no longer available to the public, and the Web site reporting has been temporarily discontinued. Personal communication between the author [J.K.B.] and Diane L. Bechel, Dr.P.H., Pharmacy Benefits Manager, Ford Motor Company, Dearborn, MI, Aug. 9, 2004.