458
Journal of Pain and Symptom Management
Vol. 23 No. 6 June 2002
Original Article
A Scale for Measuring Patient Perceptions of the Quality of End-of-Life Care and Satisfaction with Treatment: The Reliability and Validity of QUEST
Daniel P. Sulmasy, OFM, MD, PhD, Jessica M. McIlvane, PhD, Peter M. Pasley, MD, and Maike Rahn, MS
The John J. Conley Department of Ethics (D.P.S., P.M.P.), Saint Vincent’s Catholic Medical Center Manhattan, New York, New York; Institute for Social Research (J.M.), University of Michigan, Ann Arbor, Michigan; and The Division of Nutritional Sciences (M.R.), Cornell University, Ithaca, New York, USA
Abstract
We report on the adaptation and evaluation of a previously developed patient-centered instrument that we call the Quality of End-of-life care and Satisfaction with Treatment (QUEST) scale. In a separate group of 30 inpatients, test–retest reliability for QUEST items ranged from 63% agreement (kappa 0.43) to 93% agreement (kappa 0.86), and construct validity was evidenced by correlations with a somewhat related satisfaction scale ranging from 0.38 to 0.47. QUEST was then administered to 206 consecutive medical inpatients (or their surrogates) with DNR orders and to a comparison group of 51 medical inpatients without DNR orders at 2 academic medical centers. Among these main study patients, internal consistency was reflected by Cronbach alphas of 0.88 to 0.93. QUEST scores showed modest inverse correlations with severity of symptoms but were uncorrelated with severity of illness, anxiety, or depression. This suggests an appropriate relationship to symptom control, together with divergence of the underlying construct from degree of physical illness or affective state. QUEST scores were lower for patients with DNR orders than for those without DNR orders (P = 0.02 to 0.06). Surrogate ratings of satisfaction and quality were largely uncorrelated with patient ratings. Although preliminary, these findings suggest that QUEST may be useful in assessing quality and satisfaction with the care rendered by physicians and nurses to hospitalized patients at the end of life. J Pain Symptom Manage 2002;23:458–470. © U.S. Cancer Pain Relief Committee, 2002.
Key Words
Satisfaction, quality, palliative care, validity, reliability, end of life, physicians, nurses
Address reprint requests to: Daniel P. Sulmasy, OFM, MD, PhD, The John J. Conley Department of Ethics, Saint Vincent’s Manhattan, 153 W. 11th Street, New York, NY 10011, USA.
Accepted for publication: September 4, 2001.
© U.S. Cancer Pain Relief Committee, 2002. Published by Elsevier, New York, New York. 0885-3924/02/$–see front matter. PII S0885-3924(02)00409-8
Introduction
The public and many health care professionals are clamoring for improved care for the dying, yet few tools are available for assessing the care of these patients.1,2 Most instruments for measuring quality of care and satisfaction with care have been developed for other contexts, and very few have been designed specifically for the terminally ill. Important quality measures in this population will certainly include the control of symptoms. Process measures, such as whether certain services are delivered, have a special role in end-of-life care. However, one vital and often neglected aspect of assessing the care of the dying is the patient's own perspective, especially on the quality of the interpersonal interactions between patients and their health care professionals.
In this article, we report on the adaptation and psychometric assessment of scales by which dying patients might rate the quality of the care they have received from their doctors and nurses, and their satisfaction with that care. We describe the content of these scales and their validation, the range of responses elicited, their internal consistency and test–retest reliability, and their capacity to discriminate underlying differences in care. We also assess the validity of surrogate reports of patient assessments of quality and satisfaction. We have named this instrument QUEST, the Quality of End-of-Life care and Satisfaction with Treatment scale. In addition, we report on the validity of surrogate assessment of patient symptoms and on the relationship between these symptoms and QUEST scores.
Methods
Setting and Subjects
Our patients were all drawn from the internal medicine inpatient services of two teaching hospitals: Georgetown University Hospital in Washington, DC, and St. Vincent’s, Manhattan. We attempted to enroll consecutive patients with Do Not Resuscitate (DNR) orders, a marker for patients likely to die in the hospital.3,4 Interviews were conducted between 2 and 7 days after the DNR order was written, in either English or Spanish. The Spanish-language version was verified by back-translation. Patients were excluded if any of the following applied:
- they were less than 18 years of age;
- they clinically lacked decision-making capacity;
- they scored less than 12 out of 20 on the reduced-set Mini-Mental Status test;5
- they screened positive for delirium using the Confusion Assessment Method.6
We also attempted to interview the family member or friend most closely associated with the patient’s inpatient care. A comparison group of 50 medical inpatients without DNR orders, or their surrogates, was selected at Saint Vincent’s to undergo the full battery of instruments. A further, separate convenience sample of 30 inpatients at Saint Vincent’s was selected for assessing test–retest reliability and construct validity. The study was approved by the Institutional Review Boards of both institutions, and all participating patients and surrogates gave informed consent.
Development of the QUEST Instrument
We based our scales on previous work by Matthews and Feinstein, who had developed instruments to elicit inpatients’ appraisals of physician performance.7 We concentrated on the two subscales that seemed most appropriate to the interpersonal aspects of care: their “Availability” subscale and their “Attentiveness to the Individual Patient” subscale. They had constructed and validated these “clinimetric”8 instruments from qualitative patient interviews in the late 1980s.9,10 We asked a focus group of recently discharged patients with DNR orders to rate the importance and relevance of the questions from these subscales and to select those most relevant to their situations. By group consensus, they chose 9 items regarding the quality of the interpersonal aspects of care, all of which ask patients to rate the frequency of particular behaviors or practice styles. They also selected 6 items regarding patients’ satisfaction with care. We then adapted these questions so that they could be asked about nurses as well as doctors, and so that ratings could be obtained from family or friends as well as from patients. These items were presented again to a focus group of outpatients with DNR orders for further refinement of the wording. The interview schedule was pilot tested with 21 inpatients and presented to the Faculty Scholars of the Project on Death in America to confirm face validity. We settled on the final version in 1998.
Quality was rated on a 5-point scale, from never to always. Respondents rated how often particular behaviors or styles of care were true of the physicians or nurses, taken as a whole, over the previous two days. Items asked how often the doctors or nurses:
- “spent enough time with you”;
- “arrived late”;
- were “hard to reach”;
- “seemed distracted”;
- were “willing to listen”;
- treated you “more as a case of disease than as a person”;
- “showed concern”;
- “ignored your feelings”;
- “responded quickly in time of need.”
Similarly, satisfaction was rated on a 5-point Likert scale ranging from “very dissatisfied” to “very satisfied.” Items on this scale covered bedside manner, common courtesy, way of talking, technical skills, concern, and overall satisfaction. The complete instrument is included as an Appendix. The surrogate instrument is also available.
Other Instruments Administered
The QUEST instrument was administered to the main study group of 257 patients as part of a 30-minute battery of instruments. The symptom severity scales resembled commonly used assessments of multiple symptoms at the end of life.11 They assessed the severity of 9 symptoms using the categorical responses absent, mild, moderate, and severe. The symptoms assessed were dyspnea, restlessness, nausea, constipation, fever, fecal incontinence, dry mouth, urinary incontinence, and pain. We also administered Folstein’s Reduced-Set Mini-Mental Status test5 and the British Hospital Anxiety and Depression Scale.12 The interviewer rated the patient’s degree of delirium using Inouye’s Confusion Assessment Method.6 Because these patients were so severely ill, all questions were asked in verbal interviews, even when the instrument could have been self-administered. Charts were reviewed to obtain demographic information and APACHE-III severity of illness scores.13
Analysis
Construct validity and test–retest reliability for the QUEST instrument were tested on a separate convenience sample of 30 seriously ill medical patients at St. Vincent’s who had been hospitalized for at least two days. This sample included 10 patients with malignancy, 10 with HIV, and 10 with cardiopulmonary disease. They were administered only the QUEST and the Patient Satisfaction Index (PSI) of Guyatt et al.,14 and the QUEST was repeated within 1 to 3 days. We used kappa scores to assess test–retest reliability. We also used the Pearson correlation between the QUEST and the PSI in these patients to assess convergent construct validity, using case-mean substitution for PSI responses of “not applicable.”
We assessed the range, Cronbach alpha, and univariate associations for each QUEST scale among the main study population of 257 patients from both New York and Washington, DC, including the comparison group without DNR orders. Because the scale scores were not normally distributed, we performed a rank transformation in order to conduct parametric analyses.15 Univariate associations between clinical and demographic factors and the dependent variables were assessed as follows:
- Pearson correlations for the QUEST scales;
- Spearman correlations for the symptom scales;
- chi-square tests or t-tests for dichotomous variables, as appropriate.
To assess the capacity of the scales to discriminate underlying differences in care, we compared the QUEST scores of the comparison group of 50 patients without DNR orders at Saint Vincent’s with those of the Saint Vincent’s patients with DNR orders. To assess the validity of the surrogates’ responses on behalf of patients, we selected all 23 cases in which we had paired ratings by both a cognitively capable patient and a surrogate. We compared their responses using Pearson correlations for the QUEST scales and Spearman rank correlations for the symptom severity scales.
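The rank transformation described above replaces each skewed scale score with its rank among all scores before a parametric test is run.15 The minimal Python sketch below assumes that tied scores receive the average of the ranks they occupy (mid-ranks). This is a common convention, but the paper does not state its tie rule, and the scores are invented for illustration. `scipy.stats.rankdata`, whose default `method='average'` applies the same rule, could be used instead.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical QUEST subscale means, bunched near the ceiling of 5.00.
scores = [5.0, 4.5, 5.0, 3.25, 5.0, 4.75]
print(average_ranks(scores))  # [5.0, 2.0, 5.0, 1.0, 5.0, 3.0]
```

Correlations or t-tests computed on these ranks are less distorted by the ceiling effect than tests on the raw scores. A Pearson correlation between two rank vectors is Spearman's rho.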
Results
Test–Retest Reliability
We examined the test–retest reliability of the QUEST scales by administering them twice to a convenience sample of 30 seriously ill hospitalized medical patients: 10 with malignancy, 10 with HIV, and 10 with either COPD or CHF. Their mean age was 55.1 years, 73% were men, and 70% were white. Because the QUEST instrument asks about satisfaction and quality over “the last two days,” the instrument had to be readministered soon after the first administration. The second administration occurred the next day for 25 (83%) of these 30 patients, and within 3 days for the other 5 patients. Individual item agreement rates between the two administrations ranged from a low of 63% to a high of 93%. The lowest rate was for the nursing bedside manner item (kappa 0.43, P < 0.001). The highest was for the item asking whether nurses treated the patient more as a case of disease than as a person (kappa 0.86, P < 0.001). For the overall scales, agreement was as follows:
- MD Quality: 80% (kappa 0.61, P < 0.001);
- MD Satisfaction: 83% (kappa 0.65, P < 0.001);
- Nurse Quality: 78% (kappa 0.68, P < 0.001);
- Nurse Satisfaction: 73% (kappa 0.58, P < 0.001).
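Percent agreement and kappa, as reported above, can be computed from the two administrations of an item. The Python sketch below assumes unweighted Cohen's kappa, because the paper does not say whether a weighted kappa was used. The responses for 10 patients are invented. `sklearn.metrics.cohen_kappa_score` is an existing library alternative.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def agreement_and_kappa(first: Sequence[Hashable],
                        second: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (observed agreement, unweighted Cohen's kappa) for paired ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: for each category, product of its marginal proportions.
    m1, m2 = Counter(first), Counter(second)
    p_chance = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical day-1 and day-2 answers to one QUEST item.
day1 = ["always", "always", "usually", "always", "sometimes",
        "always", "usually", "always", "always", "usually"]
day2 = ["always", "usually", "usually", "always", "sometimes",
        "always", "always", "always", "always", "usually"]
agree, kappa = agreement_and_kappa(day1, day2)
print(f"agreement {agree:.0%}, kappa {kappa:.2f}")  # agreement 80%, kappa 0.63
```

When most answers fall in the top category, chance agreement is already large. High agreement can then coexist with a more modest kappa, which is why it is useful to report both figures.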
Construct Validity
Among these same 30 medical inpatients, we also measured the correlation between each of the four QUEST scales and the Patient Satisfaction Index (PSI) of Guyatt et al.,14 co-administered on the baseline day, as an assessment of convergent construct validity. We excluded one case because more than 20% of this individual’s responses on the PSI were “not applicable.” The correlations are shown in Table 1. All were statistically significant, ranging from 0.38 to 0.47.
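The handling of “not applicable” PSI answers can be sketched as follows. This is a minimal Python illustration, not the authors' code. It makes the following assumptions:
- the PSI is scored as a sum of items;
- “not applicable” is represented as `None`;
- each missing item is filled with that respondent's own mean (case-mean substitution);
- respondents with more than 20% missing items are dropped, mirroring the exclusion rule described above.

All data are hypothetical.

```python
import math
from typing import Optional, Sequence


def case_mean_total(items: Sequence[Optional[float]],
                    max_missing: float = 0.20) -> Optional[float]:
    """Sum of items after case-mean substitution, or None if too many are N/A."""
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if not answered or n_missing > max_missing * len(items):
        return None  # respondent excluded from the analysis
    person_mean = sum(answered) / len(answered)
    # Each N/A item is replaced by the respondent's mean over answered items.
    return sum(answered) + n_missing * person_mean


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical respondents: (QUEST scale mean, PSI item responses).
respondents = [
    (4.6, [4, 5, None, 4, 5]),
    (3.8, [3, 4, 3, 3, 4]),
    (4.9, [5, 5, 5, None, 5]),
    (4.1, [None, None, 4, 4, 3]),  # 40% N/A: excluded
    (3.2, [2, 3, 3, 2, 3]),
]
pairs = [(q, case_mean_total(p)) for q, p in respondents]
kept = [(q, t) for q, t in pairs if t is not None]
print(len(kept), round(pearson_r([q for q, _ in kept], [t for _, t in kept]), 2))
```

The substitution matters only because the total is a sum. For a mean score, averaging the answered items gives the same result without any imputation.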
Main Study Patient Population
In the main study, we identified 509 consecutive patients with DNR orders at the two institutions. In 234 cases (46%), we were unable to interview either the patient or a surrogate because of logistical factors. Examples included inability to reach a surrogate for a cognitively impaired patient within the 5-day window period, or patient death or discharge before an interview could be arranged. Of the 275 cases in which we could approach a patient or a surrogate, 69 (25%) refused. We completed an interview with either a patient or a surrogate for 206 patients with DNR orders (40% of all eligible patients).
For the comparison group, we identified 114 general medical patients without DNR orders who had been in the hospital between 3 and 8 days. Of these, 22 (19%) could not be interviewed for logistical reasons such as hospital discharge. Of the 92 cases in which we could approach a patient or surrogate, 41 (45%) refused. We completed an interview for 51 patients without DNR orders (45% of all eligible patients).
Participants did not differ significantly from non-participants in age, sex, race, or diagnosis at either the Washington, DC, or New York City sites. However, there was a trend for Washington non-participants with DNR orders to be more likely to be female (72% vs. 58%, P = 0.06). An additional 7 patients had been deemed cognitively capable of participating by their physicians but scored less than 12 out of 20 on the reduced-set Mini-Mental Status exam. These patients were excluded from all analyses involving patient interviews. Patient QUEST data were available for 84 cognitively capable patients, and surrogate QUEST data were available for 195 patients. Paired interviews between cognitively capable patients and their surrogates were available for only 23 patients.
The characteristics of our main patient population are shown in Table 2, broken down by patients with DNR orders (n = 206) and the comparison group of patients without DNR orders (n = 51). Taken as a combined whole, the group had an average age of 68 years, with a range from 27 to 101. Men (42%) were slightly outnumbered by women. Non-white minorities constituted 41% of the sample. While one-quarter of the patients had cancer, there was a wide variety of diagnoses, with HIV and cardiopulmonary disease the most common other categories. The patients were quite ill, with a mean APACHE-III Acute Physiology Score of 30 (range, 3 to 93). Compared with the comparison group, patients with DNR orders were older, more likely to be women and to have cancer, more cognitively impaired, and more severely ill.
Table 1 Construct Validity of QUEST: Correlation of QUEST Scales with the Patient Satisfaction Index (PSI) for a Sample of 29 Medical Inpatients
QUEST Scale | Correlation with PSI
MD care | 0.46 (P = 0.01)
MD satisfaction | 0.47 (P = 0.01)
RN care | 0.46 (P = 0.01)
RN satisfaction | 0.38 (P = 0.04)
These patients were administered only the QUEST and the PSI. One of the 30 patients was excluded because of excessive “not applicable” responses on the PSI.
Table 2 Main Study Patient Characteristics
Characteristic | Terminally Ill Patients (n = 206) | Comparison Group (n = 51)
Percent New York | 49% | 98%
Mean age (years) | 71.4 | 53.9a
Percent men | 38.1% | 62.7b
Race (%)
White | 59.6% | 60.8% (NS)
African-American | 27.1% | 21.6%
Latino | 7.4% | 9.8%
Asian | 4.9% | 7.8%
Other | 1.0% | 0%
Diagnosis (%)
Malignancy | 29.5% | 9.8%c
HIV | 12.0% | 17.6%
Cardiopulmonary | 30.0% | 44.1%
Other | 28.5% | 27.5%
Reduced-set Mini-Mental | 16.4 | 18.8b
Anxiety | 8.0 | 7.0 (NS)
Depression | 7.3 | 5.63 (NS)
APACHE-APS | 33.7 | 13.7a
Maximum reduced-set mini-mental score is 20; higher scores indicate better mental status. Depression and Anxiety scores are from the Hospital Anxiety and Depression Scale; higher scores indicate more symptoms. APACHE-APS is the Acute Physiology Score of APACHE-III; higher scores indicate more severe illness. NS = not significant. aP < 0.001. bP < 0.01. cP < 0.05.
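The enrollment percentages under Main Study Patient Population follow directly from the reported counts. The short Python sketch below recomputes them, using the same denominators as the text: all eligible patients for the completion rate, and the approached cases for the refusal rate. The helper name is hypothetical.

```python
from typing import Dict


def enrollment_flow(identified: int, not_reachable: int, refused: int) -> Dict[str, int]:
    """Recompute the counts and rounded percentages of a recruitment flow."""
    approached = identified - not_reachable
    completed = approached - refused
    return {
        "approached": approached,
        "completed": completed,
        "pct_not_reachable": round(100 * not_reachable / identified),
        "pct_refused": round(100 * refused / approached),  # of those approached
        "pct_completed": round(100 * completed / identified),  # of all eligible
    }


dnr = enrollment_flow(identified=509, not_reachable=234, refused=69)
comparison = enrollment_flow(identified=114, not_reachable=22, refused=41)
print(dnr)         # {'approached': 275, 'completed': 206, 'pct_not_reachable': 46, 'pct_refused': 25, 'pct_completed': 40}
print(comparison)  # {'approached': 92, 'completed': 51, 'pct_not_reachable': 19, 'pct_refused': 45, 'pct_completed': 45}
```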
Range and Distribution of Individual QUEST Items
Table 3 shows the responses to individual items for the patient ratings of the quality of physician and nursing care. Although skewed toward favorable ratings, the individual responses showed a fairly wide distribution for scales of this sort. For example, 48.2% of patients were willing to report that physicians had not always spent enough time with them. Table 4 shows the responses to individual items for the patient ratings of their satisfaction with physicians and nurses. These responses also showed a fairly wide distribution. For example, almost half of the patients (45.2%) were willing to rate their overall satisfaction with the nurses as less than very satisfied.
We created 8 individual QUEST subscales (4 based on ratings by patients and 4 based on ratings by surrogates). Each subscale was computed as the mean of the relevant ratings: physician quality, nurse quality, satisfaction with physicians, and satisfaction with nurses. Figure 1 shows the QUEST scale scores for patient ratings of the quality of physicians’ care. Although still skewed, the scores ranged from 2.50 to 5.00, a broader distribution for the multi-item scale than for the individual items. All the QUEST scales gave similar results. As another example, Figure 2 shows the distribution of QUEST scores for the surrogate ratings of satisfaction with nurses. These scores ranged from 1.25 to 5.00. Although this distribution is also skewed, the conversion to a scale produces a more fine-grained and broader distribution than responses to individual items.
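The subscale scores described above are item means. A minimal Python sketch of that scoring follows. Two details are assumptions not stated in the text:
- negatively worded frequency items (“arrived late,” “hard to reach,” and so on) are reverse-coded so that 5 is always the most favorable answer;
- missing items are simply skipped.

The item keys are hypothetical. The last lines show why averaging yields a finer-grained distribution: a mean of k items, each scored 1 to 5, can take 4k + 1 distinct values rather than 5.

```python
from fractions import Fraction
from typing import Dict, Optional

FREQUENCY = {"never": 1, "rarely": 2, "sometimes": 3, "usually": 4, "always": 5}

# Hypothetical keys for the negatively worded items, reverse-coded (assumption).
NEGATIVE_ITEMS = {"arrived_late", "hard_to_reach", "seemed_distracted",
                  "treated_as_disease", "ignored_feelings"}


def quality_subscale(responses: Dict[str, Optional[str]]) -> Optional[float]:
    """Mean of the answered items, with negative items reverse-coded (6 - x)."""
    scored = []
    for item, answer in responses.items():
        if answer is None:
            continue  # missing item: skipped (assumed handling)
        value = FREQUENCY[answer]
        scored.append(6 - value if item in NEGATIVE_ITEMS else value)
    return sum(scored) / len(scored) if scored else None


patient = {
    "spent_enough_time": "usually", "arrived_late": "never",
    "hard_to_reach": "rarely", "seemed_distracted": "never",
    "willing_to_listen": "always", "treated_as_disease": "never",
    "showed_concern": "always", "ignored_feelings": "never",
    "responded_quickly": "usually",
}
print(round(quality_subscale(patient), 2))  # 4.67

# Distinct values a mean of k items (each 1..5) can take: 4k + 1.
k = len(patient)
levels = {Fraction(total, k) for total in range(k, 5 * k + 1)}
print(len(levels))  # 37
```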
Table 3 Patient Ratings of the Quality of Physician and Nurse Care
Item | Never n (%) | Rarely n (%) | Sometimes n (%) | Usually n (%) | Always n (%)
Physicians
Spent enough time | 2 (2.4) | 4 (4.8) | 8 (9.6) | 26 (31.3) | 43 (51.8)
Arrived late | 42 (56.8) | 12 (16.2) | 15 (20.3) | 4 (5.4) | 1 (1.4)
Hard to reach | 52 (65.0) | 10 (12.5) | 15 (18.8) | 1 (1.3) | 2 (2.5)
Seemed distracted | 60 (72.3) | 8 (9.6) | 10 (12.0) | 3 (3.6) | 2 (2.4)
Willing to listen | 1 (1.2) | 2 (2.4) | 8 (9.6) | 12 (14.5) | 60 (72.3)
Treated as disease, not person | 61 (74.4) | 8 (9.8) | 6 (7.3) | 3 (3.7) | 4 (4.9)
Showed concern | 3 (3.6) | 0 (0) | 8 (9.5) | 23 (27.4) | 50 (59.5)
Ignored feelings | 58 (70.7) | 9 (11.0) | 10 (12.2) | 2 (2.4) | 3 (3.7)
Responded quickly | 2 (2.6) | 1 (1.3) | 6 (7.7) | 20 (25.6) | 49 (62.8)
Nurses
Spent enough time | 1 (1.2) | 2 (2.4) | 17 (20.2) | 23 (27.4) | 41 (48.8)
Performed duties timely | 1 (1.2) | 4 (4.9) | 12 (14.6) | 24 (29.3) | 41 (50.0)
Hard to reach | 40 (49.4) | 15 (18.5) | 17 (21.0) | 3 (3.7) | 6 (7.4)
Seemed distracted | 49 (59.8) | 13 (15.9) | 12 (14.6) | 3 (3.7) | 5 (6.1)
Willing to listen | 3 (3.6) | 5 (6.0) | 11 (13.3) | 18 (21.7) | 46 (55.5)
Treated as disease, not person | 58 (70.7) | 12 (14.6) | 6 (7.3) | 1 (1.2) | 5 (6.1)
Showed concern | 2 (2.4) | 1 (1.2) | 13 (15.5) | 20 (23.8) | 48 (57.1)
Ignored feelings | 61 (74.4) | 9 (11.0) | 7 (8.5) | 3 (3.7) | 2 (2.4)
Responded quickly | 4 (4.8) | 3 (3.6) | 15 (17.9) | 21 (25.0) | 41 (48.8)
This includes only cognitively capable patients for whom QUEST data were available (n = 84). Because of missing data, the total n is not 84 for all items.
Table 4 Patient Ratings of Satisfaction with Physicians and Nurses
Item | Very Dissatisfied n (%) | Dissatisfied n (%) | Neutral n (%) | Satisfied n (%) | Very Satisfied n (%)
Physicians
Bedside manner | 0 (0.0) | 2 (2.4) | 5 (6.0) | 25 (29.8) | 52 (61.9)
Courtesy | 1 (1.2) | 1 (1.2) | 3 (3.6) | 17 (20.2) | 62 (73.8)
Way of talking | 1 (1.2) | 1 (1.2) | 6 (7.1) | 21 (25.0) | 55 (65.5)
Technical skills | 0 (0.0) | 0 (0.0) | 4 (4.8) | 26 (31.3) | 53 (63.9)
Concern | 0 (0.0) | 3 (3.6) | 6 (7.1) | 28 (33.3) | 47 (56.0)
Overall | 1 (1.2) | 1 (1.2) | 2 (2.4) | 30 (35.7) | 50 (59.5)
Nurses
Bedside manner | 1 (1.2) | 2 (2.4) | 4 (4.8) | 22 (26.2) | 55 (65.5)
Courtesy | 1 (1.2) | 3 (3.6) | 5 (6.0) | 23 (27.4) | 52 (61.9)
Way of talking | 2 (2.4) | 2 (2.4) | 4 (4.8) | 25 (29.8) | 51 (60.7)
Technical skills | 0 (0.0) | 1 (1.2) | 6 (7.1) | 26 (31.1) | 51 (60.7)
Concern | 1 (1.2) | 5 (6.0) | 4 (4.8) | 24 (28.6) | 50 (59.5)
Overall | 4 (4.8) | 2 (2.4) | 4 (4.8) | 28 (33.3) | 46 (54.8)
This includes only cognitively capable patients for whom QUEST data were available (n = 84). Because of missing data, the total n is not 84 for all items.
Internal Consistency
The QUEST scales showed excellent internal consistency. For patient ratings, Cronbach alpha scores were:
- 0.83 for physician quality;
- 0.88 for nurse quality;
- 0.88 for satisfaction with physicians;
- 0.95 for satisfaction with nurses.
For surrogate ratings, alpha scores were:
- 0.88 for physician quality;
- 0.91 for nurse quality;
- 0.92 for satisfaction with the care delivered by physicians;
- 0.93 for satisfaction with the care delivered by nurses.
In addition, each of the four patient QUEST subscales was moderately to highly correlated with the others. The Pearson correlation coefficients (all P < 0.001) were:
- 0.69 for physician quality with nursing quality;
- 0.65 for physician quality with physician satisfaction;
- 0.47 for physician quality with nursing satisfaction;
- 0.48 for physician satisfaction with nursing quality;
- 0.55 for physician satisfaction with nursing satisfaction;
- 0.68 for nursing quality with nursing satisfaction.
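Cronbach's alpha, used above for internal consistency, is computed from the item variances and the variance of the respondents' total scores: alpha = k/(k - 1) × (1 - sum of item variances / total-score variance), where k is the number of items. The Python sketch below implements that formula with sample variances. It assumes complete responses: respondents with missing items would have to be dropped or imputed first, and the paper does not say which was done. The ratings are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix with no missing cells."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 respondents and 2 items, all complete")
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical satisfaction ratings: 4 respondents x 3 items (1-5 scale).
ratings = [
    [5, 5, 4],
    [4, 4, 4],
    [3, 4, 3],
    [5, 5, 5],
]
print(round(cronbach_alpha(ratings), 3))  # 0.915
```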
Fig. 1. Patient rating of quality of care from doctor.
Fig. 2. Surrogate rating of satisfaction with nurse.
Tests of Association with Unrelated Constructs
Among the patients who completed the QUEST as part of the full study, we examined univariate factors associated with QUEST scores. The results are shown in Table 5. Age, sex, race, anxiety score, depression score, diagnosis, and severity of illness (as measured by the APACHE-III APS score) showed no consistent or important associations with QUEST scores.
Table 5 Associations Between Patient QUEST Scores and Clinical and Demographic Factors
Factor | MD Quality | MD Satisfaction | RN Quality | RN Satisfaction
Age | r 0.08, P NS | r 0.11, P NS | r 0.07, P NS | r 0.04, P NS
Sex | t 0.93, P NS | t 1.63, P NS | t 0.63, P NS | t 0.43, P NS
Race | t 1.10, P NS | t 1.25, P NS | t 1.24, P NS | t 0.90, P NS
Anxiety score | r 0.03, P NS | r 0.13, P NS | r 0.21, P = .06 | r 0.08, P NS
Depression score | r 0.08, P NS | r 0.13, P NS | r 0.19, P = .08 | r 0.16, P NS
APACHE-APS | r 0.14, P NS | r 0.08, P NS | r 0.10, P NS | r 0.07, P NS
This includes only cognitively capable patients for whom QUEST data were available (n = 84). r = Pearson correlation coefficient; t = t-test statistic; NS = not significant.
QUEST and Symptoms
Table 6 displays associations between QUEST scores and symptoms. These correlations could be assessed only for the subset of 84 patients for whom we had both symptom assessments and QUEST scores from the patients themselves. More severe symptoms were correlated with worse ratings of quality and satisfaction. There were no statistically significant correlations between QUEST scores and the symptoms of dyspnea, nausea, feverishness, and urinary incontinence. The severity of each of the other symptoms was modestly yet significantly correlated with at least one of the QUEST scores (Spearman rank correlations of between 0.22 and 0.33). Among the 9 symptoms, fecal incontinence stands out as significantly associated with worse scores on all four QUEST scales.
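Spearman's rho, used for the symptom correlations in Table 6, is the Pearson correlation of the two variables' ranks. The Python sketch below assumes that symptom severity is coded 0 (absent) to 3 (severe) and that ties share their average rank. Both are illustrative assumptions, as is the data. Under this coding, a negative rho means that more severe symptoms go with lower QUEST ratings.

```python
import math
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n, with each tied value given the mean of its first and last rank."""
    first, last = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on mid-ranks (Spearman's rho with ties)."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical patients: severity of one symptom (0-3) and MD satisfaction score.
severity = [0, 0, 1, 2, 3, 0]
md_satisfaction = [5.0, 4.8, 4.5, 4.0, 3.5, 4.6]
print(round(spearman_rho(severity, md_satisfaction), 2))  # -0.94
```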
Sensitivity
Another important property of a good quality-of-care or satisfaction scale is its ability to detect differences between populations that differ in the quality of the care they receive. On the basis of a number of anecdotal accounts, we had hypothesized that patients with DNR orders would receive a lower quality of interpersonal care from physicians and nurses. We tested this hypothesis at St. Vincent’s by comparing two groups of cognitively capable patients who had patient QUEST scores: 39 non-DNR (comparison group) patients and 22 DNR patients. We also compared their surrogates’ ratings. As shown in Table 2, the patients with DNR orders were more likely to have cancer, and were older and more severely ill. However, because none of these factors was associated with the QUEST scores (Table 5), we did not perform statistical adjustments. As shown in Table 7, surrogate QUEST scores did not differ significantly between patients with and without DNR orders. In contrast, patient QUEST scores were lower for patients with DNR orders. These patients were less satisfied with their physicians (4.31 vs. 4.67) and nurses (4.21 vs. 4.53). They also gave lower ratings for the quality of their interpersonal interactions with their physicians (4.12 vs. 4.57) and with their nurses (3.96 vs. 4.38).
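The comparison in Table 7 is described as a t-test on rank-transformed data. A minimal Python sketch of that approach follows. It assumes that both groups' scores are pooled and ranked together (with mid-ranks for ties) and that Student's pooled-variance t-test is then applied to the ranks. The paper does not specify these details, and the scores are hypothetical. To obtain a two-sided P-value, the ranks could be passed to `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def pooled_midranks(values: Sequence[float]) -> List[float]:
    """Mid-ranks of values ranked together as one sample."""
    first, last = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]


def rank_t_test(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Student's t statistic and degrees of freedom computed on pooled ranks."""
    ranks = pooled_midranks(list(group_a) + list(group_b))
    ra, rb = ranks[:len(group_a)], ranks[len(group_a):]
    na, nb = len(ra), len(rb)
    pooled_var = ((na - 1) * variance(ra) + (nb - 1) * variance(rb)) / (na + nb - 2)
    t = (mean(ra) - mean(rb)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical patient MD satisfaction scores at one site.
non_dnr = [5.0, 4.75, 4.5, 5.0]
dnr = [4.0, 4.25, 5.0, 3.5]
t, df = rank_t_test(non_dnr, dnr)
print(round(t, 2), df)  # 1.65 6
```

Ranking first keeps a few low scores in a ceiling-bunched distribution from dominating the comparison, while still allowing an ordinary t-test.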
Table 6 Association Between Patient QUEST Scores and Patient Symptom Severity
Symptom | MD Quality | MD Satisfaction | RN Quality | RN Satisfaction
Dyspnea | 0.18 | 0.0 | 0.05 | 0.02
Pain | 0.16 | 0.25a | 0.17 | 0.09
Restlessness | 0.23a | 0.29b | 0.19 | 0.21
Nausea | 0.06 | 0.10 | 0.11 | 0.14
Constipation | 0.19 | 0.15 | 0.07 | 0.31b
Feverishness | 0.13 | 0.08 | 0.00 | 0.04
Fecal incontinence | 0.22a | 0.27a | 0.33b | 0.23a
Urinary incontinence | 0.20 | 0.16 | 0.16 | 0.11
Dry mouth | 0.03 | 0.11 | 0.03 | 0.24a
This includes only cognitively capable patients for whom QUEST data were available (n = 84). Numbers in the table are Spearman’s rho correlation coefficients. aP < 0.05. bP < 0.01.
Table 7 Patient and Surrogate QUEST Scores Comparing Patients With and Without DNR Orders
QUEST Score | Non-DNR SVH (n = 39) | DNR-SVH (n = 22) | P-value for ranked data t-test
Surrogate MD quality | 4.10 | 4.13 | 0.86
Surrogate MD satisfaction | 4.32 | 4.49 | 0.71
Surrogate RN quality | 4.34 | 4.22 | 0.20
Surrogate RN satisfaction | 4.19 | 4.41 | 0.93
Patient MD quality | 4.57 | 4.12 | 0.02
Patient MD satisfaction | 4.67 | 4.31 | 0.02
Patient RN quality | 4.38 | 3.96 | 0.03
Patient RN satisfaction | 4.53 | 4.21 | 0.06
Raw means before rank transformation are displayed. SVH = Saint Vincent’s Hospital.
Validity of Surrogate QUEST Responses
There were only 23 cases in which we obtained valid matched responses from both cognitively capable patients and their surrogates. We estimated the validity of surrogates’ ratings of quality and satisfaction by measuring the Pearson correlation between surrogates’ and patients’ scores on each of the four rank-transformed QUEST scales. The correlations were:
- 0.16 for satisfaction with MD care (P = 0.48);
- 0.06 for satisfaction with nursing care (P = 0.79);
- 0.48 for quality of MD care (P = 0.02);
- 0.17 for quality of nursing care (P = 0.43).
McNemar testing showed no systematic tendency for surrogates to overestimate or underestimate the ratings of quality and satisfaction with care elicited directly from patients.
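The McNemar test mentioned above compares the two discordant cells of a paired 2 × 2 table. The paper does not state how the ratings were dichotomized, so the Python sketch below assumes a hypothetical split, such as whether each rater chose the top category. Here b counts pairs where only the surrogate gave the favorable rating, and c counts pairs where only the patient did. The function returns the continuity-corrected chi-square statistic with its 1-df P-value, and also the exact binomial P-value, which is often preferred when discordant pairs are few.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int) -> Tuple[float, float, float]:
    """McNemar test on discordant counts b and c.

    Returns (continuity-corrected chi-square, its 1-df P-value, exact two-sided P-value).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0, 1.0  # no discordant pairs: no evidence of asymmetry
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    # Exact test: under H0 each discordant pair is equally likely to fall in b or c.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_chi2, p_exact


# Hypothetical: 6 pairs where only the surrogate was "very satisfied", 3 where only the patient was.
chi2, p_chi2, p_exact = mcnemar(6, 3)
print(round(chi2, 3), round(p_chi2, 2), round(p_exact, 3))
```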
Symptom Severity Scales
As shown in Table 8, patients had substantial symptom burdens. Those for whom both patient and surrogate interviews were obtained (n = 23) did not differ in their symptoms from those for whom surrogate interviews were not obtained. As shown in Table 9, surrogates estimated patients’ symptoms with moderate accuracy for most symptoms (Spearman correlations of 0.44 to 0.65). However, surrogates did not accurately represent patients’ urinary incontinence or pain, with Spearman correlations of only 0.11 (P = 0.60) and 0.23 (P = 0.31), respectively. McNemar testing showed a statistically non-significant trend for surrogates to overestimate patients’ ratings of both urinary incontinence (P = 0.13) and pain (P = 0.18).
Table 8 Terminally Ill Patients’ Self-Reports of Symptom Severity
Symptom | Absent n (%) | Mild n (%) | Moderate n (%) | Severe n (%)
Dyspnea | 15 (33) | 12 (26) | 8 (17) | 11 (24)
Pain | 18 (40) | 7 (16) | 10 (22) | 10 (22)
Restlessness | 14 (31) | 10 (22) | 14 (31) | 7 (16)
Nausea | 26 (58) | 11 (24) | 5 (11) | 3 (7)
Constipation | 29 (64) | 4 (9) | 4 (9) | 8 (18)
Feverishness | 31 (71) | 9 (21) | 1 (2) | 3 (7)
Fecal incontinence | 32 (74) | 9 (21) | 1 (2) | 1 (2)
Urinary incontinence | 31 (71) | 2 (5) | 6 (14) | 5 (11)
Dry mouth | 11 (24) | 10 (22) | 8 (18) | 16 (36)
Includes only the DNR patients who were cognitively capable and for whom symptom data were available (n = 46).
Table 9 Surrogates’ Accuracy in Predicting the Severity of Patients’ Symptoms
Symptom | Spearman’s Rho | P-value
Dyspnea | 0.63 | 0.001
Pain | 0.23 | 0.31
Restlessness | 0.44 | 0.03
Nausea | 0.62 | 0.002
Constipation | 0.49 | 0.02
Feverishness | 0.61 | 0.003
Fecal incontinence | 0.65 | 0.001
Urinary incontinence | 0.11 | 0.60
Dry mouth | 0.56 | 0.007
Includes only cognitively capable patients with DNR orders for whom symptom data were available from both the patient and the surrogate (n = 23).
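The P-values attached to the correlations in Tables 1 and 9 depend only on the coefficient and the number of pairs. A common approximation converts r (or Spearman's rho) to t = r × sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The paper does not say which method it used, so this Python sketch is illustrative only. The statistic is computed for three of the surrogate–patient correlations in Table 9 (n = 23 pairs). A two-sided P-value would then come from the t distribution, for example as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math
from typing import Tuple


def correlation_t(r: float, n: int) -> Tuple[float, int]:
    """t statistic and degrees of freedom for testing a correlation against zero."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r), df


# Coefficients from Table 9 (surrogate vs. patient ratings, n = 23 pairs).
for symptom, rho in [("urinary incontinence", 0.11), ("pain", 0.23), ("dyspnea", 0.63)]:
    t, df = correlation_t(rho, 23)
    print(f"{symptom}: t = {t:.2f} on {df} df")
```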
Discussion Most available instruments to measure quality and satisfaction with care at the end of life have been developed for other clinical contexts.18 The SUPPORT study used two 4-item scales to interview family members about satisfaction with care after the patient’s death, but these were never validated.19 It is not known how grief reactions, recall effects, and the need to bring closure may affect surrogates’ afterdeath responses.20 Further, the satisfaction of the family cannot be assumed to represent the satisfaction of the patient. Other investigators have concentrated on studying the satisfaction of patients in hospice settings.21 None have attempted to distinguish patient ratings of satisfaction from patient ratings of quality, and none have reported directly on the validity of family ratings of patients’ experiences. Our instrument is based on one originally developed for hospitalized patients, and we have adapted it specifically for use among hospitalized patients at the end of life. The QUEST
466
Sulmasy et al.
instrument concentrates upon the interpersonal aspects of the care delivered by health care professionals, an aspect of care that the literature22 as well as the terminally ill patients in our focus groups themselves identified as critically important at the end of life. In addition to its patient-centered, clinimetric validity,8 our results suggest that it has several other especially important psychometric properties. First, it is brief, taking only about 10 minutes. This is highly desirable for any instrument,23 but it seems particularly important in assessing patients at the end of life.18 Second, while patient responses were skewed, the QUEST instrument showed a good range, with approximately half giving ratings that were less than the highest category. Perhaps due to the cognitive dissonance of suggesting that one’s own choice of physician is imperfect, significant skew towards the highest ratings is common when measuring quality and satisfaction.24 Scaling the items further enhanced the variability, presenting opportunities to measure improvement if the instrument proves responsive. Third, it showed high internal reliability (or consistency) as a scale. The Cronbach alpha scores were excellent, ranging from 0.83 to 0.95 for the four QUEST scales. Fourth, it showed very good test–retest reliability with very good rates of agreement and moderate to high kappa scores for all individual items and for the overall scales. Fifth, our results have demonstrated evidence of convergent construct validity, since QUEST scores correlated modestly with the most closely related instrument we could find—the Patient Satisfaction Instrument (PSI). There was also a correlation between QUEST and the severity of some patient symptoms. The positive correlation between all of the QUEST scales and the PSI was moderate and statistically significant, ranging between 0.38 and 0.46. This suggests convergent construct validity. 
However, the PSI conflates patient ratings of quality and satisfaction, which we think it conceptually useful to distinguish. The PSI was also designed for the long-term care setting, not the acute hospital setting. It is, therefore, not surprising that the correlations between QUEST and the PSI are only moderate. Sixth, we have arguably provided further evidence of construct validity because our patients’ ratings of quality and satisfaction on the QUEST instrument correlated modestly with symptom burden but not with severity of illness. An association between satisfaction and severity of illness has been noted previously in settings other than end-of-life care.16,17 However, others have shown that, at the end of life, quality of life correlates more strongly with symptoms than with severity of illness.25,26 Patients at the end of life may have a different outlook from other patients; their quality of life ratings can be high despite very severe illness.26 Yet, because patients can expect symptom control in spite of severe illness, it is plausible that high symptom burdens would be associated with lower patient ratings of quality and satisfaction.27 We did not find correlations with anxiety or depression. This suggests another important form of validity: discriminant (or divergent) validity. That is, our instrument appears to measure a construct that can be distinguished from the patient’s affective symptoms. In acute care settings, anxiety and depression have not been consistently associated with satisfaction.28,29 It is important that an instrument purporting to measure satisfaction and quality of care not be overly confounded by anxiety and depression. Seventh, the QUEST scales appear to be sensitive to differences in quality and satisfaction, detecting differences between patients with and without DNR orders in our study. Consistent with our hypothesis, patients with DNR orders rated the quality of care and their satisfaction with care less favorably than patients without DNR orders, even when controlling for severity of illness.
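The internal consistency and test–retest statistics cited in the third and fourth points follow standard formulas. The Python sketch below shows how they can be computed, assuming item responses coded 1 to 5 with no missing values and using unweighted Cohen’s kappa. The kappa variant, the function names, and the toy data are illustrative assumptions, not the analysis or data of this study.

```python
"""Illustrative reliability statistics for Likert-type items (toy data, not study data)."""
from collections import Counter
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    `items` holds one list per item; each list has one score per respondent.
    Population variances are used throughout; the n/(n-1) factor cancels in the ratio.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def percent_agreement(test: list[int], retest: list[int]) -> float:
    """Share of respondents giving the identical response at test and retest."""
    if len(test) != len(retest):
        raise ValueError("responses must be paired")
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def cohens_kappa(test: list[int], retest: list[int]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(test)
    p_obs = percent_agreement(test, retest)
    c1, c2 = Counter(test), Counter(retest)
    # Chance agreement from the marginal distribution of each administration.
    p_exp = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```

For example, `cohens_kappa([5, 5, 4, 4, 3, 5], [5, 4, 4, 4, 3, 5])` gives 17/23 (about 0.74) from 83% raw agreement, which shows how kappa discounts the agreement expected by chance when responses cluster in the top categories.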
Validity of Surrogate Responses
Although our sample size for this analysis was small, our results suggest that surrogate assessments of quality and satisfaction with patient care do not validly represent the assessments of patients. In our study, only family assessments of the quality of physician care were modestly correlated with patients’ own assessments. Further, we found that families tended to overestimate patients’ degree of pain and urinary incontinence. Others have previously noted this tendency of families to report patients’ symptom burdens inaccurately,30 often overestimating them.18,31 The SUPPORT study, which relied heavily upon surrogate responses, devised an imputation strategy to estimate the patients’ degree of pain in the face of systematic bias in the reports of surrogates.32 To our knowledge, our study is the first direct report on the validity of surrogate estimates of quality and satisfaction at the end of life. It is not known what accounts for these discrepancies between surrogates and patients. Surrogates were more accurate in assessing symptoms such as dyspnea, perhaps because these symptoms have a more objective, observable component than pain, which is intrinsically subjective. We suspect that misinterpretation of the meaning of having an indwelling urinary catheter may have contributed to the poor correlation between patients and surrogates with respect to urinary incontinence. Alternatively, patients may underreport their symptoms because of embarrassment or other complex social factors. By contrast, there was no discernible pattern to the disagreements between patients and surrogates regarding quality and satisfaction. We suspect that this is because patient experiences of hospital care and surrogate experiences of hospital care for the patient are simply two important but different constructs. Good end-of-life care should address the needs of both patients and families, but our data suggest that conflating these domains may be a mistake. Loved ones are generally not in the hospital at 3 AM when the patient rings the call bell and help is slow in coming. Nor is the patient present in the waiting room when the family is anxiously awaiting news. Both are important aspects of care, but perhaps they cannot be captured in the same way.
Limitations
This work is basic and preliminary. The instrument has been developed and tested at only two institutions, the DNR versus non-DNR comparison took place at only one institution, and it has been tested only among internal medicine inpatients. Further, although the sample size is not insubstantial, larger numbers would allow for more secure conclusions. It may be objected that this instrument will not be useful because it is so hard to interview patients at the end of life. However, we would argue that it is worth the effort, because serving the needs of the patient is vitally important, and if surrogates are inaccurate, there may be no alternative but to speak directly with patients. It might also be objected that QUEST provides only a narrow view of quality, centered on interactions with health care professionals. However, we do not propose that QUEST should replace other methods, only that it should complement methods such as symptom assessment and process measures. Patients consider their interactions with health professionals an important part of the quality of their care, and this should not be ignored.
Conclusions
In these preliminary studies, the QUEST instrument demonstrated excellent internal consistency and test–retest reliability. QUEST showed good construct validity, and it was originally developed on the basis of clinimetric validity. It appears to be sensitive to underlying differences in care, distinguishing between the ratings of quality and satisfaction given by patients with and without DNR orders. However, surrogate ratings of the quality of patients’ care and of their satisfaction with that care are largely uncorrelated with patients’ own ratings on the QUEST instrument. These findings will need to be verified in other settings. It will also be important to determine whether these scales are sensitive enough to detect improvements that might follow efforts to improve the quality of care of those at the end of life.
Acknowledgments
Dr. Sulmasy’s work was supported by a Faculty Scholar’s award of the Open Society Institute and by a grant from the Altman Foundation. We are grateful to Vic Tolentino, MPH, JD, Sr. Grace Henke, SC, EdD, and Eric Marx, PhD for their assistance with patient interviews. Finally, we thank Dr. Dale Matthews for his advice and support.
References
1. Hearn J, Higginson IJ. Outcome measures in palliative care for advanced cancer patients: a review. J Pub Health Med 1997;19:193–199.
2. Teno JM, Byock I, Field MJ. Research agenda for developing measures to examine quality of care and quality of life of patients diagnosed with life-limiting illness. J Pain Symptom Manage 1999;17:75–82.
3. DeJonge KE, Sulmasy DP, Gold KG, et al. The timing of do-not-resuscitate orders and hospital costs. J Gen Intern Med 1999;14:190–192.
4. Shepardson LB, Youngner SJ, Speroff T, Rosenthal GE. Increased risk of death in patients with do-not-resuscitate orders. Med Care 1999;37:727–737.
5. Wells JC, Chase GA, Aboraya A, Folstein MF, Anthony JC. Discriminant validity of a reduced set of mini-mental state examination items for dementia and Alzheimer’s disease. Acta Psychiatr Scand 1992;86:23–31.
6. Inouye SK, van Dyck CH, Alessi CA, Balkin S, et al. Clarifying confusion: the confusion assessment method. Ann Intern Med 1990;113:941–948.
7. Matthews DA, Feinstein AR. A new instrument for patients’ ratings of physician performance in the hospital setting. J Gen Intern Med 1989;4:14–22.
8. Feinstein AR. Clinimetrics. New Haven: Yale University Press, 1987.
9. Matthews DA, Sledge WH, Lieberman PB. Evaluation of intern performance by medical inpatients. Am J Med 1987;83:938–944.
10. Matthews DA, Feinstein AR. A review of systems for the personal aspects of patient care. Am J Med Sci 1988;295:159–171.
11. Mercadante S, Casuccio A, Fulfaro F. The course of symptom frequency and intensity in advanced cancer patients followed at home. J Pain Symptom Manage 2000;20:104–112.
12. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand 1983;67:361–370.
13. Knaus WA, Wagner DP, Draper EA, et al. The APACHE III prognostic system. Chest 1991;100:1619–1636.
14. Guyatt GH, Mitchell A, Molloy DW, et al. Measuring patient and relative satisfaction with level or aggressiveness of care and involvement in decisions in the context of life-threatening illness. J Clin Epidemiol 1995;48:1215–1224.
15. Conover WJ, Iman RL. Rank transformations as a bridge between parametric and nonparametric statistics. Amer Statistician 1981;35:124–133.
16. Covinsky KE, Rosenthal GE, Chren MM, et al. The relation between health status changes and patient satisfaction in older hospitalized medical patients. J Gen Intern Med 1998;13:223–229.
17. Kroenke K, Stump T, Clark DO, Callahan CM, McDonald CJ. Symptoms in hospitalized patients: outcome and satisfaction with care. Am J Med 1999;107:425–431.
18. Field MJ, Cassel CK, eds. Institute of Medicine Report: approaching death: improving care at the end of life. Washington, DC: National Academy Press, 1997:139–149.
19. Baker R, Wu AW, Teno JM, et al. Family satisfaction with end-of-life care in seriously ill hospitalized patients. J Am Geriatr Soc 2000;48(5 Suppl):S61–S69.
20. Morrison RS, Siu AL, Leipzig RM, Cassel CK, Meier DE. The hard task of improving the quality of care at the end of life. Arch Intern Med 2000;160:743–747.
21. Wallston KA, Burger C, Smith RA, Baugher RJ. Comparing the quality of death for hospice and nonhospice cancer patients. Med Care 1988;26:177–182.
22. Rogers A, Karlson S, Addington-Hall J. ‘All the services were excellent. It is when the human element comes that things go wrong’: dissatisfaction with hospital care in the last year of life. J Adv Nurs 2000;31:768–774.
23. Lohr KN, Aaronson NK, Alonso J, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther 1996;18:979–992.
24. Ross CK, Steward CA, Sinacore JM. A comparative study of seven measures of patient satisfaction. Med Care 1995;33:392–406.
25. Simmons Z, Bremer BA, Walsh SM, Fischer S. Quality of life in ALS depends on factors other than strength and physical function. Neurology 2000;55:388–392.
26. Cohen RS, Mount BM, Tomas JJN, Mount LF. Existential well-being is an important determinant of quality of life. Cancer 1996;77:576–586.
27. Tierney RM, Horton SM, Hannan TJ, Tierney RM. Relationships between symptom relief, quality of life, and satisfaction with hospice care. Palliat Med 1998;12:333–344.
28. Pound P, Tilling K, Rudd AG, Wolfe CD. Does patient satisfaction reflect differences in care received after stroke? Stroke 1999;30:49–55.
29. Katz JN, Stucki G, Lipson SJ, et al. Predictors of surgical outcome in degenerative lumbar spinal stenosis. Spine 1999;24:2229–2233.
30. Nekolaichuk CL, Maguire TO, Suarez-Almazor M, Rogers WT, Bruera E. Assessing the reliability of patient, nurse, and family caregiver symptom ratings in hospitalized advanced cancer patients. J Clin Oncol 1999;17:3621–3630.
31. Higginson I, Priest P, McCarthy M. Are bereaved family members a valid proxy for a patient’s assessment of dying? Soc Sci Med 1994;38:553–554.
32. Desbiens NA, Wu AW, Broste SK, et al. Pain and satisfaction with pain control in seriously ill hospitalized adults: findings from the SUPPORT research investigations. Crit Care Med 1996;24:1953–1961.
Appendix
Quality of End-of-Life Care and Satisfaction with Treatment (QUEST) Scale: Patient Version

First, I would like you to think about your doctors and the care they have given you over the past two days. By “your doctors,” I mean all doctors who have come in to see you over the past two days, not just your personal doctor. When responding, I would like you to think of these doctors as a group, and give your overall impression of the care you have received. The following list names some things that doctors are known to do. Some of the things are good for all doctors to do, others are not so good for any doctor to do. Your own doctors may or may not do these things. I would like you to choose a response that indicates how often you think your doctors have done each of the things over the past two days. Your choices are never, rarely, sometimes, usually, or always. If you choose “never,” you think it never has been true of your doctors over the past two days; if you choose “sometimes,” you think it sometimes has and sometimes hasn’t been true of your doctors; if you choose “always,” it means that you think it has always been true of your doctors over the past two days. You may use different responses for different items, and you also may use the same response for more than one item on the list. We want you to feel comfortable being totally honest with your answers! All of your responses will be confidential.

Now, over the past two days, how often have the doctors…
Never | Rarely | Sometimes | Usually | Always
Spent enough time with you.
Arrived late when they promise to come see you.
Been hard to reach in time of need.
Seemed distracted by other things when you talk.
Been willing to take time to listen.
Treated you more as a disease than as a person.
Showed personal concern about you.
Ignored your feelings.
Responded quickly in time of need.

Now I am going to ask the same questions again, but this time I want you to think about the things that the nurses have done. By “nurses,” I mean all nursing staff, including nurse aides and nurse technicians.

Over the past two days, how often have the nurses…
Never | Rarely | Sometimes | Usually | Always
Spent enough time with you.
Performed patient care duties in a timely manner.
Been hard to reach in time of need.
Seemed distracted by other things when you talk.
Been willing to take time to listen.
Treated you more as a disease than as a person.
Showed personal concern about you.
Ignored your feelings.
Responded quickly in time of need.
Now we are going to begin a new set of questions. I would like you to choose a number between 1 and 5 to describe how satisfied you are with your doctors on several aspects of care; the higher the number you choose, the more satisfied you should be with your doctors for that aspect of care. If you choose 1, you have been very dissatisfied with your doctors; if you choose 3, you have been neither satisfied nor dissatisfied; and if you choose 5, you have been very satisfied. You may use different numbers for different items, and you also may use the same number for more than one item on the list.

Over the past two days, how satisfied have you been with your doctors’…
1 = Very Dissatisfied | 2 = Dissatisfied | 3 = Neutral | 4 = Satisfied | 5 = Very Satisfied
Bedside manner
Common courtesy
Way of talking to you
Clinical and technical skills in treating you
Concern for you as an individual
What is your overall level of satisfaction with the care your doctors have provided during the past 2 days?

Now I am going to ask the same questions again, but this time I want you to think about your satisfaction with the care your nurses have provided.

Over the past two days, how satisfied have you been with your nurses’…
1 = Very Dissatisfied | 2 = Dissatisfied | 3 = Neutral | 4 = Satisfied | 5 = Very Satisfied
Bedside manner
Common courtesy
Way of talking to you
Clinical and technical skills in treating you
Concern for you as an individual
What is your overall level of satisfaction with the care your nurses have provided during the past 2 days?

Do you have any additional comments you would like to make about the care you have received from your doctors or nurses? (Respond in the space below)