Journal of Clinical Epidemiology 58 (2005) 286–290
Self-report was a viable method for obtaining health care utilization data in community-dwelling seniors

Deborah P. Lubeck, Helen B. Hubert*

Division of Immunology and Rheumatology, Stanford University School of Medicine, 701 Welch Road, Suite 3305, Palo Alto, CA 94304, USA

Accepted 21 June 2004
Abstract

Objective: Patient self-report and audits of medical records are the most common approaches for obtaining information on utilization of medical services. Because of the time and cost savings associated with self-report, it is important to demonstrate the reliability of this approach, particularly in older persons, who use more medical resources but may have poorer recall.

Study Design and Setting: We contacted the medical providers of a random sample of seniors (n = 150) who participated in an ongoing study of health care use. Providers' reports on each participant's medical utilization in the prior year were compared with the patient's self-report over the same time period using weighted kappa statistics.

Results: Perfect or almost perfect agreement (weighted kappa = 0.80–1.00) was obtained for physician, hospital, and emergency department visits and for high-cost therapies (chemotherapy, radiation therapy). Agreement was substantial (weighted kappa = 0.60–0.80) for x-ray procedures and prescription medications, and moderate (weighted kappa = 0.40–0.60) for outpatient procedures and diagnostic tests.

Conclusion: Participant self-report is a viable, reasonably accurate method to obtain information on most types of medical utilization in an older study cohort. © 2005 Elsevier Inc. All rights reserved.

Keywords: Validation of resource use; Health care utilization; Self-report survey research
1. Introduction

In epidemiologic studies, participant self-report is the most common, accessible, and cost-effective method of collecting information on utilization of health care. However, information obtained from respondent self-report may not be accurate or consistent with provider records. Although provider records may not be considered a gold standard, they are the standard for reporting medical utilization, and any existing systematic discrepancy could bias research results.

A number of studies have compared the two data sources for the assessment of medical history, medication use, hospital stays, and other health care utilization and generally found good agreement on the part of study participants [1–6]. However, many of these studies indicate that participant responses are biased toward under-reporting of health care utilization when compared with medical records. For example, Ritter et al. [3] and Yaffe et al. [7] observed that the frequency of physician visits was under-reported by study participants. At the same time, emergency room care was reported more frequently by study participants than in the medical record [3]. Roberts et al. [4] described similar findings for outpatient care and also observed improved accuracy for inpatient utilization.

In some studies, sociodemographic factors, such as age, gender, education, income, and health status of the participant, were evaluated to determine whether systematic under-reporting was associated with any of these variables. Such systematic reporting errors could introduce bias when comparing groups that differ on these characteristics. However, no investigator has observed a specific relationship [1–3,5,6,8]. Research by Glandon et al. [8] focused on self-report of health care utilization by older individuals compared with medical records. They observed that individuals in poorer health, especially those with higher levels of utilization, were more likely to have reporting errors.

Particular interest in the reliability of self-report in the elderly stems from concern that they consume a disproportionate amount of health care resources but may have poorer recall of use. This study evaluates respondent-reported utilization in a cohort of community-dwelling seniors over 70 years of age. Respondents were 99% Caucasian with similar education and income levels. This homogeneous sample provides an internal control for some of the critical sociodemographic factors that may be associated with the self-report reliability of older individuals.

* Corresponding author. Tel.: 650-723-5639; fax: 650-725-2918. E-mail address: [email protected] (H.B. Hubert).
doi: 10.1016/j.jclinepi.2004.06.011
2. Methods

In 1986, a longitudinal study of physical disability in a group of University of Pennsylvania alumni was initiated as part of the Arthritis, Rheumatism, and Aging Medical Information System. The purposes of this study are to develop risk factor models for musculoskeletal disability, health care utilization, and costs of care in the elderly. A total of 2,843 participants have responded to annual surveys since study inception. All individuals attended the University of Pennsylvania some time between 1939 and 1944 and were at least 60 years of age when first contacted to participate in the study. Longitudinal information on risk factors, comorbidity, health status, and physical disability has been acquired in all the annual surveys. Extensive questions on annual health care utilization were added to the questionnaire only in recent years. This paper focuses on data collected in the first year (1997) that these questions were included in the annual surveys.

Questionnaires are mailed to study participants in January of every year. Resource utilization in the prior calendar year (January–December) was reported by the participant on annual questionnaires in the following categories: inpatient care (number of hospitalizations, reason for hospitalizations, length of stay, and nursing home or live-in rehabilitation stays), outpatient care (emergency room visits, outpatient procedures, laboratory tests, radiology, diagnostic tests, physician visits, and home care services), and pharmaceutical use. Outpatient procedures, radiology, diagnostic tests, and physician visits were reported by the type of test or provider. We conducted a separate study comparing self-reports with information from respondents' medical records to measure the comparability of these two methods of data collection.
After the return of the January 1998 questionnaires, which covered resource utilization for the 1997 calendar year, a random sample of 150 respondents was selected; it included both respondents who used medical care and those who did not. This sample included an equal number of men and women, allowing us to evaluate whether there was reporting bias associated with gender. Individuals were asked for permission to contact their medical providers whether or not they sought health care in the previous year. Provider names, addresses, and telephone numbers were obtained. All providers were contacted by study staff and asked to complete a questionnaire reporting on the participant's use of any outpatient medical services in 1997, including prescription medications, using the same calendar year (January–December). The provider questionnaire mirrored that of the study respondent and asked for the number of times the provider saw the participant, the number and types of outpatient surgeries, diagnostic tests performed, medications prescribed, and referrals made to other doctors in the 1997 calendar year. As an alternative to completing the questionnaire, the physician could copy the participant's medical record and return it for abstraction and data entry by trained medical
records abstracters. If the provider did not see the patient in the corresponding year, they were asked to indicate this on the questionnaire. Comparisons of inpatient utilization were based on information in hospital medical records. Medical records were requested for all reported hospitalizations, including emergency room visits, inpatient stays, and rehabilitation care for the calendar year.

For continuous variables, such as the number of physician visits, hospital days, and prescription medications, the mean and standard deviation are reported. Means are calculated on the basis of all reported utilization, including participants who reported no health care use. The mean difference in utilization between physician and respondent is also reported. The extent of agreement between physician and participant report was measured using a chance-corrected weighted kappa coefficient that assigns a maximal weight to exact concordance and reduced weights to disagreements [9–11]. Weighted kappa was used on the assumption that provider and patient reports that are close but not equal (e.g., 3 versus 4 visits) should be assigned some importance when contrasted with values that are not close in agreement (e.g., 0 versus 4 visits). The interpretation of weighted kappa is similar to that of unweighted kappa: values of 0.80 or more signify almost perfect agreement beyond chance, values from 0.60 to 0.80 represent substantial agreement, values from 0.40 to 0.60 signify moderate agreement, and values below 0.40 signify poor to fair agreement. All analyses were performed using SPSS for Windows statistical software, version 10.0.

3. Results

Of the 150 men and women approached to participate in this special study, 12 individuals could not be reached to obtain consent, and seven declined participation. Of the 400 physician names received and contacted, 62 physicians did not respond to our survey.
The physician nonresponses reduced the total patient sample size to 123. Table 1 summarizes patient characteristics. The sample had approximately equal numbers of male and female respondents, with a median age of 77 years. No individual was younger than 73 years of age, and the oldest were 87 years of age. Sixty percent of the sample were married, 22% were widowed, and the remainder were divorced, separated, or never married. All individuals had 13 or more years of education and were predominantly Caucasian (99%). The specialties of physicians responding to our request for office visit, medication, and outpatient procedure information are listed in order of frequency of utilization in Table 1. The largest representation is for ophthalmologists (20%), followed by general practitioners and internists (14% each) and dermatologists (12%).

Table 1
Characteristics of participants and physicians in the sample

Participant characteristic (n = 123)   Median      Range
Age at questionnaire (y)               77          73–87

Participant characteristic (n = 123)   Frequency   Percentage
Women                                  62          51.2
Men                                    61          48.8
Marital status
  Married                              72          59.5
  Widowed                              26          21.5
  Other                                23          19.0
Physician specialty (n = 338)
  Ophthalmologist                      67          20.1
  General practitioner                 46          13.8
  General internist                    46          13.8
  Dermatologist                        40          12.0
  OB/Gyn                               24           7.1
  Urologist                            21           6.5
  Cardiologist                         20           6.0
  Orthopedic surgeon                   14           4.2
  Podiatrist                           11           3.3
  Rheumatologist                       10           3.0
  Gastroenterologist                    7           1.9
  General surgeon                       5           1.5
  Chiropractor                          1           0.3
  Other doctors                        22           6.7

Table 2 presents mean summary outpatient and inpatient utilization as reported by respondents and physicians or obtained from the medical record. Hospital visits, length of stay, and emergency department visits were accurately reported by respondents, as confirmed by audit of medical records, with near-perfect correspondence (weighted kappa = 1.0). The frequency of chemotherapy visits and radiation therapy had near-perfect agreement between physicians and respondents (kappa = 1.0). Agreement was lower but still substantial (kappa ≥ 0.6) for total x-rays and prescriptions, and moderate (kappa ≥ 0.4) for total outpatient procedures and diagnostic tests. Physicians reported more outpatient procedures, prescriptions, and diagnostic tests than respondents. Although total diagnostic tests had the lowest level of agreement, agreement levels were higher for individual diagnostic tests, including CT scans, MRIs, sonograms, colonoscopies, endoscopies,
and mammograms (data not shown). The poorest correspondence was found for EKGs, angiograms, and venipunctures.

Information on physician visits by type of provider is presented in Table 3. The total number of visits was slightly higher for respondents (mean = 6.95) compared with physicians (mean = 6.77) but had excellent agreement (weighted kappa = 0.8). Among individual physician types, agreement was substantial (kappa ≥ 0.6) to almost perfect (kappa ≥ 0.8). The lowest agreement was found for the specialties seen most frequently: ophthalmologists and general practitioners (kappa = 0.6), followed by internists and dermatologists (kappa = 0.7). More detail on the trend in the reporting of total physician visits is presented in Fig. 1. For all visits combined, discordance occurs mainly where the physician reported no doctor visits but the patient indicated a visit did occur. Most importantly, when individuals had more frequent visits, agreement was high. Further analysis of these data suggests that there were no systematic reporting differences by gender (data not shown).
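The chance-corrected weighted kappa behind these agreement figures can be sketched in a few lines. The snippet below is an illustrative reimplementation with linear disagreement weights (Cohen [10]), together with the Landis and Koch interpretation bands used in this paper; the visit counts are hypothetical, and this is not the authors' actual SPSS analysis:

```python
import numpy as np

def weighted_kappa(ratings_a, ratings_b, weights="linear"):
    """Chance-corrected weighted kappa (Cohen, 1968).

    Off-diagonal cells are penalized by how far apart the two
    reported counts are, so "3 vs. 4 visits" disagrees less
    than "0 vs. 4 visits".
    """
    categories = sorted(set(ratings_a) | set(ratings_b))
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)

    # Observed joint distribution of the two reports
    observed = np.zeros((k, k))
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a], index[b]] += 1
    observed /= observed.sum()

    # Distribution expected by chance, from the marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights: |i - j| (linear) or (i - j)**2 (quadratic)
    i, j = np.indices((k, k))
    w = np.abs(i - j) if weights == "linear" else (i - j) ** 2

    return 1.0 - (w * observed).sum() / (w * expected).sum()

def interpret(kappa):
    """Landis-Koch style bands used in the paper."""
    if kappa >= 0.80:
        return "almost perfect"
    if kappa >= 0.60:
        return "substantial"
    if kappa >= 0.40:
        return "moderate"
    return "poor to fair"

# Hypothetical annual visit counts for eight patients
patient  = [0, 3, 4, 2, 0, 1, 3, 5]  # self-report
provider = [0, 4, 4, 2, 0, 1, 2, 5]  # record audit
kw = weighted_kappa(patient, provider)
print(round(kw, 2), interpret(kw))  # → 0.88 almost perfect
```

With quadratic weights (`weights="quadratic"`), larger discrepancies are penalized more heavily; both choices reduce to unweighted kappa in the two-category case.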
Table 2
Correspondence between physician or medical record and participant report of utilization in the prior year (January–December)

Variable                    Participant (n = 123)a   Physician (n = 338)a   Mean differencea   Agreement (weighted kappa)   P value
Number of hospital visits   0.2 (0.4)                0.2 (0.4)              0 (0)              1.0                          <.001
Number of hospital days     1.1 (3.1)                1.1 (3.1)              0 (0)              1.0                          <.001
ER visits                   0.1 (0.5)                0.1 (0.5)              0 (0)              1.0                          <.001
Outpatient procedures       0.3 (1.2)                0.6 (1.0)              0.3 (1.0)          0.5                          <.01
Total x-rays                1.1 (1.3)                1.1 (0.9)              0.07 (0.9)         0.7                          <.01
Total diagnostic tests      1.0 (1.5)                2.2 (2.5)              1.2 (2.3)          0.4                          .16
Total prescriptions         3.4 (2.4)                4.1 (3.9)              0.8 (3.8)          0.6                          .15
Chemotherapy visits         0.3 (2.7)                0.3 (2.7)              0 (0)              1.0                          <.001
Radiation therapy           0.4 (3.6)                0.4 (3.7)              0.07 (0.9)         1.0                          <.001
Venipunctures               0.2 (4.6)                1.6 (2.7)              1.4 (4.5)          0.2                          .20
Urinalyses                  0.8 (1.0)                0.7 (1.4)              0.2 (1.6)          0.3                          .18

Abbreviations: ER, emergency room; SD, standard deviation.
Note: Provider information for hospitalizations and ER visits was obtained from hospital audits.
a Values are mean (SD) unless otherwise noted.

Table 3
Correspondence between physician and participant report of doctor visits in the prior year (January–December) by specialty

Number of visits        Participant (n = 123)a    Physician (n = 338)      Mean difference   Agreement (weighted kappa)   P value
All doctors             6.9 (5.5) (range 0–41)    6.8 (5.8) (range 0–41)   0.01 (5.7)        0.8                          <.001
Physician type
  Cardiologist          0.7 (1.8)                 0.5 (1.5)                0.2 (1.5)         0.9                          <.001
  Chiropractor          0.1 (0.7)                 0.1 (0.7)                0.04 (0.5)        0.9                          <.001
  Dermatologist         0.8 (1.2)                 0.7 (1.3)                0.09 (1.0)        0.7                          <.001
  Gastroenterologist    0.2 (0.5)                 0.1 (0.2)                0.1 (0.4)         0.9                          <.05
  General practitioner  1.7 (2.1)                 1.1 (2.5)                0.6 (2.5)         0.6                          <.05
  Internist             0.9 (1.8)                 1.0 (2.6)                0.1 (2.1)         0.7                          <.05
  OB/Gyn                0.3 (0.7)                 0.3 (0.5)                0.1 (0.5)         0.9                          <.001
  Ophthalmologist       1.6 (2.7)                 1.1 (1.9)                0.5 (2.9)         0.6                          <.001
  Orthopedic surgeon    0.4 (1.0)                 0.2 (0.9)                0.2 (1.0)         0.9                          <.01
  Podiatrist            0.6 (1.7)                 0.2 (0.9)                0.4 (1.5)         0.8                          <.001
  Rheumatologist        0.3 (1.0)                 0.2 (0.6)                0.1 (0.9)         0.9                          <.01
  General surgeon       0.1 (0.5)                 0.1 (0.5)                0.09 (0.8)        0.9                          <.01
  Urologist             0.4 (1.0)                 0.4 (1.3)                0.8 (1.1)         0.9                          <.01
  Other doctors         0.7 (3.9)                 0.8 (3.8)                0.1 (0.9)         0.8                          <.01

a Values are mean (SD) unless otherwise noted.

Fig. 1. Comparison of the number of physician- and patient-reported doctor visits.

4. Discussion

We evaluated participant- and physician-reported utilization among a group of older survey respondents. Results were similar to those of other studies in that respondents tended to under-report utilization slightly, except in the area of physician visits. Almost perfect concurrence was observed for the most expensive categories of health care (i.e., hospitalizations and emergency room use) and for expensive treatments such as radiation and chemotherapy. Substantial to almost perfect congruence was observed for physician visits by type, x-rays, and prescription medications. Congruence was moderate for outpatient procedures and total diagnostic tests, with better agreement for such procedures as CT scans, MRI studies, colonoscopies, and mammograms.

These results indicate that agreement between self-report data and medical record information varies depending upon the services studied. Although participant and provider reports of all doctor visits were well correlated, congruence of reported visits varied by physician type. Providers such as ophthalmologists and general practitioners, who were seen most frequently, had the lowest kappa coefficients. Although men reported slightly more inpatient stays and diagnostic tests than women, and women reported more physician visits, there were no statistically significant reporting biases associated with gender in any utilization category after controlling for provider report.

Despite the high levels of agreement between respondents and providers for most types of utilization, over- and under-reporting still may have an impact on the findings from proposed study analyses. Generalized misclassification can result in loss of statistical power to test for group differences in utilization outcomes and may lead to conservative findings with regard to the association between patient characteristics and utilization. On the other hand, a systematic bias in reporting where, for example, older individuals, compared with younger ones, are more likely to over-report doctor visits could inflate estimates of the potential impact of age on utilization. It is uncertain whether the patient or the provider reports doctor visits more accurately and, thus, which measure would introduce the least bias. In our data, physician reports of visits differ most from patient reports when the record indicates no visits and the patient disagrees.

Factors contributing to good agreement between physician and participant reporting include the high education level of the respondents and the fact that they are all regular participants in an annual health survey. Participants are conscientious and complete when filling out their questionnaires. They are also better prepared to respond, sometimes with supporting records readily accessible for review when completing the questionnaire. Poorer congruence for certain types of utilization may have occurred because not all physicians seen by a respondent participated in our study. Thus, a study respondent may have correctly reported a test or visit that was not in the record because the recording physician did not participate in this comparative study.
Other factors influencing concordance include incompleteness of provider records, incorrect recording of medical information, differential understanding of the requested study variables (e.g.,
patients may misidentify a specific treatment), and multiple reports of the same test or procedure for an individual patient. In addition, prescriptions from the doctor may not always be filled or used by the patient; in such cases, agreement between patient and physician would not be expected.

This study has some limitations, particularly with regard to generalizability of findings to other populations. Despite the older age of the study cohort, participants were well educated and had completed many questionnaires in prior years, factors that may have improved their reporting compared with other seniors. However, this investigation was based on detailed utilization questions from the first year they were included in the annual survey, and therefore familiarity should not have played a major role. The sample size was small, and some therapies or diagnostic tests were rarely used, possibly improving the concordance associated with null values. Furthermore, it is unknown whether the nonresponse of some individuals (12.7%) and physicians (15.5%) introduced bias into the analyzable study sample. Other research studies offer similar results, with fairly high concordance between self-reported use and medical records of hospital care in the general population [2–5,8,12].

As health care utilization becomes a more critical study area, there is a need for simple and reliable measures of resource use. This study suggests that participant self-report, even among an older population, can be a viable and cost-effective method for obtaining most types of these data.
Acknowledgments

This work was supported by a grant from the National Institute of Child Health and Human Development (R01 HD035641) to Stanford University, Helen B. Hubert, Principal Investigator.

References

[1] Law MG, Hurley SF, Carlin JB, Chandros P, Gardiner S, Kaldor JM. A comparison of patient interview data with pharmacy and medical records for patients with acquired immunodeficiency syndrome or human immunodeficiency virus infection. J Clin Epidemiol 1996;49:997–1002.
[2] Zhu K, McKnight B, Stergachis A, Daling JR, Levine RS. Comparison of self-report data and medical records data: results from a case-control study on prostate cancer. Int J Epidemiol 1999;28:409–17.
[3] Ritter PL, Stewart AL, Kaymaz H, Sobel DS, Block DA, Lorig KR. Self-reports of health care utilization compared to provider records. J Clin Epidemiol 2001;54:136–41.
[4] Roberts RO, Bergstralh EJ, Schmidt L, Jacobsen SJ. Comparison of self-reported and medical record health care utilization measures. J Clin Epidemiol 1996;49:989–95.
[5] Reijneveld SA. The cross-cultural validity of self-reported use of health care: a comparison of survey and registration data. J Clin Epidemiol 2000;53:267–72.
[6] Walter SD, Clarke EA, Hatcher J, Stitt LW. A comparison of physician and patient reports of Pap smear histories. J Clin Epidemiol 1988;41:401–10.
[7] Yaffe R, Shapiro S, Fuchsberg RR, Rohde CA, Corpen HC. Medical economics survey methods study: cost effectiveness of alternative survey strategies. Med Care 1978;16:641–59.
[8] Glandon GL, Counte MA, Tancredi D. An analysis of physician utilization by elderly persons: systematic differences between self-report and archival information. J Gerontol 1992;47:S245–52.
[9] Fleiss JL. Statistical methods for rates and proportions. New York: John Wiley & Sons; 1981. p. 212–25.
[10] Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–20.
[11] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[12] Reijneveld SA, Stronks K. The validity of self-reported use of health care across socioeconomic strata: a comparison of survey and registration data. Int J Epidemiol 2001;30:1407–14.