General Hospital Psychiatry 23 (2001) 239–253
Psychiatry and Primary Care

Recent epidemiologic studies have found that most patients with mental illness are seen exclusively in primary care medicine. These patients often present with medically unexplained somatic symptoms and utilize at least twice as many health care visits as controls. There has been an exponential growth in studies in this interface between primary care and psychiatry in the last 10 years. This special section, edited by Wayne J. Katon, M.D., will publish informative research articles that address primary care-psychiatric issues.
The Quality Improvement for Depression Collaboration: general analytic strategies for a coordinated study of quality improvement in depression care

Kathryn M. Rost, Ph.D.(a,*), Naihua Duan, Ph.D.(b), Lisa V. Rubenstein, M.D., M.S.H.S.(c,d,e), Daniel E. Ford, M.D.(f), Cathy D. Sherbourne, Ph.D.(e), Lisa S. Meredith, Ph.D.(e), Kenneth B. Wells, M.D., M.P.H.(b,e), for the Quality Improvement for Depression (QID) Consortium

a Department of Family Medicine, University of Colorado Health Sciences Center, 1180 Clermont Street, Campus Box B155, Denver, CO 80220, USA
b Departments of Psychiatry and Behavioral Sciences and Biostatistics, University of California at Los Angeles, 10920 Wilshire Boulevard, Los Angeles, CA 90024, USA
c Department of Medicine, University of California at Los Angeles, Box 957035, Los Angeles, CA 90095-7035, USA
d Department of Medicine, VA Greater Los Angeles Healthcare System, 16111 Plummer Street 152, Sepulveda, CA 91343, USA
e RAND Health Program, 1700 Main Street, Santa Monica, CA 90407, USA
f Department of General Internal Medicine, Johns Hopkins School of Medicine, 2024 E. Monument Street, Baltimore, MD 21205, USA

* Corresponding author. Tel.: +1-303-315-9721; fax: +1-303-221-3893. E-mail address: [email protected]
Abstract

It is difficult to evaluate the promise of primary care quality-improvement interventions for depression because published studies have evaluated diverse interventions by using different research designs in dissimilar populations. Preplanned meta-analysis provides an alternative to derive more precise and generalizable estimates of intervention effects; however, this approach requires the resolution of analytic challenges resulting from design differences that threaten internal and external validity. This paper describes the four-project Quality Improvement for Depression (QID) collaboration specifically designed for preplanned meta-analysis of intervention effects on outcomes. This paper summarizes the interventions the four projects tested, characterizes commonalities and heterogeneity in the research designs used to evaluate these interventions, and discusses the implications of this heterogeneity for preplanned meta-analysis. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Depression; Quality improvement; Primary care; Meta-analysis
1. Introduction

Recently, policy makers have given considerable attention to closing the gap in quality of care delivered for many chronic conditions in "best" versus "usual" care settings by funding the development of interventions to implement best practice [1]. Depression is a useful condition to learn more about "closing the gap" because it is prevalent [2], extremely disabling [3-6], responsive to readily available treatments [7], and poorly managed in many primary care practices where it usually presents [8-13]. Deriving specific conclusions about how implementing best practice for primary care depression may affect patients in diverse practices is complicated because published estimates have been derived by evaluating distinctive interventions in dissimilar populations using different research designs [14-35]. Traditional meta-analysis has become a mainstay for drawing meaningful conclusions about intervention efficacy. Its use is most straightforward when virtually identical interventions are evaluated in large numbers of studies
conducted in diverse practice settings by using similar research designs. Unfortunately, these conditions are difficult to realize in the literature examining how to improve primary care treatment for mental health problems. Current studies do not evaluate virtually identical interventions across diverse practice settings; rather, investigators deliberately tailor interventions to the practice setting. In addition, key aspects of the research design differ.

As an alternative to relying entirely on qualitative methods to synthesize findings across the existing literature, investigators have begun exploring preplanned meta-analysis [36-39]. Preplanned meta-analysis recognizes that pooling data across a small number of high-quality studies has the potential to provide a more precise and generalizable estimate of intervention effects than any single study alone, if threats to internal and external validity arising from differences across research protocols can be addressed by using appropriate analytic strategies.

The Quality Improvement for Depression (QID) collaboration joins other cooperative studies designed for preplanned meta-analysis [40-42]. The preplanned meta-analysis QID will undertake is a meta-analysis of patient-level data, pooling data from four projects conducted by different research teams funded by different agencies. The goal of the QID collaboration is to test whether quality-improvement interventions based on the treatment principles elucidated in earlier studies can be implemented with sufficient integrity and intensity to enhance outcomes in the diverse types of community-based practice settings that serve most depressed adults. Prior quality-improvement studies in this area have been conducted primarily in staff-model HMOs, testing models in which treatment was assigned [17] or directed in part [15,16,23,31] by the research team. In contrast, QID projects were conducted across a variety of practice organizations by using models in which the research team encouraged high-quality treatment [43] rather than assigned it or directly consulted with patients on it.

The specific aims of the QID collaboration are to provide precise estimates about: 1) how quality-improvement interventions affect symptom change and functional status in primary care patients with major depression (intent-to-treat analyses); and 2) how high-quality care affects similar outcomes in the same population (as-treated analyses). We define quality improvement as practice-level strategies designed to promote evidence-based care. Because practice-level quality improvement generally tailors an intervention to the local practice structure, it is important to evaluate the value of these interventions by deriving generalizable estimates of their effect across a wide range of practices using society relevance weights. Society relevance weights (also known as importance weights [44]) increase the external generalizability of the findings by assigning each subject a numerical weight that indicates how representative that subject and his/her primary care setting are of patients seeking primary care across the country on a number of policy-relevant characteristics.
Even with society relevance weights, the generalizability of QID findings to American primary care may be somewhat limited because the best practice models QID projects tested may have to be further tailored for successful integration into "similar" practices.

In this paper, we provide a concise history of the QID collaboration; describe research design commonalities; characterize QID intervention commonality and heterogeneity; and analyze the implications of heterogeneity in QID research designs for preplanned meta-analysis of patient-level data. Some heterogeneity across interventions is potentially desirable because it is unlikely that a broad diffusion of quality-improvement interventions will follow one particular model; thus, estimates of quality-improvement effects derived across a diverse set of intervention models may be more realistic for informing policy. Heterogeneity across research designs is less desirable because variation introduces alternative explanations for observed relationships; however, the plausibility of these alternative explanations can potentially be reduced by introducing statistical controls to reduce such threats to internal validity.
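To make the weighting logic concrete, the sketch below computes society relevance weights for a single characteristic as the ratio of a stratum's share in a reference population to its share in the pooled sample. The stratum labels and proportions are hypothetical, loosely patterned on the simplified non-metropolitan example given in the footnote to Table 7; the actual QID weights are constructed across several policy-relevant characteristics simultaneously.

```python
# Minimal sketch of society relevance (importance) weighting on one
# characteristic.  Proportions below are illustrative only.
sample_share = {"metropolitan": 0.87, "non_metropolitan": 0.13}      # share of pooled study subjects
population_share = {"metropolitan": 0.80, "non_metropolitan": 0.20}  # share of U.S. primary care patients

# Each subject in a stratum receives the ratio of population share to sample
# share, so under-represented strata (e.g., non-metropolitan) are weighted up.
relevance_weight = {
    stratum: population_share[stratum] / sample_share[stratum]
    for stratum in sample_share
}

subjects = [
    {"id": 1, "county_type": "non_metropolitan"},
    {"id": 2, "county_type": "metropolitan"},
]
for subject in subjects:
    subject["society_relevance_weight"] = relevance_weight[subject["county_type"]]
    print(subject)
```

In a weighted analysis, subjects from strata that are scarce in the pooled sample relative to the reference population count proportionately more, which is the intended effect of the society relevance weights described above.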
2. History of QID

QID is a cooperative study consisting of four projects that evaluated how six interventions affected the quality and outcomes of care provided to depressed primary care patients. Three of the projects were combined after funding as a National Institute of Mental Health (NIMH) Cooperative Agreement to test the effectiveness of primary care practice guidelines for major depression: the Hopkins Quality Improvement for Depression (HQID) Project, directed by investigators at Johns Hopkins University; the Mental Health Awareness Project (MHAP), directed by investigators at VA Greater Los Angeles/RAND/UCLA; and the Quality Enhancement by Strategic Teaming (QuEST) Project, directed by investigators currently at the University of Colorado Health Sciences Center. In accordance with the cooperative agreement's aims, the three projects tested different interventions to increase the proportion of primary care patients receiving treatment consistent with Agency for Health Care Policy and Research guidelines [7,45]. The funding agency required that data be collected so that they could be combined to examine the effects of high-quality depression treatment on outcomes. Two projects (HQID and QuEST) evaluated specified quality-improvement interventions to improve pharmacotherapy and referral to specialty care, whereas the third (MHAP) evaluated two specified processes for developing quality-improvement interventions. The fourth project (Partners in Care [PIC], directed by investigators at RAND/UCLA-NPI) was funded by the Agency for Healthcare Research and Quality to evaluate the impact of disseminating to managed primary care practices two specified quality-improvement interventions, each designed to improve overall rates of appropriate depression care.
After receiving confirmation that their already specified intervention studies would be funded, all four projects recognized that coordinating research designs would enhance the potential for meta-analysis. As a group, they worked to determine the degree to which research designs could be standardized while still allowing each project to address its funded specific aims. The four projects initially proposed different follow-up periods, consistent with their expectations that the varying interventions would have different peak effects and durations. Funding from the MacArthur Foundation enabled all four projects to collect baseline, 6-, and 12-month follow-up data, as well as providing infrastructure support to address the scientific issues in creating a pooled database for meta-analysis. An additional NIMH grant was obtained to permit all four projects to field both 18- and 24-month follow-up to determine longer-term intervention effects.
3. Research design commonalities

Common design features across QID projects are summarized in this section from detailed descriptions provided in previously published articles [14,46,47]. All four QID projects used a four-level nested design, recruiting community-based health care organizations, primary care practices within organizations, primary care clinicians within practices, and primary care patients within clinicians. Each study recruited one or more health care organizations to participate in the project, and then recruited practices from the participating organization(s). The practices that consented to participate were randomized to intervention or usual care conditions by using a blocked randomization design that stratified participating clinics into homogeneous blocks before conducting randomization within each block. Within each participating practice, primary care clinicians were invited to participate in the project. Patients who screened positive for depression and satisfied other eligibility criteria during their visits to participating clinicians were invited to participate in the project. Because all clinicians and patients in a given practice were randomized to the same condition, QID projects could target their efforts to the practice, the clinician, and the patient in intervention practices.

Major constructs in the QID core database are summarized in Table 1. Organizational, clinician, and patient characteristics were assessed at baseline. Selected clinician variables were assessed again an average of 18 months later. Patient reports of process and outcome variables were assessed at baseline (shortly after the index visit) and every 6 months over a 2-year period.
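To make the four-level nesting concrete, the sketch below represents a fragment of the design as nested records, with the study condition attached at the practice level so that all clinicians and patients within a practice share the same assignment; the organization, practice, and identifier names are hypothetical and do not correspond to actual QID sites.

```python
# Minimal sketch of the four-level nested structure (organization > practice >
# clinician > patient), with randomization applied at the practice level.
design = {
    "organization": "Health plan 1",
    "practices": [
        {
            "practice": "Practice A",
            "condition": "intervention",   # assigned by practice-level randomization
            "clinicians": [
                {"clinician": "Clinician 1",
                 "patients": ["Patient 001", "Patient 002"]},
            ],
        },
        {
            "practice": "Practice B",
            "condition": "usual care",
            "clinicians": [
                {"clinician": "Clinician 2",
                 "patients": ["Patient 003"]},
            ],
        },
    ],
}

# Every clinician and patient inherits the condition of his or her practice.
for practice in design["practices"]:
    for clinician in practice["clinicians"]:
        for patient in clinician["patients"]:
            print(patient, "->", practice["condition"])
```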
4. Intervention commonality and heterogeneity

For all four projects, the goal of the intervention was to increase high-quality depression care in a context where clinicians and patients were free to choose the treatment they judged to be most appropriate.
Table 1
Major constructs across all projects

Organizational characteristics
  Size
  Age
  Staffing composition
  HMO/IPA/PPO/mixed model
  Profit status
  Primary care compensation
  Carve-outs
  Subcontracts
  Restricted access to specialists
  Copayments
  Visit restrictions
  Quality assurance efforts
  Average waiting time
Patient characteristics
  Sociodemographic
  Transportation
  Responsibility for dependent family members
  Physical comorbidities (1 month)
    Chronic conditions
    Medication
  Psychiatric comorbidities
    Lifetime
      Previous psychiatric hospitalization
      Psychotropic medication
    One-year
      Dysthymia
      Panic screener
      Alcohol screener
Clinician characteristics
  Background and training
  Outpatient caseload
  Participation in professional activities
  Reimbursement and financial incentives
  Depression treatment
    Knowledge
    Attitudes
    Proclivity
    Experience
Process of depression care
  Assessment
    Current symptoms
    Length of current episode
    Previous episodes
    Suicidal ideation
    Substance use
    Manic episodes
  Treatment
    Initiation/adjustment to medication
    Reduction of depression-inducing medication
    Side effect management
    Primary care counseling
    Specialty care referral
  Monitoring by index provider
    In-person follow-up
    Telephone follow-up
Outcomes of care
  Depression diagnosis
  Depressive symptoms
  Work productivity
  Household productivity
  Physical functioning
  Emotional functioning
  Utilization/expenditures
    In-plan
      Primary care
      Specialty care
      Emergency room
      Hospitalization
    Out-of-plan
High-quality depression care was defined as antidepressant medication and/or counseling consistent with the recommendations defined in Agency for Health Care Policy and Research guidelines [7,45]. To achieve this goal, each of the four projects employed a partnership model to construct manualized multimodal chronic disease management interventions including clinician education and depression care managers. Intervention differences across the four projects reflected: 1) differences in the primary target of the intervention, practice administrators (MHAP and PIC) versus clinicians (HQID and QuEST); 2) differences in whether the intervention was prespecified (HQID, PIC, and QuEST) versus preplanned (MHAP); 3) differences in the intensity at which intervention components could be incorporated into usual care; and 4) differences in funding levels.
Developing an intervention for a network-model HMO with a behavioral health care carve-out, the HQID project provided academic detailing to clinicians and trained a centrally located care manager to educate patients and monitor their response to treatment monthly over 12 months, faxing progress reports to the clinician.

Developing two interventions for staff-model HMOs, the MHAP project trained two organizations to use two distinct team-based quality-improvement approaches to formulate specific improvement goals [14]. The first approach used a centrally organized team of experts who, in consultation with practice representatives, implemented a plan across multiple practices; the second approach used local teams who developed and implemented individualized plans at their own practice [14].

Developing two interventions for managed care practices, the PIC project trained local clinicians to provide lectures/academic detailing and audit/feedback to other clinicians in the practice. In the first intervention, local nurses were trained to provide improved medication management for either 6 or 12 months; the second intervention trained local psychotherapists in cognitive behavioral therapy, which patients were able to obtain for a reduced copayment [19,46].

Developing an intervention for mixed-model practices, the QuEST project provided academic detailing to clinicians, instructed administrative staff on how to systematically screen patients before the visit, and trained local nurses to educate patients and monitor their response to treatment weekly over 2 months; local nurses also contacted patients monthly from 12 to 24 months to identify symptomatic patients for whom specific recommendations for treatment adjustment were provided [47].

Additional information describing the organization and financing of QID interventions is given in Tables 2 and 3.
5. Research design heterogeneity

5.1. Randomization

Across all four projects, randomization to the intervention or usual care condition occurred within each organization at the practice level after participating practices were stratified into homogeneous blocks. The variables used to stratify practices differed across projects: HQID stratified by geographic area; MHAP stratified by patient demographics and practice size; PIC stratified by patient demographics and clinician mix, including onsite mental health specialists; and QuEST stratified by preexisting depression practice patterns. These differences potentially contribute to differences in the level of precision projects have to estimate intervention effects, but they do not bias the estimate of intervention effects within each project.
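The following sketch illustrates stratified, blocked randomization at the practice level of the general kind described above; the strata, block size, and practice names are hypothetical, and the sketch is not the assignment procedure any QID project actually used.

```python
# Minimal sketch of stratified (blocked) randomization at the practice level.
# Practices are grouped into homogeneous strata, then assigned within each
# stratum in randomly ordered blocks so arms stay balanced.  Illustrative only.
import random

def blocked_randomization(practices_by_stratum, block=("intervention", "usual care"), seed=0):
    rng = random.Random(seed)
    assignment = {}
    for stratum, practices in practices_by_stratum.items():
        practices = list(practices)
        rng.shuffle(practices)                  # random order within the stratum
        for i, practice in enumerate(practices):
            if i % len(block) == 0:             # start a new block of arm labels
                arm_order = list(block)
                rng.shuffle(arm_order)
            assignment[practice] = arm_order[i % len(block)]
    return assignment

# Hypothetical strata defined by preexisting practice characteristics.
strata = {
    "urban, high baseline treatment rate": ["Practice A", "Practice B"],
    "rural, low baseline treatment rate": ["Practice C", "Practice D"],
}
print(blocked_randomization(strata))
```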
5.2. Comparison group

Across all four projects, interventions were compared to usual care (QuEST) or usual care supplemented by distribution of guidelines to administrators/clinicians (HQID, MHAP, and PIC). Previous research suggests that distribution of guidelines will have minimal if any impact on the process or outcomes of care [48,49].

5.3. Eligibility criteria and recruitment

5.3.1. Organizations

All four projects recruited convenience samples of community-based health care organizations or practice networks. The 11 participating organizations provided both managed (staff- and network-model HMOs, IPAs, PPOs, and public health clinics) and nonmanaged (fee-for-service) health care in single-insurer and mixed-model clinics. PIC specifically sought organizational diversity; PIC and QuEST sought geographic diversity; and HQID and PIC sought practices with high proportions of minority patients. The difference in the type of organizations recruited will affect the generalizability of each study; such differences can be addressed through society relevance weighting. It is difficult to characterize organizational recruitment rates across the four projects because no project attempted to recruit members at random from a defined population of health care organizations; rather, each project partnered with organizations that expressed interest in the project.

5.3.2. Practices

Each project recruited multiple practices from participating organizations, varying in how practices were selected. MHAP and QuEST invited selected matched practices within a given organization to participate. HQID invited all practices serving more than 250 similarly insured patients in a given geographic area. PIC invited all practices in selected regions that had three or more full-time clinicians. In QuEST, practices were not eligible if they employed an onsite mental health professional who provided ongoing depression treatment, while in MHAP, practices that had recently sponsored mental health quality-improvement initiatives were ineligible. The implications of these differences for generalizability will be addressed through society relevance weighting. In HQID, 41 of 60 practices eligible to participate in the project agreed to participate. In MHAP, the first 9 of 15 eligible practices approached agreed to participate. In PIC, 7 of 8 selected regions/partnerships agreed to participate; in regions/partnerships agreeing to participate, 46 of the 48 invited practices agreed to participate. In QuEST, it was not possible to estimate a practice recruitment rate because practices in the research network volunteered to participate in the project. Across the four projects, there were 108 participating primary care practices (41 in HQID, 9 in MHAP, 12 in QuEST, and 46 practices treated as 27 experimental units in PIC). A description of organizational and financing characteristics of participating practices appears in Table 4.
Table 2
Organization of QID interventions

Provider education
  % Providers participating
    HQID: 55% attended seminars, 95% completed one or more detailing visits
    MHAP (a): ND
    PIC: 100% attended seminars, 49% participated in detailing
    QuEST: 100% completed four detailing calls
  Materials
    HQID: AHCPR guidelines, other algorithms
    MHAP: AHCPR guidelines, provider manual, quick reference cards
    PIC: AHCPR guidelines, provider manual, quick reference cards
    QuEST: AHCPR guidelines, other algorithms
Patient assessment and education
  Care manager background
    HQID: Psychiatric nurse/MSW
    MHAP: 1/6 psychologist, 2/6 RN, 2/6 PCP
    PIC: RN
    QuEST: Office nurse
  Modality
    HQID: By telephone
    MHAP: 5/6 in person
    PIC: In person
    QuEST: In person
  Visit length (minutes)
    HQID: 15
    MHAP: ND
    PIC: 20
    QuEST: 15
  Standardization of assessment
    HQID: High
    MHAP: High in 5/6
    PIC: High
    QuEST: High
  Standardization of education
    HQID: Moderate
    MHAP: Low
    PIC: High
    QuEST: High
  % of patients assessed/educated
    HQID: 76%
    MHAP: ND
    PIC: 72%
    QuEST: 92%
  Materials
    HQID: AHCPR Patient's Guide, self-help books
    MHAP: Varied
    PIC: Brochures, videos, copies of depression care plan
    QuEST: AHCPR Patient's Guide, self-help books, copies of depression care plan
Patient case management (b)
  Modality
    HQID: Telephone (in person)
    MHAP: Telephone (in person)
    PIC: Telephone (in person)
    QuEST: Telephone (in person)
  Visit length (min)
    HQID: 15
    MHAP: ND
    PIC: ND
    QuEST: 15
  Standardization of case management
    HQID: Moderate
    MHAP: High in 3/6
    PIC: High (c)
    QuEST: High
  % with one case management visit or more
    HQID: 68%
    MHAP: ND
    PIC: 61% (c)
    QuEST: 87%
  Average no. of case management visits
    HQID: 3.5
    MHAP: ND
    PIC: 4.5 (c)
    QuEST: 4.4
  Duration
    HQID: Up to 12 months
    MHAP: ND
    PIC: Up to 12 months
    QuEST: Up to 12 months (d)
Organizational change activities
  Evidence-based priority setting and planning
    HQID: No
    MHAP: Yes
    PIC: No
    QuEST: No
  Collaborative care activities
    HQID: Study-trained psychiatrist on call (e)
    MHAP: Joint mental health/primary care planning and implementation of study interventions
    PIC: Study-trained psychiatrist supervision of nurse case managers in one experimental arm (e)
    QuEST: Study-trained psychiatrist on call (e)

a Intervention deliberately varied at discretion of CQI teams across 6 practices. Intensity with which components were implemented could not be measured.
b Case management visits defined as depression care manager contacts following initial patient assessment/education session.
c Calculated for QI Med condition only.
d Case management re-instituted 12 to 24 months.
e Intervention component for which implementation was substantially less than planned.
ND = not determined.
5.3.3. Clinicians

Primary care clinicians (physicians, physician assistants, and nurse practitioners) from each practice were invited to participate. HQID, MHAP, and PIC attempted to recruit all primary care clinicians in each practice; QuEST recruited two clinician volunteers in each practice. These differences in clinician recruitment strategy will be addressed with society relevance weighting. Clinician recruitment rates for the three projects that attempted to recruit all clinicians in participating practices were 85% (HQID), 97% (PIC), and 100% (MHAP). Across the four projects, there were 388 clinicians (33 from HQID, 179 from MHAP, 24 from QuEST, and 152 from PIC) who provided primary care treatment to the QID core patient sample at the index visit; 265 (69%) completed baseline questionnaires (45% in HQID, 47% in MHAP, 100% in QuEST, and 93% in PIC). Characteristics of the 265 QID core clinicians are described in Table 5.

5.3.4. Patients

All projects attempted to recruit a representative sample of consecutive patients meeting the following inclusion and exclusion criteria to evaluate the impact of the intervention.
Table 3
Grant support of required depression care intervention activities

Activity | HQID | MHAP | PIC | QuEST
Intervention specification | Fully | Partially | Fully | Fully
Intervention materials | Fully | Not at all | Fully | Fully
Intervention implementation
  Training: clinical QI leaders | NA | Fully | Fully | NA
  Training: primary care physicians | Fully | Not at all | Not at all | Fully
  Training: nurse care managers | Fully | Not at all | Fully | Fully
  Patient screening | Fully | Not at all | Fully | Partially
  Patient assessment/education | Fully | Not at all | Partially* | Not at all
  Patient case management | Fully | Not at all | Partially* | Not at all
  Medication | Not at all | Not at all | Not at all | Not at all
  Psychotherapy | Not at all | Not at all | Partially | Not at all
Unrestricted grant contributions to activities noted above as partially supported | None | $5,000 to one organization, $2,500 to other | Average of $35,000 to each organization | $36,000 to organization

* Estimated 50%. NA = not applicable.
5.3.4a. Inclusion criteria. Patients had to be positive on a screener [50] which identified individuals who reported 2 weeks or more during the last year and 1 week or more during the past month when they felt sad, empty, depressed, or lost interest in things they normally enjoyed. The screener was self-administered with assistance as needed at the time of the index visit (MHAP, PIC, and QuEST) or by telephone shortly afterward (HQID). In addition, screen-positive patients had to meet criteria for 1-year major depression on a subsequent structured interview [50] to be considered eligible for the QID core database. QuEST required that screen-positive patients report five or more current symptoms of major depression [51] before completing the structured interview, to increase the proportion of patients meeting structured interview criteria for major depression.
Table 4
Characteristics of participating organizations and practices

Characteristic | HQID | MHAP | PIC | QuEST
Number of healthcare organizations | 1 | 2 | 7 | 1
Number of practices | 41 | 9 | 46 | 12
Size
  PCPs in practices, n (a) | 78 | 296 | 186 | 71
  Adult outpatient visits per physician, n/week (b) | 86 | 90 | 96 | 60
  Visits involving depressed patients, % of total visits (b) | 5 | 8 | 7 | 9
Financing
  For-profit practices, n (a) | 32 | 0 | 0 | 4
  Patients in capitated contracts, % (c) | 100 | 100 | 53 | 8
  PCPs salaried, % (d) | 0 | 100 | 94 | 58
  Uninsured patients, % (e) | 0 | 0 | 4 | 20
Organization
  Mental health care carved out, % of patients | 100 | 0 | 30 | 14
  Gatekeeper required for access to mental health services, % of patients (f) | 0 | 0 | 74 | 36
  Average time per visit, minutes (b): new patients | 25 | 24 | 23 | 25
  Average time per visit, minutes (b): follow-up patients | 14 | 16 | 14 | 14
  Pre-authorization required for mental health treatments, % of practices (g) | 0 | 0 | 25 | 26
  Patients must choose provider from a group or list, % (h) | 100 | 100 | 90 | 43

a Based on information obtained from study sites at the start of the study.
b From the Clinician Background Questionnaire (CBQ).
c For PIC, from the CBQ; for QuEST, percent of patients in study; otherwise, based on information from study sites.
d For PIC, information from the CBQ; otherwise, based on information from study sites.
e For PIC and QuEST, percent of patients in study; otherwise, based on information from study sites.
f For HQID, based on information from organization; otherwise from the Patient Assessment Questionnaire (PAQ).
g For QuEST, percent of patients in study; otherwise, percent of practices.
h For HQID and MHAP, based on plan policy information; for PIC, percent of patients in study from self-report; for QuEST, percent of patients based on combination of plan policy information and self-report.
Table 5
Demographics of clinician responders treating QID core patients (N = 265)

Characteristic | HQID (a) (n = 15) | MHAP (a) (n = 85) | PIC (a) (n = 141) | QuEST (n = 24)
Age, mean (SD) | 47.0 (9.1) | 42.7 (7.9) | 43.5 (8.9) | 43.5 (6.3)
Gender, n (%): male | 10 (66.7) | 40 (48.8) | 90 (63.8) | 16 (66.7)
Gender, n (%): female | 5 (33.3) | 42 (51.2) | 51 (36.2) | 8 (33.3)
Race, n (%): white | 8 (57.1) | 57 (70.4) | 101 (72.1) | 22 (91.7)
Race, n (%): non-white | 6 (42.9) | 24 (29.6) | 39 (27.9) | 2 (8.3)
Specialty (b), n (%): internal medicine | 7 (46.7) | 81 (95.3) | 65 (46.1) | 1 (4.2)
Specialty (b), n (%): general/family practice | 8 (53.3) | 4 (4.7) | 76 (53.9) | 23 (95.8)
Any continuing medical education for depression in the past 3 years, n (%): no | 3 (20.0) | 24 (29.3) | 39 (29.1) | 7 (29.2)
Any continuing medical education for depression in the past 3 years, n (%): yes | 12 (80.0) | 58 (70.7) | 95 (70.9) | 17 (70.8)

a Due to missing data, some categorical n's do not sum to the total sample for the pertinent study.
b P < .001 (all other tests of differences had P > .05); P values are from chi-square (χ2) tests, except for age, which was from analysis of variance.
5.3.4b. Exclusion criteria. Patients were excluded across all four projects if they were <18 years of age; had an acute life-threatening condition or cognitive impairment that prevented them from completing the screener; indicated they did not intend to receive care in the clinic on an ongoing basis; had no access to a telephone; were currently pregnant, breastfeeding, or less than 3 months postpartum; or screened positive for current bereavement, lifetime mania, or lifetime alcohol dependence with current drinking. It is important to note that none of the four projects excluded patients who were symptomatic despite recent treatment for depression or patients who were suicidal. Patients were excluded if they did not speak English (HQID, QuEST, and MHAP) or English/Spanish (PIC). Patients were excluded from HQID and PIC if they were covered by plans not collaborating in the project.

HQID, PIC, and QuEST also recruited patients who screened positive but failed to meet criteria for 1-year major depression on the structured interview. The PIC project recruited patients who were positive on selected exclusion criteria (e.g., pregnancy/breastfeeding/postpartum, mania, alcohol, and bereavement). These patients were forwarded to the QID database but flagged so they would not be included in QID core analyses. Patients meeting the eligibility criteria outlined above were considered eligible in HQID, MHAP, and QuEST and potentially eligible in PIC pending confirmation of insurance status.

Whereas all four projects recruited identified eligible patients at rates comparable to previously published primary care depression intervention studies [15-18,21], sample loss contributing to nonresponse bias occurred both during the screening process, while eligibility status was being determined, and during the recruitment process, when patients found to be eligible were recruited to participate in the study (see Appendix). We mitigated potential nonresponse bias by developing an overall enrollment nonresponse weight for each subject. Defined as the reciprocal of the overall enrollment response rate, the overall enrollment nonresponse weight adjusts for differences in the overall enrollment response rates among different subgroups within the same project and for differences in the overall enrollment response rates across projects.
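A minimal sketch of the enrollment nonresponse weight just described is given below: each enrolled subject is weighted by the reciprocal of the enrollment response rate for his or her subgroup. The subgroups and counts are hypothetical. The attrition weights described in Section 5.4 follow a similar inverse-rate logic, applied to follow-up completion rather than enrollment.

```python
# Minimal sketch of overall enrollment nonresponse weighting.  Counts are
# illustrative; each enrolled subject receives the reciprocal of the
# enrollment response rate observed in his or her subgroup.
subgroups = {
    # (project, subgroup): (eligible patients identified, patients enrolled)
    ("Project 1", "age 18-64"): (200, 150),
    ("Project 1", "age 65+"):   (50, 25),
    ("Project 2", "age 18-64"): (400, 360),
}

nonresponse_weight = {
    key: eligible / enrolled          # reciprocal of the response rate (enrolled/eligible)
    for key, (eligible, enrolled) in subgroups.items()
}

for key, weight in nonresponse_weight.items():
    response_rate = 1 / weight
    print(f"{key}: response rate {response_rate:.2f}, weight {weight:.2f}")
```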
Across the four projects, there are 1499 patients in the QID core database (72 from HQID, 366 from QuEST, 514 from MHAP, and 547 from PIC). Characteristics of patients in the QID core database are described in Table 6.

In addition to nonresponse bias, differences in patient recruitment lead to three other potential sources of heterogeneity: 1) variation in initiation of recruitment relative to initiation of the patient intervention; 2) length of the recruitment window; and 3) sampling strategies for screening.

First, HQID, PIC, and QuEST delivered patient interventions after the initiation of patient recruitment, whereas MHAP delivered patient interventions both before and after the initiation of patient recruitment. MHAP subjects who made a visit in the "gap" between intervention initiation and their own recruitment may have received part or all of the intervention before recruitment. To address this source of heterogeneity, the research team will employ time-trend analyses that re-date MHAP subjects' baseline interviews to reflect how long after the subject's first visit to a clinic with a patient intervention in place the interview occurred. For example, if a MHAP patient was recruited to the study 6 months after making a visit to a clinic that had initiated its patient intervention, time-trend analyses will re-date the baseline interview to be a 6-month interview. Time-trend analyses use carefully specified longitudinal models that smooth out temporal trends in the data, allowing the analyst to better estimate outcomes at particular time points.
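A simple sketch of the re-dating step is shown below; the field names and dates are hypothetical, and the actual QID time-trend analyses involve more elaborate longitudinal models than this arithmetic.

```python
# Minimal sketch of re-dating a baseline interview relative to the first visit
# made after a clinic's patient intervention was in place.  Dates are illustrative.
from datetime import date

def months_between(start: date, end: date) -> int:
    return (end.year - start.year) * 12 + (end.month - start.month)

subject = {
    "first_visit_after_intervention_start": date(1997, 1, 15),
    "baseline_interview": date(1997, 7, 20),
}

# A baseline interview completed six months after the first post-intervention
# visit is treated, for pooling purposes, as a 6-month assessment.
re_dated_wave = months_between(subject["first_visit_after_intervention_start"],
                               subject["baseline_interview"])
print(f"Baseline interview re-dated as a {re_dated_wave}-month assessment")
```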
Table 6
QID core patient characteristics (N = 1498)

Characteristic | HQID (n = 72) | MHAP (n = 514) | PIC (n = 546) | QuEST (n = 366)
Age (years) (a), n (%): 18-34 | 16 (22.2) | 101 (19.6) | 139 (25.5) | 123 (33.6)
Age (years) (a), n (%): 35-49 | 43 (59.7) | 187 (36.4) | 226 (41.4) | 160 (43.7)
Age (years) (a), n (%): 50-64 | 13 (18.1) | 169 (32.9) | 130 (23.8) | 74 (20.2)
Age (years) (a), n (%): 65+ | 0 (0.0) | 57 (11.1) | 51 (9.3) | 9 (2.5)
Gender (a), n (%): male | 12 (16.7) | 208 (40.5) | 126 (23.1) | 49 (13.4)
Gender (a), n (%): female | 60 (83.3) | 306 (59.5) | 420 (76.9) | 317 (86.6)
Race (a), n (%): white | 30 (41.7) | 347 (67.5) | 345 (63.2) | 301 (82.2)
Race (a), n (%): African-American | 34 (47.2) | 35 (6.8) | 33 (6.0) | 30 (8.2)
Race (a), n (%): Hispanic | 6 (8.3) | 68 (13.2) | 141 (25.8) | 11 (3.0)
Race (a), n (%): other | 2 (2.8) | 64 (12.5) | 27 (5.0) | 24 (6.6)
Education (a), n (%): less than high school | 7 (9.7) | 32 (6.2) | 76 (13.9) | 69 (18.9)
Education (a), n (%): high school graduate | 43 (59.7) | 338 (65.8) | 338 (61.9) | 255 (69.7)
Education (a), n (%): college graduate | 22 (30.6) | 144 (28.0) | 132 (24.2) | 42 (11.5)
MCES-D, mean (SD) (a) | 30.3 (16.0) | 32.9 (14.7) | 33.6 (13.2) | 40.7 (13.9)
Antidepressant use in past 6 months (a), n (%): no | 46 (63.9) | 307 (59.7) | 376 (68.9) | 181 (49.5)
Antidepressant use in past 6 months (a), n (%): yes | 26 (36.1) | 207 (40.3) | 170 (31.1) | 185 (50.5)
Specialty care visit in past 6 months (a), n (%): no | 44 (61.1) | 267 (51.9) | 314 (64.7) (b) | 217 (59.3)
Specialty care visit in past 6 months (a), n (%): yes | 28 (38.9) | 247 (48.1) | 171 (35.3) | 149 (40.7)
Suicidal ideation, n (%): no | 64 (88.9) | 459 (89.3) | 412 (85.0) (b) | 306 (83.6)
Suicidal ideation, n (%): yes | 8 (11.1) | 55 (10.7) | 73 (15.1) | 60 (16.4)
Medical conditions, mean (SD) (a) | 1.6 (1.4) | 2.0 (1.7) | 1.4 (1.4) | 1.9 (1.7)

a P < .001 from chi-square and analysis of variance tests.
b Data missing for 61 PIC subjects.
In addition, the composition of the baseline MHAP sample potentially differs across intervention conditions because depressed intervention patients exposed to effective quality-improvement efforts prior to recruitment may have remitted by the time of screening. To assess this possibility, we will compare screen-positive rates in MHAP intervention and control practices. Although variation in prerecruitment clinician training across the four projects potentially introduces the same problem, experimental research suggests that this variation will have considerably less impact on process and outcomes [33], so parallel adjustments will not be made.

Second, the length of patient recruitment varied across practices in the same project and across projects: 3-18 months (HQID), 7-16 months (QuEST), 2-10 months (MHAP), and 4-7 months (PIC). If providers improved their depression-management skills over time with additional experience, patients recruited later into the project may have realized better outcomes than those recruited earlier. Conversely, the opposite may occur if the impact of the provider training erodes over time. In order to account for this potential change over time, we will use time-trend analyses within each study to estimate the trajectory of
intervention effect as a function of the lag between when recruitment started and when the patient was recruited. Such trajectories will allow us to compare outcomes among patients recruited at comparable time points across the studies. Similarly, if providers extended their improved depression-management skills to their entire caseload (as they were encouraged to do in the MHAP intervention) rather than exclusively to study subjects, patients recruited at the end of the recruitment window, after being exposed to the intervention, may be more treatment resistant. We can examine this possibility by investigating how pre-existing indicators of treatment resistance differ across subjects recruited in later phases across the four projects, and adjust as indicated.

Third, projects differed in the sampling strategy they employed to identify patients for screening. HQID screened all consecutive patients in nonrandomly selected windows of time. MHAP screened all consecutive patients until a practice quota was achieved. PIC screened consecutive patients during randomly selected time blocks in a set window of time. QuEST screened all consecutive patients in a majority of practices and screened all consecutive patients during nonrandomly selected time blocks in the remaining practices until practice quotas were met. To adjust for the potential sampling bias, QuEST conducted a complete 2-week census of all consecutive patients in all practices and used conversion weighting to weight screened patients to all patients in the consecutive census [47]. The conversion weights compared the distribution of age, race, and gender observed for the convenience sample and the consecutive sample, and then adjusted the distribution in the convenience sample to match the distribution in the consecutive sample.
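The sketch below illustrates conversion weighting of this general kind on a single combined characteristic; the strata and counts are hypothetical, whereas the QuEST weights were built from the observed age, race, and gender distributions described above.

```python
# Minimal sketch of conversion weighting: screened (convenience) patients are
# weighted so their stratum distribution matches a complete consecutive census.
# Counts and strata are hypothetical.
from collections import Counter

census = ["F 18-49", "F 18-49", "F 50+", "M 18-49", "M 50+", "M 50+"]   # all consecutive patients
screened = ["F 18-49", "F 50+", "M 50+"]                                # convenience sample actually screened

census_share = {k: v / len(census) for k, v in Counter(census).items()}
screened_share = {k: v / len(screened) for k, v in Counter(screened).items()}

# Each screened patient gets the ratio of the census share to the screened-sample
# share for his or her stratum, so the weighted screened sample mirrors the census.
conversion_weight = {
    stratum: census_share[stratum] / screened_share[stratum]
    for stratum in screened_share
}
print(conversion_weight)
```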
5.4. Retention

Retention rates for participating organizations, practices, and clinicians were 100% across all four projects. Retention rates for participating patients varied only slightly across projects at 6 months, ranging from 80.6% (HQID), through 86.5% (PIC) and 89.5% (MHAP), to 90.7% (QuEST). Retention rates were comparable between intervention and control patients in each project, except for QuEST, which was significantly more successful in following control subjects across all waves. Differences in patient retention rates across projects and within projects (QuEST only) will be controlled by using attrition weights to better represent the cohort eligible to complete the follow-up under consideration. Attrition weights assign a proportionately greater weight in the analysis to successfully followed subjects whose sociodemographic and clinical characteristics are similar to those of dropouts, increasing the representativeness of subjects completing follow-up relative to subjects originally enrolled in the study.

5.5. Data collection methods

HQID, MHAP, and QuEST administered all postscreener patient assessments of core constructs by telephone. PIC administered all postscreener patient assessments of core constructs by self-report mail survey, using telephone and in-person follow-up as needed. Differences in survey administration lead to three potential sources of heterogeneity: 1) systematic response bias between self-report mail (PIC) and telephone assessment (HQID, MHAP, and QuEST); 2) different levels of item-missingness (higher in PIC self-report mail surveys); and 3) different lag times between completion of screening and baseline assessment.

First, in a methodological sub-experiment, PIC administered both telephone and mail surveys to representative patients in random order to derive a correction factor to control for survey administration bias. Second, we used multiple imputation methods [52,53] to address different levels of item-missingness across the four projects. To impute missing data, the analyst develops a model that predicts the likely distribution of missing data values and randomly samples an imputed value from the predicted distribution. In the "hot deck" imputation method QID employed, the analyst selected one or more "donors" from the same strata to define the predicted distribution of the variable for subjects whose data were missing. The imputation procedure was replicated multiple times to estimate the uncertainty the imputation introduces into the estimate.
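A minimal sketch of the hot-deck approach is shown below; the variable, strata, and values are hypothetical, and the actual QID procedure follows the multiple imputation framework cited above [52,53].

```python
# Minimal sketch of "hot deck" multiple imputation: for a subject with a missing
# value, an imputed value is drawn from "donors" in the same stratum, and the
# procedure is repeated so the spread across replicates reflects imputation
# uncertainty.  Data are illustrative.
import random
import statistics

records = [
    {"stratum": "female, 18-49", "cesd": 32.0},
    {"stratum": "female, 18-49", "cesd": 41.0},
    {"stratum": "female, 18-49", "cesd": None},   # missing value to impute
    {"stratum": "male, 50+", "cesd": 28.0},
]

def hot_deck_impute(records, seed):
    rng = random.Random(seed)
    completed = []
    for r in records:
        value = r["cesd"]
        if value is None:
            donors = [d["cesd"] for d in records
                      if d["stratum"] == r["stratum"] and d["cesd"] is not None]
            value = rng.choice(donors)            # draw from donors in the same stratum
        completed.append(value)
    return completed

# Replicate the imputation several times; variability across replicate estimates
# reflects the uncertainty introduced by imputing the missing value.
replicate_means = [statistics.mean(hot_deck_impute(records, seed)) for seed in range(5)]
print(replicate_means)
```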
Table 7
Suggested methods to reduce effects of heterogeneity introduced by research design variation across QID projects

Design issue | Major suggested method
Selection of organizations, practices, and clinicians | Use society relevance weights*
Patient recruitment
  Non-participation bias | Use overall non-response weights
  Variation in pre-recruitment intervention | Use time-trend analysis
  Length of recruitment window | Use time-trend analysis
  Sampling strategies for screening | Use conversion weights
Patient retention | Use attrition weights
Data collection methods
  Differences in response between self-report mail and telephone assessment | Use correction factor from methods sub-study
  Differential item missingness | Use multiple imputation methods
  Differences in lag time between screening and baseline assessment | Impute baseline data from screening data for patients with extensive lags

* In a simplified example, since subjects who live in non-metropolitan counties constitute only 13% of QID subjects but approximately 20% of American patients seeking help in primary care settings, QID subjects who live in non-metropolitan counties will receive a greater weight in analyses which incorporate society relevance weights.
Third, to address the differences in lag times between screening and baseline assessment within and across projects, we imputed baseline depression severity scores from screening data for subjects who completed baseline data more than 30 days after screening. We plan to conduct sensitivity analyses that exclude patients with delayed baseline interviews and use end-status analyses without baseline data to provide a comparison for those analyses that could potentially be affected by variations in time lag.
6. Summary

The sources of heterogeneity discussed above that potentially threaten the internal and external validity of combined database analyses are summarized in Table 7, along with the statistical controls we propose to introduce to reduce those threats. It is our hope that this effort to identify these threats and to propose carefully considered solutions to them will be useful to other health services researchers, as methods for meta-analysis of patient-level data have the potential to make a large contribution to synthesizing the disparate literatures needed to inform the transformation of usual care to best practice.
Acknowledgment

The preparation of this manuscript was supported by the John D. and Catherine T. MacArthur Foundation and by National Institute of Mental Health R10 Cooperative Agreement Quality Improvement for Depression Grants MH50732, MH54444, and MH54443, and Agency for Health Care Policy and Research Grant HS08349. It was also supported by MH54623 and MH63651. The authors thank Christy Klein, Maureen Carney, Bernadette Benjamin, and Chantal Avila for help in the PIC study; Carole Oken and Mary Abdun-Nur for help in the MHAP study; Jeff Smith, Carl Elliott, and Paul Nutting for help in the QuEST study; Christine Nelson, Ray Turner, Tracey Hare, Hong Vu, and Jose Arbelaez for help in HQID and the combined database; Kathryn Magruder for help with the overall collaboration; and Bob Bell for statistical consultation. The authors acknowledge participating managed care organizations and participating primary care providers: Kaiser Permanente Medical Care Programs in Northern California Region, Oakland, CA; VA Medical Center, Sepulveda, CA; Ambulatory Sentinel Practice Network, Denver, CO, a practice-based research network of family physicians that voluntarily participates in research; NYLCare Health Plans of the Mid-Atlantic, Greenbelt, MD; Allina Medical Group, Twin Cities, MN; Columbia Medical Plan, Columbia, MD; Humana Health Care Plans, San Antonio, TX; MedPartners, Los Angeles, CA; PacifiCare of Texas, San Antonio, TX; and Valley-Wide Health Services, Alamosa, CO.
References

[1] Wagner EH, Austin BT, Von Korff M. Organizing care for patients with chronic illness. Milbank Q 1996;74:511–44.
[2] Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8–19.
[3] Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed patients: results from the Medical Outcomes Study. JAMA 1989;262:914–9.
[4] Broadhead WE, Blazer DG, George LK, et al. Depression, disability days, and days lost from work in a prospective epidemiologic survey. JAMA 1990;264:2524–8.
[5] Mintz J, Mintz LI, Arruda MJ, et al. Treatments of depression and the functional capacity to work. Arch Gen Psychiatry 1992;49:761–8.
[6] Hirschfeld RMA, Montgomery SA, Keller MB, et al. Social functioning in depression: a review. J Clin Psychiatry 2000;61:268–75.
[7] Depression Guideline Panel. Depression in Primary Care, Vol. 2. Treatment of Major Depression. Clinical Practice Guideline, No. 5. (AHCPR Publication No. 93-0551). Rockville, MD: US Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research, 1993.
[8] Kerr EA, McGlynn EA, Van Vorst KA, et al. Measuring antidepressant prescribing practice in a health care system using administrative data: implications for quality measurement and improvement. Jt Comm J Q Improv 2000;26:203–16.
[9] Katon W, Von Korff M, Lin E, et al. Adequacy and duration of antidepressant treatment in primary care. Med Care 1992;30:67–76.
[10] Wells KB, Katon W, Rogers B, et al. Use of minor tranquilizers and antidepressant medications by depressed outpatients: results from the Medical Outcomes Study. Am J Psychiatry 1994;151:694–700.
[11] Simon GE, Von Korff M, Wagner EH, et al. Patterns of antidepressant use in community practice. Gen Hosp Psychiatry 1993;15:399–408.
[12] Schulberg HC, Block MR, Madonia MJ, et al. The 'usual care' of major depression in primary care practice. Arch Fam Med 1997;6:334–9.
[13] Rost KM, Zhang M, Fortney J, et al. Persistently poor outcomes of undetected major depression in primary care. Gen Hosp Psychiatry 1998;20:12–20.
[14] Rubenstein LV, Parker LE, Meredith LS, et al. Understanding team-based quality improvement for depression in primary care. Health Services Research, in press.
[15] Katon W, Von Korff M, Lin E, et al. Collaborative management to achieve treatment guidelines: impact on depression in primary care. JAMA 1995;273:1026–31.
[16] Katon W, Robinson P, Von Korff M, et al. A multifaceted intervention to improve treatment of depression in primary care. Arch Gen Psychiatry 1996;53:924–32.
[17] Schulberg HC, Block MR, Madonia MJ, et al. Treating major depression in primary care practice: eight-month clinical outcomes. Arch Gen Psychiatry 1996;53:913–9.
[18] Katzelnick DJ, Simon GE, Pearson SD, et al. Randomized trial of a depression management program in high utilizers of medical care. Arch Fam Med 2000;9:345–51.
[19] Wells KB, Sherbourne CD, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA 2000;283:212–20.
[20] Rost K, Nutting P, Smith J, et al. Improving depression outcomes in community primary care practice: a randomized trial of the QuEST intervention. J Gen Intern Med 2001;16:143–9.
[21] Hunkeler EM, Meresman JF, Hargreaves WA, et al. Efficacy of nurse telehealth care and peer support in augmenting treatment of depression in primary care. Arch Fam Med 2000;9:700–8.
[22] Goldberg HI, Wagner EH, Fihn SD, et al. A randomized controlled trial of CQI teams and academic detailing: can they alter compliance with guidelines? Jt Comm J Q Improv 1998;24:130–42.
[23] Katon W, Von Korff M, Lin E, et al. Stepped collaborative care for primary care patients with persistent symptoms of depression: a randomized trial. Arch Gen Psychiatry 1999;56:1109–15.
[24] Cooper-Patrick L, Gallo JJ, Gonzales JJ, et al. Race, gender, and partnership in the patient-physician relationship. JAMA 1999;282:583–9.
[25] Lin EHB, Simon GE, Katon WJ, et al. Can enhanced acute-phase treatment of depression improve long-term outcomes? A report of randomized trials in primary care. Am J Psychiatry 1999;156:643–5.
[26] Tutty S, Simon G, Ludman E. Telephone counseling as an adjunct to antidepressant treatment in the primary care system. Effective Clin Practice 2000;4:170–8.
[27] Worrall G, Angel J, Chaulk C, et al. Effectiveness of an educational strategy to improve family physicians' detection and management of depression: a randomized controlled trial. Can Med Assoc J 1999;161:37–40.
[28] Tiemens BG, Ormel J, Jenner JA, et al. Training primary-care physicians to recognize, diagnose and manage depression: does it improve patient outcomes? Psychol Med 1999;29:833–45.
[29] Lin EHB, Katon WJ, Simon GE, et al. Achieving guidelines for the treatment of depression in primary care: is physician education enough? Med Care 1997;35:831–42.
[30] Simon GE, Von Korff M, Rutter C, et al. Randomised trial of monitoring, feedback, and management of care by telephone to improve treatment of depression in primary care. BMJ 2000;320:550–4.
[31] Miranda J, Munoz R. Intervention for minor depression in primary care patients. Psychosom Med 1994;56:136–42.
[32] Callahan CM, Hendrie HC, Dittus RS, et al. Improving treatment of late life depression in primary care: a randomized clinical trial. J Am Geriatr Soc 1994;42:839–46.
[33] Thompson C, Kinmonth AL, Stevens L, et al. Effects of a clinical-practice guideline and practice-based education on detection and outcome of depression in primary care: Hampshire Depression Project randomised controlled trial. Lancet 2000;355:185–91.
[34] Carr VJ, Lewin TJ, Reid ALA, et al. An evaluation of the effectiveness of a consultation-liaison psychiatry service in general practice. Aust N Z J Psychiatry 1997;31:714–25.
[35] Brown JB, Shye D, McFarland BH, et al. Controlled trials of CQI and academic detailing to implement a clinical guideline for depression. Jt Comm J Q Improv 2000;26:39–54.
[36] Olkin I. Statistical and theoretical considerations in meta-analysis. J Clin Epidemiol 1995;48:133–46.
[37] Olkin I. Meta-analysis: reconciling the results of independent studies. Stat Med 1995;14:457–72.
[38] Rubin DM. A new perspective. In: Wachter KW, Straf ML, editors. The Future of Meta-Analysis. New York: Russell Sage Foundation, 1990.
[39] Berkey CS, Hoaglin DC, Mosteller F, et al. A random-effects regression model for meta-analysis. Stat Med 1995;14:395–411.
[40] NCI Breast Cancer Screening Consortium. Screening mammography: a missed clinical opportunity? Results of the NCI Breast Cancer Screening Consortium and National Health Interview Survey studies. JAMA 1990;264:54–8.
[41] Sturm R, Wells K. Health insurance may be improving, but not for individuals with mental illness. Health Serv Res 2000;35:253–62.
[42] Province MA, Hadley EC, Hornbrook MC, et al. The effects of exercise on falls in elderly patients: a preplanned meta-analysis of the FICSIT trials. JAMA 1995;273:1341–7.
[43] Hirano K, Imbens GW, Rubin DB, et al. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics 2000;1:69–88.
[44] Laird NM, Mosteller F. Some statistical methods for combining experimental results. Int J Technol Assess Health Care 1990;6:5–30.
[45] Depression Guideline Panel. Depression in Primary Care, Vol. 1. Detection and Diagnosis. Clinical Practice Guideline, No. 5. (AHCPR Publication No. 93-0550). Rockville, MD: US Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research, 1993.
[46] Wells KB. The design of Partners in Care: evaluating the cost-effectiveness of improving care for depression in primary care. Soc Psychiatry Psychiatr Epidemiol 1999;34:20–9.
[47] Rost K, Nutting PA, Smith J, et al. Designing and implementing a primary care intervention trial to improve the quality and outcome of care for major depression. Gen Hosp Psychiatry 2000;22:66–77.
[48] Lomas J, Anderson GM, Domnick-Pierre K, et al. Do practice guidelines guide practice? The effect of a consensus statement on the practice of physicians. N Engl J Med 1989;321:1306–11.
[49] Kosecoff J, Kanouse DE, Rogers WH, et al. Effects of the National Institutes of Health consensus development program on physician practice. JAMA 1987;258:2708–13.
[50] World Health Organization. Composite International Diagnostic Interview for Primary Care, Version 2.0. Geneva: World Health Organization, 1996.
[51] Zimmerman M, Coryell W, Wilson S, et al. Evaluation of symptoms of major depressive disorder: self-report vs. clinician ratings. J Nerv Ment Dis 1986;174:150–3.
[52] Little RL, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley, 1987.
[53] Schafer JL. Analysis of Incomplete Multivariate Data. London: Chapman & Hall, 1997.