Optimizing prediction of back pain outcomes


PAIN 154 (2013) 1391–1401

www.elsevier.com/locate/pain

Judith A. Turner a,⇑, Susan M. Shortreed b, Kathleen W. Saunders b, Linda LeResche c, Jesse A. Berlin d, Michael Von Korff b

a Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Seattle, WA, USA
b Group Health Research Institute, Seattle, WA, USA
c Department of Oral Medicine, University of Washington School of Dentistry, Seattle, WA, USA
d Janssen Research & Development, LLC, Titusville, NJ, USA

Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.

Article info

Article history: Received 12 December 2012; received in revised form 22 March 2013; accepted 12 April 2013

Keywords: Prediction; Back pain; Outcomes; Chronic pain; Primary care

Abstract

An accurate means of identifying patients at high risk for chronic disabling pain could lead to more cost-effective care, with more intensive interventions targeted to those likely to benefit most. The Chronic Pain Risk Score is a tool developed to predict risk for chronic pain. The aim of this study was to examine whether its predictive ability could be enhanced by: (1) improved measures of the constructs it assesses (Improved Chronic Pain Risk Model); and (2) adding other predictors (Expanded Chronic Pain Risk Model). Patients initiating primary care for back pain (N = 571) completed measures used in the Chronic Pain Risk Score, Improved Model, and Expanded Model, then completed the Graded Chronic Pain Scale (GCPS) 4 months later (n = 521; 91% response rate). In predicting 4-month GCPS grade III or IV (moderate or severe pain-related activity interference), the Improved Model performed better than did the Chronic Pain Risk Score (Net Reclassification Index [NRI] = 0.32, P = 0.003). The Expanded Model improved significantly on the prediction of the Improved Model (NRI = 0.56, P < 0.001) and demonstrated excellent discriminative ability (AUC = 0.84, 95% CI = 0.79-0.88). The Improved Model (AUC = 0.79, 95% CI = 0.75-0.84) and the Chronic Pain Risk Score (AUC = 0.76, 95% CI = 0.71-0.81) showed acceptable discriminative ability. A limited set of measures may be used to predict risk for future clinically significant pain in patients initiating primary care for back pain, but further evaluation of prognostic models is needed. © 2013 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.

⇑ Corresponding author. Address: Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Box 356560, Seattle, WA 98195, USA. Tel.: +1 206 543 3997; fax: +1 206 685 1139. E-mail address: [email protected] (J.A. Turner).

0304-3959/$36.00 © 2013 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.pain.2013.04.029

1. Introduction

Back pain is one of the most common reasons for physician visits [10]. Although most patients initiating care for back pain show marked improvement over the next 6 weeks, a minority continue to have significant back pain and activity limitations over long periods of time [4,6]. Chronic back pain is frequently associated with diminished quality of life, reduced participation in activities, and psychological distress. At a societal level, it is responsible for substantial health care and lost work-productivity costs [4,7]. Early interventions to prevent acute back pain from progressing to chronic disabling pain have been proposed [15,54], but they are unnecessary for the majority of patients, who will improve without additional treatment. An accurate means of identifying patients at high risk for developing chronic disabling back pain could help to target interventions to those who could benefit most. Furthermore, increased understanding of risk factors for developing clinically significant chronic pain could help to guide early interventions.

Conclusions regarding risk factors for chronic back pain have varied across both systematic reviews and individual studies, likely as a result of differences in study samples, methods, and measures [20]. Characteristics of patients with acute or subacute back pain that have been reported with some consistency to predict chronic back pain include more severe pain, greater functional disability, and radiculopathy/sciatica [5,20,40]. Various measures of maladaptive pain coping and work-related variables have also been found to be associated with subsequent development of chronic back pain [5,20,40]. There is conflicting evidence regarding the prognostic value of measures of depression and other psychological disorders [5,20,40].

Several tools have been devised to predict risk for chronic back pain on the basis of a combination of variables, but none has been extensively validated [5]. These tools include the Chronic Pain Risk Score [51], which uses a limited set of self-reported measures to estimate a patient’s likelihood of continued clinically significant back pain. Longitudinal studies of primary care and general population samples of individuals with pain of varying duration have found that the Chronic Pain Risk Score predicts clinically significant pain 6 months to 5 years later with acceptable accuracy, not only for back pain but also for other chronic pain conditions [12,36,43,49,51]. Von Korff and Miglioretti [51] emphasized the potential for improving prediction by modifying or adding to the measures used to generate the Chronic Pain Risk Score. The objectives of this study were to (1) extend evaluation of the Chronic Pain Risk Score to a novel population (primary care patients initiating a new episode of care for back pain) and (2) determine whether its predictive ability could be enhanced significantly by (a) using improved measures of the constructs assessed and (b) adding other measures found in previous studies to predict pain outcomes. We examined predictive accuracy at both group and individual levels.

2. Methods

2.1. Study setting, sample, and procedures

Study participants were members of Group Health, a large integrated health care plan in Washington State. All study procedures were approved by the Group Health Institutional Review Board. Fig. 1 illustrates the study flow. Daily searches of Group Health’s automated data files were conducted to identify patients who were potentially eligible for the study based on the following requirements: (1) Group Health membership and residence in the greater Seattle area; (2) age 18 to 64 years; (3) continuous enrollment in Group Health for the previous 2 years (to ensure complete health care utilization data through automated data sources); and (4) a primary care visit for back pain (index visit) within the prior 5 days. We used a previously developed [3] International Classification of Diseases, 9th revision, Clinical Modification (ICD-9-CM) diagnostic code algorithm to identify patients making visits for mechanical low back pain (ie, pain not due to a neoplastic, infectious, or inflammatory cause or associated with pregnancy or major trauma).
We used automated data to exclude patients for the following reasons: (1) a visit for back pain in the year prior to the index visit (because we were interested in new episodes of back pain care); (2) lumbar spine surgery in the past 2 years (based on procedure codes previously used for this purpose [17]); (3) pregnancy (ICD-9-CM codes 630-676); (4) a diagnosis of Parkinson’s disease or multiple sclerosis; and (5) treatment for cancer other than nonmelanoma skin cancer in the prior year.

Potential study participants were mailed a letter describing the study, with a $2 bill enclosed. Interviewers then telephoned the prospective participants to explain the study, verify study eligibility, and obtain consent from those who agreed to enroll. Patients were asked if they had ever had back surgery; those who answered affirmatively were excluded. Because we were interested in interviewing patients soon after their index back pain visit, patients who could not be reached within 14 days after this visit were excluded. We also excluded patients who said they had not made a visit for back pain within 14 days or who could not complete a telephone interview (eg, due to illness or language or hearing barriers).

Study participants completed a baseline telephone interview and were asked to complete a follow-up telephone interview 4 months later. Outcomes were measured at 4 months because improvement in back pain occurs predominantly during the first 6 weeks after pain onset or care initiation, with only small reductions in pain and disability between 6 weeks and 1 year [6]; back pain outcomes at 3 to 6 months are similar to those at 1 year [5,39]; and chronic back pain has been defined as back pain continuing for at least 3 months [25]. Participants received $10 for each interview they completed.

2.2. Measures

2.2.1. Baseline

The baseline variables included measures we believed might improve on measures of constructs assessed by the original Chronic Pain Risk Score (see Section 2.2.1.1.)
plus measures of additional variables that we thought, based on other research, might enhance the predictive ability of the Chronic Pain Risk Score (Section 2.2.1.2.). Most of these measures were obtained in the baseline telephone interviews with study participants. However, because health care organizations have increasingly adopted electronic health records as well as electronic health care utilization and diagnosis databases, we were also interested in examining whether electronic health care database information might provide additional contributions (beyond those of patient self-report measures) to a predictive model (Section 2.2.1.3.).

Fig. 1. Study flow chart.
Records review: identify patients with a recent back pain visit and no prior back pain visits in the past year; mail study invitations (n = 1574)
  Unable to contact: n = 325
  Declined screening: n = 384
Telephone screening: n = 865
  Not eligible: n = 294 (said no back pain visit, 131; said no back pain, 32; language barrier, 63; prior back surgery, 40; pregnant, 6; cancer, 3; other, 19)
Enrolled and completed interview: N = 571
  Could not be contacted for 4-month interview: n = 23
  Deceased or too ill to complete 4-month interview: n = 2
  Declined 4-month interview: n = 25
Completed 4-month interview: N = 521

2.2.1.1. Improved measures of constructs assessed by the original Chronic Pain Risk Score. The original Chronic Pain Risk Score [51] is a single score derived from the 3 Graded Chronic Pain Scale (GCPS) [52] ratings of pain intensity, the 3 GCPS ratings of pain-related activity interference, the number of activity limitation days due to pain, the number of days with pain in the past 6 months, a depressive symptom scale, and the number of anatomical sites of pain. The score has a possible range of 0 to 28 (higher scores indicate greater risk). We made several changes to enhance assessment of the constructs of pain dysfunction (intensity and activity interference), pain diffuseness (number of anatomical sites of pain), and psychological distress, as described below.

2.2.1.1.1. Pain intensity/dysfunction. We created a single pain intensity/dysfunction score reflecting the mean of 6 items from the GCPS [46,52]: 3 ratings of pain intensity (average pain in the past month, worst pain in the past month, and current pain), each on a scale of 0 (no pain) to 10 (pain as bad as could be); and 3 ratings of pain interference with activities in the past month (daily activities; recreational, social, and family activities; and ability to work [including housework]), each on a scale of 0 (no interference) to 10 (unable to carry on any activities).
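As a minimal sketch of the pain intensity/dysfunction score just described, the Python function below averages the 3 GCPS intensity ratings and the 3 activity-interference ratings, each on a 0 to 10 scale. The function name is hypothetical, and it assumes complete responses; the text does not describe how missing items were handled.

```python
from statistics import mean


def pain_intensity_dysfunction(intensity, interference):
    """Return the mean of 6 GCPS items, each rated 0-10 (hypothetical helper).

    intensity:    (average pain, worst pain, current pain), past month
    interference: (daily activities; recreational, social, and family
                   activities; ability to work incl. housework), past month
    Assumes all 6 items were answered (missing-data rules are not given).
    """
    if len(intensity) != 3 or len(interference) != 3:
        raise ValueError("expected 3 intensity and 3 interference ratings")
    items = list(intensity) + list(interference)
    if any(not 0 <= x <= 10 for x in items):
        raise ValueError("GCPS ratings must lie between 0 and 10")
    return mean(items)


# Example: intensity 6, 8, 5 and interference 4, 3, 5 -> 31 / 6
print(round(pain_intensity_dysfunction((6, 8, 5), (4, 3, 5)), 2))  # 5.17
```

Averaging rather than summing keeps the combined score on the same 0 to 10 metric as the individual GCPS items.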
The GCPS has been validated and shown to have good psychometric properties in large population surveys and in large samples of primary care patients with pain [46,52]. We combined the pain intensity and pain interference items into a single scale to reduce the number of variables analyzed and because the mean pain intensity rating is moderately correlated with the mean activity interference rating (r = 0.58 in our sample).

2.2.1.1.2. Pain diffuseness. The original Chronic Pain Risk Score included 1 component reflecting pain diffuseness, as assessed by the presence or absence of pain at 4 specific anatomical sites in addition to the primary pain site. For the current study, we created a measure of pain diffuseness that included the pain-site items from the Patient Health Questionnaire-15 (PHQ-15) measure of somatic symptoms [30] plus other anatomical sites, for a total of 10 anatomical areas. Participants were asked how bothered they had been in the past 4 weeks (0 = not at all, 1 = a little, 2 = a lot) by stomach pain; back pain; pain in arms, legs, or joints; headaches; chest pain; neck pain; head pain other than headaches; pelvic/groin pain; facial pain; and widespread pain. This measure is scored by summing the bothersomeness ratings across the body sites (score range, 0 to 20), just as the PHQ-15 measure is scored.

2.2.1.1.3. Pain persistence. We retained the item from the original Chronic Pain Risk Score that assessed the number of days with back pain in the past 6 months.

2.2.1.1.4. Psychological distress. The original Chronic Pain Risk Score assessed psychological distress using a depressive symptom scale, the Symptom Checklist-90 [9]; other depressive symptom scales have been used since, and scoring rules for alternative depression scales have been provided [47].
In the current study, we assessed depressive symptoms using the PHQ-8 [33], a widely used, validated measure of depression in primary care. Along with depression, anxiety disorders are the most common psychiatric disorders in patients with chronic pain [8,16,29]. Therefore, we also administered the Generalized Anxiety Disorder-2 (GAD-2) [32], a validated anxiety disorder screening measure developed for use in primary care. Depression and anxiety commonly coexist, and when both are present, patient disability is greater than when either is present alone [31]. Measures of depression and anxiety are moderately correlated (r = 0.61 between the PHQ-8 and the GAD-2 in our sample). For these reasons, and to reduce the number of predictor variables analyzed, we followed a previous approach that combined items from the PHQ depression measure and the GAD anxiety measure [31]: we summed the PHQ-8 and GAD-2 scores to obtain an overall index of depressive and anxiety symptom severity (possible range, 0 to 30; higher scores indicate more severe symptoms).

2.2.1.2. Additional self-report measures that might enhance prediction. We evaluated the following variables, selected on the basis of previous research, for their ability to enhance prediction of disabling back pain.

2.2.1.2.1. Education. Participants reported the highest grade of education completed.

2.2.1.2.2. Pain duration and trajectory. Participants reported the duration of their current episode of back pain and whether, since the episode began, their back pain was the same, better, better and worse, or worse.

2.2.1.2.3. Work disability/litigation/compensation. We created a dichotomous variable based on the presence vs absence of 1 or more positive responses to questions about permanent or temporary inability to work for health reasons, involvement in or consideration of litigation for a back injury, and receipt of benefits from or consideration of filing a worker’s compensation or other disability claim. The number of patients who endorsed any one of these items was too small for us to analyze the individual items separately.

2.2.1.2.4. Recovery expectations. Participants rated their confidence that their back pain would be gone in 6 months on a scale of 0 to 10, with 10 meaning extremely confident. Similar ratings predict disability outcomes after occupational back injuries [44].

2.2.1.2.5. Catastrophizing. Pain-related catastrophizing was assessed by the Pain Catastrophizing Scale [42], which has demonstrated reliability and validity [37,42]. Although the Pain Catastrophizing Scale has 3 subscales, we examined only the total score (range, 0 to 52; higher scores indicate greater catastrophizing) to reduce the number of predictors analyzed.

2.2.1.2.6. Fear-avoidance. Fear-avoidance was assessed by a 10-item version [48] of the 17-item Tampa Scale for Kinesiophobia [45]. The 10-item version eliminates redundant and difficult-to-understand items and was rescaled to yield a total score comparable to the total score of the original version [48] (score range, 17 to 68; higher scores indicate greater fear-avoidance).

2.2.1.2.7. Sleep problems. We administered 5 items from the Pittsburgh Sleep Quality Index [2]. To avoid the respondent burden of administering the entire index, we selected the items that we thought were most relevant to sleep problems reported by patients with back pain. Participants indicated the frequency, in the previous month, of inability to get to sleep within 30 minutes; waking up in the middle of the night or the early morning; trouble sleeping because of pain; and trouble staying awake while driving, eating meals, or engaging in social activity. Participants also provided an overall rating of sleep quality during the past month. Item responses were summed to yield a total score (range, 0 to 15; higher scores indicate greater sleep problems).

2.2.1.2.8. Harm beliefs. Participants completed the 2-item Survey of Pain Attitudes (SOPA) harm scale [28], which assesses the belief that hurt signals harm. To increase ease and clarity of telephone administration (the original version was designed for use on a paper-and-pencil questionnaire), we modified the response choices from the original version (very untrue, somewhat untrue, neither true nor untrue, somewhat true, very true) to strongly disagree, disagree, neutral, agree, and strongly agree. We analyzed the mean of the 2 items, categorized (based on the level of agreement with the items and the sample distribution) as a low (≤1.0), medium (between 1 and 2.5), or high (≥2.5) belief that hurt signals harm.

2.2.1.2.9. Control beliefs. Participants also completed the 2-item SOPA Control scale [28], which assesses perceived control over pain, with response choices modified as for the Harm scale. We analyzed the mean of the 2 items, categorized (based on the level of agreement with the items and the sample distribution) as low (≤1.5), medium (between 1.5 and 3), or high (≥3) perceived control over pain.

2.2.1.2.10. Radiating pain. Participants indicated whether or not their back pain radiated down one or both legs below the knee.

2.2.1.2.11. Smoking status. Participants were asked how often they smoke tobacco products (never, occasionally, frequently, or daily). Given the relatively small number of patients who smoked, we categorized responses as never vs some.

2.2.1.2.12. Self-rated general health other than back pain. Participants rated their health in general, apart from their back pain, as excellent, very good, good, fair, or poor.

2.2.1.2.13. Days of opioid use, past 2 weeks. Participants were read a list of prescription opioid medications and asked how many days in the past 2 weeks they had taken a prescription opioid medication. Prior research has found that opioid use shortly after an injury is associated with worse long-term outcomes [14,53]. We categorized responses as 0, 1-3, or 4-14 days, to reflect no, minimal, or occasional-to-daily use.

2.2.1.2.14. Work-related stress. Participants with current jobs used a 4-point scale ranging from strongly disagree to strongly agree to indicate agreement with each of 5 items describing their job (requiring working very fast, requiring working very hard, being very hectic, being asked to do an excessive amount of work, and having enough time to get the job done). These items have been found in previous research to predict chronic work disability in workers with recent back injuries [44]. Scores on the 5 items were averaged to create a mean score (range, 1 to 4), with higher scores indicating more stress. Because some patients were not working, we categorized responses as not working, low work stress (<2.8), or high work stress (2.8 to 4).

2.2.1.2.15. Jobs’ physical demands. Patients with current jobs were asked to rate their jobs’ physical demands as sedentary, light, medium, heavy, or very heavy, and they were asked how often their jobs require heavy lifting. These items have been found in previous research to predict chronic work disability in workers with recent back injuries [44].

2.2.1.3. Additional electronic health care database variables that might enhance prediction.

2.2.1.3.1. Sociodemographic. We obtained information from the health plan’s electronic records on patient age and gender.

2.2.1.3.2. Body mass index. Using the most current electronic health care data on patient height and weight, we applied standard methods to calculate and categorize body mass index. The categories are <25 (normal/underweight), 25 to less than 30 (overweight), and 30 or greater (obese).

2.2.1.3.3. Medical comorbidity. We used the Romano adaptation of the Charlson medical comorbidity index based on ICD-9-CM administrative data [41].
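Several variables in Sections 2.2.1.1 to 2.2.1.3 are simple sums or cut-point categories. The Python sketch below collects those derivations: pain diffuseness, the PHQ-8 plus GAD-2 distress index, the sleep problem total, and the categories for opioid days, work-related stress, and body mass index. All function and site names are hypothetical. The 0 to 3 scoring of each sleep item and the keying of the work-stress items (so that a higher score always means more stress) are assumptions, since the text gives only the totals and cut-points.

```python
# Hypothetical helpers: names, argument shapes, and per-item ranges are
# assumptions; only the totals and cut-points come from the text.

PAIN_SITES = (
    "stomach", "back", "arms_legs_joints", "headaches", "chest",
    "neck", "other_head_pain", "pelvic_groin", "facial", "widespread",
)


def pain_diffuseness(bother):
    """Sum of 10 site ratings (0 = not at all, 1 = a little, 2 = a lot); 0-20."""
    if set(bother) != set(PAIN_SITES):
        raise ValueError("need one rating for each of the 10 pain sites")
    if any(v not in (0, 1, 2) for v in bother.values()):
        raise ValueError("ratings must be 0, 1, or 2")
    return sum(bother.values())


def distress_index(phq8_total, gad2_total):
    """PHQ-8 total (0-24) plus GAD-2 total (0-6); range 0-30."""
    if not (0 <= phq8_total <= 24 and 0 <= gad2_total <= 6):
        raise ValueError("PHQ-8 or GAD-2 total out of range")
    return phq8_total + gad2_total


def sleep_problems(items):
    """Sum of 5 sleep items; assumes each is scored 0-3 so the total is 0-15."""
    if len(items) != 5 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected 5 items scored 0-3")
    return sum(items)


def opioid_days_category(days):
    """Self-reported prescription opioid days in the past 2 weeks."""
    if not 0 <= days <= 14:
        raise ValueError("days must be between 0 and 14")
    if days == 0:
        return "0"
    return "1-3" if days <= 3 else "4-14"


def work_stress_category(item_scores):
    """None = not working; otherwise the mean of 5 items scored 1-4.

    Assumes items are already keyed so that higher means more stress.
    """
    if item_scores is None:
        return "not working"
    if len(item_scores) != 5 or any(not 1 <= s <= 4 for s in item_scores):
        raise ValueError("expected 5 items scored 1-4")
    return "high" if sum(item_scores) / 5 >= 2.8 else "low"


def bmi_category(weight_kg, height_m):
    """Standard BMI (kg/m^2) grouped as <25, 25 to <30, and >=30."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "<25"
    return "25-<30" if bmi < 30 else ">=30"


print(pain_diffuseness(dict.fromkeys(PAIN_SITES, 1)))  # 10
print(distress_index(5, 2), work_stress_category([3, 3, 3, 2, 3]))  # 7 high
```

Keeping each derivation in its own small function makes it easy to check every variable against its stated range before it enters a model.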

2.2.1.3.4. Anxiety/depression diagnosis. We examined anxiety and depressive disorder diagnoses and associated visits in the past 2 years and categorized them as none, primary care diagnosis only (no specialty care), or specialty care or inpatient admission.

2.2.1.3.5. Substance abuse diagnosis. We examined diagnoses of substance or alcohol abuse or dependence in the past 2 years and categorized them as none vs 1 or more.

2.2.1.3.6. Health care utilization. Using electronic visit and diagnosis databases, we calculated the number of different pain conditions for which patients made visits, the total number of visits for pain, and the total number of nonpain visits in the past 2 years. Because the sample distributions were skewed, we categorized each variable for analysis (Table 1).

Table 1
Baseline variables and sample characteristics (N = 521 patients who completed the 4-month follow-up interview). Values are mean (SD) or percent.

Sociodemographic characteristics
  Age, years, mean (SD): 47.8 (12.8)
  Gender, female, %: 59.9
  Education, college graduate, %: 60.1
  Race/ethnicity, non-Hispanic white, %: 73.4
Pain-related characteristics
  Pain diffuseness (number and bothersomeness of pain sites), mean (SD): 5.5 (3.0)
  Pain intensity/dysfunction (GCPS intensity, interference items), mean (SD): 5.0 (1.9)
  Pain persistence (number of days with back pain, past 6 months), mean (SD): 66.1 (64.2)
  Back pain episode duration, days (a), mean (SD): 71.8 (112.9)
  Back pain trajectory since episode began, %: same 13.3; better 35.8; better and worse 38.3; worse 12.5
  Work disability/litigation/compensation, % yes: 12.8
Psychological characteristics
  Depression/anxiety (sum of PHQ-8 [33] and GAD-2 [32]), mean (SD): 6.5 (5.8)
  Recovery expectations, mean (SD): 5.7 (3.4)
  Catastrophizing, mean (SD): 12.9 (10.2)
  Fear-avoidance, mean (SD): 37.5 (9.0)
  Sleep problems, mean (SD): 7.7 (3.2)
  SOPA [26–28] Harm, mean (SD): 1.6 (0.8); <1.5, 32.3%; 1.5 to <2.5, 46.7%; ≥2.5, 21.0%
  SOPA [26–28] Control, mean (SD): 2.4 (0.9); <2, 20.2%; 2 to <3, 40.9%; ≥3, 39.0%
  Anxiety/depression diagnosis, EHR, past 2 years, %: none 76.2; primary care diagnosis (no specialty care) 11.1; specialty care or inpatient admission 12.7
  Alcohol or drug abuse or dependence diagnosis, EHR, past 2 years, % yes: 2.1
Biomedical, health, and health care characteristics
  Pain below the knee, % yes: 21.4
  Current smoker, % yes: 14.6
  Health other than back pain, %: excellent 19.8; very good 40.3; good 29.0; fair 8.6; poor 2.3
  Comorbidity (Romano version of Charlson score, EHR, past year), mean (SD): 0.4 (1.0)
  BMI (from EHR), %: <25, 27.8; 25 to <30, 35.7; ≥30, 36.5
  Days’ supply of opioid and sedative (b) medications, EHR, past 2 years, %: 0-29 days, 81.8; 30-89 days, 8.1; ≥90 days, 10.2
  Number of different pain conditions made visits for, EHR, past 2 years, %: 0, 24.6; 1-2, 51.8; ≥3, 23.6
  Number of visits for pain, EHR, past 2 years, %: 0, 24.6; 1-2, 34.2; ≥3, 41.3
  Number of nonpain visits, EHR, past 2 years, %: 0-5, 36.7; 6-10, 32.3; ≥11, 31.1
  Number of symptom-related visits, EHR, past 2 years, %: <1, 45.7; 1 to <3, 33.4; ≥3, 20.9
  Self-reported days of opioid use, past 2 weeks, %: 0, 64.8; 1-3, 12.3; 4-14, 22.9
Work variables
  Work-related stress, %: not working 17.7; low stress (<2.8 on a 1-4 scale) 58.0; high stress (2.8-4 on a 1-4 scale) 24.4
  Job physical demands, %: not working 17.7; less than heavy 66.6; heavy or very heavy 15.7
  Frequency of lifting, %: not working 17.7; less than frequent 65.6; frequent or more 16.7

EHR, electronic health care record/database; SOPA, Survey of Pain Attitudes.
a Truncated at 365; more than 365 days was coded as 365 because of the questionable accuracy of recall beyond 365 days.
b The sedative category includes sedatives, hypnotics, muscle relaxants, and anxiolytics.

2.2.1.3.7. Number of symptom-related visits. Von Korff et al. [50] developed a classification of ICD-9-CM diagnoses reflecting broad characteristics of presenting health problems. Each diagnosis is classified into 1 of 9 categories, including 2 chronic disease categories, 2 acute disease categories, symptoms or ill-defined conditions, mental or behavioral health conditions, dermatologic disorders, vision or hearing disorders, and conditions related to preventive care or pregnancy. In the current study, we examined the number of visits associated with diagnoses of symptoms or ill-defined conditions in the past 2 years. For visits with more than 1 diagnosis, the visit was coded as a fraction for each of the relevant diagnostic classification groups (eg, if 2 diagnoses are associated with a single visit and 1 of the diagnoses is a symptom or ill-defined condition, the visit is coded as a 0.5 symptom or ill-defined condition visit). Based on the sample distribution, which was skewed, we categorized the data for analysis (see Table 1).
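The fractional visit coding described above can be sketched in Python as follows, assuming each visit has already been reduced to a list of diagnosis categories from the classification in [50]. That ICD-9-CM mapping is not reproduced here, and the category label and function names are hypothetical. Grouping uses the cut-points shown in Table 1.

```python
from fractions import Fraction

# Hypothetical label for the "symptoms or ill-defined conditions" class; the
# ICD-9-CM to category mapping [50] is assumed to have been applied already.
SYMPTOM = "symptom_ill_defined"


def symptom_visit_count(visits):
    """Each visit is a list of diagnosis categories, one entry per diagnosis.

    A visit with k diagnoses contributes 1/k for each symptom diagnosis.
    """
    total = Fraction(0)
    for categories in visits:
        if categories:  # skip visits without any coded diagnosis
            n_symptom = sum(1 for c in categories if c == SYMPTOM)
            total += Fraction(n_symptom, len(categories))
    return total


def symptom_visit_group(count):
    """Table 1 groups: <1, 1 to <3, and >=3 symptom-related visits."""
    if count < 1:
        return "<1"
    return "1-<3" if count < 3 else ">=3"


visits = [[SYMPTOM, "chronic_disease"], [SYMPTOM], ["preventive_care"]]
count = symptom_visit_count(visits)
print(float(count), symptom_visit_group(count))  # 1.5 1-<3
```

Exact fractions avoid rounding drift when many partial visits are added over a 2-year window.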


2.2.1.3.8. Days’ supply of opioid and sedative-hypnotic medication. From the electronic pharmacy database, we calculated total days’ supplies of all opioid, sedative, hypnotic, muscle relaxant, and anxiolytic medications dispensed in the past 2 years. Based on the sample distribution, which was skewed, we categorized the days’ supply for analysis (see Table 1).

2.2.2. Follow-up

At the 4-month follow-up interview, study participants completed a subset of the measures administered at baseline. The only follow-up measure analyzed for the current study was the primary outcome, the GCPS chronic pain grade [52]. Chronic pain grade is determined by an algorithm based on the 3 pain intensity and the 3 pain-related activity-interference ratings (described in Section 2.2.1.1.), as well as on the number of pain-related activity-limitation days. There are 5 chronic pain grades: 0 = no pain; I = low pain intensity and low pain-related activity interference; II = high pain intensity and low pain-related activity interference; III = moderate pain-related activity interference; and IV = severe pain-related activity interference. We considered an unfavorable outcome to be grade III or IV. Some prior studies using the GCPS have defined an unfavorable outcome as also including grade II, encompassing persons with high pain intensity but little or no pain interference with activities [51]. However, we elected to use a more restrictive definition in this study so as to reflect chronic pain that is more significant in terms of patient quality of life and limitations in work, family, social, and recreational activities.

2.3. Statistical analyses

2.3.1. Predictive models

We constructed 3 logistic regression models (1 each for the original Chronic Pain Risk Score, the Improved Chronic Pain Risk Model, and the Expanded Chronic Pain Risk Model) to predict chronic pain grade at 4 months (grade III or IV vs 0 to II).
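To make the outcome definition concrete, the sketch below computes a GCPS grade and the study’s binary outcome (grade III or IV vs 0 to II). The grading thresholds follow the GCPS scoring rules as they are commonly reported (characteristic pain intensity and disability score as the mean of the 3 items multiplied by 10, and disability points from activity-limitation days and the disability score); they are not restated in this article, so treat them as an assumption to be checked against [52]. The rule used for grade 0, the recall period for limitation days, and all function names are likewise assumptions.

```python
def disability_points(days, disability_score):
    """Assumed GCPS points from limitation days and the 0-100 disability score."""
    if days <= 6:
        day_pts = 0
    elif days <= 14:
        day_pts = 1
    elif days <= 30:
        day_pts = 2
    else:
        day_pts = 3
    if disability_score < 30:
        score_pts = 0
    elif disability_score < 50:
        score_pts = 1
    elif disability_score < 70:
        score_pts = 2
    else:
        score_pts = 3
    return day_pts + score_pts


def chronic_pain_grade(intensity, interference, limitation_days):
    """Return a GCPS grade 0-4 (0 = no pain; 1-4 = grades I-IV)."""
    if all(x == 0 for x in intensity):
        return 0  # assumption: grade 0 when all intensity ratings are 0
    cpi = sum(intensity) / 3 * 10            # characteristic pain intensity, 0-100
    disability = sum(interference) / 3 * 10  # disability score, 0-100
    points = disability_points(limitation_days, disability)
    if points >= 5:
        return 4
    if points >= 3:
        return 3
    return 2 if cpi >= 50 else 1  # low interference: split on intensity


def unfavorable_outcome(grade):
    """Study outcome: grade III or IV vs 0 to II."""
    return grade >= 3


print(chronic_pain_grade((7, 8, 6), (2, 2, 2), 3))  # 2
```

Grades I and II differ only in pain intensity, which is why the outcome used here (III or IV) reflects activity interference rather than intensity alone.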
In the original Chronic Pain Risk Score model, the only predictor was a composite score derived from several measures; the Improved and the Expanded Chronic Pain Risk models each included multiple potential predictors, entered separately.

2.3.1.1. Chronic Pain Risk Score. The first logistic regression model included 1 predictor, the original Chronic Pain Risk Score.

2.3.1.2. Improved Chronic Pain Risk Model. This model was based on the original Chronic Pain Risk Score but included changes intended to improve measurement of the constructs assessed by the original risk score and therefore expected to improve prediction. We entered each of the following 4 measures (described in Section 2.2.1.1.) in the prediction model: pain intensity/dysfunction, pain diffuseness, pain persistence, and psychological distress.

2.3.1.3. Expanded Chronic Pain Risk Model. We used variable selection techniques to add other baseline self-report and electronic health care database variables (described in Sections 2.2.1.2. and 2.2.1.3.) to the Improved Chronic Pain Risk Model. We added each of these variables to the Improved Chronic Pain Risk Model 1 at a time to evaluate whether it made a significant (P < 0.05) independent contribution to the prediction of chronic pain grade III to IV at 4 months. The final Expanded Chronic Pain Risk Model consisted of the 4 variables in the Improved Model plus the additional variables that contributed significantly to this model.


2.3.2. Model evaluation and comparison

We used t tests and χ² tests to evaluate differences between study participants who completed the 4-month interview and those who did not. We constructed the 3 logistic regression predictive models (Chronic Pain Risk Score, Improved Chronic Pain Risk Model, and Expanded Chronic Pain Risk Model) using the full sample of 4-month interview completers. For each model, we calculated the area under the receiver operating characteristic curve (AUC), the Hosmer-Lemeshow goodness-of-fit statistic [23], Akaike’s information criterion (AIC) [1], and the estimates of association (odds ratios with corresponding 95% CIs and P values) for the predictors in the model. AUC values of 0.70 to less than 0.80 are considered acceptable discrimination; AUC values of 0.80 to less than 0.90 are considered excellent discrimination; and AUC values of 0.90 or greater (which are extremely unusual) are considered outstanding discrimination [24]. The Hosmer-Lemeshow goodness-of-fit statistic is a single summary statistic of logistic regression model calibration, based on comparing observed and estimated expected frequencies within each decile of estimated risk (in our study, risk of grade III to IV back pain at 4 months) [24]. A significant P value indicates lack of fit. The AIC is an alternative goodness-of-fit measure that rewards models with a high likelihood for the observed data while penalizing each added variable, to avoid overfitting. It is useful for selecting a model that fits the data well with as few variables as needed; when comparing models, the preferred model is the one with the lowest AIC value.
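The discrimination and fit measures above can be illustrated with a small pure-Python sketch: the AUC in its Mann-Whitney form (the probability that a randomly chosen patient with a grade III to IV outcome has a higher predicted risk than one without), the discrimination bands quoted from [24], a Hosmer-Lemeshow statistic over groups of roughly equal size formed by sorting on predicted risk, and AIC from fitted probabilities. Function names are hypothetical, the probabilities are assumed to come from an already fitted logistic model, and statistical packages differ in how they form risk groups when predictions are tied, so results may not match a particular package exactly.

```python
import math


def auc(y, p):
    """Mann-Whitney AUC: chance a random patient with the outcome (y=1) has a
    higher predicted risk than a random patient without it (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def discrimination_label(a):
    """Bands quoted in the text [24]; values below 0.70 are not labeled there."""
    if a >= 0.90:
        return "outstanding"
    if a >= 0.80:
        return "excellent"
    if a >= 0.70:
        return "acceptable"
    return None


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df): sum of (O - E)^2 / (n * pbar * (1 - pbar)) over
    risk groups, with df = number of non-empty groups - 2."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n, stat, used = len(order), 0.0, 0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        used += 1
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / len(idx)
        if 0 < pbar < 1:  # a group with pbar of exactly 0 or 1 adds nothing
            stat += (observed - expected) ** 2 / (len(idx) * pbar * (1 - pbar))
    return stat, used - 2


def aic(y, p, n_params):
    """AIC = 2k - 2 log L, where k counts all coefficients incl. the intercept."""
    loglik = sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))
    return 2 * n_params - 2 * loglik


y, p = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(auc(y, p), discrimination_label(auc(y, p)))  # 0.75 acceptable
```

The statistic would then be compared with a χ² distribution on the returned degrees of freedom to obtain the P value.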
In addition, using predicted probabilities generated by each model and a variety of cut-points for these predicted probabilities, we examined, with data from the full sample, the prevalence of positive risk scores and the positive predictive values (PPV), negative predictive values (NPV), sensitivities, and specificities of each model. We also calculated likelihood ratios (LR) as an alternative assessment of model performance [18]. The positive LR (LR+) is the ratio of the percentage of patients with bad outcomes who are predicted to have bad outcomes (true-positives) to the percentage of patients without bad outcomes who are predicted to have bad outcomes (false-positives). It was calculated by dividing the sensitivity by (1-specificity). The negative LR (LR–) is the ratio of the percentage of patients with bad outcomes who are predicted not to have bad outcomes (false-negatives) to the percentage of patients without bad outcomes who are predicted not to have bad outcomes (true-negatives). It was calculated as (1-sensitivity) divided by specificity. An LR (either LR+ or LR–) of 1 indicates that the percentage of patients with and without bad outcomes who have a given test result is the same (ie, the model is not helpful). We compared the performance of the 3 prognostic models using statistical analyses that employed bootstrapping [13], with 1000 iterations. For each bootstrap iteration, we used the individuals selected to be included in the bootstrap sample (in-sample) to estimate the predictive models; in addition, we used the individuals not included in the bootstrap sample (out-of-sample) to evaluate the generalizability of the predictive models to another sample of individuals selected from the same population. Each bootstrap sample was drawn with replacement and was the same size as that of the full analyzed sample [13]. 
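The cut-point summaries and the bootstrap split described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code, and the function names are hypothetical. A prediction is treated as a positive risk score when the predicted probability is at or above the cut-point, matching the "≥" cut-points of Table 4. Each bootstrap draw returns the in-sample indices (drawn with replacement, same size as the analyzed sample) and the out-of-sample indices (never drawn).

```python
import math
import random


def metrics_at_cutpoint(probs, outcomes, cut):
    """Hypothetical helper: treat predicted probability >= cut as a positive risk score."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        if p >= cut:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    return {
        "prevalence_positive": (tp + fp) / len(probs),
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
        "npv": tn / (tn + fn) if tn + fn else math.nan,
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


def likelihood_ratios(sens, spec):
    # LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity.
    lr_pos = sens / (1.0 - spec) if spec < 1.0 else math.inf
    lr_neg = (1.0 - sens) / spec if spec > 0.0 else math.inf
    return lr_pos, lr_neg


def bootstrap_split(n, rng):
    """One bootstrap draw: n indices sampled with replacement (in-sample)
    and the indices that were never drawn (out-of-sample)."""
    in_sample = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_sample)
    out_of_sample = [i for i in range(n) if i not in drawn]
    return in_sample, out_of_sample
```

For example, `likelihood_ratios(0.66, 0.842)` reproduces, to one decimal place, the LR+ of 4.2 and LR– of 0.4 reported in Table 4 for the Expanded Model at the 0.30 cut-point. On average about 36.8% (1/e) of patients are left out of any single bootstrap sample, which makes the out-of-sample evaluation possible. A full analysis would repeat the split 1000 times and refit each model on the in-sample indices.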
For each iteration, we calculated the Hosmer-Lemeshow goodness-of-fit statistic [23] both in the sample of individuals selected for the bootstrap sample (in-sample) and in the sample of individuals who were not selected for the bootstrap sample (out-of-sample), using the model estimated from the bootstrap sample. We then calculated the percentage of bootstrap iterations, both in-sample and out-of-sample, with a good fit as measured by comparing the Hosmer-Lemeshow statistic [23] to the appropriate χ² distribution cut-point. To compare directly the predictive performance of (1) the original Chronic Pain Risk Score vs the Improved Chronic Pain Risk Model and (2) the Improved Chronic Pain Risk Model vs the Expanded Chronic Pain Risk Model, we used the Net Reclassification Index (NRI) [38], calculated both in the full dataset and in each of the bootstrap samples. The NRI, a summary measure for comparing 2 prognostic models (new vs old), is estimated from 2 quantities measuring the improved accuracy of the new model. The first quantity, calculated in the subset of study participants who had grade III to IV pain at 4 months, was the proportion who had increased (as compared to the old model) predicted probabilities of grade III to IV pain in the new model minus the proportion who had decreased predicted probabilities of grade III to IV pain in the new model. If the new model improves prediction, this first term is positive. The second quantity, calculated among the participants with grade 0 to II pain at 4 months, was the proportion with increased predicted probabilities of grade III to IV pain in the new model minus the proportion whose predicted probability of grade III to IV pain decreased in the new model. Thus, if the new model improves prediction, this second term is negative. The NRI is defined as the first quantity minus the second quantity. An NRI value of 0 indicates that the new model does no better than the old model in predicting the outcome in the whole sample. A negative value indicates that the old model is better than the new model; a positive value indicates that the new model is better [38]. Unlike many other measures of the performance of prediction models, the NRI permits a statistical test of whether differences in prediction between 2 models exceed chance expectation in the overall sample as well as in individuals who have unfavorable outcomes and in those who do not. In each bootstrap iteration, we calculated the NRI for the bootstrap sample (ie, the sample used to estimate the models) and for the individuals not included in the bootstrap sample (out-of-sample).
The out-of-sample NRI is calculated using the model estimated on the bootstrap sample to predict probabilities for the individuals not used to fit the model. The out-of-sample NRI can help to evaluate the generalizability of the results to another sample from the same population.
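The NRI as defined above can be written compactly as NRI = [P(up | event) - P(down | event)] - [P(up | nonevent) - P(down | nonevent)], where "up" and "down" mean that the new model assigns a higher or lower predicted probability than the old model. The Python sketch below (hypothetical function names; not the authors' code) computes this category-free NRI together with its event and nonevent components. The components are signed so that both are positive when the new model helps and they sum to the total, as in the Results, where 0.11 + 0.21 = 0.32. The sketch also includes a one-line summary of the percentage of bootstrap iterations with an acceptable Hosmer-Lemeshow fit, assuming fit is judged at α = 0.05. Significance testing of the NRI is not shown.

```python
def continuous_nri(p_old, p_new, outcomes):
    """Hypothetical helper: category-free NRI with event and nonevent components."""
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, y in zip(p_old, p_new, outcomes):
        if y == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    first = (up_e - down_e) / n_e       # positive if the new model helps patients with the outcome
    second = (up_ne - down_ne) / n_ne   # negative if the new model helps patients without it
    return first - second, first, -second


def percent_good_fit(hl_p_values, alpha=0.05):
    """Percent of bootstrap iterations whose Hosmer-Lemeshow P value does not indicate lack of fit."""
    return 100.0 * sum(p >= alpha for p in hl_p_values) / len(hl_p_values)
```

In a bootstrap analysis, `continuous_nri` would be called once on the in-sample predictions and once on the out-of-sample predictions from models refitted in each iteration.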

3. Results 3.1. Response rates As depicted in Fig. 1, 1574 patients were identified as being potentially eligible and were approached for study participation. Among these, 865 (55%) expressed interest in study participation and agreed to be screened for the study; all of those who were screened and found eligible (N = 571; 60% of those contacted and eligible) enrolled and completed the baseline telephone interview. The most common reason for a potential participant's exclusion from the study was inability to contact him or her in time to complete the baseline interview within 14 days of the index back pain visit (an eligibility criterion to ensure that all baseline data were obtained soon after the index visit). At 4 months, 521 participants (91% of the 571 patients enrolled) completed the follow-up interview. As compared on the baseline measures with study participants who did not complete the 4-month assessment, the 4-month respondents were older (mean [SD] = 47.8 [12.8] vs 42.9 [12.4] years, P = 0.009); had higher recovery expectations (mean [SD] = 5.7 [3.4] vs 4.7 [3.1], P = 0.04); had a lower proportion of racial/ethnic minorities (26.6% vs 49.0%, P = 0.0009); had lower catastrophizing scores (mean [SD] = 12.9 [10.2] vs 17.2 [12.8], P = 0.03); had higher SOPA Control scores (mean [SD] = 2.4 [0.9] vs 2.2 [0.9], P = 0.03); had fewer days' supply of opioids in the past 2 years (81.8% vs 68% with fewer than 30 days, P = 0.02); and reported fewer days of opioid use in the past 2 weeks (64.8% vs 48% with no use, P = 0.02). There was a trend toward a higher proportion of college graduates among the respondents (60.1% vs 46.0%, P = 0.053). Respondents and nonrespondents did not differ significantly (at P < 0.05) on the 26 other baseline variables listed in Table 1.

3.2. Baseline sample characteristics Table 1 shows the characteristics of the study sample (the 521 participants who completed the 4-month follow-up assessment) at baseline on the study measures. The sample was predominantly female (59.9%), non-Hispanic white (73.4%), and college educated (60.1%), with a mean age of 47.8 years (range, 18 to 64 years; median [interquartile range, IQR] = 51 [40 to 58] years). Although this was the first health care visit for back pain in at least 1 year for all participants, and most (67.2%) participants said their current episode of back pain had begun within the past 30 days, duration of back pain ranged widely (median [IQR] = 21 [10 to 60] days). Pain radiating to the leg below the knee was reported by 21.4% of the participants.

3.3. Chronic pain grade at follow-up At the 4-month follow-up, 21.7% of the 521 patients were categorized as chronic pain grade III to IV (back pain associated with moderate to severe activity interference). Among those with grade III to IV back pain at 4 months, 46.5% reported at the 4-month assessment that they had back pain on 75 or more of the past 90 days; 25.4% reported back pain on 30 to 74 days; and 28.1% reported fewer than 30 days with back pain in the past 90 days.

3.4. Significant variables in the Improved and Expanded Chronic Pain Risk Models Table 2 shows the odds ratios (and 95% CIs) for the predictor variables in the Improved and Expanded Chronic Pain Risk Models. In the Improved Chronic Pain Risk Model, the measures of pain diffuseness (OR = 1.22, 95% CI = 1.13-1.34), pain intensity/dysfunction (OR = 1.48, 95% CI = 1.27-1.73), and pain persistence (number of days with back pain in the past 6 months) (OR = 1.007, 95% CI = 1.004-1.011) were each significantly associated with 4-month Chronic Pain Grade, whereas the measure of psychological distress (depression and anxiety) was not (OR = 1.01, 95% CI = 0.96-1.05). The Expanded Chronic Pain Risk Model consisted of the Improved Model plus the following additional variables, which were selected using our variable selection criteria: education (P = 0.0002 when entered in the Improved Model); the work disability, litigation, or compensation variable (P = 0.02); recovery expectations (P = 0.003); catastrophizing (P = 0.005); fear-avoidance (P = 0.01); and self-reported days used opioids in the past 2 weeks (P = 0.03). (See Supplementary Appendix for the full results of the variable selection analyses.) Individual measures that were significant predictors of 4-month chronic pain grade in the Improved Model remained significant in the Expanded Model: pain diffuseness (OR = 1.22, 95% CI = 1.11-1.34); pain intensity/dysfunction (OR = 1.27, 95% CI = 1.05-1.53); and pain persistence (number of days with back pain in the past 6 months) (OR = 1.006, 95% CI = 1.001-1.01). The ORs for each of these 3 measures were per unit change; for example, the odds of grade III to IV back pain at 4 months increased by 27% for each 1-point increase in the 0 to 10 pain intensity/dysfunction score. As was the case with the Improved Model, the measure of psychological distress (depression and anxiety) was not significant in the Expanded Model (OR = 0.97, 95% CI = 0.92-1.02). Among the variables not in the Improved Model that were added to the Expanded Model, only education and recovery expectations were significantly associated with the outcome in the full multivariate model.
Patients with greater confidence that their back pain would be gone in 6 months had reduced odds (OR = 0.90, 95% CI = 0.82-0.98) of experiencing grade III to IV back pain at 4 months. For a 1-point increase on this 0 to 10 scale (where 10 = extremely confident), the odds of having grade III to IV back pain at 4 months decreased by 10%.
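Because the ORs for continuous predictors are per unit change, converting them to percent changes in odds is simple arithmetic, and under the model's log-linearity assumption the OR for a k-unit change is OR^k. The sketch below (hypothetical helper names) shows both, plus a rough back-calculation of a two-sided Wald P value from a reported OR and 95% CI, assuming the interval is symmetric on the log-odds scale. Because the published ORs and CIs are rounded, such back-calculated P values are only approximate.

```python
import math


def percent_change_in_odds(odds_ratio, units=1.0):
    """Hypothetical helper: percent change in odds for a `units`-point increase
    (assumes the effect is linear on the log-odds scale)."""
    return (odds_ratio ** units - 1.0) * 100.0


def approx_wald_p(odds_ratio, lower, upper, z_crit=1.96):
    """Hypothetical helper: approximate two-sided P value from an OR and its 95% CI,
    assuming the CI is symmetric on the log-odds scale; rounding limits precision."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, the Expanded Model OR of 1.27 for pain intensity/dysfunction corresponds to a 27% increase in odds per point, and the recovery-expectations OR of 0.90 corresponds to a 10% decrease. The pain-persistence OR of 1.006 per day corresponds to roughly a 20% increase in odds for 30 additional days with back pain (1.006^30 ≈ 1.20).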

Table 2
Improved and Expanded Chronic Pain Risk Models: associations of predictor variables with grade III to IV back pain at 4 months (N = 521).

Domain and variable | Improved Chronic Pain Risk Model, adjusted OR (95% CI) | Expanded Chronic Pain Risk Model, adjusted OR (95% CI)
Sociodemographic characteristics
  Education (reference: not college graduate): college graduate | – | 0.48 (0.28, 0.81)**
Pain-related
  Pain diffuseness^a | 1.22 (1.13, 1.34)*** | 1.22 (1.11, 1.34)***
  Pain intensity/dysfunction^a | 1.48 (1.27, 1.73)*** | 1.27 (1.05, 1.53)***
  Pain persistence (number of days with back pain, past 6 months)^a | 1.007 (1.004, 1.011)*** | 1.006 (1.001, 1.010)***
  Work disability/litigation/compensation (reference: no) | – | 1.72 (0.84, 3.52)
Psychological
  Psychological distress^a | 1.01 (0.96, 1.05) | 0.97 (0.92, 1.02)
  Recovery expectations^a | – | 0.90 (0.82, 0.98)*
  Catastrophizing^a | – | 1.03 (1.0, 1.06)
  Fear-avoidance^a | – | 1.01 (0.98, 1.05)
Biomedical, health, and health care
  Days of opioid use, past 2 weeks (reference: 0)^b
    1-3 | – | 0.71 (0.29, 1.62)
    4-14 | – | 1.92 (1.02, 3.58)

Note: Sample size in each model (Improved Model n = 507; Expanded Model n = 499) varies from the full sample (n = 521) because of missing data. Odds ratios are not shown for variables in the Chronic Pain Risk Score because the score is a composite of the variables. Variables that were statistically significant predictors of grade III to IV back pain at 4 months in each multivariate model are marked with asterisks: * P < 0.05; ** P < 0.01; *** P < 0.0001.
a Analyzed as a continuous variable; OR is per unit change.
b Wald test P value for days used opioids in past 2 weeks = 0.052.

Ò

1398

J.A. Turner et al. / PAIN 154 (2013) 1391–1401

Table 3
AUC of each model and model comparison statistics.

Statistic | Chronic Pain Risk Score | Improved Chronic Pain Risk Model | Expanded Chronic Pain Risk Model
Full sample
  AUC (95% CI) | 0.76 (0.71, 0.81) | 0.79 (0.75, 0.84) | 0.84 (0.79, 0.88)
  Hosmer-Lemeshow statistic (P value) | 6.22 (P = 0.62) | 12.97 (P = 0.11) | 10.94 (P = 0.21)
  AIC | 465.27 | 438.58 | 404.72
  NRI (P value)^a | – | 0.32 (P = 0.003) (compared to Chronic Pain Risk Score) | 0.56 (P < 0.001) (compared to Improved Model)
Bootstrap samples^a
  NRI (95% CI)^b, in-sample^c | – | 0.37 (-0.03, 0.71) | 0.59 (0.32, 0.83)
  NRI (95% CI)^b, out-of-sample^d | – | 0.29 (-0.03, 0.60) | 0.43 (0.09, 0.73)
  Goodness of fit^e (percent of iterations), in-sample^c and out-of-sample^d:

38.2 66.2

64.2 54.9

-

65.1 80.3

Note: n = 499 in bootstrap analyses, which require data for all variables in all models. AIC, Akaike's Information Criterion; AUC, area under the receiver operating characteristic curve.
a 1000 bootstrap iterations were used.
b Net Reclassification Index (NRI) comparisons: Improved Chronic Pain Risk Model was compared to the Chronic Pain Risk Score; Expanded Chronic Pain Risk Model was compared to the Improved Chronic Pain Risk Model.
c In-sample statistics were calculated on the sample of individuals selected to be included in the iteration's bootstrap sample in order to fit the models.
d Out-of-sample statistics were calculated on the sample of individuals not selected for the bootstrap sample (ie, not used to estimate the predictive models).
e Goodness of fit: the percent of in-sample and out-of-sample bootstrap iterations with a good fit, as measured by the Hosmer-Lemeshow statistic at a significance level of 0.05.

3.5. Comparisons of risk models Table 3 shows the results of the statistical analyses examining the discriminative ability of each of the 3 prognostic models as well as the analyses comparing the Improved Chronic Pain Risk Model to the Chronic Pain Risk Score and comparing the Expanded Chronic Pain Risk Model to the Improved Model. The AUC estimated in the full sample for discriminating between patients with vs without grade III to IV pain at 4 months was 0.76 (95% CI = 0.71-0.81) for the Chronic Pain Risk Score; 0.79 (95% CI = 0.75-0.84) for the Improved Chronic Pain Risk Model; and 0.84 (95% CI = 0.79-0.88) for the Expanded Chronic Pain Risk Model. These values indicate acceptable discrimination for the Chronic Pain Risk Score and the Improved Chronic Pain Risk Model and excellent discrimination for the Expanded Chronic Pain Risk Model [24]. The Hosmer-Lemeshow goodness-of-fit statistics (where a significant P value indicates lack of fit) did not indicate a lack of fit in the full sample for any of the models (6.22, P = 0.62 for the Chronic Pain Risk Score; 12.97, P = 0.11 for the Improved Model; and 10.94, P = 0.21 for the Expanded Model). Comparison of model AIC values indicates that the Expanded Model is preferable to the Improved Model, which is preferable to the Chronic Pain Risk Score. Prediction in the full sample was significantly better when using the Improved Model compared to the Chronic Pain Risk Score (NRI = 0.32, P = 0.003). Most of this improvement came from improved prediction among the patients who did not have unfavorable outcomes. In this subgroup, the NRI was 0.21 (P < 0.0001), whereas among patients with unfavorable outcomes, the NRI was 0.11 (P = 0.24) (data not shown in table). In turn, the Expanded Model provided significantly better prediction than did the Improved Model (NRI = 0.56, P < 0.001). Once again, most of the improvement was due to better prediction among the patients who did not have unfavorable outcomes.
In this subgroup, the NRI was 0.37 (P < 0.0001), whereas among patients with unfavorable outcomes, the NRI was 0.19 (P = 0.052) (data not shown in table). Results were similar in the in-sample and out-of-sample bootstrap analyses (see Table 3). The Improved Chronic Pain Risk Model yielded slightly better prediction than did the Chronic Pain Risk Score, both in-sample and out-of-sample, but with bootstrap 95% CIs containing 0. The Expanded Chronic Pain Risk Model yielded better prediction than did the Improved Model both in-sample and out-of-sample, with bootstrap 95% CIs excluding 0. However, there was a potential lack of fit, especially for the Improved Chronic Pain Risk Model, in the bootstrap samples (see Table 3). Inspection of the results indicated that the potential lack of fit (as measured by the Hosmer-Lemeshow goodness-of-fit test) was driven by the lowest risk decile. In this low-risk group, the models (especially the Improved Chronic Pain Risk Model) tended to underestimate the probability of unfavorable outcomes.

Table 4 shows the results for various predicted probabilities of grade III to IV pain at 4 months for the Chronic Pain Risk Score, Improved Chronic Pain Risk Model, and Expanded Chronic Pain Risk Model. In each model, about a quarter of the sample had a predicted probability of 0.30 or higher of an unfavorable outcome. For all models, the PPV generally decreased as the cutpoint for a predicted probability of grade III to IV pain was lowered, with a maximum PPV of 64.7% for a cutpoint of 0.50 in the Chronic Pain Risk Score (no participants had a Chronic Pain Risk Score-predicted probability ≥0.80 of grade III to IV pain); a maximum PPV of 85.7% for a cutpoint of 0.80 in the Improved Model; and a maximum PPV of 87.5% for a cutpoint of 0.80 in the Expanded Model. The NPV was less sensitive to change in cutpoint, ranging from 81.0% for a cutpoint of 0.50 to 85.2% for a cutpoint of 0.30 in the Chronic Pain Risk Score; from 79.4% for a cutpoint of 0.80 to 87.5% for a cutpoint of 0.30 in the Improved Model; and from 81.0% for a cutpoint of 0.80 to 90.2% for a cutpoint of 0.30 in the Expanded Model. Sensitivity increased as the cutpoint was lowered in each model. Sensitivity ranged from 19.3% (cutpoint = 0.50) to 50% (cutpoint = 0.30) in the Chronic Pain Risk Score and from 5.5% (cutpoint = 0.80) to 56.0% (cutpoint = 0.30) in the Improved Model. For the Expanded Model, sensitivity was somewhat higher, with a range of 13.2% (cutpoint = 0.80) to 66.0% (cutpoint = 0.30). Specificity was slightly lower in the Chronic Pain Risk Score than in the other 2 models. Specificity was similar in the Improved and Expanded models, with a range of 84% (cutpoint = 0.30) to over 99% (cutpoint = 0.80) in both. Inspection of the likelihood ratios (see Table 4) indicates that in all models, patients with grade III to IV back pain at 4 months were more likely to have a predicted probability of this outcome ≥0.30 at baseline than were patients who did not have grade III to IV pain at 4 months. For example, using the Expanded Chronic Pain Risk Model, patients who had grade III to IV back pain at 4 months were 4.2 times more likely to have a predicted probability ≥0.30 than were patients who did not have grade III to IV pain at 4 months. A predicted probability <0.30 was two-fifths as likely for a patient who went on to have grade III to IV back pain at 4 months, as compared with a patient who did not have grade III to IV pain at 4 months. With increasing predicted probabilities, there were increases in the likelihood of the patient's having grade III to IV back pain at follow-up.

Table 4
Prediction of grade III to IV back pain at 4 months.

Regression model predicted probability of grade III to IV back pain at follow-up | Prevalence of positive risk score (%) | PPV (%) | NPV (%) | Sensitivity (%) | Specificity (%) | LR+ | LR–

From the Chronic Pain Risk Score
≥0.80^a | – | – | – | – | – | – | –
≥0.50 | 6.6 | 64.7 | 81.0 | 19.3 | 97.0 | 6.4 | 0.8
≥0.45 | 9.4 | 59.2 | 81.9 | 25.4 | 95.1 | 5.2 | 0.8
≥0.40 | 13.5 | 57.1 | 83.5 | 35.1 | 92.6 | 4.7 | 0.7
≥0.35 | 18.1 | 50.0 | 84.2 | 41.2 | 88.4 | 3.6 | 0.7
≥0.30 | 25.8 | 42.5 | 85.2 | 50.0 | 81.0 | 2.6 | 0.6

From the Improved Chronic Pain Risk Model
≥0.80 | 1.4 | 85.7 | 79.4 | 5.5 | 99.8 | 27.5 | 0.9
≥0.50 | 9.5 | 64.6 | 83.0 | 28.4 | 95.7 | 6.6 | 0.7
≥0.45 | 11.2 | 59.7 | 83.3 | 31.2 | 94.2 | 5.4 | 0.7
≥0.40 | 16.2 | 59.8 | 85.9 | 45.0 | 91.7 | 5.4 | 0.6
≥0.35 | 19.9 | 54.5 | 86.7 | 50.5 | 88.4 | 4.4 | 0.6
≥0.30 | 24.5 | 49.2 | 87.5 | 56.0 | 84.2 | 3.5 | 0.5

From the Expanded Chronic Pain Risk Model
≥0.80 | 3.2 | 87.5 | 81.0 | 13.2 | 99.5 | 26.4 | 0.9
≥0.50 | 12.0 | 68.3 | 85.2 | 38.7 | 95.2 | 8.1 | 0.6
≥0.45 | 15.0 | 62.7 | 86.1 | 44.3 | 92.9 | 6.2 | 0.6
≥0.40 | 17.0 | 61.2 | 87.0 | 49.1 | 91.6 | 5.8 | 0.6
≥0.35 | 21.2 | 57.6 | 88.6 | 57.6 | 88.6 | 5.1 | 0.5
≥0.30 | 26.5 | 53.0 | 90.2 | 66.0 | 84.2 | 4.2 | 0.4

LR+, positive likelihood ratio; LR–, negative likelihood ratio; NPV, negative predictive value (probability that a patient classified as low risk would not have grade III to IV pain at follow-up); PPV, positive predictive value (probability that a patient identified as high risk would have grade III to IV pain at follow-up).
a No participants had a predicted probability ≥0.80 based on the Chronic Pain Risk Score.

4. Discussion We evaluated the Chronic Pain Risk Score [51] in patients initiating a new episode of back-pain primary care to test whether prediction of outcomes could be enhanced. We found that back pain outcomes could be predicted accurately in this population by a limited set of variables. Improvements in the Chronic Pain Risk Score significantly enhanced prediction. Adding other prognostic variables from sociodemographic (education) and psychological (recovery expectations) domains further improved prediction. This Expanded Chronic Pain Risk Model demonstrated excellent ability to discriminate, within 14 days of the initial visit, patients who had unfavorable outcomes 4 months later from those who did not. Classification of patients into risk groups has potential applications for case definition in epidemiologic studies, phenotype identification in genetic studies, evaluation of subgroup differences in treatment response in clinical trials, and selection of an "enriched" sample for a study in which it would be advantageous to enroll individuals with acute or subacute pain who are likely to go on to chronic pain. Moreover, accurate prediction of risk could allow targeting of early secondary prevention interventions to high-risk patients.
In a large randomized trial [22], prognostic stratification of primary care back pain patients, with care pathways matched to prognostic risk group, resulted in improved functional outcomes and reduced costs. Predicted probabilities of 50% and 80% for an unfavorable outcome have been proposed as cutpoints for identifying patients with possible and probable chronic pain, respectively [51]. However, in this sample, across predictive models, few patients had an 80% or greater predicted probability of an unfavorable outcome, and a cutpoint of 50% had low sensitivity in correctly identifying patients with unfavorable outcomes. The lower prevalence of unfavorable outcomes in this study relative to other studies using the Chronic Pain Risk Score may be explained by 2 important differences: (1) our sample was limited to patients initiating new episodes of care for back pain; and (2) our definition of an unfavorable outcome was more stringent. We defined an unfavorable outcome as Chronic Pain Grade III or IV. The proportion with this outcome (22%) at 4 months was similar to the median proportion (26%) of primary care patients with poor back pain and function outcomes at 3 to 6 months reported across other studies [5]. An issue that merits further attention is the optimal definition of an unfavorable outcome in terms of both component criteria and threshold severity. Consensus is lacking concerning a clinically useful definition of chronic pain [35]. Although it is generally agreed that pain is chronic if it lasts longer than 3 months, there is wide variation in levels of pain intensity and physical and psychosocial dysfunction in patients with chronic pain. From a societal and health care system perspective, the major problem is chronic disabling pain, which is why we chose an outcome measure that reflected pain's interference with activities. In all 3 predictive models, about 25% of patients had predicted probabilities ≥0.30 of experiencing grade III or IV pain at 4 months. In both the Improved and Expanded models, about half of these patients actually had this outcome. Using these models, 0.30 may be a reasonable cutpoint for clinical decision making with patients beginning a new episode of primary care for back pain.
In the Expanded Model, this cutpoint yielded a 53% probability that a patient identified as being at high risk would have grade III or IV back pain 4 months later and a 90% probability that a patient classified as being at low risk would not have grade III or IV back pain 4 months later. This cutpoint correctly identified 66% of patients who had unfavorable outcomes and 84% of patients who did not.
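The link among these cut-point statistics can be checked with Bayes' theorem. PPV and NPV depend on sensitivity, specificity, and the prevalence of the outcome, and the same result follows from multiplying the pre-test odds by a likelihood ratio. The sketch below (hypothetical helper names) assumes a prevalence of 0.217, the full-sample proportion with grade III to IV pain at 4 months. Because Table 4 values are rounded and the analytic sample may differ slightly from the full sample, the reconstructed values agree with the reported 53% and 90% only approximately.

```python
def predictive_values(sens, spec, prevalence):
    """Hypothetical helper: PPV and NPV implied by Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def post_test_probability(pre_test_prob, lr):
    """Hypothetical helper: convert probability to odds, multiply by a likelihood ratio, convert back."""
    odds = pre_test_prob / (1.0 - pre_test_prob) * lr
    return odds / (1.0 + odds)
```

With these inputs, `predictive_values(0.66, 0.842, 0.217)` returns a PPV of about 0.54 and an NPV of about 0.90, and `post_test_probability(0.217, 4.2)` likewise gives about 0.54. The sketch also shows why PPV would be lower in a population with a lower prevalence of unfavorable outcomes, even if sensitivity and specificity were unchanged.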


The Expanded Model performed well in identifying patients with favorable outcomes, but it incorrectly predicted favorable outcomes for some patients. However, risk-stratified care can make allowances for imperfect prediction by encouraging patients who do not improve to inform their care providers. Further research is needed to identify risk factors that improve prediction for patients currently incorrectly classified as being at low risk. Research is also needed to confirm our results in independent samples and to determine whether some variables could be eliminated from the Expanded Model without reducing predictive accuracy. If the Expanded Model (or a variant) shows predictive accuracy in samples from other populations and settings, rules for using it to calculate risk scores could be developed. Its performance as a clinical screening tool could then be evaluated and compared with that of other risk-screening tools such as the STarT Back screening tool [21]. Such research should compare the tools not only in terms of predictive accuracy, but also with respect to administration time, ability to be hand-scored easily and quickly when computerized scoring is not available, and utility in guiding treatment to address specific risk factors. Patients in our study completed the baseline measures within 2 weeks after initial visits in new episodes of back-pain care. It is unknown whether results would have differed had patients completed the measures at the initial visit. Arguably, risk screening at a follow-up visit 2 to 3 weeks after the initial visit might be more useful in tailoring care than screening at the initial visit, given that many patients improve over that time period [39]. Alternatively, combining information from 2 or 3 assessments over the first few weeks in a new episode of care for back pain might yield better prediction than that from a single assessment [11].
In the Expanded Model, adjusting for all other variables, less than college education, more diffuse pain, greater pain intensity and dysfunction, greater pain persistence, and lower recovery expectations independently predicted grade III to IV back pain 4 months later. The significant baseline predictors were generally consistent with previously identified risk factors for persistent musculoskeletal pain across diverse anatomical sites [34]: pain characteristics (greater intensity, longer duration, number of previous episodes, and number of sites); greater disability; and psychological factors. This suggests that our findings may be generalizable to patients seeking care for musculoskeletal problems other than back pain. Although most prior studies have not found education to predict back pain outcomes [5,20,40], some, like ours, found that lower levels of education were associated with worse outcomes [19]. It is possible that education was a marker for other, unmeasured, prognostically related characteristics. Our depression/anxiety measure was not a significant predictor after adjusting for baseline pain-related variables. This may be due in part to its moderate associations with these variables (pain diffuseness r = 0.43 and pain intensity/dysfunction r = 0.36), which would be expected to be strongly associated with the outcome (which is derived in part from measures of pain intensity and dysfunction [activity interference]). Furthermore, scores on the depression/anxiety measure were low on average; they might be a stronger predictor in samples that included more depressed patients. However, a systematic review concluded that most studies did not find depression to be a significant risk factor for chronic back pain in primary care and no studies found anxiety to be a significant predictor [40].
Further research is needed to clarify the use of depression and anxiety measures in predicting chronic pain outcomes, but such measures have utility for identifying treatable psychological disorders. We were interested in examining whether variables in electronic health care records (EHRs) might provide additional contributions to prediction. A question has been raised as to whether, for example, a measure of comorbidity might improve prediction [36]. However, no EHR variable examined (comorbidity, mental health and substance use disorder diagnoses, body mass index, health care utilization) added significantly to the prediction obtained from patients' self-report measures. We note several study limitations. First, the enrollment rate was low, largely due to the requirement that patients be interviewed within 14 days of their initial back pain visit. Second, a longer follow-up would have been helpful in determining the performance of prognostic models in predicting longer-term outcomes. However, systematic reviews have concluded that back pain outcomes at 3 to 6 months are similar to those at 1 year [5,39]. Third, the patient population in Group Health, although representative of Washington State adults in terms of age and race, underrepresents people at the extreme ends of the income distribution, and our sample was, on average, highly educated. However, findings regarding the ability of the Chronic Pain Risk Score to predict clinically significant pain have been replicated in various countries and health care systems with less educated patients [12,36], suggesting that our predictive models might also be generalizable.

In conclusion, the Expanded Chronic Pain Risk Model demonstrated excellent ability to discriminate between patients who would go on to have unfavorable vs more favorable back pain outcomes. The Chronic Pain Risk Score and Improved Chronic Pain Risk Model showed acceptable discriminative ability. Our results suggest that screening questionnaires concerning risk for chronic back pain should include measures of pain diffuseness, pain dysfunction (intensity and activity interference), and pain persistence. In settings in which a brief risk measure is required, a limited set of questions can yield reasonable predictions of back pain outcomes and may have clinical utility for guiding care.
Conflict of interest statement None of the authors have conflicts of interest with respect to this work.

Acknowledgements This research was supported by a grant from Janssen Pharmaceuticals (Titusville, NJ). The authors express appreciation to Paul Stang and Myoung Kim, Janssen Pharmaceuticals, for contributions to this work.

Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.pain.2013.04.029.

References
[1] Akaike H. A new look at the statistical model identification. IEEE Trans Auto Control 1974;19:716–23.
[2] Buysse DJ, Reynolds III CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psychiatry Res 1989;28:193–213.
[3] Cherkin DC, Deyo RA, Volinn E, Loeser JD. Use of the international classification of diseases (ICD-9-CM) to identify hospitalizations for mechanical low back problems in administrative databases. Spine 1992;17:817–25.
[4] Chou R, Qaseem A, Snow V, Casey D, Cross JT, Shekelle P, Owens DK. Diagnosis and treatment of low back pain: a joint clinical practice guideline from the American College of Physicians and the American Pain Society. Ann Intern Med 2007;147:478–91.
[5] Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295–302.
[6] Costa LD, Maher CG, Hancock MJ, McAuley JH, Herbert RD, Costa LO. The prognosis of acute and persistent low-back pain: a meta-analysis. CMAJ 2012;184:E613–624.
[7] Dagenais S, Caro J, Haldeman S. A systematic review of low back pain cost of illness studies in the United States and internationally. Spine J 2008;8:8–20.
[8] Demyttenaere K, Bruffaerts R, Ling S, Posada-Villa J, Kovess V, Angermeyer MC, Levinson D, de Girolamo G, Nakane H, Mneimneh Z, Lara C, de Graaf R, Scott KM, Gureje O, Stein DJ, Haro JM, Bromet EJ, Kessler RC, Alonso J, Von Korff M. Mental disorders among persons with chronic back or neck pain: results from world mental health surveys. PAIN® 2007;129:332–42.
[9] Derogatis LR, Rickels K, Rock AF. The SCL-90 and the MMPI: a step in the validation of a new self-report scale. Br J Psychiatry 1976;128:280–90.
[10] Deyo RA, Mirza SK, Martin BI. Back pain prevalence and visit rates: estimates from US National Surveys, 2002. Spine 2006;31:2724–7.
[11] Dunn KM, Croft P. Repeat assessment improves the prediction of prognosis in patients with low back pain in primary care. PAIN® 2006;126:10–5.
[12] Dunn KM, Croft PR, Main CJ, Von Korff M. A prognostic approach to defining chronic pain: replication in a UK primary care low back pain population. PAIN® 2008;135:48–54.
[13] Efron B, Tibshirani R. An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall/CRC; 1993.
[14] Franklin GM, Stover B, Turner JA, Fulton-Kehoe D, Wickizer T. Early opioid prescription and subsequent disability among workers with back injuries: the disability risk identification study cohort. Spine 2008;33:199–204.
[15] Gatchel RJ, Polatin PB, Noe C, Gardea M, Pulliam C, Thompson J. Treatment- and cost-effectiveness of early intervention for acute low-back pain patients: a one-year prospective study. J Occup Rehabil 2003;13:1–9.
[16] Gerhardt A, Hartmann M, Schuller-Roma B, Blumenstiel K, Bieber C, Eich W, Steffen S. The prevalence and type of Axis-I and Axis-II mental disorders in subjects with non-specific chronic back pain: results from a population-based study. Pain Med 2011;12:1231–40.
[17] Gray DT, Deyo RA, Kreuter W, Mirza SK, Heagerty PJ, Comstock BA, Chan L. Population-based trends in volumes and rates of ambulatory lumbar spine surgery. Spine 2006;31:1957–63.
[18] Grimes DA, Schulz KF. Refining clinical diagnosis with likelihood ratios. Lancet 2005;365:1500–5.
[19] Grotle M, Foster NE, Dunn KM, Croft P. Are prognostic indicators for poor outcome different for acute and chronic low back pain consulters in primary care? PAIN® 2010;151:790–7.
[20] Hayden JA, Chou R, Hogg-Johnson S, Bombardier C. Systematic reviews of low back pain prognosis had variable methods and results—guidance for future prognosis reviews. J Clin Epidemiol 2009;62:781–96.
[21] Hill JC, Dunn KM, Lewis M, Mullis R, Main CJ, Foster NE, Hay EM. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 2008;59:632–41.
[22] Hill JC, Whitehurst DGT, Lewis M, Bryan S, Dunn KM, Foster NE, Konstantinou K, Main CJ, Mason E, Somerville S, Sowden G, Vohora K, Hay EM. Comparison of stratified primary care management for low back pain with current best practice (STarT Back): a randomised controlled trial. Lancet 2011;378:1560–71.
[23] Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 1997;16:965–80.
[24] Hosmer DW, Lemeshow S. Applied logistic regression. 2nd ed. New York: John Wiley & Sons; 2000.
[25] International Association for the Study of Pain. Classification of chronic pain: descriptions of chronic pain syndromes and definitions of pain terms. PAIN® 1986;3:S1–225.
[26] Jensen MP, Karoly P. Survey of pain attitudes: professional manual. Lutz, FL: Psychological Assessment Resources; 2008.
[27] Jensen MP, Karoly P, Huger R. The development and preliminary validation of an instrument to assess patients' attitudes toward pain. J Psychosom Res 1987;31:393–400.
[28] Jensen MP, Keefe FJ, Lefebvre JC, Romano JM, Turner JA. One- and two-item measures of pain beliefs and coping strategies. PAIN® 2003;104:453–69.
[29] Knaster P, Karlsson H, Estlander A-M, Kalso E. Psychiatric disorders as assessed with SCID in chronic pain patients: the anxiety disorders precede the onset of pain. Gen Hosp Psychiatry 2012;34:46–52.
[30] Kroenke K, Spitzer RL, Williams JBW. The PHQ-15: validity of a new measure for evaluating the severity of somatic symptoms. Psychosom Med 2002;64:258–66.
[31] Kroenke K, Spitzer RL, Williams JBW, Löwe B. An ultra-brief screening scale for anxiety and depression: the PHQ-4. Psychosomatics 2009;50:613–21.
[32] Kroenke K, Spitzer RL, Williams JBW, Löwe B. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. Gen Hosp Psychiatry 2010;32:345–59.
[33] Kroenke K, Strine TW, Spitzer RL, Williams JBW, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord 2009;114:163–73.
[34] Mallen CD, Peat G, Thomas E, Dunn KM, Croft PR. Prognostic factors for musculoskeletal pain in primary care: a systematic review. Br J Gen Pract 2007;57:655–61.
[35] Mehling WE, Gopisetty V, Acree M, Pressman A, Carey T, Goldberg H, Hecht FM, Avins AL. Acute low back pain and primary care: how to define recovery and chronification? Spine 2011;36:2316–23.
[36] Muller S, Thomas E, Dunn KM, Mallen C. A prognostic approach to defining chronic pain across a range of musculoskeletal pain sites. Clin J Pain 2013;29:411–6.
[37] Osman A, Barrios FX, Gutierrez PM, Kopper BA, Merrifield T, Grittmann L. The pain catastrophizing scale: further psychometric evaluation with adult samples. J Behav Med 2000;23:351–65.
[38] Pencina MJ, D'Agostino RB, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157–72.
[39] Pengel LHM, Herbert RD, Maher CG, Refshauge KM. Acute low back pain: systematic review of its prognosis. BMJ 2003;327:323–7.
[40] Ramond A, Bouton C, Richard I, Roquelaure Y, Baufreton C, Legrand E, Huez J-F. Psychosocial risk factors for chronic low back pain in primary care: a systematic review. Fam Pract 2011;28:12–21.
[41] Romano PS, Roos L, Jollis J. Further evidence concerning the use of a clinical comorbidity index with ICD-9-CM administrative data. J Clin Epidemiol 1993;46:1085–90.
[42] Sullivan MJL, Bishop SR, Pivik J. The pain catastrophizing scale: development and validation. Psychol Assess 1995;7:524–32.
[43] Thomas E, Dunn KM, Mallen C, Peat G. A prognostic approach to defining chronic pain: application to knee pain in older adults. PAIN® 2008;139:389–97.
[44] Turner JA, Franklin G, Fulton-Kehoe D, Sheppard L, Stover B, Wu R, Gluck JV, Wickizer TM. ISSLS prize winner: early predictors of chronic work disability: a prospective, population-based study of workers with back injuries. Spine 2008;33:2809–18.
[45] Vlaeyen JW, Kole-Snijders AMJ, Boeren RGB, van Eek H. Fear of movement/(re)injury in chronic low back pain and its relation to behavioral performance. PAIN® 1995;62:363–72.
[46] Von Korff M. Epidemiological and survey methods: assessment of chronic pain. In: Turk DC, Melzack R, editors. Handbook of pain assessment. New York: The Guilford Press; 2001. p. 603–18.
[47] Von Korff M. Epidemiological and survey methods: assessment of chronic pain. In: Turk DC, Melzack R, editors. Handbook of pain assessment. New York: The Guilford Press; 2011. p. 455–73.
[48] Von Korff M, Balderson BHK, Saunders K, Miglioretti DL, Lin EHB, Berry S, Moore JE, Turner JA. A trial of an activating intervention for chronic back pain in primary care and physical therapy settings. PAIN® 2005;113:323–30.
[49] Von Korff M, Dunn KM. Chronic pain reconsidered. PAIN® 2008;138:267–76.
[50] Von Korff M, Lin EHB, Fenton JJ, Saunders K. Frequency and priority of pain patients' health care use. Clin J Pain 2007;23:400–8.
[51] Von Korff M, Miglioretti DL. A prognostic approach to defining chronic pain. PAIN® 2005;117:304–13.
[52] Von Korff M, Ormel J, Keefe FJ, Dworkin SF. Grading the severity of chronic pain. PAIN® 1992;50:133–49.
[53] Webster B, Verma SK, Gatchel RJ. Relationship between early opioid prescribing for acute occupational low back pain and disability duration, medical costs, subsequent surgery and late opioid use. Spine 2007;32:2127–32.
[54] Whitfill T, Haggard R, Bierner SM, Pransky G, Hassett RG, Gatchel RJ. Early intervention options for acute low back pain patients: a randomized clinical trial with one-year follow-up outcomes. J Occup Rehabil 2010;20:256–63.