Manual Therapy xxx (2015) 1–9
Review article
Patient-Reported Outcome (PRO) questionnaires for people with pain in any spine region. A systematic review
Edmund Leahy a, b, *, Megan Davidson a, Deenika Benjamin a, Henry Wajswelner a
a Department of Rehabilitation, Nutrition and Sport, School of Allied Health, Level 5, HS3, La Trobe University, Bundoora, Vic 3086, Australia
b Physiotherapy Department, Northern Health, 185 Cooper St, Epping, Vic 3076, Australia
Article info
Article history: Received 6 July 2015; received in revised form 21 October 2015; accepted 22 October 2015
Abstract
Background/objective: This systematic review investigates the measurement properties of Patient-Reported Outcome (PRO) questionnaires which evaluate disability associated with pain in any area of the spine.
Method: PRO questionnaires for people with pain in any spinal region were identified from existing systematic reviews and recent studies. Databases were searched for studies which evaluated the measurement properties of the included questionnaires to August 2015. Data synthesis used a levels of evidence approach which considered study methodological quality.
Results: The Extended Aberdeen Back Pain Scale (EA), Functional Rating Index (FRI) and Spine Functional Index (SFI) were identified as eligible for this review. The FRI was evaluated in 15 studies, with positive results for internal consistency, structural validity, hypothesis testing and responsiveness, negative results for measurement error and conflicting results for reliability. The SFI was evaluated in 3 studies with positive results for internal consistency, reliability, content validity, and structural validity. Conflicting results were found for hypothesis testing. The EA was evaluated in 3 studies which found negative results for internal consistency and structural validity.
Conclusions: The FRI is provisionally recommended for the assessment of disability in people with multi-area spinal pain. This conclusion is based on studies of mainly fair methodological quality.
© 2015 Elsevier Ltd. All rights reserved.
Keywords: Spine; Patient-reported outcome; Measurement properties
1. Introduction
Patient-Reported Outcome (PRO) questionnaires provide an efficient and convenient method of assessing disability in people with spine pain. For the clinician, the benefits of using PRO questionnaires are not limited to monitoring treatment effectiveness. By focussing on the patient's perspective, questionnaires can assist patient-centred clinical reasoning as well as facilitate patient empowerment and self-management strategies (Kyte et al., 2015). There are many region-specific PRO questionnaires for low back and neck pain (Costa et al., 2007a; Schellingerhout et al., 2012), but none for upper back pain. Rather than developing a new questionnaire for upper back pain, other PRO questionnaires
* Corresponding author. Department of Rehabilitation, Nutrition and Sport, School of Allied Health, Level 5, HS3, La Trobe University, Bundoora, Vic 3086, Australia. Tel.: +61 3 3479 5857, +61 419001291 (mobile). E-mail addresses: [email protected] (E. Leahy), [email protected] (M. Davidson), [email protected] (D. Benjamin), h.wajswelner@latrobe.edu.au (H. Wajswelner).
which are not restricted to one spine region could be used (Feise and Menke, 2010; Gabel et al., 2013). The advantage of these questionnaires is that only a single questionnaire needs to be used no matter where or how many areas of the spine are involved. The improved efficiency from using a single questionnaire may help clinicians overcome the time constraints that are reported as a barrier to PRO questionnaire use (Duncan and Murray, 2012). As the prevalence of multi-region spine pain is reported to be 9.3%, which is higher than the prevalence of neck pain alone at 4.4% (Strine and Hootman, 2007), these PRO questionnaires for any-region spine pain could be of great clinical value if they have sound measurement properties. Well-designed systematic reviews enable researchers and clinicians to make informed decisions about which PRO questionnaires to use in specific populations. Systematic reviews of PRO questionnaires which adhere to the PRISMA guidelines and the “COnsensus-based Standards for the Selection of health status Measurement INstruments” (COSMIN) checklist (Liberati et al., 2009; Mokkink et al., 2012) have been completed for questionnaires which evaluate the disability of patients with neck pain
http://dx.doi.org/10.1016/j.math.2015.10.010 1356-689X/© 2015 Elsevier Ltd. All rights reserved.
Please cite this article in press as: Leahy E, et al., Patient-Reported Outcome (PRO) questionnaires for people with pain in any spine region. A systematic review, Manual Therapy (2015), http://dx.doi.org/10.1016/j.math.2015.10.010
(Schellingerhout et al., 2011, 2012). Although these reviews concluded that none of the questionnaires have been adequately assessed, they can still inform decisions regarding questionnaire selection. A similar review for PRO questionnaires for any-region spine pain has not been completed. Such a review would allow the comparison of available questionnaires and assist researchers and clinicians in choosing a suitable questionnaire. The objective of this review was to evaluate the measurement properties of PRO questionnaires which evaluate disability associated with pain in any or multiple areas of the spine.
2. Methods
2.1. Questionnaires selection
A list of spinal PRO questionnaires was compiled from the content and reference lists of recent neck and back reviews of PRO questionnaires (Grotle et al., 2005; Costa et al., 2007a; Ferreira et al., 2010; Terwee et al., 2011; Schellingerhout et al., 2012) and from a recent report of the development of a PRO questionnaire for any-region spinal pain (Gabel et al., 2013). Selection of PRO questionnaires was completed using predetermined inclusion criteria. For inclusion, disability related to the neck, upper back and low back region pain needed to be evaluated in one questionnaire. Questionnaires needed to be available in English and independently completed by patients. Those requiring an interview for completion were excluded. The list of potential PRO questionnaires was screened for eligibility by two authors (EL, MD). Questionnaires that had ‘low back’ or ‘neck’ in the name were excluded. For the remaining questionnaires additional information was sought from studies describing their development to determine eligibility. The final included questionnaires were decided through consensus agreement.
2.2. Design
A systematic review was completed for the selected questionnaires using a pre-determined protocol based on the PRISMA statement (Liberati et al., 2009) and COSMIN checklist guidelines (Mokkink et al., 2012; Terwee et al., 2012).
2.3. Search strategy
A title and abstract search was completed for included questionnaires from inception to 5th August 2015 using Pubmed and Cinahl (EBSCO platform) databases. Search keywords were the questionnaire names. No language limits or other filters were used. Articles were imported into an EndNote x6 file and duplicates removed. Citation searches were completed for the earliest publication for each questionnaire using Web of Science and Pubmed databases. Reference lists of included articles were hand searched for additional articles.
Titles and abstracts were independently evaluated for inclusion by two authors (EL, HW) with disagreements resolved through discussion. If required a third reviewer made the final decision. Full text of remaining articles was then independently assessed for eligibility.
2.4. Identification of eligible studies
Studies were included if they were full text original journal articles published in English and evaluated the measurement properties of the selected questionnaires in any population. Measurement properties evaluated were any of those defined by the COSMIN checklist (Mokkink et al., 2010c): internal consistency, reliability, measurement error, construct validity, hypothesis testing, cross-cultural validity, criterion validity, content validity or responsiveness. Review articles were excluded.
2.5. Data extraction
Measurement property and descriptive data were independently extracted by two authors (EL, DB). Disagreements were resolved by discussion. Descriptive data extracted were based on the interpretability and generalisability sections of the COSMIN checklist as recommended by the developers (Terwee et al., 2012). These data included setting, sample number, patient characteristics, instrument language, gender and age.
2.6. Quality assessment
Two reviewers (EL, DB) independently assessed and rated each study's methodological quality using the COSMIN checklist (Mokkink et al., 2012; Terwee et al., 2012) which has been recommended for use in systematic reviews of PRO questionnaires (Mokkink et al., 2010b). Disagreements were settled through discussion, with a third reviewer (MD) making the final decision. The COSMIN checklist is made up of 12 sections each of which has between 5 and 18 items. Nine of these sections evaluate study quality with respect to specific measurement properties. Studies are rated as either poor, fair, good or excellent for each measurement property. The 9 measurement properties fit into one of 3 domains which are reliability, validity and responsiveness. Measurement property definitions are described elsewhere (Mokkink et al., 2012). Cohen's Kappa was calculated to determine the level of inter-rater agreement for quality assessment.
2.7. Results synthesis
Evaluation of each questionnaire's measurement properties was completed using a levels of evidence approach previously used in systematic reviews of PRO questionnaires (Schellingerhout et al., 2011, 2012). This approach was adopted from the Cochrane Back Review Group (van Tulder et al., 2003) and enables synthesis of measurement property data with the studies' methodological quality. Measurement properties are rated as having strong, moderate, limited, conflicting or unknown level of evidence. Strength of evidence can be in a positive or negative direction. Criteria used to evaluate the strength and direction of evidence are described in Tables 1 and 2. Levels of evidence evaluation were completed separately for each of the questionnaire language versions. Where required by the levels of evidence approach (Schellingerhout et al., 2012), values were calculated from other measures reported.
Smallest Detectable Change (SDC) was calculated from the Standardised Error of Measurement (SEM) using the formula SDC = 1.96 × √2 × SEM (de Vet et al., 2006; Terwee et al., 2007, 2009). For evaluation of measurement error, the levels of evidence approach requires the Minimal Important Change (MIC) to be compared to the SDC to determine whether there is a positive or negative result (Schellingerhout et al., 2011, 2012). The MIC (Terwee et al., 2009) has the same definition as the Minimally Clinically Important Difference (MCID) (Jaeschke et al., 1989) therefore the terms were considered equivalent for this review.
3. Results
3.1. Selection of questionnaires
Fifty-nine PRO questionnaires were identified, of which 3 fulfilled the inclusion criteria. These were the Extended Aberdeen
Table 1. Levels of evidence for overall quality of a measurement property.a,b
Level | Rating | Criteria
Strong | +++ or −−− | Consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality
Moderate | ++ or −− | Consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality
Limited | + or − | One study of fair methodological quality
Conflicting | +/− | Conflicting findings
Unknown | ? | Only studies of poor methodological quality
a + = positive result, − = negative result.
b Reproduced from Schellingerhout et al. (2011) with permission from Schellingerhout.
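The decision rule in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `level_of_evidence` and its arguments are hypothetical, studies rated poor are assumed to be ignored whenever at least one better-rated study exists, and the judgement of whether findings are consistent is supplied by the reviewer rather than computed.

```python
from typing import List


def level_of_evidence(qualities: List[str], consistent: bool = True) -> str:
    """Map COSMIN study ratings for one measurement property to a Table 1 level.

    qualities  -- one rating per study: "poor", "fair", "good" or "excellent"
    consistent -- reviewer judgement that the usable studies agree in direction
    Assumption: poor-quality studies are set aside when better studies exist.
    """
    usable = [q for q in qualities if q in ("fair", "good", "excellent")]
    if not usable:
        return "unknown"        # only studies of poor methodological quality
    if not consistent:
        return "conflicting"    # usable studies disagree
    good_or_better = sum(q in ("good", "excellent") for q in usable)
    if "excellent" in usable or good_or_better >= 2:
        return "strong"         # one excellent study, or multiple good studies
    if good_or_better == 1 or len(usable) >= 2:
        return "moderate"       # one good study, or multiple fair studies
    return "limited"            # a single fair study


print(level_of_evidence(["excellent", "excellent"]))  # strong
print(level_of_evidence(["fair"]))                    # limited
```

The direction of the evidence (positive or negative) would be attached separately, using the per-property criteria in Table 2.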
Back Pain Scale (EA) (Williams et al., 2001), Spine Functional Index (SFI) (Gabel et al., 2013) and Functional Rating Index (FRI) (Feise and Menke, 2001). Forty spine region specific questionnaires were excluded. Questionnaires such as the Spinal Functional Sort (Robinson et al., 2003) were excluded due to questionnaire completion requiring assistance (n = 2). Other exclusion reasons are outlined in Fig. 1.
3.2. Study selection
The search strategy for included studies yielded 268 unique articles. Following the selection process, 21 studies were eligible. There was disagreement on inclusion of 2 articles during the title and abstract evaluation and 2 articles during the full text evaluation. All were settled through discussion. Fig. 1 provides the number of articles at each stage of the selection process. General characteristics and methodological evaluation of included studies are presented in Tables 3 and 4. For the COSMIN methodological quality rating there was a moderate level of inter-rater agreement (κ = 0.474, CI 0.385–0.563). Levels of evidence synthesis of the results for each language are presented in Table 5.
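For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa for two raters. The review does not state whether a weighted or unweighted kappa was used, so the unweighted form is an assumption; the function name and the example ratings are hypothetical and are not data from the included studies.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical COSMIN ratings from two reviewers for four study components.
reviewer_1 = ["poor", "fair", "fair", "good"]
reviewer_2 = ["poor", "fair", "good", "good"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.636
```

In this toy example the raters agree on 3 of 4 items (0.75) while chance agreement is 5/16, giving kappa = 7/11.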
Questionnaire descriptions and synthesised results are presented below. Only study components with fair, good or excellent COSMIN ratings will be discussed.
3.3. EA
The EA is a 4-part questionnaire with one section completed by all patients, then a separate section for each of the low back, upper back and neck regions. Patients only complete sections relevant to their problem. The EA has 32 questions, each with 3–9 response options. Each area of the spine can be scored separately or together in one final score. Total score ranges from 0 to 112 for patients with pain in all spinal regions. The questionnaire takes under 5 min to complete (Williams et al., 2001) and is available in English (Williams et al., 2001) and German (Osthus et al., 2006). The questionnaire is available from the original publication (Williams et al., 2001). There were no studies of sound methodological quality which evaluated reliability, responsiveness, measurement error, cross-cultural validity or hypothesis testing of the EA. Levels of evidence synthesis found limited negative evidence for structural validity for both language versions. None of the factors in any of the three sub-
Table 2. Quality criteria for measurement properties.a,b
Domain | Property | Rating | Quality criteria
Reliability | Internal consistency | + | (Sub)scale unidimensional AND Cronbach's α ≥ 0.70
Reliability | Internal consistency | ? | Dimensionality not known OR Cronbach's α not determined
Reliability | Internal consistency | − | (Sub)scale not unidimensional OR Cronbach's α < 0.70
Reliability | Measurement error | + | MIC > SDC OR MIC outside the LOA
Reliability | Measurement error | ? | MIC not defined
Reliability | Measurement error | − | MIC ≤ SDC OR MIC equals or inside LOA
Reliability | Reliability | + | ICC/weighted Kappa ≥ 0.70 OR Pearson's r ≥ 0.80
Reliability | Reliability | ? | Neither ICC/weighted Kappa, nor Pearson's r determined
Reliability | Reliability | − | ICC/weighted Kappa < 0.70 OR Pearson's r < 0.80
Validity | Content validity | + | The target population considers all items in the questionnaire to be relevant AND considers the questionnaire to be complete
Validity | Content validity | ? | No target population involvement
Validity | Content validity | − | The target population considers items in the questionnaire to be irrelevant OR considers the questionnaire to be incomplete
Validity (construct validity) | Structural validity | + | Factors should explain at least 50% of the variance
Validity (construct validity) | Structural validity | ? | Explained variance not mentioned
Validity (construct validity) | Structural validity | − | Factors explain < 50% of the variance
Validity (construct validity) | Hypothesis testing | + | (Correlation with an instrument measuring the same construct ≥ 0.50 OR at least 75% of the results are in accordance with the hypotheses) AND correlation with related constructs is higher than with unrelated constructs
Validity (construct validity) | Hypothesis testing | ? | Solely correlations determined with unrelated constructs
Validity (construct validity) | Hypothesis testing | − | Correlation with an instrument measuring the same construct < 0.50 OR < 75% of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs
Responsiveness | Responsiveness | + | (Correlation with an instrument measuring the same construct ≥ 0.50 OR at least 75% of the results are in accordance with the hypotheses OR AUC ≥ 0.70) AND correlation with related constructs is higher than with unrelated constructs
Responsiveness | Responsiveness | ? | Solely correlations determined with unrelated constructs
Responsiveness | Responsiveness | − | Correlation with an instrument measuring the same construct < 0.50 OR < 75% of the results are in accordance with the hypotheses OR AUC < 0.70 OR correlation with related constructs is lower than with unrelated constructs
a MIC = Minimal Important Change, SDC = Smallest Detectable Change, LOA = Limits Of Agreement, ICC = Intraclass Correlation Coefficient, AUC = Area Under the Curve, + = positive rating, ? = indeterminate rating, − = negative rating.
b Reproduced from Schellingerhout et al. (2011) with permission from Schellingerhout.
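Three of the simpler rows of Table 2 can be expressed as small rating functions, sketched below. The function names and argument conventions are hypothetical, `None` is used to mean "not reported", and the thresholds are read as "at least 0.70", "at least 0.80" and "at least 50%". The example values are the reliability and internal consistency figures quoted for the English FRI in Section 3.4.

```python
from typing import Optional


def rate_internal_consistency(unidimensional: Optional[bool],
                              alpha: Optional[float]) -> str:
    """'+' if the (sub)scale is unidimensional and Cronbach's alpha >= 0.70."""
    if unidimensional is None or alpha is None:
        return "?"  # dimensionality unknown or alpha not determined
    return "+" if unidimensional and alpha >= 0.70 else "-"


def rate_reliability(icc: Optional[float] = None,
                     pearson_r: Optional[float] = None) -> str:
    """'+' if ICC/weighted kappa >= 0.70 or Pearson's r >= 0.80."""
    if icc is None and pearson_r is None:
        return "?"  # neither coefficient determined
    icc_ok = icc is not None and icc >= 0.70
    r_ok = pearson_r is not None and pearson_r >= 0.80
    return "+" if icc_ok or r_ok else "-"


def rate_structural_validity(explained_variance: Optional[float]) -> str:
    """'+' if the factors explain at least 50% of the variance (as a proportion)."""
    if explained_variance is None:
        return "?"  # explained variance not mentioned
    return "+" if explained_variance >= 0.50 else "-"


print(rate_reliability(icc=0.63))              # - (low back pain sample)
print(rate_reliability(icc=0.948))             # + (multi-region sample)
print(rate_internal_consistency(True, 0.948))  # +
```

A positive and a negative rating for the same property from studies of usable quality is what produces a "conflicting" level of evidence under Table 1.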
Instrument selection: Patient-Reported Outcome questionnaires extracted from reviews and original articles = 90
Instrument selection: Questionnaires after duplicates removed = 59
Instrument selection: Questionnaires excluded = 56 (multiple questionnaires = 2; not spine specific = 4; neck specific = 7; low back specific = 33; specific condition = 5; not disability = 3; requires interview = 2)
Instrument selection: Questionnaires fulfilled criteria = 3: Spine Functional Index (SFI), Functional Rating Index (FRI), Extended Aberdeen Back Pain Scale (EA)
Identification: Records identified through PubMed and Cinahl search (n = SFI 113, FRI 111, EA 15)
Identification: Records through other sources (citation search n = SFI 2, FRI 135, EA 17; reference list search n = SFI 0, FRI 0, EA 0)
Screening: Records after duplicates removed (n = SFI 96, FRI 148, EA 24)
Screening: Records title and abstract screened (n = SFI 96, FRI 148, EA 24)
Screening: Records excluded (n = SFI 93, FRI 128, EA 21)
Eligibility: Full-text articles assessed for eligibility (n = SFI 3, FRI 20,
Eligibility: Full-text articles excluded (no questionnaire evaluation = 2; review article = 1; appraisal = 1; commentary = 1)
Included: Studies included in qualitative synthesis (n = SFI 3, FRI 15, EA 3)
Fig. 1. PRISMA flow chart.
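The counts in Fig. 1 can be cross-checked arithmetically against the totals quoted in the text (268 unique articles, 21 included studies). The sketch below simply re-adds the figure's numbers; the variable and function names are illustrative, and the full-text counts are derived as screened minus excluded records rather than read from the figure.

```python
# Counts transcribed from Fig. 1; names are illustrative only.
DATABASE = {"SFI": 113, "FRI": 111, "EA": 15}          # PubMed and Cinahl
CITATION = {"SFI": 2, "FRI": 135, "EA": 17}            # citation searches
AFTER_DEDUP = {"SFI": 96, "FRI": 148, "EA": 24}        # unique records screened
SCREEN_EXCLUDED = {"SFI": 93, "FRI": 128, "EA": 21}    # title/abstract exclusions
FULL_TEXT_EXCLUDED = {"no questionnaire evaluation": 2, "review article": 1,
                      "appraisal": 1, "commentary": 1}
INCLUDED = {"SFI": 3, "FRI": 15, "EA": 3}


def flow_summary() -> dict:
    """Recompute the derived quantities of the PRISMA flow from the raw counts."""
    identified = {q: DATABASE[q] + CITATION[q] for q in DATABASE}
    duplicates = {q: identified[q] - AFTER_DEDUP[q] for q in identified}
    full_text = {q: AFTER_DEDUP[q] - SCREEN_EXCLUDED[q] for q in AFTER_DEDUP}
    return {
        "unique_records": sum(AFTER_DEDUP.values()),
        "duplicates_removed": duplicates,
        "full_text_assessed": full_text,
        "included_total": sum(full_text.values()) - sum(FULL_TEXT_EXCLUDED.values()),
    }


summary = flow_summary()
print(summary["unique_records"])  # 268
print(summary["included_total"])  # 21
```

Both totals agree with Section 3.2, and the derived included total equals the sum of the per-questionnaire counts in the final box of the flow chart.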
scales explained more than 50% of the variance (Williams et al., 2001; Osthus et al., 2006). There was limited negative evidence for internal consistency (Williams et al., 2001; Osthus et al., 2006). One methodologically sound study evaluated content validity of the EA (Wiitavaara et al., 2009). This study evaluated questionnaire performance on the construct of pain and suffering, which is broader than the disability construct assessed in this review. Subjects with neck and shoulder pain were interviewed to determine the symptoms they experienced, which were then compared with the questionnaire's items. Too few symptoms were included in the EA, therefore its content validity was questioned (Wiitavaara et al., 2009).
3.4. FRI
The FRI consists of 10 items rated on a 5-point scale. Patient completion has been reported to take on average 64 s (range 12–140 s) (Feise and Menke, 2001) and 84 s (SD 23 s) (Gabel et al., 2013). The questionnaire is available in the original publication (Feise and Menke, 2001). The FRI has been translated into Persian (Ansari et al., 2011), Brazilian-Portuguese (Costa et al., 2007b), Korean (Lee et al., 2006) and Simplified Chinese (Wei et al., 2012). There were 16 articles evaluating measurement properties of the FRI. Six studies evaluated measurement properties of the English FRI (Feise and Menke, 2001; Chansirinukor et al., 2005; Childs
Table 3. Study characteristics.a
Study | Population (pain region) | Country | Setting | Questionnaire evaluated | Questionnaire language
Osthus et al. (2006) | Neck, upper back, low back | Germany | Medical rehabilitation centre | EA | German
Wiitavaara et al. (2009) | Neck, shoulder | Sweden | Local paper recruitment | EA | Unclear (not reported)
Williams et al. (2001) | Neck, upper back, low back | United Kingdom | General practitioner and osteopathy clinics | EA | English
Ansari et al. (2011) | Low back | Iran | Public physio clinics | FRI | Persian
Ansari et al. (2012) | Neck | Iran | Public physiotherapy clinics and primary health care centres | FRI | Persian
Bayar et al. (2004) | Low back | Turkey | Recruited from ageing centre registry | FRI | Unclear (not reported)
Ceran and Ozcan (2006) | Low back | Turkey | Rehabilitation department | FRI | English and interpreted into Turkish
Chansirinukor et al. (2005) | Back pain | Australia | Physical therapy clinic | FRI | English
Childs and Piva (2005) | Low back | Australia | Physical therapy clinic | FRI | English
Costa et al. (2007b) | Low back | Brazil | Physiotherapy clinics (public and private) | FRI | Brazilian-Portuguese
Costa et al. (2008) | Low back | Brazil | Physiotherapy clinics (public and private) | FRI | Brazilian-Portuguese
Feise and Menke (2001) | Neck, upper back, low back | USA | Chiropractic clinics | FRI | English
Lee et al. (2006) | Neck | Australia; Korea | Not reported | FRI | Korean
Naghdi et al. (2015) | Low back | Iran | Sports clubs | FRI | Persian
Rebbeck et al. (2007) | Whiplash associated disorders | Australia | Physiotherapy clinics; insurance claims database; | FRI | English
Stewart et al. (2007) | Whiplash associated disorders | Australia | Physiotherapy practice and private rehabilitation centres | FRI | English
Wei et al. (2012) | Low back | China | Changhai Hospital of the Second Military Medical University | FRI | Simplified Chinese
Wei et al. (2015) | Neck | China | Changhai Hospital of the Second Military Medical University | FRI | Simplified Chinese
Cuesta-Vargas and Gabel (2014) | Neck, mid-back, low back | Spain | Physiotherapy clinics | SFI | Spanish
Gabel et al. (2013) | Neck, upper back, low back | Australia | Physiotherapy clinics | SFI, FRI | English
Tonga et al. (2015) | Neck, mid-back, low back | Turkey | Physical therapy clinic | SFI | Turkish
a EA = Extended Aberdeen Back Pain Scale, FRI = Functional Rating Index, SFI = Spine Functional Index.
and Piva, 2005; Rebbeck et al., 2007; Stewart et al., 2007; Gabel et al., 2013). Four studies described translation into other languages (Lee et al., 2006; Costa et al., 2007b; Ansari et al., 2011; Wei et al., 2012). Eight studies have evaluated the FRI's measurement properties in these languages (Lee et al., 2006; Costa et al., 2007b, 2008; Ansari et al., 2011, 2012; Wei et al., 2012; Naghdi et al., 2015; Wei et al., 2015). Studies varied in terms of the subject characteristics as outlined in Table 3. Two studies evaluated measurement properties of the questionnaire in subjects with lower back, upper back and neck pain (Feise and Menke, 2001; Gabel et al., 2013). Three assessed the FRI in subjects with neck pain (Lee et al., 2006; Ansari et al., 2012; Wei et al., 2015), 9 with back pain (Bayar et al., 2004; Chansirinukor et al., 2005; Childs and Piva, 2005; Ceran and Ozcan, 2006; Costa et al., 2007b, 2008; Ansari et al., 2011; Wei et al., 2012; Naghdi et al., 2015), and 2 with whiplash associated disorders (Rebbeck et al., 2007; Stewart et al., 2007). Two studies used the FRI as a comparator for validation of another questionnaire (Rebbeck et al., 2007; Gabel et al., 2013). Exploratory factor analysis indicates there is strong evidence that the Persian FRI has a 2 factor structure when used in subjects with neck pain (Ansari et al., 2012) which is inconsistent with strong evidence of a single factor structure when used in subjects with low back pain (Naghdi et al., 2015). A single factor structure was found by exploratory factor analysis on subjects with low back, upper back and neck pain for the English FRI (Gabel et al., 2013). Three of the 11 studies evaluating internal consistency had acceptable methodological quality. The Persian FRI has strong
positive evidence for internal consistency with Cronbach's α values of 0.79, 0.89 and 0.90 (Ansari et al., 2012; Naghdi et al., 2015). The English FRI has limited positive evidence for internal consistency with a Cronbach's α of 0.948 (Gabel et al., 2013).
There were conflicting results for reliability from 11 fair quality studies. There was differing evidence for the reliability of the English FRI, with reliability coefficients of 0.63 (95% confidence interval [CI] = 0.37–0.82) and 0.67 (CI = 0.5–0.80) reported for low back pain patients (Chansirinukor et al., 2005; Childs and Piva, 2005), while an ICC of 0.948 (CI not reported) was found in patients with a combination of neck, upper back and low back pain (Gabel et al., 2013). There was limited positive evidence for the Korean FRI and Simplified Chinese FRI in patients with neck pain (ICC = 0.86, CI = 0.75–0.92; ICC = 0.97, CI = 0.95–0.98) (Lee et al., 2006; Wei et al., 2015). There was moderate positive evidence for the Persian version of the questionnaire when evaluating neck (ICC = 0.96, CI not reported) or low back pain (ICC = 0.81, CI not reported; ICC = 0.97, CI 0.92–0.99) (Ansari et al., 2011, 2012; Naghdi et al., 2015). There was also moderate positive evidence for the Brazilian-Portuguese FRI with back pain patients (ICC = 0.95, CI = 0.93–0.97; ICC = 0.86, CI = 0.77–0.95) (Costa et al., 2007b, 2008).
Three fair quality studies evaluated measurement error of the FRI (Childs and Piva, 2005; Gabel et al., 2013; Naghdi et al., 2015). Only one study (Childs and Piva, 2005) reported an MCID (8.4), so it was the only study used to evaluate measurement error. The calculated SDC (20.79) is greater than the MIC (8.4) reported and therefore there is limited negative evidence for measurement error of the English FRI in subjects with low back pain.
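The measurement error judgement above follows from the formula SDC = 1.96 × √2 × SEM given in the Methods and the MIC-versus-SDC criterion in Table 2. The sketch below illustrates the arithmetic. The function names are hypothetical, and the SEM of 7.5 is not quoted in this review: it is a value back-calculated here from the reported SDC of 20.79 and is used purely for illustration.

```python
import math
from typing import Optional


def smallest_detectable_change(sem: float) -> float:
    """SDC = 1.96 * sqrt(2) * SEM, as stated in the Methods."""
    return 1.96 * math.sqrt(2) * sem


def rate_measurement_error(mic: Optional[float], sdc: float) -> str:
    """'+' if MIC > SDC, '-' if MIC <= SDC, '?' if no MIC is defined."""
    if mic is None:
        return "?"
    return "+" if mic > sdc else "-"


# Assumed SEM (back-calculated from the reported SDC, not taken from the source).
sdc = smallest_detectable_change(7.5)
print(round(sdc, 2))                     # 20.79
print(rate_measurement_error(8.4, sdc))  # -
```

Because a change must exceed the SDC to be distinguished from measurement error, an MIC of 8.4 that lies well below an SDC of about 20.8 yields the negative rating reported for the English FRI.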
Table 4 Methodological quality table.a Article
Instrument
Internal consistency
Reliability Measurement error
Content validity
Structural validity
Hypothesis testing
Cross culture validity
Responsiveness
Osthus et al. (2006)
EA
Fair
Poor
n/a
n/a
Fair
Poor
Poor
Wiitavaara et al. (2009) Williams et al. (2001) Ansari et al. (2011)
EA EA FRI
n/a Fair Poor
n/a Poor Fair
n/a n/a n/a
Excellent Poor n/a
n/a Fair n/a
n/a Poor Fair
Ansari et al. (2012) Bayar et al. (2004) Ceran and Ozcan (2006) Chansirinukor et al. (2005) Childs and Piva (2005) Costa et al. (2007b)
FRI FRI FRI FRI
Excellent Poor Poor n/a
Fair Poor Poor Fair
n/a n/a n/a n/a
n/a n/a n/a n/a
Excellent n/a n/a n/a
Fair Poor Poor n/a
Poor (translation only) n/a n/a Good (translation only) n/a n/a n/a n/a
FRI FRI
n/a Poor
Fair Fair
Fair n/a
n/a n/a
n/a n/a
Poor Fair
Costa et al. (2008) Feise and Menke (2001) Lee et al. (2006)
FRI FRI FRI
Poor Poor Poor
Fair Poor Fair
n/a Poor n/a
n/a Poor n/a
n/a n/a n/a
Poor Poor Poor
Naghdi et al. (2015) Stewart et al. (2007) Wei et al. (2012)
FRI FRI FRI
Excellent n/a Poor
Fair n/a Poor
Fair n/a n/a
n/a n/a n/a
Excellent n/a n/a
Poor n/a Fair
Poor Fair
Fair Fair
n/a Fair
n/a n/a
n/a Fair
Fair Fair
n/a Poor (translation only) n/a n/a Fair (translation only) n/a n/a Good (translation only) n/a n/a
Poor
n/a
n/a
n/a
n/a
Fair
n/a
Fair
Excellent
Fair
Fair
n/a
Excellent
Fair
n/a
Fair Excellent
Fair Good
Fair Good
Excellent n/a
Fair Excellent
Fair Fair
Poor (translation only) n/a Poor (translation only)
Wei et al. (2015) Gabel et al. (2013)
FRI FRI (comparator) Rebbeck et al. (2007) FRI (comparator) Cuesta-Vargas and Gabel SFI (2014) Gabel et al. (2013) SFI Tonga et al. (2015) SFI a
n/a Poor n/a n/a n/a n/a Poor Poor Poor Poor Poor Poor n/a Fair n/a n/a Fair
Fair n/a
EA = Extended Aberdeen Back Pain Scale, FRI = Functional Rating Index, SFI = Spine Functional Index, n/a = not applicable.
No acceptable quality studies evaluated the content validity of the FRI. There were consistent findings for hypothesis testing in 7 fair quality studies (Rebbeck et al., 2007; Costa et al., 2007b; Ansari et al., 2011, 2012; Wei et al., 2012, 2015; Gabel et al., 2013). All found Pearson's correlations of >0.5 when compared with questionnaires with similar constructs such as the Persian Oswestry Disability Index (Ansari et al., 2011). Some also found Pearson's correlations of <0.5 with outcome measures which assessed different constructs such as the SF-12 mental component score (Feise and Menke, 2001; Wei et al., 2012). Three studies of fair quality evaluated the responsiveness of the English FRI (Rebbeck et al., 2007; Stewart et al., 2007; Gabel et al., 2013). The Area Under the Curve (AUC) was found to be >0.70 in 2 of these studies (Rebbeck et al., 2007; Stewart et al., 2007), hence there was a moderate level of evidence for responsiveness of the FRI. Three fair to good quality studies translated the FRI into different languages (Lee et al., 2006; Ansari et al., 2011; Wei et al., 2012). No studies adequately addressed cross-cultural adaptation as described by the COSMIN checklist.
3.5. SFI
The SFI consists of 25 items rated on a 3-point scale. The final score ranges from 0 to 25 and is converted to a percentage. Average patient completion time is reported as 122 s (SD 37 s) and scoring
Table 5. Levels of evidence table.a
Questionnaire | Internal consistency | Reliability | Measurement error | Content validity | Structural validity | Hypothesis testing | Cross-cultural validity | Responsiveness
EA (English) | − | ? | n | ? | − | ? | n | ?
EA (German) | − | ? | n | n | − | ? | ? | ?
FRI (Brazilian-Portuguese) | ? | ++ | n | n | n | + | n | ?
FRI (English) | + | +/− | − | ? | + | ++ | n | ++
FRI (Korean) | ? | + | n | n | n | ? | n | ?
FRI (Persian) | +++ | ++ | ? | n | +++ | ++ | n | n
FRI (Simplified Chinese) | ? | + | n | n | n | ++ | n | n
SFI (English) | + | + | ? | +++ | + | + | n | ?
SFI (Spanish) | +++ | + | ? | n | +++ | + | n | n
SFI (Turkish) | +++ | ++ | ? | n | +++ | + | n | n
a +++ or −−− = strong evidence positive/negative result, ++ or −− = moderate evidence positive/negative result, + or − = limited evidence positive/negative result, +/− = conflicting evidence, ? = unknown due to poor methodological quality, n = no information available, EA = Extended Aberdeen Back Pain Scale, FRI = Functional Rating Index, SFI = Spine Functional Index.
time 16 s (SD 4 s) (Gabel et al., 2013). The questionnaire is available from the original article (Gabel et al., 2013). The SFI has been translated into Spanish and Turkish (Cuesta-Vargas and Gabel, 2014; Tonga et al., 2015). Three studies evaluated the measurement properties of the SFI (Gabel et al., 2013; Cuesta-Vargas and Gabel, 2014; Tonga et al., 2015). These studies are of acceptable quality for construct validity, internal consistency, reliability, responsiveness, measurement error and hypothesis testing. There is strong positive evidence for structural validity with exploratory factor analysis finding a unidimensional structure of the English, Spanish and Turkish SFI. There is limited positive evidence for internal consistency of the English SFI with a Cronbach's α of 0.911. There is strong positive evidence for the internal consistency of the Spanish (α = 0.845) and Turkish SFI (α = 0.85). There is limited positive evidence for reliability of the English (ICC = 0.972, CI not reported), Spanish (ICC = 0.96, item range 0.93–0.98) and Turkish SFI (ICC = 0.93, item range 0.75–0.95) (Gabel et al., 2013; Cuesta-Vargas and Gabel, 2014; Tonga et al., 2015). Responsiveness could not be evaluated for the English SFI as the AUC was not calculated; the calculations used were not those recommended by COSMIN or by the levels of evidence approach used in this review. There was strong positive evidence for content validity with the questionnaire's development (Gabel et al., 2013) satisfying the requirements of the COSMIN checklist. The evidence regarding measurement error in all studies was inconclusive due to the authors not documenting a clear MIC. Hypothesis testing found that the English SFI had a Pearson's r correlation of 0.85 with the FRI (Gabel et al., 2013). The Turkish SFI had Pearson's r correlations of 0.52, 0.58 and 0.71 with disability questionnaires measuring similar constructs (Tonga et al., 2015).
The Spanish SFI had conflicting evidence for hypothesis testing, as Pearson's r was 0.46 for the Neck Disability Index, 0.79 for the Roland Morris Questionnaire and 0.59 for the Backache Index (Cuesta-Vargas and Gabel, 2014).

4. Discussion

This review found 3 PRO questionnaires which evaluate disability of people with pain in any spine region: the EA, FRI and SFI. Overall, information on the measurement properties of these questionnaires is limited due to the small number of high quality studies.

4.1. Findings

Generally, the few studies evaluating the EA were of poor methodological quality. Aspects of the studies that were of adequate methodological quality provided negative evidence for the EA.

The SFI has limited to strong evidence for most measurement properties in 3 languages. The results were inconclusive for measurement error and responsiveness because the statistical tests required by the levels of evidence approach were not completed (Schellingerhout et al., 2012). The developer of the questionnaire was an author of all 3 studies evaluating the SFI, hence its measurement properties are yet to be independently evaluated.

The questionnaire subject to the most research is the English FRI. There was limited positive evidence for most of the FRI's measurement properties and limited negative evidence for measurement error. The conflicting results for English FRI reliability may be due to only fair quality studies being available. Future studies with stronger methodology should provide a clearer picture of this measurement property. Although there was limited positive evidence for the structural validity of the FRI, there were conflicting results with regard to whether the structure was uni-dimensional or multi-dimensional. These inconsistent results could be due to studies evaluating the questionnaire in different languages and in different populations. This is not an unusual finding, as inconsistent results have been found in previous research into the measurement properties of translated questionnaires (Menezes Costa Lda et al., 2011; Schellingerhout et al., 2011).

4.2. Strengths

The decision of whether to use a PRO questionnaire in research or clinical practice should be based not only on the measurement property result, but also on the methodological quality of the studies reporting these results. A strength of this review is the methodological quality evaluation of included studies. Previous reviews that included the FRI (Feise and Menke, 2010; Ferreira et al., 2010) and the EA (Resnick, 2005; Misailidou et al., 2010) have not done this and may therefore give misleading measurement property information.

A self-reported limitation of the COSMIN checklist is low interrater agreement (Mokkink et al., 2010a). One recommendation to address this is for 2 reviewers to independently complete quality assessments and decide a final rating through consensus. This recommendation was adhered to in this review.

4.3. Limitations

A limitation of the review was the strategy used to source relevant PRO questionnaires. It is recommended that search strategies for PRO questionnaire reviews include the patient population and constructs of interest as search terms (de Vet et al., 2011). As the patient population and constructs of interest in this review were extremely broad, it was decided that a different strategy would be more efficient. It is acknowledged that the strategy used is a potential source of bias and may have missed relevant PRO questionnaires.
One of the difficulties with interpreting many of the included studies was that they did not calculate measurement values recommended by the COSMIN checklist (Mokkink et al., 2012) and required by the levels of evidence guidelines adopted by this review (Schellingerhout et al., 2011, 2012). For example, SDC was not reported in any of the studies, hence it was difficult to evaluate measurement error. This was further hampered by a lack of MIC determination by most studies. As MIC can differ depending on the population and baseline characteristics (Schuller et al., 2014), it is important that MIC is determined and reported for each individual study.

Pooling of results using the levels of evidence in this review was not ideal due to the different populations in which measurement properties were evaluated. Some studies used subjects with low back pain, some with neck pain and some with combinations of the two as well as upper-back pain. The same individual question items may have different meanings depending on whether a subject has neck, back, upper back or multi-area spine pain, so ideally each of these populations should be studied and considered separately. Nonetheless, summarizing the levels of evidence for each questionnaire, with detail on the specific measurement properties, provides the best indication of the current state of evidence with regard to each questionnaire.

4.4. Implications for clinical practice

The EA is not recommended for use due to the limited, negative evidence regarding its measurement properties. The FRI and SFI are recommended for use in patients with multi-region spine pain due to the limited to strong evidence for most of their measurement properties in multiple languages. Although the SFI has stronger evidence for most properties, these results are yet to be replicated by a research team that does not include the developer. The FRI has been subject to independent evaluation which has generally found positive evidence for most measurement properties, therefore it is recommended over the SFI. This recommendation is provisional due to inconsistent and negative results for some measurement properties in fair quality studies.

For patients with isolated low back or neck pain, region-specific questionnaires should continue to be used unless future high quality evidence consistently demonstrates sound measurement properties for the SFI and FRI in these populations. This is unfortunate, as a busy clinician whose caseload includes patients with pain in one or more regions of the spine would find a single questionnaire such as the SFI or FRI more efficient than multiple region-specific questionnaires.

4.5. Directions for future research

Future research should evaluate the measurement properties of the SFI and FRI in patients with both single and multiple areas of spine pain. As their measurement properties may vary depending on whether they are used in neck, back, upper back or multi-area spine pain populations, care should be taken to study these populations separately. This may resolve the current doubt over some measurement properties that arises from inconsistent results between studies. Adherence to the COSMIN consensus guidelines by future PRO questionnaire research would improve consistency and allow for synthesis of results in future reviews. Researchers should consider cross-cultural validity when evaluating translated questionnaires.

5. Conclusions

The FRI is provisionally recommended over the SFI and EA for the clinical assessment of disability associated with multi-region spine pain. Researchers and clinicians planning to follow this recommendation should keep in mind that this conclusion is based mainly on studies of fair methodological quality.

Funding

None.
Acknowledgements

Dr J Schellingerhout for granting permission to reproduce tables.

References

Ansari NN, Feise RJ, Naghdi S, Ebadi S, Yoosefinejad AK. The functional rating index: reliability and validity of the Persian language version in patients with low back pain. Spine (Phila Pa 1976) 2011;36:E1573–7.
Ansari NN, Feise RJ, Naghdi S, Mohseni A, Rezazadeh M. The functional rating index: reliability and validity of the Persian language version in patients with neck pain. Spine (Phila Pa 1976) 2012;37:E844–8.
Bayar B, Bayar K, Yakut E, Yakut Y. Reliability and validity of the functional rating index in older people with low back pain: preliminary report. Aging Clin Exp Res 2004;16:49–52.
Ceran F, Ozcan A. The relationship of the functional rating index with disability, pain, and quality of life in patients with low back pain. Med Sci Monit 2006;12:Cr435–9.
Chansirinukor W, Maher CG, Latimer J, Hush J. Comparison of the functional rating index and the 18-item Roland–Morris disability questionnaire: responsiveness and reliability. Spine (Phila Pa 1976) 2005;30:141–5.
Childs JD, Piva SR. Psychometric properties of the functional rating index in patients with low back pain. Eur Spine J 2005;14:1008–12.
Costa LO, Maher CG, Latimer J. Self-report outcome measures for low back pain – searching for international cross-cultural adaptations. Spine 2007a;32:1028–37.
Costa LO, Maher CG, Latimer J, Ferreira PH, Ferreira ML, Pozzi GC, et al. Clinimetric testing of three self-report outcome measures for low back pain patients in Brazil: which one is the best? Spine (Phila Pa 1976) 2008;33:2459–63.
Costa LO, Maher CG, Latimer J, Ferreira PH, Pozzi GC, Ribeiro RN. Psychometric characteristics of the Brazilian-Portuguese versions of the functional rating index and the Roland Morris disability questionnaire. Spine (Phila Pa 1976) 2007b;32:1902–7.
Cuesta-Vargas AI, Gabel CP. Validation of a Spanish version of the spine functional index. Health Qual Life Outcomes 2014;12:96.
de Vet HC, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033–9.
de Vet HCW, Terwee CBM, Lidwine B, Knol DL. Measurement in medicine: a practical guide. New York: Cambridge University Press; 2011.
Duncan EA, Murray J. The barriers and facilitators to routine outcome measurement by allied health professionals in practice: a systematic review. BMC Health Serv Res 2012;12:96.
Feise RJ, Menke JM. Functional rating index: a new valid and reliable instrument to measure the magnitude of clinical change in spinal conditions. Spine (Phila Pa 1976) 2001;26:78–86 [discussion 7].
Feise RJ, Menke JM. Functional rating index: literature review. Med Sci Monit 2010;16:Ra25–36.
Ferreira ML, Borges BM, Rezende IL, Carvalho LP, Soares LPS, Dabes RAI, et al. Are neck pain scales and questionnaires compatible with the international classification of functioning, disability and health? A systematic review. Disabil Rehabil 2010;32:1539–46.
Gabel CP, Melloh M, Burkett B, Michener LA. The spine functional index: development and clinimetric validation of a new whole-spine functional outcome measure. Spine J 2013 [published online ahead of print October 25 2013].
Grotle M, Brox JI, Vollestad NK. Functional status and disability questionnaires: what do they assess? A systematic review of back-specific outcome questionnaires. Spine 2005;30:130–40.
Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989;10:407–15.
Kyte DG, Calvert M, van der Wees PJ, ten Hove R, Tolan S, Hill JC. An introduction to patient-reported outcome measures (PROMs) in physiotherapy. Physiotherapy 2015;101:119–25.
Lee H, Nicholson LL, Adams RD, Maher CG, Halaki M, Bae SS. Development and psychometric testing of Korean language versions of 4 neck pain and disability questionnaires. Spine (Phila Pa 1976) 2006;31:1841–5.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med 2009;151:W65–94.
Menezes Costa Lda C, Maher CG, McAuley JH, Hancock MJ, de Melo Oliveira W, Azevedo DC, et al. The Brazilian-Portuguese versions of the McGill Pain Questionnaire were reproducible, valid, and responsive in patients with musculoskeletal pain. J Clin Epidemiol 2011;64:903–12.
Misailidou V, Malliou P, Beneka A, Karagiannidis A, Godolias G. Assessment of patients with neck pain: a review of definitions, selection criteria, and measurement tools. J Chiropr Med 2010;9:49–59.
Mokkink LB, Terwee CB, Gibbons E, Stratford PW, Alonso J, Patrick DL, et al. Interrater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) checklist. BMC Med Res Methodol 2010a;10:82.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010b;19:539–49.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010c;63:737–45.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist manual. 2012. p. 55.
Naghdi S, Nakhostin Ansari N, Yazdanpanah M, Feise RJ, Fakhari Z. The validity and reliability of the functional rating index for evaluating low back pain in athletes. Scand J Med Sci Sports 2015 [Epub ahead of print].
Osthus H, Cziske R, Jacobi EA. German version of the extended Aberdeen back pain scale: development and evaluation. Spine (Phila Pa 1976) 2006;31:571–7.
Rebbeck TJ, Refshauge KM, Maher CG, Stewart M. Evaluation of the core outcome measure in whiplash. Spine (Phila Pa 1976) 2007;32:696–702.
Resnick DN. Subjective outcome assessments for cervical spine pathology: a narrative review. J Chiropr Med 2005;4:113–34.
Robinson RC, Kishino N, Matheson L, Woods S, Hoffman K, Unterberg J, et al. Improvement in postoperative and nonoperative spinal patients on a self-report measure of disability: the spinal function sort (SFS). J Occup Rehabil 2003;13:107–13.
Schellingerhout JM, Heymans MW, Verhagen AP, de Vet HC, Koes BW, Terwee CB. Measurement properties of translated versions of neck-specific questionnaires: a systematic review. BMC Med Res Methodol 2011;11:87.
Schellingerhout JM, Verhagen AP, Heymans MW, Koes BW, de Vet HC, Terwee CB. Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review. Qual Life Res 2012;21:659–70.
Schuller W, Ostelo RW, Janssen R, de Vet HC. The influence of study population and definition of improvement on the smallest detectable change and the minimal important change of the neck disability index. Health Qual Life Outcomes 2014;12:53.
Stewart M, Maher CG, Refshauge KM, Bogduk N, Nicholas M. Responsiveness of pain and disability measures for chronic whiplash. Spine (Phila Pa 1976) 2007;32:580–5.
Strine TW, Hootman JM. US national prevalence and correlates of low back and neck pain among adults. Arthritis Rheum 2007;57:656–65.
Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42.
Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012;21:651–7.
Terwee CB, Roorda LD, Knol DL, De Boer MR, De Vet HC. Linking measurement error to minimal important change of patient-reported outcomes. J Clin Epidemiol 2009;62:1062–7.
Terwee CB, Schellingerhout JM, Verhagen AP, Koes BW, de Vet HC. Methodological quality of studies on the measurement properties of neck pain and disability questionnaires: a systematic review. J Manip Physiol Ther 2011;34:261–72.
Tonga E, Gabel CP, Karayazgan S, Cuesta-Vargas AI. Cross-cultural adaptation, reliability and validity of the Turkish version of the spine functional index. Health Qual Life Outcomes 2015;13:30.
van Tulder M, Furlan A, Bombardier C, Bouter L. Updated method guidelines for systematic reviews in the cochrane collaboration back review group. Spine (Phila Pa 1976) 2003;28:1290–9.
Wei X, Chen Z, Bai Y, Zhu X, Wu D, Liu X, et al. Validation of the Simplified Chinese version of the functional rating index for patients with low back pain. Spine (Phila Pa 1976) 2012;37:1602–8.
Wei XZ, Xu XM, Zhao YF, Chen K, Wang F, Fan JP, et al. Validation of the Simplified Chinese version of the functional rating index for patients with nonspecific neck pain in mainland China. Spine 2015;40:E538–44.
Wiitavaara B, Bjorklund M, Brulin C, Djupsjobacka M. How well do questionnaires on symptoms in neck-shoulder disorders capture the experiences of those who suffer from neck-shoulder disorders? A content analysis of questionnaires and interviews. BMC Musculoskelet Disord 2009;10:30.
Williams NH, Wilkinson C, Russell IT. Extending the Aberdeen back pain scale to include the whole spine: a set of outcome measures for the neck, upper and lower back. Pain 2001;94:261–74.