The Quebec Back Pain Disability Scale: Conceptualization and development


J Clin Epidemiol Vol. 49, No. 2, pp. 151-161, 1996. Copyright © 1996 Elsevier Science Inc. 0895-4356/96/$15.00 SSDI 0895-4356(96)00526-A



Jacek A. Kopec,1,2 John M. Esdaile,1 Michal Abrahamowicz,1 Lucien Abenhaim,1 Sharon Wood-Dauphinee,3 Donna L. Lamping,4 and J. Ivan Williams5

1Department of Epidemiology and Biostatistics, McGill University, Montreal, Canada; 2Division of Clinical Epidemiology, Department of Medicine, Montreal General Hospital, and Centre for Clinical Epidemiology, Jewish General Hospital, Montreal, Canada; 3School of Physical and Occupational Therapy, McGill University, Montreal, Canada; 4Department of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, U.K.; and 5Institute for Clinical Evaluative Sciences, Sunnybrook Health Science Centre, North York, Canada.

ABSTRACT. The Quebec Back Pain Disability Scale is a new measure of functional disability for patients with back pain. Functional disability was operationalized in terms of perceived difficulty associated with simple physical activities. The content of the scale was developed in several stages, including a literature review, two studies seeking the opinions of patients and experts, pilot testing, and a large, longitudinal study of back pain patients. Forty-eight disability items were extensively studied using standard methods such as test-retest reliability, item-total correlations, and factor analysis, as well as modern techniques based on item response theory. Items that were highly effective in discriminating between different levels of disability were selected for the final, reduced scale. The scale has 20 items, representing six empirically derived categories of activities affected by back pain. Measurement properties of this instrument have been discussed previously. J CLIN EPIDEMIOL 49;2:151-161, 1996.

KEY WORDS. Back pain, disability, activities of daily living, questionnaires, item response theory

INTRODUCTION

Self-reported functional outcomes are now routinely assessed in clinical trials and are increasingly used by clinicians to monitor the course of disease in individual patients. As the impact of functional outcomes on clinical and policy decision-making increases, so does the need for assessment devices of demonstrable quality. Over the past two decades, a large number of questionnaires and rating schemes have been used to assess the functional status of patients with back pain [1-3]. The most widely accepted are the Sickness Impact Profile (SIP) [4], the Roland scale [5] derived from the SIP, the Oswestry questionnaire [6], Million's visual analog scale [7], and Waddell's disability index [8]. Several other scales have been proposed [9-14], but they have not gained as much popularity as the aforementioned measures. Studies of these scales have generally reported good test-retest reliability, and their construct validity has been supported by correlations with pain, spinal impairment, and other variables [1-3]. The results of some randomized clinical trials and other studies suggest that the established scales can detect changes in functional status over time [2,3]. Nevertheless, selecting a measure of disability for patients with back pain has been a problem for researchers and clinicians, and many authors prefer to use their own ad hoc scales. One reason is the paucity of comparative data on the properties of different scales. Comparisons across studies are problematic because of differences in populations and methods of data collection, and head-to-head comparisons have been rare.

All correspondence should be addressed to Dr. Jacek A. Kopec, MD, PhD, Department of Preventive Medicine and Biostatistics, University of Toronto, 12 Queen's Park Crescent West, Toronto, Ontario, Canada M5S 1A8. Received in revised form 2 May 1995.

Furthermore, current scales lack a strong conceptual basis and their content validity is uncertain [2,3]. There is considerable variation across the measures with respect to the concepts being assessed. In addition to physical disability, some questionnaires offer separate subscales for such concepts as social life, employment, emotional status, or medical care utilization [9-13]. A variety of operational criteria for assessing disability have been used, including physical capacity, actual performance or avoidance of activities, difficulty, and pain. In some scales, intuitively derived methods of weighting the responses have been adopted [11,12]. However, few authors have justified the selected concepts or provided a convincing rationale for the proposed weighting schemes. The types of activities included in different scales also vary. Only two activities, standing and walking, are shared by the four most commonly used brief instruments. The Roland scale has no questions asking specifically about lifting, carrying, pulling, or pushing objects, and the Oswestry scale has no questions pertaining to bending or body movement. For most scales, the methods and criteria of item generation and selection have not been explained.

To address these concerns and provide researchers and clinicians with an alternative to existing scales, we developed a new self-report measure of functional disability for patients with back pain. Our goal was to construct a multipurpose questionnaire that could be used as an outcome in clinical trials, for monitoring the progress of patients participating in treatment or rehabilitation programs, and for comparing different groups of back pain patients. Such a questionnaire should have a clear theoretical orientation, demonstrable content validity, and good psychometric properties. In this article we describe the development, evaluation, and selection of the items.
Measurement properties of the final instrument, as well as head-to-head comparisons with other scales, have been published [15].

Kopec et al.

METHODS

Content Development

PRELIMINARY STUDIES. Disability has been defined by the World Health Organization as "any restriction or lack of ability to perform an activity in a manner or within the range considered normal for a human being" [16, p. 28]. We conducted two small studies to ascertain the impact of back pain on daily life and the importance of various activities in assessing the associated disability. In the first study, we surveyed 31 health care providers involved on a daily basis in the treatment and rehabilitation of patients with back pain (back pain experts). In the second study, we interviewed 34 back pain patients attending an orthopedic clinic or a physiatry center (a convenience sample) using a semi-structured questionnaire. The experts were identified primarily through personal contacts. Almost half of them were physiotherapists; the remainder were physicians, including family doctors and specialists in orthopedics, physiatry, rheumatology, and occupational medicine. The sample size was determined arbitrarily, on the basis of our previous experience. The experts were sent a list of 20 activities derived from existing back pain instruments and asked to rate each activity in terms of its importance for measuring disability in back pain. They were also asked to add to the list any other activities they considered important, and to formulate three questions they would ask a patient with back pain to assess his or her disability level. According to the experts, the most important activities were (in order): standing, bending, walking, sitting, lifting, twisting, and reaching. The back pain patients interviewed in the second study were asked to list all activities they had difficulty performing, or were unable to perform, because of back pain, in such areas as self-care, housework, work outside the home, social life, and recreation.
Detailed instructions, multiple prompts, and other techniques described in the literature [17] were used to elicit as many responses as possible. For each complex task reported (for example, housekeeping, shopping, or traveling), the respondent was asked to specify the physical activities (e.g., bending, lifting, sitting) that caused the most difficulty while performing the larger task. The study produced a list of over 130 activities of varying complexity (Table 1). This list was by no means exhaustive: further interviews likely would have yielded additional tasks affected by back pain, especially in the areas of work, social life, and recreation. However, many of these situation-related tasks would not apply to the general population of back pain patients. At the same time, we found that limitations in simple physical activities, such as bending, walking, standing, or lifting, were often reported as causing difficulty in the performance of more complex tasks.

ELEMENTARY ACTIVITIES. Current models of disability [18-21] assume that limitations in situation-related tasks and behaviors, including so-called instrumental activities of daily living, can be ascribed to underlying limitations in elementary (situation-free) physical activities. These activities have been described as "very fundamental actions used in many specific settings of daily life" [22]. Nagi [19] refers to restrictions in this type of activity as functional limitations. Thus, it should be possible to identify relatively few elementary actions needed to perform a virtually unlimited number of daily tasks. We applied this concept to develop a list of elementary physical activities potentially affected by back pain. The following sources were used: (1) the lists of activities obtained in our preliminary studies; (2) five established functional scales for back pain; (3) over 30 ad hoc questionnaires that appeared in the back pain literature over the past 20 years [2,3]; and (4) other published disability scales, questionnaires, and inventories, including the International Classification of Impairments, Disabilities, and Handicaps published by the WHO [16]. Several additional activities were considered on clinical and biological grounds. After eliminating items that referred to complex tasks, situation-related activities, or activities that were unlikely to be affected by back pain, we were left with the following list of 25 activities (in alphabetical order): bending, carrying objects, climbing stairs, grasping objects, holding objects, jumping, kneeling, lifting objects, lying, pulling, pushing, reaching, running, performing sexual activities, sitting, sleeping, squatting, standing, stooping, stretching, throwing objects, transferring, turning, twisting, and walking. By focusing on elementary activities we achieved two objectives: elimination of redundant tasks and extensive coverage of important areas of activity. At this stage in content development, some overlap among the listed activities was deemed acceptable.

TABLE 1. Categories and examples of activities affected by back pain, elicited through interviews with 34 back pain patients

Object handling: Lifting heavy objects; Lifting patients; Lifting a baby; Lifting bags/groceries; Lifting tools; Taking garbage out; Shovelling snow; Pushing heavy objects; Pushing beds; Pushing large brooms; Vacuuming; Moving furniture
Movement: Bending (in general); Bending rapidly; Bending for a long period; Picking up things; Making beds; Pedicure; Putting on socks, shoes; Cleaning the bathtub; Washing floors; Reaching to cupboards; Getting into/out of bed; Getting into/out of a car; Making sudden movements; Aerobics; Dancing; Jogging; Jumping
Ambulation: Walking (in general); Walking long distances; Walking fast; Going up/down stairs; Climbing a stepladder
Posture: Standing (in general); Standing for long periods; Sitting; Sleeping; Lying on the back
Complex activities: Working (in general); Housekeeping; Shopping; Cooking; Repairs around house; Car repairs; Gardening; Playing with children; Travelling; Using public transportation; Taking bath, shower; Dressing; Babysitting; Camping; Various sports; Sexual activities

Item Construction

In the next step we developed a pool of 48 items designed to assess limitations in elementary activities through self-report. For most of the activities we constructed at least two items, pertaining to different levels of performance. We specified the level of performance by using common, understandable units of distance, weight, time, etc., or by referring to very simple daily tasks that are usually performed in a standard way (e.g., putting on socks to assess bending, carrying groceries to assess carrying). Complex tasks, such as shopping, housework, or travelling, were avoided because many simple activities are necessary to perform those tasks, and because the required level of performance may depend on external circumstances and vary across individuals. We also tried to avoid tasks that are performed only occasionally (for example, shovelling snow) and tasks whose performance may depend on the age, gender, socioeconomic status, occupation, lifestyle, or cultural background of the respondent (e.g., cooking, gardening, and most work-related and sport activities).

Scaling of Responses

There has been some controversy in the literature as to whether the "true" level of disability is better assessed by measuring actual performance of daily activities or reported (potential) ability to perform them. Some authors believe that capacity-oriented questions tend to exaggerate the healthiness of the respondent, whereas performance-oriented questions provide a more realistic assessment of actual disability [23]. Others [18] have argued that both types of measurement are important and that one cannot be substituted for the other. In practical terms, the distinction has been blurred, as some scales use such constructs as difficulty or trouble with performance, need for human or mechanical assistance, pain, or satisfaction with performance. We operationalized disability in terms of difficulty experienced while performing simple tasks. Published data indicate that most patients with back pain are able to function independently but often experience difficulty or trouble performing daily activities [24]. We observed a similar pattern in our preliminary patient survey. We also found that the back pain experts considered the level of difficulty to be more useful in assessing disability than other common criteria such as pain, maximum performance, satisfaction, or independence. Furthermore, difficulty scales have been employed to measure the functional status of patients with rheumatic diseases and other conditions [25,26].
Some authors have used the difficulty concept in assessing back pain patients [27-29], although these measures require further study. Finally, the Activity Space Model [21] provides theoretical reasons for operationalizing disability in terms of difficulty. Conceptually, the amount of difficulty experienced while performing a given task is a continuous variable, ranging from zero (no difficulty) to a maximum level beyond which the person is unable to perform the task. To measure the level of difficulty, we used a numerical 11-point scale ranging from 0 to 10 (Fig. 1). A visual analog scale (VAS) was considered unacceptable because it would not be suitable for future applications in telephone interviews, and because it may be too complicated for self-administration [30]. A numerical scale offers a more continuous measurement than a typical Likert scale, but empirical comparisons between these two types of scales are lacking.

Data Collection

The 48 difficulty items were part of a larger questionnaire that included the low back pain form of Deyo et al. [31], three established functional measures, i.e., the Roland scale [5], the Oswestry questionnaire [6], and the physical function scale of the SF-36 Health Survey (SF-36ph) [32,33], as well as basic sociodemographic information. The questionnaire was pilot tested on a sample of 23 back pain patients. As a result, some of the items were modified, the wording of instructions was improved, and minor changes in the format of the scale were introduced. Because some patients found it difficult to select a number between 0 and 10, the midpoint (5) was labelled "moderately difficult." The English version of the questionnaire was translated into French and then back-translated into English. Discrepancies between the back-translated version and the original English version were discussed with the translators before the final wording was adopted.

All subjects admitted to the study were seeking professional help for back pain. They had to be over 18 years old and able to communicate in either English or French. Patients suffering from neck pain only, as well as those with serious functional impairments due to other conditions, were excluded. No special effort was made to classify the candidates according to the origin of pain (e.g., "mechanical" versus "inflammatory"). The subjects were recruited through a network of treatment centers in Montreal, Quebec. Most came from private and hospital-based physiotherapy clinics (n = 153), a physiatry center (n = 54), and a family group practice (n = 30). Other sources included an orthopedic clinic, a pain clinic, and a rheumatology clinic. None of the patients was hospitalized at the time of recruitment.

Between September 1990 and May 1991 the questionnaire was completed by 242 individuals under the supervision of a trained research assistant (time 1). The research assistant was also responsible for monitoring the time required to complete each section of the questionnaire and for evaluating the respondent's ability to understand the questions. After completing the questionnaire, the respondent was asked a few additional questions concerning the relevance and clarity of the items. He or she was then given another copy of the questionnaire and instructed to complete it at home within 2-4 days and mail it back in a prestamped envelope (time 2). If necessary, follow-up phone calls were made after 1 week, and once a week thereafter, until the questionnaire was mailed or the respondent refused to cooperate. A total of 212 individuals returned the second questionnaire (88% response rate), and 75% of them did so within 1 week. The median interval between completion of the first and second questionnaires was 4.2 days. A third, follow-up questionnaire (time 3) was mailed after 6 months to subjects who completed the first assessment in October through November 1990, after 4 months to subjects assessed in December 1990 through January 1991, and after 2 months to those assessed in February through May 1991. Weekly follow-up phone calls were made to ensure an adequate response rate. The third questionnaire was completed by 178 individuals (74% response rate).

Analysis

We used Pearson correlation coefficients to compare test-retest reliabilities of the items. For a sample of 10 items we also calculated the intraclass correlation coefficient [34], but the difference between the two coefficients was never greater than 0.03. The analysis was restricted to a group of 98 individuals who completed the second questionnaire within 2 weeks of the initial assessment and who said their back pain had stayed "about the same" between time 1 and time 2.
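The comparison between Pearson and intraclass correlation coefficients can be illustrated with a minimal Python sketch. The text does not say which form of ICC was computed; the function below assumes the two-way random-effects, absolute-agreement, single-measure form (commonly written ICC(2,1)), and the function names and ratings are hypothetical. Unlike Pearson's r, this ICC is lowered by a systematic shift between occasions, which is one way the two coefficients can diverge.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between time-1 and time-2 scores for one item."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(x, y):
    """ICC(2,1) (assumed form): two-way random effects, absolute agreement,
    single measure, for n subjects each rated on k = 2 occasions."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)          # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(x), mean(y)))  # between occasions
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols                              # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical 0-10 difficulty ratings: every time-2 score is one point higher.
time1 = [1, 2, 3, 4, 5]
time2 = [2, 3, 4, 5, 6]
print(round(pearson_r(time1, time2), 3))  # 1.0
print(round(icc_2_1(time1, time2), 3))    # 0.833
```

In this toy case the scores are perfectly correlated, so r = 1, but the uniform one-point shift pulls the absolute-agreement ICC down to 5/6.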

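FIGURE 1. Difficulty scale used in the study: an 11-point numerical scale from 0 ("Not difficult at all") to 10 ("Extremely difficult"), with the midpoint 5 labelled "Moderately difficult."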


We carried out two types of tests to assess the responsiveness of each item. First, we assumed that the mean level of disability for the entire sample (n = 178) would improve between time 1 and time 3, and used a paired t-test to measure the ability of each item to detect this change [35]. Second, we used a slightly modified version of the scale published by Jaeschke et al. [36] as an indicator of true change. The subjects were asked whether their overall ability to perform daily activities had improved between time 1 and time 3, remained exactly the same, or declined. Because the validity of this type of question may decrease as the interval between the two assessments gets longer, we restricted this analysis to subjects who returned the follow-up questionnaire within 80 days. We compared subjects who reported an improvement (n = 37) with those who said they had deteriorated (n = 19). For each item we calculated the difference in the mean score between time 3 and time 1, and used a two-sample t-test on those differences as a measure of responsiveness (this corresponds to an F-test for the time-by-group interaction in a repeated-measures analysis of variance) [34].

Homogeneity of the items was investigated by examining the matrix of interitem correlations and calculating item-total correlations (with the item of interest excluded from the total score) for all items, both at time 1 and time 2. To identify the empirical categories of disability in back pain, we factor analyzed 46 of the 48 difficulty items, using data from the first, supervised questionnaire (n = 242). Two items with a large number of missing values (sexual and sport activities) were deleted. Factor loadings for each item were examined after both orthogonal (varimax, SPSS) and oblique (oblimin) rotation. The results were similar, but oblique rotation seemed conceptually more appropriate, as the factors were likely to be correlated.

ITEM RESPONSE FUNCTIONS.
We applied statistical methods based on item response theory (IRT) [37] to evaluate the discriminating ability of each item at various disability levels. The model assumes that the probability that a respondent will choose option $m$ for item $i$ is a function of the respondent's disability $\theta$, i.e., $P_{im}(\theta) = \Pr(u_i = m \mid \theta)$, where $u_i$ denotes the score on item $i$. The relationship between $P_{im}(\theta)$ and $\theta$ is described by the option response function (option characteristic curve, OCC). To estimate and plot option characteristic curves for all difficulty items we used nonparametric methods of statistical modelling [38,39] that avoid restrictive parametric assumptions about the shape of the OCCs. In this type of analysis, interpretation of the response function for each item is based on visual inspection of the estimated OCCs; i.e., it has a subjective, qualitative component. The ability of a given item to discriminate between different levels of disability is assessed by looking at the shape and position of each OCC and their relations to each other. In particular, the slope of the curve over a given range of disability levels provides a measure of the option's discriminability within this range. Additional insight can be gained by looking at the estimated expected score function. (The expected score is a weighted mean of the scores assigned to each option, with weights proportional to the probability of selecting the respective options at each disability level.)

To obtain statistically stable and easily interpretable estimates of the option characteristic curves, the 11-point difficulty scale was collapsed to a 5-point scale by combining options 1 + 2 + 3 (coded as 2), 4 + 5 + 6 (coded as 3), and 7 + 8 + 9 (coded as 4). The two extreme options, 0 and 10, were left as in the original scale and coded 1 and 5, respectively. As the best available estimate of global disability for each subject, we used the average of the mean scores on four measures: the Roland, Oswestry, and SF-36ph scales, plus our 48-item difficulty scale. To calculate this average, the mean scores were transformed to percentages of the maximum possible score (0-100). For the purpose of OCC estimation, disability scores were converted to standard normal quantiles.

RESULTS

Study Population

The sociodemographic and medical history data pertaining to the 242 study participants at time 1 are provided in Table 2. Fifty-four percent of the subjects completed the questionnaire in English, and 46% in French. Fifty-two percent worked full time and 10% worked part time. Twenty-nine percent were either seeking or receiving financial compensation for back problems. Eighty percent of the study participants had had previous episodes of back pain, and 44% said that the current episode had lasted for more than 1 year. Sixty-eight percent said the pain radiated to one or both legs, and 44% said it radiated below the knee. Twelve percent of the participants had had an operation for back pain prior to the study.
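The item response preprocessing and nonparametric OCC estimation described at the end of the Methods can be sketched in Python as follows. This is not the TESTGRAF algorithm itself but a minimal illustration under stated assumptions: a Gaussian-kernel (Nadaraya-Watson) smoother for $\hat P_{im}(\theta)$, a rank-based normal-quantile transform $\Phi^{-1}((r - 0.5)/n)$, and a hypothetical default bandwidth of 0.3. It also assumes every global measure has been oriented so that higher scores mean more disability before averaging. The expected score at each $\theta$ is computed as $\sum_m m\,\hat P_{im}(\theta)$. All function names are illustrative.

```python
import math
from statistics import NormalDist


def collapse_option(rating):
    """Map an 11-point difficulty rating (0-10) to the 5-point OCC coding:
    0 -> 1, 1-3 -> 2, 4-6 -> 3, 7-9 -> 4, 10 -> 5."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be an integer from 0 to 10")
    if rating == 0:
        return 1
    if rating == 10:
        return 5
    return 2 + (rating - 1) // 3


def global_disability(scores):
    """Average of percent-of-maximum scores; `scores` holds (mean_score, max_score) pairs.
    Assumes each measure is already oriented so that higher = more disability."""
    return sum(100.0 * s / top for s, top in scores) / len(scores)


def normal_quantiles(values):
    """Replace each value by Phi^{-1}((rank - 0.5) / n); tied values share their mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # mean 1-based rank of the tie block
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def option_curves(theta, responses, grid, bandwidth=0.3, options=(1, 2, 3, 4, 5)):
    """Kernel-smoothed estimate of P(option m | theta) at each grid point."""
    curves = {m: [] for m in options}
    for t in grid:
        weights = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in theta]
        total = sum(weights)
        for m in options:
            hit = sum(w for w, u in zip(weights, responses) if u == m)
            curves[m].append(hit / total)
    return curves


def expected_scores(curves):
    """Expected item score at each grid point: sum over options of m * P(m | theta)."""
    n_points = len(next(iter(curves.values())))
    return [sum(m * probs[j] for m, probs in curves.items()) for j in range(n_points)]


# Hypothetical data for six respondents: global disability (higher = worse) and raw ratings.
global_scores = [12.0, 25.0, 40.0, 55.0, 70.0, 88.0]
raw_ratings = [0, 2, 4, 6, 8, 10]
theta = normal_quantiles(global_scores)
coded = [collapse_option(r) for r in raw_ratings]
curves = option_curves(theta, coded, grid=[-1.0, 0.0, 1.0], bandwidth=0.5)
print([round(e, 2) for e in expected_scores(curves)])
```

With real data the grid would span the observed range of $\theta$ and the bandwidth would be tuned to the sample size; very small bandwidths make the estimates unstable where data are sparse.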

TABLE 2. Sociodemographic and medical history data for subjects participating in the study (N = 242)

Variable | Level | Number | Percent
Age (years) | <20 | 2 | 0.8
 | 20-29 | 44 | 18.2
 | 30-39 | 59 | 24.4
 | 40-49 | 63 | 26.0
 | 50-59 | 33 | 13.6
 | 60+ | 41 | 16.9
Sex | Males | 120 | 49.6
 | Females | 122 | 50.4
Language | English | 130 | 53.7
 | French | 112 | 46.3
Education | Elementary | 45 | 18.6
 | High school | 48 | 19.8
 | Post-secondary | 50 | 20.7
 | University | 96 | 39.7
Employment | Full time | 125 | 51.7
 | Part time | 24 | 9.9
 | Not employed | 88 | 36.4
Financial compensation | Yes | 70 | 28.9
 | No | 165 | 68.2
Duration of current episode of back pain | Less than 1 week | 8 | 3.3
 | 1 to 6 weeks | 40 | 16.5
 | 6 weeks to 3 months | 31 | 12.8
 | 3 months to 1 year | 55 | 22.7
 | More than 1 year | 106 | 43.8
Previous back pain episodes | None previously | 49 | 20.2
 | 1 to 5 episodes | 78 | 32.2
 | More than 5 episodes | 114 | 47.1
Radiation of pain | Back only | 78 | 32.2
 | Radiates to one leg | 98 | 41.5
 | Radiates to both legs | 63 | 26.0
Previous surgery for back pain | No surgery | 213 | 88.0
 | One operation | 19 | 7.9
 | More than one | 10 | 4.1

For some variables, the total number of subjects is less than 242 due to missing values.


The distribution of the sample by age and sex was similar to that in the general population of back pain sufferers in Canada [24]. However, our sample had a relatively high proportion of persons with university education. Two probable reasons were somewhat better cooperation from private than from hospital-based clinics, and a higher refusal rate among less educated patients across all recruitment centers.

Item Acceptability

For most items, the proportion of missing values was slightly lower at time 1 (supervised administration) than at time 2 or 3. Two of the 48 difficulty items were omitted by more than 10% of the subjects on at least one occasion, and 11 items were omitted by more than 3% of the subjects on at least one occasion. The items most frequently omitted were those referring to sexual and sport activities (volleyball/badminton). These items may not be suitable for use in the general population of back pain patients. We asked the respondents which parts of the questionnaire they liked or disliked, how relevant the activities included in the various scales were, and how difficult it was to answer the questions. In these comparisons, the difficulty items analyzed in this report were the most "likeable" of all, and were more relevant to the patients than any other items except those in the Roland scale. However, they seemed slightly more difficult to answer than items in the established measures.
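The omission rule used above ("more than 10% of the subjects on at least one occasion") can be written as a short Python sketch. The data layout, one subjects-by-items matrix per administration with None marking an omitted answer, and the function names are assumptions made for illustration.

```python
def missing_rates(matrix):
    """Proportion of omitted (None) responses for each item (column) of a subjects x items matrix."""
    n_subjects = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] is None for row in matrix) / n_subjects for j in range(n_items)]


def items_over_threshold(occasions, threshold):
    """Indices of items omitted by more than `threshold` (a proportion) on at least one occasion."""
    flagged = set()
    for matrix in occasions:
        for j, rate in enumerate(missing_rates(matrix)):
            if rate > threshold:
                flagged.add(j)
    return sorted(flagged)


# Hypothetical responses: 4 subjects x 3 items at two administrations (None = omitted).
time1 = [[2, 5, None], [3, 4, 7], [0, 6, 8], [1, 5, 9]]
time2 = [[2, None, None], [3, 4, None], [1, 6, 8], [1, 5, 9]]
print(items_over_threshold([time1, time2], 0.10))  # [1, 2]
print(items_over_threshold([time1, time2], 0.30))  # [2]
```

Taking the union across occasions is what makes an item count as "frequently omitted" if it crosses the threshold at any single administration.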

Item Reliability

Test-retest correlation coefficients ranged from 0.65 to 0.91, indicating high reliability of the items (Table 3). The two items with the highest test-retest correlations, i.e., sexual and sport activities, also had the highest number of missing responses. Other highly reliable items were those asking about putting on socks, walking, running, and jumping. Items with the lowest test-retest correlations were those referring to holding small objects, squatting/crouching, stooping, and sitting. Reliability of some items seemed to vary between the English and French versions of the scale, but there were no apparent systematic differences in reliability according to language. Given that the reliability of scores obtained from a multi-item instrument with item-specific reliability coefficients between 0.6 and 0.7 may well exceed 0.9 [34], all items were considered sufficiently reliable to be included in the final scale.
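The argument that reliable total scores can be built from items with reliabilities of 0.6 to 0.7 can be checked with the Spearman-Brown prophecy formula, $\rho_k = k\rho / (1 + (k - 1)\rho)$. The sketch below assumes the items behave like parallel measurements, which real questionnaire items only approximate; the function name is illustrative.

```python
def spearman_brown(item_reliability, n_items):
    """Predicted reliability of a sum of n_items parallel items, each with the given reliability."""
    r, k = item_reliability, n_items
    return k * r / (1 + (k - 1) * r)


for r in (0.6, 0.7):
    for k in (6, 20):
        print(f"item reliability {r}, {k} items -> {spearman_brown(r, k):.3f}")
```

Under this assumption, 20 items with reliability 0.6 (the length of the final scale) give a predicted total-score reliability of about 0.97, and even six such items reach 0.9.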

Item Responsiveness

The results of the item responsiveness analysis are presented in Table 3. In the analysis of changes in disability scores between time 1 and time 3, the most responsive items were those relating to putting on socks, carrying groceries, riding in a car, lifting 10 lb, vacuuming, and getting out of bed. When we analyzed the ability of each item to distinguish between patients self-rated as "better" versus "worse," the highest values of the t-test were observed for riding in a car, standing, taking food out of the refrigerator, and carrying groceries.
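Both responsiveness statistics described under Analysis can be sketched in a few lines of Python. The sketch assumes change is computed as time 3 minus time 1, so an improvement (lower difficulty) yields a negative statistic, in line with the negative pre-post values in Table 3. It uses the pooled-variance (Student) two-sample t, whose square equals the time-by-group interaction F for two groups measured on two occasions. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(time1, time3):
    """Paired t statistic for the mean change (time 3 minus time 1) in one item."""
    d = [b - a for a, b in zip(time1, time3)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def pooled_two_sample_t(x, y):
    """Student two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


def change_by_group_t(improved, deteriorated):
    """Compare item change scores between self-rated 'better' and 'worse' groups.
    Each argument is a list of (time1, time3) score pairs."""
    d_better = [t3 - t1 for t1, t3 in improved]
    d_worse = [t3 - t1 for t1, t3 in deteriorated]
    return pooled_two_sample_t(d_better, d_worse)


# Hypothetical 0-10 ratings for one item.
print(round(paired_t([5, 6, 7, 8], [4, 4, 6, 6]), 2))  # -5.2
better = [(7, 3), (6, 4), (8, 5)]
worse = [(4, 5), (5, 7), (3, 3)]
print(round(change_by_group_t(better, worse), 2))  # -4.9
```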

Internal Consistency and Factor Analysis

The correlation matrix for the 48 difficulty items at time 1 revealed a relatively high degree of interitem correlation. Correlation coefficients were all positive and ranged from 0.24 (between walking one block and sitting for 4 hours) to 0.87 (between lifting 10 lb and lifting 20 lb). Cronbach's alpha coefficient of internal consistency for this set of items was 0.98. Item-total correlations were generally very high, ranging from 0.57 to 0.86. After averaging the results obtained at times 1 and 2, the highest item-total correlations were observed for pulling/pushing doors, turning, carrying groceries, moving a table, and running up 20 steps (Table 3). Items with the lowest item-total correlations were those pertaining to sleeping, lying in bed, and holding and grasping small objects. For most items, item-total correlations were similar in the French- and English-speaking respondents.

The results of factor analysis are presented in Table 4. Using an eigenvalue > 1.0 as the cutoff point, seven factors were extracted, accounting for 74% of the variance. Most of the variance (53%) was explained by the first factor, and the plot of eigenvalues associated with each factor (the scree test) suggested that this set of items could be considered approximately unidimensional. Solutions with fewer factors were tried but failed to provide a factor structure as meaningful as the seven-factor solution presented here. The factors could be referred to as movement, bed/rest, sitting/standing, ambulation, handling of large/heavy objects, handling of small objects (dexterity), and bending/stooping. Several additional factor analyses were carried out. We repeated the analysis of all 46 items using data on 212 subjects obtained at time 2. Several items loaded on different factors than in the first analysis, but a similar seven-factor structure was produced. For subjects who completed two questionnaires, we calculated average scores for each item and used these scores in the analysis. Six- and seven-factor solutions seemed to give the most meaningful classification of items into functional categories. Next, we removed items that seemed to represent more than one domain (factor loadings above 0.3 on at least two factors) or were not strongly related to any factor (no factor loading above 0.4).
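Cronbach's alpha and the corrected item-total correlations used above (each item correlated with the sum of the remaining items) can be sketched in Python as follows. The data layout (rows = respondents, columns = items), the use of sample variances, and the function names are assumptions for illustration.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(rows[0])
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


def corrected_item_total(rows):
    """Correlation of each item with the total of the other items (item removed from total)."""
    out = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        out.append(pearson_r(item, rest))
    return out


# Hypothetical 0-10 ratings: 5 respondents x 3 items.
ratings = [[1, 2, 1], [3, 4, 2], [5, 5, 6], [7, 6, 8], [9, 9, 8]]
print(round(cronbach_alpha(ratings), 3))
print([round(c, 3) for c in corrected_item_total(ratings)])
```

Removing the item from the total avoids the inflation that occurs when an item is correlated with a sum that contains itself, which matters most for short scales.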
In a subsequent factor analysis of the 30 remaining items, six- or seven-factor solutions again appeared plausible from a clinical point of view. Thus, it was possible to distinguish empirically seven overlapping areas of activity limitation associated with back pain. The factors seemed to have a biological basis and were unlikely to represent an artifact due to the number or wording of the items. At the same time, the results of the scree test and the relatively high correlations between the items justified combining them into a scale and computing a single overall disability score.
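The eigenvalue-based decisions in this section (retaining factors with eigenvalue > 1.0, and judging approximate unidimensionality from the share of variance carried by the first factor) can be sketched with a small cyclic Jacobi eigenvalue routine in pure Python. This assumes principal-component eigenvalues of the item correlation matrix; it does not reproduce the varimax or oblimin rotations, and the function names are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def factor_summary(corr):
    """Kaiser count (eigenvalues > 1), first-factor share, and share of the retained factors."""
    eig = jacobi_eigenvalues(corr)
    k = len(eig)
    retained = [e for e in eig if e > 1.0]
    return len(retained), eig[0] / k, sum(retained) / k


# Hypothetical correlation matrix: three items with all pairwise correlations 0.5.
print(factor_summary([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]))
```

For three items that all correlate 0.5, the eigenvalues are 2, 0.5, and 0.5, so one factor is retained and it carries two-thirds of the variance, a small-scale analogue of the dominant first factor reported above.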

Item Discriminability

The results of the item response analysis presented here were obtained using data from the first, supervised administration of the questionnaire (time 1). Examples of item response functions are given in Figs. 2, 3, and 4. (The graphs were plotted using the program TESTGRAF, developed by Dr. James Ramsay, McGill University, Montreal, Canada.) The y-axis shows the probability of selecting each option for a given item; the x-axis shows the overall level of functioning. Thus, for item 27 in Fig. 2 (pulling a heavy door), the most severely disabled people select option 5 (extreme difficulty) with 90% probability. As the level of functioning improves, respondents become more likely to select option 4 or 3. Then option 2 takes over and, finally, those with only mild disability tend to select option 1 (no difficulty) with increasing probability. A similar pattern can be seen for item 39 (making a bed), except that option 5 is not the most commonly selected option even among the most severely disabled respondents in our sample (Fig. 3). These individuals are probably not disabled enough for this item to show its full discriminating power. The main difference between item 39 and item 2 (sitting for 1 hour) lies in the shape of the option characteristic curves. The difference is most evident for option 3, but can also be

Kopec et al.

156 TABLE

3. Measurement

properties of 48 functional

No.

Item

23 32 47’ 1 28 35 5’ 45’ 44’ 7’ 39’ 29 9 30’ 31 41’ 13 37 .

Hold small objects Grasp small objects Take food out of refrig. Walk 1 block In bed for 1 hr Sexual activities Climb 1 flight Get out of bed Sleep for 6 hr Reach to a shelf Make a bed In bed several hr Kneel down Throw a ball Stretch the back Turn over in bed Twist the body On bus for 10 min Put on socks Walk several blocks In car for 1 hr Move a table Lift 10 lb Squat/crouch Push a heavy door Turn around Sit for 1 hr Stoop for 10 min Stand for 30 min Use a vacuum Pull a heavy door Climb 4-6 flights Jump a few times Bend to the floor Carry groceries Lift 20 lb On bus for 30 min Run up 20 steps Bend over the sink Volleyball/badminton Carry 20 lb Walk several miles Run 2 blocks Stand for 2 hr In car for 4 hr Lift 40 lb Sit for 4 hr Carry 40 lb 1 block

::*

.

::* 18 :: 26 2 22 17’ 40 27’ 6 15 4 43’ 19 38 24 12’ 48 8 16’ 25’ 10 33 20’ 3’ 21

Mean score 1.5 1.6 2.1 2.1 2.6 3.2 3.3 3.5 3.6 3.6 3.7 3.7 3.8 ::: 3.9 3.9 4.0 4.0 4.3 4.4 4.4 4.4 4.4 4.7 4.8 4.8 4.9 5.1 5.1 5.2 5.2 5.4 5.5 5.5 5.7 5.8 5.9 6.0 6.3 6.6 6.6 6.7 6.9 7.0 7.1 7.4 8.0

disability

items in patients with back pain

Test-retest correlation 0.65 0.79 0.82 0.82 0.78 0.91 0.78 0.75 0.82 0.82 0.80 0.80 0.80 0.76 0.77 0.82 0.78 0.80 0.85 0.85 0.75 0.79 0.77 0.67 0.79 0.75 0.74 0.67 0.77 0.81 0.79 0.74 0.84 0.78 0.79 0.77 0.75 0.78 0.74 0.90 0.77 0.82 0.85 0.72 0.79 0.76 0.71 0.82

Average item-total correlation 0.60 0.62 0.73 0.66 0.62 0.66 0.75 0.73 0.59 0.75 0.78 0.61 0.77 0.76 0.77 0.72 0.76 0.78 0.68 0.75 0.70 0.81 0.76 0.75 0.84 0.83 0.67 0.74 0.75 0.72 0.86 0.74 0.79 0.74 0.81 0.78 0.81 0.81 0.72 0.72 0.79 0.71 0.79 0.67 0.68 0.77 0.64 0.74

Responsiveness he-post -0.60 -0.36 -2.18 -2.20 - 2.94 -2.16 -3.30 -5.23 -2.87 -0.22 -4.00 -2.07 - 1.60 -3.46 -4.40 -5.00 -2.63 -4.43 -6.07 -3.21 -5.28 -4.75 - 5.45 - 1.81 -3.81 -3.02 -3.92 -3.35 -4.39 -5.24 -2.93 -0.98 -5.15 -3.25 -5.52 -4.62 -4.43 -2.39 -4.30 - 1.36 -4.00 -2.81 -3.46 -4.16 -5.52 -2.99 -3.30 -3.36

(t-test) Two groups - 1.46 - 1.94 -2.77 -2.49 -0.03 -2.22 - 1.30 - 2.03 - 2.07 - 2.40 - 1.46 -0.51 - 1.47 - 1.39 -0.93 -0.41 - 1.52 - 1.74 -2.58 - 1.59 -3.95 -2.41 - 1.97 -0.54 -1.11 -0.93 - 1.89 -0.75 -2.75 - 1.58 -0.74 -2.54 - 1.40 - 2.06 -2.72 - 2.05 - 2.47 -0.55 -0.72 - 1.01 - 1.22 - 1.73 - 1.45 -3.54 -3.13 - 1.41 -0.61 - 1.04

Test-retest correlations are based on scoresat time 1 and time 2. Average item-total correlation is the mean of the correlations at time 1 and time 2. Responsivenessis measuredby two types of t-tests based on scoresat time 1 and time 3: a paired t-test for pre-post comparisonsin the whole population (T3 -Tl), and a two-sample t-test on score changesfor comparisonsbetween patients self-rated as “better” vs. “worse.” Items selected for the reducedscale are marked with an asterisk. seen for the remaining options. Relatively flat OCCs in item 2 result in a lower ability of this item to discriminate between various levels of disability. In Fig. 3 we also contrast item 47 (taking food out of the refrigerator) with item 1 (walking 1 block), both designed primarily to measure severe disability. Superior performance of item 47 is evident, as the options in this item are selected in accordance with the overall level

of disability, and the steep slopes for the OCCs allow for sharp distinctions between individuals at different disability levels (none of the respondents selected option 5). In Fig. 4 we show the CCCs for two of the most difficult activities: carrying 40 lbs. (item 2 1) and sitting for 4 hours (item 3)) along with the corresponding expected score functions. For item 2 1, most subjects with severe or moderate disability select option 5, so the item is not

very useful in this subgroup (the flat part of the expected score function). However, the item appears to discriminate well among those with relatively mild disability, as these individuals select the remaining options (except option 3) according to their overall disability level. For item 3, the distinctions between the options are somewhat less clear, but the expected score function decreases over the entire spectrum of disability. Thus item 3 seems more useful than item 21 for assessing severe to moderate disability (albeit not as useful as items 27 or 39), but is less effective among mildly disabled individuals.

The analysis indicated that all of the difficulty items contributed to the assessment of global disability in our sample, although item performance differed across levels of disability. Most of the items performed very well over a wide range of disability levels. Among the most useful were those asking about making a bed, turning over in bed, carrying groceries, pulling/pushing heavy doors, moving a table, and running a short distance. Less effective were items relating to lying in bed, sleeping, and sitting for 1 hour. A few items, for example, taking food out of the refrigerator or getting out of bed, showed excellent performance at high to moderate disability but were less useful in discriminating among those reporting only mild disability. Two items, holding and grasping small objects, were clearly too easy for the population studied, and their usefulness in assessing disability associated with back pain is questionable. Items referring to very difficult activities, such as sitting and standing for a long period of time or lifting and carrying heavy weights, generally did not perform as well as those asking about light to moderate activities. Nonetheless, the results suggest that these items may be useful in assessing mildly disabled individuals.

TABLE 4. Results of factor analysis (after oblique rotation) of 46 functional disability items in patients with back pain (N = 242)

Movement: 30* Throw a ball; 15 Jump a few times; 26 Turn around quickly; 31 Stretch your back; 24 Run up 20 steps; 25* Run 2 blocks; 13 Twist the body
Handling of small objects: 23 Hold small objects; 32 Grasp small objects
Bending/stooping: 36* Put on socks; 39* Make a bed; 12* Bend over the sink; 40 Use a vacuum; 4 Bend to the floor; 22 Stoop for 10 min; 47* Take food out of refrig.
Sitting/standing: 3* Sit for 4 hr; 33 In car for 4 hr; 2 Sit for 1 hr; 42* In car for 1 hr; 10 Stand for 2 hr; 17* Stand for 30 min; 38 On bus for 30 min
Handling of large objects: 19 Lift 20 lb; 20* Lift 40 lb; 18 Lift 10 lb; 8 Carry 20 lb; 21 Carry 40 lb 1 block; 43* Carry groceries; 14 Push a heavy door; 27* Pull a heavy door; 46* Move a table; 7* Reach to a shelf
Bed/rest: 29 In bed several hr; 28 In bed for 1 hr; 44* Sleep for 6 hr; 45* Get out of bed; 41* Turn over in bed
Ambulation: 1 Walk 1 block; 34* Walk several blocks; 6 Climb 4-6 flights; 5* Climb 1 flight; 16* Walk several miles; 11 Squat/crouch; 37 On bus for 10 min; 9 Kneel down

Items are grouped by factor. Items selected for the reduced scale are marked with an asterisk.

[Figure 2 image: option characteristic curves for item 27; y-axis, probability; x-axis, functional status.]
FIGURE 2. Estimated option characteristic curves for an item assessing the amount of difficulty associated with pulling a heavy door (item 27). The level of difficulty is measured on a 5-point scale. The probability of selecting each option is estimated as a function of the underlying functional status (the reverse of the overall disability score, transformed to standard normal quantiles). The disability score for each subject was calculated as the average of the standardized mean scores on four measures: Roland, Oswestry, SF-36ph, and the 48 difficulty items. The level of disability decreases (functional status improves) from left to right along the abscissa.

[Figure 3 image: four panels (items 39, 2, 47, and 1); y-axis, probability; x-axis, functional status.]
FIGURE 3. Estimated option characteristic curves for four items assessing the amount of difficulty associated with the following activities: making a bed (item 39), sitting for 1 hr (item 2), taking food out of the refrigerator (item 47), and walking one block (item 1). See Fig. 2 for details.

[Figure 4 image: panels for items 21 and 3, each showing option characteristic curves (y-axis, probability) and an expected score function (y-axis, expected score); x-axis, functional status.]
FIGURE 4. Estimated option characteristic curves and expected score functions for two items measuring the amount of difficulty associated with carrying an object weighing 40 lb one block (item 21) and sitting for 4 hr (item 3). See Fig. 2 for details.
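The curves in Figs. 2 to 4 rest on two computations described in the captions and text. The first builds a functional-status axis by averaging standardized disability scores and converting ranks to standard normal quantiles. The second estimates each option's probability by smoothing that option's indicator over this axis. The sketch below follows the spirit of the kernel-smoothing approach of Ramsay [38] but is not TESTGRAF. The Gaussian kernel, the bandwidth of 0.5, the (rank + 0.5)/n plotting positions, and all function names are assumptions. The expected score function is computed as the probability-weighted average of the option codes.

```python
import math
from statistics import NormalDist, mean, pstdev


def functional_status(scores_by_measure):
    """Functional status per subject on a standard normal scale.

    scores_by_measure: one list per measure (e.g. four questionnaires), each
    scored so that higher = more disabled, with non-zero spread.
    """
    n = len(scores_by_measure[0])
    z = []
    for scores in scores_by_measure:
        m, sd = mean(scores), pstdev(scores)
        z.append([(x - m) / sd for x in scores])  # standardize each measure
    disability = [mean(col) for col in zip(*z)]  # average z-score per subject
    # Rank from most to least disabled, so the most disabled get the lowest
    # quantile (status is the reverse of disability); ties keep input order.
    order = sorted(range(n), key=lambda i: disability[i], reverse=True)
    status = [0.0] * n
    for rank, i in enumerate(order):
        status[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return status


def option_curves(status, responses, options, grid, bandwidth=0.5):
    """Gaussian-kernel (Nadaraya-Watson) estimate of P(option k | status)."""
    curves = {k: [] for k in options}
    for t in grid:
        w = [math.exp(-0.5 * ((s - t) / bandwidth) ** 2) for s in status]
        total = sum(w)
        for k in options:
            chose_k = sum(wi for wi, r in zip(w, responses) if r == k)
            curves[k].append(chose_k / total)
    return curves


def expected_score(curves):
    """Expected item score at each grid point: sum over options of k * P(k)."""
    n_points = len(next(iter(curves.values())))
    return [sum(k * probs[j] for k, probs in curves.items()) for j in range(n_points)]


status = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
responses = [5, 5, 4, 3, 2, 1, 1]  # 1 = no difficulty, 5 = extreme difficulty
curves = option_curves(status, responses, range(1, 6), grid=[-1.5, 0.0, 1.5])
print([round(e, 2) for e in expected_score(curves)])
```

In this toy example the expected score falls as functional status rises, which is the pattern of a useful item. A flat stretch of the expected score function, such as item 21 shows among severely disabled respondents, means the item adds little information there.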

Selection

The decision as to how many items to include in the final instrument was guided by the empirical results of item analysis as well as by practical considerations. A major concern was to ensure that all types of physical activities relevant to back pain were represented. We also wanted the questionnaire to be highly reliable and discriminative over a wide range of disability levels, while at the same time being practical and acceptable to both patients and clinicians. To meet these requirements we selected 20 items (Table 3) representing six empirically derived categories of activity: bed/rest, sitting/standing, ambulation, movement, bending/stooping, and handling of large/heavy objects. The items were selected primarily on the basis of their discriminating power for the desired range of disability levels, as assessed by the OCCs. Other properties, such as sensitivity to change, acceptability, and clinical relevance, were also taken into account. For example, walking several blocks was the best item in the ambulation category. Climbing one flight of stairs was selected next, as it appeared more useful than walking one block in the assessment of severely disabled respondents. Then walking several miles was added to improve precision for those with only mild disability. To avoid redundancy, if two or more candidate items were correlated above 0.8, only one of them was selected. Examples of strongly correlated items include lifting and carrying various weights, kneeling and squatting/crouching, pushing and pulling doors, running different distances, walking one versus several blocks, and standing for various lengths of time. Since the results of factor analysis justified combining the items into a single disability score, items representing more than one factor, e.g., pulling doors or getting out of bed, were retained. We included the items relating to sleeping and sitting, despite their relatively modest discriminating power, because these activities may be important from a clinical point of view. Neither of the two items in the dexterity domain was retained, because of their highly skewed scores and relatively poor psychometric properties.
We also decided to exclude the item referring to sexual activities because it was often omitted by respondents and, in terms of the OCC analysis, was not very effective in measuring overall disability. Since we had no evidence that explicit differential weighting might improve the properties of the scale, the selected items were assigned equal weights. The resulting 20-item questionnaire demonstrated high internal consistency (Cronbach's alpha 0.96).
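Two computations in the selection step can be made concrete: dropping a candidate whose correlation with an already selected item exceeds 0.8, and computing Cronbach's alpha for the retained, equally weighted items. The sketch below makes two assumptions. Each item is given as a list of scores across respondents, and candidates arrive in order of preference. In the study that order reflected discriminating power, responsiveness, and clinical judgement, none of which this code models. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (non-zero spread)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_redundant(items, threshold=0.8):
    """Greedily keep items whose |r| with every already-kept item is <= threshold.

    `items` holds one score list per candidate item, ordered from most to
    least preferred; that ordering is supplied by the caller.
    """
    kept = []
    for i, scores in enumerate(items):
        if all(abs(pearson(scores, items[j])) <= threshold for j in kept):
            kept.append(i)
    return kept


def cronbach_alpha(items):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total)."""
    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # equally weighted total per respondent
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 6]  # r with a is about 0.99, so b is dropped
c = [5, 1, 4, 2, 3]  # r with a is -0.3, so c is kept
print(prune_redundant([a, b, c]))  # [0, 2]
print(round(cronbach_alpha([a, a, a]), 6))  # 1.0 (identical items)
```

The greedy rule keeps the first of any highly correlated pair. The preference order therefore decides which member of a group such as "lift 10, 20, or 40 lb" survives.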

DISCUSSION
In this study disability was defined in terms of activity restrictions, as proposed by Wood and accepted by the WHO. Operationalization of this concept in terms of difficulty was dictated by findings from the literature, our own empirical data, and theoretical considerations. However, other criteria for measuring restrictions in activity, such as capacity or actual performance, could also be justified. More research will be needed to compare the relative merits of these measurement strategies in back pain. Our approach to item development was based on current models of disability, which assume that limitations in complex tasks can be attributed to problems with elementary physical activities. The scale contains several activities not included in established back pain questionnaires, such as carrying, pulling or throwing objects, reaching, or running. On the other hand, it has no general questions about personal care, work, housework, or social life. For practical reasons, we had to exclude the question on sexual activities, even though it may be important conceptually. Poor response to this item has been a problem with other back pain questionnaires [40], but the factors contributing to the low response rate have not been studied systematically.

The results of factor analysis suggest that the emphasis on elementary physical activities may produce factors that are biologically plausible, and thus lend support to the methods of item construction implemented in this study. Nonetheless, there are reasons for caution in the interpretation of these results. A number of items loaded on more than one factor, suggesting that even very simple tasks depend on more than one elementary activity. Also, because the number of subjects was only about 5 times the number of items, the results may be statistically unstable. Activity limitations may correlate with each other for reasons other than a common biological cause, for example, because of similar performance levels (light activities, heavy activities) or the same immediate environment (activities performed in the kitchen, in bed, etc.). Also, a simple activity may be part of a more complex behavior (e.g., putting on socks, dressing, personal care). Jette [41] used factor analysis to classify 39 functional difficulty items in patients with arthritis into the following eight categories: (1) personal care, (2) kitchen chores, (3) physical mobility, (4) work/home repairs, (5) transfers, (6) upper extremity activities, (7) heavy home chores, and (8) light home chores. Factor analyses of items included in some shorter scales for back pain suggested either a single dimension [8] or two dimensions, labeled differently by different authors but essentially referring to "physical" and "psychosocial" functioning [10, 11, 14]. Further empirical research may help define clinically relevant categories of dysfunction in back pain.

A major challenge for the scale developer is to ensure that the scale can discriminate between individuals with slightly different functional levels over the entire spectrum of disability. For example, Ware and Sherbourne [32] speculated that the physical functioning scale of the SF-36 questionnaire may not be suitable for severely disabled patients, as it has only one item relating to basic activities such as dressing or bathing.
However, it would be difficult to test with classical methods of analysis how effectively a given item discriminates between subjects within narrow ranges of functioning. This task is greatly facilitated by the methods of item response theory (IRT). In IRT models, a regression approach is used to model response probabilities as functions of the underlying level of ability. The slope of the estimated option characteristic curve indicates that option's discriminating power at various disability levels. The discriminating power of a multioption item can be ascertained by analyzing the contribution of each individual option. It is also possible to create banks of items with known characteristics and to apply the technique known as tailored or adaptive testing [42]. To our knowledge, methods of item analysis based on IRT have not previously been used in the development of health assessment questionnaires, although examples of their usefulness have been provided [43]. It should be noted that common parametric models for dichotomous items [44, 45], and their multioption extensions [46, 47], impose undesirable restrictions on the shape of the OCCs, which depend on the a priori selected family of parametric functions (usually logistic). By contrast, the nonparametric modeling methods employed in this study offer great flexibility in depicting the complex relationships that may be expected in responses to multioption health questionnaires. In addition, they are computationally much simpler than the methods used for parametric models. It has been argued [48, 49] that the methods of item selection should depend on whether the purpose of the instrument is to discriminate between individuals, assess changes over time, or predict some target outcome or condition. Others have taken a different position, emphasizing the need for multipurpose instruments [50].
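The link between the slope of a response curve and its discriminating power can be illustrated with the kind of logistic curve that parametric models such as Birnbaum's [45] assume. This is an illustration only, not the nonparametric method used in the study. The sketch evaluates a two-parameter logistic curve and estimates its slope numerically. A steeper curve separates respondents more sharply near its midpoint, and for this family the maximum slope is a/4 at theta = b. The helper for tabulated curves shows how the same idea applies to a curve known only at grid points, such as a kernel-smoothed OCC. Names and parameter values are hypothetical.

```python
import math


def logistic_curve(theta, a, b):
    """Two-parameter logistic curve: P = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope(curve, theta, h=1e-5):
    """Central-difference estimate of a curve's slope at theta."""
    return (curve(theta + h) - curve(theta - h)) / (2.0 * h)


def tabulated_slopes(grid, probs):
    """Slopes between neighbouring points of a curve known only on a grid."""
    return [
        (probs[i + 1] - probs[i]) / (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    ]


def steep(t):
    return logistic_curve(t, a=2.0, b=0.0)


def flat(t):
    return logistic_curve(t, a=0.5, b=0.0)


# At theta = b the slope of this family is a / 4.
print(round(slope(steep, 0.0), 3), round(slope(flat, 0.0), 3))  # 0.5 0.125
```

Far from its midpoint even the steep curve is nearly flat. This is why an item can discriminate well in one range of disability and poorly in another.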
It is important to note that our goal was to develop a multipurpose measure of a conceptually defined construct, not a predictive index. Thus, our scale may be less effective in predicting a particular outcome, such as return to work, than an index developed specifically for that purpose. Furthermore, in our scale all items are weighted equally. Explicit weighting of the items would make scoring more complicated and, in our view, should not be used unless it can be justified conceptually or empirically (e.g., in a predictive measure). According to Nunnally, "there is overwhelming evidence that the use of differential weights seldom makes an important difference" [51]. We used item discriminability rather than responsiveness as the main empirical criterion for item selection. Nonetheless, the items proved highly sensitive to change. Selecting the items based on their responsiveness would be problematic for several reasons. First, there is no agreement on the most appropriate method of studying this property [52-55]. Second, sensible criteria for determining what constitutes good versus poor responsiveness are difficult to provide, mainly because there is no acceptable gold standard for ascertaining true change. Third, the responsiveness of the items may depend on the initial level of disability. Finally, an intervention may have several effects, and in most instances it seems preferable to measure these effects separately rather than combine them into a single index.

One must be cautious in generalizing the results obtained in this study to all patients with back pain. Even though we studied the properties of the items over a wide range of disability levels, none of the respondents in our sample was in a hospital at the time of assessment, and most had experienced multiple episodes of back pain prior to the study. It seems likely that different items would have been selected had we studied hospitalized patients or patients with acute back pain. It is conceivable that other characteristics of the respondents, such as receiving financial compensation for back pain or the level of education, may also affect the results of item analysis. Further research will be needed to establish the value of the scale under different conditions and in different populations.
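For reference, the two responsiveness statistics reported in Table 3 can be computed as below. The first is a paired t-test on time 3 minus time 1 scores. The second is a two-sample t-test comparing the score changes of patients rating themselves "better" with those rating themselves "worse." This is a sketch under assumptions. The text does not say whether the two-sample test used a pooled or a Welch variance, so the pooled Student form is shown, and the function names are hypothetical. Negative values mean that scores fell, that is, that disability decreased.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired t statistic for after - before (e.g. time 3 minus time 1)."""
    d = [y - x for x, y in zip(before, after)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def pooled_two_sample_t(x, y):
    """Student's two-sample t for mean(x) - mean(y), pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


time1 = [5, 6, 7, 8]
time3 = [3, 5, 5, 6]
print(paired_t(time1, time3))  # -7.0: scores fell

better = [-3, -2, -4]  # change scores of patients rating themselves "better"
worse = [0, 1, -1]     # change scores of patients rating themselves "worse"
print(round(pooled_two_sample_t(better, worse), 3))  # -3.674
```

As the discussion notes, such statistics depend on the chosen method and on the initial level of disability. They describe sensitivity to change but do not by themselves define good versus poor responsiveness.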

The authors thank Dr. James O. Ramsay for permission to use his statistical program TESTGRAF, Dr. Matthew Liang for helpful comments on an earlier draft of this manuscript, and the following persons for their help in patient recruitment: Karin Austin, Francine Bujold, Lynn Coti, Mary-Ann Dal&l, And& Gosselin, V&Tie Hen&y, Marlyn Kaplow, Patricia Webster, and Judy Woodhead. This work was supported by a grant from the Institut de recherche en santé et en sécurité du travail du Québec (IRSST). Dr. J.A. Kopec has been a recipient of a fellowship from the National Health Research and Development Program of Canada. Dr. J.M. Esdaile and Dr. M. Abrahamowicz are Senior Research Scholars of the Fonds de la recherche en santé du Québec.

References
1. Deyo RA. Measuring the functional status of patients with low back pain. Arch Phys Med Rehabil 1988; 69: 1044-1053.
2. Kopec JA, Esdaile JM. Functional disability scales. Proceedings of the Second International Congress on Objective Assessment in Rehabilitation Medicine. Montreal, October 5-6, 1992.
3. Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine 1995; 20: 1943-1949.
4. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981; 19: 787-805.
5. Roland M, Morris R. A study of the natural history of back pain. Part I: Development of a reliable and sensitive measure of disability in low-back pain. Spine 1983; 8: 141-144.
6. Fairbank JCT, Couper J, Davies JB, O'Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy 1980; 66: 271-273.
7. Million R, Hall W, Nilsen KH, Baker RD, Jayson MIV. Assessment of the progress of the back pain patient. Spine 1982; 7: 204-212.
8. Waddell G, Main CJ. Assessment of severity in low back disorders. Spine 1984; 9: 204-208.
9. Evans JH, Kagan A. The development of a functional rating scale to measure the treatment outcome of chronic spinal patients. Spine 1986; 11: 277-281.

10. Tait RC, Pollard AC, Margolis RB, Duckro PN. The Pain Disability Index: psychometric and validity data. Arch Phys Med Rehabil 1987; 68: 438-441.

11. Lawlis GF, Cuencas R, Selby D, McCoy CE. The development of the Dallas Pain Questionnaire. An assessment of the impact of spinal pain on behavior. Spine 1989; 14: 510-516.
12. Greenough CG, Fraser RD. Assessment of outcome in patients with low back pain. Spine 1992; 17: 36-41.
13. Kames LD, Naliboff BD, Heinrich RL, Coscarelli Schag C. The Chronic Illness Problem Inventory: problem-oriented psychosocial assessment of patients with chronic illness. Int J Psychiatry Med 1984; 14: 65-75.
14. Millard RW. The Functional Assessment Screening Questionnaire: application for evaluating pain-related disability. Arch Phys Med Rehabil 1989; 70: 303-307.
15. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping D, Williams JI. The Quebec Back Pain Disability Scale: measurement properties. Spine 1995; 20: 341-352.
16. International Classification of Impairments, Disabilities, and Handicaps: A Manual of Classification Relating to the Consequences of Disease. Geneva, World Health Organization, 1980.
17. Cannell CF, Miller PV, Oksenberg L. Research on interviewing techniques. In: S Leinhardt, Ed. Sociological Methodology. San Francisco, Jossey-Bass, 1981, pp 398-437.
18. Alexander JL, Fuhrer MJ. Functional assessment in individuals with physical impairments. In: AS Halpern, MJ Fuhrer, Eds. Functional Assessment in Rehabilitation. Baltimore, PH Brookes, 1984, pp 45-59.
19. Nagi SZ. Disability concepts revisited: implications for prevention. In: AM Pope, AR Tarlov, Eds. Disability in America. Toward a National Agenda for Prevention. Washington, D.C., National Academy Press, 1991.
20. Badley EM. An introduction to the concepts and classifications of the international classification of impairments, disabilities, and handicaps. Disability and Rehabilitation 1993; 15: 161-178.
21. Kopec JA. Concepts of disability: The Activity Space Model. Soc Sci Med 1995; 40: 649-656.
22. Verbrugge LM. Disability. Rheum Dis Clin North Am 1990; 16: 741-761.
23. McDowell I, Newell C. Measuring Health. A Guide to Rating Scales and Questionnaires. New York, Oxford, Oxford University Press, 1987.
24. Statistics Canada. Report of the Canadian Health and Disability Survey 1983-1984. Department of the Secretary of State of Canada, 1986.
25. Jette AM, Davis AR, Cleary PD. The Functional Status Questionnaire: reliability and validity when used in primary care. J Gen Intern Med 1986; 1: 143-149.
26. Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980; 3: 137-145.
27. Lankhorst GJ, van der Stadt RJ, Vogelaar TW, van der Korst JK, Prevo AJH. Objectivity and repeatability of measurements in low back pain. Scand J Rehab Med 1982; 14: 21-26.
28. Sandström J. Clinical and social factors in rehabilitation of patients with chronic low back pain. Scand J Rehab Med 1986; 18: 35-43.
29. Millard RW. The functional assessment screening questionnaire: application for evaluating pain-related disability. Arch Phys Med Rehabil 1989; 70: 303-307.
30. Guyatt GH, Townsend M, Berman LB, Keller JL. A comparison of Likert and visual analogue scales for measuring change in function. J Chron Dis 1987; 40: 1129-1133.
31. Deyo RA, Cherkin D, Franklin G, Nichols JC. Low Back Pain Form 6.1. TyPE Specification. Quality Quest, 1989.
32. Ware JE, Sherbourne CD. The MOS 36-item Short Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care 1992; 30: 473-483.
33. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993; 31: 247-263.
34. Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to Their Development and Use. Oxford, Oxford University Press, 1989.
35. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Controlled Clin Trials 1991; 12: 142S-158S.
36. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Controlled Clin Trials 1989; 10: 407-415.

37. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ, Lawrence Erlbaum Associates, 1980.
38. Ramsay JO. Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika 1991; 56: 611-630.
39. Abrahamowicz M, Ramsay JO. Multicategorical spline model for item response theory. Psychometrika 1992; 57: 5-27.


40. Stratford PW, Binkley J, Solomon P, Gill C, Finch E. Assessing change over time in patients with low back pain. Phys Ther 1994; 74: 528-533.
41. Jette AM. Functional capacity evaluation: an empirical approach. Arch Phys Med Rehabil 1980; 61: 85-89.
42. Lord FM. Tailored testing. In: FM Lord, Ed. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ, Lawrence Erlbaum Associates, 1980, pp 150-161.
43. McArthur DL, Cohen MJ, Schandler SL. Rasch analysis of functional assessment scales: an example using pain behaviors. Arch Phys Med Rehabil 1991; 72: 296-304.
44. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen, Nielsen and Lydiche, 1960.
45. Birnbaum A. Some latent trait models and their use in inferring an examinee's ability. In: FM Lord, MR Novick, Eds. Statistical Theories of Mental Test Scores. Reading, MA, Addison-Wesley, 1968, pp 397-424.
46. Bock RD. Estimating item parameters and latent ability when the responses are scored in two or more nominal categories. Psychometrika 1972; 37: 29-51.
47. Thissen D, Steinberg L. A response model for multiple choice items. Psychometrika 1984; 49: 501-519.
48. Kirshner B, Guyatt GH. A methodological framework for assessing health indices. J Chronic Dis 1985; 38: 27-36.
49. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: what are the necessary measurement properties? J Clin Epidemiol 1992; 45: 1341-1345.
50. Williams JI, Naylor CD. How should health status be assessed? Cautionary notes on procrustean frameworks. J Clin Epidemiol 1992; 45: 1347-1351.
51. Nunnally JC. Psychometric Theory. New York, McGraw-Hill, 1978.
52. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis 1986; 39: 897-906.
53. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987; 40: 171-178.
54. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27: S178-S189.
55. Norman GR. Issues in the use of change scores in randomized trials. J Clin Epidemiol 1989; 42: 1097-1105.