Validity and reproducibility of the Spanish version of the sickness impact profile

J Clin Epidemiol Vol. 49, No. 3, pp. 359-365, 1996. Copyright © 1996 Elsevier Science Inc. 0895-4356/96/$15.00. SSDI 0895-4356(95)00038-6

Xavier Badia and Jordi Alonso

(1) Department of Epidemiology and Public Health, Institut Municipal d'Investigació Mèdica (IMIM), Universitat Autònoma de Barcelona, Barcelona, Spain, and (2) Institut de Salut Pública de Catalunya, Barcelona, Spain

ABSTRACT. The Perfil de las Consecuencias de la Enfermedad (PCE), the Spanish version of the Sickness Impact Profile (SIP), was administered to 352 individuals who were grouped into 4 subsamples according to type and severity of illness. Differences among scores in the subsamples were used to assess the discriminating ability of the PCE, and correlations of PCE scores with theoretically comparable measures were used to assess convergent validity. Test-retest reliability was studied in a subgroup of 129 patients. PCE scores correlated well with self-perceived overall health (0.53), self-assessment of sickness (0.51), self-assessment of dysfunction (0.63), the Index of Restricted Activity (0.54), and the Index of Activities of Daily Living (0.45). A poor correlation with clinicians' assessment of dysfunction (0.29) and speech therapists' assessment of speech pathology (0.23) was found. Reproducibility across illnesses (0.95-0.98), types of administration (0.96-0.98), and interviewers (0.93-0.99) was very high. The PCE is equivalent to the original instrument in terms of validity and reliability, which allows its use in international studies. J CLIN EPIDEMIOL 49;3:359-365, 1996.

KEY WORDS. Health-related quality of life, reliability, validity, Sickness Impact Profile, health status measurement, cross-cultural comparisons

Address for correspondence: Drs. Xavier Badia and Jordi Alonso, Department of Epidemiology and Public Health, Institut Municipal d'Investigació Mèdica (IMIM), Carrer del Doctor Aiguader 80, E-08003 Barcelona, Spain. (Received in revised form 13 February 1995.)

INTRODUCTION

Given the expansion of cross-national research on quality of life, there is an increasing need for comparable instruments [1,2]. However, obtaining a cross-culturally equivalent version of a measure developed in a specific culture requires not only conceptual comparability, standardized translation, and rescaling of items, but also evidence of the reliability and validity of the version in the target culture [3]. Adapting an existing measure, although it may be less costly than developing a new instrument, requires a substantial investment of time and expertise.

The cross-cultural adaptation of the Spanish version of the Sickness Impact Profile (SIP) started in 1988. At the beginning of the process, the researchers contacted M. Bergner, one of the original SIP investigators, to discuss the concepts and content of the questionnaire [4] and to obtain authorization for the adaptation. The subsequent translation procedure has been described elsewhere [5]. Scaling of the questionnaire and comparison of the item weights with the original weights in the English version [6] were then carried out. To preserve comparability and interpretability of results, all studies were performed using the methods developed by the original SIP researchers.

Validity and reliability, together with responsiveness, are major psychometric characteristics of a health-related quality of life instrument [7]. Because of their importance, there are at least three main reasons to reassess the validity and reliability of an adapted questionnaire: first, to check whether the construct of the questionnaire is relevant for the culture to which it has been adapted (empirical checking of the conceptual/functional equivalence of the construct in both cultures); second, to demonstrate that the new questionnaire is well calibrated as a measure of health-related quality of life and has psychometric properties similar to those of the original questionnaire; and third, to allow combination of data across nations in international studies. Unfortunately, these characteristics have been evaluated in only a few adaptations of health-related questionnaires [8,9]. It is also essential to assess how the adapted measure performs in the target culture in order to be able to use the questionnaire for research and clinical applications.

A feasible and useful way to verify the degree of validity and reliability of the new version is to replicate preexisting studies performed with the original instrument [2,10]. If the cross-cultural adaptation process has been carried out successfully, that is, if the properties of the instrument have not varied, similar results are expected. To complete the adaptation process, the validity and reproducibility of the Spanish version of the SIP (El Perfil de las Consecuencias de la Enfermedad [PCE]) were assessed by replicating work done during the development of the original questionnaire [11,12].

METHODS

The PCE

Like the original English version, the PCE contained 136 items divided into 12 categories. (A manual of the Spanish version of the SIP is available from the authors on request.) Three categories (ambulation, mobility, and body care and movement) constituted the physical dimension; four categories (social interaction, alertness behavior, emotional behavior, and communication) constituted the psychosocial dimension; and the remaining five categories (sleep and rest, eating, work, home management, and recreation and pastimes) were considered to be independent. Responders were asked to check only those items that described them on the day of the interview and were due to their health.

Each item of the PCE was assigned a weight derived using the method of equal-appearing intervals [6], which produces an interval scale [13]. The final score of each PCE category was calculated by adding the weights of the items checked, dividing this value by the maximum possible dysfunction score for that category, and multiplying by 100. Dimension scores were calculated by adding the scores of the categories present in the dimension, dividing them by the corresponding maximum dysfunction score, and then multiplying by 100. Similarly, the overall PCE score was calculated by adding the scale values of all items checked in the questionnaire, dividing by the maximum possible dysfunction score for the PCE, and multiplying by 100. High PCE scores (close to 100) indicated a high level of dysfunction; low PCE scores (close to 0) indicated a low level of dysfunction.
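The percent scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the item identifiers, and the weights are hypothetical (the actual PCE item weights are not reproduced in this article), and dimension and overall scores are assumed to be obtained by pooling the items of the categories concerned.

```python
from typing import Iterable, Mapping


def pce_percent_score(checked_items: Iterable[str],
                      item_weights: Mapping[str, float]) -> float:
    """Percent dysfunction score (0-100) for a set of items.

    item_weights maps every item of the category, dimension, or whole
    questionnaire to its scale weight; checked_items lists the items
    the respondent endorsed.
    """
    max_score = sum(item_weights.values())
    if max_score <= 0:
        raise ValueError("item weights must sum to a positive value")
    # Count each endorsed item once, even if it is listed twice.
    endorsed = sum(item_weights[item] for item in set(checked_items))
    return 100.0 * endorsed / max_score


# Hypothetical three-item category; these are not real PCE weights.
example_weights = {"item_a": 5.0, "item_b": 10.0, "item_c": 5.0}
example_score = pce_percent_score(["item_a", "item_b"], example_weights)
print(example_score)  # 75.0
```

In this toy category the respondent endorses 15 of a possible 20 weight points, which gives a score of 75, that is, a high level of dysfunction on the 0 to 100 scale.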

Sample and Setting

The PCE was administered to 352 individuals who had been selected so as to include not only patients with different types and severities of dysfunction, but also apparently healthy subjects. The study population consisted of a systematic sample of 232 patients and a random sample of 120 health care consumers participating in a health insurance plan scheme [5]. Patients were consecutively recruited in different clinical settings by two rehabilitation therapists, one speech therapist, and three primary care physicians. Seventy-five patients with quadriplegia or paraplegia were attending the rehabilitation service of a teaching hospital for disabled individuals, 25 patients with speech disorders (most of whom had been operated on for laryngeal cancer) were attending an outpatient clinic of a teaching hospital, and 132 patients with chronic illnesses were attending a primary health care center. All patients fulfilled the following criteria: (1) they were visited by a clinician within 1 week of selection, and (2) the clinician considered the patient to be physically and mentally able to complete or answer the questionnaire.

These patients were selected to test the hypothesis that the PCE could discriminate among different groups, that is, that patients attending a rehabilitation service would score higher than patients with chronic illnesses and speech disorders because quadriplegia or paraplegia involves greater dysfunction. Moreover, these two latter groups of patients would score higher than consumers. Administration of the PCE took an average of 29.5 (± 10.8) minutes. The questionnaire was either administered by an interviewer or self-completed after the respondent had received instructions from trained interviewers.

Assessment of Validity

As in the original SIP study [11], three types of construct validity measures were used: self-reported measures of health status, clinician's assessment of the subject's health status, and alternative functional assessment instruments. It was anticipated that moderate to high correlations between PCE scores and self-reported measures of health status, and moderate to low correlations between PCE scores and clinician's assessment of the subject's health status, would be obtained [11].

Three measures of self-reported health status were used as follows:

1. Self-perceived overall health, that is, patients were asked the following question: "In general, how would you rate your overall health?" The individuals were then asked to rate their overall health on a five-point scale (1, very good; 2, good; 3, fair; 4, poor; 5, very poor).
2. Self-assessment of sickness, that is, patients were read the following statement: "Think of your health today and consider any sickness or injury you might have. Check the term that best describes you." They were then asked to rate their level of sickness on a seven-point scale (1, not sick at all; 2, very slightly sick; 3, slightly sick; 4, moderately sick; 5, quite sick; 6, very sick; 7, extremely sick).
3. Self-assessment of dysfunction, that is, patients were read the following statement: "People's states of health sometimes affect the way they function, in other words, the way they carry out their life activities. They don't do things in the usual way: they cut some things out, they do some in different ways. Now, we would like to know how your functioning is affected by your health today. Check the term that best describes you." They were then asked to rate their level of dysfunction on a seven-point scale (1, not at all; 2, very slightly; 3, slightly; 4, moderately; 5, quite; 6, very; 7, extremely).

For the clinician's assessment of the health status of the subject, a measure of severity of speech and communication disability and a measure of overall dysfunction were used. Instructions for the clinician were as follows: "The speech impairment sometimes affects the patient's way of functioning, in other words, [...]. Now, we would like you to assess the dysfunction due to speech impairment in the overall health of the patient during the past week." The clinician checked a point on a categorical scale from 1 (not at all dysfunctional) to 15 (extremely dysfunctional) to rate each patient. Instructions for general practitioners were similar except for the last paragraph, which read: "Now, we would like to know how your functioning was affected by his/her health during the past week." Similarly, physicians rated patient dysfunction on a 1-to-15 categorical scale.

Two rehabilitation therapists were trained to complete the Index of Activities of Daily Living (ADL) [14]. The Index of ADL scale included items related to transferring, going to the toilet, dressing, bathing, feeding, and continence. Training consisted of two 1-hour sessions and one to three practice ratings of the same patient by two therapist observers, followed by a group discussion of the ratings. Therapists rated 10 patients and the agreement was high (κ = 0.85). The Index of ADL was completed on the same day that patients filled out the questionnaire.

The Index of Restricted Activity (IRA) [15], which is currently being used in health interview surveys in Spain [16], included questions about the number of days in bed, lost from work, and with activity limitations during the 2 weeks before the PCE administration. As in the original validation study of the SIP [17], the overall number of days was derived as an index (range, 0 to 14) as follows: number of days in bed plus (number of days not working minus number of days not working and in bed) plus number of days with other activity restrictions.
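As a worked illustration of the Index of Restricted Activity, the sketch below adds the three components so that days spent both in bed and off work are counted only once. The function and parameter names are hypothetical, and it is assumed that the last term counts the remaining days on which activity was restricted in some other way, so that the total cannot exceed the 14-day recall window.

```python
def restricted_activity_days(bed_days: int,
                             work_loss_days: int,
                             work_loss_and_bed_days: int,
                             other_restricted_days: int,
                             window_days: int = 14) -> int:
    """Total restricted-activity days in the recall window (0 to window_days)."""
    counts = (bed_days, work_loss_days,
              work_loss_and_bed_days, other_restricted_days)
    if any(count < 0 for count in counts):
        raise ValueError("day counts must be non-negative")
    if work_loss_and_bed_days > min(bed_days, work_loss_days):
        raise ValueError("overlap cannot exceed either of its components")
    # Days off work that were also bed days are already counted as bed days.
    total = (bed_days
             + (work_loss_days - work_loss_and_bed_days)
             + other_restricted_days)
    if total > window_days:
        raise ValueError("total exceeds the recall window")
    return total


# 3 bed days, 5 work-loss days (2 of them in bed), 4 other restricted days.
print(restricted_activity_days(3, 5, 2, 4))  # 10
```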

Assessment of Reproducibility

The reproducibility of the PCE was assessed in 129 (56%) of the patients (total, 232). This subsample included 20 patients with quadriplegia or paraplegia, 21 patients with speech disorders, and 88 outpatients with chronic illnesses. The retest was carried out within a mean time of 24 hours after the initial PCE administration. Four interviewers and two administrative procedures (self-administration, interviewer administration) were used. An equal number of subjects (one-third of patients) and the same methods of administration were assigned to each interviewer. When the PCE was self-administered, the interviewer read the instructions, gave the questionnaire to the patient, and made arrangements to pick up the form after completion on the next day. When the questionnaire was administered by an interviewer, he/she read the PCE to the patient and recorded the patient's responses. The questionnaire was administered twice, resulting in four possible combinations of administrative procedures, that is, self-administration/self-administration; interviewer administration/interviewer administration; self-administration/interviewer administration; and interviewer administration/self-administration.

Statistical Analysis

The capacity of the PCE to differentiate among subsamples was analyzed using analysis of variance (ANOVA). The relationships among construct validity measures, the overall PCE score, and the PCE category scores were analyzed by means of a correlation matrix [18]. As some of the validity measures were ordinal and others numerical, calculations are presented using Spearman or Pearson coefficients as appropriate. Test-retest reproducibility was assessed by calculating Pearson product-moment and intraclass correlation [19] coefficients between scores on the two administrations. The overall agreement, expressed as percent agreement (PA), was calculated on the basis of the number of times an item was equally endorsed in the two administrations [20], according to the formula: PA = number of agreements / (number of agreements + number of disagreements), in which coincidence of an item in both administrations was counted as an agreement and the opposite, as a disagreement. An overall PA coefficient and category PA coefficients were calculated.

It was anticipated that if the Spanish version was equivalent to the original questionnaire, correlations with validity measures and test-retest reliability of categories would be similar to those reported in the original SIP studies [11,12].
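The percent agreement formula can be illustrated with a short Python sketch. The function name and the example responses are hypothetical, and the sketch assumes that an item counts as an agreement whenever its checked or unchecked status is the same in both administrations; the source does not state whether items left unchecked on both occasions were counted this way.

```python
from typing import Sequence


def percent_agreement(test: Sequence[bool], retest: Sequence[bool]) -> float:
    """PA = agreements / (agreements + disagreements) across items."""
    if len(test) != len(retest):
        raise ValueError("both administrations must cover the same items")
    if not test:
        raise ValueError("at least one item is required")
    # An agreement is an item with the same status on both occasions.
    agreements = sum(a == b for a, b in zip(test, retest))
    disagreements = len(test) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical five-item example: only the third item changes status.
test_items = [True, False, True, True, False]
retest_items = [True, False, False, True, False]
print(percent_agreement(test_items, retest_items))  # 0.8
```

Under this reading, a PA of 0.8 means that 80% of the items received the same response in both administrations.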

RESULTS

Table 1 shows sociodemographic and health status characteristics of the total sample participating in the study. Rehabilitation patients were younger, had less education, had the lowest self-perceived overall health, and had the highest level of sickness and dysfunction. On the other hand, consumers showed the highest mean level of education and (as expected) reported better health status.

TABLE 1. Sociodemographic and health status characteristics of patients in the sample participating in the validity study, by type of illness

Characteristic                            Rehabilitation  Speech impairment  Chronic outpatients  Consumers     Total sample
                                          (n = 75)        (n = 25)           (n = 132)            (n = 120)     (n = 352)
Gender (% male)                           74.7            92.0               36.4                 50.0          53.1
Mean age (years) [SD]                     42.1 [16.0]     62.6 [12.7]        56.8 [16.1]          54.0 [14.8]   53.1 [16.6]
Age group distribution (%)
  18-34 years                             36.0            0.0                7.6                  6.7           12.8
  35-54 years                             42.7            32.0               40.2                 35.8          38.6
  55-74 years                             17.3            44.0               34.8                 57.5          39.5
  74+ years                               4.0             24.0               17.4                 0.0           [illegible]
Percentage employed                       33.3            44.0               28.7                 [illegible]   [illegible]
Education level distribution (%)
  University                              8.0             12.0               18.3                 37.5          22.2
  Secondary                               44.0            28.0               41.4                 39.1          40.3
  Primary                                 38.7            52.0               25.9                 23.4          29.5
  None complete                           9.3             8.0                14.4                 0.0           8.0
Self-perceived overall health (a)
  Very good                               4.0             16.0               0.8                  12.5          6.5
  Good                                    34.7            28.0               20.5                 64.2          38.9
  Fair                                    36.0            44.0               52.3                 20.8          37.5
  Poor                                    24.1            8.0                11.4                 1.7           10.5
  Very poor                               1.3             4.0                0.8                  0.8           1.1
Self-assessment of sickness (a)
  Not sick                                25.3            24.0               15.2                 56.7          32.3
  Slightly                                20.0            32.0               21.8                 31.7          29.3
  Moderately                              24.0            20.0               30.3                 7.5           20.5
  Quite-very                              25.3            24.0               8.3                  4.1           11.6
  Extremely                               5.3             0.0                0.0                  0.0           1.1
Self-assessment of dysfunction (a)
  Not at all                              2.7             16.0               10.6                 49.2          22.4
  Slightly                                22.7            24.0               28.1                 40.0          30.7
  Moderately                              30.8            16.0               30.3                 7.5           21.6
  Quite-very                              36.1            44.0               16.7                 3.3           18.2
  Extremely                               8.0             0.0                0.0                  0.0           1.7
Mean days of restricted activity in the
  past 2 weeks (SD)                       9.1 (6.5)       1.2 (2.4)          3.1 (5.2)            0.5 (2.1)     3.3 (5.6)

(a) Totals may not add up to 100 because of 19 chronic patients with missing data.

Assessment of Validity

Figure 1 shows the scores by categories, dimensions, and overall PCE score in the different subsamples. The four subsamples were different with regard to PCE scores. Rehabilitation patients scored higher than patients with speech disorders and patients with chronic diseases. Consumers scored the lowest. Differences in PCE scores were statistically significant (p < 0.01) among healthy versus ill subgroups and among illness subsamples, with the exception of patients with speech disorders and patients with chronic conditions.

In general, self-perceived overall health, self-assessment of sickness, and self-assessment of dysfunction were highly correlated with the overall PCE score (Table 2), although some differences across subsamples were found. The highest correlation with the PCE scores was obtained with patients' self-assessment of dysfunction for the total sample. Overall PCE scores of patients with chronic conditions and consumers correlated higher with self-assessment of dysfunction than did scores of speech pathology and rehabilitation patients (Table 2). In addition, lower correlations were obtained with the number of days of restricted activity and the clinicians' assessment of dysfunction for patients with speech impairment and patients with chronic illnesses. The three PCE categories that correlated significantly (p < 0.01) with clinicians' assessment of dysfunction (range of correlation coefficients between 0.07 and 0.25) were ambulation (0.25), recreation and pastimes (0.23), and communication (0.23).

The Index of ADL was positively correlated with the overall PCE score in the rehabilitation subsample, and also with category scores (range, -0.28 to 0.55). Correlations of the Index of ADL scores were moderate to high (p < 0.001) with body care and movement (0.55) and eating (0.45) (Table 3).

FIGURE 1. PCE dimension and category scores by known groups (mean percent scores; higher scores mean a high level of dysfunction). ANOVA for PCE percent scores. Adjusted F for multiple comparisons (p < 0.01), except for comparisons between groups 2 and 3 (p > 0.05). Group 1, rehabilitation; group 2, chronic conditions; group 3, speech pathology; group 4, nonpatients.

TABLE 2. Correlation coefficients (a) between construct validity measures and overall PCE score, by illness subsample

Criterion measure                             Rehabilitation  Speech impairment  Chronic outpatients  Consumers   Total sample
                                              (N = 75)        (N = 25)           (N = 113)            (N = 120)   (N = 333)
Self-perceived overall health (c)             ?               ?                  ?                    ?           0.53 (d) (-) (b)
Self-assessment of sickness                   ?               ?                  ?                    ?           0.51 (d) (0.54) (b)
Self-assessment of dysfunction                0.35 (e)        0.38 (e)           0.46 (d)             0.49 (d)    0.63 (d) (0.52) (b)
Index of Restricted Activity                  0.31 (e)        0.07               0.32 (d)             0.24 (e)    0.54 (d) (0.61) (b)
Index of Activities of Daily Living (c)       0.45 (d)        -                  -                    -           -
Clinician's assessment of dysfunction         -               0.29 (e)           ?                    -           0.29 (e) (0.49) (b)
Clinician's assessment of speech pathology    -               0.23               -                    -           0.23 (0.32) (b)

(a) Pearson product-moment correlation. (b) Original SIP correlations are included in parentheses. (c) Spearman rank-difference correlation. (d) p < 0.001. (e) p < 0.01.

Cells marked ? could not be assigned to a column from the source. The source preserves the subsample coefficients 0.59 (d), 0.37 (d), and 0.41 (d) for self-perceived overall health; 0.60 (d), 0.40 (d), and 0.43 (d) for self-assessment of sickness; and one further coefficient, 0.46 (d), whose row is not recoverable.


TABLE 3. Spearman rank-order correlation coefficients among the Index of ADL and category, dimension, and overall PCE score

Category                      r_s
Overall PCE                   0.45 (a)
Sleep and rest                0.11
Eating                        0.45 (a)
Work                          0.35 (b)
Home management               0.35 (a)
Recreation and pastimes       0.22 (b)
Physical dimension            0.50 (a)
  Ambulation                  0.28 (b)
  Mobility                    0.17
  Body care and movement      0.55 (a)
Psychosocial dimension        0.27 (b)
  Social interaction          0.31 (b)
  Alertness behavior          0.15
  Emotional behavior          0.31 (b)
  Communication               -0.28

(a) p < 0.001. (b) p < 0.01.

Assessment of Reproducibility

Patients selected for the retest had a lower level of education and reported worse health status than did patients of the initial sample (p < 0.05), but there were no statistically significant differences in age and sex. Sixteen (12.4%) of the 129 patients reported minor changes in overall health between test and retest. As shown in Table 4, patients scored higher in the work and recreation-and-pastimes categories and lower in the eating category. Test-retest correlations were high for the overall PCE (0.96) and for all the categories (range, 0.84 to 0.96). All retest scores were lower than test scores (improvement in health scores of all categories in stable patients, t = 3.52, p < 0.001). For the overall PCE, the intraclass correlation coefficient was 0.95 and the percent agreement (PA) was 0.71, that is, 71% of similar responses in both administrations. The PA varied between 0.61 and 0.81 according to categories (all coefficients were p < 0.001) (Table 4). Pearson coefficients and percent agreements were similarly high and significant (p < 0.001) among the subsamples of illness, types of administration, and interviewers (Table 5). Results remained unchanged when only patients reporting no changes in overall health status between test and retest were analyzed.

TABLE 4. Overall and category PCE test-retest scores and reproducibility correlations (a) and percent agreement (N = 129)

Category/dimension          Mean test   Mean retest      Difference       Pearson       Overall PA (b)
                            score       score (24 hr)    (test - retest)  correlation   (SD)
Category
  Sleep and rest            17.6        15.8             1.8              0.84          0.73 (0.34)
  Eating                    7.5         7.5              0.0              0.90          0.81 (0.35)
  Work                      49.3        46.6             2.7              0.87          0.77 (0.33)
  Home management           22.0        22.9             0.9              0.93          0.70 (0.35)
  Recreation and pastimes   24.2        22.5             1.7              0.91          0.67 (0.35)
  Ambulation                18.3        17.9             0.4              0.94          0.74 (0.31)
  Mobility                  12.8        12.0             0.8              0.94          0.72 (0.33)
  Body care and movement    15.4        14.9             0.5              0.96          0.71 (0.32)
  Social interaction        16.6        15.0             1.6              0.94          0.72 (0.31)
  Alertness behavior        20.2        18.6             1.6              0.92          0.61 (0.35)
  Emotional behavior        19.8        16.7             [illegible]      0.91          0.72 (0.30)
  Communication             19.5        19.0             0.5              0.94          0.77 (0.34)
Dimension
  Physical dimension        15.6        15.1             0.5              0.96          0.71 (0.25)
  Psychosocial dimension    17.5        15.8             1.7              0.94          0.70 (0.26)
Overall PCE                 18.4        17.3             1.1              0.96          0.71 (0.20)

(a) All coefficients were statistically significant (p < 0.001). (b) Overall percent agreement.

TABLE 5. Test-retest reproducibility correlation coefficients (a) and overall percent agreement by illness subsample, type of administration, and interviewer (N = 129)

                        Pearson correlation   Overall PA (SD)   N
Subsample
  Rehabilitation        0.98                  0.75 (0.13)       20
  Speech pathology      0.95                  0.77 (0.15)       21
  Chronic               0.96                  0.68 (0.21)       88
Administration type
  S-S                   0.96                  0.69 (0.20)       28
  S-I                   0.98                  0.76 (0.21)       33
  I-S                   0.95                  0.65 (0.18)       28
  I-I                   0.96                  0.72 (0.18)       40
Interviewer
  1                     0.99                  0.72 (0.20)       25
  2                     0.93                  0.66 (0.24)       42
  3                     0.96                  0.73 (0.14)       34
  4                     0.95                  0.74 (0.17)       28

Abbreviations: PA = percent agreement; S = self-administered; I = interviewer administered. (a) All coefficients were statistically significant (p < 0.001).

DISCUSSION

The present study shows that the PCE, the Spanish version of the SIP, has construct validity and high test-retest reliability. Construct validity of this instrument was assessed by correlating PCE scores with theoretically related criterion measures in samples of individuals with different types and severities of conditions. The high levels of test-retest reliability were influenced neither by the type of illness (0.96-0.98) nor by the type of administration (0.95-0.98) or the interviewer (0.93-0.99). The present study, together with previous work in which the Spanish version of the SIP demonstrated similar item scale values [6] and an internal structure similar to that of the original instrument (X. Badia, unpublished data), shows that the PCE should be considered comparable to the SIP for all practical purposes.

In general, the results of the present study are similar to those obtained with the original SIP questionnaire [11,12]. Correlations of the SIP and the PCE with patient self-assessment of sickness and dysfunction, the Index of ADL, and the Index of Restricted Activity were similar. On the other hand, correlations between PCE scores and clinician assessments of subject health status (0.23) and dysfunction (0.29) were somewhat lower than those reported for the original SIP (0.32 and 0.49, respectively). These differences were small and could be due either to differences in the clinical experience of the Spanish and U.S. health professionals who assessed patient dysfunction, or to differences in the samples of patients in the studies.

Although only minor differences were found between the construct validity of the PCE and that of the original SIP, we cannot exclude that remaining cultural differences may still exert an influence on the comparability of results. Cultural issues relating to the adaptation of the PCE have been extensively discussed in a previous study by our group [6].

Although the original work was replicated as closely as possible, our sample differed from that of the original SIP work in several aspects. We included both inpatients and outpatients in a rehabilitation service for handicapped people instead of only inpatients of a rehabilitation medicine service. Nevertheless, the average levels of dysfunction of both groups of patients were similar (the PCE score was nearly 25%). For logistical reasons, we recruited outpatients attending speech therapy instead of inpatients with recent surgery as in the American study. This probably accounted for the similar overall PCE scores between patients with speech impairment and patients with chronic conditions, whereas in the original study the former were closer to rehabilitation patients.

It should be noted that in the present study, overall mean PCE scores showed different degrees of relationship with the theoretically comparable validity measures according to the type of illness. For rehabilitation patients (most of them suffering from quadriplegia because of spinal cord injuries), PCE scores were more highly correlated with self-reported dysfunction (0.35) than with self-reported sickness (0.26). The opposite was found in the sample of patients with speech disorders. This is not surprising given the differential impact of the illness in each group. While rehabilitation patients scored higher in the physical dimension (i.e., had more physical dysfunction), and also in work and home management, patients with speech impairment scored high in communication and work, but lower in the remaining categories (see Figure 1).

A surprising finding was that 2.7% of patients with quadriplegia or paraplegia and 10% of patients with speech disorders reported no dysfunction, and that 22.7 and 24% of them, respectively, reported only slight dysfunction. Patients with a longstanding condition may have adapted to a lower level of function and changed their expectations. Thus, their perception of illness and dysfunction may have been modified [21].

As in the original study, clinician ratings of dysfunction differed from patient ratings. This suggests that generic health status measures may be important in providing information not routinely gathered by physicians. This information would be potentially relevant when clinical decision making must be based on the impact of the disease.

The reproducibility of the PCE was high in terms of item agreement (percent agreement) and category, dimension, and overall scores. The reliability was not affected by the interviewer, the type of illness, or the type of administration. This may be because interviewers had been extensively trained and assessed before data collection using the original SIP manual. It may also be because the retest was conducted within 24 hours, thus reducing the possibility of changes in health status between the two observations and/or enhancing recall of previous responses. All test scores were higher than the retest scores, indicating that systematic differences may exist in spite of the stability of the patients and the stability of the instrument. This has also been reported by other authors [19] and may be the effect of some kind of reactivity to the questionnaire (in the second administration the responder was already familiar with the instrument), placebo effects, or regression to the mean.

We consider the present study to be a final step in the process of cross-cultural adaptation of a health profile into another language and culture. The original purpose of the project was to obtain a Spanish version that would be equivalent to the original SIP questionnaire [22]. Previous studies by our group have shown that the Spanish version of the SIP had item and scale equivalence [6] and a similar internal structure (X. Badia and J. Alonso, unpublished data). The present study shows that the PCE has a degree of validity and reliability similar to that of the original instrument. All PCE studies have replicated the work done for the development of the original instrument and were planned in accordance with, and under the advice of, the American SIP researchers. These steps in the research process were needed to achieve comparability between the original and the adapted versions [23]. Because the Spanish version of the SIP yielded results that can be properly compared to data obtained using the original SIP questionnaire, the PCE is ready to be used in international studies dealing with measures of health-related quality of life.

We are indebted to the late Professor M. Bergner, Ph.D., for her guidance and support of the Spanish adaptation of the SIP. We thank Professor D. Patrick, Ph.D., M.S.P.H., for his valuable comments on an earlier version of the manuscript, N. Camurilku for help in data collection, and Marta Pulido, M.D., and Dave McFarlane, B.Sc., for editing the manuscript and providing editorial assistance.

REFERENCES

1. Hunt SM, Alonso J, Bucquet D, Niero M, Wiklund I, McKenna S. Cross-cultural adaptation of health measures. Health Policy 1991; 19: 33-44.
2. Bullinger M, Anderson R, Cella D, Aaronson N. Developing and evaluating cross-cultural instruments from minimum requirements to optimal models. Qual Life Res 1994; 2: 451-459.
3. Hui CH, Triandis HC. Measurement in cross-cultural psychology. A review and comparison of strategies. J Cross-Cult Psychol 1985; 16: 131-152.
4. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: Development and final revision of a health status measure. Med Care 1981; 19: 787-805.
5. Badia X, Alonso J. Adaptación de una medida de la disfunción relacionada con la enfermedad: La versión Española del Sickness Impact Profile. Med Clin (Barc) 1993; 102: 90-95.
6. Badia X, Alonso J. Re-scaling the Spanish version of the Sickness Impact Profile: An opportunity for the assessment of cross-cultural equivalence. J Clin Epidemiol 1995; 48: 949-957.
7. Guyatt GH, Kirschner B, Jaeschke R. Measuring health status: What are the necessary measurement properties? J Clin Epidemiol 1992; 45: 1341-1345.
8. De Bruin AF, Witte LP, Stevens F, Diederiks PM. Sickness Impact Profile: The state of the art of a generic functional measure. Soc Sci Med 1992; 35: 1003-1014.
9. Hays RD, Anderson R, Revicki D. Psychometric considerations in evaluating health-related quality of life measures. Qual Life Res 1994; 2: 441-449.
10. Hunt SM. Cross-cultural comparability of quality of life measures. Drug Inf J 1993; 27: 395-400.
11. Bergner M, Bobbitt RA, Pollard WE, Martin DP, Gilson BS. The Sickness Impact Profile: Validation of a health status measure. Med Care 1976; 14: 57-67.
12. Pollard WE, Bobbitt RA, Bergner M, Martin DP, Gilson BS. The Sickness Impact Profile: Reliability of a health status measure. Med Care 1976; 14: 146-155.
13. Carter WB, Bobbitt RA, Bergner M, Gilson BS. Validation of an interval scaling: The Sickness Impact Profile. Health Serv Res 1976; 11: 516-528.
14. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies of illness in the aged. The index of ADL: A standardized measure of biological and psychosocial function. JAMA 1963; 185: 914-919.
15. Wilder CS. Disability days: United States, Vital and Health Statistics Series 10-No. 143. DHHS Publ. No. (PHS) 83-1571. Public Health Service. U.S. Government Printing Office, Washington, D.C., July 1983.
16. Alonso J, Antó JM. Enquesta de Salud de Barcelona 1986. Ajuntament de Barcelona, Barcelona, Spain, 1986.
17. Gilson BS, Gilson JS, Bergner M, Bobbitt RA, Kressel S, Pollard WE, Vesselago M. The Sickness Impact Profile. Development of an outcome measure of health care. Am J Public Health 1975; 65: 1304-1310.
18. Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 1959; 56: 81-105.
19. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Control Clin Trials 1991; 12: 142S-158S.
20. Pollard WE, Bobbitt RA, Bergner M. Examination of variable errors of measurement in a survey-based social indicator. Soc Ind Res 1978; 5: 279-301.
21. Sackett DL, Torrance GW. The utility of different health states as perceived by the general public. J Chron Dis 1978; 31: 697-702.
22. Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: Literature review and proposed guidelines. J Clin Epidemiol 1993; 46: 1417-1432.
23. Alonso J, Prieto L, Antó JM. The Spanish version of the Nottingham Health Profile: A review of adaptation and instrument characteristics. Qual Life Res 1994; 3: 385-393.