Journal of Clinical Epidemiology 65 (2012) 1227–1235
The Bengali Short Form-36 was acceptable, reliable, and valid in patients with rheumatoid arthritis
Abu H.M. Feroz (a), Md. Nazrul Islam (b,*), Peter Meindert ten Klooster (c), Mahmud Hasan (b), Johannes J. Rasker (c), Syed A. Haq (b)
(a) Department of Medicine, Shaheed Suhrawardy Medical College, Dhaka, Bangladesh
(b) Rheumatology Wing, Department of Medicine, Bangabandhu Sheikh Mujib Medical University, Dhaka, Bangladesh
(c) Department of Psychology, Health & Technology, University of Twente, Enschede, The Netherlands
Accepted 15 May 2012
Abstract
Objective: To develop a culturally adapted Bengali version of the Short Form-36 (SF-36) Health Survey and to test its acceptability, reliability, and validity in patients with rheumatoid arthritis (RA).
Study Design and Setting: The US English SF-36 was translated into Bengali following established cross-cultural adaptation procedures. The questionnaire was interviewer administered to 125 consecutive outpatients with RA and readministered after 2 weeks to 40 randomly selected patients.
Results: Most participants (86.4%) did not have any problem in understanding the Bengali SF-36, and 98.4% of the questionnaires were fully completed. Only the role-physical and role-emotional scales showed substantial floor and ceiling effects. Principal component analysis confirmed the hypothesized two-factor structure, and tests of scaling assumptions were 100% successful for all scales except physical functioning (98.8%) and general health (77.5%). Cronbach's α was 0.78 or higher and the test–retest reliability was high (r ≥ 0.82) for all scales. Correlations with other disease activity parameters were generally as expected, and summary scores were able to discriminate between relevant subgroups.
Conclusion: The interviewer-administered Bengali SF-36 appears to be an acceptable, reliable, and valid instrument for measuring health-related quality of life in Bangladeshi patients with RA. The questionnaire should be further evaluated in people from the general population and in patients with different medical conditions. © 2012 Elsevier Inc. All rights reserved.
Keywords: Health-related quality of life; Psychometrics; Rheumatoid arthritis; Short Form-36 Health Survey; Reliability; Validity
None of the authors have commercial or other associations that might pose a conflict of interest in connection with the work.
* Corresponding author. Tel.: +880-1678112396; fax: +880-29668647. E-mail address: [email protected] (Md.N. Islam).
0895-4356/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jclinepi.2012.05.004

1. Introduction
Measurements of health-related quality of life (HRQOL) are increasingly being used in clinical trials and health services research [1,2] and are particularly important for measuring the impact of chronic diseases [3]. Questionnaires are the most commonly used technique for collecting health-related information in clinical studies as these are inexpensive, easy, and simple to apply and may be used to measure a large number of health outcomes.

HRQOL measures can be divided into generic and specific measures [4,5]. Generic measures are not specific to any disease or population, and such measures can be used across various diseases. Specific instruments are specific to a disease, to a population of patients, to a certain function, or to a problem. Major advantages of generic instruments include their ability to assess a variety of health domains and to compare HRQOL across populations, regardless of underlying conditions. Although specific measures have the potential to be more sensitive in detecting changes over time [4,5], recent studies in rheumatoid arthritis (RA) showed no difference in responsiveness between generic and disease-specific HRQOL instruments [6,7].

Among the several generic measures, the Medical Outcome Study 36-item Short-Form (SF-36) Health Survey developed by Ware and Sherbourne [8] in the United States is the most widely used [9–12]. The SF-36 has been extensively validated in both general and specific populations, including many studies in RA [13–18], and the usefulness of the instrument in estimating disease burden is illustrated in numerous articles describing more than 130 diseases and conditions [19].
What is new?
The Short Form-36 (SF-36) was translated and validated for use in Bengali patients with rheumatoid arthritis.
The interviewer-administered Bengali SF-36 demonstrated psychometric properties similar to the original US English version and other international translations.
The questionnaire should be evaluated and used in people from the general population and in patients with different medical conditions to assess and compare the health status and impact of different diseases in Bangladeshi patients.
With the growing international collaboration in clinical research, the need for cross-culturally applicable instruments for outcome measures has also increased [20]. One approach to meet this need is to translate and culturally adapt measures originally developed in English for use in a different cultural context [21]. The SF-36 has been translated with cultural adaptation into different languages, such as Korean, Turkish, German, Hebrew, Swedish, French, Danish, Norwegian, Japanese, and Italian, and tested in more than 40 countries [19]. However, it is not yet available in Bangladesh in the Bengali language. The aim of this study was to develop and evaluate a culturally adapted Bengali version of the SF-36 Health Survey for use in Bengali-speaking people. The developed questionnaire was applied in Bengali patients with RA to study its acceptability, reliability, and validity.
2. Materials and methods
2.1. Original SF-36 questionnaire
The original SF-36 questionnaire contains 36 items, which are grouped into eight multi-item scales measuring physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH) perceptions, vitality (VT), social functioning (SF), role-emotional (RE), and mental health (MH). An additional item reports health transition over the past year [8].

2.2. Translation and cultural adaptation of SF-36 questionnaire
We followed the guidelines proposed by Beaton et al. [22] for translation and cross-cultural adaptation of HRQOL measures. Two forward translations of the US English SF-36 into Bengali were done by two translators (one a rheumatologist and the other a naive translator, both with Bengali as their mother tongue). A synthesis version was developed by the translators and other authors. The synthesized version was back translated into English by a university teacher who had worked in the UK for 6 years and a professional English translator. Both were blind to the original US version and naive to the concept measured. An expert committee that included health professionals (two rheumatologists, two psychiatrists, a biostatistician, a gastroenterology physician, a general medicine physician, and an epidemiologist) and the translators involved in the process reviewed all translations and verified the semantic, idiomatic, experiential, and conceptual equivalence between the source and Bengali version. Consensus was reached on any discrepancy, and a preliminary Bengali version of the questionnaire was developed for field-testing to check face and content validity.

The translation was straightforward for most of the items and response choices except for the PF items 3b, 3c, 3h, and 3i, which refer to specific daily physical activities, and the MH item 9b. Items 3b (moderate activities such as moving a table, pushing a vacuum cleaner, bowling, or playing golf) and 3c (lifting or carrying groceries) were culturally adapted to include activities commonly practiced by the Bangladeshi people and changed into “moderate activities such as pushing a medium size bucket full of water, throw a cricket ball, or sweep a floor with a broom” and “Carrying or lifting 8–10 kg weight.” Items 3h (walking several blocks) and 3i (walking one block) were modified to “several kilometers” and “1 km,” as was done by others [23]. For item 9b (Have you been a very nervous person?), the phrase “nervous person” was translatable in Bengali. However, because the word “nervous” is also used and understood by our people, we kept both the Bengali translation and the word “nervous” as alternatives in this item.

This resulted in a prefinal Bengali version of SF-36 that included five modified questions.

2.3. Field testing
The prefinal Bengali version of the SF-36 Health Survey questionnaire was pretested in a convenience sample of 30 patients with RA enrolled from the outpatient rheumatology clinic of the Bangabandhu Sheikh Mujib Medical University (BSMMU). After informed consent, the questionnaire was interviewer administered to each subject, who was then probed on what he or she thought each item meant and on the chosen response. The vast majority of patients did not have problems answering the original questions. Also, the five adapted questions were understood by all the respondents. Four items (9a, 9c, 9f, and 9g) were not understood by most respondents and were retranslated into Bengali and back translated into English. Problems with these items were mostly related to the quite literal translation of expressions such as “full of pep” and “down in the dumps.” The retranslated and added
questions were adapted into the final version after discussion with the expert committee.

2.4. Psychometric evaluation of the Bengali SF-36
2.4.1. Patients and data collection
A new sample of 125 patients with RA was recruited for the psychometric study. This sample size was large enough to detect a medium-sized Spearman's correlation of 0.30 as being significantly different from 0 (P < 0.05, two-sided) with a power of 0.90 (StudySize 2.0; CreoStat HB, Sweden). For the test–retest reliability, 40 patients were randomly selected, which was sufficient to detect at least a large correlation of 0.50 (P < 0.05, two-sided) with a power of 0.94.

The study was approved by the Ethics Committee of BSMMU and performed in accordance with the principles of the Declaration of Helsinki. As most patients were illiterate, the study was explained verbally to the patients and their families, and informed verbal consent was obtained from them before enrollment.

RA outpatients were consecutively recruited between January and December 2005. All participants fulfilled the 1987 American College of Rheumatology (ACR) criteria for RA [24] and were able to understand and cooperate with the study procedure. Patients having a history of coexisting major illness or psychological illness, or who were unwilling to provide verbal informed consent, were excluded.

The subjects were interviewed by administering the Bengali SF-36. Additional information on age, gender, educational level, and occupation of the respondents was collected. The ACR core set of RA disease activity measures [25] was administered by an experienced rheumatologist. This included a tender joint count, swollen joint count, patient's assessment of pain on a visual analog scale (VAS), and the patient's and physician's global assessments of disease activity on a VAS. Patient's assessment of physical function was assessed using a validated Bengali version of the Health Assessment Questionnaire (HAQ) [26].
Acute phase reactions were measured by the erythrocyte sedimentation rate (Westergren ESR; mm/hour). Results from rheumatoid factor (RF) tests were obtained from the patient's hospital record. A positive RF was defined as a titer of the Rose–Waaler reaction of ≥64 on at least one occasion.

2.4.2. Test–retest assessment
A random sample of 40 patients was interviewed twice using the SF-36 with an interval of 2 weeks. During those 2 weeks, no intervention was given.

2.4.3. Scoring of the SF-36
SF-36 scales were scored using Likert's method of summated ratings [27], which assumes that the items within a scale can be summated without score standardization or item weighting [19]. Scores for some of the items needed to be recoded so that all the items are scored in the same direction [28]. If less than 50% of the items were missing
in a subscale, a person-specific estimate was inserted by averaging the available item responses and substituting this value for the missing items. If more than 50% of the items were missing, a scale score was not calculated [29]. Raw scale scores were summated and linearly transformed into a 0–100 scale, with higher scores indicating better health. Additionally, the eight scales were aggregated into a physical component summary (PCS) and a mental component summary (MCS), which were computed as z-scores to facilitate comparison with the 1990 US population norms [30].

2.4.4. Statistical analysis
Psychometric evaluations were performed following the approach developed by the International Quality of Life Assessment (IQOLA) project [27], which has been used in many other translations of the SF-36 [17,23,31–35]. Descriptive statistics were used to examine the completeness of the data and to characterize the score distributions, including scale ranges, means, standard deviations, and floor and ceiling effects.

Internal construct validity was assessed by principal component analysis with orthogonal rotation of the scales and by examining Spearman's correlations between the scales. Two dimensions (physical health and MH) have been shown to underlie the structure of the original US version of the SF-36 [28]. Next, Spearman's correlations between the scales and the dimensions were compared with the hypothesized measurement model of the SF-36 [23]. Finally, Spearman's correlations between the scales were examined. It was hypothesized that scales that were conceptually related (physical health or MH, respectively) would correlate substantially (>0.40). Excessively high correlations between scales (>0.70) were considered undesirable because they would call into question the distinctiveness of the concepts being measured.
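The scoring rule in Section 2.4.3 (person-specific mean imputation followed by a linear 0–100 transformation) can be sketched as follows. This is a minimal illustration, not the authors' scoring program: the function name `score_scale` is hypothetical, the items are assumed to be already recoded so that higher values mean better health, all items of a scale are assumed to share one response range, and a scale with exactly half of its items answered is assumed to be scored (the text does not state that boundary case).

```python
from typing import Optional, Sequence


def score_scale(
    items: Sequence[Optional[float]],
    item_min: float,
    item_max: float,
) -> Optional[float]:
    """Return a 0-100 scale score, or None when too many items are missing.

    `items` holds the recoded responses of one person to one scale;
    None marks a missing response.
    """
    answered = [x for x in items if x is not None]
    # Assumption: the scale is scored when at least half of its items are answered.
    if len(answered) * 2 < len(items):
        return None
    # Person-specific estimate: mean of the items this person did answer.
    person_mean = sum(answered) / len(answered)
    raw = sum(person_mean if x is None else x for x in items)
    lowest = item_min * len(items)    # lowest possible raw score
    highest = item_max * len(items)   # highest possible raw score
    # Linear transformation of the raw sum to the 0-100 range.
    return 100.0 * (raw - lowest) / (highest - lowest)


if __name__ == "__main__":
    # Ten three-level items (1-3), as in a physical functioning style scale,
    # with one missing response.
    responses = [1, 2, 3, 2, 1, 2, 3, None, 2, 2]
    print(score_scale(responses, item_min=1, item_max=3))
```

With ten three-level items the raw sum runs from 10 to 30, which gives the 21 possible raw scores listed for physical functioning in Table 1.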
Scaling assumptions were examined using the item discriminant validity approach, which is based on a comparison of the magnitude of the correlation of an item with its hypothesized scale as compared with other scales [27]. Item discriminant validity is supported when an item correlates significantly more highly (i.e., by two standard errors [SEs] or more) with its own scale (corrected for overlap) than with the other scales.

Internal consistency was examined by Cronbach's alpha coefficients and corrected item-scale correlations. Cronbach's alpha coefficient measures the overall correlation between items within a scale and is considered acceptable when >0.70. Item-scale correlation assesses the extent to which an item is related to the remaining items of its scale and should exceed 0.40.

Test–retest reliability of each scale was assessed by Spearman's correlations between the scores from the 40 patients who were interviewed twice. As with Cronbach's alpha, test–retest reliability coefficients >0.70 were considered adequate for group comparisons [27].

Besides the IQOLA approach for examining the internal validity and reliability, external construct validity was assessed by Spearman's correlations between scores on the
SF-36 scales and the ACR measures of disease activity [18]. Additionally, the method of known-groups comparisons was used to examine whether the PCS and MCS were able to distinguish between patient groups based on the disease duration and positive or negative RF [18]. It was expected that scores would be significantly better for patients with shorter disease duration and negative RF.
3. Results
3.1. Patient characteristics
A total of 125 consecutive Bengali patients with RA agreed to participate in the study. There were 95 (76%) female and 30 (24%) male patients with a mean age of 41.8 ± 13.1 (range: 17–70) years. Ninety patients (72%) were RF positive. Illness duration varied between 3 months and 25 years.

3.2. Acceptability of the final Bengali version of SF-36
Missing values for the individual items of the SF-36 were very low, ranging from 0% for most items to 1.6% for one of the PF items (Supplementary Table 1 at www.jclinepi.com) and did not result in any missing values for the scale scores. Seventeen patients (13.6%) did not fully understand and 28 patients (22.4%) had difficulty with answering some of the items. A small number (16, 10, and 11, respectively) of patients commented that the questions on SF, PF, or RP were not relevant to them. The main reason was that they had few social or physical activities. No patients minded answering any of the questions.

3.3. Response distribution
All values were observed for each item (Supplementary Table 1 at www.jclinepi.com). Patients showed restrictions on all physical scales (PF, RP, BP, and GH; Table 1). The items within three of the scales primarily measuring MH (SF, RE, and MH) were generally scored at higher levels, although the VT scale departed somewhat from this pattern.
3.4. Distribution of SF-36 scores
Mean scale scores ranged from 28.80 (RP) to 68.38 (MH) (Table 1). A full range of scores was observed in all the scales except PF and MH, in which there was no scoring at 100. The PF, RP, GH, and VT scales were slightly skewed to the left, whereas the SF, RE, and MH scores were somewhat skewed to the right. The percentage of patients scoring at the lowest level was pronounced in the RP and RE scales (40.8 and 29.6%, respectively). For the RE scale, the percentage scoring at the highest level was also pronounced (42.4%).

3.5. Internal construct validity
Principal component analysis identified two underlying factors, one representing the “physical” aspects of health and one representing the “mental” aspects of health, which together explained 69% of the total variance. Correlations between the scales and their components largely confirmed the measurement model of the SF-36 (Table 2). As expected, the physical scales (PF, RP, and BP) had higher correlations with the PCS than with the MCS. The mental scales RE and MH showed the opposite pattern, with low correlations with the PCS and high correlations with the MCS. The VT scale correlated moderately to substantially with both the PCS and MCS. However, contrary to the hypothesized measurement model, the GH scale correlated only weakly with the MCS and the SF scale correlated higher with the PCS than with the MCS.

The correlations between the scales (Table 3) ranged from 0.21 (PF and MH) to 0.69 (PF and BP), most well below the preset 0.70 limit for distinctiveness of the concept being measured. Generally, higher correlations were found between the scales within the same dimension (physical or MH) and lower correlations between the scales of different dimensions. Finally, PCS and MCS were negligibly correlated (r = 0.14, P > 0.10), supporting the orthogonal nature of the summary scores.
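The floor and ceiling percentages in Section 3.4 are simply the shares of respondents with the worst and best possible scale score. As a small worked illustration (the function name `floor_ceiling` and the example scores are hypothetical), the calculation could look like this; in the study, a floor of 40.8% among 125 patients corresponds to 51 respondents.

```python
from typing import Sequence, Tuple


def floor_ceiling(
    scores: Sequence[float],
    lowest: float = 0.0,
    highest: float = 100.0,
) -> Tuple[float, float]:
    """Return (% at the worst possible score, % at the best possible score)."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == lowest)
    at_ceiling = sum(1 for s in scores if s == highest)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


if __name__ == "__main__":
    # Hypothetical transformed scale scores of five respondents.
    print(floor_ceiling([0, 0, 50, 100, 25]))
```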
Table 1. Descriptive statistics of the SF-36 scale scores (n = 125)
Scale | # Items | Levels (a) | Range | Mean (SD) | 95% CI (b) | % Floor (c) | % Ceiling (d)
Physical functioning | 10 | 21 | 0–90 | 32.20 (24.89) | 27.79 to 36.60 | 8.0 | 0
Role-physical | 4 | 5 | 0–100 | 28.80 (31.43) | 23.23 to 34.36 | 40.8 | 8.0
Bodily pain | 2 | 10 | 0–100 | 37.41 (22.94) | 33.35 to 41.48 | 7.2 | 0.8
General health | 5 | 21 | 0–100 | 41.60 (19.76) | 38.10 to 45.09 | 0.8 | 0.8
Vitality | 4 | 21 | 0–100 | 46.24 (18.99) | 42.87 to 49.60 | 2.4 | 1.6
Social functioning | 2 | 9 | 0–100 | 51.80 (30.35) | 46.42 to 57.17 | 3.2 | 12.8
Role-emotional | 3 | 4 | 0–100 | 55.45 (43.16) | 47.82 to 63.10 | 29.6 | 42.4
Mental health | 5 | 26 | 0–96 | 68.38 (16.00) | 65.55 to 71.21 | 0.8 | 0
Physical component summary (e) | | | | −2.18 (1.19) | −2.39 to −1.97 | |
Mental component summary (e) | | | | 1.20 (1.08) | 1.01 to 1.39 | |
Abbreviations: SF-36, Short Form-36; SD, standard deviation; CI, confidence interval.
(a) Number of possible raw scores. (b) 95% confidence interval. (c) Percentage of respondents with worst possible score. (d) Percentage of respondents with best possible score. (e) z-Scores: normative population values are mean = 0, SD = 1.
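The text does not state how the 95% confidence intervals in Table 1 were computed. As a plausibility check only, and assuming a standard t-based interval for a mean (an assumption, not the authors' stated method), the physical functioning row with mean 32.20, SD 24.89, and n = 125 gives:

```latex
\[
\bar{x} \pm t_{0.975,\,124}\,\frac{s}{\sqrt{n}}
  = 32.20 \pm 1.979 \times \frac{24.89}{\sqrt{125}}
  \approx 32.20 \pm 4.41
  \quad\Longrightarrow\quad (27.79,\ 36.61)
\]
```

This agrees with the tabulated interval of 27.79 to 36.60 up to rounding of the reported mean and SD.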
Table 2. Correlations between scales and components expected on the basis of the SF-36 measurement model and actual Spearman's correlations between scale scores and rotated principal components (n = 125)
Scale | Observed correlation, physical component | Observed correlation, mental component
Physical functioning | 0.76 | 0.28
Role-physical | 0.72 | 0.33
Bodily pain | 0.73 | 0.23
General health | 0.69 | 0.14*
Vitality | 0.65 | 0.44
Social functioning | 0.65 | 0.49
Role-emotional | 0.32 | 0.76
Mental health | 0.12* | 0.82
Abbreviation: SF-36, Short Form-36.
Hypothesized correlations are classified as strong association (r ≥ 0.70), moderate-to-substantial association (0.30 < r < 0.70), or weak association (r ≤ 0.30).
P-value < 0.05 for all correlation coefficients, except for the values marked with an asterisk symbol (P-values < 0.10; P < 0.20).
3.6. Tests of scaling assumptions
Standard deviations of the items within the scales were generally comparable (Table 4). Corrected correlations between the items and their hypothesized scales ranged from 0.36 to 0.76 and were 0.4 or above for all items except for two items from the GH scale. Item-scale correlations within each scale were roughly equal except for items 3a and 4a. Generally, items were significantly more highly correlated with their hypothesized scale (i.e., >2 SE) than with the other scales. Exceptions to this were item 3a from the PF scale and items 1 and 11c from the GH scale. Consequently, the scaling success rate on discriminant validity was 100% for all scales except PF and GH (Table 5). Cronbach's alpha ranged from 0.78 (GH) to 0.94 (PF) and exceeded the 0.70 standard for all scales.

3.7. Test–retest reliability
Test–retest reliability ranged from 0.82 to 0.92 (Table 5) and was adequate for all scales.

3.8. External construct validity
Correlations between the subscales and summary scores of the SF-36 and the ACR disease activity parameters are shown in Table 6. Correlations ranged from 0.14 (MH vs. ESR) to 0.87 (PF vs. HAQ). The PCS and MCS also correlated strongly with the measures of pain, the HAQ, and disease activity. Scores for both the PCS and MCS were significantly (P < 0.05) better for patients with a disease duration of less than
Table 3. Spearman's correlations between the SF-36 scales (n = 125)
Scale | PF | RP | BP | GH | VT | SF | RE | MH
Physical functioning (PF) | 1.00 | | | | | | |
Role-physical (RP) | 0.51 | 1.00 | | | | | |
Bodily pain (BP) | 0.69 | 0.52 | 1.00 | | | | |
General health (GH) | 0.66 | 0.41 | 0.53 | 1.00 | | | |
Vitality (VT) | 0.61 | 0.54 | 0.54 | 0.62 | 1.00 | | |
Social functioning (SF) | 0.48 | 0.41 | 0.44 | 0.47 | 0.53 | 1.00 | |
Role-emotional (RE) | 0.37 | 0.27 | 0.36 | 0.46 | 0.41 | 0.40 | 1.00 |
Mental health (MH) | 0.21 | 0.29 | 0.23 | 0.41 | 0.42 | 0.45 | 0.54 | 1.00
Abbreviation: SF-36, Short Form-36.
P-value < 0.05 for all correlation coefficients.
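Table 4. Item means (SD) and Spearman's correlations (a) with scales (n = 125)
Scale | Item | Mean (SD) | PF | RP | BP | GH | VT | SF | RE | MH
Physical functioning | 3a | 1.20 (0.46) | 0.49 | 0.43 | 0.42 | 0.37 | 0.27 | 0.21 | 0.19 | 0.18
Physical functioning | 3b | 1.52 (0.63) | 0.75 | 0.52 | 0.58 | 0.48 | 0.42 | 0.45 | 0.31 | 0.32
Physical functioning | 3c | 1.55 (0.71) | 0.70 | 0.50 | 0.55 | 0.46 | 0.41 | 0.38 | 0.29 | 0.29
Physical functioning | 3d | 1.50 (0.72) | 0.76 | 0.45 | 0.48 | 0.43 | 0.39 | 0.42 | 0.25 | 0.31
Physical functioning | 3e | 1.92 (0.75) | 0.72 | 0.38 | 0.44 | 0.39 | 0.32 | 0.36 | 0.22 | 0.26
Physical functioning | 3f | 1.68 (0.69) | 0.64 | 0.40 | 0.45 | 0.39 | 0.34 | 0.32 | 0.24 | 0.26
Physical functioning | 3g | 1.40 (0.70) | 0.76 | 0.48 | 0.52 | 0.49 | 0.37 | 0.44 | 0.27 | 0.28
Physical functioning | 3h | 1.76 (0.74) | 0.75 | 0.39 | 0.47 | 0.44 | 0.35 | 0.43 | 0.22 | 0.29
Physical functioning | 3i | 2.34 (0.59) | 0.65 | 0.32 | 0.44 | 0.39 | 0.31 | 0.37 | 0.21 | 0.26
Physical functioning | 3j | 2.11 (0.51) | 0.61 | 0.40 | 0.50 | 0.43 | 0.37 | 0.42 | 0.25 | 0.29
Role-physical | 4a | 1.34 (0.42) | 0.34 | 0.53 | 0.41 | 0.34 | 0.33 | 0.36 | 0.42 | 0.27
Role-physical | 4b | 1.22 (0.48) | 0.45 | 0.69 | 0.52 | 0.45 | 0.48 | 0.42 | 0.43 | 0.35
Role-physical | 4c | 1.46 (0.48) | 0.51 | 0.70 | 0.54 | 0.44 | 0.39 | 0.36 | 0.32 | 0.23
Role-physical | 4d | 1.24 (0.31) | 0.56 | 0.72 | 0.57 | 0.50 | 0.44 | 0.44 | 0.40 | 0.30
Bodily pain | 7 | 2.79 (1.03) | 0.54 | 0.53 | 0.75 | 0.54 | 0.48 | 0.44 | 0.32 | 0.38
Bodily pain | 8 | 2.49 (1.05) | 0.66 | 0.62 | 0.75 | 0.60 | 0.56 | 0.62 | 0.42 | 0.48
General health | 1 | 2.27 (0.73) | 0.62 | 0.54 | 0.66 | 0.61 | 0.58 | 0.52 | 0.35 | 0.48
General health | 11a | 2.47 (1.44) | 0.26 | 0.30 | 0.27 | 0.36 | 0.34 | 0.32 | 0.26 | 0.30
General health | 11b | 2.32 (1.30) | 0.40 | 0.36 | 0.44 | 0.58 | 0.46 | 0.41 | 0.23 | 0.37
General health | 11c | 2.84 (1.26) | 0.31 | 0.28 | 0.31 | 0.38 | 0.32 | 0.26 | 0.17 | 0.27
General health | 11d | 2.57 (1.17) | 0.48 | 0.43 | 0.54 | 0.66 | 0.56 | 0.45 | 0.24 | 0.42
Vitality | 9a | 2.97 (1.15) | 0.41 | 0.44 | 0.45 | 0.51 | 0.65 | 0.46 | 0.34 | 0.50
Vitality | 9e | 3.30 (1.05) | 0.48 | 0.50 | 0.55 | 0.60 | 0.73 | 0.54 | 0.41 | 0.69
Vitality | 9g | 3.60 (1.42) | 0.37 | 0.37 | 0.42 | 0.50 | 0.68 | 0.52 | 0.31 | 0.48
Vitality | 9i | 3.10 (1.44) | 0.34 | 0.35 | 0.43 | 0.48 | 0.70 | 0.50 | 0.31 | 0.46
Social functioning | 6 | 2.54 (1.17) | 0.44 | 0.42 | 0.49 | 0.49 | 0.53 | 0.73 | 0.41 | 0.58
Social functioning | 10 | 2.69 (1.26) | 0.51 | 0.49 | 0.57 | 0.53 | 0.59 | 0.73 | 0.46 | 0.56
Role-emotional | 5a | 1.61 (0.52) | 0.27 | 0.46 | 0.34 | 0.31 | 0.34 | 0.38 | 0.65 | 0.46
Role-emotional | 5b | 1.58 (0.52) | 0.32 | 0.44 | 0.38 | 0.33 | 0.42 | 0.47 | 0.68 | 0.52
Role-emotional | 5c | 1.42 (0.51) | 0.22 | 0.31 | 0.29 | 0.23 | 0.28 | 0.33 | 0.57 | 0.38
Mental health | 9b | 4.58 (1.15) | 0.25 | 0.22 | 0.32 | 0.32 | 0.40 | 0.43 | 0.41 | 0.67
Mental health | 9c | 3.88 (1.54) | 0.29 | 0.28 | 0.37 | 0.37 | 0.43 | 0.51 | 0.46 | 0.69
Mental health | 9d | 3.76 (1.33) | 0.30 | 0.30 | 0.38 | 0.44 | 0.52 | 0.49 | 0.44 | 0.72
Mental health | 9f | 4.19 (1.49) | 0.29 | 0.30 | 0.39 | 0.44 | 0.53 | 0.55 | 0.47 | 0.74
Mental health | 9h | 3.70 (1.18) | 0.35 | 0.33 | 0.41 | 0.50 | 0.57 | 0.52 | 0.43 | 0.67
Abbreviations: SD, standard deviation; PF, physical functioning; RP, role-physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role-emotional; MH, mental health.
P-value < 0.05 for all correlation coefficients. P-value for bold correlation coefficients < 0.01.
(a) Corrected for overlap.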
5 years (n = 75; PCS = −1.61 ± 1.19; MCS = 1.31 ± 1.14) vs. those with a disease duration longer than 5 years (n = 50; PCS = −2.52 ± 0.57; MCS = 1.21 ± 1.13) and in patients who were RF negative (n = 35; PCS = −1.82 ± 1.67; MCS = 1.20 ± 1.14) vs. RF positive (n = 90; PCS = −2.31 ± 1.14; MCS = 1.11 ± 1.12), confirming the known-groups validity of the SF-36 component scores.
4. Discussion
An important consideration when using an HRQOL questionnaire is the cultural appropriateness of the measure. In this study, the standard US English SF-36 (version 1) was cross-culturally translated and adapted for use in the Bengali culture in accordance with standard methodology [22]. The findings showed that the interviewer-administered Bengali SF-36 appears to be an acceptable, reliable, and valid instrument for measuring HRQOL in Bangladeshi patients with RA.

Four items of the SF-36 were modified, and in another item, the English word “nervous” was adopted as this was better understood than the Bengali word “bicholito.” Similar modifications have been done in other studies [17,23,32]. As reported in previous studies outside the United States [27], some of the moderate physical activities, such as playing golf, bowling, and vacuum cleaning, are rarely performed in Bangladesh. The five modified questions were understood by all respondents in the field test of the prefinal version. Four additional questions that were not understood by most respondents during the field test were retranslated into
Table 5. Tests of scaling assumptions (Cronbach's alpha, item internal consistency, item discriminant validity, and scaling success) (n = 125) and test–retest reliability (n = 40)
Scale | # Items | Cronbach's alpha | Item internal consistency (a) | Item discriminant validity (b) | Scaling success, success/total (c) | Scaling success (%) | Test–retest (Spearman's r)
Physical functioning | 10 | 0.94 | 0.49–0.76 | 0.18–0.58 | 79/80 | 98.8 | 0.92
Role-physical | 4 | 0.88 | 0.53–0.72 | 0.23–0.57 | 32/32 | 100 | 0.82
Bodily pain | 2 | 0.86 | 0.75 | 0.32–0.66 | 16/16 | 100 | 0.85
General health | 5 | 0.78 | 0.36–0.66 | 0.17–0.66 | 31/40 | 77.5 | 0.86
Vitality | 4 | 0.85 | 0.65–0.73 | 0.31–0.60 | 32/32 | 100 | 0.86
Social functioning | 2 | 0.85 | 0.73 | 0.42–0.59 | 16/16 | 100 | 0.82
Role-emotional | 3 | 0.79 | 0.57–0.68 | 0.22–0.52 | 24/24 | 100 | 0.82
Mental health | 5 | 0.87 | 0.67–0.74 | 0.22–0.57 | 40/40 | 100 | 0.86
P-values < 0.05 for all correlation coefficients.
(a) Range of correlations (Spearman's) between items and hypothesized scales corrected for overlap. (b) Range of correlations (Spearman's) between items and other scales. (c) Number of significantly higher (>2 standard errors) correlations between items and hypothesized scales/number of correlations.
Bengali after consultation with the MH experts from the expert committee, back translated into English, and reviewed and discussed by the expert committee to reach consensus. Difficulties in translating these descriptions of states of mind, such as “full of pep,” “down in the dumps,” “down hearted and blue,” and “worn out,” have also been reported in other countries [36]. Partly owing to these difficulties, these words have been substituted by less colloquial phrases in version 2 of the US English SF-36 [19].

As expected, the mean PCS score in the psychometric evaluation study was much lower than the general US population mean. In contrast, the MCS score was more than a standard deviation above the population norm. Although the exact reason for this high level of MH is unclear, it is in close accordance with a study among British patients with RA.

The very low percentage of missing data is most likely the result of the fact that the questionnaire was interviewer administered instead of self-reported in this study [37,38]. It may also be related to the low age of our sample because Loge et al. [17] reported higher missing rates among the older patients.

The floor and ceiling problems found in both role scales are not specific to the Bengali SF-36, but are inherent to the two-point response format. In version 2 of the US SF-36, the number of response options for these items has been increased, which was shown to substantially reduce this problem [19].

The internal construct validity was supported by the scales' correlations with the PCS and MCS of health. Principal component analysis supported the existence of the two hypothesized physical health and MH dimensions. Also, the observed correlational pattern between SF-36 scales and the rotated components showed higher correlations of physical scales with the first factor, whereas the MH and RE scales correlated weakly with this factor. The reverse was found with the second factor. The correlations of the scales with their principal components were similar to the hypothesized measurement model of the SF-36 and those found in British [18] and Japanese patients [23], with the exception of the VT scale. This scale correlated highest with the physical factor, whereas in the United States it loaded fairly evenly on both factors. Recently, the same problem with this scale was observed in patients with severe functional somatic syndromes [39]. Moreover, both the GH scale and the SF scale correlated more weakly with the mental factor and the SF scale correlated higher with
Table 6. Spearman correlations between scales and ACR disease activity parameters (n = 125)
Scale | ESR | SJC | TJC | Phy-global | Pt-global | HAQ | VAS
Physical functioning | 0.40 | 0.42 | 0.48 | 0.37* | 0.40 | 0.87 | 0.56
Role-physical | 0.24 | 0.35* | 0.44 | 0.49 | 0.51 | 0.53 | 0.61
Bodily pain | 0.37 | 0.47 | 0.56 | 0.52 | 0.67 | 0.57 | 0.80
General health | 0.16* | 0.37 | 0.44 | 0.36* | 0.41 | 0.50 | 0.49
Vitality | 0.25 | 0.39 | 0.50 | 0.43 | 0.42 | 0.50 | 0.52
Social functioning | 0.30 | 0.41 | 0.49 | 0.45 | 0.65 | 0.65 | 0.61
Role-emotional | 0.15* | 0.19* | 0.36* | 0.33* | 0.35 | 0.35* | 0.36
Mental health | 0.14* | 0.23* | 0.44 | 0.33* | 0.29 | 0.30* | 0.49
Physical component summary | 0.33 | 0.45 | 0.61 | 0.52 | 0.62 | 0.77 | 0.74
Mental component summary | 0.22 | 0.36 | 0.54 | 0.49 | 0.55 | 0.59 | 0.63
Abbreviations: ACR, American College of Rheumatology; ESR, erythrocyte sedimentation rate; SJC, swollen joint count; TJC, tender joint count; Phy-global, physician's global assessment of disease activity; Pt-global, patient's global assessment of disease activity; HAQ, Health Assessment Questionnaire; VAS, patient's assessment of pain on visual analog scale.
P-value < 0.01 for all the correlation coefficients, except for the values marked with an asterisk symbol, which denote P-value < 0.05.
the physical factor than hypothesized. This corresponds to a recent study that demonstrated condition-specific loadings of these three scales on factors of physical health and MH [40], depending on whether the patients’ main condition was a physical or mental illness.

All items passed the test of item internal consistency, with the exception of two GH items (11a and 11c) that assess patients’ future health expectations rather than their current health. This is consistent with other studies that found these items to be poorly correlated with their own scale and with any other SF-36 scale [17,39]. With the exception of one PF item and the same two GH items, all items also passed the test for discriminant validity, which was also consistent with the studies by Loge et al. [17] and Schröder et al. [39]. The five revised items showed a strong correlation with their hypothesized scales (ranging from 0.60 to 0.74), a finding comparable with other studies that showed values ranging from 0.62 to 0.81 [17,23]. The high correlations between scales measuring the same domain of health and the lower correlations between scales measuring separate domains of health confirmed the convergent and divergent validity of the eight scales.

This study also provided support for the reliability of the Bengali SF-36. The reliability of all scales was well above the 0.70 standard for group comparisons. Cronbach’s alphas were similar to those reported in other studies, where alphas ranged from 0.78 to 0.95 [17,23]. The test-retest reliability of the Bengali SF-36 (ranging from 0.82 to 0.92) was comparable with previous reports in Norwegian, British, and Japanese populations, where correlations ranged from 0.87 to 0.99 [17,18,23].

Significant correlations were found between the SF-36 scales and the ACR disease activity parameters of RA, confirming the external construct validity of the Bengali SF-36. Correlations were comparable with those of Ruta et al. [18], who reported correlations ranging from 0.12 (ESR vs. MH) to 0.89 (HAQ vs. PF). As expected, PCS and MCS scores were significantly better for those with shorter disease duration and those who were RF seronegative. Overall, these findings suggest that the concepts of health status embodied in the SF-36 can be conveyed in Bengali.

It should be noted that although the SF-36 is a generic instrument, the field-testing and psychometric evaluation of the Bengali version in this study were performed in patients with RA only. Additionally, patients with coexisting mental illness were excluded. This may have affected the problems encountered with the patients’ understanding of some of the VT and MH items during the field-testing phase. Given that these items were retranslated for the final version, the current Bengali SF-36 may be too narrow in its focus on RA. To adequately examine the generic nature and properties of the Bengali SF-36, it should be evaluated in more diverse samples, preferably from both the general population and other clinical populations.

In sum, the interviewer-administered Bengali SF-36 appears to be an acceptable, reliable, and valid instrument for measuring HRQOL in Bangladeshi patients with RA. The questionnaire should be further evaluated in people from the general population and in patients with different medical conditions to confirm its generic properties and ability to measure and compare HRQOL across different patient groups.
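The internal consistency criterion used in the discussion above (Cronbach’s alpha of at least 0.70 for group comparisons) can be illustrated with a minimal sketch of the standard alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the summed score). The function name and the response matrix are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])                 # number of items in the scale
    items = list(zip(*item_scores))         # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of five patients to a three-item scale.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 4],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
meets_group_standard = alpha >= 0.70        # 0.70 standard for group comparisons
print(round(alpha, 3), meets_group_standard)
```

The sketch uses sample variances (n - 1 denominator) throughout; because the same denominator appears in the numerator and denominator of the ratio, the choice does not change alpha.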
Acknowledgments

The authors convey their regards and gratitude to Professor Yunus, Professor of Medicine, Section of Rheumatology, University of Illinois, College of Medicine at Peoria, USA, for his active inspiration, valuable suggestions, and teaching on the concept of health-related quality of life instruments. They are thankful to Dr. Siraj, Honorary Medical Officer, Bangabandhu Sheikh Mujib Medical University, for his cordial help in collecting the sample and data. The authors remain ever grateful to all the patients, without whom this research could not have been carried out.
Appendix

Supplementary data

Supplementary data related to this article can be found online at http://dx.doi.org/10.1016/j.jclinepi.2012.05.004.

References

[1] Bowling A. Measuring disease: a review of disease specific quality of life measurement scales. 2nd ed. Buckingham, UK: Open University Press; 2001.
[2] Fayers PM, Machin D. Quality of life: the assessment, analysis and interpretation of patient-reported outcomes. 2nd ed. Chichester, UK: Wiley; 2007.
[3] Patrick DL, Erickson P. Health status and health policy: quality of life in health care evaluation and resource allocation. New York, NY: Oxford University Press; 1993.
[4] Guyatt GH. A taxonomy of health status instruments. J Rheumatol 1995;22:1188-90.
[5] Patrick DL, Deyo RA. Generic and disease-specific measures in assessing health status and quality of life. Med Care 1989;27(3 Suppl):S217-32.
[6] Hagen KB, Smedstad LM, Uhlig T, Kvien TK. The responsiveness of health status measures in patients with rheumatoid arthritis: comparison of disease-specific and generic instruments. J Rheumatol 1999;26:1474-80.
[7] Veehof MM, ten Klooster PM, Taal E, van Riel PL, van de Laar MA. Comparison of internal and external responsiveness of the generic Medical Outcome Study Short Form-36 (SF-36) with disease-specific measures in rheumatoid arthritis. J Rheumatol 2008;35:610-7.
[8] Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473-83.
[9] Garratt A, Schmidt L, Mackintosh A, Fitzpatrick R. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ 2002;324:1417.
[10] Kalyoncu U, Dougados M, Daures JP, Gossec L. Reporting of patient-reported outcomes in recent trials in rheumatoid arthritis: a systematic literature review. Ann Rheum Dis 2009;68:183-90.
[11] Polinder S, Haagsma JA, Belt E, Lyons RA, Erasmus V, Lund J, et al. A systematic review of studies measuring health-related quality of life of general injury populations. BMC Public Health 2010;10:783.
[12] Busija L, Pausenberger E, Haines TP, Haymes S, Buchbinder R, Osborne RH. Adult measures of general health and health-related quality of life: Medical Outcomes Study Short Form 36-Item (SF-36) and Short Form 12-Item (SF-12) Health Surveys, Nottingham Health Profile (NHP), Sickness Impact Profile (SIP), Medical Outcomes Study Short Form 6D (SF-6D), Health Utilities Index Mark 3 (HUI3), Quality of Well-Being Scale (QWB), and Assessment of Quality of Life (AQOL). Arthritis Care Res 2011;63:S383-412.
[13] Birrell FN, Hassell AB, Jones PW, Dawes PT. How does the short form 36 health questionnaire (SF-36) in rheumatoid arthritis (RA) relate to RA outcome measures and SF-36 population values? A cross-sectional study. Clin Rheumatol 2000;19:195-9.
[14] Koh ET, Leong KP, Tsou IY, Lim VH, Pong LY, Chong SY, et al. The reliability, validity and sensitivity to change of the Chinese version of SF-36 in oriental patients with rheumatoid arthritis. Rheumatology (Oxford) 2006;45:1023-8.
[15] Kosinski M, Keller SD, Hatoum HT, Kong SX, Ware JE Jr. The SF-36 Health Survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: tests of data quality, scaling assumptions and score reliability. Med Care 1999;37(5 Suppl):MS10-22.
[16] Linde L, Sorensen J, Ostergaard M, Horslev-Petersen K, Hetland ML. Health-related quality of life: validity, reliability, and responsiveness of SF-36, 15D, EQ-5D [corrected] RAQoL, and HAQ in patients with rheumatoid arthritis. J Rheumatol 2008;35:1528-37.
[17] Loge JH, Kaasa S, Hjermstad MJ, Kvien TK. Translation and performance of the Norwegian SF-36 Health Survey in patients with rheumatoid arthritis. I. Data quality, scaling assumptions, reliability, and construct validity. J Clin Epidemiol 1998;51:1069-76.
[18] Ruta DA, Hurst NP, Kind P, Hunter M, Stubbings A. Measuring health status in British patients with rheumatoid arthritis: reliability, validity and responsiveness of the short form 36-item health survey (SF-36). Br J Rheumatol 1998;37:425-36.
[19] Ware JE Jr. SF-36 health survey update. Spine 2000;25:3130-9.
[20] Berzon R, Hays RD, Shumaker SA. International use, application and performance of health-related quality of life instruments. Qual Life Res 1993;2:367-8.
[21] Wagner AK, Gandek B, Aaronson NK, Acquadro C, Alonso J, Apolone G, et al. Cross-cultural comparisons of the content of SF-36 translations across 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998;51:925-32.
[22] Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000;25:3186-91.
[23] Fukuhara S, Bito S, Green J, Hsiao A, Kurokawa K. Translation, adaptation, and validation of the SF-36 Health Survey for use in Japan. J Clin Epidemiol 1998;51:1037-44.
[24] Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315-24.
[25] Felson DT, Anderson JJ, Boers M, Bombardier C, Chernoff M, Fried B, et al. The American College of Rheumatology preliminary core set of disease activity measures for rheumatoid arthritis clinical trials. Arthritis Rheum 1993;36:729-40.
[26] Hossain M. Modification and Bengali Translation of Childhood Health Assessment Questionnaire (CHAQ) for Assessing the Outcome Measure in Juvenile Idiopathic Arthritis patients [thesis]. Bangladesh: Bangabandhu Sheikh Mujib Medical University; 2006.
[27] Ware JE Jr, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998;51:945-52.
[28] McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol 1997;50:451-61.
[29] McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-63.
[30] Ware JE, Kosinski M, Keller SD. SF-36 physical and mental summary scales: a user’s manual. Boston, MA: The Health Institute, New England Medical Center; 1994.
[31] Aaronson NK, Muller M, Cohen PD, Essink-Bot ML, Fekkes M, Sanderman R, et al. Translation, validation, and norming of the Dutch language version of the SF-36 Health Survey in community and chronic disease populations. J Clin Epidemiol 1998;51:1055-68.
[32] Lam CL, Gandek B, Ren XS, Chan MS. Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139-47.
[33] Montazeri A, Goshtasebi A, Vahdaninia M, Gandek B. The Short Form Health Survey (SF-36): translation and validation study of the Iranian version. Qual Life Res 2005;14:875-82.
[34] Lim LL, Seubsman SA, Sleigh A. Thai SF-36 health survey: tests of data quality, scaling assumptions, reliability and validity in healthy men and women. Health Qual Life Outcomes 2008;6:52.
[35] Laguardia J, Campos MR, Travassos CM, Najar AL, Anjos LA, Vasconcellos MM. Psychometric evaluation of the SF-36 (v.2) questionnaire in a probability sample of Brazilian households: results of the survey Pesquisa Dimensoes Sociais das Desigualdades (PDSD), Brazil, 2008. Health Qual Life Outcomes 2011;9:61.
[36] Ware JE Jr, Keller SD, Gandek B, Brazier JE, Sullivan M. Evaluating translations of health status questionnaires. Methods from the IQOLA project. International Quality of Life Assessment. Int J Technol Assess Health Care 1995;11:525-51.
[37] Perkins JJ, Sanson-Fisher RW. An examination of self- and telephone-administered modes of administration for the Australian SF-36. J Clin Epidemiol 1998;51:969-73.
[38] Weinberger M, Oddone EZ, Samsa GP, Landsman PB. Are health-related quality-of-life measures affected by the mode of administration? J Clin Epidemiol 1996;49:135-40.
[39] Schröder A, Oernboel E, Licht RW, Sharpe M, Fink P. Outcome measurement in functional somatic syndromes: SF-36 summary scores and some scales were not valid. J Clin Epidemiol 2012;65:30-41.
[40] Hann M, Reeves D. The SF-36 scales are not accurately summarised by independent physical and mental component scores. Qual Life Res 2008;17:413-23.