Minimal detectable change of the Personal and Social Performance scale in individuals with schizophrenia


Psychiatry Research 246 (2016) 725–729

Contents lists available at ScienceDirect

Psychiatry Research journal homepage: www.elsevier.com/locate/psychres


Shu-Chun Lee (a, d), Shih-Fen Tang (b), Wen-Shian Lu (c), Sheau-Ling Huang (d, e), Nai-Yu Deng (a), Wen-Chyn Lue (a), Ching-Lin Hsieh (d, e) ⁎

(a) Taipei City Psychiatric Center, Taipei City Hospital, Taipei, Taiwan
(b) Taoyuan Psychiatric Center, Ministry of Health and Welfare, Taoyuan, Taiwan
(c) School of Occupational Therapy, Chung Shan Medical University, and Occupational Therapy Room, Chung Shan Medical University Hospital, Taichung, Taiwan
(d) School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, Taiwan
(e) Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei, Taiwan

ARTICLE INFO

Keywords: Personal and Social Performance scale; Schizophrenia; Minimal detectable change

ABSTRACT

The minimal detectable change (MDC) of the Personal and Social Performance scale (PSP) has not yet been investigated, which limits the interpretability of PSP data. The purpose of this study was to determine the MDCs of the PSP administered by the same rater or by different raters in individuals with schizophrenia. Participants with schizophrenia were recruited from two psychiatric community rehabilitation centers and completed the PSP twice, 2 weeks apart, administered by the same rater or by 2 different raters. MDC values were calculated from the coefficients of intra- and inter-rater reliability (i.e., intraclass correlation coefficients). Forty patients (mean age 36.9 years, SD 9.7) from one center participated in the intra-rater reliability study, and another 40 patients (mean age 44.3 years, SD 11.1) from the other center participated in the inter-rater reliability study. The MDCs (MDC%) of the PSP total score were 10.7 (17.1%) for the same rater and 16.2 (24.1%) for different raters. These MDCs appear appropriate for clinical trials aiming to determine whether a real change in social functioning has occurred in people with schizophrenia.

⁎ Corresponding author. E-mail address: [email protected] (C.-L. Hsieh).

http://dx.doi.org/10.1016/j.psychres.2016.10.058
Received 14 January 2016; Received in revised form 23 June 2016; Accepted 28 October 2016
Available online 29 October 2016
0165-1781/ © 2016 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Social dysfunction, one of the diagnostic criteria specified in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) (American Psychiatric Association, 2013), is an important characteristic of schizophrenia. This impairment leaves individuals unable to fulfill societally defined roles such as homemaker, worker, student, spouse, family member, or friend (Mueser and Tarrier, 1998). To manage and monitor social functioning, clinicians need to assess the social functioning of individuals with schizophrenia routinely. The Personal and Social Performance scale (PSP), which is based on the "social and occupational dysfunction criterion" of the DSM-IV, was developed specifically to assess social functioning in schizophrenia (Morosini et al., 2000). The PSP is widely used and has been translated into several languages. Its psychometric properties, including validity, intra-rater reliability, and inter-rater reliability, have been reported to be sufficient (Morosini et al., 2000; Nasrallah et al., 2008; Patrick et al., 2009; Schaub et al., 2011; Tianmei et al., 2011). Thus, the PSP has great potential in both clinical and research settings.

Every measurement contains random measurement error, which must be taken into account when interpreting data. The random measurement error of a measure can be estimated by calculating the standard error of measurement (SEM) and the minimal detectable change (MDC) (Goldsmith et al., 1993; Schuck and Zwingmann, 2003; Steffen and Seney, 2008). The SEM indicates the amount of random variation in a measurement score. The MDC represents the smallest change between consecutive assessments that, at a given confidence level (e.g., 95%), cannot be attributed to random variation (i.e., random measurement error) (Haley and Fragala-Pinkham, 2006). Thus, the MDC of a measure is critical for interpreting the results of repeated assessments. For example, a clinician can use a measure's MDC as a threshold to determine whether a change score reflects an individual's improvement (or deterioration) or simply random measurement error. Traditional reliability indexes (e.g., the intraclass correlation coefficient and Cronbach's alpha) cannot be applied in this way. In addition, the MDC allows researchers to report the proportion of an experimental group in a clinical trial whose change scores exceed the MDC of the outcome measures (Haley and Fragala-Pinkham, 2006). Such information can help researchers translate their findings for clinicians. Therefore, the MDC of a measure helps both researchers and clinicians judge whether an individual's change score between follow-up assessments is real (i.e., beyond random measurement error). The MDC of the PSP, however, is largely unknown, which limits the interpretability of PSP data. Thus, the aim of this study was to estimate the MDC of the PSP in individuals with schizophrenia. We calculated the MDC of the PSP administered by a single rater (intra-rater) and by different raters (inter-rater) so that users can interpret the results of routine and follow-up assessments.

2. Methods

2.1. Participants

Individuals with schizophrenia were recruited from two community psychiatric rehabilitation centers at Taipei City Hospital in Taiwan in December 2014. Inclusion criteria were as follows: (1) diagnosis of schizophrenia or schizoaffective disorder according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Text Revision (DSM-IV-TR); (2) age of 18–65 years; (3) stable clinical condition (no psychiatric emergency treatment records and a consistent dose of antipsychotic medication for at least 3 months); and (4) ability to understand and provide informed consent. This study was approved by the Institutional Review Board of Taipei City Hospital.

2.2. Procedures

Participants who met the inclusion criteria and signed the informed consent form were assessed by the raters in a quiet environment. Participants from one center were assessed twice, 2 weeks apart, by a psychologist for the intra-rater reliability study. Participants from the other center were also assessed twice, 2 weeks apart, by an occupational therapist and a psychologist in counterbalanced order for the inter-rater reliability study. Demographic and clinical information was collected from clinical records.

The rater training program followed the methods of Morosini et al. (2000) and was conducted by the principal investigator, an experienced user of the PSP. Both raters in this study had more than 10 years of experience working with patients with schizophrenia. Before the study, they were trained to administer the PSP: they read the manual and received instruction on using the PSP from the principal investigator. The raters then practiced administering the PSP on 12 patients, and the raters and the principal investigator discussed any disagreements and questions to clarify the testing procedures.

2.3. Instrument

The PSP contains 4 subdomains: socially useful activities, personal and social relationships, self-care, and disturbing and aggressive behaviors (Morosini et al., 2000). Each subdomain is rated on a 6-point severity scale (absent, mild, manifest, marked, severe, very severe), where 1 indicates absence of disability and 6 indicates very severe disability. The severity in these 4 subdomains is rated on the basis of the rater's knowledge of the patient; the rater also interviews the patient, other mental health professionals, or other caregivers for any further information needed to make the rating. Based on specific combinations of the severity scores of the 4 subdomains, personal and social functioning is rated on a 100-point scale divided into 10-point intervals. Each 10-point interval has its own operational criteria, which are provided in the PSP Scoring Guidelines (Patrick et al., 2009). Within the selected 10-point interval, raters adjust the score according to their knowledge and/or the interview results. The final result is a single overall rating from 1 to 100, where higher scores indicate better personal and social functioning (Morosini et al., 2000).

2.4. Data analysis

All statistical analyses were performed using SPSS (IBM Corp. Released 2012. IBM SPSS Statistics for Windows, Version 21.0. Armonk, NY: IBM Corp).

2.4.1. Intra- and inter-rater reliability

Intra- and inter-rater reliability were assessed with intraclass correlation coefficients (ICCs) (Bartko, 1966) for the total scores and the subdomain scores of the PSP. The ICC(2,1), used in this study, is commonly used to examine the extent of agreement between repeated measurements (Shrout and Fleiss, 1979). ICC values ≥0.80 indicated high agreement, values of 0.60–0.79 indicated moderate agreement, and values ≤0.59 indicated poor agreement (Bushnell et al., 2001).

2.4.2. Minimal detectable change (MDC) and MDC percentage (MDC%)

The MDC was calculated by multiplying the SEM by the z score and the square root of 2. The SEM and the MDC were calculated using the following formulas (Haley and Fragala-Pinkham, 2006):

$$\mathrm{MDC} = z \times \mathrm{SEM} \times \sqrt{2} \qquad (1)$$

$$\mathrm{SEM} = SD_{\mathrm{baseline}} \times \sqrt{1 - r_{\text{test-retest}}} \qquad (2)$$

The z score is the value from the standard normal distribution that corresponds to the chosen confidence level (e.g., 1.96 for 95% confidence). SD_baseline is the standard deviation of the first test scores, and r_test-retest is the coefficient of intra-rater or inter-rater reliability (i.e., the ICC). The factor √2 accounts for the variance of 2 successive assessments, both of which contain measurement error (Haley and Fragala-Pinkham, 2006). We also provide MDCs at the 90% and 80% confidence levels for users' reference. In addition, we calculated the MDC% to represent the extent of random measurement error when the PSP is administered by the same rater or by different raters. The MDC% was calculated by dividing the MDC by the mean of all test and retest scores and multiplying by 100% (i.e., MDC% = 100 × MDC / mean of all test and retest scores) (Flansbjer et al., 2005). Because the MDC% is independent of measurement units (Huang et al., 2011), it can be used to compare the extent of random measurement error between different measures or different methods of data collection (e.g., data collected by the same rater or by different raters). The MDC% is also useful for determining whether the amount of random measurement error is acceptable. An MDC95% of less than 30% was considered acceptable (Huang et al., 2011).

2.4.3. Systematic bias

Systematic bias was examined with a paired t-test to determine whether the test and retest scores differed significantly (Fleiss, 1986). We also calculated the effect size (Kazis et al., 1989), defined as the mean change divided by the standard deviation of the first-session scores, to quantify the extent of bias. According to Cohen's criteria, an effect size greater than 0.8 was considered large; 0.5–0.8, moderate; and 0.2–0.5, small (Cohen, 1988).

3. Results

One group of 40 patients participated in the intra-rater reliability study, and the other 40 patients participated in the inter-rater reliability study. In the intra-rater sample, the mean age was 36.9 years (SD 9.7), 42.5% of the participants were male, and the mean duration of illness was 13.2 years. In the inter-rater sample, 42.5% of the participants were also male, the mean age was 44.3 years (SD 11.1), and the mean duration of illness was 22.2 years. The two groups were similar in gender, onset age, and marital status but differed significantly in age, duration of illness, and education (p ≤ 0.002). Table 1 presents further details of the participants.

Table 1
Characteristics of the patients in the intra-rater and inter-rater studies.

Characteristic | Intra-rater reliability (n=40) | Inter-rater reliability (n=40) | p
Gender, n (%) | | | 1.000
  Male | 17 (42.5%) | 17 (42.5%) |
  Female | 23 (57.5%) | 23 (57.5%) |
Age (years), mean ± SD | 36.9 ± 9.7 | 44.3 ± 11.1 | 0.002
Onset age (years), mean ± SD | 23.7 ± 8.4 | 22.1 ± 7.1 | 0.362
Duration of illness (years), mean ± SD | 13.2 ± 8.1 | 22.2 ± 9.7 | <0.001
Education (%) | | | 0.001
  Junior high school | 7.6 | 5.0 |
  Senior high school | 17.6 | 57.5 |
  College and above | 75.2 | 37.5 |
Marital status, n (%) | | | 0.194
  Single | 35 (87.5%) | 35 (87.5%) |
  Married | 5 (12.5%) | 3 (7.5%) |
  Divorced | 0 | 2 (5.0%) |

3.1. Intra- and inter-rater reliability

For intra-rater reliability, the ICC values were 0.79 for the total score and 0.66–0.77 for the 4 subdomains (Table 2). For inter-rater reliability, the ICC values were 0.57 for the total score and 0.43–0.68 for the 4 subdomains (Table 3).

3.2. MDC and MDC%

The MDC95 (MDC95%) values of the PSP total score were 10.7 (17.1%) for intra-rater reliability and 16.2 (24.1%) for inter-rater reliability. For the 4 subdomains, the MDC95 (MDC95%) values were 1.1–1.4 (48.6–74.2%) for intra-rater reliability and 1.1–1.9 (52.4–85.6%) for inter-rater reliability. MDCs at the 90% and 80% confidence levels are listed in Tables 2 and 3.

3.3. Systematic bias

In the intra-rater reliability study, no systematic bias was found in the total score (t = −0.62, p = 0.537) or in the 4 subdomain scores (socially useful activities: t = 0.68, p = 0.498; personal and social relationships: t = −1.84, p = 0.073; self-care: t = −0.26, p = 0.800; disturbing and aggressive behaviors: t = 0.27, p = 0.785). The subdomain of personal and social relationships nevertheless showed a small effect size (Table 2). In the inter-rater reliability study, systematic bias was found between repeated assessments in the total score (t = −3.00, p = 0.005) and in the subdomain of socially useful activities (t = 3.15, p = 0.003), with effect sizes of 0.43 and 0.63, respectively. The subdomains of "personal and social relationships" and "disturbing and aggressive behaviors" showed small effect sizes of 0.25 and 0.20, respectively (Table 3).

Table 2
Test-retest results and minimal detectable changes (MDCs) of the PSP based on the same rater (n=40).

Domain/subdomain | Mean (SD), 1st session | Mean (SD), 2nd session | Paired t-test (p-value) | Effect size | ICC (95% CI) | SEM | MDC95 (MDC95%) | MDC90 | MDC80
PSP total score | 62.6 (8.9) | 63.2 (8.0) | −0.62 (0.537) | 0.07 | 0.79 (0.63, 0.88) | 3.87 | 10.7 (17.1%) | 9.0 | 7.0
Socially useful activities | 2.8 (1.1) | 2.7 (1.0) | 0.68 (0.498) | 0.09 | 0.77 (0.60, 0.87) | 0.48 | 1.3 (48.6%) | 1.1 | 0.8
Personal and social relationships | 2.1 (0.9) | 2.3 (0.8) | −1.84 (0.073) | 0.22 | 0.69 (0.48, 0.82) | 0.49 | 1.4 (62.5%) | 1.2 | 0.9
Self-care | 2.0 (0.9) | 2.1 (0.8) | −0.26 (0.800) | 0.11 | 0.72 (0.53, 0.84) | 0.43 | 1.2 (58.8%) | 1.0 | 0.8
Disturbing and aggressive behaviors | 1.5 (0.7) | 1.5 (0.7) | 0.27 (0.785) | 0.00 | 0.66 (0.45, 0.81) | 0.40 | 1.1 (74.2%) | 0.9 | 0.7

Table 3
Test-retest results and minimal detectable changes (MDCs) of the PSP based on different raters (n=40).

Domain/subdomain | Mean (SD), 1st session | Mean (SD), 2nd session | Paired t-test (p-value) | Effect size | ICC (95% CI) | SEM | MDC95 (MDC95%) | MDC90 | MDC80
PSP total score | 65.7 (8.3) | 69.3 (9.3) | −3.00 (0.005) | 0.43 | 0.57 (0.29, 0.76) | 5.86 | 16.2 (24.1%) | 13.6 | 10.6
Socially useful activities | 2.7 (0.8) | 2.2 (0.9) | 3.15 (0.003) | 0.63 | 0.43 (0.64, 0.66) | 0.69 | 1.9 (78.5%) | 1.6 | 1.2
Personal and social relationships | 2.4 (0.8) | 2.2 (0.8) | 1.30 (0.200) | 0.25 | 0.68 (0.48, 0.82) | 0.43 | 1.2 (52.4%) | 1.0 | 0.8
Self-care | 1.8 (0.8) | 1.8 (0.8) | −0.22 (0.830) | 0.00 | 0.56 (0.30, 0.74) | 0.51 | 1.4 (80.1%) | 1.2 | 0.9
Disturbing and aggressive behaviors | 1.2 (0.5) | 1.3 (0.5) | −0.57 (0.570) | 0.20 | 0.38 (0.07, 0.61) | 0.39 | 1.1 (85.6%) | 0.9 | 0.7

4. Discussion

Our results showed that the ICC for the intra-rater reliability study was 0.79, indicating moderate intra-rater agreement. One possible reason for this imperfect consistency is the subjective nature of the PSP (i.e., the ratings rely on the rater's judgments based on the interview). A rater must interview and rate the patient individually at each assessment, but the PSP scoring guidelines do not provide specific interview questions. Thus, a rater may ask the same patient different questions at the two interviews, and the different interview content and resulting judgments may have lowered the consistency between intra-rater assessments. Nevertheless, the commonly cited minimal standard for reliability coefficients (e.g., the ICC) is 0.7 for group comparisons (Fitzpatric et al., 1998; Frost et al., 2007). Therefore, the moderate intra-rater agreement of the PSP is sufficient for use in research contexts. Previous studies have shown a wide range of intra-rater reliabilities for the PSP (ICCs ranging from 0.45 to 0.95) (Nafees et al., 2012; Nasrallah et al., 2008; Patrick et al., 2009; Tianmei et al., 2011). Two studies showed high intra-rater reliability (ICC = 0.91–0.95, with two assessments administered 2–5 days apart) (Patrick et al., 2009; Tianmei et al., 2011). However, because of the short interval between repeated assessments, the high intra-rater reliability in both studies might be due to the raters' memory of the first assessment. Nafees et al. found an ICC of 0.45 (two assessments administered 8–10 days apart), which might be explained by low between-subject variability, i.e., low standard deviations of 8.33 at the first test session and 6.33 at the retest session. In the study of Nasrallah et al., the PSP total scale had an ICC of 0.76 (two assessments administered 24 weeks apart, with stable Clinical Global Impression-Severity scores), which is similar to ours. These observations may lend support to our findings.

For inter-rater reliability, the ICC was 0.57, indicating poor inter-rater agreement. It is expected that inter-rater (different raters) reliability is lower than intra-rater (same rater) reliability. However, our ICC was lower than those reported previously (ICCs = 0.82–0.98) (Morosini et al., 2000; Patrick et al., 2009; Schaub et al., 2011; Tianmei et al., 2011). Three reasons may explain the insufficient inter-rater reliability. First, the difference might be due to study design. Some previous studies (Morosini et al., 2000; Patrick et al., 2009; Schaub et al., 2011; Tianmei et al., 2011) examined inter-rater reliability by having all raters rate the same patients independently and simultaneously. Although such a design ensures that the patients' social functioning does not change between ratings, it might overestimate reliability because the administration procedure is simplified. The PSP requires each rater to interview and rate the patient individually, so raters' interview skills and rating judgments affect the results of assessments. Moreover, in clinical settings it is uncommon, or even impossible, for patients to be assessed by two or more raters simultaneously. Therefore, we designed our study to reflect the situation in clinical settings: the interviews and ratings were conducted independently by one rater and then by another rater 2 weeks later. However, this design might have resulted in lower reliability than in previous studies. Second, the PSP interview appears unstructured. As noted in the discussion of intra-rater reliability, the current version provides neither guidelines nor specific questions for raters to administer the interview. Thus, different raters were very likely to ask participants different questions, leading to different scores and thus to insufficient inter-rater reliability. A recent study proposed a more structured interview for the PSP (Wu et al., 2013), which might improve inter-rater reliability. Third, the between-participant variance of the PSP total score was small (SD of total scores = 8.3–9.3) compared with that in a previous study (SD of total scores = 16.8–19.0) (Menezes et al., 2012). The ICC is calculated as one minus the ratio of within-participant variance to total variance; when between-participant variance is small, total variance is small, so the same within-participant error yields a smaller ICC. The small between-participant variance in this study may therefore have lowered the ICC. Future studies may use a more structured interview to administer the PSP to individuals with a wider range of social functioning to further examine inter-rater reliability.

Our results showed that the MDC95 was 10.7 points for the same rater and 16.2 points for different raters. These results indicate that a change in the PSP total score of at least 11 points (same rater) or 17 points (different raters) between successive assessments exceeds the MDC95 and can be interpreted, with 95% confidence, as real improvement or deterioration rather than random measurement error or chance variability. Therefore, the MDC of the PSP is useful for data interpretation and clinical decision making for individual patients.

As expected, the intra-rater MDC% of the PSP total score (17.1%) was smaller than the inter-rater MDC% (24.1%), indicating that data collected by the same rater contain less random measurement error than data collected by different raters. Notably, both MDC% values were below 30%, indicating that the PSP total score has acceptable random measurement error whether rated by the same rater or by different raters. Thus, although using the same rater to administer the PSP is recommended, data collected by either the same rater or different raters contain a limited, acceptable amount of random measurement error in the total score. However, the MDC% values of all subdomains were close to 50% or larger, whether rated by the same rater or by different raters. These results indicate substantial random measurement error in the subdomain scores, implying that only the total score, and not the subdomain scores, is appropriate for interpreting change. For users' reference, we present MDCs at the 80%, 90%, and 95% confidence levels (Tables 2 and 3). In practice, a patient's change score may be smaller than the MDC95 (for either intra-rater or inter-rater assessments). The lower-confidence MDCs are then particularly useful for clinicians to judge how much confidence to place in a patient's change in PSP score between two consecutive assessments.

We found that different raters showed systematic bias in the PSP total score and in the subdomain of socially useful activities (e.g., housework, voluntary work). The bias might have been caused by different criteria applied by our raters. Our raters' training comprised three stages: preparation, rating training on patients, and confirmation. Although the training appeared sufficient, the bias had a small effect on the total score and a moderate effect on a subdomain. Thus, further rater training, particularly on the subdomains, is suggested. Fortunately, the mean difference in total score between raters was 3.6 points, which is very small relative to the highest possible score (100).

Three limitations are noted. First, the sample was recruited only from community rehabilitation centers. We did not recruit patients of other ethnicities or from other types of settings (e.g., day-care centers, acute wards). Such sampling bias might limit the generalizability of our findings. Second, the sample size of each study (n = 40) was modest for calculating the MDCs of the PSP. Further studies with larger samples (e.g., n ≥ 50) are warranted to validate our findings. Third, the dissimilarity of the 2 study groups (e.g., older age and higher education in the inter-rater study) might have confounded the differences in reliability indexes between the intra-rater and inter-rater studies. Thus, further validation of our results is warranted.

In conclusion, the findings of this study indicate that the PSP total score has acceptable intra-rater and inter-rater MDC% when applied to individuals with schizophrenia. The MDC of the PSP, obtained from either a single rater or different raters, helps users determine whether changes in social functioning in an individual with schizophrenia exceed random measurement error.
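The computations described in Sections 2.4.1 to 2.4.3, and the use of MDCs at several confidence levels to interpret an individual's change score, can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' analysis (which was run in SPSS): all function names are invented for this example, the ICC uses the Shrout and Fleiss (1979) ICC(2,1) mean-square formula for k = 2 ratings, and the z values are taken from Python's statistics.NormalDist.

```python
# Hypothetical sketch of the ICC, SEM, MDC, MDC% and effect-size computations.
# Function names are illustrative only; they are not part of any published tool.
from math import sqrt
from statistics import NormalDist, mean


def icc_2_1(pairs):
    """ICC(2,1) (Shrout and Fleiss, 1979): two-way random effects, absolute
    agreement, single rating. `pairs` holds one (first, second) score tuple
    per participant, i.e. k = 2 sessions or raters."""
    n, k = len(pairs), 2
    scores = [x for pair in pairs for x in pair]
    grand = mean(scores)
    row_means = [mean(pair) for pair in pairs]                         # per participant
    col_means = [mean([pair[j] for pair in pairs]) for j in range(k)]  # per session/rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-participant mean square
    jms = ss_cols / (k - 1)               # between-session (or rater) mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def sem(sd_baseline, r):
    """Formula (2): SEM = SD_baseline * sqrt(1 - r), where r is the ICC."""
    return sd_baseline * sqrt(1 - r)


def mdc(sem_value, confidence=0.95):
    """Formula (1): MDC = z * SEM * sqrt(2); z is the two-sided standard-normal
    quantile for the chosen confidence level (about 1.96 for 95%)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sem_value * sqrt(2)


def mdc_percent(mdc_value, mean_all_scores):
    """MDC% = 100 * MDC / mean of all test and retest scores."""
    return 100 * mdc_value / mean_all_scores


def effect_size(mean_first, mean_second, sd_first):
    """Mean change divided by the SD of the first-session scores, returned
    as a magnitude (Tables 2 and 3 report unsigned values)."""
    return abs(mean_second - mean_first) / sd_first


def highest_confidence_exceeded(change, sem_value, levels=(0.95, 0.90, 0.80)):
    """Return the highest confidence level whose MDC the change exceeds,
    or None if the change lies within random measurement error at all levels."""
    for level in sorted(levels, reverse=True):
        if abs(change) > mdc(sem_value, level):
            return level
    return None


if __name__ == "__main__":
    # SEM values reported in Tables 2 (same rater) and 3 (different raters).
    for label, s in (("same rater", 3.87), ("different raters", 5.86)):
        print(label, [round(mdc(s, c), 1) for c in (0.95, 0.90, 0.80)])
    print(highest_confidence_exceeded(8, 3.87))
```

Applied to the SEMs reported in Tables 2 and 3 (3.87 and 5.86), mdc() gives about 10.7 and 16.2 at the 95% level, 9.0 and 13.6 at the 90% level, and 7.0 and 10.6 at the 80% level, matching the tabled values after rounding. With the same-rater SEM, a hypothetical 8-point change exceeds only the MDC80, so it would be read as a real change with 80% rather than 95% confidence. Note that the tabled SEMs are used directly here rather than recomputed from Formula (2).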

Acknowledgements

We thank Dr. En-Chi Chiu, Miss Min-Lun Chang, and Miss Wan-Hui Yu for their assistance with data collection and preparation of the manuscript. This study was supported by a research grant from the Taipei City Government (No. 56) in 2015.

References

American Psychiatric Association, 2013. Diagnostic and Statistical Manual of Mental Disorders, 5th ed. American Psychiatric Publishing, Arlington, VA.
Bartko, J.J., 1966. The intraclass correlation coefficient as a measure of reliability. Psychol. Rep. 19, 3–11.
Bushnell, C.D., Johnston, D.C., Goldstein, L.B., 2001. Retrospective assessment of initial stroke severity: comparison of the NIH Stroke Scale and the Canadian Neurological Scale. Stroke 32, 656–660.
Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, Hillsdale, NJ.
Fitzpatric, R., Davey, C., Buxton, M., Jones, D., 1998. Evaluating patient-based outcome measures for use in clinical trials. Health Technol. Assess. 2, 1–74.
Flansbjer, U.B., Holmbäck, A.M., Downham, D., Patten, C., Lexell, J., 2005. Reliability of gait performance tests in men and women with hemiparesis after stroke. J. Rehabil. Med. 37, 75–82.
Fleiss, J., 1986. The Design and Analysis of Clinical Experiments. Wiley, Hoboken, NJ.
Frost, M.H., Reeve, B.B., Liepa, A.M., Stauffer, J.W., Hays, R.D., 2007. What is sufficient evidence for the reliability and validity of patient-reported outcome measures? Value Health 10, S94–S105.
Goldsmith, C., Boers, M., Bombardier, C., Tugwell, P., 1993. Criteria for clinically important changes in outcomes: development, scoring and evaluation of rheumatoid arthritis patient and trial profiles. OMERACT Committee. J. Rheumatol. 20, 561–565.
Haley, S.M., Fragala-Pinkham, M.A., 2006. Interpreting change scores of tests and measures used in physical therapy. Phys. Ther. 86, 735–743.
Huang, S.L., Hsieh, C.L., Wu, R.M., Tai, C.H., Lin, C.H., Lu, W.S., 2011. Minimal detectable change of the timed "Up & Go" test and the dynamic gait index in people with Parkinson disease. Phys. Ther. 91, 114–121.
Kazis, L.E., Anderson, J.J., Meenan, R.F., 1989. Effect sizes for interpreting changes in health status. Med. Care 27, S178–S189.
Menezes, A.K.P.M., Macedo, G., Mattos, P., Sá Júnior, A.R.d., Louzã, M.R., 2012. Personal and Social Performance (PSP) scale for patients with schizophrenia: translation to Portuguese, cross-cultural adaptation and interrater reliability. J. Bras. Psiquiatr. 61, 176–180.
Morosini, P., Magliano, L., Brambilla, L., Ugolini, S., Pioli, R., 2000. Development, reliability and acceptability of a new version of the DSM-IV Social and Occupational Functioning Assessment Scale (SOFAS) to assess routine social functioning. Acta Psychiatr. Scand. 101, 323–329.
Mueser, K.T., Tarrier, N. (Eds.), 1998. Handbook of Social Functioning in Schizophrenia. Allyn & Bacon, Boston.
Nafees, B., de Jonge, P.v.H., Stull, D., Pascoe, K., Price, M., Clarke, A., Turkington, D., 2012. Reliability and validity of the Personal and Social Performance scale in patients with schizophrenia. Schizophr. Res. 140, 71–76.
Nasrallah, H., Morosini, P., Gagnon, D.D., 2008. Reliability, validity and ability to detect change of the Personal and Social Performance scale in patients with stable schizophrenia. Psychiatry Res. 161, 213–224.
Patrick, D.L., Burns, T., Morosini, P., Rothman, M., Gagnon, D.D., Wild, D., Adriaenssen, I., 2009. Reliability, validity and ability to detect change of the clinician-rated Personal and Social Performance scale in patients with acute symptoms of schizophrenia. Curr. Med. Res. Opin. 25, 325–338.
Schaub, D., Brüne, M., Jaspen, E., Pajonk, F.G., Bierhoff, H.W., Juckel, G., 2011. The illness and everyday living: close interplay of psychopathological syndromes and psychosocial functioning in chronic schizophrenia. Eur. Arch. Psychiatry Clin. Neurosci. 261, 85–93.
Schuck, P., Zwingmann, C., 2003. The 'smallest real difference' as a measure of sensitivity to change: a critical analysis. Int. J. Rehabil. Res. 26, 85–91.
Shrout, P.E., Fleiss, J.L., 1979. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428.
Steffen, T., Seney, M., 2008. Test-retest reliability and minimal detectable change on balance and ambulation tests, the 36-item short-form health survey, and the Unified Parkinson Disease Rating Scale in people with parkinsonism. Phys. Ther. 88, 733–746.
Tianmei, S., Liang, S., Yun'ai, S., Chenghua, T., Jun, Y., Jia, C., Xueni, L., Qi, L., Yantao, M., Weihua, Z., 2011. The Chinese version of the Personal and Social Performance scale (PSP): validity and reliability. Psychiatry Res. 185, 275–279.
Wu, B.J., Lin, C.H., Tseng, H.F., Liu, W.M., Chen, W.C., Huang, L.S., Sun, H.J., Chiang, S.K., Lee, S.M., 2013. Validation of the Taiwanese Mandarin version of the Personal and Social Performance scale in a sample of 655 stable schizophrenic patients. Schizophr. Res. 146, 34–39.