Available online at www.sciencedirect.com
European Psychiatry 23 (2008) 575-579 http://france.elsevier.com/direct/EURPSY/
Original article
Axis V - Global Assessment of Functioning Scale (GAF), further evaluation of the self-report version

A. Ramirez, L. Ekselius, M. Ramklint*

Department of Neuroscience, Psychiatry UAS, University Hospital, ing 87, SE-751 85 Uppsala, Sweden

Received 15 February 2008; received in revised form 4 May 2008; accepted 7 May 2008
Available online 5 September 2008
Abstract

Objective. The study aimed to examine agreement between patients’ and professional staff members’ ratings on the Global Assessment of Functioning scale (GAF).

Methods. A total of 191 young adult psychiatric outpatients were included in a naturalistic, longitudinal study. Axis I and axis II disorders were assessed by means of the Structured Clinical Interview for DSM-IV. Before and after treatment, patients and trained staff members did a GAF rating. Agreement between GAF ratings was analyzed using the intra-class correlation coefficient (ICC).

Results. The overall intra-class correlation coefficients before and after treatment were 0.65 and 0.86, respectively. Agreement in different axis I diagnostic groups varied, but was generally lower before treatment as compared to after treatment (0.50-0.66 and 0.78-0.90, respectively). Excessive psychiatric co-morbidity was associated with the lowest inter-rater reliability. Agreement, with respect to change in GAF scores during treatment, was good to excellent in all groups.

Conclusion. Overall, agreement between patients’ and professionals’ ratings on the GAF scale was good before and excellent after treatment. The results support the usefulness of the self-report GAF instrument for measuring outcome in psychiatric care. However, more research is needed about the difficulties in rating severely disordered patients.

© 2008 Elsevier Masson SAS. All rights reserved.

Keywords: Axis V; Global assessment of functioning; GAF; Self-report instrument; Treatment outcomes
* Corresponding author. Tel.: +46 18 6115233; fax: +46 18 51 58 10. E-mail address: [email protected] (M. Ramklint).
0924-9338/$ - see front matter © 2008 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.eurpsy.2008.05.001

1. Introduction

The Global Assessment of Functioning Scale (GAF) was introduced as a rating scale for axis V in DSM-III-R in 1987 [1]. A GAF rating is an overall judgment about the patient’s current level of functioning on a scale from 1 to 100 [2]. A single rating on the GAF scale integrates three different dimensions of functioning: psychological, social and occupational. GAF ratings have been found to be related more to diagnosis and symptoms than to social functioning [13]. Assessment of level of functioning is assumed to contribute information of importance for treatment planning, prediction of outcome and evaluating changes over time [2,14]. The scale is recommended as a global severity measure in the assessment of outcome in routine mental health care [17,21].

Reliability of the GAF scale ratings in research settings is excellent [7], but it is clearly lower in clinical studies [9]; for a review see Goldman et al. [6]. Factors associated with inter-rater reliability of the GAF are the raters’ level of education, experience and training in using the GAF and their attitude towards the scale [8,11,20].

A self-report version of the GAF scale, utilized in a psychiatric outpatient population, was reported to have acceptable reliability when patients’ and experts’ ratings were compared [3]. Self-report instruments give patients the opportunity to express their own opinions. Moreover, it has been claimed that self-rating procedures are more sensitive to treatment effects than observer rating scales [10]. Reliability of self-rated GAF as a measure of outcome has not previously been studied.
The aim of the present study was to investigate agreement between patients’ and staff members’ ratings on the GAF, both before and after treatment, in a clinical setting.

2. Materials and methods

This naturalistic, longitudinal study was conducted at Flogsta outpatient clinic at the Uppsala Department of Psychiatry between October 1, 2002, and September 30, 2004. All patients between 18 and 25 years of age were consecutively included within the first year (until September 30, 2003). During the year of study enrolment, 217 patients came for an initial appointment and were then invited to participate in the present study. After the patients had received a detailed description of the study, written informed consent was obtained from 200 (92.2%) of them; 191 completed the initial GAF assessments and thus comprise the present study sample. Of the 191 included participants, 153 (80.1%) were women and 38 (19.9%) were men. Their mean age was 22.4 ± 1.9 years. Non-participants (n = 26), of whom 21 (80.8%) were women and 5 (19.2%) were men, had a mean age of 22.0 ± 1.7 years. A total of 53 patients did not perform their final GAF assessment.

3. Procedure

The overall design of the study comprised an initial diagnostic assessment of axis I-IV, according to DSM-IV [2], and assessment of axis V (GAF) both before and after appropriate treatment. The diagnostic procedure included three patient visits followed by a team conference at which axis I, IV and V (GAF) diagnoses were established. Patients were subsequently given appropriate treatment. After finalizing treatment, patients had a last visit during which the professional who met the patient made a second assessment of GAF. During the study period treatment was finalized for 139 participants. At the time of completion, 52 patients were still undergoing treatment when their second and final GAF was assessed. Finally, axis II diagnoses were assessed at a suitable time point when axis I symptoms were resolved.
Each patient’s own GAF rating was performed using a self-report version [3] at the same time as the assessment by the professional was made. At all times both patients and professionals were blinded both to previous ratings and to the ratings of others. The study was approved by the ethics committee of the Uppsala University Hospital.

3.1. Assessments

The self-report version of the GAF, previously described by Bodlund et al. [3], consists of the original scale but with fewer defining characteristics than in the original version. Psychiatric disorders and personality disorders were assessed using the Structured Clinical Interview for DSM-IV axis I disorders, clinical version (SCID-CV) [4], and the Structured Clinical Interview for DSM-IV personality disorders (SCID-II) [5]. The first or the last author performed all SCID interviews. Complete inter-rater reliability was obtained for eight randomly selected SCID-I interviews. Based on six randomly selected SCID-II interviews, the overall Kappa coefficient was 0.89 for categorical personality disorder diagnoses.

All staff members were trained and well experienced in the GAF assessment technique. Prior to the start of the study, a high inter-rater reliability was achieved. Furthermore, during the whole study period, weekly co-ratings during the team conferences were performed in order to secure inter-rater reliability.

3.2. Statistics

Axis I diagnoses were grouped into five groups: mood disorders, anxiety disorders, eating disorders, substance related disorders and other axis I disorders. The mood disorders group consists of major depressive disorder, dysthymic disorder, depressive disorder NOS, bipolar I disorder, bipolar II disorder and bipolar disorder NOS. The anxiety disorders group consists of panic disorder with and without agoraphobia, agoraphobia without panic disorder, specific phobia, social phobia, obsessive-compulsive disorder, posttraumatic stress disorder, generalized anxiety disorder and anxiety disorder NOS. The eating disorders group comprises anorexia nervosa, bulimia nervosa and eating disorder NOS. The substance related disorders group includes dependence on and abuse of alcohol and other substances. The other axis I disorders group includes primary insomnia, adjustment disorders, undifferentiated somatoform disorder, hypochondriasis and body dysmorphic disorder.

Kappa statistics were used for inter-rater reliability concerning categorical axis I and II disorders. GAF ratings were analyzed for inter-rater reliability by using the intra-class correlation coefficient (ICC), two-way mixed effects model [18]. Intra-class correlation coefficients are considered excellent if greater than 0.74, good if ranging from 0.60 to 0.74, and fair if ranging from 0.40 to 0.59 [18].
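As an illustration of the agreement analysis described above, the following minimal sketch computes a single-rater, consistency-type ICC from a two-way layout (ICC(3,1) in the notation of Shrout and Fleiss [18]) and maps it onto the verbal cut-offs quoted in the text. The text specifies only a "two-way mixed effects model", so the choice of the consistency form is an assumption; the function names, the label for values below 0.40 and the example ratings are hypothetical and are not study data.

```python
from statistics import mean


def icc_consistency(ratings):
    """ICC(3,1): two-way layout, single rater, consistency definition.

    `ratings` is a list of rows, one per patient; each row holds one
    score per rater, e.g. [staff_gaf, patient_gaf].
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-patients mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


def agreement_band(icc):
    """Verbal label following the cut-offs quoted in the text."""
    if icc > 0.74:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "below fair"  # assumption: the text gives no label under 0.40


# Hypothetical ratings, not study data: [staff GAF, patient GAF]
example = [[55, 60], [50, 45], [62, 65], [48, 52], [58, 50]]
example_icc = icc_consistency(example)
print(round(example_icc, 2), agreement_band(example_icc))
```

A consistency-type coefficient ignores a constant offset between raters; an absolute-agreement form would add the between-rater mean square to the denominator.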
Comparisons of mean differences in GAF scores between groups were done using Student’s t-test.

4. Results

4.1. Axis I and II disorders

Table 1 shows the distribution of different diagnostic groups. Mood and anxiety disorders were the most prevalent disorders. Co-morbidity was common, with 129 (67.5%) individuals with disorders from more than one diagnostic group (Table 1). Personality disorders were diagnosed in 45 (23.6%) subjects (Table 1). The most frequent personality disorders were avoidant (n = 23) and borderline (n = 15). Seven individuals fulfilled criteria for two or more personality disorders.

4.2. Axis V/GAF

At the first assessment overall mean GAF scores obtained by staff members and patients were 54.6 ± 7.6 and 55.5 ± 11.4, respectively. After treatment the corresponding figures were
Table 1
GAF scores rated before and after treatment in different diagnostic groups. Comparison of GAF rated by experts and by patients.

GAF before treatment (n = 191)

Diagnoses according to DSM-IV axis I      N     Expert mean ± SD   Self mean ± SD   ICC    p
Total group                               191   54.6 ± 7.6         55.5 ± 11.4      0.65   <0.0001
Diagnostic groups
  Mood disorders                          139   53.6 ± 7.0         53.6 ± 10.9      0.65   <0.0001
  Anxiety disorders                       128   52.4 ± 6.2         52.9 ± 9.7       0.50   <0.0001
  Eating disorders                        55    53.2 ± 7.3         52.9 ± 14.4      0.66   <0.0001
  Other axis I disorders                  22    55.4 ± 8.1         56.8 ± 12.6      0.54   0.04
  Substance related disorders             17    51.1 ± 6.8         54.9 ± 13.7      0.57   n.s.
Co-morbidity
  One axis I disorder group               56    58.2 ± 7.6         60.1 ± 12.0      0.67   <0.0001
  Two co-morbid axis I disorder groups    62    54.1 ± 7.1         55.5 ± 10.4      0.66   <0.0001
  Three co-morbid axis I disorder groups  33    50.9 ± 5.3         51.6 ± 9.1       0.32   n.s.
  Four co-morbid axis I disorder groups   34    50.6 ± 5.0         49.9 ± 10.8      0.14   n.s.
Personality disorders
  No axis II disorder/no PD               146   56.1 ± 7.6         56.9 ± 11.0      0.67   <0.0001
  Any axis II disorder                    45    49.8 ± 5.6         50.8 ± 11.4      0.36   n.s.

GAF after treatment (n = 138)

Diagnoses according to DSM-IV axis I      n     Expert mean ± SD   Self mean ± SD   ICC    p
Total group                               138   71.0 ± 11.6        72.9 ± 14.4      0.86   <0.0001
Diagnostic groups
  Mood disorders                          99    71.1 ± 11.9        73.2 ± 15.3      0.86   <0.0001
  Anxiety disorders                       86    68.3 ± 10.8        69.8 ± 13.9      0.85   <0.0001
  Eating disorders                        42    69.4 ± 13.1        71.0 ± 15.1      0.90   <0.0001
  Other axis I disorders                  18    67.1 ± 11.2        71.2 ± 15.9      0.78   0.002
  Substance related disorders             -     -                  -                -      -
Co-morbidity
  One axis I disorder group               45    74.3 ± 10.3        78.3 ± 12.0      0.82   <0.0001
  Two co-morbid axis I disorder groups    46    69.2 ± 11.8        70.8 ± 14.8      0.88   <0.0001
  Three co-morbid axis I disorder groups  26    69.5 ± 13.8        70.0 ± 15.5      0.94   <0.0001
  Four co-morbid axis I disorder groups   16    66.3 ± 9.0         67.9 ± 15.1      0.66   0.02
Personality disorders
  No axis II disorder/no PD               110   73.4 ± 10.8        75.0 ± 13.1      0.84   <0.0001
  Any axis II disorder                    28    60.4 ± 9.5         64.6 ± 16.6      0.87   <0.0001

Participants (n = 191) recruited from a total group of young (18-25 years) adult psychiatric outpatients.
71.0 ± 11.6 and 72.9 ± 14.4. The ICC between staff members’ and patients’ GAF ratings was 0.65 before treatment and 0.86 after treatment. GAF scores and the ICC in the different diagnostic groups are shown in Table 1.

GAF ratings done by patients after treatment were missing in 53 cases. No differences, in either self-rated or staff member-rated initial GAF scores, were found between those with complete ratings and those who did not complete the second rating (56.0 ± 12.0 vs. 54.2 ± 9.6; t = 1.08; p = 0.28 and 52.9 ± 7.5 vs. 55.2 ± 7.6; t = 1.90; p = 0.06, respectively). No significant differences in self-assessed GAF ratings were found between men and women, either before (56.7 ± 10.0 vs. 55.2 ± 11.7; t = 0.82; p = 0.41) or after treatment (74.0 ± 16.8 vs. 72.6 ± 13.8; t = 0.38; p = 0.70).

In patients with excessive co-morbidity, GAF scores were low and agreement between staff members and patients before treatment was low (Table 1). However, agreement after treatment increased and was at least good for those with two or three disorders (Table 1).

The mean GAF scores before treatment in personality-disordered patients as rated by staff members and patients were 49.8 ± 5.6 and 50.8 ± 11.4, respectively. After treatment, the corresponding figures were 60.4 ± 9.5 and 64.6 ± 16.6. The ICCs for ratings before and after treatment were 0.36 and 0.87, respectively (Table 1).

Changes in GAF ratings from initial to final assessments were calculated. There were 187 patients who had both assessments done by the experts, with a mean change of 14.7 ± 10.4, and 137 patients had done both ratings themselves, with a mean change of 17.0 ± 15.7. Agreement according to GAF change between patients and staff members in the total group was 0.77. ICCs for agreement between patients and staff members according to change in GAF scores within different diagnostic groups are presented in Table 2.
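The group comparisons reported above are given as means, standard deviations and t values. As a hedged sketch of how a two-sample t statistic can be computed from such summary figures, the function below implements both the pooled-variance (Student) form named in the Statistics section and the unequal-variance (Welch) variant. The group sizes of 138 completers and 53 non-completers are an assumption inferred from the missing-data count in the text; the function name is hypothetical.

```python
from math import sqrt


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2, equal_var=True):
    """Two-sample t statistic from means, SDs and group sizes."""
    if equal_var:
        # Student's t-test: one pooled variance for both groups
        pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's variant: each group keeps its own variance
        se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


# Self-rated initial GAF, completers vs. non-completers.
# Means and SDs are taken from the text; n = 138 and n = 53 are assumed.
t_student = t_from_summary(56.0, 12.0, 138, 54.2, 9.6, 53)
t_welch = t_from_summary(56.0, 12.0, 138, 54.2, 9.6, 53, equal_var=False)
print(round(t_student, 2), round(t_welch, 2))
```

Because the published means and SDs are rounded to one decimal, a recomputation of this kind can only approximate the reported t values.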
Table 2
Change in GAF ratings from initial to final ratings, comparisons between ratings done by experts and by patients.

                                          GAF change experts       GAF change patients
                                          (n = 138)                (n = 138)
Diagnoses according to DSM-IV             N     Mean ± SD          n     Mean ± SD          ICC    p
Total group                               138   15.8 ± 10.9        138   16.9 ± 15.6        0.77   <0.0001
Diagnostic groups
  Mood disorders                          99    17.3 ± 11.5        99    19.4 ± 15.7        0.77   <0.0001
  Anxiety disorders                       86    15.7 ± 10.8        86    16.9 ± 15.7        0.77   <0.0001
  Eating disorders                        42    15.6 ± 10.3        42    17.7 ± 14.4        0.71   0.0001
  Other axis I disorders                  18    11.8 ± 10.9        18    13.4 ± 18.8        0.66   0.02
Co-morbidity
  One axis I disorder group               45    15.2 ± 11.0        45    17.1 ± 16.9        0.77   <0.0001
  Two co-morbid axis I disorder groups    46    16.5 ± 10.2        46    15.8 ± 13.5        0.68   0.0001
  Three co-morbid axis I disorder groups  26    17.6 ± 13.5        26    17.8 ± 18.2        0.89   <0.0001
  Four co-morbid axis I disorder groups   16    15.2 ± 8.9         16    20.0 ± 15.3        0.63   0.03
Personality disorders
  No personality disorder                 110   16.9 ± 11.1        110   17.7 ± 15.4        0.77   <0.0001
  Any personality disorder                28    11.2 ± 8.9         28    13.9 ± 16.3        0.75   <0.0001

Participants (n = 138) recruited from a total group of young (18-25 years) adult psychiatric outpatients. Groups with less than 10 cases were excluded.
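The change scores summarized in Table 2 are differences between each rater's final and initial GAF rating for the same patient. A minimal sketch of that calculation follows, assuming paired (initial, final) ratings per patient; the data and function names are hypothetical and are not study data.

```python
from statistics import mean, stdev

# Hypothetical patients, not study data: (initial GAF, final GAF)
expert = [(52, 70), (48, 61), (60, 75), (55, 58)]
patient = [(50, 74), (45, 66), (63, 80), (58, 55)]


def change_scores(pairs):
    """Final minus initial GAF for each patient."""
    return [final - initial for initial, final in pairs]


def summarize(changes):
    """Mean and sample SD, matching the 'mean ± SD' layout of Table 2."""
    return mean(changes), stdev(changes)


expert_change = change_scores(expert)
patient_change = change_scores(patient)

for label, changes in (("experts", expert_change), ("patients", patient_change)):
    m, sd = summarize(changes)
    print(f"{label}: {m:.1f} ± {sd:.1f}")
```

The two lists of change scores, paired by patient, are then the input to the agreement coefficient in the same way as the raw ratings.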
5. Discussion

The present study examined the reliability of patients’ and trained staff members’ GAF ratings before and after treatment. Generally, the results demonstrate good agreement before and excellent agreement after treatment. However, in patients with personality disorders or in those with excessive co-morbidity only low agreement was achieved before treatment but improved to good or excellent after treatment.

The inter-rater reliability of GAF ratings, performed by trained clinicians and experts, has been shown to be excellent [7,22], whereas agreement between routine clinical scores and scores done by researchers is low [22]. Previously, only one study has compared GAF ratings of clinicians and psychiatric outpatients [3]. High concordance before treatment was found in that study, which is in line with our findings. However, Pearson’s correlation coefficient was used to assess agreement, and not the ICC, which is a methodological shortcoming.

Reduced psychosocial functioning, related to personality disorders and assessed as GAF, shows only small changes over time [19], whereas improvement of mood disorders is associated with substantial increase in GAF scores [15]. In our studied group, increase in GAF scores was highest in those affected by mood disorders and in those without any personality disorders. In the present study, only low agreement between patients and professionals was achieved in patients diagnosed with personality disorders and in those with excessive co-morbidity. From the Bodlund study [3], the lowest agreement between patients and professionals on GAF ratings was found in patients with mood disorders and in those with any cluster C personality disorder. However, in this studied sample, ICC for ratings in personality-disordered patients did improve substantially after treatment. This indicates that mood disorder severity lowered both ratings and agreement most.
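The earlier remark that Pearson's correlation is a weaker measure of agreement than the ICC can be illustrated with a small sketch. It uses hypothetical ratings in which every patient scores exactly 10 points above the clinician, and compares Pearson's r with an absolute-agreement ICC (ICC(2,1) in the notation of Shrout and Fleiss [18]). The choice of this ICC form is an assumption made for the illustration; the text does not state which form was used in the analyses.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_absolute(x, y):
    """ICC(2,1) for two raters: single rating, absolute agreement."""
    n, k = len(x), 2
    grand = mean(x + y)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    bms = ss_rows / (n - 1)    # between-patients mean square
    jms = ss_cols / (k - 1)    # between-raters mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical ratings: each patient rates 10 points above the clinician
clinician = [40, 50, 60, 70]
self_rated = [50, 60, 70, 80]
r = pearson_r(clinician, self_rated)
icc = icc_absolute(clinician, self_rated)
print(round(r, 2), round(icc, 2))
```

In this constructed case the correlation is perfect although the two raters never give the same score, while the absolute-agreement coefficient is reduced by the systematic offset.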
The GAF scale provides defining characteristics including several psychiatric symptoms, but the instructions are incomplete. This could also explain the greater difficulty in assessing morbidity than full or partial recovery.

Several studies have shown that clinicians’ GAF ratings reflect improvements during treatment [16]. According to our results there was excellent agreement between experts’ and patients’ opinions on the degree of improvement measured as an increase in GAF scores (overall ICC = 0.77). However, patients rated their mean increase in GAF slightly higher than experts in all diagnostic groups (Table 2). This difference is the opposite of the expected bias caused by those providing treatment [12].

The greatest strength of the present study is the careful and comprehensive assessment performed in a large sample of psychiatric outpatients. Furthermore, during the whole study period, staff members performed weekly GAF co-ratings in order to secure inter-rater reliability. However, it cannot be excluded that the different GAF rating procedures used at first and second assessment have affected the reliability of the professionals’ GAF ratings. Another possible limitation is that 52 patients were still under treatment at follow-up. Furthermore, the young age of the studied group may limit the ability to generalize the results to older patients.

There is a need for systematic follow-up of patients, and there is increasing pressure to use outcome measures in routine clinical practice. For obvious reasons it is also important to examine patients’ subjective opinions about how their condition and level of functioning have changed over time. Self-report and expert evaluations are assumed to be complementary, and there is no strong evidence that one method is more valid than the other. The findings of the present study support the usefulness of the GAF self-report version for measuring outcome in routine clinical care. However, since this is the first study evaluating reliability after treatment, more research is needed.

6. Conclusion

Agreement between patients’ and professionals’ ratings on the GAF scale was good before and excellent after treatment. The results support the usefulness of the self-report GAF instrument for measuring outcome in psychiatric care. However, more research is needed concerning the difficulties in rating severely disordered patients.

Acknowledgement

This research was supported by the Märta and Nicke Nasvell Foundation, Ratiopharm AB; research grant, the Alcohol Research Council of the Swedish Alcohol Retailing Monopoly; 02/14:1, and the Fredrik and Ingrid Thuring Foundation. We want to thank Mr. Hans Arinell, B.Sc., for his contributions to the statistical analyses.

References

[1] APA. Diagnostic and statistical manual of mental disorders. 3rd ed., rev. Washington DC; 1987.
[2] APA. Diagnostic and statistical manual of mental disorders, DSM-IV. 4th ed. Washington DC: American Psychiatric Association; 1994.
[3] Bodlund O, Kullgren G, Ekselius L, Lindström E, von Knorring L. Axis V-global assessment of functioning scale: evaluation of a self-report version. Acta Psychiatr Scand 1994;90:342-7.
[4] First MB, Spitzer RL, Gibbon M, Williams JBW.
Structured clinical interview for DSM-IV axis I disorders, clinical version (SCID-CV). Washington, D.C: American Psychiatric Press, Inc.; 1996.
[5] First MB, Spitzer RL, Gibbon M, Williams JBW. Structured clinical interview for DSM-IV personality disorders (SCID-II). Washington, D.C: American Psychiatric Press, Inc.; 1997.
[6] Goldman HH, Skodol AE, Lave TR. Revising axis V for DSM-IV: a review of measures of social functioning. Am J Psychiatry 1992;149:1148-56.
[7] Hilsenroth MJ, Ackerman SJ, Blagys MD, Baumann BD, Baity MR, Smith SR, et al. Reliability and validity of DSM-IV axis V. Am J Psychiatry 2000;157:1858-63.
[8] Janca A. Reliability of DSM-IV axis V scales. Am J Psychiatry 2001;158:1935-7.
[9] Jones SH, Thornicroft G, Coffey M, Dunn G. A brief mental health outcome scale-reliability and validity of the global assessment of functioning (GAF). Br J Psychiatry 1995;166:654-9.
[10] Kellner R, Rada RT, Andersen T, Pathak D. The effects of chlordiazepoxide on self-rated depression, anxiety, and well-being. Psychopharmacology (Berl) 1979;64:185-91.
[11] Loevdahl H, Friis S. Routine evaluation of mental health: reliable information or worthless "guesstimates"? Acta Psychiatr Scand 1996;93:125-8.
[12] Moos RH, McCoy L, Moos BS. Global assessment of functioning (GAF) ratings: determinants and role as predictors of one-year treatment outcomes. J Clin Psychol 2000;56:449-61.
[13] Moos RH, Nichol AC, Moos BS. Global assessment of functioning ratings and the allocation and outcomes of mental health services. Psychiatr Serv 2002;53:730-7.
[14] Piersma HL, Boes JL. The GAF and psychiatric outcome: a descriptive report. Community Ment Health J 1997;33:35-41.
[15] Pinto OC, Akiskal HS. Lamotrigine as a promising approach to borderline personality: an open case series without concurrent DSM-IV major mood disorder. J Affect Disord 1998;51:333-43.
[16] Rund BR, Moe L, Sollien T, Fjell A, Borchgrevink T, Hallert M, et al. The psychosis project: outcome and cost-effectiveness of a psychoeducational treatment programme for schizophrenic adolescents. Acta Psychiatr Scand 1994;89:211-8.
[17] Salvi G, Leese M, Slade M. Routine use of mental health outcome assessments: choosing the measure. Br J Psychiatry 2005;186:146-52.
[18] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420-8.
[19] Skodol AE, Pagano ME, Bender DS, Shea MT, Gunderson JG, Yen S, et al. Stability of functional impairment in patients with schizotypal, borderline, avoidant, or obsessive-compulsive personality disorder over two years. Psychol Med 2005;35:443-51.
[20] Soderberg P, Tungstrom S, Armelius BA. Reliability of global assessment of functioning ratings made by clinical psychiatric staff. Psychiatr Serv 2005;56:434-8.
[21] Startup M, Jackson MC, Bendix S. The concurrent validity of the global assessment of functioning (GAF). Br J Clin Psychol 2002;41:417-22.
[22] Vatnaland T, Vatnaland J, Friis S, Opjordsmoen S. Are GAF scores reliable in routine clinical use? Acta Psychiatr Scand 2007;115:326-30.