Rasch validation of the PHQ-9 in people with visual impairment in South India




Journal of Affective Disorders 167 (2014) 171–177

Contents lists available at ScienceDirect

Journal of Affective Disorders journal homepage: www.elsevier.com/locate/jad

Research report

Rasch validation of the PHQ-9 in people with visual impairment in South India☆

Vijaya K. Gothwal*, Deepak K. Bagga, Rebecca Sumalini

Meera and L.B. Deshpande Centre for Sight Enhancement, Vision Rehabilitation Centres, L.V. Prasad Eye Institute, Kallam Anji Reddy Campus, L.V. Prasad Marg, Banjara Hills, Hyderabad 500034, Andhra Pradesh, India

Article info

Article history: Received 31 December 2013; received in revised form 9 June 2014; accepted 10 June 2014; available online 19 June 2014

Keywords: Depression; Patient Health Questionnaire-9; Rasch analysis; Visual impairment

Abstract

Background: The Patient Health Questionnaire (PHQ-9) is a widely used screening instrument for depression. Recently, its properties as a measure were investigated using Rasch analysis in an Australian population with visual impairment (VI), and it was demonstrated to possess excellent measurement properties, although the response scale required shortening (modified PHQ-9). However, further validation was recommended to substantiate its use with the growing population with VI. Therefore, we aimed to use Rasch analysis to evaluate the measurement properties of the modified PHQ-9 in an Indian population with VI.

Methods: 303 patients with VI (mean age 40.2 years; 71% male) referred to Vision Rehabilitation Centres were administered the PHQ-9 by a trained interviewer. Rasch analysis was used to investigate the psychometric properties of the modified PHQ-9.

Results: Rasch analysis showed good fit to the model, no misfitting items and an acceptable person separation reliability (0.82). Dimensionality testing supported combining the 9 items to create a total score. Targeting was sub-optimal (−1.30 logits); more difficult items are needed. One item (‘trouble falling asleep’) showed notable differential item functioning (DIF; 1.18 logits) by duration of VI.

Limitations: The generalisability of these results might be restricted to patients with VI presenting to a tertiary eye care centre.

Conclusions: Except for DIF, the performance of the modified PHQ-9 is consistent with that of the original, albeit in a different cultural context (Indian population with VI). Clinicians and researchers can readily use the modified PHQ-9 without formal training in Rasch procedures, given the provision of ready-to-use spreadsheets that convert raw scores to Rasch-scaled scores. However, the conversions will apply only if the sample being tested is similar to that of the present study.

© 2014 Elsevier B.V. All rights reserved.

☆ This study was supported in part by the Hyderabad Eye Research Foundation, Hyderabad, India.
* Corresponding author. Tel.: +91 40 30612835; fax: +91 40 23548271. E-mail address: [email protected] (V.K. Gothwal).
http://dx.doi.org/10.1016/j.jad.2014.06.019
0165-0327/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Vision loss prevents people from engaging in a range of activities considered important to human life, thereby affecting their mental health in the form of depression, stress, etc. (Casten and Rovner, 2008). The rates of depression have been reported to vary from 14.3% in persons with minimal vision loss to 25% in those with severe vision loss (Augustin et al., 2007). Depression aggravates disability by significantly reducing a person's ability to function in daily life (Rovner and Casten, 2008), by as much as 8.3 times, for example, in older adults with age-related macular degeneration (Rovner and Casten, 2002). More importantly, if left undetected and untreated, depression is the single most costly disorder in terms of days lost to illness, impact on the family and employer, and risk of suicide (Murray and Lopez, 1996). Depression has been reported as an independent risk factor for cardiovascular disease and a significant predictor of mortality (Rovner, 1993). Consequently, it is imperative to find ways to detect, treat and prevent depression among vulnerable populations, such as the visually impaired.

Such strategies assume further significance in the field of low vision rehabilitation (LVR), as the prevalence of depression is reportedly high among persons who seek LVR services (Horowitz et al., 2005; Rovner et al., 2007). Furthermore, the presence of depression can adversely affect the outcome of LVR services, so it is recommended to manage depression in this group prior to commencing the services (Crews et al., 2006; Horowitz et al., 2005; Rovner et al., 2002, 1996). Depression often goes undetected in clinical routine (Thombs et al., 2007), and since depressive symptoms are associated with several risk factors and are of potential clinical relevance, research has increasingly focused on depressive symptoms in the visually impaired that do not fulfil the criteria for a diagnosis of major depression.

Although mental health is typically evaluated by taking a history and performing a detailed clinical assessment, patient self-report instruments are increasingly being used to supplement clinical information and evaluate patient outcomes. Several established screening instruments for depression are available, such as the Beck Depression Inventory, the Hospital Anxiety and Depression scale and the Centre for Epidemiologic Studies Depression scale (Beck et al., 1961; Radloff, 1977; Zigmond and Snaith, 1983). Recently, however, the depression module of the Patient Health Questionnaire (PHQ-9) has increasingly been applied (Kroenke et al., 2001). Although developed in the late 1990s, the PHQ-9 has gained increasing use in both research and practice in the last decade (Kroenke et al., 2001, 2010; Lowe et al., 2004b; Wang et al., 2012). It has also been applied in non-clinical settings and population-based research (Keddie, 2011; Martin et al., 2006; Schomerus et al., 2009; Wang et al., 2012), and it has been used by a few investigators in India (Ganguly et al., 2013; Poongothai et al., 2011, 2009a, 2009b).

The PHQ-9 has appealed to clinicians and researchers alike (over other instruments for depression) because of its brevity and because it can be used in two ways: as a diagnostic algorithm for classifying a DSM-IV based probable diagnosis of major or minor depression, or as a dimensional measure of the severity of depressive symptoms, calculated as a total score ranging from 0 to 27, with cut-points of 5, 10, 15 and 20 representing mild, moderate, moderately severe and severe levels of depressive symptoms (Kroenke et al., 2001).
Most of the established instruments for depression were originally developed on the basis of classical test theory (CTT), and many studies reported excellent reliability and validity of these instruments when relying upon CTT assumptions (Kroenke et al., 2010; Lowe et al., 2004a; Poongothai et al., 2009b; Thombs et al., 2007). As mentioned earlier, the PHQ-9 was developed as a screening tool, and such tools are often not designed to perform as measurement scales but rather to focus on a cut-point, which in this case determines depression. However, in the last few years, it has been demonstrated that diagnostic instruments could benefit substantially from modern statistical approaches such as models of item response theory (IRT), e.g., the Rasch model. The major advantage of Rasch analysis is the use of a common interval scale to represent items and persons (Bond and Fox, 2001; Weimo, 1996; Wright and Masters, 1982). Rasch measurement takes ordinal-level data, such as the Likert scoring from response categories, and converts them to interval-level data. The resulting equal-interval units of measurement are called logits (or log odds) and, more importantly, a given difference in logits, whether between any persons or any items, has the same meaning anywhere on the measurement scale. This property is significant because it legitimises the use of robust parametric statistics and enables meaningful comparisons regarding the amount of person ability and item difficulty. The Rasch model also requires that all the items reflect varying amounts of the trait being measured and that the items have equal discrimination, such that items discriminate as well for persons who possess a lot of the underlying trait (for example, depression in depression scales) as they do for persons who possess little of the trait under measurement.
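The logit scale described above can be written compactly for the simplest (dichotomous) case. The notation is an assumption introduced here for illustration and does not appear in the original text: \theta_n is the measure of person n, \delta_i is the difficulty of item i, and X_{ni} is the response of person n to item i.

```latex
\ln\!\left(\frac{P(X_{ni}=1)}{P(X_{ni}=0)}\right) = \theta_n - \delta_i
\qquad\Longrightarrow\qquad
\ln\!\left(\frac{P(X_{ni}=1)}{P(X_{ni}=0)}\right)
- \ln\!\left(\frac{P(X_{mi}=1)}{P(X_{mi}=0)}\right) = \theta_n - \theta_m
```

Under this formulation, the comparison of two persons n and m does not depend on which item i is used, which is one way of seeing why a difference in logits carries the same meaning along the whole scale.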
The estimates of item difficulty and person ability are statistically independent of the items in the instrument and the samples of persons examined (Andrich, 1978; Bond and Fox, 2001). This attribute, called ‘invariance’, is very important in interpreting test results. For example, a metric scale should retain its length calibrations regardless of whether it measures the height of a person, the length of a table, etc.; likewise, person ability estimates should remain the same regardless of which items are used. Rasch analysis allows a detailed investigation of many aspects of an instrument, including the response format, fit of individual items, dimensionality, targeting and the detection of differential item functioning (see statistical analysis for details).

With the application of IRT techniques, a slightly more differentiated picture of the psychometric properties of the established screening instruments for depression has emerged. For example, IRT modelling has shown that unidimensionality, an important aspect of test theory, cannot be assumed for some instruments (de Bonis et al., 1991; Licht et al., 2005). Furthermore, it was shown that instruments containing items related to somatic symptoms could lead to severe problems when assessing patients with comorbid somatic diseases. If patients suffering from a severe somatic illness report somatic symptoms in a depression questionnaire, those symptoms may be attributable to either the somatic ailment or a depressive episode (Alexopoulos et al., 2002; Siegert et al., 2010). This may lead to artificially increased depression scores. Moreover, using IRT methods it was shown that established instruments could be shortened without loss of information (Tang et al., 2005). Generally, in many studies applying IRT techniques, sound psychometric characteristics of depression screening instruments could only be found if at least some items were removed from the scale. Which items had to be removed depended largely on the sample investigated (Cole et al., 2004; Forjaz et al., 2009; Kendel et al., 2010; Siegert et al., 2010). However, sample-dependent psychometric characteristics of screening instruments may complicate the comparison of results across different samples or studies.

Recently, the PHQ-9 has been re-validated using Rasch analysis in different samples (Forkmann et al., 2013; Kendel et al., 2010; Lamoureux et al., 2009). Kendel et al. (2010) reported the results of Rasch analysis of the PHQ-9 in patients undergoing coronary artery bypass graft surgery that showed insufficient adherence to Rasch model requirements, i.e., insufficient model fit of six items and age-related DIF for these items. Lamoureux et al. (2009) revisited the PHQ-9 in an Australian visually impaired population and reported that the instrument fitted the Rasch model, albeit with a response scale shortened from four to three categories. However, the authors suggested the need for further research using the modified response scale prior to recommending it for use in clinical situations. Despite encouraging psychometric data with the PHQ-9 in an Australian visually impaired adult population, the revised instrument (modified PHQ-9) remains untested in a similar Indian sample. Taken together, these studies indicate that Rasch analysis may have the potential to shed further light on the psychometric properties of the modified PHQ-9 beyond the results reported in studies relying on CTT assumptions.

The primary aim of the current study was therefore to use Rasch analysis to investigate the measurement properties of the modified PHQ-9 in a visually impaired Indian population. Given that the earlier Rasch analyses of the PHQ-9 did not provide ready-to-use conversion sheets from raw to Rasch scores, we aimed to provide such sheets. We envisaged that the provision of such sheets would encourage the use of the Rasch-validated modified PHQ-9 among researchers who are unfamiliar with Rasch analysis.

2. Methods

2.1. Sample

Participants were adults with low vision referred to the Vision Rehabilitation Centres, L V Prasad Eye Institute (LVPEI), India, for management of low vision. A single interviewer administered the PHQ-9 in a face-to-face interview with each participant after briefly explaining the purpose and nature of the study. Included participants were aged 18 years or older, had low vision (best-corrected visual acuity in the better eye <6/18, or ≥6/18 with restricted visual fields) and could respond to the instrument. Participants with additional disability (physical, hearing, etc.) were excluded, as were those who were totally blind in both eyes (bilateral absence of light perception). The institutional review board of the LVPEI approved the conduct of the study, and ethical approval was obtained from the Ethics Committee for Human Research at LVPEI. All the participants provided informed consent. The study was conducted in accordance with the tenets of the Declaration of Helsinki. A total of 303 participants were included and their mean age was 40.2 years (SD, 17.4). The majority (71%) were men. The sociodemographic data were extracted from the clinical records. Table 1 summarises participant characteristics.

Table 1. Participant characteristics of those who responded to the Patient Health Questionnaire-9.

Characteristic                                        Result
Age (mean ± SD), years                                40.2 ± 17.4
Gender (n, %), male                                   216 (71)
Duration of visual impairment, years (mean ± SD)      12.4 ± 12.2
Location (n, %), urban                                202 (66.7)
Education, n (%)
  None/Primary school                                 69 (22.8)
  Secondary school                                    97 (32.0)
  High school/Graduate                                87 (28.7)
  Postgraduate                                        49 (16.2)
Systemic co-morbidity (n, %), present                 80 (26.4)
Principal cause of vision loss, n (%)
  Retinal disorders (a)                               140 (46.2)
  Glaucoma                                            51 (16.8)
  Optic atrophy                                       19 (6.3)
  Amblyopia                                           10 (3.3)
  Other                                               81 (26.7)
  Unknown                                             2 (0.7)
Visual impairment, n (%)
  Mild (b) (≥20/60)                                   56 (18.5)
  Moderate (<20/60–20/200)                            134 (44.2)
  Severe (<20/200)                                    113 (37.3)

(a) Includes retinitis pigmentosa, diabetic retinopathy, uveal coloboma, age-related macular degeneration and albinism.
(b) These patients had restricted visual fields (less than 20 degrees in the better eye).

2.2. Patient Health Questionnaire-9

The PHQ-9 is a nine-item depression module derived from the Primary Care Evaluation of Mental Disorders (PRIME-MD, Pfizer Inc., New York, NY) tool (Kroenke et al., 2001). The PRIME-MD is an instrument to help primary care clinicians make criteria-based diagnoses of five types of DSM-IV disorders commonly encountered in medical patients: mood, anxiety, somatoform, alcohol, and eating disorders (Spitzer et al., 1994). The PHQ-9 helps screen for depression by relying on DSM-IV criteria for the diagnosis of depression and consists of 9 items: (1) lack of interest, (2) depressed mood, (3) sleeping difficulties, (4) tiredness, (5) appetite problems, (6) negative feelings about self, (7) concentration problems, (8) psychomotor agitation/retardation, and (9) suicidal ideation. Four of the 9 items (Nos. 3, 4, 5, and 8) are related to a somatic factor. Although the PHQ-9 has a four-category response scale, as mentioned in Section 1, Lamoureux et al. (2009) demonstrated that a three-category scale (i.e. combining the original categories 1 [several days] and 2 [more than half the days]) fits the Rasch model better in visually impaired patients. Consequently, in the present study, participants were asked to rate the extent to which the symptoms were present during the last 2 weeks using a three-category scale: ‘not at all’, ‘some of the days’, and ‘nearly every day’. The total score ranges from 0 to 18, with higher scores representing greater amounts of depression. Using standard procedures, local language versions of the PHQ-9 were obtained by forward–backward translation, and these were used in the present study.
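As a minimal sketch of the category collapsing described above, the following Python snippet recodes responses given on the original four-category scale (0 to 3) to the three-category scale (0 to 2) and sums them to the 0–18 raw total. The function and variable names are hypothetical and chosen only for illustration; in the present study the three-category scale was administered directly, so this recoding would apply only to data collected with the original format.

```python
# Hypothetical helper names; the mapping follows the collapsing described
# in the text (original categories 1 and 2 are combined).
COLLAPSE = {0: 0, 1: 1, 2: 1, 3: 2}  # original category -> modified category


def collapse_responses(original):
    """Recode nine original 0-3 responses to the modified 0-2 scale."""
    if len(original) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    return [COLLAPSE[r] for r in original]


def raw_total(modified):
    """Raw (ordinal) total of the modified PHQ-9, ranging from 0 to 18."""
    return sum(modified)


example = [0, 1, 2, 3, 3, 2, 1, 0, 3]  # illustrative responses, not study data
recoded = collapse_responses(example)
print(recoded, raw_total(recoded))
```

The raw total obtained this way is still ordinal; it becomes an interval-level measure only after conversion through a Rasch calibration.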

2.3. Statistical analysis

Rasch analysis was performed using the Andrich rating scale model for polytomous data (Andrich, 1978) in the WINSTEPS programme, version 3.68 (Linacre, 2008). Rasch models are a variant of IRT that model the relationship between the level of a latent trait (for the PHQ-9, severity of symptoms) and the items used for measurement. The procedures of Rasch analysis have been described in detail in earlier papers (Massof, 2002; Pesudovs et al., 2007).

A fundamental criterion underlying the Rasch models is unidimensionality, and this was investigated using goodness-of-fit statistics (infit and outfit). The infit, or information-weighted mean square (MnSq), statistic is sensitive to unexpected responses on items near the person's measure level, and the outfit statistic is sensitive to unexpected responses far from the person's measure level. In the present study, we used infit statistics as the measure of item fit, with an acceptable range of infit MnSq between 0.7 and 1.3; items with values outside this range were considered misfits. Large misfit (values >1.3) indicates that the observed values of items deviate from the model-expected values. Misfitting items were to be deleted one at a time (in order of misfit magnitude) until the best item structure was obtained, in which all items fit. Unidimensionality was also examined with principal components analysis (PCA) of residuals. We used the criteria that the variance explained by the Rasch measure should be comparable between the empirical calculation and the model (Bond and Fox, 2001), and that the eigenvalue of the unexplained variance in the first contrast should be smaller than 2.0 (Linacre, 2005).

An important parameter that should be assessed prior to estimating the item difficulty and person ability parameters is the category structure. Consistency of item responses with the underlying construct is indicated by an ordered set of response thresholds (the points at which a participant has a 50% chance of choosing one category over another). Measurement precision was assessed using person separation reliability (PSR), with a minimum acceptable PSR of 0.80; this indicates the ability of the instrument to distinguish among at least three strata of depression in the participants.

The suitability of item endorsement for the visually impaired population was examined by inspection of targeting (i.e., the extent to which the set of items is of appropriate endorsability for the participants' level of depression). In a well-targeted measure (i.e. one with a balance of easy-to-endorse and difficult-to-endorse items), the mean locations of the items and persons will be in close proximity to each other (optimal targeting <0.5 logits, good targeting 0.5–1.0 logits). Mistargeting indicates that the items are either too easy or too difficult to endorse for the level of depression of the participants.

If the data fit the Rasch model, Rasch analysis allows detection of differences in item difficulties between different groups within a sample. This is referred to as differential item functioning (DIF). We selected the DIF variables a priori in the present study. DIF was investigated for age groups (split at the median age; <38 years and ≥38 years), gender, duration of VI (split at the median; <120 months and ≥120 months), and systemic and ocular co-morbidity (present or absent). DIF was considered to be absent if it was less than 0.50 logits, minimal (but probably inconsequential) if between 0.50 and 1.0 logits, and notable if >1.0 logits (Wright and Douglas, 1975, 1976). Descriptive statistics were analysed using IBM SPSS software (version 19.0.0; SPSS Inc., Chicago, IL).
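The criteria in this section can be illustrated with a short Python sketch. It is not the WINSTEPS implementation: the function names are hypothetical, the fit functions assume that model-expected scores and variances have already been estimated elsewhere, and the strata formula (4G + 1)/3 with G = sqrt(PSR / (1 − PSR)) is the conventional separation-to-strata conversion, assumed here rather than stated in the text.

```python
import math


def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed, expected, variance: per-person observed score, model-expected
    score and model variance of the score (assumed to be precomputed).
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Infit: information-weighted, summed squared residuals over summed variance.
    infit = sum(sq_resid) / sum(variance)
    # Outfit: unweighted mean of squared standardised residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variance)) / len(observed)
    return infit, outfit


def is_misfit(infit, low=0.7, high=1.3):
    """Flag an item whose infit MnSq falls outside the 0.7-1.3 range."""
    return not (low <= infit <= high)


def strata_from_reliability(psr):
    """Convert person separation reliability to a number of strata (assumed formula)."""
    separation = math.sqrt(psr / (1.0 - psr))
    return (4.0 * separation + 1.0) / 3.0


def classify_targeting(mean_person, mean_item=0.0):
    """Label targeting from the person-item mean difference in logits."""
    gap = abs(mean_person - mean_item)
    if gap < 0.5:
        return "optimal"
    if gap <= 1.0:
        return "good"
    return "sub-optimal"


def classify_dif(contrast_logits):
    """Label a DIF contrast using the 0.50 and 1.0 logit cut-offs."""
    size = abs(contrast_logits)
    if size < 0.5:
        return "absent"
    if size <= 1.0:
        return "minimal"
    return "notable"


print(strata_from_reliability(0.80), classify_targeting(-1.30), classify_dif(1.18))
```

With a PSR of 0.80 the assumed formula gives three strata, which is consistent with the minimum criterion stated above.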

3. Results

3.1. Analysis of response categories

Response categories were ordered. This indicated that the participants utilised the categories as intended.

3.2. Overall performance of the modified PHQ-9

Person separation reliability was borderline, in that it was just at the acceptable limit, indicating that the PHQ-9 could reliably distinguish between at least three strata of participants' depression levels (mild, moderate and severe) (Table 2). All items had infit MnSq values within the suggested limits. PCA of the residuals showed that the variance explained by the measure for the empirical calculation (52.4%) was very similar to that for the model (52.8%), and the unexplained variance in the first contrast was 1.8 eigenvalue units. Taken together, these results confirm the unidimensionality of the modified PHQ-9. Targeting was −1.30 logits, indicating that the levels of depression among the participants were much lower than those that could be captured by the items in the modified PHQ-9 (Fig. 1). The three least difficult to endorse items (i.e., participants chose lower response categories) were ‘poor appetite’, ‘thoughts that you would be better off dead’ and ‘trouble falling or staying asleep’. By comparison, the three most difficult to endorse items (i.e., participants chose higher response categories) were ‘trouble concentrating on things such as reading the newspaper’, ‘feeling down, depressed, hopeless’, and ‘feeling bad about yourself’.

3.3. Differential item functioning

A single item (item 3: ‘trouble falling or staying asleep, or sleeping too much’) displayed DIF by duration of VI (Table 2). Participants with longer duration (≥120 months) rated this item as more difficult to endorse (by 1.18 logits) than those with shorter duration, regardless of their actual level of depressive severity. However, there were significant differences in certain demographic characteristics between these two subgroups of participants.
Those with longer duration of VI were significantly younger (36.64 ± 16.33 years vs 44.23 ± 17.87 years, p < 0.0001) and had worse logMAR visual acuity in the better eye (0.94 ± 0.41 vs 0.78 ± 0.43, p = 0.004) when compared with those with relatively shorter duration of VI.

Table 2. Fit parameters of the modified Patient Health Questionnaire-9.

Parameter                                                    Rasch model    Patient Health Questionnaire-9
Number of items                                              –              9
Number of misfitting items                                   0              0
Person separation reliability                                ≥0.80          0.82
Mean item location                                           0              0
Mean person location                                         0              −1.30
Principal components analysis (eigenvalue, first contrast)   <2.0           1.8
Differential item functioning (notable)                      >1.0           1

3.4. Conversion of raw scores to Rasch-scaled scores

Since populations under study vary, it is always best to obtain Rasch measurement properties by actually performing the Rasch analysis. However, other investigators may wish to use the modified PHQ-9 and also gain the interval scoring benefits of Rasch analysis without performing Rasch analysis themselves. Therefore, we have provided a series of Excel (Microsoft, Redmond WA, USA) spreadsheets which convert raw (ordinal) PHQ-9 scores to Rasch measurement estimates. These spreadsheets can be downloaded directly from the journal's website or obtained by contacting the corresponding author. Given that the scoring schema used to develop the ready-to-use spreadsheet is based on our study population, we caution that these conversions can be applied only if the population in which the instrument is being tested is similar to that of the present study.
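To illustrate how a raw-to-Rasch conversion of this kind can be derived in principle, the sketch below inverts the expected total score of a rating scale model by bisection, for respondents who answered all nine items. It is only a sketch under stated assumptions: the item difficulties and thresholds are hypothetical placeholders, not the calibrations from this study, the function names are invented for illustration, and the published spreadsheets (not this code) remain the source of the actual conversions.

```python
import math


def category_probs(theta, delta, taus):
    """Rating scale model probabilities for categories 0..len(taus)."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_total(theta, deltas, taus):
    """Model-expected raw total score at person measure theta."""
    return sum(
        sum(k * p for k, p in enumerate(category_probs(theta, d, taus)))
        for d in deltas
    )


def raw_to_measure(raw, deltas, taus, lo=-10.0, hi=10.0, tol=1e-6):
    """Find theta whose expected total equals the raw score (bisection)."""
    max_score = len(deltas) * len(taus)
    if not 0 < raw < max_score:
        return None  # extreme scores have no finite estimate in this sketch
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_total(mid, deltas, taus) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical calibrations for illustration only, NOT the published values.
HYPOTHETICAL_DELTAS = [-1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.5]
HYPOTHETICAL_TAUS = [-1.0, 1.0]  # two thresholds for a three-category scale

table = {
    raw: raw_to_measure(raw, HYPOTHETICAL_DELTAS, HYPOTHETICAL_TAUS)
    for raw in range(0, 19)
}
for raw, measure in table.items():
    print(raw, "extreme" if measure is None else round(measure, 2))
```

Because the mapping depends entirely on the item calibrations, a table built from one sample's calibrations carries over only to similar samples, which is the caution expressed above. Extreme raw scores (0 and 18) are left without a value here; Rasch software typically applies a correction to assign them finite measures.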

4. Discussion

Results of Rasch analysis of the modified PHQ-9 in our visually impaired adult population revealed that the instrument has acceptable psychometric properties: unidimensionality, good reliability, satisfactory targeting, and a well-functioning rating scale. Given the unidimensionality of the modified PHQ-9, interval-level scores can be obtained from the items for this sample. To our knowledge, this is the first use of the modified PHQ-9 in a visually impaired adult population, making it difficult to compare our findings. Nonetheless, Lamoureux et al. (2009) reported similarly good performance of the PHQ-9, albeit the original version, in their relatively small sample (n = 103) of visually impaired Australian patients. As mentioned in Section 1, although the PHQ-9 has been validated in an Indian population using CTT (Poongothai et al., 2009b), ours is the first report of the application of Rasch analysis to the PHQ-9 in this population.

The performance of the modified PHQ-9 is not without its drawbacks in our population. Firstly, there was some evidence of DIF by duration of VI, albeit for a single item (‘trouble falling or staying asleep’), which could perhaps indicate a lack of unidimensionality. However, item fit statistics and PCA of residuals did not suggest that the modified PHQ-9 breached unidimensionality. Once an item has been flagged for DIF, it is important to ascertain the source of the DIF. In the present study, we investigated uniform DIF (which occurs when there is a consistent difference in item performance between two groups of participants matched on the ability measured by the instrument). The DIF may have occurred because participants with longer duration of VI were significantly younger (most of them had VI since early childhood or from birth) and had worse visual acuity in their better eye than those with relatively shorter duration of VI.
Given their younger age and long-standing VI, they may have had greater concerns regarding their visual condition than those with shorter duration of VI, and were perhaps constantly thinking of it at a subconscious level, due to which they may have had trouble sleeping, resulting in DIF for item 3. By comparison, Lamoureux et al. (2009) did not find any DIF in their population using the original PHQ-9. As the authors speculate, however, the lack of DIF in their study may be attributed to a small sample size. Given that overly large sample sizes may result in ‘artificial misfit’ or DIF detection (Linacre, 1994), it is important that our results are replicated in an independent sample before any definitive conclusions regarding the presence or absence of DIF in the visually impaired can be drawn. If DIF related to duration of VI can be confirmed in cross-validation, norm values for the PHQ-9 specific to duration of VI would need to be determined.

There is no universally agreed-on correction for DIF. One option is to drop the items that display DIF, but this is usually more applicable to tests in the development phase. In educational research, eliminating one or more DIF items from a large item pool generally does not affect overall measurement precision, given that they can be replaced with equally efficient non-DIF items (Orlando and Marshall, 2002; Raju et al., 1995). By comparison, in health outcomes research, such a luxury is often unavailable, as items are often not as expendable. More importantly, eliminating items could jeopardise the content validity of the instrument, and this problem is particularly salient for short instruments such as the 9-item modified PHQ-9 (person separation reliability for the instrument was 0.78 after deletion of item 3 in the present study). A second recommended option to ameliorate DIF is rewording the item rather than eliminating it (Orlando and Marshall, 2002). Going forward, it appears that rewording the item causing DIF (No. 3) in the modified PHQ-9 may help eliminate DIF and also remove the double-barrelled nature of the item in future studies. Double-barrelled items such as item 3 confuse participants because they combine several items into one. For example, in item 3, ‘trouble falling asleep’ is a different entity from ‘staying asleep’ and ‘sleeping too much’. Despite these conflicting issues, all of them have been combined into a single item. Therefore, we would suggest retaining ‘trouble falling asleep’ instead of the original item 3, and this modification needs to be tested in future studies using the modified PHQ-9 in patients with VI.

Fig. 1. Person-item map for 9-item PHQ. Participants are located on the left of the dashed line and participants with higher symptomatology are located at the top of the map. Items are on the right of the dashed line and symptoms having lower impact are located towards the top of the map. Each ‘x’ and ‘.’ represents three participants and one participant, respectively. M, mean; S, 1 SD from the mean; T, 2 SD from the mean.

Secondly, targeting was suboptimal, in that the level of depression in our visually impaired population was beyond that being tapped by the items on the PHQ-9. That is, the items were too easy for our sample to endorse. As can be seen in Fig. 1, the spread of participants ranges from −5.0 to +4.0 logits (9.0 logits), compared with the smaller spread of items, which ranges from −1.84 to +1.36 logits (3.20 logits). It appears from our study that depression in the visually impaired Indian population might be characterised by symptoms of disturbed sleep, poor appetite or overeating, and suicidal ideation. Note that this explanation can only be suspected on the basis of our data, and future investigations are needed to shed light on these issues. Although Lamoureux et al. (2009) used the original PHQ-9 (vs the modified PHQ-9 in the present study), the targeting of item endorsability to the patients' level of depression was better in our visually impaired patients (−1.30 logits) than in the Australian cohort (−2.21 logits), suggesting that the items were perhaps better suited to assessing depression levels in our population than in the latter. This difference in targeting is perhaps not surprising given that targeting is sample dependent (Gothwal et al., 2009). Taken together, the results of the present study and that by Lamoureux et al. suggest that the information about visually impaired patients that can be extracted from their answers to the PHQ-9 items is very limited. Poor targeting may lead to unstable item calibrations or artificial misfit (neither of which occurred in the present study, perhaps because of the large sample size), so it should be kept in mind when interpreting the current results. Furthermore, non-linearity (i.e. ceiling and floor) effects pose a problem in instruments, as this property may result in underestimation of the clinical improvements for participants with mild levels of depression.

Results from the present study indicate that the modified PHQ-9 could be refined further by enriching the lower extreme of the scale with more difficult to endorse items. However, such a strategy falls within the purview of the developers of the PHQ-9. While new items can be generated and added to legacy instruments such as the PHQ-9, this approach requires re-validation in a new population each time modifications are made. While this is possible, a comparatively superior strategy would be the formation of item banks and the use of computer adaptive testing (CAT) for their implementation (Cook et al., 2005; Fayers, 2007; Hays and Lipscomb, 2007).
Item banks containing Rasch-calibrated items pooled from different instruments that assess depression have been developed and used (Forkmann et al., 2009; Pilkonis et al., 2011). The items are administered by a computerised algorithm that targets the level of each participant according to his or her responses. Such a strategy would help eliminate the limitation of poor targeting. Furthermore, a relatively smaller number of items would be required to specifically target a given participant, with the resultant effect of reduced participant burden. The availability of interval-level scores for the PHQ-9 in those with VI will permit the use of parametric statistics, for example, in studies where the aim is to assess change in the level of depression or its severity over time, or differences between groups (Merbitz et al., 1989; Wright and Linacre, 1989). However, in order to be fully useful in clinical practice and research, the score needs to be transferable between populations. There are two main ways in which this could be carried out: the repeated use of the Rasch model, or a conversion table or ready-to-use spreadsheet that converts raw to Rasch-scaled scores. If the Rasch model were to be used in every dataset, a slightly different score range would result on each occasion, but this would allow people to gain a score even if they did not complete all the items. This option also requires that the clinician or researcher has access to Rasch analysis software. The alternative option would be to use a spreadsheet in which the clinician or researcher enters the raw scores and obtains the Rasch-scaled scores at once. This offers the advantage over the former that the user does not need to be familiar with Rasch analysis or possess the special software.
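To illustrate how a raw-to-Rasch conversion table of the kind described above could be produced, the following Python sketch solves for the person location at which the expected raw score under the Andrich rating scale model equals each observed raw score. It is a minimal sketch under stated assumptions, not the spreadsheet supplied with this study: the item locations (`deltas`), the thresholds (`taus`), the three-category response scale and all function names are hypothetical choices made for illustration.

```python
import math


def expected_item_score(theta, delta, taus):
    """Expected score on one item under the Andrich rating scale model."""
    # Cumulative logits for categories 0..m; category 0 is fixed at 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return sum(k * w for k, w in enumerate(weights)) / total


def expected_raw_score(theta, deltas, taus):
    """Expected total score across all items at person location theta."""
    return sum(expected_item_score(theta, d, taus) for d in deltas)


def raw_to_logit(raw, deltas, taus, lo=-10.0, hi=10.0, tol=1e-6):
    """Find theta whose expected raw score equals `raw`, by bisection."""
    max_raw = len(deltas) * len(taus)
    if not 0 < raw < max_raw:
        raise ValueError("extreme scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_raw_score(mid, deltas, taus) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# HYPOTHETICAL calibrations for nine items, NOT the study's values.
deltas = [-1.6, -1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2, 1.6]
# HYPOTHETICAL thresholds for an assumed three-category response scale.
taus = [-1.0, 1.0]

# Conversion table for every non-extreme raw score.
table = {raw: raw_to_logit(raw, deltas, taus) for raw in range(1, 18)}
for raw, logit in table.items():
    print(f"raw {raw:2d} -> {logit:+.2f} logits")
```

A table built this way is valid only for complete response sets and only for samples in which the item calibrations hold, which is why the conversion should not be applied to populations that differ from the calibration sample.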

5. Limitations
There are some possible limitations of the study. Firstly, all patients were recruited at a single tertiary eye care centre, specifically its vision rehabilitation centres. Given that it is the only provider of comprehensive low vision rehabilitation services in the region, it is possible that patients with severe visual impairment may have been referred to this centre, and this sample may represent a severely (vision) impaired population. Therefore, our results are attributable to a hospital-based visually impaired population but not to the broader clinical population of individuals with VI. This may also have some implications for Rasch modelling in terms of targeting (poor in our case), whereby there were many participants whose level of depression was far beyond what could be captured by the most difficult item (‘trouble concentrating on things’). However, the sample size is a particular strength of this study. Moreover, Rasch analysis enables sample-free test calibration, meaning that the analysis of the instrument is not bound by the characteristics of the sample (Wright and Stone, 1979). Secondly, the study did not evaluate associations of depression items with visual functioning (VF), vision-related quality of life (VRQoL) or clinical measures of vision. It is plausible that there may be group differences by level of VF, VRQoL, visual acuity, etc. However, the possibility of such a finding appears remote given that Lamoureux et al. (2009) failed to find any such association. Lastly, our analysis of the PHQ-9 was limited to the interpretation of uniform DIF; however, DIF may also be nonuniform (Crane et al., 2006); for example, the amount of DIF could vary according to the level of the PHQ-9 score. However, the interpretation would be less straightforward, even if it were possible to extend the methodology to evaluate the practical impact of nonuniform DIF. More importantly, there is also a lack of published guidelines as to what constitutes clinically important nonuniform DIF. Nonetheless, future studies using the PHQ-9 should attempt to explore the presence and/or impact of nonuniform DIF in their populations.

6. Conclusions
As infectious diseases, especially in developing countries (such as India), are progressively controlled, depression is predicted to become a major health burden worldwide. Given this, screening and treatment of depression must be seen as a priority medical challenge for the 21st century. The modified PHQ-9 is a brief instrument to assess depression that can be administered relatively easily in an interview format by trained staff to visually impaired patients during outpatient low vision rehabilitation consultations. More importantly, the PHQ-9 is unidimensional ‘in practice’; however, clinicians and researchers should be mindful of the DIF due to duration of VI. The instrument fits the Rasch measurement model well, and so an interval-level score can be produced. Further work is needed to determine this fit in a more general visually impaired population. A spreadsheet has also been provided which clinicians and researchers can use to rescore their PHQ-9 measurements into an interval-level score to aid the psychometric utility of this important depression assessment instrument. However, it should be borne in mind that these conversions can be applied only if the population in which the instrument is being tested is similar to that of the present study.

Role of funding source
The funding source had no role in any of the study procedures, data collection or preparation of the manuscript.

Conflict of interest
The authors have no personal financial interest in the development, production, or sale of any device discussed herein. None of the authors have any conflict of interest.

Acknowledgements
The authors would like to thank the participants for volunteering their time towards the study.


References
Alexopoulos, G.S., Borson, S., Cuthbert, B.N., Devanand, D.P., Mulsant, B.H., Olin, J.T., Oslin, D.W., 2002. Assessment of late life depression. Biol. Psychiatry 52, 164–174.
Andrich, D.A., 1978. A rating scale formulation for ordered response categories. Psychometrika 43, 561–573.
Augustin, A., Sahel, J.A., Bandello, F., Dardennes, R., Maurel, F., Negrini, C., Hieke, K., Berdeaux, G., 2007. Anxiety and depression prevalence rates in age-related macular degeneration. Investig. Ophthalmol. Vis. Sci. 48, 1498–1503.
Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., Erbaugh, J., 1961. An inventory for measuring depression. Arch. Gen. Psychiatry 4, 561–571.
Bond, T.G., Fox, C.M., 2001. Applying the Rasch model: Fundamental Measurement in the Human Sciences. Lawrence Erlbaum Associates, London.
Casten, R., Rovner, B., 2008. Depression in age-related macular degeneration. J. Vis. Impair. Blind. 102, 591–599.
Cole, J.C., Rabin, A.S., Smith, T.L., Kaufman, A.S., 2004. Development and validation of a Rasch-derived CES-D short form. Psychol. Assess. 16, 360–372.
Cook, K.F., O’Malley, K.J., Roddey, T.S., 2005. Dynamic assessment of health outcomes: time to let the CAT out of the bag? Health Serv. Res. 40, 1694–1711.
Crane, P.K., Gibbons, L.E., Jolley, L., van Belle, G., 2006. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar. Med. Care 44, S115–S123.
Crews, J., Jones, G., Kim, J., 2006. Double jeopardy: the effects of comorbidity conditions among older people with vision loss. J. Vis. Impair. Blind. 100, 824–848.
de Bonis, M., Lebeaux, M.O., de Boeck, P., Simon, M., Pichot, P., 1991. Measuring the severity of depression through a self-report inventory. A comparison of logistic, factorial and implicit models. J. Affect. Disord. 22, 55–64.
Fayers, P.M., 2007. Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment. Qual. Life Res. 16 (Suppl 1), S187–S194.
Forjaz, M.J., Rodriguez-Blazquez, C., Martinez-Martin, P., 2009. Rasch analysis of the hospital anxiety and depression scale in Parkinson's disease. Mov. Disord. 24, 526–532.
Forkmann, T., Boecker, M., Norra, C., Eberle, N., Kircher, T., Schauerte, P., Mischke, K., Westhofen, M., Gauggel, S., Wirtz, M., 2009. Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis. Rehabil. Psychol. 54, 186–197.
Forkmann, T., Gauggel, S., Spangenberg, L., Brahler, E., Glaesmer, H., 2013. Dimensional assessment of depressive severity in the elderly general population: psychometric evaluation of the PHQ-9 using Rasch analysis. J. Affect. Disord. 148, 323–330.
Ganguly, S., Samanta, M., Roy, P., Chatterjee, S., Kaplan, D.W., Basu, B., 2013. Patient Health Questionnaire-9 as an effective tool for screening of depression among Indian adolescents. J. Adolesc. Health 52, 546–551.
Gothwal, V.K., Wright, T.A., Lamoureux, E.L., Lundstrom, M., Pesudovs, K., 2009. Catquest questionnaire: re-validation in an Australian cataract population. Clin. Exp. Ophthalmol. 37, 785–794.
Hays, R.D., Lipscomb, J., 2007. Next steps for use of item response theory in the assessment of health outcomes. Qual. Life Res. 16 (Suppl 1), S195–S199.
Horowitz, A., Reinhardt, J.P., Boerner, K., 2005. The effect of rehabilitation on depression among visually disabled older adults. Aging Ment. Health 9, 563–570.
Keddie, A.M., 2011. Associations between severe obesity and depression: results from the National Health and Nutrition Examination Survey, 2005–2006. Prev. Chronic Dis. 8, A57.
Kendel, F., Wirtz, M., Dunkel, A., Lehmkuhl, E., Hetzer, R., Regitz-Zagrosek, V., 2010. Screening for depression: Rasch analysis of the dimensional structure of the PHQ-9 and the HADS-D. J. Affect. Disord. 122, 241–246.
Kroenke, K., Spitzer, R.L., Williams, J.B., 2001. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606–613.
Kroenke, K., Spitzer, R.L., Williams, J.B., Lowe, B., 2010. The Patient Health Questionnaire somatic, anxiety, and depressive symptom scales: a systematic review. Gen. Hosp. Psychiatry 32, 345–359.
Lamoureux, E.L., Tee, H.W., Pesudovs, K., Pallant, J.F., Keeffe, J.E., Rees, G., 2009. Can clinicians use the PHQ-9 to assess depression in people with vision loss? Optom. Vis. Sci. 86, 139–145.
Licht, R.W., Qvitzau, S., Allerup, P., Bech, P., 2005. Validation of the Bech-Rafaelsen Melancholia Scale and the Hamilton Depression Scale in patients with major depression; is the total score a valid measure of illness severity? Acta Psychiatr. Scand. 111, 144–149.
Linacre, J.M., 1994. Sample size and item calibration stability. Rasch Meas. Trans. 7, 328.
Linacre, J.M., 2005. A User's Guide to WINSTEPS. Winsteps.com, Chicago.
Linacre, J.M., 2008. WINSTEPS Rasch Measurement Computer Program. Winsteps.com, Chicago.
Lowe, B., Spitzer, R.L., Grafe, K., Kroenke, K., Quenter, A., Zipfel, S., Buchholz, C., Witte, S., Herzog, W., 2004a. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J. Affect. Disord. 78, 131–140.
Lowe, B., Unutzer, J., Callahan, C.M., Perkins, A.J., Kroenke, K., 2004b. Monitoring depression treatment outcomes with the Patient Health Questionnaire-9. Med. Care 42, 1194–1201.


Martin, A., Rief, W., Klaiberg, A., Braehler, E., 2006. Validity of the Brief Patient Health Questionnaire Mood Scale (PHQ-9) in the general population. Gen. Hosp. Psychiatry 28, 71–77.
Massof, R.W., 2002. The measurement of vision disability. Optom. Vis. Sci. 79, 516–552.
Merbitz, C., Morris, J., Grip, J.C., 1989. Ordinal scales and foundations of misinference. Arch. Phys. Med. Rehabil. 70, 308–312.
Murray, C.J.L., Lopez, A.D., 1996. The Global Burden of Disease. A Comprehensive Assessment of Mortality and Disability from Diseases, Injuries, and Risk Factors in 1990 and Projected to 2020. Harvard School of Public Health, Cambridge, MA.
Orlando, M., Marshall, G.N., 2002. Differential item functioning in a Spanish translation of the PTSD checklist: detection and evaluation of impact. Psychol. Assess. 14, 50–59.
Pesudovs, K., Burr, J.M., Harley, C., Elliott, D.B., 2007. The development, assessment, and selection of questionnaires. Optom. Vis. Sci. 84, 663–674.
Pilkonis, P.A., Choi, S.W., Reise, S.P., Stover, A.M., Riley, W.T., Cella, D., 2011. Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS(R)): depression, anxiety, and anger. Assessment 18, 263–283.
Poongothai, S., Anjana, R.M., Pradeepa, R., Ganesan, A., Unnikrishnan, R., Rema, M., Mohan, V., 2011. Association of depression with complications of type 2 diabetes—the Chennai Urban Rural Epidemiology Study (CURES-102). J. Assoc. Phys. India 59, 644–648.
Poongothai, S., Pradeepa, R., Ganesan, A., Mohan, V., 2009a. Prevalence of depression in a large urban South Indian population—the Chennai Urban Rural Epidemiology Study (CURES-70). PLoS One 4, e7185.
Poongothai, S., Pradeepa, R., Ganesan, A., Mohan, V., 2009b. Reliability and validity of a modified PHQ-9 item inventory (PHQ-12) as a screening instrument for assessing depression in Asian Indians (CURES-65). J. Assoc. Phys. India 57, 147–152.
Radloff, L.S., 1977. The CES-D scale: a self-report depression scale for research in the general population. Appl. Psychol. Meas. 1, 385–401.
Raju, N.S., van der Linden, W.J., Fleer, P.F., 1995. IRT-based internal measures of differential functioning of items and tests. Appl. Psychol. Meas. 19, 353–368.
Rovner, B.W., 1993. Depression and increased risk of mortality in the nursing home patient. Am. J. Med. 94, 19S–22S.
Rovner, B.W., Casten, R.J., 2002. Neuroticism predicts depression and disability in age related macular degeneration. J. Am. Geriatr. Soc. 48, 1097–1100.
Rovner, B.W., Casten, R.J., 2008. Preventing late-life depression in age-related macular degeneration. Am. J. Geriatr. Psychiatry 16, 454–459.
Rovner, B.W., Casten, R.J., Hegel, M.T., Hauck, W.W., Tasman, W.S., 2007. Dissatisfaction with performance of valued activities predicts depression in age-related macular degeneration. Int. J. Geriatr. Psychiatry 22, 789–793.
Rovner, B.W., Casten, R.J., Tasman, W.S., 2002. Effect of depression on vision function in age-related macular degeneration. Arch. Ophthalmol. 120, 1041–1044.
Rovner, B.W., Zisselman, P.M., Shmuely-Dulitzki, Y., 1996. Depression and disability in older people with impaired vision: a follow-up study. J. Am. Geriatr. Soc. 44, 181–184.
Schomerus, G., Matschinger, H., Angermeyer, M.C., 2009. Attitudes that determine willingness to seek psychiatric help for depression: a representative population survey applying the Theory of Planned Behaviour. Psychol. Med. 39, 1855–1865.
Siegert, R.J., Tennant, A., Turner-Stokes, L., 2010. Rasch analysis of the Beck Depression Inventory-II in a neurological rehabilitation sample. Disabil. Rehabil. 32, 8–17.
Spitzer, R.L., Williams, J.B., Kroenke, K., Linzer, M., deGruy 3rd, F.V., Hahn, S.R., Brody, D., Johnson, J.G., 1994. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. J. Am. Med. Assoc. 272, 1749–1756.
Tang, W.K., Wong, E., Chiu, H.F., Lum, C.M., Ungvari, G.S., 2005. The Geriatric Depression Scale should be shortened: results of Rasch analysis. Int. J. Geriatr. Psychiatry 20, 783–789.
Thombs, B.D., Magyar-Russell, G., Bass, E.B., Stewart, K.J., Tsilidis, K.K., Bush, D.E., Fauerbach, J.A., McCann, U.D., Ziegelstein, R.C., 2007. Performance characteristics of depression screening instruments in survivors of acute myocardial infarction: review of the evidence. Psychosomatics 48, 185–194.
Wang, S.Y., Singh, K., Lin, S.C., 2012. Prevalence and predictors of depression among participants with glaucoma in a nationally representative population sample. Am. J. Ophthalmol. 154 (436–444), e432.
Weimo, Z., 1996. Should total scores from a rating scale be used directly? Res. Q. Exerc. Sport 67, 363–372.
Wright, B.D., Douglas, G.A., 1975. Best Test Design and Self-Tailored Testing (MESA Research Memorandum No. 19). Statistical Laboratory, Department of Education, University of Chicago, Chicago, IL.
Wright, B.D., Douglas, G.A., 1976. Rasch Item Analysis by Hand (MESA Research Memorandum No. 21). Statistical Laboratory, Department of Education, University of Chicago, Chicago, IL.
Wright, B.D., Linacre, J.M., 1989. Observations are always ordinal; measurements, however, must be interval. Arch. Phys. Med. Rehabil. 70, 857–860.
Wright, B.D., Masters, G.N., 1982. Rating Scale Analysis. MESA Press, Chicago.
Wright, B.D., Stone, M.H., 1979. Best Test Design. MESA Press, University of Chicago, Social Research, Chicago.
Zigmond, A.S., Snaith, R.P., 1983. The hospital anxiety and depression scale. Acta Psychiatr. Scand. 67, 361–370.