Measurement equivalence of the Toronto Structured Interview for Alexithymia across language, gender, and clinical status


Psychiatry Research ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Psychiatry Research journal homepage: www.elsevier.com/locate/psychres

Kateryna V. Keefer (a), Graeme J. Taylor (b), James D.A. Parker (c), Ruth Inslegers (d), R. Michael Bagby (e,*)

(a) Department of Psychology, University of Western Ontario, London, Ontario, Canada
(b) Department of Psychiatry, University of Toronto and Mount Sinai Hospital, Toronto, Ontario, Canada
(c) Department of Psychology, Trent University, Peterborough, Ontario, Canada
(d) Faculty of Psychology and Educational Sciences, Ghent University, Belgium
(e) Departments of Psychology and Psychiatry, University of Toronto, 1265 Military Trail, Toronto, Ontario, Canada M1C 1A4

Article info

Article history:
Received 2 January 2015
Received in revised form 22 March 2015
Accepted 27 April 2015

Keywords:
Alexithymia
Differential item functioning
Language equivalence
Test translations
Validity

Abstract

The Toronto Structured Interview for Alexithymia (TSIA) has been translated into Dutch, German, and Italian and validated in clinical and nonclinical populations. In order to make valid comparisons across different population groups, it is important to establish measurement equivalence across variables such as language, gender, and clinical status. Our objective in this study was to establish measurement equivalence in relation to language (English, Dutch, German, and Italian), gender, and clinical status (non-clinical, psychiatric, and medical) using differential item functioning (DIF). The sample was composed of 842 adults representing the four language groups, all of whom had undergone the TSIA assessment as part of several earlier studies. Ordinal logistic regression was employed to explore DIF of the TSIA items. Although several items were found to exhibit DIF for language, gender, or clinical status, all of these effects were within an acceptable range. These findings provide support for the measurement equivalence of the TSIA, and allow researchers to reliably compare results from studies using the TSIA across the four language groups, gender, and clinical status.

© 2015 Published by Elsevier Ireland Ltd.

* Corresponding author. Tel.: +1 416 5084134; fax: +1 416 5868654. E-mail address: [email protected] (R. Michael Bagby).

http://dx.doi.org/10.1016/j.psychres.2015.04.044
0165-1781/© 2015 Published by Elsevier Ireland Ltd.

Please cite this article as: Keefer, K.V., et al., Measurement equivalence of the Toronto Structured Interview for Alexithymia across language, gender, and clinical status. Psychiatry Research (2015), http://dx.doi.org/10.1016/j.psychres.2015.04.044i

1. Introduction

Over the past two decades there has been a rapid expansion of empirical research on the alexithymia construct, including brain imaging studies, genetic studies, attachment studies, explorations of associations between alexithymia and various psychiatric and medical disorders, and investigations of the influence of alexithymia on treatment outcomes (for recent reviews see Taylor and Bagby, 2012; Luminet et al., 2013; Moriguchi and Komaki, 2013; Taylor and Bagby, 2013). Most of this research has used the self-report 20-item Toronto Alexithymia Scale (Bagby et al., 1994a; Bagby et al., 1994b) to measure alexithymia. Recognizing that the validity of research findings is increased by using more than one method to measure a construct, several alexithymia researchers are now using or advocating a multi-method approach to assessing the construct (e.g., Taylor et al., 1997; Lumley et al., 2007; Moriguchi et al., 2007). For this purpose, Bagby et al. (2006) developed the Toronto Structured Interview for Alexithymia (TSIA) as an interview-based method for measuring the construct that may be used in conjunction with self-report or performance-based measures.

The TSIA has been translated into Dutch, German, and Italian, and the reliability, factorial validity, and concurrent validity of the measure have been demonstrated across clinical and nonclinical samples (Bagby et al., 2006; Grabe et al., 2009; Caretti et al., 2011; Inslegers et al., 2013). Notwithstanding cross-validation of the instrument and the use of back-translation procedures in developing the Dutch, German, and Italian translations of the TSIA, as noted by Ellis (1989), these steps are not sufficient to establish measurement equivalence for any psychological or psychiatric test. Translated tests are said to exhibit measurement equivalence only “when individuals who are equal in the trait measured by the test but who come from different cultural and linguistic groups have the same observed score” (Ellis, 1989, p. 912).

In earlier studies some group differences in TSIA mean scores were found with respect to language and clinical status. The TSIA mean scores were generally higher in clinical samples than in nonclinical samples (Bagby et al., 2006; Caretti et al., 2011). The mean TSIA score of a Dutch-speaking psychiatric patient sample was lower than the mean score for an Italian-speaking psychiatric sample, although similar to the mean scores of English-speaking and German-speaking psychiatric patient samples (Inslegers et al., 2013). In addition, the mean TSIA score of a Dutch-speaking sample of patients with chronic tinnitus was lower than the mean score of an Italian-speaking sample of patients with hypertension and circulatory problems (Inslegers et al., 2013). The question arises, however, as to whether these group differences reflect true differences in the alexithymia trait or are due to measurement artifacts. For example, TSIA items may work in different ways in different languages and cultures, and in clinical vs non-clinical populations; items may also be interpreted differently by men and women. Without first establishing measurement equivalence, one cannot make valid comparisons of mean scores of any psychological test across subgroups (Zumbo, 1999; Petersen et al., 2003). Thus, before imbuing the group differences observed above with substantive meaning, it is essential to demonstrate that the psychometric properties of the TSIA are equivalent across groups, and that these group differences are not an artifact of systematic measurement bias.

A psychiatric or psychological instrument is said to be invalid with a particular group of respondents if it contains items that are endorsed systematically differently by that group compared to other groups for reasons other than true differences in the underlying construct being measured (American Educational Research Association et al., 2014). Such response bias may imply that the construct assessed by that instrument is not being measured similarly across different population groups. This type of bias is commonly referred to as differential item functioning (DIF). For this reason DIF analyses constitute an important aspect of the instrument development, validation, and translation process, with the goal of identifying non-invariant items so that they can be removed, revised, or accounted for accordingly (Petersen et al., 2003; Walker, 2011).
Although the TSIA and its constituent item sets have been subjected to numerous empirical analyses, the instrument has not yet been evaluated with a statistical method for detecting DIF. We think such analyses are particularly important for the TSIA in that the alexithymia construct has captured attention in various countries, language groups, and cultures, and continues to be investigated in both clinical and non-clinical populations (Taylor and Bagby, 2012).

Two types of DIF can be distinguished: uniform and non-uniform (Walker, 2011). Uniform DIF occurs when an item shows consistent bias regardless of the level of the trait (i.e., equally more difficult for one group than another), resulting in systematic group differences in the frequency with which the item is endorsed. Non-uniform DIF occurs when the magnitude and/or the direction of the bias varies depending on the level of the trait (e.g., more difficult for one group at high levels than at low levels of the trait), resulting in systematic group differences in the precision with which the item can discriminate between high-trait and low-trait individuals. In addition to identifying problematically non-invariant items, DIF analyses can help determine whether the item bias lies in the prevalence (uniform DIF) or in the relevance (non-uniform DIF) of the item as an indicator of the target construct. Of the two types, non-uniform DIF poses the greater threat to cross-group comparisons of mean scores, as it essentially amounts to an “apples vs oranges” scenario.

Our objective in the current study was to analyze the TSIA items for the presence of uniform and non-uniform DIF in relation to three common grouping factors: language (English, Dutch, German, and Italian), clinical status (non-clinical, psychiatric, and medical), and gender. If the TSIA items demonstrate trivial or no DIF across these groups, then researchers can be confident that observed group differences in TSIA scores represent true differences in the alexithymia trait. Notwithstanding, a finding of non-trivial DIF may be informative in its own right in that it may reveal meaningful differences in the way various groups manifest certain alexithymia markers represented by the TSIA items. Indeed, some DIF findings may represent valid group differences in the very nature of the alexithymia construct.
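The distinction between uniform and non-uniform DIF drawn above can be made concrete with a small numerical sketch. It is an illustration only, not part of the study: it simplifies to a binary item under a logistic model (TSIA items are scored 0 to 2), and the intercepts, slopes, and function names are hypothetical values chosen for the example.

```python
import math

def endorse_prob(trait, intercept, slope):
    """Probability of endorsing a binary item under a simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * trait)))

def log_odds_gap(trait, ref, focal):
    """Log-odds of endorsement for the focal group minus the reference group."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return logit(endorse_prob(trait, *focal)) - logit(endorse_prob(trait, *ref))

# Hypothetical (intercept, slope) pairs, chosen only for illustration.
reference = (-1.0, 1.0)
uniform_focal = (-0.5, 1.0)     # same slope, shifted intercept: uniform DIF
nonuniform_focal = (-1.0, 0.5)  # different slope: non-uniform DIF

trait_levels = [-2.0, 0.0, 2.0]
uniform_gaps = [round(log_odds_gap(t, reference, uniform_focal), 6)
                for t in trait_levels]
nonuniform_gaps = [round(log_odds_gap(t, reference, nonuniform_focal), 6)
                   for t in trait_levels]

print(uniform_gaps)     # the gap is the same at every trait level
print(nonuniform_gaps)  # the gap changes size and direction with the trait level
```

Under uniform DIF the group gap in log-odds is constant across the trait range, whereas under non-uniform DIF it shrinks, vanishes, or reverses depending on where on the trait continuum the respondents sit.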

2. Method

2.1. Participants and procedure

Data for this study came from a total of 842 participants representing four language groups (English, German, Dutch, and Italian), all of whom had undergone the TSIA assessment in their respective language as part of earlier studies.

The English-speaking group consisted of 97 psychiatric outpatients (20 men, 77 women; mean age = 39.43 years, S.D. = 12.10) and 136 community-based adults without a psychiatric diagnosis (41 men, 95 women; mean age = 32.70 years, S.D. = 11.26) recruited via advertisements on the university campus and in psychiatric facilities in metropolitan Canada (for further details see Bagby et al., 2006).

The German-speaking group consisted of 237 psychiatric inpatients and outpatients (94 men, 143 women; mean age = 40.43 years, S.D. = 13.31) recruited from psychiatric facilities in Germany and Switzerland (for further details see Grabe et al., 2009).

The Dutch-speaking group consisted of 85 psychiatric inpatients with a mood and/or anxiety disorder (32 men, 53 women; mean age = 39.92 years, S.D. = 12.26) and 76 medical outpatients suffering from chronic tinnitus (48 men, 28 women; mean age = 47.82 years, S.D. = 13.42) recruited from several hospitals in the Dutch-speaking region of Belgium (for further details see Inslegers et al., 2013).

The Italian-speaking group consisted of 62 psychiatric outpatients with a generalized anxiety disorder, dysthymia, or eating disorder (21 men, 41 women; mean age = 31.77 years, S.D. = 11.82), 69 medical outpatients with essential hypertension and circulatory problems (28 men, 41 women; mean age = 40.49 years, S.D. = 12.47), and 80 healthy adults (24 men, 56 women; mean age = 35.82 years, S.D. = 11.28) recruited from several cities in Italy. The two patient samples were recruited from among consecutive referrals at psychiatric and medical clinics, and the non-clinical sample was recruited via advertisements at hospitals and university campuses (for further details see Caretti et al., 2011).

The TSIA was administered by several trained interviewers. Further details about sample characteristics, recruitment strategies, interview procedures, and inter-rater reliabilities can be found in the respective studies cited above.
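As a quick arithmetic check on the sample description, the subsample counts reported above can be tallied into the group sizes used for the three grouping factors. The sketch below is illustrative bookkeeping only; the data structure and the `tally` helper are assumptions introduced here, while the counts themselves are those given in the text.

```python
# (language, clinical status, men, women) for each subsample, as reported above.
subsamples = [
    ("English", "psychiatric", 20, 77),
    ("English", "non-clinical", 41, 95),
    ("German", "psychiatric", 94, 143),
    ("Dutch", "psychiatric", 32, 53),
    ("Dutch", "medical", 48, 28),
    ("Italian", "psychiatric", 21, 41),
    ("Italian", "medical", 28, 41),
    ("Italian", "non-clinical", 24, 56),
]

def tally(rows, key_index):
    """Sum men + women per level of the grouping column at key_index."""
    totals = {}
    for row in rows:
        totals[row[key_index]] = totals.get(row[key_index], 0) + row[2] + row[3]
    return totals

by_language = tally(subsamples, 0)
by_status = tally(subsamples, 1)
men = sum(row[2] for row in subsamples)
women = sum(row[3] for row in subsamples)
total = men + women

print(by_language)
print(by_status)
print(men, women, total)
```

The tallies agree with the group sizes listed in Table 1 and with the overall sample of 842 participants.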

2.2. Measure

The TSIA consists of a set of 24 interview questions (items) that are divided into 6-item subscales for each of the four facets of the alexithymia construct described by Nemiah et al. (1976): difficulty identifying subjective feelings; difficulty describing feelings to others; externally oriented style of thinking; and imaginal processes. Some example questions are: “When you are upset, do you know whether you are feeling angry, sad, or anxious?”; “Is it usually easy for you to find words to describe your feelings to others?”; “Do you think about past emotional experiences to help you cope with more recent emotional problems?”; “Is it rare for you to fantasize?”. For each interview question there is a set of prompts and/or probes designed to elicit responses that are scored on a scale ranging from 0 to 2 according to guidelines outlined in a manual (Bagby et al., 2005; Grabe et al., 2014; Taylor et al., 2014). To aid in scoring, participants are also asked to give examples to illustrate their responses to the questions.
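The subscale structure described above can be sketched in a few lines. This is a hypothetical helper, not the scoring procedure from the manual: it assumes the item-to-subscale assignment implied by the item numbers listed in the DIF tables (items 01, 05, 09, ... for Identifying; 02, 06, 10, ... for Describing; and so on), and it assumes each item has already been scored 0, 1, or 2 by the interviewer according to the manual.

```python
SUBSCALES = ("Identifying", "Describing", "External", "Imaginal")

def subscale_of(item_number):
    """Map an item number (1..24) to its subscale, assuming items cycle
    through the four facets in the order shown in the DIF tables."""
    if not 1 <= item_number <= 24:
        raise ValueError("TSIA item numbers run from 1 to 24")
    return SUBSCALES[(item_number - 1) % 4]

def score_subscales(item_scores):
    """Sum 24 item scores (each 0, 1, or 2) into four subscale totals (0-12)."""
    if len(item_scores) != 24:
        raise ValueError("expected 24 item scores")
    totals = {name: 0 for name in SUBSCALES}
    for number, score in enumerate(item_scores, start=1):
        if score not in (0, 1, 2):
            raise ValueError(f"item {number}: score must be 0, 1, or 2")
        totals[subscale_of(number)] += score
    return totals

# Hypothetical respondent: every item scored 1, except items 01 and 02.
example = [1] * 24
example[0] = 2  # item 01
example[1] = 0  # item 02
totals = score_subscales(example)
print(totals)
```

Each subscale total ranges from 0 to 12; these totals are the quantities summarized in Table 1 and used as matching variables in the DIF analyses.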


2.3. Statistical analyses

The commonly used method of ordinal logistic regression (LR) was employed to explore the DIF of the TSIA items for each of the four TSIA subscales (Zumbo, 1999). The total score on the relevant subscale was used as the matching variable (i.e., estimate of trait level), and the presence of DIF was analyzed with respect to three grouping factors: (1) language (English, German, Dutch, Italian); (2) clinical status (psychiatric, medical, non-clinical); and (3) gender (male, female). The LR method involves a series of nested models, where each TSIA item is regressed first onto the matching variable alone (Model 1), then onto the grouping factor in addition to the matching variable (Model 2), and then onto the interaction term of the matching variable by the grouping factor in addition to their main effects (Model 3). The presence of DIF is identified if there is a significant difference in fit between Model 3 and Model 1, suggesting that group membership influences item scores in addition to trait level. The type of DIF (if present) is determined by testing the difference in fit between Model 2 and Model 1 for uniform DIF, and between Model 3 and Model 2 for non-uniform DIF. Difference in fit between the models was determined based on a significant chi-square difference test. In addition, the magnitude of the effect size (quantified with Nagelkerke’s pseudo R-square) was used in conjunction with the chi-square significance test to identify the extent of DIF. This dual-criterion approach is recommended due to the known tendency of the chi-square test to over-identify DIF items in large samples even if the effects are negligible, whereas using an effect-size measure of practical significance reduces Type I errors (Zumbo, 1999; Jodoin and Gierl, 2001; French and Maller, 2007).
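The three nested models can be written out as follows. The notation is introduced here for illustration and is not taken from the article: $Y$ is the item score (0, 1, or 2), $T$ the subscale total used as the matching variable, and $G$ the grouping factor, assumed to be dummy-coded with $k-1$ indicators for $k$ groups. A cumulative-logit (proportional odds) form is assumed, as is usual for ordinal LR.

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T + \beta_2 G + \beta_3 (T \times G)
\end{align}

\begin{align}
\Delta\chi^2_{\text{total}} &= \chi^2_{\text{Model 3}} - \chi^2_{\text{Model 1}}, & df &= 2(k-1) \\
\Delta\chi^2_{\text{uniform}} &= \chi^2_{\text{Model 2}} - \chi^2_{\text{Model 1}}, & df &= k-1 \\
\Delta\chi^2_{\text{non-uniform}} &= \chi^2_{\text{Model 3}} - \chi^2_{\text{Model 2}}, & df &= k-1
\end{align}
```

Under this coding the total test has 6 degrees of freedom for language ($k=4$), 4 for clinical status ($k=3$), and 2 for gender ($k=2$), which matches the degrees of freedom shown in the headers of Tables 2 to 4.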
Thus, an item was classified as exhibiting non-trivial DIF if it met both of the following criteria: (1) a significant chi-square difference test between Model 1 and Model 3 (p < 0.008, Bonferroni corrected for multiple comparisons), and (2) a change in R-square from Model 1 to Model 3 of 0.035 or greater, which represents at least a moderate effect (Jodoin and Gierl, 2001). In the event of non-trivial DIF, the LR analyses for that subscale are typically re-run using a purified matching variable (i.e., removing the DIF item from the total subscale score, except for the analyses involving the DIF item itself), until no changes in identified DIF items are seen on two consecutive re-runs.¹ The nature of group differences and the impact of DIF (if present) on group differences were explored using analyses of variance (ANOVAs) to determine if different substantive conclusions would result from using modified subscales with DIF items removed compared to the original subscales. All analyses were carried out in SPSS.
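The dual-criterion rule above can be expressed as a small decision function. The sketch below is an illustration in Python rather than the SPSS procedure used in the study; the function names are hypothetical, and the closed-form chi-square tail probability is valid only for even degrees of freedom, which covers the 2, 4, and 6 degrees of freedom that arise here.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic.
    Closed form that holds only for positive, even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def classify_dif(delta_chi2, df, delta_r2, alpha=0.008, r2_cutoff=0.035):
    """Apply the dual criterion: significance first, then effect size."""
    if chi2_sf_even_df(delta_chi2, df) >= alpha:
        return "not present"
    return "non-trivial" if delta_r2 >= r2_cutoff else "trivial"

# Values reported in Table 2 (language, df = 6) and Table 4 (gender, df = 2).
print(classify_dif(43.57, 6, 0.031))  # Imaginal item 04, Table 2
print(classify_dif(8.00, 6, 0.006))   # Identifying item 01, Table 2
print(classify_dif(12.85, 2, 0.009))  # Imaginal item 08, Table 4
# Hypothetical case: same chi-square but an effect size above the cutoff.
print(classify_dif(43.57, 6, 0.040))
```

Because the effect-size check is applied only to items that already pass the significance test, a large sample can produce many "trivial" classifications without any item reaching the "non-trivial" category.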

3. Results

3.1. Descriptive statistics

Means, standard deviations, and ANOVA results for the four TSIA subscales are displayed in Table 1. In terms of language group differences, Italian-speaking participants scored significantly higher than the other three language groups on all four TSIA subscales, whereas German-speaking participants scored significantly lower than the other three language groups on the Difficulty Describing Feelings and Imaginal Processes subscales. Of note, the same pattern of language group differences was obtained when comparing the TSIA scores of psychiatric patients only (they were represented within all four language groups); thus, the observed language group differences could not be accounted for solely by the clinical status of the samples.

In terms of differences by clinical status, medical patients tended to score the highest and non-clinical participants the lowest across the four TSIA subscales. However, this general trend did not replicate within the different language groups (there was a significant language group by clinical status interaction), likely owing to the heterogeneous nature of the patients’ diagnoses. Gender-wise, men tended to score higher than women across the four TSIA subscales, and there were no significant interactions between gender and language group or clinical status. To determine whether these observed group differences could be an artifact of DIF, tests of DIF by language group, clinical status, and gender were conducted next.

Table 1
Mean scores, standard deviations, and ANOVA results for group differences.

Group                      Identifying     Describing      External        Imaginal
Language:
  English (n = 233)        3.34 (2.85)a    5.12 (3.43)a    4.91 (3.23)a    4.99 (3.02)a
  German (n = 237)         3.29 (2.60)a    4.25 (3.12)b    5.04 (2.84)a    4.59 (2.99)b
  Dutch (n = 161)          3.91 (3.29)a    5.65 (3.73)a    5.61 (3.48)a    5.20 (3.43)a
  Italian (n = 211)        4.94 (2.84)b    6.11 (3.13)c    6.09 (2.99)b    5.51 (2.90)c
  F (3, 841) =             15.68**         12.62**         6.70**          3.49*
  Eta-squared =            0.053           0.043           0.023           0.012
Clinical status:
  Psychiatric (n = 481)    3.98 (2.97)a    5.12 (3.34)a    5.34 (3.08)a    5.04 (3.09)a
  Medical (n = 145)        4.41 (3.31)a    6.04 (3.90)b    6.37 (3.40)b    5.61 (3.63)b
  Non-clinical (n = 216)   3.13 (2.48)b    4.91 (3.11)a    4.80 (2.97)a    4.69 (2.58)c
  F (2, 841) =             9.68**          5.39**          11.05**         3.94*
  Eta-squared =            0.023           0.013           0.026           0.009
Gender:
  Men (n = 308)            4.01 (3.09)     5.80 (3.54)     5.97 (3.27)     5.34 (3.21)
  Women (n = 534)          3.74 (2.85)     4.89 (3.28)     5.04 (3.03)     4.88 (2.99)
  F (1, 841) =             1.69            14.15**         17.30**         4.55*
  Eta-squared =            0.002           0.017           0.020           0.005

Note. Identifying = difficulty identifying feelings subscale; Describing = difficulty describing feelings subscale; External = externally oriented thinking subscale; Imaginal = imaginal processes subscale. Cell values are means with standard deviations in parentheses. Cell values with different subscripts (a, b, c) are significantly different, based on the Student–Newman–Keuls post-hoc test of homogeneous subsets.
* p < 0.05
** p < 0.01

3.2. Tests of DIF

The results of the LR analyses by language group, contrasting the English-speaking group with the German-, Dutch-, and Italian-speaking groups (combined across clinical status and gender), are displayed in Table 2. As anticipated, the extent of DIF was higher based on the chi-square significance test than based on the effect size criterion. According to the chi-square test, 10 of the 24 TSIA items (41.7%) were identified as exhibiting DIF. However, the chi-square test is known to produce false positives in large samples. Indeed, none of these effects reached moderate magnitudes based on the R-square effect size (all ΔR² ≤ 0.034). Therefore, in terms of practical significance, all TSIA items can be said to function equivalently across the four language groups tested.

The results of the LR analyses by clinical status, contrasting non-clinical participants with psychiatric and medical patients (combined across language group and gender), are displayed in Table 3. Based on the chi-square significance test, 6 of the 24 TSIA items (25%) were identified as exhibiting some DIF. However, all of these effects were well within the trivial magnitude range based on the R-square effect size (all ΔR² ≤ 0.028). Therefore, all TSIA items can be said to function equivalently for psychiatric, medical, and non-clinical participants.

The results of the LR analyses by gender (combined across language group and clinical status) are displayed in Table 4. Based on the chi-square significance test, 2 of the 24 TSIA items (8.3%) were identified as exhibiting some DIF. However, all of these effects were well within the trivial magnitude range based on the R-square effect size (all ΔR² ≤ 0.015). Therefore, all TSIA items can be said to function equivalently for men and women.²

¹ In this study purification was not necessary given the trivial magnitude of DIF (see Results).

² A series of secondary LR analyses were conducted where age was entered as a continuous variable in Models 2 and 3, to test for potential DIF based on participants’ age (range 17–74 years). Based on the chi-square significance test, 3 of the 24 TSIA items (12.5%) were identified as exhibiting some DIF. However, all of these effects were well within the trivial magnitude range based on the R-square effect size (all ΔR² ≤ 0.013). Therefore, all TSIA items can be said to function equivalently across adulthood.

Table 2
Tests of differential item functioning by language group.

Scale/Item      Δχ²(6)     ΔR²       DIF
Identifying
  01            8.00       0.006     not present
  05            3.76       0.003     not present
  09            23.01*     0.016     trivial
  13            10.94      0.007     not present
  17            23.64*     0.019     trivial
  21            15.25      0.014     not present
Describing
  02            14.78      0.009     not present
  06            14.66      0.008     not present
  10            13.00      0.009     not present
  14            9.77       0.005     not present
  18            17.25      0.011     not present
  22            9.98       0.008     not present
External
  03            26.62*     0.019     trivial
  07            38.57*     0.034     trivial
  11            12.10      0.009     not present
  15            6.82       0.005     not present
  19            19.36*     0.015     trivial
  23            13.33      0.009     not present
Imaginal
  04            43.57*     0.031     trivial
  08            28.51*     0.019     trivial
  12            10.38      0.009     not present
  16            29.46*     0.023     trivial
  20            18.98*     0.016     trivial
  24            20.63*     0.014     trivial

Note. DIF = differential item functioning; Identifying = difficulty identifying feelings subscale; Describing = difficulty describing feelings subscale; External = externally oriented thinking subscale; Imaginal = imaginal processes subscale.
* p < 0.008 (Bonferroni corrected for multiple comparisons)

Table 3
Tests of differential item functioning by clinical status.

Scale/Item      Δχ²(4)     ΔR²       DIF
Identifying
  01            10.12      0.007     not present
  05            10.24      0.008     not present
  09            15.32*     0.011     trivial
  13            4.50       0.003     not present
  17            12.20      0.010     not present
  21            32.08*     0.028     trivial
Describing
  02            17.39*     0.010     trivial
  06            21.29*     0.012     trivial
  10            2.62       0.002     not present
  14            3.36       0.002     not present
  18            12.88      0.0008    not present
  22            3.22       0.002     not present
External
  03            11.38      0.008     not present
  07            16.08*     0.015     trivial
  11            9.98       0.007     not present
  15            1.81       0.001     not present
  19            8.99       0.007     not present
  23            9.64       0.006     not present
Imaginal
  04            3.55       0.002     not present
  08            5.40       0.004     not present
  12            4.05       0.004     not present
  16            8.09       0.007     not present
  20            10.61      0.009     not present
  24            15.33*     0.010     trivial

Note. DIF = differential item functioning; Identifying = difficulty identifying feelings subscale; Describing = difficulty describing feelings subscale; External = externally oriented thinking subscale; Imaginal = imaginal processes subscale.
* p < 0.008 (Bonferroni corrected for multiple comparisons)

4. Discussion

The results of this study provide support for measurement equivalence of the English, Dutch, German, and Italian language versions of the TSIA, and thereby supplement results from previous studies demonstrating reliability and validity of the instrument (Bagby et al., 2006; Grabe et al., 2009; Caretti et al., 2011; Inslegers et al., 2013). Although as many as 42% of the TSIA items were identified as exhibiting some DIF based on the chi-square significance test, the R-square effect-size measure, which is more robust against inflated Type I error rates (Zumbo, 1999; Jodoin and Gierl, 2001), indicated that the magnitudes of these effects were all negligible in terms of their practical significance. Therefore, all TSIA items can be said to function equivalently across gender, clinical status, and the four language groups tested in this study.

The finding of measurement equivalence, in turn, permits substantive interpretation of the observed group differences in the TSIA scores. For example, the findings that men tended to score higher than women, and that Italian-speaking individuals scored higher than Dutch-, English-, and German-speaking individuals, are not due to measurement artifacts but reflect true differences in the alexithymia trait. Moreover, since non-trivial DIF was not found on any of the TSIA items, no items need to be singled out for revision or retranslation.

Strengths of this study were the large size of the sample for analyzing DIF, and that DIF was analyzed across three important population variables: language, gender, and clinical status. The conclusions drawn from the results, however, are limited to the specific groups included in the analyses and cannot be generalized to other groups. For example, although the TSIA items exhibited measurement equivalence across Dutch, English, German, and Italian languages, measurement equivalence cannot be assumed to hold for respondents speaking other languages. Although age was ruled out as a potential source of DIF among the adult participants of this study, the TSIA items may still function differentially for adolescents. Likewise, the current study did not assess DIF in relation to socio-economic status (SES) because SES was operationalized differently across sub-samples; the TSIA items may function differentially for respondents with varying education and income levels. It can also be argued that our DIF analyses with respect to clinical status may have been undermined by the heterogeneity of diagnoses admixed in the medical group. For example, the Dutch-speaking medical sample consisted of patients with chronic tinnitus, whereas the Italian-speaking medical sample consisted of patients with cardiovascular problems.
It is possible that these two diagnoses may be associated with measurement non-invariance relative to each other or to other medical conditions, an issue that could not be de-confounded from language effects in the current study. Future validation work on the TSIA should extend the current DIF analyses to these and other sub-groups pertinent to alexithymia research.

Table 4
Tests of differential item functioning by gender.

Scale/Item      Δχ²(2)     ΔR²       DIF
Identifying
  01            7.48       0.005     not present
  05            0.36       0.000     not present
  09            5.26       0.004     not present
  13            2.96       0.002     not present
  17            3.74       0.003     not present
  21            2.96       0.003     not present
Describing
  02            6.37       0.004     not present
  06            1.14       0.001     not present
  10            4.62       0.003     not present
  14            1.86       0.001     not present
  18            1.75       0.001     not present
  22            1.93       0.001     not present
External
  03            3.44       0.003     not present
  07            8.04       0.008     not present
  11            20.60*     0.015     trivial
  15            2.71       0.002     not present
  19            4.25       0.004     not present
  23            0.59       0.000     not present
Imaginal
  04            0.02       0.000     not present
  08            12.85*     0.009     trivial
  12            3.09       0.003     not present
  16            4.65       0.004     not present
  20            1.56       0.001     not present
  24            2.03       0.006     not present

Note. DIF = differential item functioning; Identifying = difficulty identifying feelings subscale; Describing = difficulty describing feelings subscale; External = externally oriented thinking subscale; Imaginal = imaginal processes subscale.
* p < 0.008 (Bonferroni corrected for multiple comparisons).

Acknowledgement

The authors are grateful to our colleagues in Belgium, Germany, Italy, and Switzerland, who made available their data so that we could conduct the current study. Work on this study was partially supported by a Postdoctoral Fellowship from the Social Sciences and Humanities Research Council of Canada to K. V. Keefer.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 2014. Standards for Educational and Psychological Testing. American Educational Research Association, Washington, DC.

Bagby, R.M., Parker, J.D.A., Taylor, G.J., 1994a. The twenty-item Toronto alexithymia scale – I. Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research 38, 23–32.
Bagby, R.M., Taylor, G.J., Dickens, S.E., Parker, J.D., 2005. The Toronto Structured Interview for Alexithymia Administration and Scoring Guidelines. Unpublished manual.
Bagby, R.M., Taylor, G.J., Parker, J.D.A., 1994b. The twenty-item Toronto alexithymia scale – II. Convergent, discriminant, and concurrent validity. Journal of Psychosomatic Research 38, 33–40.
Bagby, R.M., Taylor, G.J., Parker, J.D.A., Dickens, S.E., 2006. The development of the Toronto structured interview for alexithymia: item selection, factor structure, reliability and concurrent validity. Psychotherapy and Psychosomatics 75, 25–39.
Caretti, V., Porcelli, P., Solano, L., Schimmenti, A., Bagby, R.M., Taylor, G.J., 2011. Reliability and validity of the Toronto structured interview for alexithymia in a mixed clinical and nonclinical sample from Italy. Psychiatry Research 187, 432–436.
Ellis, B.B., 1989. Differential item functioning: implications for test translations. Journal of Applied Psychology 74, 912–921.
French, B.F., Maller, S.J., 2007. Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement 67, 373–393.
Grabe, H.J., Löbel, S., Dittrich, D., Bagby, R.M., Taylor, G.J., Quilty, L.C., Spitzer, C., Barnow, S., Mathier, F., Jenewein, J., Freyberger, H.J., Rufer, M., 2009. The German version of the Toronto structured interview for alexithymia: factor structure, reliability, and concurrent validity in a psychiatric patient sample. Comprehensive Psychiatry 50, 424–430.
Grabe, H.J., Rufer, M., Bagby, R.M., Taylor, G.J., Parker, J.D.A., 2014. Strukturiertes Toronto Alexithymie Interview Manual. Verlag Hans Huber, Hogrefe AG, Bern.
Inslegers, R., Meganck, R., Ooms, E., Vanheule, S., Taylor, G.J., Bagby, R.M., De Fruyt, F., Desmet, M., 2013. The Dutch language version of the Toronto structured interview for alexithymia: reliability, factor structure and concurrent validity. Psychologica Belgica 53, 93–116.
Jodoin, M.G., Gierl, M.J., 2001. Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education 14, 329–349.
Luminet, O., Vermeulen, N., Grynberg, D., 2013. L’Alexithymie. De Boeck, Bruxelles.
Lumley, M.A., Neely, L.C., Burger, A., 2007. The assessment of alexithymia in medical settings: implications for understanding and treating health problems. Journal of Personality Assessment 89, 230–246.
Moriguchi, Y., Decety, J., Ohnishi, T., Maeda, M., Mori, T., Nemoto, K., Matsuda, H., Komaki, G., 2007. Empathy and judging other’s pain: an fMRI study of alexithymia. Cerebral Cortex 17, 2223–2234.
Moriguchi, Y., Komaki, G., 2013. Neuroimaging studies of alexithymia: physical, affective, and social perspectives. BioPsychoSocial Medicine 7, 8.
Nemiah, J.C., Freyberger, H., Sifneos, P.E., 1976. Alexithymia: a view of the psychosomatic process. In: Hill, O.W. (Ed.), Modern Trends in Psychosomatic Medicine, Vol. 3. Butterworths, London, pp. 430–439.
Petersen, M.A., Groenvold, M., Bjorner, J.B., Aaronson, N., Conroy, T., Cull, A., Fayers, P., Hjermstad, M., Sprangers, M., Sullivan, M., 2003. Use of differential item functioning analysis to assess the equivalence of translations of a questionnaire. Quality of Life Research 12, 373–385.
Taylor, G.J., Bagby, R.M., 2012. The alexithymia personality dimension. In: Widiger, T.A. (Ed.), The Oxford Handbook of Personality Disorders. Oxford University Press, New York, NY, pp. 648–673.
Taylor, G.J., Bagby, R.M., 2013. Psychoanalysis and empirical research: the example of alexithymia. Journal of the American Psychoanalytic Association 61, 99–133.
Taylor, G.J., Bagby, R.M., Caretti, V., Schimmenti, A. (Eds.), 2014. La Valutazione Dell’Alessitimia con la TSIA. Raffaello Cortina Editore, Milano.
Taylor, G.J., Bagby, R.M., Parker, J.D.A., 1997. Disorders of Affect Regulation: Alexithymia in Medical and Psychiatric Illness. Cambridge University Press, Cambridge, United Kingdom.
Walker, C.M., 2011. What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment 29, 364–376.
Zumbo, B.D., 1999. A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Directorate of Human Resources Research and Evaluation, Department of National Defense, Ottawa, ON.