Addictive Behaviors 38 (2013) 2279–2287
Rasch model of the GAIN Substance Problem Scale among Canadian adults seeking residential and outpatient addiction treatment
Chris Kenaszchuk a, T. Cameron Wild b, Brian R. Rush a, Karen Urbanoski a,⁎
a Centre for Addiction and Mental Health, 33 Russell St., Toronto, ON M5S 2S1, Canada
b School of Public Health, University of Alberta, 7-30 University Terrace, Edmonton T6G 2T4, Canada
Highlights
• We examined the validity of the GAIN Substance Problem Scale using Rasch analysis.
• The SPS measures a single latent continuum of addiction severity.
• SPS items correspond to a narrow range of addiction severity.
• Items were too severe for many clients to endorse, especially 6 weeks into treatment.
• Findings support the changes proposed for DSM-5.
Article info
Keywords: GAIN Substance Problem Scale; Treatment; Rasch analysis; Problem severity; Differential item functioning
Abstract
Background: The GAIN Substance Problem Scale (SPS) measures alcohol and drug problem severity within a DSM-IV-TR framework. This study builds on prior psychometric evaluation of the SPS by using Rasch analysis to assess scale unidimensionality, item severity, and differential item functioning (DIF).
Methods: Participants were attending residential or outpatient treatment in Alberta and Ontario, Canada, respectively (n = 372). Rasch analyses modeled a latent problem severity continuum using SPS scores at treatment admission and 6-week follow-up. We examined DIF by gender, treatment modality (outpatient vs. residential), and assessment timing (baseline vs. follow-up).
Results: Model fit was good overall, supporting unidimensionality and a single underlying continuum of substance problem severity. Relative to person severity, however, the range of item severities was narrow. Items were too severe for many clients to endorse, particularly at follow-up. Overall, the rank order of item severities was stable across gender, treatment modality, and time point. Although traditional Rasch criteria indicated a number of statistically significant and substantive DIF estimates across modality and time points, effect size indices did not suggest a net effect on total scale scores.
Conclusions: The analysis broadly supports use of the SPS as an additive measure of global substance severity in men and women and both residential and outpatient settings. Although DIF was not a major concern, there was evidence of item redundancy and suboptimal matching between items and persons. Findings highlight potential opportunities for further improving this scale in future research and clinical applications of the GAIN.
© 2013 Elsevier Ltd. All rights reserved.
⁎ Corresponding author at: Centre for Addiction and Mental Health, Dalla Lana School of Public Health, University of Toronto, 33 Russell St., Toronto, ON M5S 2S1, Canada. Tel.: +1 416 535 8501x4812; fax: +1 416 979 4703. E-mail address: [email protected] (K. Urbanoski).
0306-4603/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.addbeh.2013.02.013
1. Introduction
Assessment forms the basis of quality clinical care, from treatment planning to monitoring client progress and outcomes. There are ongoing concerns, however, about the quality and efficiency of clinical assessment in addiction treatment (Hilton, 2011). Standardized
assessment procedures are needed that are reliable and valid across heterogeneous client populations and treatment settings. The suite of assessment tools known as the Global Appraisal of Individual Needs (GAIN) (Dennis, White, Titus, & Unsicker, 2006) represents one such set of instruments that is widely used in the field. The GAIN-Individual (GAIN-I) is a comprehensive biopsychosocial assessment tool designed to support screening, diagnosis, treatment planning, client monitoring and treatment evaluation. It has been adopted by over 300 treatment agencies in the US and Canada, and has been used in numerous multisite research projects. Among its roughly 100 scales is the Substance Problem Scale (SPS), a short
measure of symptoms related to the problematic use of alcohol and other drugs, based on DSM-IV-TR criteria for substance use disorders. As part of a broader standardized assessment package designed to support diagnosis and treatment planning, clinical decision-making and outcome monitoring, the SPS has high clinical utility (Dennis, Funk, Godley, Godley, & Waldron, 2004; Dennis et al., 2006). It is short and easily administered and scored, with items covering the spectrum of substance-related issues, from lower severity problems and substance-induced disorders to more severe symptoms of dependence. Scoring procedures allow for both probable diagnosis and a dimensional measure of severity, thus offering flexibility for its use in clinical and research settings. Prior psychometric evaluation supports the reliability and validity of the SPS for adults and adolescents in addiction treatment (Conrad, Conrad, Dennis, Riley, & Funk, 2011; Dennis et al., 2006; Scott & Dennis, 2009). Among the procedures available for assessing validity of multi-item scales, Rasch analysis offers a method of locating scale items and respondents along a single latent continuum, in this case, substance problem severity. In addition, when the data fit the Rasch model, the sum of a respondent's item endorsements (for example, a symptom count) is a sufficient measure of their severity. This reflects the way items are typically used in scales, including the SPS, giving the results of Rasch analyses high practical utility. 
Consistent with findings reported for other measures and diagnostic interviews in general population and clinical samples (Borges et al., 2011; Harford, Yi, Faden, & Chen, 2009; Hasin, Fenton, Beseler, Park, & Wall, 2012; Kahler & Strong, 2006; Ray, Kahler, Young, Chelminski, & Zimmerman, 2008; Saha, Stinson, & Grant, 2007; Saha et al., 2012; Wu et al., 2012), Rasch analysis of the SPS supports a unidimensional latent continuum of problem severity (Conrad, Dennis, Bezruczko, Funk, & Riley, 2007; Conrad et al., 2011). Contrary to the DSM-based model, however, abuse items tend to be distributed across the full continuum, rather than being clustered below dependence items in terms of severity (i.e., “difficulty” or “ability” in the traditional terminology of item response theory). Evidence has also been found for differential item functioning (DIF) by age (<18 vs. 18+) and primary problem substance (Conrad et al., 2007, 2011). DIF can affect the generalizability of results that are based on the SPS, and therefore, is an important area for evaluation. To support its use as a measure of treatment need and outcome, it is important that the scale quantifies severity consistently across clinical subgroups. Evidence of DIF by age and problem substance in prior work provides a rationale for further evaluation of DIF by other clinically relevant factors. Two factors of high practical importance are treatment modality and repeated measures over time. Notably, substance problem severity would be expected to differ across these factors. Residential treatment is typically seen as a more intensive level of care, appropriate for more severely impaired individuals, while, regardless of modality, substance problem severity would be expected to decline over time with the receipt of treatment. 
However, while item prevalence would be expected to differ across treatment modalities and over an episode of care, the rank ordering of items along the latent continuum of severity should remain stable. Measurement equivalence across modalities and over time enhances the SPS's validity for multi-site and longitudinal measurement. No studies to date have evaluated the stability of item level severity (i.e., DIF), by modality or assessment timing. Further assessment of item functioning by gender is also warranted given some evidence of DIF by gender in evaluations of other, similar measures of substance problems (Harford et al., 2009; Kahler, Hoeppner, & Jackson, 2009; Kahler & Strong, 2006; Wu et al., 2012). This study builds upon prior psychometric work evaluating the SPS. We use the Rasch measurement model to (1) assess whether SPS items tap a single latent dimension of substance problem severity and to locate the placement of items along this dimension; and (2) test for DIF by gender, treatment modality (i.e., outpatient vs. residential care), and assessment timing (i.e., admission and 6 weeks later). By assessing unidimensionality and item severity, we are seeking to replicate and
extend the prior work of Conrad et al. (2007, 2011) into a novel and heterogeneous sample of Canadian adults in treatment. Work of this nature is highly relevant, as the GAIN-I continues to support assessment and outcome monitoring in a growing number of clinical and research settings (Dennis et al., 2006).
2. Methods
2.1. Sample
Participants were 610 adults (18 years and older) entering addiction treatment at one of two agencies located in Ontario or Alberta, Canada, and enrolled in a prospective study of treatment processes (Wild, Urbanoski, Wolfe, & Rush, 2011). Participants were mostly men (58.9%) and average age was 33.5 years (SD = 10.7). Most (54.0%) were single; 24.7% were married or partnered and 20.6% were separated or divorced. Half were White, European, and/or Canadian; 18% were First Nations, 6.6% were Métis, 2.5% were Inuit, 3.9% reported mixed or other ethnic backgrounds, including African, Asian, and South American (20.0% did not report an ethnic background). Just under half had less than a high school education (45.6%) and 27.2% had a high school diploma. Social assistance was the primary income source for 34.1% of participants, while 11.1% reported having no income. Allowing for multiple problem substances per participant, the most common problem substances were alcohol (75.0%), cocaine or crack (51.8%), cannabis (41.7%), methamphetamine or ecstasy (13.1%), and prescription opiates (12.6%).
2.2. Treatment
Study sites included a residential treatment center located in Alberta and an outpatient treatment agency in Ontario. The residential treatment center offered programs of 42 and 90 days, the latter designed for young adults. Treatment was abstinence-oriented, and included group and individual counseling, life skills training and spiritual development programming. The outpatient treatment center in Ontario also offered two programs. 
The first provided individual counseling and case management, with groups for skills training and relapse prevention. Session frequency and treatment duration were dictated by client needs and counselor resources. The second was a group program consisting of four 2-hour sessions held consecutively on a weekly basis. It was primarily educational, designed to assist clients in examining their substance use and learning about processes of change and options for treatment. The target clientele was those who had not set a goal related to their substance use, including many who were mandated by the legal or social service systems to attend treatment. It was possible for clients to attend the group program prior to, or simultaneously with, individual counseling.
2.3. Procedures
At both study sites, recruitment took place at a routine group orientation and assessment session attended by all new clients. A member of the research team explained the study, obtained informed written consent for study participation and chart review, and administered the baseline questionnaire or arranged another time to meet. Baseline questionnaires were completed an average of 2.9 days after admission (SD = 4.8). Follow-up questionnaires were completed approximately 6 weeks later (M = 43.0 days, SD = 23.1), either in person at the treatment agency (74.6%), by telephone (24.6%), or by email (0.9%). At the time of follow-up, 51.6% of participants were still attending treatment. Of the 610 participants enrolled at baseline, 440 (72.1%) were followed up. Women were significantly more likely to be followed up than were men (79.3% vs. 69.3%; χ²(1) = 7.23, p = .007), and those who were followed up reported greater identified motivation for treatment at admission (5.9 ± 1.6 vs. 5.5 ± 1.9 on a 7-point scale;
t(568) = −3.05, p = .002; see Urbanoski & Wild, 2012). Clients who were legally mandated were less likely to be followed up (66.5% vs. 77.8%; χ²(1) = 8.84, p = .003). There were no differences (p > .05) in follow-up rates by age, education, employment status, marital status, problem substances or a count of substance-related symptoms in the month prior to admission. Given the relative ease of locating those who were still attending treatment, retention was highly predictive of follow-up, with 98.1% of those retained completing the follow-up questionnaire relative to 46.7% of those who had dropped out (χ²(1) = 201.64, p < .001). Participants received $10 in cash following the interviews in Ontario (total = $20) and $20 gift certificates following the interviews in Alberta (total = $40; site differences in compensation were due to established amounts from prior studies). Approval was obtained from the Research Ethics Boards at the University of Alberta, the Centre for Addiction and Mental Health, University of Toronto, and Lakeridge Health Services.
2.4. Measures
The SPS (Dennis et al., 2006) provides a global measure of self-reported symptoms of problematic alcohol and drug use (Table 1). Sixteen items assess the occurrence and recency of DSM-IV-TR symptoms of abuse and dependence, substance-induced health and psychiatric disorders, and lower severity problems. Each item was scored as occurring within the past month (2), 2–12 months ago (1), or longer than 12 months ago/never (0). Prior analyses support this response scale over the original 4-level scale that contained separate response options for “longer than 12 months ago” and “never” (Conrad et al., 2007). Responses are used to identify probable substance abuse and dependence, and can be summed to provide a count of substance problems. 
Psychometric testing in adolescent and adult clinical samples indicates high internal consistency (α = 0.88–0.93) and good test–retest reliability (r = 0.70–0.81) (Dennis et al., 2006; Garner, Godley,
Funk, Lee, & Garnick, 2010; Scott & Dennis, 2009). Greater clinical complexity in terms of co-occurring mental disorders is associated as expected with dependence (vs. abuse) (Chan, Dennis, & Funk, 2008; Rush, Dennis, Scott, Castel, & Funk, 2008). Further, problem severity, indexed by the symptom count, is positively associated with frequency of substance use and levels of social and environmental risk, and is sensitive to change in response to treatment (Dennis et al., 2004; Garner, Godley, Funk, Dennis, & Godley, 2007; Godley, Kahn, Dennis, Godley, & Funk, 2005). Participants completed the SPS as part of self-reported research questionnaires administered at admission and 6-week follow-up. Mean symptom counts at admission and follow-up were 6.0 (SD = 5.8, min–max = 0–16) and 2.6 (SD = 4.3, min–max = 0–16), respectively.
2.5. Analysis
Analyses focused on binary indicators of past-month symptoms. To test unidimensionality, we conducted principal axis factoring on the baseline inter-item tetrachoric correlations using SAS 9.2. Factor extraction was based on the Kaiser–Guttman criterion (eigenvalues > 1.0), scree plots, model fit statistics (CFI, TLI, RMSEA, and WRMR), and interpretability of item factor loadings for 1-, 2- and 3-factor models.
Rasch analysis was conducted with Winsteps 3.73. The Rasch model estimates person and item severity on a single latent dimension (i.e., problematic substance use) using an equal-interval log-odds scale. The analysis differentiates between extreme persons, who endorse no or all items, and non-extreme persons. Item severity pertains to item location on the latent dimension; specifically, the point on the latent continuum where there is a 50% chance of endorsement. We constructed a Wright map to provide a visual representation of item and person severities, mapped on the same logit scale (ranging from −4 to +4). The map supplements numerical results by identifying points of severity for which there are many (or few) items, and showing how these relate to person-level severity in the sample.
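For readers less familiar with the model, the dichotomous Rasch formulation can be written as a one-line function. The sketch below is illustrative only: the function name `endorsement_probability` is ours, the two item severities are the baseline estimates reported in Table 1, and the person value is the average baseline person severity reported in Section 3.2. It is not the Winsteps estimation routine, which fits these parameters from the response data.

```python
import math


def endorsement_probability(person_severity, item_severity):
    """Dichotomous Rasch model: P(endorse) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(person_severity - item_severity)))


# Baseline item severities (logits) taken from Table 1.
WEEKLY_USE = -2.29
LEGAL_PROBLEMS = 2.57
# Average baseline person severity reported in Section 3.2.
MEAN_PERSON = -0.93

# When person and item severity coincide, the probability is exactly 0.5,
# which is how item severity is defined in the text.
p_equal = endorsement_probability(0.7, 0.7)

p_weekly = endorsement_probability(MEAN_PERSON, WEEKLY_USE)
p_legal = endorsement_probability(MEAN_PERSON, LEGAL_PROBLEMS)

print(f"person = item:  {p_equal:.2f}")
print(f"Weekly use:     {p_weekly:.2f}")
print(f"Legal problems: {p_legal:.2f}")
```

Under these assumptions, a client at the sample's average severity is more likely than not to endorse Weekly use and unlikely to endorse Legal problems, which is the pattern a Wright map displays graphically.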
Table 1 Baseline item prevalence and Rasch model parameters.

| No. | Item [When was the last time that…] | % | Severity | S.E. | Model fit: Infit MnSq | Model fit: Pt-biserial corr. |
|---|---|---|---|---|---|---|
| 8 | your alcohol or drug use caused you to have repeated problems with the law? (Legal problems) | 17.5 | 2.57 | 0.20 | 1.40 | 0.42 |
| 5 | your alcohol or drug use caused you to have numbness, tingling, shakes, blackouts, hepatitis, TB, sexually transmitted disease or any other health problems? (Physical health) | 27.4 | 1.36 | 0.17 | 1.09 | 0.62 |
| 9 | you kept using alcohol or drugs even after you knew it could get you into fights or other kinds of legal trouble? (Trouble/fights) | 32.8 | 0.78 | 0.17 | 1.23 | 0.64 |
| 7 | you used alcohol and drugs where it made the situation unsafe or dangerous for you, such as when you were driving a car, using a machine, or where you might have been forced into sex or hurt? (Hazardous use) | 33.6 | 0.70 | 0.17 | 0.92 | 0.73 |
| 15 | your use of alcohol and drugs caused you to give up, reduce or have problems at important activities at work, school, home or social events? (Give up acts) | 38.2 | 0.22 | 0.17 | 0.74 | 0.81 |
| 6 | you kept using alcohol or drugs even though you knew it was keeping you from meeting your responsibilities at work, school, or home? (Role failure) | 40.3 | −0.02 | 0.17 | 0.68 | 0.83 |
| 10 | you needed more alcohol or drugs to get the same high or found that the same amount did not get you as high as it used to? (Tolerance) | 40.3 | −0.02 | 0.17 | 0.75 | 0.81 |
| 11 | you had withdrawal problems from alcohol or drugs like shaking hands, throwing up, having trouble sitting still or sleeping, or that you used any alcohol or drugs to stop being sick or avoid withdrawal problems? (Withdrawal) | 40.9 | −0.07 | 0.17 | 0.98 | 0.76 |
| 12 | you used alcohol or drugs in larger amounts, more often or for a longer time than you meant to? (Loss of control) | 41.7 | −0.16 | 0.17 | 0.72 | 0.83 |
| 14 | you spent a lot of time either getting alcohol and drugs, using alcohol or drugs, or feeling the effects of alcohol or drugs (high or sick)? (Time consuming) | 41.7 | −0.16 | 0.17 | 0.76 | 0.82 |
| 13 | you were unable to cut down or stop using alcohol or drugs? (Can't stop) | 42.5 | −0.25 | 0.17 | 0.92 | 0.78 |
| 2 | your parents, family, partner, co-workers, classmates or friends complained about your alcohol or drug use? (Complaints) | 45.7 | −0.61 | 0.18 | 1.40 | 0.67 |
| 1 | you tried to hide that you were using alcohol or drugs? (Hiding use) | 46.0 | −0.64 | 0.18 | **1.62** | 0.61 |
| 4 | your alcohol or drug use caused you to feel depressed, nervous, suspicious, uninterested in things, reduced your sexual drive? (Mental health) | 46.2 | −0.67 | 0.18 | 0.93 | 0.78 |
| 16 | you kept using alcohol or drugs even after you knew it was causing or adding to medical, psychological or emotional problems you were having? (Despite health) | 46.8 | −0.74 | 0.18 | 0.73 | 0.83 |
| 3 | you used alcohol or drugs weekly? (Weekly use) | 58.9 | −2.29 | 0.20 | 1.16 | 0.70 |

N = 372 (244 non-extreme and 128 extreme persons); bolded values indicate infit mean square of <.50 or >1.50.
We used infit statistics and point biserial correlations to judge fit of the data to the Rasch model (Wright & Masters, 1982). Infit values of 0.50–1.50 indicate adequate model fit (Wright & Linacre, 1994). Point biserial correlations between each item and the total of the remaining items (i.e., indicating the strength of an item's association with the latent trait) should be positive (Linacre, 2011). After fitting the Rasch model, we tested the assumption of item local independence by conducting a principal components analysis of item residuals and examining their bivariate correlations. Local independence means that the underlying latent trait represents the only influence on item responses, such that responses are independent from one another once that influence is removed. It is satisfied if item residuals exhibit low correlations with one another (r < .30) (Wright, 1996) and do not yield a factor of their own (eigenvalues < 1.5; Linacre, 1998; Smith & Miao, 1994).
We tested for DIF by gender, treatment modality (residential vs. outpatient), and assessment time (baseline vs. follow-up) using the Mantel–Haenszel chi-square statistic (α = .05) with the Hochberg procedure to adjust for multiple comparisons (Hochberg, 1988). To avoid interpreting statistically significant differences of insufficient magnitude to affect the interpretation of total scale scores, we used a threshold of 0.50 logits to identify meaningful DIF (Lai, Teresi, & Gershon, 2005; Tristan, 2006). Finally, to help judge the practical effect of DIF on observed scale scores (i.e., calculated as the sum of the 16 binary items), we calculated two effect size indices for DIF at the scale level (Meade, 2010). The signed test difference in the sample (STDS) is the expected group difference in observed scale means due to DIF. The expected test score standardized difference (ETSSD) can be interpreted according to Cohen's d standards for effect sizes (Cohen, 1988). 
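The two-stage item-level screening rule described above (a Hochberg-adjusted significance test followed by a 0.50-logit magnitude threshold) can be sketched as follows. This is a minimal illustration, not the Winsteps or VisualDF implementation: the function names and the example DIF contrasts and p-values are hypothetical, and treating the 0.50 cut-off as a strict inequality is an assumption.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg (1988) step-up procedure.

    Returns a list of booleans (same order as p_values) marking the
    hypotheses rejected at family-wise error rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    n_reject = 0
    # Step up from the largest p-value: the first rank k (counting down
    # from m) with p_(k) <= alpha / (m - k + 1) rejects hypotheses 1..k.
    for k in range(m, 0, -1):
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            n_reject = k
            break
    rejected = [False] * m
    for i in order[:n_reject]:
        rejected[i] = True
    return rejected


def meaningful_dif(dif_logits, p_values, alpha=0.05, threshold=0.50):
    """Flag items whose DIF is both significant and larger than threshold."""
    significant = hochberg_reject(p_values, alpha)
    return [sig and abs(d) > threshold
            for d, sig in zip(dif_logits, significant)]


# Hypothetical DIF contrasts (logits) and unadjusted p-values for 4 items.
example_dif = [1.20, 0.30, -0.75, 0.10]
example_p = [0.001, 0.004, 0.030, 0.600]
flags = meaningful_dif(example_dif, example_p)
print(flags)
```

In this made-up example only the first item is flagged: the second is significant but small, and the third is large but no longer significant after adjustment, which is the distinction the two criteria are meant to draw.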
We used VisualDF (Meade, 2010) to calculate the STDS and ETSSD.
The analysis of DIF by assessment time relied on repeated measures of the SPS and, therefore, could be affected by within-subject correlations over time. We used a recommended procedure to test whether non-independence of the observations over time affected the results (Mallinson, 2011). This involved re-calculating the item severities using a new data file in which a randomly selected 50% of cases contributed baseline data and the remaining cases contributed follow-up data. That is, each subject was represented once in the random data file, contributing data from either baseline or follow-up, thus removing the possibility of within-person dependency. Next, we calculated person severities for the original baseline and follow-up data, with the item severities fixed or ‘anchored’ to the values obtained using the random data file. These person severities were compared to those in the original matched-data analysis (i.e., without using the anchored item severities from the random data file). Small differences between the two sets of person severities suggest that the non-independence of observations over time does not affect the results.
Analyses were conducted using all available data at each time point (n = 534 at baseline and 417 at follow-up, excluding those with missing data for one or more SPS items) and on the subsample with complete SPS data at both time points (n = 372). Results were similar, and so we present results from the subsample with complete data to facilitate comparisons between time points.
3. Results
3.1. Unidimensionality and local independence
Exploratory factor analyses of the baseline data (varimax and geomin rotations) yielded a one-factor solution (factor loadings > 0.60; CFI = 0.99, TLI = 0.99, RMSEA = 0.06, WRMR = 0.05). The first three eigenvalues were 12.37, 0.55, and 0.24. 
In a supplementary two-factor model, two items (Legal problems and Trouble/fights) had sufficient overlap to yield sizable factor loadings (>0.60) on a second factor. With the small number of items loading on this second factor (<3),
combined with the eigenvalues and model fit indices, results supported a one-factor solution. Principal components analysis of the item residuals after fitting the Rasch model also generally supported local independence. At baseline, the first residual component accounted for 1.8 units of variance, slightly exceeding the recommended threshold of 1.5. Factor loadings exceeded 0.40 for three items: Legal problems = 0.54, Complaints = 0.51, and Trouble/fights = 0.44. The largest correlation between residuals was 0.36, between Legal problems and Trouble/fights. At follow-up, the first residual component accounted for 1.9 units of variance. Factor loadings exceeded 0.40 for four items: Trouble/fights = 0.74, Legal problems = 0.57, Tolerance = 0.45, and Hazardous use = 0.45. The largest residual correlation was 0.33, again between Legal problems and Trouble/fights.
3.2. Item and person severity
Item and person severity estimates are expressed as equal-interval log-odds units so that the distance between 0 and −1.00, for example, is equivalent to the distance between 1.00 and 2.00. In this Rasch model, the item severity estimates are points on the latent substance problem continuum at which the item has a 50% probability of being endorsed. At baseline, item endorsement ranged from 17% for Legal problems to 59% for Weekly use (Table 1). Items ranged in severity from −2.29 for Weekly use to 2.57 for Legal problems. Many items were nonetheless closely spaced at the middle of the severity continuum. Role failure, Tolerance, Withdrawal, Loss of control, Time consuming, Can't stop, Complaints, Hiding use, Mental health and Despite health were clustered in the low negative range, while Hazardous use and Trouble/fights were clustered in the low positive range. Average person severity was −0.93. Infit values indicated acceptable model fit for all items except Hiding use. Point biserial correlations were good, falling below 0.50 for only one item (Legal problems).
Relative to baseline, item endorsement was lower at follow-up, ranging from 9% for Legal problems to 28% for Weekly use (Table 2). The range for item severities was also narrower (−1.67 to 1.38), and there was less clustering of items. For example, a number of items that clustered in the low negative range at baseline (Mental health, Complaints, and Hiding use) spread out to cover a greater range of severity at follow-up (−0.88 to 0.84). However, three items (Can't stop, Complaints, and Despite health) remained closely spaced in the low severity range. Average person severity was −2.69. Infit values were acceptable for all items except Withdrawal, and all point biserial correlations exceeded 0.50.

Table 2 Follow-up item prevalence and Rasch model parameters.

| No. | Item | % | Severity | S.E. | Model fit: Infit MnSq | Model fit: Pt-biserial corr. |
|---|---|---|---|---|---|---|
| 8 | Legal problems | 9.4 | 1.38 | 0.25 | 1.17 | 0.58 |
| 5 | Physical health | 10.5 | 1.13 | 0.24 | 1.02 | 0.65 |
| 7 | Hazardous use | 11.0 | 1.07 | 0.24 | 0.87 | 0.69 |
| 1 | Hiding use | 11.8 | 0.84 | 0.23 | 1.31 | 0.60 |
| 15 | Give up acts | 13.4 | 0.53 | 0.23 | 0.82 | 0.75 |
| 14 | Time consuming | 15.1 | 0.23 | 0.22 | 0.71 | 0.80 |
| 10 | Tolerance | 15.6 | 0.14 | 0.22 | 0.76 | 0.79 |
| 6 | Role failure | 15.9 | 0.09 | 0.22 | 0.66 | 0.82 |
| 12 | Loss of control | 16.7 | −0.05 | 0.21 | 0.76 | 0.79 |
| 9 | Trouble/fights | 17.2 | −0.14 | 0.21 | 0.95 | 0.73 |
| 13 | Can't stop | 18.6 | −0.35 | 0.21 | 1.31 | 0.63 |
| 2 | Complaints | 18.8 | −0.39 | 0.21 | 1.20 | 0.66 |
| 16 | Despite health | 18.8 | −0.39 | 0.21 | 0.67 | 0.82 |
| 4 | Mental health | 22.0 | −0.88 | 0.20 | 1.00 | 0.73 |
| 11 | Withdrawal | 26.6 | −1.53 | 0.19 | **1.67** | 0.54 |
| 3 | Weekly use | 27.7 | −1.67 | 0.19 | 1.19 | 0.64 |

N = 372 (178 non-extreme and 194 extreme persons); bolded values indicate infit mean square of <.50 or >1.50.
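Because Tables 1 and 2 both report an estimated severity for every item, the stability of the item hierarchy can be checked by ranking the items at each time point and tallying how far each one moves. The sketch below uses the severities reported in the two tables; the variable and function names are ours, and tied severities are assumed to keep the order in which the tables list them, so rank changes involving tied items should be read with caution.

```python
# Item severities (logits) in the order printed in Table 1 (baseline)
# and Table 2 (follow-up).
baseline = {
    "Legal problems": 2.57, "Physical health": 1.36, "Trouble/fights": 0.78,
    "Hazardous use": 0.70, "Give up acts": 0.22, "Role failure": -0.02,
    "Tolerance": -0.02, "Withdrawal": -0.07, "Loss of control": -0.16,
    "Time consuming": -0.16, "Can't stop": -0.25, "Complaints": -0.61,
    "Hiding use": -0.64, "Mental health": -0.67, "Despite health": -0.74,
    "Weekly use": -2.29,
}
follow_up = {
    "Legal problems": 1.38, "Physical health": 1.13, "Hazardous use": 1.07,
    "Hiding use": 0.84, "Give up acts": 0.53, "Time consuming": 0.23,
    "Tolerance": 0.14, "Role failure": 0.09, "Loss of control": -0.05,
    "Trouble/fights": -0.14, "Can't stop": -0.35, "Complaints": -0.39,
    "Despite health": -0.39, "Mental health": -0.88, "Withdrawal": -1.53,
    "Weekly use": -1.67,
}


def severity_ranks(severities):
    """Rank items from most (1) to least severe.

    sorted() is stable, so tied items keep their table order
    (an assumption of this sketch, not something the tables state).
    """
    ordered = sorted(severities, key=lambda item: -severities[item])
    return {item: rank for rank, item in enumerate(ordered, start=1)}


rank_base = severity_ranks(baseline)
rank_follow = severity_ranks(follow_up)
shift = {item: abs(rank_base[item] - rank_follow[item]) for item in baseline}

# Items that move more than two rank positions between time points.
large_shifts = sorted(item for item, s in shift.items() if s > 2)
print(large_shifts)
```

Under the stated tie-handling assumption, the items that move more than two rank positions are Hiding use, Time consuming, Trouble/fights, and Withdrawal.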
[Figure 1: Wright (person–item) map. A vertical logit scale from −4 (less severe) to +4 (more severe) separates persons on the left from items on the right. Items, from most to least severe: Legal problems; Physical health; Hazardous use, Trouble/fights; Give up acts, Role failure, Tolerance, Withdrawal, Loss control, Time consuming, Can't stop, Hiding use, Complaints, Mental health, Despite health; Weekly use. # = 8 participants; . = 1–7 participants.]
Fig. 1. Person–item map of baseline responses on the SPS (n = 372).
Comparing descriptively across the two time points, the hierarchical ordering of item severities was largely constant. Eight items did not change hierarchical order over time. Four items (Trouble/fights, Withdrawal, Time consuming, and Hiding use) shifted by more than two rank positions in the hierarchical ordering of item severities over time. The Wright map (Fig. 1) displays the equal interval log-odds scale that is common to both persons and items as a dashed vertical line separating persons on the left (signified by the symbols “.” and “#”) from item labels on the right. The logit scale range is given to the left of these as numbers from −4 to 4, with higher scores corresponding to higher levels of severity. The figure illustrates the number of persons
who would have a 50% likelihood of endorsing scale items at given points on the log-odds scale. Items that are ‘easier’ to endorse (i.e., those which correspond to a lower level of substance problem severity) are located closer to the bottom of the scale. Many participants have a 50% probability of endorsing Weekly use, while only those with the most severe substance problems appear to endorse Legal problems. It is evident from Fig. 1 that items do not span the entire range of person severity in this sample. At the extreme low end of problem severity there were no items but many persons (i.e., 104 participants, 28% of the sample). Many items clustered between −1 and +1, while at the upper end of the distribution just one item targeted about 40 persons. All of these participants' severities were higher than the top item's difficulty. The same general findings emerged from a map of the follow-up data (not shown); however, item severities were higher and the person severity distribution shifted substantially towards the lower end of the continuum. At follow-up, less than half of participants had at least a 50% probability of endorsing any single item, indicating that the scale tapped a higher level of severity than was exhibited by participants.

Table 3 Baseline item prevalence and differential item functioning (DIF) by sex and treatment modality.

| No. | Item | Male % | Male severity | Female % | Female severity | DIF (sex) | Residential % | Residential severity | Outpatient % | Outpatient severity | DIF (modality) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Legal problems | 15.3 | 2.28 | 21.2 | 2.90 | −0.61 | 20.7 | 2.96 | 12.7 | 1.62 | 1.34 |
| 5 | Physical health | 18.2 | 1.89 | 41.0 | 0.73 | 1.16 | 36.0 | 1.38 | 14.7 | 1.31 | 0.08 |
| 7 | Hazardous use | 28.2 | 0.74 | 41.7 | 0.66 | 0.08 | 44.1 | 0.63 | 18.0 | 0.84 | −0.22 |
| 9 | Trouble/fights | 29.2 | 0.64 | 39.1 | 0.92 | −0.28 | 37.8 | 1.22 | 25.3 | −0.02 | 1.23 |
| 15 | Give up acts | 32.5 | 0.29 | 46.8 | 0.11 | 0.17 | 50.5 | −0.02 | 20.0 | 0.59 | −0.61 |
| 6 | Role failure | 34.5 | 0.09 | 49.4 | −0.18 | 0.27 | 53.6 | −0.37 | 20.7 | 0.51 | −0.88 |
| 14 | Time consuming | 35.9 | −0.06 | 50.6 | −0.33 | 0.27 | 53.6 | −0.37 | 24.0 | 0.13 | −0.50 |
| 11 | Withdrawal | 35.9 | −0.07 | 49.4 | −0.18 | 0.10 | 51.8 | −0.16 | 24.7 | 0.06 | −0.22 |
| 12 | Loss of control | 36.8 | −0.16 | 48.7 | −0.10 | −0.06 | 53.2 | −0.32 | 24.7 | 0.06 | −0.38 |
| 10 | Tolerance | 36.8 | −0.16 | 46.2 | 0.18 | −0.35 | 53.2 | −0.32 | 21.3 | 0.43 | −0.75 |
| 13 | Can't stop | 39.7 | −0.47 | 46.8 | 0.11 | −0.58 | 54.5 | −0.48 | 24.7 | 0.06 | −0.53 |
| 2 | Complaints | 40.7 | −0.57 | 54.5 | −0.81 | 0.24 | 53.6 | −0.37 | 34.0 | −0.92 | 0.55 |
| 16 | Despite health | 40.7 | −0.57 | 55.8 | −0.98 | 0.41 | 59.5 | −1.12 | 28.0 | −0.30 | −0.82 |
| 4 | Mental health | 42.6 | −0.77 | 51.9 | −0.48 | −0.29 | 58.6 | −0.99 | 28.0 | −0.30 | −0.69 |
| 1 | Hiding use | 43.5 | −0.88 | 50.0 | −0.25 | −0.62 | 50.5 | −0.02 | 39.3 | −1.46 | 1.44 |
| 3 | Weekly use | 55.5 | −2.26 | 63.5 | −2.19 | −0.07 | 62.2 | −1.51 | 54.0 | −3.10 | 1.59 |

N = 372 [by sex: 141 non-extreme males (out of 209) and 98 non-extreme females (out of 156); by treatment modality: 148 non-extreme residential cases (out of 222) and 96 non-extreme outpatient cases (out of 150)]; bolded values indicate DIF that exceeds .50 logits and is statistically significant; positive DIF means that an item is more severe for males than females, or more severe for residential patients than outpatients.

3.3. Differential item functioning
At baseline, items were endorsed more frequently by women than men, and by those in residential relative to outpatient treatment (Table 3). In terms of item contribution to the underlying construct of substance problem severity, however, there was no evidence of DIF by gender. For treatment modality, Weekly use was significantly more severe for residential than for outpatient clients. At follow-up, item endorsement was fairly equal by gender and
tended to be higher among those in outpatient relative to residential treatment (Table 4). There was significant DIF by gender for Weekly use, the item being more severe for females. Hiding use and Weekly use were more severe for residential than outpatient participants, while Withdrawal was more severe for outpatient than for residential participants. The test for the impact of within-subject correlations supported the validity of the DIF analysis comparing the baseline and follow-up scores. Specifically, the person severities calculated using the item severities from the random data file consisting of unmatched baseline and follow-up data differed from the person severities calculated using the original, matched responses by less than 0.20 logits (available from the author upon request). These small differences between the two sets of person severity estimates indicate that non-independence of the observations did not affect the results (plots comparing the anchored and original person severities for the first 10 subjects are presented in Appendix A). Across time points, item endorsement was higher at baseline relative to follow-up (Table 5). Significant DIF
Table 4
Follow-up item prevalence and differential item functioning (DIF) for sex and treatment modality.

| No. | Item | Male % | Male severity | Female % | Female severity | DIF (sex) | Residential % | Residential severity | Outpatient % | Outpatient severity | DIF (modality) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Physical health | 8.6 | 1.54 | 13.5 | 0.59 | 0.95 | 10.4 | 1.00 | 10.7 | 1.23 | −0.23 |
| 7 | Hazardous use | 9.6 | 1.31 | 12.8 | 0.72 | 0.59 | 11.3 | 0.73 | 10.0 | 1.34 | −0.62 |
| 8 | Legal problems | 10.1 | 1.21 | 9.0 | 1.63 | −0.42 | 9.9 | 1.14 | 8.7 | 1.58 | −0.44 |
| 1 | Hiding use | 12.4 | 0.73 | 10.9 | 1.15 | −0.42 | 6.8 | 2.19 | 19.3 | 0.10 | 2.09 |
| 15 | Give up acts | 13.9 | 0.47 | 13.5 | 0.59 | −0.12 | 13.5 | 0.08 | 13.3 | 0.83 | −0.76 |
| 6 | Role failure | 15.3 | 0.22 | 17.3 | −0.13 | 0.36 | 13.5 | 0.09 | 19.3 | 0.09 | 0.00 |
| 10 | Tolerance | 16.3 | 0.07 | 15.4 | 0.21 | −0.15 | 14.9 | −0.28 | 16.7 | 0.41 | −0.69 |
| 14 | Time consuming | 16.3 | 0.07 | 14.1 | 0.46 | −0.40 | 12.6 | 0.33 | 18.7 | 0.18 | 0.15 |
| 9 | Trouble/fights | 17.2 | −0.09 | 18.0 | −0.24 | 0.15 | 13.5 | 0.08 | 22.7 | −0.26 | 0.33 |
| 12 | Loss of control | 18.2 | −0.24 | 15.4 | 0.21 | −0.45 | 13.1 | 0.20 | 22.0 | −0.19 | 0.39 |
| 16 | Despite health | 18.2 | −0.24 | 19.9 | −0.56 | 0.32 | 14.9 | −0.28 | 24.7 | −0.46 | 0.18 |
| 13 | Can't stop | 18.7 | −0.31 | 19.2 | −0.45 | 0.14 | 16.2 | −0.62 | 22.0 | −0.19 | −0.43 |
| 2 | Complaints | 19.6 | −0.46 | 17.9 | −0.24 | −0.22 | 17.1 | −0.83 | 21.3 | −0.12 | −0.71 |
| 4 | Mental health | 21.5 | −0.74 | 23.1 | −1.05 | 0.31 | 16.2 | −0.62 | 30.7 | −1.05 | 0.43 |
| 11 | Withdrawal | 24.9 | −1.22 | 30.1 | −2.01 | 0.80 | 28.8 | −2.96 | 23.3 | −0.33 | −2.63 |
| 3 | Weekly use | 32.1 | −2.19 | 21.8 | −0.86 | −1.34 | 12.2 | 0.33 | 50.7 | −3.08 | 3.41 |
N = 372 [by sex: 99 non-extreme males (out of 209) and 76 non-extreme females (out of 156); by treatment modality: 84 non-extreme residential cases (out of 222) and 94 non-extreme outpatient cases (out of 150)]; bolded values indicate DIF that exceeds .50 logits and is statistically significant; positive DIF means that an item is more severe for males than females, or more severe for residential patients than outpatients.
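As a minimal sketch of how the DIF column in Table 4 relates to the group-specific severities, the contrast can be reproduced as the reference-group severity minus the comparison-group severity. The function names are hypothetical, and applying the .50-logit size criterion to the magnitude of the contrast is an assumption made for illustration. The statistical significance test that the table note also requires needs standard errors that the table does not report, so it is omitted here.

```python
# Illustrative sketch (not the authors' code): DIF contrast from group-specific severities.
SIZE_CRITERION = 0.50  # logits; size threshold quoted in the table note


def dif_contrast(reference_severity, comparison_severity):
    """Positive values mean the item is more severe for the reference group."""
    return reference_severity - comparison_severity


def meets_size_criterion(dif):
    # Assumption: the threshold is applied to the magnitude of the contrast.
    return abs(dif) > SIZE_CRITERION


# Follow-up severities from Table 4: (residential, outpatient), in logits.
table4_modality = {
    "Hiding use": (2.19, 0.10),
    "Role failure": (0.09, 0.09),
    "Withdrawal": (-2.96, -0.33),
}

contrasts = {
    item: dif_contrast(residential, outpatient)
    for item, (residential, outpatient) in table4_modality.items()
}

for item, dif in contrasts.items():
    print(f"{item}: DIF = {dif:+.2f} logits, large = {meets_size_criterion(dif)}")
```

A size flag alone does not establish DIF under the criteria in the table note; it only identifies contrasts large enough to be considered once significance has been tested.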
Table 5
Item prevalence and differential item functioning (DIF) by time point.

| No. | Item | Baseline % | Baseline severity | Follow-up % | Follow-up severity | DIF |
|---|---|---|---|---|---|---|
| 8 | Legal problems | 17.5 | 2.53 | 9.4 | 1.37 | 1.15 |
| 5 | Physical health | 27.4 | 1.33 | 10.5 | 1.13 | 0.21 |
| 9 | Trouble/fights | 32.8 | 0.77 | 17.2 | −0.14 | 0.91 |
| 7 | Hazardous use | 33.6 | 0.68 | 10.8 | 1.07 | −0.38 |
| 15 | Give up acts | 38.2 | 0.21 | 13.4 | 0.52 | −0.31 |
| 6 | Role failure | 40.3 | −0.02 | 15.9 | 0.08 | −0.10 |
| 10 | Tolerance | 40.3 | −0.02 | 15.6 | 0.13 | −0.14 |
| 11 | Withdrawal | 40.9 | −0.07 | 26.6 | −1.53 | 1.46 |
| 12 | Loss of control | 41.7 | −0.16 | 16.7 | −0.05 | −0.10 |
| 14 | Time consuming | 41.7 | −0.16 | 15.1 | 0.22 | −0.38 |
| 13 | Can't stop | 42.5 | −0.25 | 18.6 | −0.36 | 0.12 |
| 2 | Complaints | 45.7 | −0.60 | 18.8 | −0.40 | −0.20 |
| 1 | Hiding use | 46.0 | −0.63 | 11.8 | 0.84 | −1.47 |
| 4 | Mental health | 46.2 | −0.66 | 22.0 | −0.89 | 0.23 |
| 16 | Despite health | 46.8 | −0.72 | 18.8 | −0.40 | −0.32 |
| 3 | Weekly use | 58.9 | −2.24 | 27.7 | −1.68 | −0.57 |
N = 372 [244 non-extreme baseline cases (out of 372) and 178 non-extreme follow-up cases (out of 372)]; bolded values indicate DIF that exceeds .50 logits and is statistically significant; positive DIF means that an item is more severe at baseline than at follow-up.
was observed for two items. Hiding use was more severe at follow-up, while Withdrawal was more severe at baseline. We examined the impact of DIF by comparing average person severity at baseline and follow-up with and without the items exhibiting DIF. Omitting Weekly use, mean person severity at baseline decreased by only 0.04 logits, from −0.93 to −0.97. Omitting Weekly use at follow-up resulted in a smaller increase of 0.01 logits, from −2.70 to −2.69. Similar small differences were shown in mean person severity with the omission of Hiding use and Withdrawal (0.05 and 0.01 logits, respectively). Effect size indices indicated minimal impact of DIF on observed scale scores (Table 6). For instance, across modalities at follow-up, the difference in observed scale means due to DIF (STDS) was 0.045 points (on the 16-point scale), while the expected scale score difference due to DIF (ETSSD) was 0.015 standard deviation units larger for outpatients. Findings of similarly small magnitude were noted for the baseline and follow-up comparison.
4. Discussion
This study replicates and extends prior psychometric evaluation of the SPS using Rasch analyses in a heterogeneous sample of adults entering addiction treatment in Canada. With the analyses focused on the binary indicators of problem and symptom counts, results should not be generalized to applications that use scoring procedures for the 3- or 4-point versions of the SPS items. Our results nonetheless support prior work and contribute novel information on scale performance across treatment modalities and time points. First, findings support the unidimensionality of the SPS, tapping a single underlying latent dimension of substance problem severity.
Table 6
Scale-level effect size indices for DIF.

| Test | Residential vs. outpatient, at follow-up (a) | Baseline vs. follow-up (b) |
|---|---|---|
| STDS | 0.045 | −0.061 |
| ETSSD | 0.015 | −0.019 |

STDS = signed test difference in the sample; ETSSD = expected test score standardized difference. (a) Reference category = residential. (b) Reference category = baseline.

This is consistent with results from a prior study of the SPS (Conrad et al., 2007, 2011), and with analyses of substance problems in clinical and community samples (Borges et al., 2011; Harford et al., 2009; Hasin et al., 2012; Kahler &
Strong, 2006; Krueger et al., 2004; Ray et al., 2008; Saha et al., 2007, 2012; Wu et al., 2012). Good fit of the Rasch model suggests that this self-report measure can index global substance severity in an additive manner for men and women in Canadian residential and outpatient settings. There was, however, evidence of item redundancy, with a clustering of dependence items around the mean of item severity, and a lack of items in the high and low severity ranges of the continuum. This is also broadly consistent with prior Rasch analysis (Conrad et al., 2011). Replication of these findings suggests opportunities for scale improvement, including the removal of redundant items and the addition of high and low severity items to provide better coverage of the full range of clinical severity. Similar to prior analyses, this analysis failed to reproduce the hierarchical conceptualization of DSM-IV symptoms of abuse and dependence (Harford et al., 2009; Kahler & Strong, 2006; Ray et al., 2008; Saha et al., 2007; Wu et al., 2012). The hierarchy presumes that symptoms and behaviors are differentially relevant to determining the degree of a person's substance-related problems. At baseline, the low end of the severity continuum was dominated by indicators of problematic consumption and substance-induced symptoms, although one dependence criterion (Despite health) was also located in this range. The majority of the upper severity range was represented by criteria for abuse rather than dependence, plus one item representing a substance-induced symptom (Physical health). Dependence symptoms tended to cluster in the mid-severity range. The presumed hierarchy was even more disorganized at follow-up, where the low severity range became further characterized by dependence criteria. Together, these studies suggest that these criteria do not cluster together as proposed into qualitatively distinct entities. 
Such findings provide valuable empirical evidence for ongoing debates surrounding proposed changes to substance use disorders in DSM-5. In particular, the high severity and low endorsement of the substance-related legal problems item, and its relatively low correlation with the latent trait at baseline, are consistent with prior research that supports its likely exclusion from the DSM-5 (Hasin et al., 2012; Saha et al., 2012). Nonetheless, any decision to omit this item from the SPS, to maintain consistency with DSM-5 criteria, should be weighed against implications for the scale's ability to index severity at the high end of the continuum. Overall, item performance was good; however, a number of findings warrant attention. Hiding use at baseline and Withdrawal at follow-up had high infit mean square values, indicating slightly poor fit to the model. The prior Rasch analysis of the SPS conducted using US treatment data also identified a high infit value for Hiding use (Conrad et al., 2011). Although there were no highly locally dependent items, residual analysis on the follow-up data was modestly suggestive of a second dimension represented by Trouble/fights and Legal problems. In this respect, it is notable that the Trouble/fights item has been reworded in a more recent version of the SPS to refer to continued use “… even though it was causing social problems, leading to fights, or getting you into trouble with other people” (Conrad et al., 2011). The wording of this item in our version, which does not reference social problems, may have resulted in too much overlap with the Legal problems item. We found instances of DIF by gender and by treatment modality. Six weeks after beginning treatment, men were more likely than women with equal levels of problem severity to use weekly or more often. 
At both time points, weekly use was more severe in the residential setting, indicating that residential clients were less likely than outpatient clients at a given level of severity to endorse the item. At follow-up, for clients at a given level of severity, those in the residential setting were less likely to endorse having tried to hide their substance use and more likely to endorse experiencing withdrawal in the past month. The latter findings make intuitive sense, considering the restricted access to substances in residential treatment. The majority of SPS items had equal item severity calibrations at baseline and follow-up. However, time-based DIF was discovered for
two items: Hiding use and Withdrawal. As noted above, these same two items exhibited model misfit (i.e., high infit mean square values). The DIF results suggest that their meanings also depend on the timing of measurement. For a given level of severity, clients were less likely to endorse having tried to hide their substance use prior to follow-up (relative to baseline) and more likely to endorse experiencing withdrawal in the past month. The suitability of a scale as a measure of change over time depends on its ability to function similarly at various points throughout and following treatment. While we can reasonably expect participants' problem severity to change over time, we would not expect the item severity hierarchy to change. Unstable item severity calibrations indicate that, holding severity constant, the probability of endorsing the item is conditional on measurement timing. Such conditionality indicates potential problems with construct validity. That said, the net effect of DIF appeared to be inconsequential. There were no clear results indicating that any items should be used only for men or women. Analyses excluding the items showing setting or time-based DIF suggested negligible practical effects. In observed scale and standardized metrics, the DIF effects were small and would have almost no discernible impact on obtained scale scores (i.e., less than half a point on the scale's summated score range of 0–16). There was cancelation of DIF across persons and items, rendering its net effect unimportant. Researchers who use the scale in residential settings should recognize that it is difficult for clients to endorse Hiding use and Weekly use; these items could be irrelevant to residential treatment. Nonetheless, it should not be problematic to retain them, as there would be no net effect on the internal and external validity of conclusions. The question of greatest importance is whether the SPS is free of bias in its intended uses. 
Our results support its use for questions and comparisons that involve gender, treatment modality, and within-subject changes over the course of treatment. Average person severity measured with the SPS declined almost 2 logits from baseline to follow-up, indicating substantial declines in problem severity in the initial weeks of treatment. A sizeable proportion of the sample was unlikely to endorse any of the items at follow-up. From the perspective of treatment evaluation, this is a desirable result. However, from the Rasch model perspective, it comes at the cost of introducing imprecision into the item severity estimates. This is seen in the estimates' standard errors, which were 20%–50% larger at follow-up than at baseline. The sample was rather poorly matched to the item severities at follow-up, such that substance problem severity was measured less accurately at follow-up than at baseline.
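The targeting argument above can be illustrated with the dichotomous Rasch model, in which the probability of endorsement depends only on the difference between person and item severity, and the statistical information carried by a response is p(1 − p). The sketch below pairs the mean person severities reported in the Results (−0.93 at baseline, −2.70 at follow-up) with three item severities from Table 5. This pairing, the choice of items, and the function names are assumptions made for illustration, not an analysis from the study.

```python
import math


def endorsement_probability(person_severity, item_severity):
    """Dichotomous Rasch model: P(endorse) for severities expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(person_severity - item_severity)))


def item_information(person_severity, item_severity):
    """Information from one response; largest when person and item severity match."""
    p = endorsement_probability(person_severity, item_severity)
    return p * (1.0 - p)


# Illustrative subset of Table 5 item severities (logits).
baseline_items = {"Legal problems": 2.53, "Tolerance": -0.02, "Weekly use": -2.24}
followup_items = {"Legal problems": 1.37, "Tolerance": 0.13, "Weekly use": -1.68}

# Mean person severities reported in the Results (logits).
MEAN_PERSON_BASELINE = -0.93
MEAN_PERSON_FOLLOWUP = -2.70


def summarize(person_severity, items):
    """Return per-item endorsement probabilities and the summed information."""
    probs = {name: endorsement_probability(person_severity, b) for name, b in items.items()}
    info = sum(item_information(person_severity, b) for b in items.values())
    return probs, info


baseline_probs, baseline_info = summarize(MEAN_PERSON_BASELINE, baseline_items)
followup_probs, followup_info = summarize(MEAN_PERSON_FOLLOWUP, followup_items)

for label, probs, info in [
    ("baseline", baseline_probs, baseline_info),
    ("follow-up", followup_probs, followup_info),
]:
    rounded = {name: round(p, 2) for name, p in probs.items()}
    print(label, rounded, "summed information:", round(info, 2))
```

Because information peaks when person and item severity coincide, a person located well below every item contributes little information, which is the mechanism behind larger standard errors under poor targeting.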
4.1. Strengths and limitations
Strengths include the heterogeneous sample, including clients with a range of problem severity, and the prospective study design, permitting an analysis of item performance and person–item match over time. However, sample attrition was relatively high at 28%, and comparisons indicated that the follow-up sample differed from the full sample recruited at baseline in a number of ways. That said, SPS scores at baseline (i.e., count of past-month problems) did not differ between study completers and drop-outs. It is also notable that, because the two participating treatment centers are located in different provinces, their services are covered by separate, although comparable, universal health insurance plans. As a result, the comparisons by modality (i.e., residential vs. outpatient) are potentially confounded by unmeasured systemic and regional factors, and replication is warranted.
4.2. Conclusions
This study contributes to the evidence base supporting the use of the SPS for clinical assessment and outcome evaluation. There was, however, a discord in problem severity between persons and items, and the items covered only a fairly narrow range of problem severity. Given the growing popularity of the GAIN assessment instruments in clinical services and research (Dennis et al., 2006), future work should investigate potential opportunities for trimming redundant items on the SPS and expanding the range of clinical severity that can be reliably measured over the course of treatment.
Role of funding sources
This research was supported by a grant from the Canadian Institutes of Health Research (CIHR MOP 96662). Dr. Wild was supported by a Health Scholar Award from Alberta Innovates: Health Solutions during this research. Support to CAMH for salary of scientists and infrastructure has been provided by the Ontario Ministry of Health and Long Term Care (MOHLTC). CIHR, Alberta Innovates, and MOHLTC had no further roles in study design; data collection, analysis, or interpretation; writing of the report; or the decision to submit the paper for publication.
Contributors
All authors contributed to the design of the psychometric study. Drs. Urbanoski, Rush, and Wild were involved with data collection. Mr. Kenaszchuk conducted the analysis. All authors contributed to the interpretation of the results. Mr. Kenaszchuk and Dr. Urbanoski conducted the literature search and wrote the first draft of the manuscript. All authors contributed to and approved the final manuscript.
Conflict of interest
No conflict is declared.
Appendix A. Impact of within-subject correlations on repeated measures DIF: a comparison of “anchored” and “unanchored” person severities for the first 10 subjects
[Figure: two panels, "Baseline person abilities: anchored and unanchored" and "Follow-up person abilities: anchored and unanchored". Vertical axis: Ability (logits). Horizontal axis: subjects 1 to 10 and the mean of all subjects. Series: Anchored, Unanchored.]
References
Borges, G., Cherpitel, C. J., Ye, Y., Bond, J., Cremonte, M., Moskalewicz, J., et al. (2011). Threshold and optimal cut-points for alcohol use disorders among patients in the emergency department. Alcoholism, Clinical and Experimental Research, 35, 1270–1276.
Chan, Y. F., Dennis, M. L., & Funk, R. R. (2008). Prevalence and comorbidity of major internalizing and externalizing problems among adolescents and adults presenting to substance abuse treatment. Journal of Substance Abuse Treatment, 34, 14–24.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Conrad, K. J., Conrad, K. M., Dennis, M. L., Riley, B. B., & Funk, R. R. (2011). Validation of the Substance Problem Scale (SPS) to the Rasch measurement model, GAIN methods report 1.2. Chicago, IL: Chestnut Health Systems (Retrieved 12/05/12 from: http://www.gaincc.org/psychometrics-publications/working-papers/)
Conrad, K. J., Dennis, M. L., Bezruczko, N., Funk, R. R., & Riley, B. B. (2007). Substance use disorder symptoms: Evidence of differential item functioning by age. Journal of Applied Measurement, 8, 373–387.
Dennis, M. L., Funk, R., Godley, S. H., Godley, M. D., & Waldron, H. (2004). Cross-validation of the alcohol and cannabis use measures in the Global Appraisal of Individual Needs (GAIN) and Timeline Followback (TLFB; Form 90) among adolescents in substance abuse treatment. Addiction, 99(Suppl. 2), 120–128.
Dennis, M. L., White, M., Titus, J. C., & Unsicker, J. (2006). Global Appraisal of Individual Needs: Administration guide for the GAIN and related measures (Version 5). Bloomington, IL: Chestnut Health Systems.
Garner, B. R., Godley, M. D., Funk, R. R., Dennis, M. L., & Godley, S. H. (2007). The impact of continuing care adherence on environmental risks, substance use, and substance-related problems following adolescent residential treatment. Psychology of Addictive Behaviors, 21, 488–497.
Garner, B. R., Godley, M. D., Funk, R. R., Lee, M. T., & Garnick, D. W. (2010). The Washington Circle continuity of care performance measure: Predictive validity with adolescents discharged from residential treatment. Journal of Substance Abuse Treatment, 38, 3–11.
Godley, M. D., Kahn, J. H., Dennis, M. L., Godley, S. H., & Funk, R. R. (2005). The stability and impact of environmental factors on substance use and problems after adolescent outpatient treatment for cannabis abuse or dependence. Psychology of Addictive Behaviors, 19, 62–70.
Harford, T. C., Yi, H. Y., Faden, V. B., & Chen, C. M. (2009). The dimensionality of DSM-IV alcohol use disorders among adolescent and adult drinkers and symptom patterns by age, gender, and race/ethnicity. Alcoholism, Clinical and Experimental Research, 33, 868–878.
Hasin, D. S., Fenton, M. C., Beseler, C., Park, J. Y., & Wall, M. M. (2012). Analyses related to the development of DSM-5 criteria for substance use related disorders: 2. Proposed DSM-5 criteria for alcohol, cannabis, cocaine and heroin disorders in 663 substance abuse patients. Drug and Alcohol Dependence, 122, 28–37.
Hilton, T. F. (2011). The promise of PROMIS(R) for addiction. Drug and Alcohol Dependence, 119, 229–234.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.
Kahler, C. W., Hoeppner, B. B., & Jackson, K. M. (2009). A Rasch model analysis of alcohol consumption and problems across adolescence and young adulthood. Alcoholism, Clinical and Experimental Research, 33, 663–673.
Kahler, C. W., & Strong, D. R. (2006). A Rasch model analysis of DSM-IV alcohol abuse and dependence items in the National Epidemiological Survey on Alcohol and Related Conditions. Alcoholism, Clinical and Experimental Research, 30, 1165–1175.
Krueger, R. F., Nichol, P. E., Hicks, B. M., Markon, K. E., Patrick, C. J., Iacono, W. G., et al. (2004). Using latent trait modeling to conceptualize an alcohol problems continuum. Psychological Assessment, 16, 107–119.
Lai, J. S., Teresi, J., & Gershon, R. (2005). Procedures for the analysis of differential item functioning (DIF) for small sample sizes. Evaluation & the Health Professions, 28, 283–294.
Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2, 266–283.
Linacre, J. M. (2011). A user's guide to WINSTEPS. Chicago, IL: MESA.
Mallinson, T. (2011). Rasch analysis of repeated measures. Rasch Measurement Transactions, 25, 1317. Retrieved 12/05/12 from: http://www.rasch.org/rmt/rmt251b.htm
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. The Journal of Applied Psychology, 95, 728–743.
Ray, L. A., Kahler, C. W., Young, D., Chelminski, I., & Zimmerman, M. (2008). The factor structure and severity of DSM-IV alcohol abuse and dependence symptoms in psychiatric outpatients. Journal of Studies on Alcohol and Drugs, 69, 496–499.
Rush, B. R., Dennis, M. L., Scott, C. K., Castel, S., & Funk, R. R. (2008). The interaction of co-occurring mental disorders and recovery management checkups on substance abuse treatment participation and recovery. Evaluation Review, 32, 7–38.
Saha, T. D., Compton, W. M., Chou, S. P., Smith, S., Ruan, W. J., Huang, B., et al. (2012). Analyses related to the development of DSM-5 criteria for substance use related disorders: 1. Toward amphetamine, cocaine and prescription drug use disorder continua using Item Response Theory. Drug and Alcohol Dependence, 122, 38–46.
Saha, T. D., Stinson, F. S., & Grant, B. F. (2007). The role of alcohol consumption in future classifications of alcohol use disorders. Drug and Alcohol Dependence, 89, 82–92.
Scott, C. K., & Dennis, M. L. (2009). Results from two randomized clinical trials evaluating the impact of quarterly recovery management checkups with adult chronic substance users. Addiction, 104, 959–971.
Smith, R. M., & Miao, C. Y. (1994). Assessing unidimensionality for the Rasch measurement. In M. Wilson (Ed.), Objective measurement: Theory and practice, Vol. 2 (pp. 316–327). Norwood, NJ: Ablex.
Tristan, A. (2006). An adjustment for sample size in DIF analysis. Rasch Measurement Transactions, 20, 1070–1071.
Urbanoski, K. A., & Wild, T. C. (2012). Assessing self-determined motivation for addiction treatment: Validity of the Treatment Entry Questionnaire. Journal of Substance Abuse Treatment, 43, 70–79.
Wild, T. C., Urbanoski, K., Wolfe, J., & Rush, B. (2011). Motivational impact of social control and coercion on early engagement in addiction treatment. Paper presented at the Issues of Substance (National Addiction Conference), Vancouver, BC.
Wright, B. D. (1996). Local dependency, correlations and principal components. Rasch Measurement Transactions, 10, 509–511.
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: MESA.
Wu, L. T., Woody, G. E., Yang, C., Pan, J. J., Reeve, B. B., & Blazer, D. G. (2012). A dimensional approach to understanding severity estimates and risk correlates of marijuana abuse and dependence in adults. International Journal of Methods in Psychiatric Research, 21, 117–133.