Journal of Substance Abuse Treatment 32 (2007) 11 – 17
Regular article
An examination of the Motivational Interviewing Treatment Integrity code

Heather M. Pierson, (M.A.)a,*, Steven C. Hayes, (Ph.D.)d, Elizabeth V. Gifford, (Ph.D.)a, Nancy Roget, (M.S.)c, Michele Padilla, (M.A.)c, Richard Bissett, (Ph.D.)c, Kristen Berry, (M.A.)c, Barbara Kohlenberg, (Ph.D.)b, Robert Rhode, (Ph.D.)e, Gary Fisher, (Ph.D.)c

a Veteran Affairs Palo Alto Health Care System, Palo Alto, CA 94025, USA
b Department of Psychiatry, University of Nevada, School of Medicine, Reno, NV 89557, USA
c Center for the Application of Substance Abuse Technologies, University of Nevada, Reno, NV 89557, USA
d Department of Psychology, University of Nevada, School of Medicine, Reno, NV 89557, USA
e University of Arizona, College of Medicine, Tucson, AZ 85724, USA
Received 10 January 2006; received in revised form 19 June 2006; accepted 3 July 2006
Abstract

This study examines the reliability of the Motivational Interviewing Treatment Integrity (MITI) code, a brief scale designed to evaluate the integrity of the use of motivational interviewing (MI). Interactions between substance abuse counselors, with one person role-playing a client, were audiotaped and scored by trained teams of graduate and undergraduate students. Segments of 10 minutes and 20 minutes were compared and found to yield the same reliability and integrity results. Interrater reliability showed good-to-excellent results for each MITI item even with undergraduate raters. Correlations between items showed a coherent pattern of interitem correlations. The MITI is a good measure of treatment integrity for MI and seems superior to existing measures when indicators of client behavior are not needed.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Motivational interviewing; Treatment integrity; Adherence and competence
* Corresponding author. Veteran Affairs Palo Alto Health Care System, Program Evaluation and Resource Center, 794 Willow Road (152MPD), Menlo Park, CA 94025, USA. Tel.: +1 650 493 5000x22354; fax: +1 650 617 2736.
0740-5472/07/$ – see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jsat.2006.07.001

1. Introduction

Motivational interviewing (MI) is a directive, humanistic approach that focuses on facilitating clients’ own motivation through resolution of their ambivalence toward change (Rollnick & Miller, 1995). An MI treatment approach emphasizes a nonjudgmental empathic stance and proscribes confrontation and advising the client without his or her permission. MI has been gaining increasing attention and support as an effective intervention for many different problem behaviors, both when used alone and when added to existing treatments (Burke, Arkowitz, & Menchola, 2003). However, many studies purportedly using MI have not
measured the integrity of the intervention being delivered (Burke et al., 2003). As a result, it is not known whether the treatment providers were actually using MI techniques and whether they were doing so effectively. According to the developers of MI, some studies examining the therapy even include descriptions of techniques that contradict the MI style (Rollnick & Miller, 1995).

As MI has gained support, the desire for training has increased, and evaluations of the effectiveness of such training have begun (Miller & Mount, 2001; Miller, Yahne, Moyers, Martinez, & Pirritano, 2004). One major issue that has to be addressed when evaluating the training is how to measure change in the treatment providers’ behavior. This includes answering the questions of whether the providers are using MI effectively and whether they are still exhibiting proscribed behaviors. Additionally, researchers would want to answer the same questions when evaluating the integrity of the treatment delivered in randomized controlled trials.
H.M. Pierson et al. / Journal of Substance Abuse Treatment 32 (2007) 11 – 17
Several issues must be considered when developing a measure of treatment integrity/adherence (Waltz, Addis, Koerner, & Jacobson, 1993). One such issue is the purpose of the scale. For example, is the scale meant solely to measure change in the provider’s behavior, or is it meant to look at the dynamic between the provider and client? Another issue is the time required to use the scale, because a time-consuming scale may be less likely to be used in both training and research. A final issue is how experienced the users of the scale need to be, both in treatment in general and in the specific treatment being examined. This issue also has clinical and research implications: a clinic implementing a new treatment is not likely to have access to raters highly experienced in the new treatment, and research studies save resources if they are able to use less experienced raters.

Several scales have been developed to measure adherence to MI. The first two systematic attempts at creating MI adherence and competence scales are the Motivational Interviewing Skills Code (MISC; Miller & Mount, 2001) and the Motivational Interviewing Process Code (MIPC; Barsky & Coleman, 2001). More recently, the Motivational Interviewing Supervision and Training Scale (MISTS) was developed (Madson, Campbell, Barrett, Brondino, & Melchert, 2005). The MISC includes measures for the behaviors of both the interviewer and the client and requires that raters listen to each interview sample three times, making it very time-consuming. In a recent study examining the MISC, interrater reliability was found to be good for the global ratings and for the measure of overall adherence to MI (Moyers, Martin, Catley, Harris, & Ahluwalia, 2003). However, good interrater reliability was harder to obtain for the behavior frequencies, especially the MI-inconsistent items. The authors concluded that a simpler scale might produce higher interrater agreement.
In addition, some components may not be necessary to answer certain questions regarding MI integrity. For example, the client behavior codes are not relevant to whether the interviewer adopted an MI style. The MIPC and MISTS measure only the interviewer’s behavior, and the MIPC does not include behavior counts. A study examining the MIPC found that although raters were highly trained (i.e., two faculty and one graduate student, all having prior familiarity with MI), interrater reliability was lower than expected (i.e., 51% for functional skills and 75% for dysfunctional skills; Barsky & Coleman, 2001). The MIPC, however, is less time-consuming than the MISC because it requires listening to the behavior sample only once and because the behavior samples used were only 10 minutes long. A study examining the MISTS obtained interrater reliability scores ranging from fair to excellent for the different items on the scale (intraclass correlation coefficient [ICC] = .41 to .81; Madson et al., 2005). That study used raters with experience in therapy and supervision but without previous experience in MI. Additionally, the scale requires only a one-time session and is thus less time-consuming than the MISC. However, the amount of time rated was not reported and appears to have been the entire session.

The Motivational Interviewing Treatment Integrity (MITI) code was developed to solve some of these problems (Moyers, Martin, Manuel, Hendrickson, & Miller, 2005; Moyers, Martin, Manuel, & Miller, 2003). The MITI does not rate client behaviors and includes only two global scores: empathy and the overall display of the spirit of MI. It also includes fewer frequencies of specific behaviors, namely giving information, MI adherent statements (i.e., asking permission before giving advice, affirming the client, emphasizing the client’s autonomy, or supportive statements), MI nonadherent statements (i.e., advising without permission, confronting, or directing through commands), closed questions, open questions, simple reflections, and complex reflections, making it a simpler scale than the MISC and MISTS. Similar to the MISC, the separate behavior frequencies can be combined to answer additional questions; for example, what is the ratio of the total number of questions to the total number of reflections? Once the raters are sufficiently trained, all ratings can be obtained with only one pass through the interview sample, making it less time-consuming than the MISC.

The only published data on the validity and reliability of the MITI are found in Moyers et al. (2005). The MITI was created through the use of exploratory factor analysis of MISC ratings. By condensing MISC items into MITI items, more than half of the variability in the MISC was accounted for by this simpler measure. Moyers et al. analyzed the MITI’s interrater reliability with one graduate student and two undergraduate students and found mostly good-to-excellent interrater reliability. Finally, they showed that the MITI was sensitive to clinicians’ behavior change. This study expands on what was reported in Moyers et al.
(2005) by considering how long the interview samples must be to obtain reliable information and by examining interitem relationships on the MITI. It also provides preliminary evidence about the generalization of their reliability to teams not involved in the development of the measure. This study used role-play samples of substance abuse counselors being trained in MI as part of a larger study. Moyers et al. (2005) used 20-minute segments, but that is fairly long for many counselors in early training. In this study, some participants voluntarily ended the interviews before the full 20-minute segment was completed, despite instructions to persist for the full period. For that reason, a random sample of the available full-length tapes was selected, and both 10- and 20-minute segments were scored to see if the results were comparable. Raters with little or no experience concerning therapeutic interventions were trained to use the scale. If the MITI produces coherent and reliable data even under these conditions (i.e., short sample of behavior, raters being inexperienced in counseling
and MI), then a very practical scoring system would be available for MI research.
2. Methods

2.1. Source of the tapes

The tapes were obtained through a study conducted by the Nevada Practice Improvement Collaboration at the University of Nevada, Reno. Participants provided informed consent, and the study was conducted under the approval of the University of Nevada, Reno Office of Human Research Protection. The study examined the effects of providing additional motivation-enhancing training on substance abuse counselors’ acquisition of MI. All study counselors received a 2-day MI workshop in addition to one of three 4-hour motivational trainings. The tapes were collected before the counselors received any training, after the full training, and at a 2-month follow-up. Two counselors were paired for each role-play, with both taking turns as “client” and interviewer. Each of the 86 counselors who participated in the initial study was scheduled for pre-, post-, and follow-up taped interviews, for a possible total of 258 tapes. Some tapes (n = 30) were either missing or not created (e.g., due to the participant being absent, equipment failure, not turning on the tape recorder), and others were available but not included (n = 22) either because the participant stopped role-playing before at least 10 minutes had passed or because the tape was inaudible. The final sample with at least a 10-minute interview was 206 tapes.
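The tape accounting above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are hypothetical, and every figure comes from the paragraph above.

```python
# Figures taken from the paragraph above; variable names are illustrative.
counselors = 86
occasions = 3  # pre-, post-, and follow-up interviews

possible = counselors * occasions

missing_or_not_created = 30
excluded_short_or_inaudible = 22

usable = possible - missing_or_not_created - excluded_short_or_inaudible

print("possible tapes:", possible)
print("usable tapes:", usable)
```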
The counselors were asked to do 20-minute segments; however, many counselors ended the interviews early. In the results below, a random sample of 39 of the available full-length tapes was scored for both the first 10 minutes and the full 20 minutes to see if the adherence results were comparable (see Fig. 1 for a diagram of which tapes were used for which analyses). All 206 tapes were used for the interrater reliability analyses, and 80 pretraining tapes were used for the interitem correlations.

2.2. Selection and training of raters

A graduate student served as the criterion rater and conducted training with 10 undergraduate raters who were recruited from undergraduate psychology courses and who were selected based on availability and desire to participate. Two raters were excluded based on incomplete ratings. Four groups consisting of the graduate student and two undergraduate raters were formed based on scheduling matches. The training included receiving and individually reviewing the MITI manual (which can be obtained for free at casaa.unm.edu), followed by a group meeting where the manual was collectively reviewed and questions were answered. The groups then reviewed relevant segments from the Motivational Interviewing: Professional Training Videotape Series (Miller, Rollnick, & Moyers, 1998) in two 2-hour meetings. Coded and uncoded transcripts of two MI interviews from the tape series were obtained from the developers of the MITI. The uncoded versions were given to the raters, and they were asked to assign behavior codes according to the instructions in the manual. The behavior codes were compared with the coded transcripts, and
Fig. 1. Schematic of the distribution of tapes for each analysis.
Table 1
ICC scores for each item as well as overall ICCs per group

Item                  Group 1  Group 2  Group 3  Group 4  Average across groups  Level of reliability
Global empathy        .69      .68      .85      .67      .72                    Good
Global MI spirit      .81      .56      .82      .65      .71                    Good
Giving information    .87      .79      .85      .81      .83                    Excellent
MI adherent           .82      .50      .91      .78      .75                    Excellent
MI nonadherent        .79      .82      .90      .84      .84                    Excellent
Closed questions      .98      .99      .97      .97      .98                    Excellent
Open questions        .90      .92      .96      .91      .92                    Excellent
Simple reflections    .85      .80      .81      .83      .82                    Excellent
Complex reflections   .87      .86      .92      .88      .88                    Excellent
Mean overall ICC      .89      .85      .87      .87      .87                    Excellent
possible global ratings were discussed. Two sample tapes from the training study that could not be used in the formal analysis (e.g., because of length or unusable pretraining tapes) were then coded. Each group met separately, and the graduate student went through the tapes statement by statement, helping the other raters learn to parse and code the statements appropriately. The tapes were then reviewed again without stopping, and the global scores were discussed. Another three such sample tapes were then rated separately as a check for interrater reliability. None of these tapes were used in the final analyses in this study.

ICCs using the two-way mixed-effects model (Shrout & Fleiss, 1979; [2, 1]) were then calculated as a measure of interrater reliability between the undergraduate raters and the graduate criterion rater. ICCs are a more conservative measure of interrater reliability than Pearson’s product moment correlations or Cronbach’s alpha because the ICC takes into account both chance and systematic differences between raters. After each rater’s scores were given and ICCs were calculated, discrepancies in scoring were reviewed. Training ended when the ICCs showed adequate agreement (i.e., overall ICCs of at least .7 on each of the final three tapes). The total training took approximately 30 hours over a 4-week period. After the training, weekly 1- to 1.5-hour meetings were held with each group to discuss discrepancies in ratings. The purpose of these meetings was to control for
rater drift, but they tended to become shorter as the study proceeded and as experience was gained. In the main analysis of the 206 scored tapes, each group of raters scored a different number of tapes based on when they completed training and how many hours they were able to contribute to the project. Group 1 completed training the earliest and therefore rated the most at 60 tapes; Group 2 rated 41 tapes; Group 3 rated 52 tapes; and Group 4 rated 53 tapes.
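As a rough illustration of the reliability statistic described above, the following sketch computes a single-rater, two-way ICC from a tapes-by-raters table using the mean-square formula for ICC(2,1) in Shrout and Fleiss (1979). It assumes that the bracketed [2, 1] in the text identifies that form; the software and options actually used are not reported. The function names are hypothetical, and the small ratings matrix is the worked example from Shrout and Fleiss (1979), not study data. A helper applies the Cicchetti (1994) bands that underlie the "Level of reliability" column of Table 1 and tallies the 36 item-level ICCs from that table.

```python
from collections import Counter
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) of Shrout and Fleiss (1979).

    ratings: list of rows, one row per tape (target), one column per rater.
    """
    n = len(ratings)      # number of tapes
    k = len(ratings[0])   # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-tapes mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def cicchetti_band(icc):
    """Interpretive bands from Cicchetti (1994)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example)
print(round(example_icc, 2), cicchetti_band(example_icc))

# Item-level ICCs from Table 1 (nine items for each of the four groups).
table1 = {
    "Group 1": [.69, .81, .87, .82, .79, .98, .90, .85, .87],
    "Group 2": [.68, .56, .79, .50, .82, .99, .92, .80, .86],
    "Group 3": [.85, .82, .85, .91, .90, .97, .96, .81, .92],
    "Group 4": [.67, .65, .81, .78, .84, .97, .91, .83, .88],
}
band_counts = Counter(
    cicchetti_band(value) for group in table1.values() for value in group
)
print(dict(band_counts))
```

Because ICC(2,1) keeps the between-rater mean square in the denominator, a rater who scores systematically higher or lower than the criterion rater lowers the coefficient, which is the property the text contrasts with Pearson correlations.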
3. Results

3.1. MITI reliability

ICCs were calculated separately for each rating group. An overall ICC for the scale and ICC scores for each item were calculated (see Table 1). Cicchetti (1994) presented ranges for interpreting the strength of ICCs. Any interrater reliability below .4 is considered poor; between .4 and .59 is considered fair; between .6 and .74 is considered good; and between .75 and 1.00 is considered excellent. Considering that there are nine scores (two global, seven specific) and four rater teams, there were 36 ICCs that emerged from the present data set (not counting averages). None were in the poor range; 2 were in the fair range; 4 were good; and 30 were excellent. The only average ICCs across
Table 2
Means, ranges, and standard deviations for the 10- and 20-minute segments

                      10 minutes                  20 minutes
Item                  Range     M       SD        Range     M       SD
Global empathy        2–7       4.69    1.13      1–7       4.59    1.37
Global MI spirit      2–6       4.18    0.97      1–6       4.05    1.41
Giving information    0–6       0.74    1.37      0–12      2.48    2.90
MI adherent           0–7       1.36    1.44      0–11      3.49    2.67
MI nonadherent        0–7       1.49    2.26      0–24      4.95    6.22
Closed questions      1–63      15.46   11.43     2–100     24.33   17.59
Open questions        1–14      5.51    2.99      2–25      9.44    5.02
Simple reflections    0–12      4.59    2.89      2–24      7.38    4.92
Complex reflections   0–13      3.95    3.16      0–27      7.51    6.12
ICC                   .67–.96   .88     .07       .72–.99   .88     .07
Table 3
Paired sample t tests comparing 10- and 20-minute rates

                      10-minute rates   20-minute rates
Item                  M       SD        M       SD        t test   p value
Giving information    0.07    0.14      0.12    0.15      2.92     .006
MI adherent           0.14    0.14      0.17    0.12      2.72     .01
MI nonadherent        0.15    0.23      0.25    0.31      3.61     .001*
Closed questions      1.55    1.14      1.22    0.88      5.11     <.001*
Open questions        0.55    0.30      0.47    0.25      3.64     .001*
Simple reflections    0.46    0.29      0.37    0.25      3.58     .001*
Complex reflections   0.39    0.32      0.38    0.31      0.95     .35
* Significant at the .005 Bonferroni-corrected level.
the four groups that were below the excellent range were the global empathy and MI spirit scores, which were good (.72 and .71, respectively). All the average ICCs for the behavior frequencies were in the excellent range (.75 to .98). Because the interrater reliabilities were found to be high, all the following analyses use the criterion rater’s scores.

3.2. Segment length

The next analysis examined the impact of segment length on ratings. Paired-sample t tests were run on each individual scale item and the ICCs to look for differences between scores for 10- and 20-minute segments. Because multiple t tests were run, a Bonferroni correction was used, which requires a p value of .005 or less to reach significance (see Table 2 for means, ranges, and standard deviations of the items). There were no significant differences found for either the global scores or the ICCs, empathy: t(38) = 0.89, p = .38; MI spirit: t(38) = 0.90, p = .38; ICC: t(38) = 0.21, p = .84. Rates were calculated for each behavior frequency to make the frequency counts for the 10- and 20-minute segments comparable. The frequencies were divided by the number of minutes in the segment, and the rates were compared (see Table 3). Even with the stringent p value, MI nonadherent, t(38) = 3.61, p = .001, closed questions, t(38) = 5.11, p < .001, open questions, t(38) = 3.64, p = .001, and simple reflections, t(38) = 3.58, p = .001, showed significant differences. Giving information and MI adherent approached significance, giving information: t(38) = 2.92, p = .006; MI adherent: t(38) = 2.72, p = .01, and complex reflections were not significant, t(38) = 0.95, p = .35. Although the absolute values were different, a second analysis was conducted to see if the two lengths show similar pictures of MI adherence (e.g., if both the 10- and 20-minute segments show good or poor adherence).
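The rate conversion and corrected threshold described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the family size of 10 tests (nine items plus the overall ICC) is inferred from the stated .005 threshold, and the paired t statistic is shown on made-up numbers because the per-tape data are not published.

```python
from math import sqrt
from statistics import mean, stdev


def per_minute(count, minutes):
    """Convert a behavior count into a per-minute rate."""
    return count / minutes


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Assumption: the corrected family is the nine items plus the overall ICC.
N_TESTS = 10
ALPHA = 0.05 / N_TESTS

# Mean closed-question counts from Table 2, converted to per-minute rates.
rate_10 = per_minute(15.46, 10)
rate_20 = per_minute(24.33, 20)
print(round(rate_10, 2), round(rate_20, 2))

# p values as reported in the text; "p < .001" is entered as its upper bound.
reported_p = {
    "Giving information": 0.006,
    "MI adherent": 0.01,
    "MI nonadherent": 0.001,
    "Closed questions": 0.001,
    "Open questions": 0.001,
    "Simple reflections": 0.001,
    "Complex reflections": 0.35,
}
significant = [item for item, p in reported_p.items() if p < ALPHA]
print(significant)

# Made-up rates for four tapes, only to show the shape of the paired test.
toy_t, toy_df = paired_t([1, 2, 3, 4], [0, 1, 1, 2])
print(round(toy_t, 3), toy_df)
```

The converted means agree with the closed-question row of Table 3, and comparing the reported p values against the corrected threshold reproduces the four items the text identifies as significantly different.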
This more closely assessed the degree of relative adherence detected by the two rated segments by correlating the scores for the 10- and 20-minute segments. Due to a low occurrence of some MITI items, the more conservative Spearman’s correlation statistic was chosen. Despite the rate differences, correlations calculated between 10- and 20-minute ratings for each item showed high consistency for every item (r values between .68 and .94, all p values < .01; see Table 4), resulting in R² scores of greater than .46.
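A rank correlation of the kind described above can be sketched in plain Python as the Pearson correlation of average ranks, which handles the tied scores that low-frequency items produce. The function names and the toy data are hypothetical; only the value .68 comes from Table 4.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy counts for four tapes (hypothetical), with a tie in the 10-minute scores.
ten_minute = [0, 0, 1, 2]
twenty_minute = [1, 2, 3, 4]
rho = spearman(ten_minute, twenty_minute)
print(round(rho, 4))

# Smallest correlation in Table 4 and its squared value.
lowest_r = 0.68
print(round(lowest_r ** 2, 4))
```

Squaring the smallest coefficient in Table 4 gives the lower bound on shared variance quoted in the text.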
3.3. Interitem correlations

The last analysis focused on the validity and structure of the MITI by looking at interitem correlations. It was expected that certain items would be positively correlated, whereas other items were expected to be negatively correlated if the scale as a whole was actually measuring adherence and competence of MI delivery. For example, the global empathy and MI spirit scores were expected to be positively correlated to each other, to MI adherent behaviors, and to the reflection behavior frequencies. However, the global scores were expected to be negatively correlated to MI nonadherent and giving information behavior frequencies. For this analysis, only the 80 usable tapes collected during the pretraining phase were used. This avoided the dependence caused by using more than one tape from each counselor, which would have required more complex analytic methods that were unnecessary given the already sufficient sample size. Pearson correlations between all items were calculated. Because so many correlations were being calculated, the more stringent alpha of less than .01 was chosen as the cutoff for significance. As can be seen in Table 5, global empathy and global MI spirit were highly correlated (r = .75, p < .001). Both global items were moderately correlated with complex reflections (empathy: r = .46, p < .001; MI spirit: r = .30, p < .01) and were negatively correlated with the MI nonadherent behavior count (empathy: r = −.68, p < .001; MI spirit: r = −.71, p < .001) and giving information (empathy: r = −.41, p < .001; MI spirit: r = −.32, p < .01). MI nonadherent behaviors were moderately correlated with
Table 4
Spearman correlations for 10 minutes compared with 20 minutes

Item                  r value   p value
Giving information    .68       .01
MI adherent           .78       .01
MI nonadherent        .83       .01
Closed questions      .94       .01
Open questions        .82       .01
Simple reflections    .88       .01
Complex reflections   .89       .01
Table 5
Pearson correlations between MITI items

                      Global       Giving       MI           MI           Closed       Open         Simple       Complex
Item                  MI spirit    information  adherent     nonadherent  questions    questions    reflections  reflections
Global empathy        .75*         −.41*        .05          −.68*        .02          .20          .17          .46*
Global MI spirit                   −.32*        .05          −.71*        .08          .28          .18          .30*
Giving information                              .21          .41*         .10          .18          .19          .15
MI adherent                                                  .10          .16          .02          .08          .14
MI nonadherent                                                            .11          −.30*        .22          .21
Closed questions                                                                       .13          .28          .14
Open questions                                                                                      .15          .04
Simple reflections                                                                                               .26
* p = .01.
giving information (r = .41, p < .001) and were moderately negatively correlated with open questions (r = −.30, p < .01).
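The significance cutoff used for these correlations can be illustrated with the standard conversion of a Pearson r to a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²), with n = 80 tapes. This is a sketch under assumptions: the function names are hypothetical, and the critical value of about 2.64 (two-tailed, alpha = .01, df = 78) is an approximate figure from standard t tables, not a number reported in the article.

```python
from math import sqrt

N_TAPES = 80   # pretraining tapes used for the interitem correlations
T_CRIT = 2.64  # assumption: approximate two-tailed critical t, alpha = .01, df = 78


def t_from_r(r, n):
    """Convert a Pearson correlation to its t statistic with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def passes_cutoff(r, n=N_TAPES, t_crit=T_CRIT):
    """True when |t| exceeds the assumed critical value."""
    return abs(t_from_r(r, n)) > t_crit


# Coefficients quoted in the text, with the signs the text describes.
quoted = [0.75, 0.46, 0.30, -0.68, -0.71, -0.41, -0.32, 0.41, -0.30]
print([passes_cutoff(r) for r in quoted])

# An unflagged Table 5 entry of magnitude .28 falls just short of the cutoff.
print(round(t_from_r(0.28, N_TAPES), 2), passes_cutoff(0.28))
```

Under these assumptions, a correlation of about .29 in absolute value is the smallest that clears the cutoff, which is consistent with .30 being flagged in Table 5 and .28 not.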
4. Discussion

The current investigation of the MITI attempted to address issues with past MI integrity measures. The interrater reliability results suggest that good-to-excellent interrater reliability can be obtained on the MITI using undergraduate raters. Previous studies using more experienced raters with the MISC found that interrater agreement on the global ratings is easier to obtain compared to the behavior frequencies (Moyers, Martin, Catley, et al., 2003; Moyers, Martin, Manuel, et al., 2003). However, in the Moyers et al. (2005) study using the MITI, they found similar interrater reliabilities compared with this study, except for complex reflections. The present ICC scores for complex reflections are higher than those found in the Moyers et al. study. Given that the global ratings are lower than the behavior frequencies in both studies, it is possible that using undergraduate raters who have not received any formal training in therapeutic interventions may decrease agreement on the global items. In this study, however, the average ICCs on global items are still in the good range.

Two time segments (10 and 20 minutes) were compared to see if each gave equally reliable results. The comparison showed that 10-minute segments differed in the rate of some behaviors detected but could be used as a reliable comparative measure of MI adherence as compared with rating 20-minute segments. One possible explanation for the differences in the rates is that the first 10 minutes were used for the comparisons. The 10-minute segments could have been randomly sampled from the tapes. However, the first 10 minutes of the tapes are likely to be similar to those tapes that only contain 10 minutes, which was the length many of the clinicians in beginning MI training were able to sustain in this study. Therefore, knowing if 10-minute segments are enough to reliably code adherence to MI is important.
The rates for 10- and 20-minute segments were different; thus, in absolute value terms, the two are not comparable. However, if the goal is to assess adherence, the two revealed a similar picture overall based on correlations. These findings suggest
that when the 20-minute interval cannot be collected or is otherwise not feasible, 10-minute segments will detect high and low adherence similarly. It is important to keep in mind, however, given that within-subject comparisons show a significant change in the pattern of individual item scores, that studies using the shorter 10-minute intervals should not necessarily be compared with other studies using the full 20-minute rating interval. Another possible confound in this analysis involves rater bias because the criterion rater’s scores were used for both the 10- and 20-minute segment ratings. However, the high interrater reliability shows that other observers agreed with the ratings that were used for the segment comparisons. Finally, the structure of the MITI was examined by looking at the interitem correlations. These interitem correlations suggest that the MITI is in accord with the underlying MI theory. For example, the global items were negatively correlated with MI nonadherent behaviors, meaning that when counselors are capturing the spirit of MI, they are not simultaneously performing MI inconsistent behaviors. Based on the correlations, all items seemingly address different dimensions of MI, except for possibly the global items of empathy and MI spirit. It is possible that MI spirit subsumes empathy, and based on the above results, both global items may not be necessary. The use of role-plays instead of real client interviews may have influenced the obtained reliability. For example, the real interviews may give a wider range of behavior on the part of the interviewer, which may be harder for undergraduates to rate reliably. A wider range of behavior may also change the strength of the correlations between the items. However, role-plays do add a behavioral measure of the use of MI and are preferred to self-report, which is currently the most common way of assessing adaptation to new therapeutic approaches. 
In addition, real client interviews can be difficult to obtain in some circumstances. In the absence of real client interviews, role-plays are a good alternative.

The MITI appears to be easier to use and less time-consuming than previous MI adherence and competence scales and, thus, has advantages for the general assessment of MI training. However, if a more detailed analysis of clinician integrity or an analysis involving the effectiveness
of MI processes is being examined, then the MISC has advantages mainly in that it also includes ratings of client behaviors. In addition, as Moyers et al. (2005) noted, the MITI does not take the context of the clinicians’ statements into account, making it difficult to obtain a thorough competence measure. They suggest that the MITI may be best used for measuring beginning levels of proficiency, for example, measuring whether or not clinicians have begun using MI after receiving training.

This study provides additional support for the usefulness of the MITI. Obtaining reliable results with shorter segments of the interviews makes the MITI an even simpler measure, which can reduce the cost of performing integrity checks. The MITI also reduces the resources needed because raters inexperienced in MI can use it reliably; therefore, expert raters are not needed. The results presented here show additional promise for the use of the MITI as a reliable and valid adherence and competence measure for MI.
Acknowledgments Funding was provided by the Substance Abuse and Mental Health Services Administration and the Center for Substance Abuse Treatment as part of Grant No. TI12899.
References

Barsky, A., & Coleman, H. (2001). Evaluating skill acquisition in motivational interviewing: The development of an instrument to measure practice skills. Journal of Drug Education, 31, 69–82.
Burke, B. L., Arkowitz, H., & Menchola, M. (2003). The efficacy of motivational interviewing: A meta-analysis of controlled clinical trials. Journal of Consulting and Clinical Psychology, 71, 843–861.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290.
Madson, M. B., Campbell, T. C., Barrett, D. E., Brondino, M. J., & Melchert, T. P. (2005). Development of the Motivational Interviewing Supervision and Training Scale. Psychology of Addictive Behaviors, 19, 303–310.
Miller, W. R., & Mount, K. A. (2001). A small study of training in motivational interviewing: Does one workshop change clinician and client behavior? Behavioural and Cognitive Psychotherapy, 29, 457–471.
Miller, W. R., Rollnick, S., & Moyers, T. B. (Director). (1998). Motivational interviewing: Professional Training Videotape Series (Available from Delilah Yao UNM/CASAA, 2650 Yale SE, Albuquerque, NM 87106).
Miller, W. R., Yahne, C. E., Moyers, T. B., Martinez, J., & Pirritano, M. (2004). A randomized trial of methods to help clinicians learn motivational interviewing. Journal of Consulting and Clinical Psychology, 72, 1050–1062.
Moyers, T., Martin, T., Catley, D., Harris, K. J., & Ahluwalia, J. S. (2003). Assessing the integrity of motivational interviewing interventions: Reliability of the motivational interviewing skills code. Behavioural and Cognitive Psychotherapy, 31, 177–184.
Moyers, T., Martin, T., Manuel, J. K., Hendrickson, S. M. L., & Miller, W. R. (2005). Assessing competence in the use of motivational interviewing. Journal of Substance Abuse Treatment, 28, 19–26.
Moyers, T. B., Martin, T., Manuel, J. K., & Miller, W. R. (2003). The Motivational Interviewing Treatment Integrity (MITI) code (coding manual). Albuquerque, NM: University of New Mexico, Center on Alcoholism, Substance Abuse and Addictions (CASAA).
Rollnick, S., & Miller, W. (1995). What is motivational interviewing? Behavioural and Cognitive Psychotherapy, 23, 325–334.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.
Waltz, J., Addis, M. E., Koerner, K., & Jacobson, N. S. (1993). Testing the integrity of a psychotherapy protocol: Assessment of adherence and competence. Journal of Consulting and Clinical Psychology, 61, 620–630.